Friday, February 29, 2008

Server DOWN!!!

The server is a dedicated machine hosting one of Singapore's most happening websites for tertiary students, with lots of interesting undergrads being interviewed on a regular basis and a large, active community in the forums... (hint: f*n*ygrad.com, and no, it's not a dirty four-letter word and I shouldn't reveal it here).

Initially, the site had a problem with PHP connecting to the database. The error said that PHP could not connect because the file XXX.MYD could not be located. I told my friend (who owns the hosting infrastructure) that if the problem was serious enough, the site owner would call him directly. That was yesterday afternoon, and nothing much happened. They only traded a couple of emails. I offhandedly offered some advice: most probably it was a corrupted database (the .MYD file is where a MyISAM table keeps its data, which I found out after some googling).
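
For the curious, here is a minimal Python sketch of how one might check for this kind of missing-file problem. It assumes the tables are MyISAM, where each table normally has three files in the database directory (.frm for the definition, .MYD for the data, .MYI for the indexes). The function name is my own invention for illustration, and this is not what was actually run on the server.

```python
import os
from collections import defaultdict

# The three files a MyISAM table normally has on disk.
MYISAM_EXTS = {".frm", ".MYD", ".MYI"}


def find_incomplete_myisam_tables(db_dir):
    """Return {table_name: sorted list of missing file extensions}.

    Hypothetical helper: only tables that have at least a .MYD or .MYI
    file are treated as MyISAM, so a lone .frm file is ignored.
    """
    found = defaultdict(set)
    for name in os.listdir(db_dir):
        base, ext = os.path.splitext(name)
        if ext in MYISAM_EXTS:
            found[base].add(ext)

    return {
        table: sorted(MYISAM_EXTS - exts)
        for table, exts in found.items()
        if exts & {".MYD", ".MYI"} and exts != MYISAM_EXTS
    }
```

Pointing it at a database directory (for example a made-up path like /var/lib/mysql/forumdb) would list every table that has lost one of its files. It only spots missing files; a file that is present but corrupted needs MySQL's own check and repair tools.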

Well, nothing happened until evening. This morning, my friend told me that he was called and SMSed at 12am, 1am, 2am... well, you know the drill. The owner is really jumping now that reality has finally set in for him (the first few hours are usually denial, then requests to reboot the machine, then testing, usually by vigorously pressing the "refresh" button on the browser, as if that will solve all the world's problems).

The web server is down.

Ditto for the email server.

Next came the threats to remove the server and host it elsewhere. Because if the server is hosted with you, then it's your duty to ensure that it's up and running, despite the fact that we do not have any access to it (no passwords).

A blunt analogy is this: if you really have cancer, no matter how many doctors you go to, you still have it. Changing doctors does not solve the problem.

After offering him my blunt opinion, I was told that I need to chill and that "we are not pushing blame"... What the ****???!!!

And so... the negotiation begins. Emergency server rescue, at what cost? Can we guarantee 100% recovery? (If there's no backup, how can I guarantee all the data will be back?) And "confirm don't have the root password", so we have to reset it for him too.

Wow... a totally unmaintained server can stay online for so long, too! I set the server up for the owner a couple of years back and left it to him as he wasn't interested in managed services. I can't believe it survived so long in today's world! The server hardening I put it through was helpful, after all... :P hahaha!!!

So next, the final nego... and going down for a site visit and recovery effort estimation... Meanwhile, I'm preparing a couple of LiveCDs (Knoppix, Ubuntu), installers (CentOS 5.1) and System Rescue CD 0.4.3, and mentally prepping for the tasks ahead. Hmm... what else do I need to bring along? Once I'm there, it's in the middle of nowhere, with not much chance to pop out and buy anything I missed.

Friday night seems to be burnt for small change... Actually, I don't mind if someone else can do it for cheap. I need my rest... :|

*zzz*
