Friday, February 29, 2008

Server DOWN!!!

The server is a dedicated machine hosting one of Singapore's most happening websites for tertiary students, with lots of interesting undergrads being interviewed on a regular basis, and a large, active community in the forums... (hint: f*n*ygrad.com, and no, it's not a dirty four letter word and I shouldn't reveal it here).

Initially, the site had a problem with PHP connecting to the database. The error said that PHP could not connect because the file XXX.MYD could not be located. I told my friend (who owns the hosting infrastructure) that if the problem was serious enough, the site owner would call him directly. That was yesterday afternoon, and nothing much happened. They only traded a couple of emails. I offhandedly advised that it was most probably a corrupted database (the MYD file holds a MyISAM table's data, while the indices live in the matching MYI file, which I found out after some googling).

Well, nothing happened until evening. This morning, my friend told me that he was called and SMSed at 12am, 1am, 2am... well, you know the drill. The owner is really jumping, and reality has finally set in for him (the first few hours are usually denial, then requests for rebooting the machine, then testing, usually by vigorously pressing the "refresh" button on the browser, as if that will solve all the world's problems).

The web server is down.

Ditto to the email server.

Next came the threats to remove the server and host it elsewhere. Because if the server is hosted with you, then it's your duty to ensure that it's up and running, despite the fact that we do not have any access to it (no passwords).

A blunt analogy is this: if you really have cancer, no matter how many doctors you go to, you still have it. Changing doctors does not solve the problem.

After offering him my blunt opinion, I was told that I need to chill down and we are not pushing blame... What the ****???!!!

And so... the negotiation begins, to provide emergency server rescue, what cost? can we guarantee 100% recovery (if there's no backup, how can I guarantee all the data will be back?)? "confirm don't have the root password", so we have to reset it for him too.

Wow... a totally unmaintained server can be online for so long, too! I set the server up for the owner a couple of years back and left it to him as he wasn't interested in managed services. I can't believe it can survive so long in today's world! The server hardening I put it through was helpful, after all... :P hahaha!!!

So next, the final nego... and going down for a site visit and recovery effort estimation... meanwhile, I'm preparing a couple of LiveCDs (Knoppix, Ubuntu), installers (CentOS 5.1), System Rescue CD 0.4.3, and mentally prepping for the tasks ahead. Hmm.. what else do I need to bring along? Once I'm there, it's in the middle of nowhere, with not much chance to come out and buy anything I missed.

Friday night seems to be burnt for small change... I don't mind if someone else can do it for cheap actually, I need my rest... :|

*zzz*

Friday, February 15, 2008

World Class infrastructure for a World Class Event?

No, it wasn't to be so...

This is the headline from a Straits Times article:

Website booted him out three times
British Airways pilot Benterman takes 10 hours to get tickets for F1's first night race

To make it a double whammy, the permanent resident found, to his horror, that the seats he had reserved were lost when he was booted out of the website.

'It was absolutely frustrating and a disgrace,' said the exasperated 39-year-old.

'I cannot accept not getting through to the website because it crashed. There is also no customer service number to call.'


Apparently, the website was supposed to be capable of handling 20,000 transactions per hour, and the actual traffic was way over what was expected.

Questions:
  1. Did anyone take the last F1 race's figures for a comparison and benchmarking?
  2. Was there a big change in the way the tickets are sold?
  3. Did the system incorporate proper transactions handling, queueing and all that?
  4. Was the system properly load tested before going live?
  5. Was someone even monitoring the system after it went live? Why was no action taken? (cf. MRT down, buses were deployed)
  6. Was there a contingency plan in place? (obviously not)
  7. Could the launch have been scheduled in phases? (online sales first, then outlet sales?)
  8. Were corners being cut in the system hardware so someone could save a few bucks? Or was the organizer scammed by vendors who gave 3rd class hardware for 1st class prices? (which is normal)
According to the news on the radio this morning, a hardware upgrade should be sufficient to solve the problem. Shame on the SI (system integrator) and/or hardware vendors who supplied the "solution".

So, we'll see... :)

Monday, February 11, 2008

Understanding a geek :P

Found a very interesting blog entry (by the author of a book called "The Nerd Handbook"):
http://www.randsinrepose.com/archives/2007/11/11/the_nerd_handbook.html

In summary, these are the traits:
Understand your nerd’s relation to the computer.
Your nerd has control issues.
Your nerd has built himself a cave.
Your nerd loves toys and puzzles.
Nerds are f**king funny.
Your nerd has an amazing appetite for information.
Your nerd has built an annoyingly efficient relevancy engine in his head.
Your nerd might come off as not liking people.
The best part is this paragraph:

Understand your nerd’s relation to the computer. It’s clichéd, but a nerd is defined by his computer, and you need to understand why.

First, a majority of the folks on the planet either have no idea how a computer works or they look at it and think “it’s magic”. Nerds know how a computer works. They intimately know how a computer works. When you ask a nerd, “When I click this, it takes awhile for the thing to show up. Do you know what’s wrong?” they know what’s wrong. A nerd has a mental model of the hardware and the software in his head. While the rest of the world sees magic, your nerd knows how the magic works, he knows the magic is a long series of ones and zeros moving across your screen with impressive speed, and he knows how to make those bits move faster.

The nerd has based his career, maybe his life, on the computer, and as we’ll see, this intimate relationship has altered his view of the world. He sees the world as a system which, given enough time and effort, is completely knowable. This is a fragile illusion that your nerd has adopted, but it’s a pleasant one that gets your nerd through the day. When the illusion is broken, you are going to discover that…


Actually, I would prefer the word geek to nerd :P ... Nerd sounds too... nerdy :P


Read the blog to find out more! :D

Enjoy!




Can? How much? How fast?

When doing freelance IT projects, some questions from prospective local customers would sometimes be like this:

"Hi, can you do a SQL/web/intranet/(fill in with appropriate IT word) program?"
"How much har?"
"When can finish?"

This is very common and I am always very cautious when dealing with such customers because:
  1. They typically (>90% of the time) do not know the effort involved (most likely they learned of this requirement from an in-house IT guru, who might or might not have any experience in IT)
  2. They also do not know what they really want (they are just relaying someone else's words)
  3. They only look at the cheapest quote
Most of the time, I would quote them what I feel is reasonable (of course!) based on a lot of assumptions (the more assumptions and buffer, the more costly it is), as they can't provide me with a reasonable basis to work out a quotation. This is really a case of "you get what you pay for".

For customers whose only concern is cost, I would happily give them a miss as they are the "don't care, don't know, don't bother me unless it is delivered and working" type. These companies are the type that hobble along on a barely functional and often broken IT infrastructure, going from vendor to vendor whenever the system is down, because they only look at the cost.

Vendor after vendor applies various patches, workarounds and upgrades to the original system until it is barely recognizable or maintainable. Most, if not all, of the time such companies do not have any documentation of the system and the database, resulting in tremendous effort spent tracing through the system and trying to figure out what it is supposed to be doing. And yes, most of the time this work is being performed on production systems too.

Despite the claims by the press, internet and local authorities, the majority of SME owners are rather IT illiterate and clueless (this is based on my limited experience). Most of the time, cost is the only concern. Some of them are happy with a halfway broken system because of the "I know it's broken but I have a staff doing it, fixing it will cost a lot of money lehhh..." way of life. They will devote one, two or even three staff members to perform some of the functions that the system should be performing by itself if it were not broken.

Even worse are those who are semi IT-literate, certified as literate after attending a 3 day course in IT conducted by instructors who barely have any experience in real life IT operations. They are adamant that their "IT Way" is correct and you should not attempt to advise them because they know better.

Let me provide an analogy. Suppose you need to buy shoes: you can either choose a cheap pair, or a slightly more expensive but durable one. So, would you rather buy a $30 pair of shoes that spoils every 3 months, or a $150 pair that can last you at least a year or more? Some companies would avoid paying the $150 like the plague because the perceived cost is "high". That's sad, limited, and yet very real.

Therefore, in order to secure the job and to fix the problems properly, a lot of communication and persuasion is necessary. The customer must be convinced of the value of the solution, and invest his/her own time to help shape the final solution. IT is central to a lot of their operations and can be made to provide more assistance to their business, yet it is given very little priority and investment.

Having said that, there are a lot of moonlighters out there who over promise and under deliver, causing this vicious cycle to continue. I have personally seen some local e-commerce sites with extremely poor exception handling in their purchase and payment code. Yes, they have the usual certificates, logos and seals, but the certification process does not include the testing or validation of the source code itself.

Shop on local websites? Err... maybe not yet... let someone else be the guinea pig for these eBay wannabes :)

Maybe I will get a chance to help fix these borken sites once they get complained about :P haha!

Tuesday, February 5, 2008

Torvalds on Microsoft's patent bluff

I have argued a few times with my friend (who is an MVP for Microsoft) over the position/perception of Microsoft: is it a monopoly? a big friendly giant? or what? He is more exposed to the technical folks who (mainly) talk about technologies and so on, and he has failed to see the legal and business side of the games Microsoft plays.

I told him that yes, there may be nice, friendly peeps there at MS, but there are also people who have nothing to do but spread FUD (Fear, Uncertainty and Doubt) and people who play games on both sides.

In this article published by Linuxworld, Torvalds remarked that
"I think there are people inside Microsoft who really want to improve interoperability and I also think there are people inside Microsoft who would much rather just try to stab their competition in the back," he said. "I think the latter class of people have usually been the one[s] who won out in the end, but -- so I wouldn't exactly trust them."

Me neither. Keep your friends close, but your enemies closer :P

NYT article on the cut cables

NYT has an article on the borken internet cables off the coast of Egypt:

Telecommunications operators have been trying to diversify the routes used for transmissions, said Alan Mauldin, research director with TeleGeography Research, particularly since an earthquake in Taiwan in 2006 disrupted service in Asia.

The cable network contains “choke points” — like those off the coast of Egypt and Singapore where many cables run, Mr. Mauldin said.

I... uh... guess the seas here are safer? :D

Monday, February 4, 2008

a Ruby on Rails query with the LIKE syntax in :conditions

[ This is working on Rails 2.0.2 in Aptana Studio 1.1.0.007007 with RadRails plugin 0.9.3.6479 ]

In most Rails apps, you would do a simple .find(:all), .find(id) or .find(params[:id]). I needed to query my database based on a simple condition: matching all values in a column that start with a particular letter.

select * from mytable where username LIKE 'a%';
(would return all usernames that start with a, like adrian, avin, etc).

To do that in RoR, I needed to add a :conditions option to the default query.

However, writing it this way didn't help:
:conditions => 'username LIKE #{params[:uname]}%',
The sanitizer would convert that to username LIKE 'a'%, which gives an SQL exception.

After searching the web and scanning dozens of sites, including the Rails API docs and the RoR forum, I accidentally stumbled upon the solution in the comments section of a blog (I can't remember where it was now, sorry).

The query, modified and tested, became:

@results = User.find(:all, :conditions => ['username LIKE ?', params[:uname]+'%' ])
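
One thing the query above does not handle is a search term that itself contains % or _, which LIKE treats as wildcards. Here is a minimal sketch of a helper that escapes them first. It assumes a database such as MySQL, where backslash is the default LIKE escape character, and prefix_like_conditions is a hypothetical name of my own, not part of Rails:

```ruby
# Hypothetical helper (not part of Rails): builds a :conditions array for a
# "starts with" match. Assumes backslash is the LIKE escape character, which
# is the MySQL default; other databases may need an explicit ESCAPE clause.
def prefix_like_conditions(column, prefix)
  # Escape backslash, % and _ so they match literally instead of as wildcards.
  escaped = prefix.to_s.gsub(/[\\%_]/) { |ch| "\\" + ch }
  # The column name is interpolated directly, so it must come from the code,
  # never from user input. Only the pattern goes through the ? placeholder.
  ["#{column} LIKE ?", escaped + '%']
end

p prefix_like_conditions('username', 'a')
# prints ["username LIKE ?", "a%"]
```

In a controller it would be used roughly as User.find(:all, :conditions => prefix_like_conditions('username', params[:uname])). Note that an empty or missing parameter yields the pattern '%', which matches every row, so you may want to check for that before querying.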

Sorry if this was an obvious thing to a lot of other people... :P

Friday, February 1, 2008

Elsewhere: the Internet is borken too

This is not a local event, but I read the news yesterday on CNA.

Internet outage hits business from Cairo to Colombo
Posted: 31 January 2008 2208 hrs

CAIRO: Damage to undersea Internet cables hit business across the Middle East and South Asia on Thursday, including the vital call centre industry, prompting calls for people to limit their surfing.

Around 70 per cent of Internet users in Egypt have been affected since two submarine cables in the Mediterranean Sea were damaged on Wednesday, also rupturing connections thousands of kilometres (miles) away.

...

India's Internet-dependent outsourcing industry was also severely disrupted, with businesses saying it may take up to 15 days to return to normal.

There is another article here by Business Week:

Damage to the Flag Europe-Asia and the SeaMeWe-4 cables have left only the older SeaMeWe-3 system to provide service between Europe and the Middle East, research firm TeleGeography said.

The two cables, with 620 Gbps in capacity, are the prime direct links between Europe, the Middle East and south Asia.


This looks very serious and the cause is as yet unknown. Imagine all the businesses relying on those two cables losing all access to free/cheap overseas calls and the internet (email, websites, ecommerce, etc)!

I wonder what is causing the disruption. Underwater volcano eruptions due to continental movement or... playful whales? :P