Friday, December 19, 2008

vsftpd and selinux

They don't play well together.

If SELinux is enabled and in "enforcing" mode, you have to change the SELinux configuration to allow the user's FTP processes to read and write files in his own directory.

*sigh* 

Friday, November 14, 2008

Fixing a broken... perl module

Whenever there is an update of MailScanner or manual updating of some perl modules from CPAN or updating of software from 3rd party RPM repositories, there is a chance that Scalar::Util will be borken.. broken, resulting in

Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/5.8.5/i386-linux-thread-multi/Scalar/Util.pm line 30 

or similar.

The solution was found on a Japanese website:
http://d.hatena.ne.jp/ksmemo/20071121/p1

$ wget http://search.cpan.org/CPAN/authors/id/G/GB/GBARR/Scalar-List-Utils-1.19.tar.gz
$ tar xzf Scalar-List-Utils-1.19.tar.gz
$ cd Scalar-List-Utils-1.19
$ perl Makefile.PL
$ make
$ make test
$ make install

After that, "perl -MCPAN -e shell" should start without error messages.

Tuesday, October 28, 2008

SQL Injection prevention tool

Interesting tool, should try it out soon:


GreenSQL is an Open Source database firewall used to protect databases from SQL injection attacks.

Wednesday, July 16, 2008

Wiki for *nix / *nux newbies :)

Compile Software From Source Code

http://www.webmonkey.com/tutorial/Compile_Software_From_Source_Code

For those adventurous, brave and creative souls who would like to have a choice and build your own :)

Tuesday, June 17, 2008

Compiling DBD::ODBC on Vista and XP

Background
On my machine(s), I need to use the Unicode-enabled version (specifically, the UTF-16 enabled version) of DBD::ODBC that is distributed on CPAN. However, the default packages distributed and installed by ActiveState Perl are not Unicode-enabled (tested up to v1.15).

Previously, I have successfully installed my own copy after Googling the web intensively for a day or two and compiling / testing / tweaking the information found on the articles online.

I have decided to post the steps here to "keep a copy" in case I need it again and I can't find it on my harddisk. I was lucky I made notes the previous time, otherwise I would have spent another day or two going through the same stuff again.


Systems / Setups tested against
  1. Windows Vista Business Edition x86, ActiveState Perl v5.8.8
  2. Windows 2003 Server x64, ActiveState Perl v5.8.8 (v5.10 is too buggy for me, esp the x64 edition)

Prerequisites
  1. Make sure you have VS.Net 2005/2008 installed and working. If not, I guess the Windows SDK with the VC++ compiler and headers is a viable alternative (not 100% tested on using Win SDK only).
  2. Make sure your SQL Server 2005 (Express or otherwise) is installed, working, and patched (DBD::ODBC v1.16 fails with an unpatched version of MSSQL 2005 due to the "old" SQL client driver provided).
  3. Create an ODBC connection with proper user credentials and permissions on the database for testing (as part of the ODBC installation procedure).
  4. Make sure the Perl modules (e.g., DBI) are installed and updated using PPM (the Perl Package Manager).

Procedure

  1. Download and unpack DBD::ODBC from http://search.cpan.org/~mjevans/DBD-ODBC-1.16/
  2. Start the Visual Studio 200X command prompt, navigate to the directory where the module sources are unpacked in Step 1.
  3. Set the ODBC connection parameters with valid information:
    1. set DBI_DSN=dbi:ODBC:test_db
    2. set DBI_USER=user
    3. set DBI_PASS=pass
  4. Run "perl Makefile.PL"
  5. Run "nmake"
  6. Run "nmake manifest"
  7. Run the following command as a single line:
    mt -manifest blib\arch\auto\DBD\ODBC\ODBC.dll.manifest -outputresource:blib\arch\auto\DBD\ODBC\ODBC.dll;#2
  8. Run "nmake test"*, check for errors or problems at this step.
  9. If nothing goes wrong, run "nmake install"**

* Note: The output from my "nmake test" step looks like this:
D:\tmp\DBD-ODBC-1.16\DBD-ODBC-1.16>nmake test

Microsoft (R) Program Maintenance Utility Version 9.00.21022.08
Copyright (C) Microsoft Corporation. All rights reserved.

D:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib', 'blib\arch')" t/*.t
t/01base................ok
t/02simple..............ok 1/62#
# Using DBMS_NAME 'Microsoft SQL Server'
# Using DBMS_VER '09.00.3042'
# Using DRIVER_NAME 'SQLNCLI.DLL'
# Using DRIVER_VER '09.00.3042'
t/02simple..............ok
t/03dbatt...............ok 3/26#
# N.B. Some drivers (postgres/cache) may return ODBC 2.0 column names for the SQLTables result-set e.g. TABLE_QUALIFIER instead of TABLE_CAT
t/03dbatt...............ok
t/05meth................ok
t/07bind................ok
t/08bind2...............ok
t/09multi...............ok
t/10handler.............ok
t/20SqlServer...........ok
t/30Oracle..............ok
2/4 skipped: Oracle tests not supported using Microsoft SQL Server
t/40UnicodeRoundTrip....ok
t/41Unicode.............ok
All tests successful, 2 subtests skipped.
Files=12, Tests=299, 24 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU)




** Note: The output from my "nmake install" step looks like this:

D:\tmp\DBD-ODBC-1.16\DBD-ODBC-1.16>nmake install

Microsoft (R) Program Maintenance Utility Version 9.00.21022.08
Copyright (C) Microsoft Corporation. All rights reserved.

Installing D:\Perl\site\lib\auto\DBD\ODBC\ODBC.bs
Installing D:\Perl\site\lib\auto\DBD\ODBC\ODBC.dll
Installing D:\Perl\site\lib\auto\DBD\ODBC\ODBC.dll.manifest
Installing D:\Perl\site\lib\auto\DBD\ODBC\ODBC.exp
Installing D:\Perl\site\lib\auto\DBD\ODBC\ODBC.lib
Installing D:\Perl\site\lib\auto\DBD\ODBC\ODBC.pdb
Installing D:\Perl\html\site\lib\DBD\ODBC.html
Installing D:\Perl\html\site\lib\DBD\ODBC\Changes.html
Installing D:\Perl\html\site\lib\DBD\ODBC\FAQ.html
Files found in blib\arch: installing files in blib\lib into architecture dependent library tree
Installing D:\Perl\site\lib\DBD\ODBC.pm
Installing D:\Perl\site\lib\DBD\ODBC\Changes.pm
Installing D:\Perl\site\lib\DBD\ODBC\FAQ.pm
Appending installation info to D:\Perl\lib/perllocal.pod

Wednesday, May 14, 2008

Integration issues? Just Smook 'em together!

Just saw this interesting project being mentioned in TSS:
http://www.theserverside.com/news/thread.tss?thread_id=49313

I really liked the features once I saw them. I hope it goes well, like the other interesting projects at Codehaus.

Smooks can be used to:

  • Perform a wide range of Data Transforms - XML to XML, CSV to XML, EDI to XML, XML to EDI, XML to CSV, Java to XML, Java to EDI, Java to CSV, Java to Java, XML to Java, EDI to Java etc.
  • Populate a Java Object Model from a data source (CSV, EDI, XML, Java etc). Populated object models can be used as a transformation result itself, or can be used by (e.g.) Templating resources for generating XML or other character based results. Also supports Virtual Object Models (Maps and Lists of typed data), which can be used by EL and Templating functionality.
  • Process huge messages (GBs) - Split, Transform and Route message fragments to JMS, File, Database etc destinations.
  • Enrich a message with data from a Database, or other Datasources.
  • Perform Extract Transform Load (ETL) operations by leveraging Smooks' Transformation, Routing and Persistence functionality.
Looks great for those very "interesting" integration projects between legacy system A and legacy system B. Will have to keep it in mind if I ever encounter such requirements in future projects.

Check it out at:
http://milyn.codehaus.org/Smooks

Monday, May 12, 2008

Unscrupulous scammers

Those conmen who call up parents claiming to have kidnapped their kids should be dealt with the same way kidnappers are dealt with: by caning, fine, jail, and deportation if they are illegal immigrants.

'The first thing I heard was a man say in Mandarin 'Lao Siong (brother), I kidnapped your son.'

http://tnp.sg/news/story/0,4136,164397,00.html

Sunday, April 6, 2008

Division in Python, versus Ruby

Just started on Python 101...

Invoked the command line interpreter, and noticed something interesting while playing with it...

Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 1/2.0
0.5
>>> 1/10.0
0.10000000000000001
>>> 1/3.0
0.33333333333333331
>>>


Hmm... why is there a "tail" left behind? :P

Started up my Ruby interpreter, just to perform a comparison...

C:\Users\xq>irb
irb(main):001:0> 1/2.0
=> 0.5
irb(main):002:0> 1/10.0
=> 0.1
irb(main):004:0> 1/3.0
=> 0.333333333333333
irb(main):005:0>

Cool... :P
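
A likely explanation for the "tail": neither 0.1 nor 1/3 can be stored exactly as a binary floating point number, and the two interpreters simply print a different number of significant digits. As far as I can tell, this Python version shows 17 significant digits at the prompt while this Ruby version shows about 15, so the stored value is the same and only the display differs. A small sketch that reproduces both outputs from one Python session (the helper name `show` is made up for this example):

```python
def show(value, digits):
    """Format a float with the given number of significant digits."""
    return "%.*g" % (digits, value)

for value in (1 / 2.0, 1 / 10.0, 1 / 3.0):
    # 17 digits matches the Python output above, 15 matches the irb output
    print(show(value, 17), show(value, 15))
```

With 17 digits the stored approximation becomes visible; with 15 digits it is rounded away, which is why the Ruby output looks "cleaner".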

Friday, April 4, 2008

eNets payment is so unfriendly + Vista sucks

Previously, I heard from my friend that the back-end for eNets payment gateway was "upgraded". Not sure how it went, but today is the first time I experienced it from the "front" when trying to pay for some coach tickets.

I am using my Vista laptop with Firefox as the web browser.

1st Attempt
Upon checking out my shopping cart, I was redirected to eNets' Welcome Gateway, which required me to enter my email address (why? no reason? happy? or so they can blast marketing material at me? do they have the right?), and my choice of my payment. Hey! If there is only 1 choice of payment, do I still have to select it from a dropdown list? Why so mafan? Why can't they just detect it at the server side and then present the item to the user?

Nevermind, I submitted my details and selection to the server and was promptly presented with the next page. I tried to enter text into the Name textbox, but it was disabled. I can't enter any data at all!

Seems like eNets does not like Firefox, or it is because I'm not using the latest and greatest Java 6 JRE (where got payment gateway ask you to upgrade your bloody JRE in the middle of a credit card transaction one? this must be the World Class syndrome, you want World Class service, you better be prepared to have world class JRE installed!)

So ok, I cancelled the transaction, no choice. Went out, closed my login session on the coach website. Started IE 7 on Vista.


Attempt Duo,
IE7 started up, I went inside through the same action. Luckily the coach website kept my shopping cart across session! (YAY!). Went to eNets website, duly typed in my credit card details, and then clicked the "Submit" button.

*BOOM*

Vista has halted all my web surfing for no reason. I can't go to yahoo mail, gmail, blogspot, etc.. Nothing! All the pages time out on me. MSN messenger is still running fine. This is not the first time my Vista has done this to me, sometimes all TCP connections get shut down too. In fact, on my Vista laptop, if I even plug in a LAN cable into the port it will blue screen immediately and die. Yes, this is my Vista experience so far. It sucks. Today, the suckiness has dropped to a new low. It hasn't been this low in ages. Thanks a lot, Bill Gates, and take your fishes along when you go. I'm not impressed by Vista lor...


Attempt 3
Really fed up. Rebooted laptop. Logged onto my Windows 2003 Server R2 SP2, 64-bit OS. Started up the IE7 on the server.

Login to the coach website, ah ha! my cart is still intact! no need for retyping all the details and selecting the seats on the coach. YAY!

Checked out the cart, go go go!!! Redirected to eNets.. and...

*engine dies*

The redirection failed. It just hung there. I can't even get into the page to fill in my email and select the only payment choice from the dropdown list.

*pfffft* grrr!!!


Attempt 4
My Vista laptop is back, while waiting for the startup activities to finish (yes, after you login, Vista takes another 3-5 minutes doing its own thing, the harddisk activity doesn't stop as it starts up the rest of the stuff in the background. I have since disabled autostart for mysql, mssql 2005, etc. Even though the starting up is fast, it just postpones the actual work to after you sign in, bleah...).

Anyway, I digress... I started up my IE7 and went through the usual, familiar motions ALL OVER AGAIN FOR THE FIRST TIME FOR THE LAST TIME, and this time it works. I was able to get my payment processed and confirmed.


*whew* Imagine if this was the opening day for some blockbuster hit, or if I was trying to book a great seat for my family for the F1 race. This whole fiasco would ruin my chances to get the seat that I want, man...

eNets, the service is pathetic, and instead of providing a service to the user by adapting yourself to the user's environment (browser type, browser version, java or the lack of it), you force them to do it your way. Must be IE7, must have JRE, must be patient to use your crap. Next time I encounter this, I will go down to the shop and pay for it. I do not see the increase in service level that corresponds to the increase in your payment processing surcharge.

Extremely horrible experience...

Wednesday, April 2, 2008

Updates from NS.sg


Right on the heels of the IPT notification email, I received another email from NS.sg, and this time, it is properly formatted!

Sigh, so why is it so different? :)

Tuesday, April 1, 2008

IPT notification email from NS.sg

I signed up for IPT (it's like IPPT, but you volunteer for it before they volunteer you for it :P) because I know my chin-up is not good enough. While signing up, I chose to be notified before the session starts... and as you can see for yourself (picture is below), the notification mail is really not impressive loh! :P



Eh, can test first then send or not? :P

Friday, February 29, 2008

Server DOWN!!!

The server is a dedicated machine hosting one of Singapore's most happening websites for tertiary students, with lots of interesting undergrad students being interviewed on a regular basis, and a large/active community in the forums... (hint: f*n*ygrad.com, and no, it's not a dirty four letter word and I shouldn't reveal it here).

Initially, the site experienced a problem with PHP connecting to the database. The error was that PHP could not connect to the database because the file XXX.MYD could not be located. I told my friend (who owns the hosting infrastructure) that if the problem was serious enough, the site owner would call him directly. That was yesterday afternoon. And nothing much happened. They only traded a couple of emails. I provided some advice, offhandedly, that most probably it is a corrupted database (as the MYD file holds the table's data for MyISAM tables, which I found out after some googling).

Well, nothing happened until evening. This morning, my friend told me that he was called and SMSed at 12am, 1am, 2am... well, you know the drill, the owner is really jumping and reality has finally set in for him (the first few hours are usually denial, then requests for rebooting the machine, then testing, usually by vigorously pressing the "refresh" button on the browser, as if that will solve all the world's problems).

The web server is down.

Ditto to the email server.

Next came the threats to remove the server and host it elsewhere. Because if the server is hosted with you, then it's your duty to ensure that it's up and running, despite the fact that we do not have any access to it (no passwords).

A blunt analogy is this: if you really have cancer, no matter how many doctors you go to, you still have it; changing doctors does not solve the problem.

After offering him my blunt opinion, I was told that I need to chill down and we are not pushing blame... What the ****???!!!

And so... the negotiation begins, to provide emergency server rescue, what cost? can we guarantee 100% recovery (if there's no backup, how can I guarantee all the data will be back?)? "confirm don't have the root password", so we have to reset it for him too.

Wow... a totally unmaintained server can be online for so long, too! I set the server up for the owner a couple of years back and left it to him as he wasn't interested in managed services. I can't believe it can survive so long in today's world! The server hardening I put it through was helpful, after all... :P hahaha!!!

So next, the final nego... and going down for a site visit and recovery effort estimation... meanwhile, I'm preparing a couple of LiveCDs (Knoppix, Ubuntu), installers (CentOS 5.1), System Rescue CD 0.4.3, and mentally prepping up for the tasks ahead. Hmm.. what else do I need to bring along? Once I'm there, it's in the middle of nowhere, not much chances to come out and buy anything I missed.

Friday night seems to be burnt for small change... I don't mind if someone else can do it for cheap actually, I need my rest... :|

*zzz*

Friday, February 15, 2008

World Class infrastructure for a World Class Event?

No, it wasn't to be so...

This is the headline from a Straits Times article:

Website booted him out three times
British Airways pilot Benterman takes 10 hours to get tickets for F1's first night race

To make it a double whammy, the permanent resident found, to his horror, that the seats he had reserved were lost when he was booted out of the website.

"It was absolutely frustrating and a disgrace," said the exasperated 39-year-old.

"I cannot accept not getting through to the website because it crashed. There is also no customer service number to call."


Apparently, the website was supposed to be capable of handling 20,000 transactions per hour, and the actual traffic was way over what was expected.
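
To put that figure in perspective, here is a rough back-of-envelope sketch. Only the 20,000 per hour capacity comes from the article; the demand figure is a made-up number purely for illustration, and the model assumes constant rates and a queue that never drops anyone, which a real ticketing site would not match exactly.

```python
def per_second(transactions_per_hour):
    """Convert an hourly transaction rate to a per-second rate."""
    return transactions_per_hour / 3600.0

def backlog_after(hours, arrivals_per_hour, capacity_per_hour):
    """Requests still waiting after `hours`, assuming constant rates."""
    return max(0.0, float(arrivals_per_hour - capacity_per_hour) * hours)

CAPACITY = 20000             # per hour, from the article
HYPOTHETICAL_DEMAND = 60000  # per hour, invented for this example

print(round(per_second(CAPACITY), 2))   # about 5.56 transactions per second
backlog = backlog_after(1, HYPOTHETICAL_DEMAND, CAPACITY)
print(backlog)                          # 40000.0 requests waiting after one hour
print(backlog / CAPACITY)               # 2.0 more hours just to drain the queue
```

So the stated capacity works out to fewer than six transactions per second, and any sustained demand above that leaves a backlog that keeps growing until the rush is over.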

Questions:
  1. Did anyone take the last F1 race's figures for a comparison and benchmarking?
  2. Was there a big change in the way the tickets are sold?
  3. Did the system incorporate proper transactions handling, queueing and all that?
  4. Was the system properly load tested before going live?
  5. Was someone even monitoring the system after it went live? Why was no action taken? (cf. MRT down, buses were deployed)
  6. Was there a contingency plan in place? (obviously not)
  7. Could the launch have been scheduled in phases? (online sales first, then outlet sales?)
  8. Were corners being cut in the system hardware so someone could save a few bucks? Or was the organizer scammed by vendors who gave 3rd class hardware for 1st class prices? (which is normal)
According to the news on the radio this morning, a hardware upgrade should be sufficient to solve the problem. Shame on the SI (system integrator) and/or hardware vendors who supplied the "solution".

So, we'll see... :)

Monday, February 11, 2008

Understanding a geek :P

Found a very interesting blog entry (by the author of a book called "The Nerd Handbook"):
http://www.randsinrepose.com/archives/2007/11/11/the_nerd_handbook.html

In summary, these are the traits:
Understand your nerd’s relation to the computer.
Your nerd has control issues.
Your nerd has built himself a cave.
Your nerd loves toys and puzzles.
Nerds are f**king funny.
Your nerd has an amazing appetite for information.
Your nerd has built an annoyingly efficient relevancy engine in his head.
Your nerd might come off as not liking people.
The best part is this paragraph:

Understand your nerd’s relation to the computer. It’s clichéd, but a nerd is defined by his computer, and you need to understand why.

First, a majority of the folks on the planet either have no idea how a computer works or they look at it and think “it’s magic”. Nerds know how a computer works. They intimately know how a computer works. When you ask a nerd, “When I click this, it takes awhile for the thing to show up. Do you know what’s wrong?” they know what’s wrong. A nerd has a mental model of the hardware and the software in his head. While the rest of the world sees magic, your nerd knows how the magic works, he knows the magic is a long series of ones and zeros moving across your screen with impressive speed, and he knows how to make those bits move faster.

The nerd has based his career, maybe his life, on the computer, and as we’ll see, this intimate relationship has altered his view of the world. He sees the world as a system which, given enough time and effort, is completely knowable. This is a fragile illusion that your nerd has adopted, but it’s a pleasant one that gets your nerd through the day. When the illusion is broken, you are going to discover that…


Actually, I would prefer the word geek to nerd :P ... Nerd sounds too... nerdy :P


Read the blog to find out more! :D

Enjoy!




Can? How much? How fast?

When doing freelance IT projects, some questions from prospective local customers would sometimes be like this:

"Hi, can you do a SQL/web/intranet/(fill in with appropriate IT word) program?"
"How much har?"
"When can finish?"

This is very common and I am always very cautious when dealing with such customers because:
  1. They typically (>90% of the time) do not know what is the effort involved (most likely they learn of this requirement from an in-house IT guru, who might or might not have any experience in IT)
  2. They also do not know what they really want (they are just relaying someone else's words)
  3. They only look at the cheapest quote
Most of the time, I would quote them what I feel is reasonable (of course!) based on a lot of assumptions (the more assumptions and buffer, the more costly it is) as they can't provide me with a reasonable basis to work out a quotation. This is really a case of "you get what you pay for".

For customers whose only concern is cost, I would happily give them a miss as they are the "don't care, don't know, don't bother me unless it is delivered and working" type. These companies are the type that hobbles along on a barely working/functional and often broken IT infrastructure, going from vendor to vendor/supplier whenever the system is down, because they only look at the cost.

Vendor after vendor applies various patches, workarounds and upgrades to the original system until it is barely recognizable or maintainable. Most, if not all, of the time such companies do not have any documentation of the system and the database, resulting in tremendous effort spent tracing through the system and trying to figure out what it is supposed to be doing. And yes, most of the time this work is being performed on production systems too.

Despite the claims by the press, internet and local authorities, the majority of SME owners are rather IT illiterate and clueless (this is based on my limited experience). Most of the time, cost is the only concern. Some of them are happy with a halfway broken system because of the "I know it's broken but I have a staff doing it, fixing it will cost a lot of money lehhh..." way of life. They will devote a staff or two, or even three to perform some of the functions that the system should be performing alone if it was not broken.

Even worse are those who are semi IT-literate, certified as literate after attending a 3-day course in IT conducted by instructors who barely have any experience in real-life IT operations. They are adamant that their "IT Way" is correct and you should not attempt to advise them because they know better.

Let me provide an analogy. Suppose you need to buy shoes, you can either choose to buy a cheap pair, or a slightly more expensive but durable one. So, would you rather buy a $30 pair of shoes that spoils every 3 months, or a $150 one that can last you at least a year or more? Some companies would avoid paying the $150 like the plague because the perceived cost is "high". That's sad, limited, and yet very real.

Therefore, in order to secure the job and to fix the problems properly, a lot of communication and persuasion is necessary. The customer must be convinced of the value of the solution, and invest his/her own time into it to help shape the final solution. IT is central to a lot of their operations and can be made to provide more assistance to their business, yet it is given very little priority and investment.

Having said that, there are a lot of moonlighters out there who over promise and under deliver, causing this vicious cycle to continue. I have personally seen some local e-commerce sites with extremely poor exception handling in their purchase and payment code. Yes, they have the usual certificates, logos and seals, but the certification process does not include the testing or validation of the source code itself.

Shop on local websites? Err... maybe not yet... let someone else be the guinea pig for these eBay wannabes :)

Maybe I will get a chance to help fix these borken sites once they get complained :P haha!

Tuesday, February 5, 2008

Torvalds on Microsoft's patent bluff

I have argued a few times with my friend (who is an MVP for Microsoft) over the position/perception of Microsoft, is it a monopoly? a big friendly giant? or what? He is more exposed to the technical folks who (mainly) talk about technologies and so on, and he has failed to see the legal, and business side of the games Microsoft plays.

I told him that yes, there may be nice, friendly peeps there at MS, but there are also people who have nothing to do but spread FUD (Fear, Uncertainty and Doubt) and people who play games on both sides.

In this article published by Linuxworld, Torvalds remarked that
"I think there are people inside Microsoft who really want to improve interoperability and I also think there are people inside Microsoft who would much rather just try to stab their competition in the back," he said. "I think the latter class of people have usually been the one[s] who won out in the end, but -- so I wouldn't exactly trust them."

Me neither. Keep your friends close, but your enemies closer :P

NYT article on the cut cables

NYT has an article on the borken internet cables off the coast of Egypt:

Telecommunications operators have been trying to diversify the routes used for transmissions, said Alan Mauldin, research director with TeleGeography Research, particularly since an earthquake in Taiwan in 2006 disrupted service in Asia.

The cable network contains “choke points” — like those off the coast of Egypt and Singapore where many cables run, Mr. Mauldin said.

I... uh... guess the seas here are safer? :D

Monday, February 4, 2008

a Ruby on Rails query with the LIKE syntax in :conditions

[ This is working on Rails 2.0.2 in Aptana Studio 1.1.0.007007 with RadRails plugin 0.9.3.6479 ]

In most Rails apps, you would do either a simple .find(:all) or .find(:id) or .find(params[:id]). I needed to query my database based on a simple condition, matching all values in a column that start with a particular letter.

select * from mytable where username LIKE 'a%';
(would return all usernames that start with a, like adrian, avin, etc).

To do that in RoR, I needed to add a :conditions option to the default query.

However, tagging it this way didn't help
:conditions => 'username LIKE #{params[:uname]}%',
the sanitizer would convert that to username LIKE 'a'%, which gives an SQL exception.

After searching the web, scanning dozens of sites, including the rails API, the RoR forum, I accidentally stumbled upon the solution in the comments section of a blog (I can't remember where it was now, sorry).

My query is subsequently modified and tested to be

@results = User.find(:all, :conditions => ['username LIKE ?', params[:uname]+'%' ])

Sorry if this was an obvious thing to a lot of other people... :P
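
The trick is not Rails-specific: the wildcard has to be part of the bound value, so the database receives the whole pattern ('a%') as one quoted string instead of a quoted 'a' followed by a stray %. A minimal sketch of the same idea in Python with the standard sqlite3 module, assuming a made-up `users` table and helper name:

```python
import sqlite3

def usernames_starting_with(conn, prefix):
    """Return usernames beginning with prefix, using a bound LIKE pattern."""
    # The % is appended to the parameter, not written into the SQL text,
    # so the driver quotes the complete pattern as a single value.
    rows = conn.execute(
        "SELECT username FROM users WHERE username LIKE ? ORDER BY username",
        (prefix + "%",),
    )
    return [row[0] for row in rows]

# Hypothetical sample data, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)", [("adrian",), ("avin",), ("bob",)]
)
print(usernames_starting_with(conn, "a"))
```

One thing to keep in mind: if the user-supplied prefix itself contains % or _, those characters still act as wildcards inside the pattern, so they may need escaping depending on what the search is meant to allow.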

Friday, February 1, 2008

Elsewhere: the Internet is borken too

This is not a local event, but I read the news yesterday on CNA.

Internet outage hits business from Cairo to Colombo
Posted: 31 January 2008 2208 hrs

CAIRO: Damage to undersea Internet cables hit business across the Middle East and South Asia on Thursday, including the vital call centre industry, prompting calls for people to limit their surfing.

Around 70 per cent of Internet users in Egypt have been affected since two submarine cables in the Mediterranean Sea were damaged on Wednesday, also rupturing connections thousands of kilometres (miles) away.

...

India's Internet-dependent outsourcing industry was also severely disrupted, with businesses saying it may take up to 15 days to return to normal.

There is another article here by Business Week:

Damage to the Flag Europe-Asia and the SeaMeWe-4 cables have left only the older SeaMeWe-3 system to provide service between Europe and the Middle East, research firm TeleGeography said.

The two cables, with 620 Gbps in capacity, are the prime direct links between Europe, the Middle East and south Asia.


This looks very serious and the cause is as yet unknown. Imagine all the businesses relying on those 2 cables losing all access to free/cheap overseas calls and the internet (email, websites, ecommerce, etc)!

I wonder what is causing the disruption. Underwater volcano eruptions due to continental movement or... playful whales? :P

Wednesday, January 30, 2008

the mystery of the borken server, SOLVED

Acknowledgements

Thanks to maxsec (from the MailScanner (MS) IRC channel) and Jules (creator of MailScanner)!


Summary

Problem was two-fold:

  1. I did not notice that Mail::ClamAV and Mail::SpamAssassin packages were not installed properly when running the install script provided in install-Clam-0.92-SA-3.2.4.tar.gz (error information below)
  2. My system had /tmp mounted as "noexec" (is this a default BlueQuartz setting, or did I change this when the system was hardened previously?)
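
For the second cause, a quick way to check whether /tmp is mounted "noexec" is to look at its mount options. The sketch below assumes a Linux system that exposes the usual /proc/mounts format (device, mount point, filesystem type, options); the function names are made up for this example, and it does not say which installer step actually needed to execute from /tmp.

```python
import os

def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not a separate mount."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # a later entry overrides an earlier one
    return options

def is_noexec(mounts_text, mount_point="/tmp"):
    """True if mount_point is a separate mount carrying the noexec option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "noexec" in options

if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as handle:
        print("/tmp mounted noexec:", is_noexec(handle.read()))
```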


MailScanner diagnosis procedure

  1. After installing MailScanner, run MailScanner --lint, check for any errors that get thrown out.
  2. If there is any issue, run MailScanner -v to see the versions of the installed modules, make sure that they are correct.
  3. If it is not conclusive, run MailScanner --debug or MailScanner --debug --debug-sa (if you have SpamAssassin)
  4. If problem persists, Google it and search the MS Mailing List Archive (it is active).
  5. If there is still nothing conclusive, go to the IRC channel and ask for help.
  6. Subscribe and post the problem in the MS Mailing List too.
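
The first three steps can be strung together so that none of them gets skipped in a hurry. This is only a rough sketch: the wrapper functions are made up for this example, it uses just the MailScanner options mentioned in the list above, and it assumes the MailScanner command is on the PATH of the user running it.

```python
import shutil
import subprocess

def diagnosis_commands(has_spamassassin=False):
    """Return the MailScanner checks from steps 1 to 3, in order."""
    debug = ["MailScanner", "--debug"]
    if has_spamassassin:
        debug.append("--debug-sa")
    return [
        ["MailScanner", "--lint"],  # step 1: look for configuration errors
        ["MailScanner", "-v"],      # step 2: versions of the installed modules
        debug,                      # step 3: debug run
    ]

def run_diagnosis(has_spamassassin=False):
    """Run each check in turn, echoing the command before its output."""
    if shutil.which("MailScanner") is None:
        print("MailScanner not found on PATH")
        return
    for command in diagnosis_commands(has_spamassassin):
        print("$ " + " ".join(command))
        subprocess.call(command)
```

Calling `run_diagnosis(has_spamassassin=True)` on the mail server would print each command followed by whatever MailScanner reports; reading that output is still a manual job.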


Error Information

An error was thrown during installation of Mail::SpamAssassin when I ran the install script in
install-Clam-0.92-SA-3.2.4.tar.gz. (Remind myself to maximize the PuTTY window next time).


Setting a soft-link from spam.assassin.prefs.conf into the SpamAssassin
site rules directory.
spam.assassin.prefs.conf is read directly by the SpamAssassin startup
code, so make sure you have a link from the site_rules directory to
this file in your MailScanner/etc directory.
Perl could not find your SpamAssassin installation.
Strange, I just installed it.
You should fix this!

Making backup of pre files to /tmp/backup.pre.3457.tar
tar: *pre: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors
Now go and find your v310.pre and v320.pre files,
echo which may well be in the /etc/mail/spamassassin directory.
You need to save a copy of your old v320.pre file and rename
the v320.pre file to v320.pre.


Moving on

*sigh* :)

Now I have to keep reminding myself to be extra careful when updating this server in the future. Not sure about why the other servers are fine. Maybe the manual installation of SpamAssassin source helped but I didn't do it for this server due to its custom configurations.

For future updates of MailScanner, I will need to:
  1. Download and unpack the new MS package / installer.
  2. Go into the perl-tar directories and list all the Perl modules.
  3. Open up CPAN (perl -MCPAN -e shell) and compare the version of the installed modules vs those with the MS package / installer.
  4. If the versions are not ok, unpack those files that came with the MS package / installer, manually update them via the usual perl Makefile.PL -> make -> make test -> make install as root.
Ok, that's it for now!

Thanks to the advice and help from the people in MS's IRC channel, and especially to maxsec and Jules!

Monday, January 28, 2008

the mystery of the borken server

Summary

The MailScanner processes on one of my servers hang, and it gets worse as the number of children is increased. Setting a very low number of Children helps, but the problem is not solved.


Background


Server Hardware (Dell)
  • CPU: AMD Dual Core Opteron (2210)
  • RAM: 2GB
  • 2 x 160GB SATA (configured with software RAID 1)
(Key) Server Software


The Problem

The customers (actually, they are the customers of my customer) are fairly new: less than a year.

After a recent upgrade, the customers noticed a slowdown in the performance of the email server. Outgoing emails take a long time to be sent after they hit the "Send" button in the email client. Sometimes it takes up to 5 minutes.

So, the parties involved are:
us <-> customers <-> end-customers


Some Context Information

The end-customers are actually located in another country, but the email server is hosted and administered locally.

The network path from the end-customers to here is routed overseas (which could contribute to instability at times).

The number of customers is not high, but the network is critical to their international operations.


The Conjectures/Guesses/Hypotheses

  1. Their network is unstable or congested, causing upstream traffic to be slow (retrieving emails is fine though). Or their bandwidth is asymmetrical, with upload speed a fraction of the download speed.
  2. Data center network is unstable or does not have peering with customer's network provider, resulting in traffic being routed here indirectly.
  3. Server is under DDoS / spammer attack.
  4. Customer's network has p2p applications running, thereby causing bottlenecks in their internal networks. Or they are hosting web applications in-house, causing their outgoing traffic to be swamped.


Initial Observations

After logging on to the server in the dead of the night (with only some cats and cars passing on the street outside), I noticed that
  • the server load is high, with a load average of >3 (using uptime and top)
  • the email traffic is almost non-existent
  • only 1 user was accessing the server, as evidenced by the paucity of "pop3-login"s in /var/log/maillog
  • MailScanner --lint did not give any errors or warnings
It didn't seem like the server was under attack (after checking with netstat and lsof), although there was spam coming in, at least 1 every 2-3 minutes.
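
To put the load figure in context, it helps to compare it against the number of CPU cores. This is a minimal sketch, assuming the usual "load average: a, b, c" tail in the output of uptime and assuming the dual-core Opteron means 2 cores in total; the sample line in the usage note is made up.

```python
import re

# Matches the "load average: 3.12, 3.05, 2.98" tail printed by uptime.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_averages(uptime_output):
    """Return the (1 min, 5 min, 15 min) load averages as floats."""
    match = LOAD_RE.search(uptime_output)
    if match is None:
        raise ValueError("no load average found in uptime output")
    return tuple(float(value) for value in match.groups())


def is_overloaded(load_1min, cpu_cores):
    """A load above the core count means runnable processes are queueing."""
    return load_1min > cpu_cores
```

With a line such as " 02:15:01 up 10 days,  3:04,  1 user,  load average: 3.12, 3.05, 2.98", is_overloaded(parse_load_averages(line)[0], 2) is true, which is odd for a box with next to no email traffic.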

I looked at the MailScanner processes and found that they were using the CPU at 100%. Doing a ps on them showed that the processes were hanging at "starting children". Stopping the processes was very slow: the master process died before the children did, and it took ages for the children to die (>1 minute, up to a maximum of 4 minutes, at which point I ran out of patience). Starting was the same: the processes hung at the "starting children" stage for a long time, with the load average exceeding 3. By the time the MailScanner processes had started properly, the CPU time consumed was already more than 3:00.00 (as shown in top). If I read top's TIME+ column right (minutes:seconds.hundredths), that's 3 minutes of pure CPU time just to start up. WOW!!! :O
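
A note on reading that CPU time figure: assuming it came from the TIME+ column of procps top, the layout is minutes:seconds.hundredths, so 3:00.00 works out to 180 seconds of CPU time. A tiny sketch of the conversion (the function name is made up, and it only handles the plain M:SS.hh form, not the scaled forms top switches to for very large values):

```python
import re

# TIME+ in its basic form, e.g. "3:00.00" = 3 minutes, 0.00 seconds.
TIME_PLUS_RE = re.compile(r"^(\d+):(\d{2}\.\d{2})$")


def cpu_time_seconds(time_plus):
    """Convert a TIME+ value such as '3:00.00' into seconds of CPU time."""
    match = TIME_PLUS_RE.match(time_plus.strip())
    if match is None:
        raise ValueError(f"unrecognised TIME+ value: {time_plus!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)
```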


The Constraints

  1. Obviously, I can't just take the email server offline and play with it.
  2. The actual problem is not obvious, and going through the source code and debugging it is tough, if not impossible.
  3. MailScanner is a huge piece of software, and it's not easy to find out where the process is hanging (unless Linux has something like DTrace for Solaris and assuming I know how to use it).


The Experiment

The factors which I feel are likely to affect MailScanner load and processes are listed below:
  1. MailScanner, Max Children = X
  2. MailScanner, Virus Scanning = yes|no
  3. MailScanner, Use SpamAssassin = yes|no
  4. MailScanner, spam.whitelist.rules (turn off spam checking for certain domains)
Between midnight and 1am, I wasn't too awake (besides, I had been coding away for the whole day), so I couldn't come up with more...
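
The first three factors form a small grid of configurations to try. A minimal sketch of enumerating it, so that no combination gets forgotten at 1am (the function name is made up; factor #4, the whitelist, is left out because it is a file rather than a simple switch):

```python
from itertools import product


def experiment_grid(max_children_values):
    """List every combination of the three MailScanner.conf factors."""
    return [
        {
            "Max Children": children,
            "Virus Scanning": virus_scanning,
            "Use SpamAssassin": use_spamassassin,
        }
        for children, virus_scanning, use_spamassassin in product(
            max_children_values, ("yes", "no"), ("yes", "no")
        )
    ]
```

experiment_grid(range(2, 6)) gives 16 configurations, which is more than I had the patience to time by hand.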

1st set of Experiments

I tried out the easiest combinations first by setting #2 and #3 to "no", and then played around with #1 from 2 to 5. Nope, the only observation was that as the number of children increased, MailScanner took an (almost) exponentially longer time to start. Actually, I couldn't be bothered to wait and time it, I just ran "killall MailScanner".

2nd set of Experiments

I kept the number of children, #1, constant and tested with #2 and #3 switched on and off in turn. That didn't help either. It seems that the problem is tied to the number of children being started.

3rd Experiment

I reinstalled MailScanner, but that didn't help either.

MailScanner depends on a lot of Perl modules. The recent server upgrade might have installed or broken something. Or it could be that the CPAN-based modules (perl -MCPAN -e shell) that I installed previously are affecting MailScanner.

One of the questions that kept bugging me is: where do CPAN-installed Perl modules go, and where do RPM-installed Perl modules go? Which one does Perl use if both exist?
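
On the "which one wins" part: Perl searches the directories in @INC in order and loads the first matching file, so the question becomes which directory comes first. Where CPAN and RPM put their files varies (often site_perl versus vendor_perl, but treat that as an assumption and check with perl -V). A minimal sketch for listing every copy of a module in search order, with made-up helper names:

```python
import os
import subprocess


def perl_inc_dirs():
    """Return perl's @INC search path, in the order perl consults it."""
    result = subprocess.run(
        ["perl", "-e", r'print join("\n", @INC)'],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def module_relpath(module):
    """Map 'Scalar::Util' to the relative path 'Scalar/Util.pm'."""
    return os.path.join(*module.split("::")) + ".pm"


def find_module_copies(module, inc_dirs):
    """List every copy of a module, in @INC order; perl loads the first."""
    relpath = module_relpath(module)
    candidates = (os.path.join(directory, relpath) for directory in inc_dirs)
    return [path for path in candidates if os.path.isfile(path)]
```

If find_module_copies("Scalar::Util", perl_inc_dirs()) returns more than one path, the first entry is the copy in use and the rest are shadowed.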


[nothing works... *sob* 2:15am... and it's all not working... so... gotta think of something fast before end-customers get online and it's DOWN, then I'll really have early morning calls with people screaming and shouting into my ear]

As a last resort, I configured the server with
  • Max Children = 2 (this still takes a couple of minutes to start)
  • Use SpamAssassin = off (but Spam List = spamhaus-ZEN is retained)
  • insert customers' domain into spam.whitelist.rules (so that outgoing emails will not be checked, and hence, this will hopefully increase the speed at which emails are relayed)
  • Restart Every = 28800 (restart every 8 hours), since it is the killing and respawning of child processes that triggers the hang; this could also be lengthened to 12 hours, since I have 1GB of RAM free
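
Two of those settings are easy to get wrong by hand, so here is a minimal sketch for generating them. The function names and the customer.example domain are made up, and the rule lines assume the usual MailScanner ruleset layout (direction, pattern, value, with a final default line); check them against the comments in your own spam.whitelist.rules before using them.

```python
def restart_every_seconds(hours):
    """Convert a restart interval in hours to the seconds MailScanner expects."""
    return int(hours * 3600)


def whitelist_rules(domains):
    """Render ruleset lines that mark mail from the given domains as not spam."""
    lines = [f"From:\t*@{domain}\tyes" for domain in domains]
    lines.append("FromOrTo:\tdefault\tno")  # everything else is still checked
    return "\n".join(lines) + "\n"
```

restart_every_seconds(8) gives the 28800 used above, and restart_every_seconds(12) gives 43200 for the longer interval.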
So far, so good... it's been 15 hours since...

5 hours of sleep sucks...

To really troubleshoot the problem, I installed MailScanner on a VirtualPC with plain CentOS 4.6 (no GUI, no other services except sendmail). BlueQuartz crashes when installing into a VirtualPC environment, so I couldn't test with it.

And... everything works fine in the VirtualPC!!!

ARGHH... maybe I really have to remove all the CPAN-installed modules, remove all the RPM-installed modules and stick with the ones installed by MailScanner. *sigh* Will update if this really works, what else can I do? :D

system administration, you think you got what it takes?

Kudos to other system administrators out there!

Since one of my roles is system administrator, I would like to give an acknowledgment to fellow admins out there as I start this blog! :)

I picked up my passion for system administration 13 years ago, when one of my seniors conducted a talk on Linux and helped us make copies of Slackware onto a dozen floppy disks so we could play with it at home. Since then, I have also tried Gentoo, RedHat, CentOS (which is my favourite today), FreeBSD, Solaris 8 - 10 (never admin-ed live sites on Solaris yet), etc.

System administration is a tough job (you know, I know, but users don't). You have to know the hardware, software and network you are handling and take everything into account holistically. You have to know your users, how they access the server(s), what kind of environment they have, what kind of businesses they are running (dubious email marketers are the ones to avoid), their IT literacy level, etc. Troubleshooting requires a lot of elimination and testing while users are screaming for their services to be up NOW!!!

The job demands a "big picture" understanding and a sharp eye for minor details. It's like being a car mechanic: when there is an extra rattle or squeak, you know something has got to give soon. Typically, you will have some preventive measures in place, but when it comes to the crunch, it all boils down to a quick identification of the problem and an even quicker fix. Who cares, as long as it works, right? ;) Actually, a quick fix rarely holds, and in the end, we are the ones who will have to solve the actual problem anyway.

In starting this blog, my intention is to take notes on the problems I have and the way I go about solving them. Maybe they will be of use to others who stumble upon this blog.

Oh well, here goes...

Why is the Internet borken?

There are a few reasons for this...

I had a phone call near midnight to troubleshoot an under-performing server. It's not the first time that I have received urgent requests for help ("URGENT!" "HELP!"), but I really hate the early morning calls, or the late night ones just when I'm about to sleep, and especially the calls that come in when I'm SLEEPING. I'm not paid to be on standby 24x7, and my work / agreement clauses do not stipulate 24x7 working hours.

isBorken is influenced by my geek.. err... programming background and LOLCat speak -ICHC

:)

will update this as and when i have the inspiration :D