Tuesday, January 4, 2011

FRED moves to the cloud

Many FRED users may have noticed that in the past half year or so, FRED has developed a progressively worsening case of narcolepsy. That is, he seems to fall unconscious at times, and fails to respond when you come calling. At first it happened only now and then, and not for very long. These days though, it seems to happen at least once a week, sometimes a couple times a day, for as much as an hour at a time. I have monitoring to alert me when this happens, but I'm not always in range of an internet connection to wake FRED up quickly.

The basic problem is this: FRED has outgrown his home again. For the curious (and geeky), here's some history:

Back in 2002, FRED started out hosted in a cheap shared server setup whose actual hardware specs I never knew. He quickly outgrew that and moved to a dedicated but wimpy 1.7GHz Celeron box. After that came a single-processor 2.0GHz Xeon server and then a dual-processor 2.5GHz machine, and for the past couple of years, FRED has lived in a dual 3.2GHz box with 4GB of RAM.

Up till now, FRED has been hosted on a single server, running Apache/PHP, MySQL, memcached, email, DNS, etc., all on that one machine. It currently serves around 1.2 million page views per month, which is not really all that huge, but it's not tiny either. FRED is also a pretty heavyweight application, with lots of database queries, some on tables with a few million records in them, and some complex view definitions. At times of highest traffic, FRED is CPU bound in his current home, especially when some of that CPU is taken by virus scanning and spam filtering of incoming email.

Another part of the problem is this: because of how much work it is to set up a new machine, when FRED outgrows one, it takes me a long time to move him to another one. But virtualization and cloud computing have made this much easier. FRED is now moving into Amazon's Elastic Compute Cloud (EC2). There, I'll be able to provision additional server resources in a matter of minutes or hours, not weeks. It also means FRED can buy more server power for a little less money than with traditional dedicated servers.

FRED's new setup:

Database:
One Standard Large instance (4 CPU units, 7.5GB RAM)
Web2 (askfred.net):
One High-CPU Medium instance (5 CPU units, 1.7GB RAM)
Web1 (foc.askfred.net, usfaroc.askfred.net, thebaycup.askfred.net, demo.askfred.net):
One Standard Small instance (1 CPU unit, 1.7GB RAM)


Some of you have also noticed that FRED's email delivery success rate has dropped. This is most likely because FRED's server got on a spam blacklist somewhere (though I've never been able to find out for sure if this is true, nor which blacklist). In conjunction with the move to EC2, I have obtained the services of an outgoing email service which should improve the email delivery rate dramatically. This is a company whose whole job is to deliver email, so they are pros at making sure their servers remain off blacklists and available to send email to you.

Incoming email to @askfred.net (except for support@askfred.net) will remain on FRED's single dedicated physical machine for the time being. This will keep it separate from web and db service, so those two website-critical, latency-sensitive services can't be slowed by the very CPU-hungry virus and spam scanning processes.

Thanks to everyone for your patience and tolerance while I got this move done. It took many hours of designing and configuring AMIs (virtual server images) upon which to base FRED's new virtual machines. I hope that this will improve both uptime and response times.

Cheers
-Peet Sasaki
FRED admin/developer

3 comments:

  1. I noticed the quicker page loads immediately. Thanks!

  2. Peet,
    Are you using the Elastic Block Store for the database file system? If not, you probably should look into it, or into the AWS MySQL service. Data stored on EC2 local disk is not reliable.

  3. @Alan-

    Don't worry, the db files are on an EBS volume formatted with an XFS filesystem, with a scheduled (filesystem freeze / snapshot / unfreeze) cycle of the whole volume taken hourly, and rotated to retain hourly snapshots for 1 day, daily for 30 days, weekly for 1 year, and monthly for 3 years. I did look at the AWS RDS service, but I decided that I'd rather have a little more control over the RDBMS config, and the cost difference wasn't much. Also I don't yet have a need for their features for HA and replication, as cool as they are.
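
For the curious, the rotation schedule described above can be sketched in a few lines of Python. This is only an illustration of the retention rule, not the actual script used for FRED. The names `RETENTION`, `bucket_key`, and `snapshots_to_keep` are made up for this example, a year is assumed to be 365 days, and keeping the oldest snapshot in each day, week, or month is an assumption (keeping the newest would work just as well).

```python
from datetime import datetime, timedelta

# Retention tiers: (tier name, maximum age to keep).
# Assumption: "1 year" is 365 days and "3 years" is 3 * 365 days.
RETENTION = [
    ("hourly", timedelta(days=1)),
    ("daily", timedelta(days=30)),
    ("weekly", timedelta(days=365)),
    ("monthly", timedelta(days=3 * 365)),
]


def bucket_key(tier, ts):
    """Return the calendar bucket a timestamp falls into for a given tier."""
    if tier == "hourly":
        return (ts.year, ts.month, ts.day, ts.hour)
    if tier == "daily":
        return (ts.year, ts.month, ts.day)
    if tier == "weekly":
        iso = ts.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    return (ts.year, ts.month)  # monthly


def snapshots_to_keep(timestamps, now):
    """Return the set of snapshot timestamps to retain.

    For each tier, keep the oldest snapshot in every bucket that is still
    inside that tier's window. A snapshot survives if any tier keeps it.
    """
    keep = set()
    for tier, max_age in RETENTION:
        first_in_bucket = {}
        for ts in sorted(timestamps):
            if now - ts > max_age:
                continue  # too old for this tier
            first_in_bucket.setdefault(bucket_key(tier, ts), ts)
        keep.update(first_in_bucket.values())
    return keep
```

Anything not returned by `snapshots_to_keep` would be a candidate for deletion on the next rotation run. The freeze, snapshot, and unfreeze steps themselves are left out here, since they depend on the tools and credentials in use.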
