Hello tournament seekers-
Over a year ago, I had to disable the "event size" and "expected event rating" search criteria in FRED's upcoming event list. At long last, they are back. These are among the more valuable features of FRED's event search, so I'm very happy to have them back, and I'm sure you will be too.
You'd be surprised how much work it was. Admittedly, most of it was "under the hood" work that will be useful for lots of other features, so it's not just this one feature that caused all the headache. It's kind of like building a car just to drive to the store for milk, but we'll be able to use that car for so much other stuff, I swear!
Anyhow, thanks everyone for your patience while these features were "on vacation".
-p
Warning: extreme geek-ness follows:
The problem:
These two filters were implemented in SQL, using some big joins and subselects on the preregistration table, and (brace yourself...) the USFA event classification chart expressed as an SQL view. Yeah, I admit, I did that one just to prove it could be done. This all worked fine while there were only a few tens of thousands of preregs in the db. But there are now over half a million. Whenever someone used one of these two criteria (esp the rating search), the database server would slow to a crawl and web page loads would time out, bringing the whole site to its knees for all users, not just the one searcher.
Ouch.
The Solution:
The current prereg count and predicted event rating are precalculated and saved in the event table so those filters are now just a simple where clause. But that's the simple and obvious part. The hard part is keeping them in sync in near-real-time as people preregister for the tournament. One way to do this would be to recalculate these values and update them as part of the same transaction as the user's preregistration. This is less than ideal because that could slow the response time to the user's prereg submission, just so we can accomplish some housekeeping tasks. Not cool.
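To illustrate the "simple where clause" idea, here is a minimal sketch in Python with an in-memory SQLite database. FRED itself is written in PHP, and the table name, column names, and sample rows below are all assumptions made up for the example, not FRED's actual schema.

```python
import sqlite3

# Hypothetical schema: the table, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        prereg_count INTEGER NOT NULL DEFAULT 0,  -- precalculated, kept in sync elsewhere
        predicted_rating TEXT                     -- precalculated, kept in sync elsewhere
    );
    INSERT INTO event (id, name, prereg_count, predicted_rating) VALUES
        (1, 'Small Local Epee', 8, 'E1'),
        (2, 'Regional Open Foil', 45, 'B2'),
        (3, 'Big Open Saber', 70, 'A2');
""")


def find_events(conn, min_size, ratings):
    """Filter on the precalculated columns only: no joins, no subselects."""
    placeholders = ",".join("?" for _ in ratings)
    sql = (
        "SELECT name FROM event "
        "WHERE prereg_count >= ? "
        f"AND predicted_rating IN ({placeholders}) "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, [min_size, *ratings])]


results = find_events(conn, 40, ["A2", "B2"])
print(results)
```

The search never touches the prereg table, so its cost no longer grows with the number of preregs.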
Instead, FRED now has a task queue based publish-and-subscribe system for deferring such processing to a background worker process. Each time someone preregisters for a tournament, a small message is sent to the queue in fire-and-forget style. A worker process is continually pulling messages from the queue and acting on them, in this case updating the event table's prereg count and rating prediction fields. The whole process takes about 5-10 seconds, so the search criteria are correct very soon after the preregistration happens.
This pub-sub system will be super-useful for decoupling cause and effect in FRED's processes, and for deferring costly processing to the background, to preserve front end performance.
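The fire-and-forget pattern can be sketched roughly like this in Python. This is an illustration under assumptions, not FRED's PHP code: the in-memory dictionaries stand in for the prereg and event tables, the standard library queue stands in for whatever queue FRED uses, and only the prereg count is updated (the rating prediction is left out).

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the prereg and event tables.
preregs = {101: []}                        # event_id -> list of fencer names
event_table = {101: {"prereg_count": 0}}   # event_id -> precalculated fields

tasks = queue.Queue()
_STOP = object()  # sentinel that tells the worker to exit


def preregister(event_id, fencer):
    """Front-end path: save the prereg, publish a message, return at once."""
    preregs[event_id].append(fencer)
    tasks.put(event_id)  # fire and forget: no waiting on the housekeeping


def worker():
    """Background path: pull messages and refresh the precalculated count."""
    while True:
        event_id = tasks.get()
        try:
            if event_id is _STOP:
                return
            # Recount from scratch, so handling a message twice is harmless.
            event_table[event_id]["prereg_count"] = len(preregs[event_id])
        finally:
            tasks.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

preregister(101, "A. Fencer")
preregister(101, "B. Fencer")

tasks.join()      # only for the demo: wait until the queue is drained
tasks.put(_STOP)
thread.join()
print(event_table[101]["prereg_count"])
```

Recounting on every message, rather than incrementing, keeps the worker correct even if messages arrive late or more than once.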
Whew!
-p
FRED tek
The tech that runs FRED, the USA's Fencing Tournament resource.
Wednesday, December 14, 2011
Sunday, August 7, 2011
Turns out it was a Chinese bot.
As it turns out, FRED's recent downtime was caused by an ill-behaved crawler run by a Chinese search engine. When this issue first arose, one of the first things I did was to look for excessive numbers of requests coming from single IPs, and this bot had been among the top 3 or 4 clients over the better part of that day. But because it had made fewer requests than well-behaved crawlers such as Google, Bing, and Yahoo, which FRED has no problem serving, I discounted it as a cause of the issue.
However, I had retrospectively counted the Chinese requests over the whole day in aggregate. When I had a chance to watch the server processes escalate in real time, I saw that one IP address was making as many as 100 concurrent requests! It was the IP of that Chinese spider.
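The gap between a daily total and a concurrent count is easy to see in a small Python sketch. The log format and the IP addresses here are invented for illustration (the addresses come from the reserved documentation ranges); real numbers would have to come from the web server's own logs or status page.

```python
from collections import Counter, defaultdict

# Hypothetical request log: (client_ip, start_second, end_second).
requests = [
    # A polite crawler: more requests in total, but one at a time.
    ("192.0.2.10", 0, 1), ("192.0.2.10", 10, 11), ("192.0.2.10", 20, 21),
    ("192.0.2.10", 30, 31), ("192.0.2.10", 40, 41), ("192.0.2.10", 50, 51),
    # An ill-behaved crawler: fewer requests, but all at once.
    ("198.51.100.7", 5, 25), ("198.51.100.7", 5, 25),
    ("198.51.100.7", 6, 25), ("198.51.100.7", 6, 26),
]


def total_requests(log):
    """Aggregate view: how many requests each IP made over the whole log."""
    return Counter(ip for ip, _, _ in log)


def peak_concurrency(log):
    """Real-time view: the most requests each IP had open at one moment."""
    changes_by_ip = defaultdict(list)
    for ip, start, end in log:
        changes_by_ip[ip].append((start, 1))   # a request opens
        changes_by_ip[ip].append((end, -1))    # a request closes
    peaks = {}
    for ip, changes in changes_by_ip.items():
        current = peak = 0
        # Tuples sort by time, then by delta, so closes come before opens
        # that share the same timestamp.
        for _, delta in sorted(changes):
            current += delta
            peak = max(peak, current)
        peaks[ip] = peak
    return peaks


totals = total_requests(requests)
peaks = peak_concurrency(requests)
print(totals)
print(peaks)
```

By total count the first client looks like the heavier user, but the second one is the client tying up four server processes at the same time.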
Adding a line to FRED's firewall config fixed the problem by blocking that IP (their whole class B subnet actually). So FRED's search rank in this Chinese search engine will suffer, but I'm ok with that. :^)
-p
Thursday, August 4, 2011
Tweaked Apache config
Ok so I opted for the "tweak apache" option. I lowered the MaxClients setting and set MaxRequestsPerChild to 1000. Neither of those will directly stop the number of processes from escalating, but they might cause different behavior to occur when the number of processes gets too high.
We'll see.
Wednesday, August 3, 2011
FRED is flapping
FRED's webserver has been down for a number of short (~3 minute) periods all day today. It began with a few incidents on Friday and a few more over the weekend, then escalated to 24 such incidents today (so far). Apache simply spawns gradually more and more child processes until it exhausts memory on the server, at which point it fails to respond to a probe from FRED's auto-restart monitor. At that point the monitor restarts Apache, and all is well until the next time it fills the available memory.
I may wave a dead chicken over some Apache settings, but since this is suddenly happening with no config changes or significant change in traffic on the site, I'm tempted to just spin up a new EC2 instance and see if that helps. Maybe the current instance is just going bad in some impenetrable way?
-p
Thursday, April 21, 2011
Well *that* was painful....
FRED is back from today's EC2 outage. Amazon has three of the four availability zones in their us-east-1 region (Virginia) operating. Unfortunately, FRED's database server was running in the one zone that is still sick. However, I was able to snapshot the database's EBS volume, create a new volume from that snapshot in one of the other zones, fire up a new DB server instance in that zone, attach the new volume to that instance, and get things rolling again.
While riding the bus home from work.
Yay for wifi on the bus. Boo for Amazon having a full day outage.
I was getting pretty proud of FRED's 99.99% 30 day uptime. Now it's all shot to hell: 97.58%!
Oh well. At least it wasn't on Friday or Saturday when everyone would be trying to download their preregs.
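For the curious, those uptime percentages can be turned back into downtime with a little arithmetic. This is a rough sketch that assumes the monitor measures over a plain 30-day (720-hour) window; the function name is made up for the example.

```python
def downtime_hours(uptime_percent, window_days=30):
    """Hours of downtime implied by an uptime percentage over a window."""
    window_hours = window_days * 24
    return window_hours * (100 - uptime_percent) / 100


after_outage = downtime_hours(97.58)        # hours
before_outage = downtime_hours(99.99) * 60  # minutes

print(round(after_outage, 2), "hours of downtime at 97.58%")
print(round(before_outage, 1), "minutes of downtime at 99.99%")
```

Under that assumption, 99.99% allows only about 4 minutes of downtime in 30 days, while 97.58% works out to roughly 17.4 hours.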
-p
EC2 outage
Hello faithful FRED users- Today Amazon EC2's us-east-1 region is experiencing a serious, sustained outage in EBS connectivity and creation. FRED lives in us-east-1a, so his database server (whose files live in an EBS volume) became inaccessible at about 1am PDT.
All I can do is wait for EC2 to fix the issue. Very curious or geeky folks can follow their progress here: http://status.aws.amazon.com
Please accept my apologies for the FRED outage.
-Peet
Saturday, March 26, 2011
Another try at handling the accented characters in fencers' names
FRED is getting used more and more in Canada these days, which is very cool. However, it's brought a long-standing problem with FRED closer to the surface: Multibyte characters.
FRED is written in PHP, a great language for quickly building complex web applications. However, PHP's support for multibyte encodings is less than awesome. Also, lots of the code in FRED was written in the early days of PHP4 when mb support was even worse.
Lots of our Canadian friends and their clubs have names with accented characters represented in the UTF-8 multibyte character encoding. Their names are accepted into FRED just fine, but when they are transmitted back and forth with Fencing Time, the XML-related PHP functions FRED uses to read and write preregistration and results handle the mb characters badly. There are a bunch of suggestions out there on the web for how best to handle this problem, and I've tried lots of them, with mixed results.
Today I deployed another attempted solution to write UTF-8 XML preregistration files for import into Fencing Time using the mb_convert_encoding() function to ensure that the stream output is valid UTF-8.
Given how many systems this data passes through (FRED's webserver, db server, your browser, your OS, Fencing Time, and back again...), it's hard to be sure everything works 100%, but so far this change has performed well in my tests. Hopefully the real world will behave similarly.
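The general idea of normalizing text to UTF-8 before writing XML can be sketched in Python as below. This is an analogue of the approach, not FRED's PHP code and not a use of mb_convert_encoding() itself. The element and attribute names are invented for the example and are not Fencing Time's real file format, and the Latin-1 fallback is an assumption about where stray non-UTF-8 bytes might come from.

```python
import xml.etree.ElementTree as ET


def ensure_text(raw, fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to an assumed legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def to_utf8_xml(fencers):
    """Build a hypothetical prereg document and serialize it as UTF-8 bytes."""
    root = ET.Element("Preregistrations")  # hypothetical element name
    for last, first, club in fencers:
        ET.SubElement(
            root,
            "Fencer",  # hypothetical element and attribute names
            LastName=ensure_text(last),
            FirstName=ensure_text(first),
            Club=ensure_text(club),
        )
    # encoding="utf-8" makes tostring() return bytes with the declared encoding.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# One name arrives as UTF-8 bytes, the other two fields as Latin-1 bytes.
data = to_utf8_xml([
    ("Côté".encode("utf-8"), "Hélène".encode("latin-1"), "Club d'Escrime".encode("latin-1")),
])
parsed = ET.fromstring(data)
print(parsed[0].get("LastName"), parsed[0].get("FirstName"))
```

Parsing the output back in, as the last lines do, is a cheap round-trip check that the accented characters survived.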
-P