Monday, November 11, 2013

FRED's new payment processor


As some of you know, FRED has recently switched to a new credit card processor. The new service, called Stripe.com, is a much more modern and web-app-friendly service. It has two main advantages over the previous service:
  • Better customer security: Stripe uses some JavaScript and cryptographic magic to authorize credit cards without the sensitive data ever being sent to FRED. This means that card numbers (which FRED has never stored) now don’t even pass through FRED’s servers. Fewer “hops” means better security.
  • Automation: Stripe is a credit card service built for the web, with an API that will let us automate all kinds of good things. In particular, we’ll be able to automate ACH transfers of your tournament fees directly to your bank account. You’ll be able to enter bank account details via that same super-secure system mentioned above, so your account details will never pass through FRED’s servers, nor be stored there. 
After the switch to Stripe, however, there were a couple of problems. To anyone who was adversely affected by these issues, please accept my sincere apologies. There is nothing about the site that I take more seriously than executing financial transactions smoothly and correctly. In the interest of full transparency, here are the issues:

Issue: After the switch to Stripe, FRED failed to credit a payor for payments made prior to the switch (via the old payment processor).
Impact: Some customers were confused, and a few were charged twice for the same event.
Status: Solved. This should not happen any more, and all known cases of double-charges have been refunded. FRED now correctly credits customers for payments made via the old processor.

Issue: Successful payments were not recorded in FRED, because FRED applied overly strict validation to the payor's email address.
Impact: 6 payments were not correctly recorded.
Status: Solved. The validation has been fixed, payment records have been repaired, and refunds given where necessary.

Issue: Browsers with JavaScript turned off submitted incomplete payment form data to FRED.
Impact: A dozen or so users were unable to pay, and received bland "an error has occurred" type messages.
Status: Solved. The payment form is no longer displayed in browsers with JavaScript turned off. Instead, a message prompts the user to enable JavaScript.

Issue: Payment CSV reports excluded payments made via Stripe.
Impact: Incomplete CSV reports.
Status: Solved. All payments are now included in CSV reports.

Issue: Because of a difference in the weekend funds settlement schedule between the new and old payment processor, funds didn't become available for disbursement on time.
Impact: In the two weeks following the switch, a small number of tournaments had to wait longer than usual to get their money.
Status: Solved. I have made some adjustments to our funds disbursement process (which occurs on the weekend) so that funds are disbursed predictably.

Issue: JavaScript errors in a few users’ browsers prevent the payment form from being submitted at all.
Impact: The affected users can’t pay via FRED, unless they use a different browser.
Status: UNsolved. I’ve installed a new system to report JavaScript errors directly from users’ browsers to help diagnose this issue. In the meantime, if you or someone you know is affected, I recommend using a different browser. In particular, something other than Internet Explorer would be good.

If you have any questions about any of this, please don't hesitate to email me, or submit a helpdesk request at:
http://support.askfred.net
support@askfred.net


-Peet

Wednesday, December 14, 2011

Upcoming event rating and size searches are back

Hello tournament seekers-

Over a year ago, I had to disable the "event size" and "expected event rating" search criteria in FRED's upcoming event list. At long last, they are back. These are among the more valuable features of FRED's event search, so I'm very happy to have them back, and I'm sure you will be too.

You'd be surprised how much work it was. Admittedly, most of it was "under the hood" work that will be useful for lots of other features, so it's not just this one feature that caused all the headache. It's kind of like building a car just to drive to the store for milk, but we'll be able to use that car for so much other stuff, I swear!

Anyhow, thanks everyone for your patience while these features were "on vacation".

-p

Warning: extreme geek-ness follows:

The problem:
These two filters were implemented in SQL, using some big joins and subselects on the preregistration table, and (brace yourself...) the USFA event classification chart expressed as an SQL view. Yeah, I admit, I did that one just to prove it could be done. This all worked fine while there were only a few tens of thousands of preregs in the db. But there are now over half a million. Whenever someone used one of these two criteria (esp the rating search), the database server would slow to a crawl and web page loads would time out, bringing the whole site to its knees for all users, not just the one searcher.

Ouch.


The Solution:
The current prereg count and predicted event rating are precalculated and saved in the event table so those filters are now just a simple where clause. But that's the simple and obvious part. The hard part is keeping them in sync in near-real-time as people preregister for the tournament. One way to do this would be to recalculate these values and update them as part of the same transaction as the user's preregistration. This is less than ideal because that could slow the response time to the user's prereg submission, just so we can accomplish some housekeeping tasks. Not cool.
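
For the curious, here is roughly what "just a simple where clause" looks like. This is a minimal sketch using Python's built-in sqlite3 module, not FRED's actual schema or database: the table name, the column names (prereg_count, predicted_rating), and the sample events are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not FRED's.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        prereg_count INTEGER NOT NULL DEFAULT 0,
        predicted_rating TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO event (name, prereg_count, predicted_rating) VALUES (?, ?, ?)",
    [
        ("Open Epee", 42, "B2"),
        ("Unrated Foil", 9, "E1"),
        ("Div I-A Saber", 64, "A2"),
    ],
)


def search_events(conn, min_size, ratings):
    """Filter on the precalculated columns only; no joins or subselects."""
    placeholders = ",".join("?" for _ in ratings)
    sql = (
        "SELECT name FROM event "
        "WHERE prereg_count >= ? "
        f"AND predicted_rating IN ({placeholders}) "
        "ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql, [min_size, *ratings])]


big_rated_events = search_events(conn, 40, ["A2", "B2"])
```

The point is that the search never touches the preregistration table, so its cost no longer grows with the number of preregs.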

Instead, FRED now has a task queue based publish-and-subscribe system for deferring such processing to a background worker process. Each time someone preregisters for a tournament, a small message is sent to the queue in fire-and-forget style. A worker process is continually pulling messages from the queue and acting on them, in this case updating the event table's prereg count and rating prediction fields. The whole process takes about 5-10 seconds, so the search criteria are correct very soon after the preregistration happens.
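
Here is a toy version of that fire-and-forget flow, assuming an in-process queue and a single worker thread. It is only a sketch of the pattern: FRED's real queue, message format, and schema are not shown in this post, so the names below (task_queue, preregister, the "prereg_changed" message) are invented, and the rating prediction is left out to keep it short.

```python
import queue
import sqlite3
import threading

# Illustrative stand-ins: an in-process queue and an in-memory database.
task_queue = queue.Queue()
db = sqlite3.connect(":memory:", check_same_thread=False)
db_lock = threading.Lock()
db.executescript(
    """
    CREATE TABLE event (id INTEGER PRIMARY KEY, prereg_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE prereg (id INTEGER PRIMARY KEY, event_id INTEGER NOT NULL);
    INSERT INTO event (id) VALUES (1);
    """
)


def preregister(event_id):
    """Front-end path: save the prereg, publish a message, return at once."""
    with db_lock:
        db.execute("INSERT INTO prereg (event_id) VALUES (?)", (event_id,))
        db.commit()
    task_queue.put({"type": "prereg_changed", "event_id": event_id})


def worker():
    """Background path: recompute the cached count for each message."""
    while True:
        message = task_queue.get()
        if message is None:  # sentinel: shut down
            task_queue.task_done()
            break
        with db_lock:
            db.execute(
                "UPDATE event SET prereg_count = "
                "(SELECT COUNT(*) FROM prereg WHERE event_id = ?) WHERE id = ?",
                (message["event_id"], message["event_id"]),
            )
            db.commit()
        task_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

for _ in range(3):
    preregister(1)

task_queue.join()      # wait for the worker to drain the queue
task_queue.put(None)   # stop the worker
thread.join()

with db_lock:
    cached_count = db.execute(
        "SELECT prereg_count FROM event WHERE id = 1"
    ).fetchone()[0]
```

Note that the worker recounts from the prereg table instead of incrementing a counter, so a lost or duplicated message can't leave the cached value permanently wrong.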

This pub-sub system will be super-useful for decoupling cause and effect in FRED's processes, and for deferring costly processing to the background, to preserve front end performance.

Whew!
-p

Sunday, August 7, 2011

Turns out it was a Chinese bot.

As it turns out, FRED's recent downtime was caused by an ill-behaved crawler run by a Chinese search engine. When this issue first arose, one of the first things I did was to look for excessive numbers of requests coming from single IPs, and this bot had been among the top 3 or 4 clients over the better part of that day. But because it made fewer requests than other crawlers such as Google, Bing, and Yahoo, I discounted it as a cause of the issue. After all, it had made fewer requests than those other well-behaved bots, which FRED has no problem serving.

However, I had retrospectively counted the Chinese requests over the whole day in aggregate. When I had a chance to watch the server processes escalate in real time, I saw that one IP address was making as many as 100 concurrent requests! It was the IP of that Chinese spider.
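
To show why the daily totals fooled me, here is a small sketch that computes both numbers per client: total requests and peak concurrent requests. It assumes you have a start and end time for every request, which a default access log may not give you, and the sample data is entirely made up.

```python
from collections import defaultdict


def totals_and_peaks(requests):
    """requests: iterable of (ip, start_seconds, end_seconds) tuples.

    Returns {ip: (total_requests, peak_concurrent_requests)}.
    """
    events = defaultdict(list)
    totals = defaultdict(int)
    for ip, start, end in requests:
        totals[ip] += 1
        events[ip].append((start, 1))   # request opens
        events[ip].append((end, -1))    # request closes

    result = {}
    for ip, ip_events in events.items():
        # Sort closes before opens at the same timestamp so back-to-back
        # requests are not counted as overlapping.
        ip_events.sort(key=lambda e: (e[0], e[1]))
        current = peak = 0
        for _, delta in ip_events:
            current += delta
            peak = max(peak, current)
        result[ip] = (totals[ip], peak)
    return result


# Made-up sample: "polite" sends 6 requests one after another,
# "bursty" sends 4 requests that all overlap.
sample = [("polite", t, t + 1) for t in range(0, 12, 2)]
sample += [("bursty", 100, 130) for _ in range(4)]
stats = totals_and_peaks(sample)
```

In this sample the "bursty" client makes fewer requests in total but holds four connections open at once, which is the shape of traffic that ties up server processes.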

Adding a line to FRED's firewall config fixed the problem by blocking that IP (their whole class B subnet actually). So FRED's search rank in this Chinese search engine will suffer, but I'm ok with that. :^)
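
If you're wondering how much a "whole class B" covers, here is a quick sketch with Python's ipaddress module, reading "class B" loosely as a /16 block. The address below is a placeholder from a private range, not the crawler's real IP, and this is not the firewall rule itself, just the same membership test a firewall would apply.

```python
import ipaddress


def enclosing_class_b(ip):
    """Return the /16 network that contains the given IPv4 address."""
    return ipaddress.ip_network(f"{ip}/16", strict=False)


def is_blocked(ip, blocked_networks):
    """True if the address falls inside any blocked network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in blocked_networks)


# Placeholder address from a private range; the real crawler IP is not given.
blocked = [enclosing_class_b("172.16.45.7")]
```

A /16 holds 65,536 addresses, so this is a blunt instrument: it blocks every machine in that range, not only the misbehaving one.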

-p

Thursday, August 4, 2011

Tweaked Apache config

Ok so I opted for the "tweak apache" option. I lowered the MaxClients setting and set MaxRequestsPerChild to 1000. Neither of those will directly stop the number of processes from escalating, but they might cause different behavior to occur when the number of processes gets too high.

We'll see.

Wednesday, August 3, 2011

FRED is flapping

FRED's webserver has been down for a number of short (~3 minute) periods all day today. It began with a few incidents on Friday and a few over the weekend, escalating to 24 such incidents today (so far). Apache simply spawns more and more child processes until it exhausts memory on the server, at which point it fails to respond to a probe from FRED's auto-restart monitor. At that point the monitor restarts Apache, and all is well until the next time it fills the available memory.

I may wave a dead chicken over some Apache settings, but since this is suddenly happening with no config changes or significant change in traffic on the site, I'm tempted to just spin up a new EC2 instance and see if that helps. Maybe the current instance is just going bad in some impenetrable way?

-p

Thursday, April 21, 2011

Well *that* was painful....

FRED is back from today's EC2 outage. Amazon has three of the four availability zones in their us-east-1 region (Virginia) datacenter operating. Unfortunately, FRED's database server was running in the one zone that is still sick. However, I was able to snapshot the database's EBS volume and create a new volume from that snapshot in one of the other zones, then fire up a new DB server instance in that zone, attach the new volume to that instance and get things rolling again.
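
For the geeks, here is the snapshot-and-move sequence written out as a sketch. It is not what I actually ran. The function assumes a client object that behaves like a boto3 EC2 client (create_snapshot, create_volume, attach_volume and the "snapshot_completed" and "volume_available" waiters), and it assumes the new instance already exists. The DryRunEC2 class is a made-up stand-in so the steps can be exercised without touching AWS, and every ID, zone, and device name is a placeholder.

```python
def move_volume_to_zone(ec2, volume_id, target_zone, instance_id, device="/dev/sdf"):
    """Snapshot a volume, recreate it in another zone, attach it to an instance.

    `ec2` is assumed to behave like a boto3 EC2 client.
    """
    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description="zone move")
    snapshot_id = snapshot["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=target_zone)
    new_volume_id = volume["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    ec2.attach_volume(VolumeId=new_volume_id, InstanceId=instance_id, Device=device)
    return new_volume_id


class DryRunEC2:
    """Hypothetical stand-in that records calls instead of talking to AWS."""

    def __init__(self):
        self.calls = []

    def create_snapshot(self, **kwargs):
        self.calls.append("create_snapshot")
        return {"SnapshotId": "snap-dryrun"}

    def create_volume(self, **kwargs):
        self.calls.append("create_volume")
        return {"VolumeId": "vol-dryrun"}

    def attach_volume(self, **kwargs):
        self.calls.append("attach_volume")
        return {}

    def get_waiter(self, name):
        self.calls.append(f"wait:{name}")
        return self

    def wait(self, **kwargs):
        pass


client = DryRunEC2()
new_volume = move_volume_to_zone(client, "vol-old", "us-east-1b", "i-newdb")
```

The waits matter: a volume can't be created from a snapshot that hasn't finished, and it can't be attached until it is available.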

While riding the bus home from work.

Yay for wifi on the bus. Boo for Amazon having a full day outage.

I was getting pretty proud of FRED's 99.99% 30 day uptime. Now it's all shot to hell: 97.58%!
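
If you want to turn those percentages into actual downtime, the arithmetic is simple. This sketch assumes the 30-day window is exactly 720 hours; the helper name is made up.

```python
HOURS_IN_WINDOW = 30 * 24  # 30-day window, as in the uptime figures above


def downtime_hours(uptime_percent, window_hours=HOURS_IN_WINDOW):
    """Hours of downtime implied by an uptime percentage over the window."""
    return window_hours * (100.0 - uptime_percent) / 100.0


before = downtime_hours(99.99)
after = downtime_hours(97.58)
print(f"99.99% uptime allows {before * 60:.1f} minutes of downtime in 30 days")
print(f"97.58% uptime means {after:.1f} hours of downtime in 30 days")
```

That works out to about 4.3 minutes at 99.99% versus about 17.4 hours at 97.58%.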

Oh well. At least it wasn't on Friday or Saturday when everyone would be trying to download their preregs.

-p

EC2 outage

Hello faithful FRED users- Today Amazon EC2's us-east-1 region is experiencing a serious, sustained outage in EBS connectivity and creation. FRED lives in us-east-1a, so his database server (whose files live in an EBS volume) became inaccessible at about 1am PDT.

All I can do is wait for EC2 to fix the issue. Very curious or geeky folks can follow their progress here: http://status.aws.amazon.com

Please accept my apologies for the FRED outage.

-Peet