Saturday, March 26, 2011

Another try at handling the accented characters in fencers' names

FRED is getting used more and more in Canada these days, which is very cool. However, it's brought a long-standing problem with FRED closer to the surface: multibyte characters.


FRED is written in PHP, a great language for quickly building complex web applications. However, PHP's support for multibyte encodings is less than awesome. Also, lots of the code in FRED was written in the early days of PHP 4, when multibyte support was even worse.


Lots of our Canadian friends and their clubs have names with accented characters, represented in the UTF-8 multibyte character encoding. Their names are accepted into FRED just fine, but when they are transmitted back and forth with Fencing Time, the XML-related PHP functions FRED uses to read and write preregistration and results handle the multibyte characters badly. There's a bunch of info out there on the web about how best to handle this problem, and I've tried lots of the suggestions, with mixed results.


Today I deployed another attempted solution to write UTF-8 XML preregistration files for import into Fencing Time using the mb_convert_encoding() function to ensure that the stream output is valid UTF-8.
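
For the curious, here's roughly what that kind of approach can look like. This is a minimal sketch and not FRED's actual code: the function names (to_utf8, fencer_xml), the XML element names, and the guess that any non-UTF-8 input is ISO-8859-1 are all assumptions made for illustration. It needs PHP's mbstring and dom extensions.

```php
<?php
// Hypothetical helper: return $text as valid UTF-8.
// Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
function to_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        // Already valid UTF-8: leave it alone so it is not double-encoded.
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

// Hypothetical writer: element names are illustrative only and are
// not Fencing Time's real import format.
function fencer_xml(array $names): string
{
    // Declaring the encoding makes saveXML() emit UTF-8 bytes as-is.
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('Fencers'));
    foreach ($names as $name) {
        $el = $doc->createElement('Fencer');
        // createTextNode() escapes &, < and > for us.
        $el->appendChild($doc->createTextNode(to_utf8($name)));
        $root->appendChild($el);
    }
    return $doc->saveXML();
}
```

Calling fencer_xml(["Andr\xE9 C\xF4t\xE9"]) with a Latin-1 name should give back XML containing the UTF-8 bytes for "André Côté". The weak spot is the guess in to_utf8(): some Latin-1 byte sequences also happen to be valid UTF-8, so a check like this can't catch every case.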


Given how many systems this data passes through (FRED's webserver, db server, your browser, your OS, Fencing Time, and back again...), it's hard to be sure everything works 100%, but so far this change has performed well in my tests. Hopefully the real world will behave similarly.

-P
