Dear Japan,
I like my Japanese cow-orkers. I really do. Of course, I've never had any "face-time" with them which might explain this lack of animosity. But when I need to work with them I either have to be up at 3am (and sober enough to function) or I might as well send snail mail. One round-trip communication takes three days.
I needed a database in Shift-JIS, the most common Japanese encoding. It's crap compared to Unicode... hell, it's crap compared to anycode, it being a freaky Microsoft hack to enforce their idea of codepages and still work with previous Japanese standards like JIS X 0201 and 0208. Wacky stuff if you're one of the couple dozen codepage supergeeks. I know I'm lame. Haz a cat. In fact, take two; they're small.
So what the hell do I want with a Shift-JIS DB when its suckage quotient is so high? It seems we have a bug, one that I not only pointed out about eight fucking years ago but which also should've been dealt with by the time $OurBigApp supported Unicode.
ATTENTION AMERICAN DATABASE-PROGRAMMING INFIDELS: There is a huge fucking difference between "character" and "byte". Not for you normally, but for most of the rest of the fucking world. One byte per character works fine for English. ASCII is also sufficient for Latin, Swahili and Hawaiian. It is rumoured that there are other languages, many of which have more characters than can be addressed with a single byte.
It turns out a field length of 5,000 characters isn't actually 5,000 characters but 5,000 bytes. For the Japanese this means they can squeeze in only around 2,500 characters in Shift-JIS or UTF-16, where kana and kanji cost two bytes apiece (four for anything that needs UTF-16's fucking surrogate pairs), not quite enough for what this field is designed to contain. In UTF-8, at three bytes apiece, the number becomes even more grim -- around 1,600 characters.
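For the infidels who want to see the character-versus-byte arithmetic for themselves, here is a minimal sketch, assuming Python 3 and its built-in codecs. The helper names and the sample string of pure hiragana are made up for illustration and have nothing to do with $OurBigApp's actual code; real text with ASCII mixed in will pack differently.

```python
FIELD_LIMIT_BYTES = 5000  # the byte limit of the field described in the post


def byte_len(text, encoding):
    """Return how many bytes `text` occupies once encoded."""
    return len(text.encode(encoding))


def chars_that_fit(text, encoding, limit=FIELD_LIMIT_BYTES):
    """Count how many leading characters of `text` fit within `limit` bytes."""
    used = 0
    for count, ch in enumerate(text):
        used += len(ch.encode(encoding))
        if used > limit:
            return count  # this character would overflow the field
    return len(text)


if __name__ == "__main__":
    # Illustrative input: 5,000 copies of one hiragana character.
    sample = "あ" * 5000
    for enc in ("utf-8", "shift_jis", "utf-16-le"):
        print(enc, byte_len(sample, enc), chars_that_fit(sample, enc))
```

The point of counting per character rather than dividing by a fixed width is that mixed text (ASCII, half-width katakana, the occasional rare kanji) does not have a fixed width in any of these encodings.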
So why didn't I just install a fucking Shift-JIS database on my own if I'm such a Mr Smarty-Pants? Setting up the DB is easy, but our installer, which adds and shapes the schema, sucks. It's overly complicated (more than 90 screens of text and clicky goodness). That alone isn't a problem. I don't speak or read Chinese and I can still not only install but administer Windows in Chinese, both Traditional and Simplified. Microsoft sucks but at least their suckage is uniform across languages. Same dialogs, same layout, same buttons, same icons. Not so $OurBigApp. The Japanese installer is nothing like the English, which is nothing like the German, so I can't even run a side-by-side installation and select the correct radio buttons or fill in the proper fields.
I can read some Japanese but with so little chance to use the language I've lost much of it over the past 12 years. A few smrt peepul might think, "Duh! Just select the dialog text, copy and then paste it into Teh Ghugel Translator!" Yeah, I thought of that. Our programmers had different ideas:
window.properties.AllowSelectText=0. Fuckwits.
Labels: codepage, database, I18N, Japanese, Microsoft, Shift-JIS, Unicode