huh!?

Dec 22 ’09

ruby 1.9 and unicode strings

I was playing around with some Unicode strings to test string handling in Ruby 1.9, and realized a couple of things. Unicode string handling in 1.9 is fairly basic. For instance, a string's upcase method only knows how to handle the ASCII letters a to z. I noticed this when I ran upcase on a string containing my name (Helge André Gudmundsen) and got back “HELGE ANDRé GUDMUNDSEN”.

This is by design, as different locales may handle case cases (pun intended) differently. Turkish, for example, uppercases “i” to “İ” rather than “I”.
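To make the ASCII-only behaviour concrete, here is a minimal sketch that reproduces it with String#tr. The ascii_upcase helper is a hypothetical name of my own, and tr is only a stand-in to illustrate the effect; I am not claiming this is how upcase is implemented in 1.9.

```ruby
# encoding: utf-8

# Hypothetical helper: upcases only the ASCII letters a-z and leaves
# every other character (such as "é") untouched.
def ascii_upcase(str)
  str.tr("a-z", "A-Z")
end

name = "Helge André Gudmundsen"
puts ascii_upcase(name)  # prints "HELGE ANDRé GUDMUNDSEN"
```

The “é” survives in lower case because it falls outside the a-z range, which is the same result I got from upcase.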

Stefan Lang has written a gem called UnicodeUtils (unicode_utils) which handles Unicode strings nicely. Using this gem I can call:

require "unicode_utils"
UnicodeUtils.upcase("Helge André Gudmundsen")

and get my name back as “HELGE ANDRÉ GUDMUNDSEN”. The methods take an optional locale parameter so that locale-specific case rules are applied correctly. Installation is simple:

$ gem install unicode_utils

Highly recommended.