Ersin Er wrote a brief blog post about handling the Turkish language in Haskell. Because Turkish uses a character set that mostly looks familiar to Westerners, it is notorious for its ability to trip up the unwary programmer (see examples in PHP and PostgreSQL).

    import Data.Text (pack, unpack)
    import Data.Text.ICU (LocaleName(Locale), toLower)

    main = do
      let trLocale = Locale "tr-TR"
      let upStr = "ÇIİĞÖŞÜ"
      let lowStr = unpack $ toLower trLocale $ pack upStr
      putStrLn ("toLower " ++ upStr ++ " gives " ++ lowStr)



His example is quite nice, but we can write a more compact version of his code using a few handy features of the text and text-icu packages:

In the text-icu library, we use the LocaleName type to describe the locale in which we want a function to operate. This type is an instance of the IsString class, so if we enable the OverloadedStrings language feature, we can write plain "tr-TR" to specify a Turkish locale.

The Text type is also an instance of the IsString class, so we can write a literal string like "foo" and the compiler will infer the correct type for it.

The Data.Text.IO module contains functions for performing locale-sensitive I/O using Text values.
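To make the IsString mechanism concrete, here is a minimal sketch that needs only base and text. The LocaleTag type and the tagged function are hypothetical stand-ins invented for this illustration; they are not part of text-icu, whose LocaleName type is assumed to work along the same lines.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))
import qualified Data.Text as T

-- Hypothetical stand-in for a locale type (NOT text-icu's LocaleName),
-- used only so this sketch needs nothing beyond base and text.
newtype LocaleTag = LocaleTag String
  deriving (Eq, Show)

-- With this instance, a string literal can appear wherever a LocaleTag
-- is expected.
instance IsString LocaleTag where
  fromString = LocaleTag

-- The argument types tell the compiler how to read each literal.
tagged :: LocaleTag -> T.Text -> T.Text
tagged (LocaleTag tag) txt = T.concat [T.pack tag, ": ", txt]

main :: IO ()
main = do
  -- "tr-TR" is read as a LocaleTag and "merhaba" as Text; no pack or
  -- constructor call appears at the use site.
  putStrLn (T.unpack (tagged "tr-TR" "merhaba"))
```

The point of the sketch is only that the literal's type comes from where it is used, which is what lets a plain "tr-TR" stand in for a locale.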

This combination of features can let us write a less cluttered program, following the dictum that simple things should be simple:

    {-# LANGUAGE OverloadedStrings #-}
    import Data.Text.IO as T
    import Data.Text.ICU as T (toLower)

    main = do
      let upper = "ÇIİĞÖŞÜ"
          lower = T.toLower "tr-TR" upper
      mapM_ T.putStr ["toLower ", upper, " gives ", lower, "\n"]



I've intentionally kept the number of lines the same to preserve clarity, but there are a few advantages to the rewrite:

Less clutter, more speed: we don't need to explicitly pack or unpack Text values to or from String values.

Performance: we're not performing I/O on String values. This would be a big deal if we were writing a real application: I/O with Text is much faster than with String.

Putting inference to work: the compiler correctly infers the type of "tr-TR" to be a LocaleName, and of the strings at the end to be Text, so we don't need to be so explicit.

Oh, and we still give the right answer (look carefully at upper and lower case dotted and dotless "I"):

toLower ÇIİĞÖŞÜ gives çıiğöşü
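For contrast, this small sketch shows why the locale argument matters. It uses the locale-independent toLower from Data.Text (not the text-icu function used above) and prints code points instead of glyphs, so the result does not depend on the terminal's encoding. The codePoints helper is a name introduced here purely for illustration.

```haskell
import Data.Char (ord)
import qualified Data.Text as T

-- Code points of a Text value, so results can be inspected without
-- relying on how the terminal renders Turkish letters.
codePoints :: T.Text -> [Int]
codePoints = map ord . T.unpack

main :: IO ()
main = do
  -- 'I' (U+0049): locale-independent lowercasing yields 'i' (U+0069).
  print (codePoints (T.toLower (T.pack "I")))     -- [105]
  -- Turkish expects dotless 'ı' (U+0131) for that input instead.
  print (codePoints (T.pack "\305"))              -- [305]
  -- 'İ' (U+0130): per the Data.Text documentation this becomes 'i'
  -- followed by combining dot above (U+0307), not a bare 'i'.
  print (codePoints (T.toLower (T.pack "\304")))  -- [105,775]
```

Neither result matches the Turkish output shown above, which is the gap that passing a locale to text-icu's toLower is meant to close.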

The full documentation to the text and text-icu libraries is a little difficult to read on Hackage (in fact, the text-icu API docs are completely missing), so here are links: