An issue was recently raised on the ConTabs project that sent me down a bit of a localisation rabbit hole. You see, I’d written a load of conformance tests that included example output. What I hadn’t factored in was how many of these depended on my locale, which meant they could fail on computers set to a different one. In today’s post, I’d like to explore exactly what went wrong and how we put it right.

What happened?

A couple of weeks ago, I was alerted to the fact that a conformance test in the ConTabs project was failing for one user. These tests use test data to run the whole end-to-end process of table generation. They compare the table generated with a hard-coded example. For instance, we have tests that say that a simple table should look like this:

+--------------+-----------+----------------+
| StringColumn | IntColumn | CurrencyColumn |
+--------------+-----------+----------------+
| AAAA         | 999       | 19.95          |
+--------------+-----------+----------------+

Looks harmless, right? What could possibly differ between computers? Sure enough, this test has been passing merrily for over a year now on my laptop, in my CI system, and on the computers of the 7 other people who have contributed to ConTabs in that timeframe.

But it all went wrong when we were joined by a Slovak developer, Marek. Slovakia, I discovered, is one of the many countries where they don’t use a full stop (“period” to my American friends) as the decimal separator.

Since ConTabs uses the default string formatting methods in .NET, we get localisation for free. This means that ConTabs running on a Slovak (or Peruvian, or Danish…) computer renders the number as “19,95”. This seems like the right behaviour, but our tests fail.
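To make this behaviour concrete, here is a minimal standalone sketch in plain .NET (not ConTabs itself). It assumes the machine has full culture data available, as on Windows or on Linux with ICU. In globalization-invariant mode the culture lookups may fail.

```csharp
using System;
using System.Globalization;

decimal price = 19.95M;

// decimal.ToString(IFormatProvider) uses the general ("G") format with that
// culture's NumberFormatInfo, so only the decimal separator differs here.
string british = price.ToString(CultureInfo.GetCultureInfo("en-GB"));
string slovak = price.ToString(CultureInfo.GetCultureInfo("sk-SK"));

Console.WriteLine(british); // 19.95
Console.WriteLine(slovak);  // 19,95

// With no argument, ToString() falls back to CultureInfo.CurrentCulture,
// which is why the same code prints different text on different machines.
Console.WriteLine(price.ToString());
```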

When our program behaves correctly, but the tests fail, the tests are clearly wrong and need to be reviewed.

Sidebar: Why even write these tests?

I know that some people will have read the above and decided that this is the “wrong way” to go about testing. Conformance testing is inherently brittle and can be very difficult to make useful, so many projects simply don’t employ it. Unit tests are typically much easier to write, so on the whole people prefer them.

For the ConTabs project, we use conformance testing (alongside unit testing) for the following reasons:

1. The nature of the output of ConTabs (always a string) and its predictable behaviour make it very easy to write conformance tests. For us, a conformance test can actually be the easiest way to express some requirements. For instance, we can test that when we apply right alignment to a column, the decimal separators line up properly.
2. We want to avoid issues where the unit tests pass but the sum of the parts isn’t right. See this post for an example of that…
3. We want the tests to function as spec and documentation. This is a bit more abstract, but having examples of the expected output in the tests alongside the code used to produce them can be a really helpful way to see what is (or should be!) going on.

So there we go. Whilst this sort of testing isn’t often a good fit, it has mostly worked pretty well for the ConTabs project. Anyway, on with the show…

How we fixed it (round 1)

Having identified that we needed to account for different ways of separating integers from their fractional friends, we began to think about ways we could do this. On one hand, we could make the exemplar variable and have the decimal separator for the current locale injected into it. This didn’t quite feel right to me. The examples should be static – particularly if they are to fulfil their role as documentation.

The other approach (and the one we opted for) is to explicitly pin the culture that the thread should be using. This way we can ensure a certain level of portability by taking a well-known locale with us wherever we are in the world. Something like this:

Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-GB");
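One thing to watch with setting the thread culture like this is that the change persists. Anything that runs afterwards on the same thread, possibly another test, inherits it. Below is a minimal sketch of a save-and-restore pattern. WithCulture is a hypothetical helper and is not part of ConTabs or .NET. If your tests use NUnit, its SetCulture attribute offers a similar pin for a single test or a whole fixture.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of ConTabs: run some code under a pinned
// culture and restore the caller's culture afterwards, even if the code throws.
static T WithCulture<T>(string cultureName, Func<T> action)
{
    CultureInfo original = CultureInfo.CurrentCulture;
    try
    {
        // Setting CultureInfo.CurrentCulture (.NET Framework 4.6+ / .NET Core)
        // changes the culture of the current thread.
        CultureInfo.CurrentCulture = CultureInfo.CreateSpecificCulture(cultureName);
        return action();
    }
    finally
    {
        CultureInfo.CurrentCulture = original;
    }
}

// Parameterless ToString() inside the lambda sees the pinned culture...
string pinned = WithCulture("en-GB", () => 19.95M.ToString());
Console.WriteLine(pinned); // 19.95

// ...while the culture outside the helper is left as it was.
Console.WriteLine(CultureInfo.CurrentCulture.Name);
```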

And while we’re at it, let’s use this same method to parametrise our unit tests:

[TestCase("en-GB", "£1.91")]
[TestCase("sk", "£1,91")]
public void CurrencyFieldCanBeFormatted(string culture, string expected)
{
    // Arrange
    Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture(culture);
    var tableObj = Table<TestDataType>.Create();

    // Act
    tableObj.Columns[2].FormatString = "£0.00";
    var val = tableObj.Columns[2].StringValForCol(1.911M);

    // Assert
    val.ShouldBe(expected);
}
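Why does the Slovak case expect "£1,91" when the format string contains a full stop? In a .NET custom numeric format string, "." is a placeholder for the culture's decimal separator, not a literal character. A character with no special meaning, such as "£", is copied through unchanged. Here is a small sketch in plain .NET, assuming full culture data is available:

```csharp
using System;
using System.Globalization;

decimal amount = 1.911M;

// "0" is a digit placeholder and "." stands for the culture's decimal
// separator; "£" has no special meaning, so it is output literally.
string gb = amount.ToString("£0.00", CultureInfo.GetCultureInfo("en-GB"));
string sk = amount.ToString("£0.00", CultureInfo.GetCultureInfo("sk-SK"));

Console.WriteLine(gb); // £1.91
Console.WriteLine(sk); // £1,91
```

When calling .NET's formatting methods directly, passing the culture as an IFormatProvider like this avoids touching thread state at all.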

Nice! A quick fix. Time for a beer.

What about dates?

Ah yes… I should have realised there’d be more to this.

I had (partly) anticipated the different date formats in use around the world by a) suppressing the date column throughout the conformance tests and b) providing date styles explicitly in the unit tests. Since we’re now more locale-aware, however, it feels like we should probably test that dates are being localised correctly. Thus, my Slovak collaborator sensibly added a parametrised test for a few date localisations too.

Job done. We’ve solved localisation. I merged the changes and we moved on with our lives.

Now it all gets weird…

Well, we would have moved on, if it weren’t for the fact that the test broke a few days later. Specifically, the test case using the Slovak localisation started failing because some machines were adding spaces that hadn’t been there previously. For instance, we’d been expecting dates to be formatted like:

31.1.2018

But were now seeing them get formatted as:

31. 1. 2018

This was a pretty unpleasant surprise. The tests had been working on my laptop, but now weren’t. They had passed in the CI system and on the computer of my Slovak collaborator. But now they were failing.

Baffled, I posted a question to Stack Overflow. Thanks to an explanation by a user called Jimi, I realised that locales aren’t something we can rely on remaining fixed. A pertinent quote:

A Windows update may change the default pattern for any of the Locales (without explicit notification).

Given that the tests had been working only days previously, it seems that we happened to experience a change in the Slovak locale. In a sense, the timing was fortuitous. Had the change happened much later, the tests would not have been fresh in our minds and the sudden break would have been even more confusing.
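To see which short date pattern a particular machine uses, you can ask the culture directly. This sketch uses plain .NET and assumes sk-SK culture data is present. It prints the pattern and confirms that the "d" standard format is simply that pattern applied. The printed pattern and date will vary between machines and OS versions.

```csharp
using System;
using System.Globalization;

var refDate = new DateTime(2018, 1, 31);
var slovakCulture = CultureInfo.GetCultureInfo("sk-SK");

// The "d" standard format is shorthand for the culture's ShortDatePattern.
// That pattern comes from the OS (NLS on Windows) or from ICU data, so it can
// differ between machines and change after an update.
string pattern = slovakCulture.DateTimeFormat.ShortDatePattern;
string viaStandard = refDate.ToString("d", slovakCulture);
string viaPattern = refDate.ToString(pattern, slovakCulture);

Console.WriteLine(pattern);     // machine-dependent
Console.WriteLine(viaStandard); // machine-dependent, e.g. 31.1.2018 or 31. 1. 2018
Console.WriteLine(viaStandard == viaPattern); // True
```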

So what to do?

OK, so we know that hard-coding the expected output of anything localised presents a risk. We have no control over how or when a locale might be changed, or even just be different between two computers. So how do we test that our program is handling localisation correctly?

Our solution is still being worked on, but in the process, I’ve found it helpful to consider the purpose of each of our tests: what behaviour is it that we’re trying to assert?

In the case of ConTabs, I think the tests affected by localisation should fall into one of two categories:

1. Conformance tests, which show the system as a whole is behaving as expected, as described explicitly by an example.

In these cases, we have to choose a locale and accept the risk. I hypothesise that some locales will be less volatile than others and that these should be preferred for this purpose. I have no proof for this, but my gut tells me that locales such as en-GB or fr-FR will be less likely to change than those of a less well-known locale such as sk-SK.

2. Unit tests, that assert that the user’s locale is applied appropriately.

It’s important that our program respects a user’s localisation preferences, but we don’t actually care what those settings are. In these cases, all we really care about is that the method that renders the contents of a cell outputs something that matches what we get when we apply the localisation. In other words, we can compare our method with one from .NET itself and avoid having to hard-code any of the expected values. Something like this:

// arrange
var refDate = new DateTime(2018, 01, 31);
var culture = CultureInfo.CreateSpecificCulture(cultureName);
Thread.CurrentThread.CurrentCulture = culture;
var dateString = refDate.ToString("d", culture);

// act
var tableObj = Table<TestDataType>.Create();
tableObj.Columns[3].FormatString = "d";
var val = tableObj.Columns[3].StringValForCol(refDate);

// assert
val.ShouldBe(dateString);
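The same idea can be pushed a little further by looping over every specific culture the machine knows about, so the test doesn't have to pick culture names at all. In this sketch, RenderCell is a hypothetical stand-in for the table's cell renderer (in ConTabs that would be StringValForCol). It exists only so the example runs on its own, and as written it trivially matches .NET's output. The point of the sketch is the comparison loop, not the stand-in.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for a cell renderer such as ConTabs' StringValForCol:
// it formats the value using the current culture.
static string RenderCell(object value, string format) =>
    value is IFormattable formattable
        ? formattable.ToString(format, CultureInfo.CurrentCulture)
        : value?.ToString() ?? string.Empty;

var refDate = new DateTime(2018, 1, 31);
CultureInfo original = CultureInfo.CurrentCulture;
int checkedCount = 0;

try
{
    // Compare against .NET's own output for every specific culture installed,
    // so no expected strings are hard-coded.
    foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.SpecificCultures))
    {
        CultureInfo.CurrentCulture = culture;
        string expected = refDate.ToString("d", culture);
        string actual = RenderCell(refDate, "d");
        if (actual != expected)
            throw new Exception($"{culture.Name}: expected '{expected}', got '{actual}'");
        checkedCount++;
    }
}
finally
{
    // Put the original culture back so later code is unaffected.
    CultureInfo.CurrentCulture = original;
}

Console.WriteLine($"Checked {checkedCount} cultures");
```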

That should do it!

Wrapping up

This week I’ve learnt that localisation is a tricky beast. On the one hand, it’s not safe to ignore the fact that things get localised. On the other hand, it’s also not a great idea to assume that locales are static. Locales should be treated as data. This means testing that they can be handled whilst avoiding (as much as possible) testing their contents.

The process of dealing with this has also forced me to think more carefully about the purpose of the tests we write. What are we testing? Would the failure of this test lead to us being able to fix something? The answer to the latter question is, I feel, highly significant. I often find myself tending towards writing tests that cover every case as things are at the moment, but it’s important to consider that things beyond one’s responsibility can change. In other words, by hard-coding the output of methods subject to potentially volatile localisation data, we were introducing an external dependency into our unit tests.

Of course, it’s possible that none of this would have happened if I weren’t a developer privileged by being a British speaker of English. My anglocentrism allowed me to quietly neglect the fact that other people do things differently. In software development, we primarily do business in English. I mean, think of the names of the built-in methods of your language of choice: I bet they’re English words. This makes it really easy to just forget that the rest of the world exists, which is a bit shameful in my opinion.

So I’m really glad that the international nature of open source projects has introduced me to Marek. He ran the tests on a computer set up with a locale that was sufficiently different from mine that we flushed out these issues. And whilst we’re still working out the full details of our final fix, I think that’s a pretty good outcome. Thanks, Marek!