The partial lifting of international sanctions, earlier this year, on doing business with Iran has dramatically increased the importance of and demand for translations to and from Persian. As the Middle East’s second-largest economy, and one that has suffered years of major underinvestment, Iran holds huge promise for many businesses.

Some sanctions are still in place, and the actual opening up of Iran’s economy to foreign trade and investment has been much slower than initially expected. Yet Iran is being reconnected to the global financial system, including to the SWIFT international payments system, and as with any market, companies that establish themselves early can look to benefit from a massive first-mover advantage. As always, language holds the key to unlocking the market’s potential.

Farsi or Persian?

There is a lot of debate around the “correct” naming of the language in English: should it be Persian or its endonym Farsi? The original name can be said to be Persian, reflecting the Persian word pars (someone who is Persian).

The name Farsi came into use in Arabic documents after the Muslim conquest of Persia in 651 AD due to a lack of the sound /p/ in Arabic. Thus, parsi became farsi. For political reasons, it was the latter form, farsi, that English-speaking diplomats adopted on March 21, 1935, to refer to the Persian language as it is spoken in modern Iran, which until that date was called Persia.

Today, there is a strong desire among Persian linguists for the English name of the language to be Persian. This has not been reflected in the localization world, where the Persian language is still officially called Farsi and is identified by the code fa-IR (farsi-IRAN).

Main facts about Persian

Persian and its dialects are spoken by approximately 110 million speakers worldwide. Farsi is frequently said to be the official language of Iran, Afghanistan and Tajikistan; however, this statement is not entirely correct. The Persian variety native to (and official in) Afghanistan is called Dari and has the status of a distinct language. Admittedly, the justification for Dari’s independent language status may be partly political, since Dari and Farsi are mutually intelligible in both spoken and written form. Still, for localization purposes, Farsi and Dari count as two separate languages and necessitate two different translation teams.

Where Persian gets spoken

Similarly, the Persian variety native to (and official in) Tajikistan is Tajiki, again a language that is very close to Farsi and Dari in its grammar and vocabulary. There is a crucial difference, however: Tajiki uses the Cyrillic script, a legacy of the Soviet era. As in many post-Soviet states, there are debates about switching away from the Russian alphabet, either to Latin (bringing the country closer to the Western world) or to the Perso-Arabic alphabet (reestablishing the historically close relationship with Iran).

Persian script is not Arabic

Until the ninth century, Persians used the Pahlavi script to write Persian. With the Muslim conquest of Persia and the subsequent Arabic influence on the language, Persian adopted the Arabic alphabet, adding a few more letters to represent Persian sounds that Arabic does not have (پ [p], چ [t͡ʃ], ژ [ʒ], and گ [g]). Another difference between Persian and Arabic scripts is that some letters and numbers are written slightly differently:

Persian script    Arabic script    Latin script
ی                 ي                i
ک                 ك                k
۴                 ٤                4
۵                 ٥                5
۶                 ٦                6

Until a few years ago, translation tools and checks only supported the Arabic script. This led to many false positive errors being reported, since the Arabic-based QA checks could not recognize words that contained any of the Persian-specific characters. Thousands of strings were marked as containing invalid characters. This has since changed, and today many tools and checks support the Persian script. Still, a lot of the Persian terminology remains encoded in Arabic script in official databases.
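A QA check for Arabic-encoded characters in Persian text can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular translation tool: the function names are hypothetical, and the mapping covers only the character pairs listed above plus the remaining digits. Converting blindly would be wrong for text that is genuinely Arabic, so the sketch also offers a report-only function.

```python
# Arabic code points and their Persian counterparts (hypothetical helper table).
ARABIC_TO_PERSIAN = {
    0x064A: 0x06CC,  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    0x0643: 0x06A9,  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic digits U+0660..U+0669 -> Extended Arabic-Indic digits U+06F0..U+06F9
ARABIC_TO_PERSIAN.update({0x0660 + i: 0x06F0 + i for i in range(10)})


def find_arabic_variants(text):
    """Report (index, character) for every Arabic-encoded character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) in ARABIC_TO_PERSIAN]


def to_persian_script(text):
    """Replace Arabic-encoded characters with their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)


if __name__ == "__main__":
    sample = "\u0643\u064A\u0664"  # Arabic kaf, Arabic yeh, Arabic-Indic digit four
    print(find_arabic_variants(sample))
    print(to_persian_script(sample) == "\u06A9\u06CC\u06F4")
```

Running the report function over a glossary before import shows how much of the terminology is still stored with Arabic code points.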

Perso-Arabic letters (with some exceptions) have up to four forms: one when standing as an isolated letter, one when at the beginning of the word, one for the middle of the word, and one for word-final positioning.

Sound    Isolated    At the beginning of the word    In the middle of the word    At the end of the word
[h]      ه           ﻫ                               ﻬ                            ﻪ
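The four forms of a letter can be inspected with Python's standard unicodedata module. The sketch below assumes the forms of [h] shown above are the Unicode code points U+0647 (the base letter) and U+FEEB, U+FEEC and U+FEEA (the legacy presentation forms); the function name is hypothetical. In ordinary text only the base letter is stored, and the rendering engine picks the shape.

```python
import unicodedata


def describe_forms(chars):
    """Return (code point, Unicode name, NFKC-normalized letter) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.normalize("NFKC", c))
        for c in chars
    ]


if __name__ == "__main__":
    # Base letter heh, then its initial, medial and final presentation forms.
    for code, name, base in describe_forms("\u0647\uFEEB\uFEEC\uFEEA"):
        print(code, name, f"base=U+{ord(base):04X}")
```

NFKC normalization folds each presentation form back to the base letter, which is useful when comparing strings that came from older tools.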

One could speculate that the reason for the existence of all of these forms is that the Perso-Arabic script is solely written cursively. Even in typed text, the letters are joined with each other to form an uninterrupted line.

The BBC online news site’s Persian version (http://www.bbc.com/persian)

Computers nowadays do this automatically by auto-connecting the correct form of the letters. The automatic joining of letters in a continuous line is not always desired, however. In some prefixed words, the final letter of the prefix should not be connected to the first letter of the word.

The solution then is to use a nonprinting character called ZWNJ (zero-width non-joiner), which inserts an invisible break between two letters and thereby disables the automatic joining. Translation tools used for Persian must support the ZWNJ character.

Further, before using any glossary, you should check to make sure that it does not contain terms that have been encoded without the ZWNJ character, which was not widely supported until recently.
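One way to run such a glossary check is sketched below. It assumes you have a trusted list of reference terms that are correctly spelled with ZWNJ (U+200C); the function names and the sample data are hypothetical. The sketch flags glossary entries that match a reference term with the ZWNJ either dropped or replaced by an ordinary space, the two mis-encodings one would expect from tools without ZWNJ support.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def without_zwnj(term):
    """Return the two likely mis-encodings: ZWNJ dropped, ZWNJ replaced by a space."""
    return {term.replace(ZWNJ, ""), term.replace(ZWNJ, " ")}


def find_suspect_entries(glossary, reference_terms):
    """Return (entry, correct form) for entries that look like a ZWNJ-less reference term."""
    suspects = {}
    for ref in reference_terms:
        if ZWNJ not in ref:
            continue  # nothing to check for terms that need no ZWNJ
        for bad in without_zwnj(ref):
            suspects[bad] = ref
    return [(entry, suspects[entry]) for entry in glossary if entry in suspects]


if __name__ == "__main__":
    # Prefix "mi" + ZWNJ + verb stem, written with escapes so the ZWNJ stays visible.
    correct = "\u0645\u06CC\u200c\u0631\u0648\u0645"
    glossary = ["\u0645\u06CC\u0631\u0648\u0645", correct, "\u0645\u06CC \u0631\u0648\u0645"]
    for entry, fix in find_suspect_entries(glossary, [correct]):
        print(repr(entry), "->", repr(fix))
```

The check only catches terms that appear in the reference list; it does not decide on its own where a ZWNJ belongs.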

Wrong-way drivers

The Persian script is written from right to left, the opposite of English. This poses several challenges for localizers. First, users of a right-to-left script also expect software buttons, menu items and so on to be arranged from right to left (think of the “HOME” button in Word sitting in the top right corner).

This rearrangement does not always take place. In turn, this leaves translators uncertain about how to translate help text such as “Click the button on the top right.” The button may be on the left (if the interface has been properly mirrored) or on the right (if it has been left as in the English version). The solution one typically settles on is to translate “Click the button at the top.”

Another adverse impact of the direction clash is caused by English words in Persian text. Usually these are unlocalizable product names and trademarks, whose presence often forces the reader to switch directions.

Such wrong-way words affect the user experience, especially for Persian users who are not fluent in English and expect to see brand names written solely in Persian script. In short, it is a good idea to avoid mixing the two scripts.
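Strings that mix the two directions can be flagged automatically. The sketch below is one possible check, not a feature of any specific localization tool: it uses the bidirectional class that Python's unicodedata module reports for each character, and the function names are hypothetical. A string counts as mixed when it contains both a strongly left-to-right character (class L) and a strongly right-to-left one (class R or AL).

```python
import unicodedata

STRONG_CLASSES = {"L", "R", "AL"}


def direction_classes(text):
    """Return the strong bidirectional classes ('L', 'R', 'AL') present in text."""
    return {unicodedata.bidirectional(ch) for ch in text} & STRONG_CLASSES


def has_mixed_direction(text):
    """True if text contains both left-to-right and right-to-left letters."""
    classes = direction_classes(text)
    return "L" in classes and bool(classes & {"R", "AL"})


if __name__ == "__main__":
    # A Latin product name followed by Persian-script letters (written with escapes).
    print(has_mixed_direction("Word \u0631\u0627 \u0628\u0627\u0632 \u06A9\u0646\u06CC\u062F"))
    print(has_mixed_direction("\u0628\u0627\u0632 \u06A9\u0646\u06CC\u062F"))
```

Flagged strings still need a human decision, since some product names and trademarks cannot be localized.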

Does it actually pay off to localize into Persian?

In the early 2000s, the localization effort for Persian was quite limited. The localized products that did exist usually suffered from various issues (often caused by the use of Arabic-supporting tools for Persian localization). As a result, Persian users avoided localized products. This changed around the turn of the decade, when large companies such as Google, Microsoft and Nokia undertook larger-scale localization into Persian, leading to a remarkable improvement in glossaries, style guides, tools, and know-how.

Nowadays, Persian users favor localized products, webpages and software. The trend is visible even among young users fluent in English, who still have a clear preference for using localized versions.