What seems like a casual conversation to Lebanese humans is actually a nightmare for artificial intelligence, no matter how advanced, to understand. Even advanced natural language processing (NLP) software needs some basic rules and heuristics to work, and the paragraph above casually breaks any rule you can come up with. But just for kicks, let’s try to see what’s going on.

This paragraph contains:

Transliterated Lebanese colloquialism and expressions

Informal conventions for turning Arabic letters into digits (3=ع)

Seamless mixing of English and French phrases (and the odd Italian “Ciao”)

Modification of words (in all languages) to emphasize emotion (mannnnn, anywayzzz, naaaar)

Word shortening (ltr)

Reference to foreign popular culture (James Bond)

Swearing that is used as a linguistic filler (kess ekhta)

The use of “Yalla”, which I would classify as a challenge in its own right, with its many, many possible meanings.

You can already begin to get a sense of the kind of hurdles a robot runs into when trying to classify that short comment.
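To make two of those hurdles concrete, here is a minimal Python sketch of the kind of hand-written rules a robot might start with: one to undo emotional stretching (mannnnn) and one to spot digits standing in for Arabic letters. The function names and the digit table are my own illustration, not taken from any real NLP library, and only the 3=ع pair comes from the list above; the other pairs are common conventions that I am assuming here.

```python
import re
from typing import List

# Partial digit-to-letter table. Only "3" = ع comes from the text above;
# the other entries are common conventions added here as assumptions.
ARABIZI_DIGITS = {
    "2": "ء",
    "3": "ع",
    "5": "خ",
    "7": "ح",
}


def collapse_elongation(word: str) -> str:
    """Squeeze any run of three or more identical characters down to one."""
    return re.sub(r"(.)\1{2,}", r"\1", word)


def digit_letters(word: str) -> List[str]:
    """List the Arabic letters that digits inside a word might stand for."""
    # A token with no alphabetic characters ("2023") is treated as a plain number.
    if not any(ch.isalpha() for ch in word):
        return []
    return [ARABIZI_DIGITS[ch] for ch in word if ch in ARABIZI_DIGITS]


for token in ["mannnnn", "anywayzzz", "naaaar", "yallllla"]:
    print(token, "->", collapse_elongation(token))

print(digit_letters("ba7er"))  # the 7 is read as a letter
print(digit_letters("2023"))   # no letters around it, so it stays a number
```

The first three tokens come out as you would hope (man, anywayz, nar), but “yallllla” collapses to “yala” because the rule has no way of knowing that the double “l” was supposed to stay. Every rule like this needs an exception, and then the exception needs one too.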

If you’re constantly cursing your iPhone for “auto-correcting” your “walla” into “walls” and your “shu” to “shy”, you get an idea of how stupid robots are when it comes to transliterated Arabic.

But What About Machine Learning?

But ya Mustapha, isn’t the whole point of machine learning (the technology where AIs learn from millions and millions of examples instead of rules) to solve exactly that kind of problem?

The thing is, there will never be enough examples of “transliterated Lebanese modified colloquialism peppered with foreign languages”. A machine needs millions and millions of examples with a certain consistency to be of any use. But the set of users is so small (Lebanese on the Internet), and the variations are so large across dialects (Baddi vs Baddeh vs Biddy), educational levels, and choices of foreign languages to mix in, not to mention the constant addition of new popular culture references (Lebanese political developments, memes, celebrities, etc.), that this, in my opinion, is an impossible problem to solve, even with a combination of smart algorithms and machine learning.

So, dear Facebook, Rou7ou balltou el ba7er…