Around two years ago, I checked out the pluggable transports system of Tor. This mechanism can be used to circumvent network censorship imposed by network administrators or authoritarian governments that deny their citizens access to certain parts of the internet. In most censorship situations there will be a blacklist of websites, e.g. a list that specifies you're not allowed to visit wikipedia.org.

In summary: there are multiple approaches available, from creating completely random-looking encrypted data streams (obfs) or simply using a VPN, to hiding traffic in TLS packets going to common gateways (meek). However, the underlying concept never changed.

As illustrated in the figure below, the idea is that you have a gateway running on a network without restrictions, so that the gateway can request and retrieve the information for you. The important thing to note here is that the agent who controls the network inspects every bit of information passed to and from the client. China is known to apply ever more aggressive packet-inspection techniques to block these kinds of circumvention techniques [1].

The same concept applies if you use a Virtual Private Network (VPN) or Tor. It will keep working until the network censorship becomes so restrictive that you can no longer access your gateway. Governments are working on banning VPN connections for their citizens and have already banned WhatsApp [2, 3]. An important reason why they ban these kinds of protocols is that they are encrypted. The government can see that these protocols are being used, but cannot see what information is being exchanged.

Lately, news outlets have also reported that Western governments are thinking of banning encryption. Considering this, the situation might look grim for censorship circumvention. Without encryption, it would become very hard to reach the gateway to the open internet, as these communication streams rely on encryption to hide your actual query for wikipedia.org. VPN, Tor, Obfs and Meek all rely on this notion of encryption to make the circumvention work.

Introduction to Format-Transforming Encryption

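There is still some hope though: an interesting concept is Format-Transforming Encryption (FTE) [4]. FTE is a cryptographic scheme which makes its ciphertext (the result of encryption) conform to a regular expression. It does this by unranking the ciphertext with the help of a deterministic finite automaton (DFA) created from the regular expression. Unranking means treating the ciphertext as a number n and emitting the n-th string of the language the regular expression accepts; ranking is the inverse operation, which the receiver uses to recover the ciphertext.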
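To make the ranking and unranking idea more concrete, here is a minimal Python sketch. It is not the code of the FTE library or of this project: the tiny hand-built DFA for ^(a|b|c)+$ and the names count_table, unrank and rank are assumptions made purely for illustration. It counts how many accepted strings of a given length are reachable from each DFA state and uses those counts to map a number to a string and back.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hand-built DFA for the regular expression ^(a|b|c)+$ .
# State 0 is the start state; state 1 means "at least one symbol was read".
ALPHABET = "abc"
STATES = (0, 1)
START = 0
ACCEPTING: Set[int] = {1}
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (state, symbol): 1 for state in STATES for symbol in ALPHABET
}


def count_table(length: int) -> List[Dict[int, int]]:
    """table[i][q] = number of accepted strings of length i when starting in q."""
    table = [{q: int(q in ACCEPTING) for q in STATES}]
    for _ in range(length):
        prev = table[-1]
        table.append(
            {q: sum(prev[TRANSITIONS[(q, s)]] for s in ALPHABET) for q in STATES}
        )
    return table


def unrank(number: int, length: int) -> str:
    """Return the number-th accepted string of the given length."""
    table = count_table(length)
    if not 0 <= number < table[length][START]:
        raise ValueError("number does not fit in this slice of the language")
    state, out = START, []
    for remaining in range(length - 1, -1, -1):
        for symbol in ALPHABET:
            nxt = TRANSITIONS[(state, symbol)]
            block = table[remaining][nxt]  # strings that start with this symbol
            if number < block:
                out.append(symbol)
                state = nxt
                break
            number -= block  # skip the whole block and try the next symbol
    return "".join(out)


def rank(text: str) -> int:
    """Inverse of unrank: position of text among accepted strings of its length."""
    table = count_table(len(text))
    state, number = START, 0
    for position, symbol in enumerate(text):
        remaining = len(text) - position - 1
        for smaller in ALPHABET:
            if smaller == symbol:
                break
            number += table[remaining][TRANSITIONS[(state, smaller)]]
        state = TRANSITIONS[(state, symbol)]
    return number
```

In a real setting the number would come from the encrypted bytes, for example via int.from_bytes(ciphertext, "big"), and the DFA would be generated from the regular expression instead of being written by hand.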

Example

For example, you feed FTE some plaintext, a key to encrypt it with and a regular expression:

Plaintext: "Hello open internet!"

Key: 0xff *(16)

Regular Expression: ^(a|b|c|d|e|f)+$

This results in:

61616463626564646461636262656161656162656363646363616564616261666161626365666461636366636161636265616664626163636561656366646565636661646162646364646166666265656365626362626164636662646463626266666161646562626666636363626365636563636263636362626566636365616561636464646663646561626364616562626164656562646663666364636563636161616165616266616162636261636366656664666463656365636461616265616261666566616664666563656561626665646263656664626562636465666162616166666565636562636363646164616465636663666665626366636466

Which, when converted to ASCII looks just like a normal string of text:

aadcbedddacbbeaaeabeccdccaedabafaabcefdaccfcaacbeafdbacceaecfdeecfadabdcddaffbeecebcbbadcfbddcbbffaadebbffcccbcececcbcccbbefcceaeacdddfcdeabcdaebbadeebdfcfcdceccaaaaeabfaabcbaccfefdfdcececdaabeabafefafdfeceeabfedbcefdbebcdefabaaffeecebcccdadadecfcffebcfcdf

(created by using fte_example.py available in /tools/ on Gitlab).

As you can see, the data we send over the wire doesn't make it obvious that we used encryption. An unknowing person inspecting this data would just think someone smashed their head onto the keyboard. It does not look like random bytes at all, as we can perfectly interpret it as ASCII.

With this concept, it becomes very easy to mimic protocols such as HTTP, SSH or basically any protocol you can express as a regular expression! This way, a network agent inspecting every network packet cannot determine whether the data is encrypted or not, making it very hard to block, as the protocol will be misclassified [4]. It is important to note that steganography can achieve the same property, but not with such ease.

Using a Source engine gameserver as a gateway

We've discussed how censorship circumvention works, how important encryption is and how FTE can help us with hiding the encryption. Now it's time to see it in practice with a silly example!

Imagine this: you're living in a dictatorial country and want to access some Wikipedia articles from wikipedia.org. If you surf to the website, your browser will tell you the website isn't available. VPNs are no longer allowed and you cannot use WhatsApp anymore.

But now you can query for Wikipedia articles via your Team Fortress 2 gameserver, while the government thinks you're playing a game!

Say welcome to FTE over Source engine A2S! This is a little example project showing how FTE can be applied to any protocol to circumvent censorship! Before we go into the details, you first need to understand some basics of A2S (the protocol used by the Source engine to query for gameserver information). The following figure illustrates the information flow:

A2S is a protocol over UDP (which is less reliable, but faster than TCP), which is commonly used for games. A2S can be used to query gameservers for the player list, general info about the server and so on [7]. In our proof of concept, we replaced the handling of the A2S_RULES request with our hidden gateway (as seen in the figure above). This hidden gateway answers Wikipedia queries with the payload hidden in the A2S_RULES response, which looks like any ordinary rule list for a gameserver.
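For readers who have never seen A2S on the wire, the sketch below builds and parses an A2S_RULES exchange in Python. It assumes the packet layout as I understand it from Valve's public Server Queries documentation (a four-byte 0xFF header, the byte 'V' plus a challenge for the request, the byte 'E' plus a rule count and NUL-terminated name/value strings for the response). The function names are hypothetical and this is not the project's server.py or client.py.

```python
import struct

SIMPLE_HEADER = b"\xff\xff\xff\xff"  # marks a single, non-split packet
A2S_RULES_REQUEST = b"V"             # 0x56
A2S_RULES_RESPONSE = b"E"            # 0x45


def build_rules_request(challenge: int) -> bytes:
    """Request packet; a challenge of -1 asks the server for a real challenge."""
    return SIMPLE_HEADER + A2S_RULES_REQUEST + struct.pack("<l", challenge)


def build_rules_response(rules) -> bytes:
    """Response packet holding (name, value) pairs as NUL-terminated strings."""
    body = b"".join(
        name.encode() + b"\x00" + value.encode() + b"\x00" for name, value in rules
    )
    return SIMPLE_HEADER + A2S_RULES_RESPONSE + struct.pack("<h", len(rules)) + body


def parse_rules_response(packet: bytes):
    """Return the list of (name, value) pairs carried by a response packet."""
    if packet[:5] != SIMPLE_HEADER + A2S_RULES_RESPONSE:
        raise ValueError("not an A2S_RULES response")
    (count,) = struct.unpack_from("<h", packet, 5)
    fields = packet[7:].split(b"\x00")[:-1]  # drop the empty tail after the last NUL
    if len(fields) != 2 * count:
        raise ValueError("rule count does not match the rule list")
    return [
        (fields[i].decode(), fields[i + 1].decode()) for i in range(0, len(fields), 2)
    ]
```

A covert channel like the one described here would place its FTE output in the name/value pairs, while the header and the rule count stay exactly what an ordinary gameserver would send.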

Example of running fte-over-a2s

This is a running example of the code I wrote. First, server.py needs to be started to listen for A2S requests. After this, I start client.py and pass the argument Self-Censorship, as I want to read that Wikipedia article.

You can see the server responds with the article which was retrieved via the hidden gateway. The gameserver listened to our request to retrieve the Self-Censorship Wikipedia article and has now sent the article back to us.

Looking at the server side, you can see the server listens and checks for the special A2S_RULES request, which is our hidden gateway on the server. If the client requests this, we send back a response which looks like this:

At a quick glance it looks like just a normal A2S_RULES response, because it sort of is! It satisfies the original protocol, which looks like this:

But it also carries our encoded, encrypted FTE response that our client can use. To really check that our encrypted gateway satisfies the A2S protocol, we can also check with Valve Source Query, which is a library meant for querying Source gameservers. Querying for both players and rules yields this result:

As you can see, the player list request is relayed to our real gameserver, and our rulelist contains our special payload!

Choosing the right regular expression

An important aspect of FTE is the unranking/ranking based on the regular expression. The regular expression should match the protocol as closely as possible to prevent detection, but sometimes the protocol is too restrictive to work efficiently. If you apply a regular expression of ^(a|b)+$ , the amount of ciphertext that you can encode would be very limited for a given slice (essentially the block size of the ciphertext). There needs to be a bijective relation between the ciphertext and the alphabet for FTE to work. One of the reasons why I have chosen A2S_RULES instead of A2S_PLAYERS is that the rules_list allows for a much bigger potential alphabet. The player list should contain at most 32 entries, as servers cannot handle more players simultaneously.
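A rough way to see why the alphabet size matters is to count how many whole bits fit into a string of a fixed length. The helpers below are a back-of-the-envelope sketch with hypothetical names (capacity_bits, symbols_needed). They assume every position can freely take any symbol of the alphabet, which holds for simple expressions like ^(a|b)+$ but not for arbitrary regular expressions.

```python
def capacity_bits(alphabet_size: int, length: int) -> int:
    """Whole bits that fit in one string of `length` freely chosen symbols."""
    # alphabet_size ** length strings exist; floor(log2(...)) bits can be mapped.
    return (alphabet_size ** length).bit_length() - 1


def symbols_needed(payload_bytes: int, alphabet_size: int) -> int:
    """Smallest string length that can carry `payload_bytes` bytes."""
    if alphabet_size < 2:
        raise ValueError("an alphabet of one symbol cannot carry information")
    length = 1
    while capacity_bits(alphabet_size, length) < payload_bytes * 8:
        length += 1
    return length


# A 16-byte block needs 128 symbols with ^(a|b)+$ but far fewer with six symbols.
print(symbols_needed(16, 2), symbols_needed(16, 6))
```

The same arithmetic explains the preference for a field with many possible values: the more distinct strings a slice contains, the shorter the cover text for a given payload.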

Below is an example of a regular expression which can be used for A2S_RULES . Note that the alphabet is very limited (there are only 15 key names available), which means that you should keep your payload small or increase the slice size. Note also that this regular expression only generates the RULES_LIST (which is x00 <key> x00 <value> x00 ). Other protocol semantics are handled by the code, such as RULES_COUNT , which counts the number of key-value entries in the RULES_LIST .

^(\x00(sv_accelerate|sv_airaccelerate|sv_alltalk|sv_bounce|sv_cheats|sv_contact|sv_footsteps|sv_friction|sv_gravity|sv_maxspeed|sv_maxusrcmdprocessticks|sv_noclipaccelerate|sv_noclipspeed|sv_password|sv_pausable)\x00[0-9]{1,3})+$
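As a quick sanity check, the expression above can be loaded into Python's re module as a bytes pattern. The sketch below validates a rule list against it, derives the RULES_COUNT value from the number of entries, and counts how many distinct entries a single rule can take. The helper names (rules_count, ENTRY_CHOICES) and the sample bytes are made up for illustration and are not taken from the project's source code.

```python
import re

KEYS = [
    b"sv_accelerate", b"sv_airaccelerate", b"sv_alltalk", b"sv_bounce",
    b"sv_cheats", b"sv_contact", b"sv_footsteps", b"sv_friction",
    b"sv_gravity", b"sv_maxspeed", b"sv_maxusrcmdprocessticks",
    b"sv_noclipaccelerate", b"sv_noclipspeed", b"sv_password", b"sv_pausable",
]

# The same expression as above, assembled from the key list.
RULES_LIST = re.compile(rb"^(\x00(" + b"|".join(KEYS) + rb")\x00[0-9]{1,3})+$")
# One single entry: NUL, key name, NUL, one to three digits.
ENTRY = re.compile(rb"\x00([a-z_]+)\x00([0-9]{1,3})")

# Each entry picks 1 of 15 keys and 1 of 10 + 100 + 1000 digit strings.
ENTRY_CHOICES = len(KEYS) * sum(10 ** digits for digits in (1, 2, 3))


def rules_count(rules_list: bytes) -> int:
    """Validate a rule list and return the number of key-value entries."""
    if not RULES_LIST.match(rules_list):
        raise ValueError("rule list does not match the regular expression")
    return len(ENTRY.findall(rules_list))


sample = b"\x00sv_gravity\x00800\x00sv_cheats\x000"
print(rules_count(sample), ENTRY_CHOICES)
```

With 16650 possible entries per rule, each rule carries about 14 bits, so even a short encrypted payload needs a fairly long rule list.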

I've tried different regular expressions to understand it a bit better. All of those are available in the source code. It's a balance between a bigger alphabet (so a better mapping between the ciphertext and the DFA) and being stealthy enough to hide the fact that you're using a covert channel.

An example of a regular expression where you can notice something strange is going on.

An example of a regular expression which looks pretty legitimate.

Try it yourself!

The source code is available over at my Gitlab. It only uses Python 2.7 with some dependencies, so setting it up shouldn't be too much of a hassle.

Can you imagine running a piece of spy-tech-like software on your computer? Pretty cool if you ask me!

Conclusion

We've discussed how censorship circumvention works, how important encryption is and how FTE can help us with hiding the encryption. We even showed a working example of how to use FTE in combination with a Source engine gameserver to query Wikipedia articles!

Still, there are some challenges ahead for FTE and any other censorship circumvention technique out there, as there is no answer yet to how to exchange keys without being detected by the censoring party. Tor solves this by setting up BridgeDB [8], but this is not very cost-effective on a large scale.

Of course, the example of FTE and Source engine gameserver queries provided here is not very practical, as the communication is over UDP and has a very limited packet size. If you want to use FTE today, I'd recommend checking out Tor or FTE-Proxy [6], which hides its encrypted data in normal everyday HTTP requests.

Sources

Project example fte over a2s: gitlab fte-over-a2s