Well, there is a mobile game named 747 with really simple rules. A player is paired up with another player on the game network, and both players are shown a simple math question to solve. Along with the question, two options are shown, one of which is the correct answer.

The player who submits the correct answer before the other player wins the game. Sounds good? There’s more. Real money is involved in this game.

Players have to place a bet to play the math game. Players who place the same bet amount are paired, and the game is conducted. The winner gets 80% of the total bet amount and the game company keeps 20% as commission. 80% is huge. Within a month of release, 747 got 100k+ downloads.
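To put numbers on that split, here is a quick sketch of the arithmetic. It assumes the "total bet amount" is simply both players' bets added together; the function names are made up for illustration.

```python
def payout(bet_per_player, winner_share=0.80):
    """Split the pot between the winner and the game company."""
    pot = 2 * bet_per_player                    # assumption: pot = both players' bets
    winner_gets = pot * winner_share            # 80% of the pot goes to the winner
    commission = pot - winner_gets              # the remaining 20% is kept by the company
    net_profit = winner_gets - bet_per_player   # what the winner gains over their own bet
    return winner_gets, commission, net_profit


def break_even_win_rate(winner_share=0.80):
    """Win rate p at which expected winnings equal the bet: p * (2 * bet * share) = bet."""
    return 1 / (2 * winner_share)


winner_gets, commission, net_profit = payout(10)
print(winner_gets, commission, net_profit)
print(break_even_win_rate())
```

Under that assumption, a 10-rupee bet pays 16 rupees to the winner, and a player needs to win 62.5% of games just to break even.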

So we can solve easy math fast and win lots of money 🤑

Sounds good so far?

After my convocation I was having the time of my life at home, with all the good food and everything. Having a knack for automating stuff and making machines work for me, I thought of developing a bot to play this game 😈.

To automate playing this game, I had to understand how exactly it works. By that I mean: how it communicates with the server, how the time taken is calculated, what format the question is received in, etc.

I decided to look into the network communication done by the game. For that I downloaded the Packet Capture app. It can sniff all the incoming and outgoing packets of any installed app. I found some endpoints being hit by the game.

I was curious to find out how exactly the client (the installed app) is written. I extracted the .apk file from the installed app and decompiled it using an Android APK decompiler.

Oh boy! I got the entire code-base of the application. I went all the way through multiple Java files. I suck at writing Java, but I could make out almost all of the code-base, maybe because I knew how the app works. Nothing was obfuscated and I could understand the implementation of the game perfectly.

I understood that the app company uses two different servers.

HTTP server — for all the information about played games and players

WSS server — for creating game rooms and conducting games

In the code-base I found all the API endpoints and their methods. After understanding the request parameters and the flow of requests, I wrote an equivalent client in Python. It could log in and play games 👻

I found that the question received is not a simple string but an image, so I could not calculate the answer to the math question directly. I set that aside for the next iteration of development.

I also found out that 747 relies heavily on the client side for calculating the time a player takes to submit an answer. Since that value came from the client, I could set whatever time I wanted as the time taken 👻
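Here is a toy sketch of why trusting a client-reported time is a problem. It is not the game's actual server code; the `Submission` class, its fields, and `pick_winner` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    player: str
    answer: int
    time_taken: float  # seconds, as reported by the client (hypothetical field)


def pick_winner(submissions, correct_answer):
    """Naive logic: the fastest correct answer wins, judged by client-reported time."""
    correct = [s for s in submissions if s.answer == correct_answer]
    if not correct:
        return None
    return min(correct, key=lambda s: s.time_taken).player


# An honest player who really took 1.8 seconds, against a modified client
# that simply claims it took 0 seconds.
honest = Submission("honest_player", answer=12, time_taken=1.8)
modified = Submission("modified_client", answer=12, time_taken=0.0)

print(pick_winner([honest, modified], correct_answer=12))  # prints "modified_client"
```

A server that timestamps when it sends the question and when it receives the answer would not have this weakness, because the client would never get to state its own time.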

Now I had the power to submit an answer with a time taken as low as 0 seconds 😬, but I couldn’t answer correctly. I still needed the power to calculate the correct answer. I knew that the question is an image, and that once we submit an answer, the result response contains the math expression from the question as a string. I thought of solving the math as a combination of two tasks.

1. Extract the math expression from the image
2. Solve the math expression

The second part is trivial; it’s just eval(str_math_expression).
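As a small illustration of that second part, here is a sketch that avoids calling eval on arbitrary text by summing the signed terms instead. It assumes the expression contains only non-negative integers joined by + and -, optionally followed by "=?"; the exact string format the server uses is an assumption, and `solve` is a made-up name.

```python
import re


def solve(expression):
    """Evaluate a string such as '12+7-5' or '12+7-5=?' that uses only + and -."""
    expr = expression.replace(" ", "")
    if expr.endswith("=?"):            # assumed suffix; drop it if present
        expr = expr[:-2]
    # Accept only digits joined by + or -, so nothing unexpected gets evaluated.
    if not re.fullmatch(r"\d+(?:[+-]\d+)*", expr):
        raise ValueError(f"unsupported expression: {expression!r}")
    # Split into signed terms ('12', '+7', '-5') and add them up.
    return sum(int(term) for term in re.findall(r"[+-]?\d+", expr))


print(solve("12+7-5"))   # prints 14
print(solve("3+4=?"))    # prints 7
```

For input that has already been validated like this, plain eval gives the same result; the summing version just never executes the string as code.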

The first part is a classic optical character recognition (OCR) task. Initially I used pytesseract, but as usual it failed spectacularly 😂. So I thought of developing my own model and started researching.

With motivation from the following, I started work (no need to go through these, but I would recommend the paper if you’re a fan of ML and stuff).

Question images were of a fixed size and used multiple random fonts. I thought of breaking the image into parts and recognizing each part as a single character. I generated synthetic data using all 2550 Google fonts and trained a model on it, then retrained the model on actual images. The following steps explain it better.

Solving Maths of the game using CNN

1. The server sends the image with no background and a white font color; convert it to a black font on a white background.
2. The image size is 828x142, and there are only two possibilities: either X=(a+b+c=?) or Y=(a+b=?).
3. For X=(a+b+c=?), crop the image into 7 images of size 92x142, one for each character; similarly, crop it into 5 images for Y=(a+b=?).
4. The server sends the question string in the response when the result is sent, so the images cropped in the previous step can be labeled using that string. This gives a labeled data-set. To generate the actual data-set, we can play multiple games with a bet amount of 0 rupees.
5. Train a CNN-based recognition model to recognize the character in an image.
6. For training, first download all the Google fonts and generate synthetic images of all 12 characters, [0–9] and [+-]. Train the model on the synthetic data, then train it further on the actual data-set generated in step 4.
7. To get the expression, perform steps 1–3 and, for each cropped image, predict the class using the model trained in step 6.
8. Create a string from all the predicted classes.
9. Use the eval function to get the answer. There are two candidate answers, as the input image can be either X=(a+b+c=?) or Y=(a+b=?).
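The cropping, labeling and decoding steps above can be sketched in plain Python. This is only an illustration under loud assumptions: the character cells are taken to be contiguous and horizontally centered in the 828x142 image (the steps don't say where the cells sit), the question string is taken to have one character per cell, and `predict_cell` is a hypothetical stand-in for the trained CNN.

```python
import re

IMAGE_WIDTH, IMAGE_HEIGHT = 828, 142   # image size from the steps above
CELL_WIDTH = 92                        # width of one character crop
LAYOUTS = {"X": 7, "Y": 5}             # character count of a+b+c=? and a+b=?


def crop_boxes(n_chars, x_offset=None):
    """Return one (left, top, right, bottom) box per character cell.

    Assumption: cells are contiguous and centered unless x_offset is given.
    """
    if x_offset is None:
        x_offset = (IMAGE_WIDTH - n_chars * CELL_WIDTH) // 2
    return [
        (x_offset + i * CELL_WIDTH, 0, x_offset + (i + 1) * CELL_WIDTH, IMAGE_HEIGHT)
        for i in range(n_chars)
    ]


def label_crops(boxes, question):
    """Step 4: pair every crop box with its character from the question string."""
    if len(boxes) != len(question):
        raise ValueError("question string and crops differ in length")
    return list(zip(boxes, question))


def decode(predicted_chars):
    """Steps 8-9: join the predicted characters and evaluate the expression."""
    expr = "".join(predicted_chars)
    # Only single digits joined by + or - are accepted, so eval is safe here.
    if not re.fullmatch(r"\d(?:[+-]\d)+", expr):
        raise ValueError(f"not a valid expression: {expr!r}")
    return eval(expr)


def candidate_answers(predict_cell):
    """Decode the image under both layouts; a layout that fails to parse gives None."""
    answers = {}
    for name, n_chars in LAYOUTS.items():
        operand_boxes = crop_boxes(n_chars)[:-2]   # skip the '=' and '?' cells
        try:
            answers[name] = decode([predict_cell(box) for box in operand_boxes])
        except ValueError:
            answers[name] = None
    return answers


print(crop_boxes(5)[0])
print(decode(list("3+4-2")))
```

Since the game shows two options, one way to use the two candidates would be to pick whichever one matches an option; that selection rule is a guess, not something stated in the steps.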

I reported this to game developers.

In doing this I was helped in one way or another by some of my friends: Shoeb, Avhirup, Saurabh and Divyani. Huge thanks to them :)

If you have any questions or ideas to discuss, please feel free to ping me by any means. If you liked this, kindly share it with others and read my other posts.

— moghya

“A wise man can learn more from a foolish question than a fool can learn from a wise answer”.

— Raymond “Red” Reddington

Thanks for reading.

Happy Hacking.