Tuesday, August 9, 2011

How I hacked an Android game with Python and OCR!

Math Workout is a famous Android game. In fact, it features in the top 5 Google results for many math game + Android related queries. The objective of the game is very simple: it fires simple math questions one after the other, and you have to tap in the correct answer. It's a race against time against other users of the app around the world.

Here's what the app looks like, along with a few screenshots of questions:















As you can see, the game is fairly straightforward, so it's the time that you have to beat. A naive approach would be to keep a calculator or a computer nearby, feed in each question to get the answer, and type it back into the phone. Totally manual!

That's when the programming neurons of my brain started itching: this could be automated and cheated by some means. Come on, think, think! So I sat down to solve this problem over the weekend and started thinking about ways I could attack it.

These are the steps that came to mind at first:

  1. Grab a screenshot of every question
  2. Crop the screenshot so that only the question is visible
  3. Run the cropped image through an OCR engine
  4. Parse the result and evaluate it
  5. Identify the co-ordinates of each character of the result on the screen and simulate the corresponding touch events on the phone

Bummer! Every step looked a bit complex in itself at first sight. Then came a bit of googling, and voila, I found the perfect tool for steps 1, 2 and 5: the monkeyrunner tool that comes with the Android SDK. It exposes a Python API through which I can grab and crop screenshots and simulate touch events at a given (x,y) co-ordinate. Exactly what I wanted.
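Steps 1 and 2 can be sketched roughly as below. This is only an illustration, not the original script: `grab_question` and `QUESTION_RECT` are hypothetical names, and the rectangle values are made up and would have to be measured on a real screen. The `takeSnapshot`, `getSubImage` and `writeToFile` calls are the monkeyrunner ones; the function takes the device as a parameter so that it can also be tried outside monkeyrunner with a stand-in object.

```python
# Hypothetical crop area as (x, y, width, height); the real values
# depend on the phone's screen and must be found by experiment.
QUESTION_RECT = (0, 100, 480, 120)


def grab_question(device, rect, out_path):
    """Save the part of the screen inside rect as a PNG and return its path."""
    snapshot = device.takeSnapshot()       # MonkeyImage of the whole screen
    question = snapshot.getSubImage(rect)  # crop to the question area
    question.writeToFile(out_path, 'png')
    return out_path


# Under monkeyrunner, the device would be obtained like this:
#   from com.android.monkeyrunner import MonkeyRunner
#   device = MonkeyRunner.waitForConnection()
#   grab_question(device, QUESTION_RECT, 'question.png')
```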



Now I have the cropped image of the question in hand. The next step is to run it through an OCR engine. Again, googling told me that ocrad is a useful command-line OCR tool available as part of the GNU project. I installed it and found that it cannot process PNG images, so I had to run the image through a converter before passing it to ocrad. This small piece of shell script helped me accomplish that:
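The same convert-then-recognise step can be sketched in Python as well. This is not the contents of ocr.sh: it assumes ImageMagick's `convert` is installed and is used to turn the PNG into a PNM file, which ocrad can read, and it uses `subprocess` rather than os.popen(). The function names are hypothetical.

```python
import os
import subprocess


def ocr_commands(png_path):
    """Return the two command lines: PNG -> PNM conversion, then ocrad."""
    pnm_path = os.path.splitext(png_path)[0] + '.pnm'
    return [
        ['convert', png_path, pnm_path],  # assumes ImageMagick is installed
        ['ocrad', pnm_path],              # ocrad prints the text to stdout
    ]


def read_question(png_path):
    """Run both commands and return the recognised text, stripped."""
    convert_cmd, ocrad_cmd = ocr_commands(png_path)
    subprocess.call(convert_cmd)
    proc = subprocess.Popen(ocrad_cmd, stdout=subprocess.PIPE)
    output = proc.communicate()[0]
    return output.decode('ascii', 'ignore').strip()
```

Keeping the command lines in their own function makes it easy to swap in a different converter without touching the rest.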


To keep things simple, the shell script is invoked from Python using os.popen(). Now I have the actual expression as a Python string. As you can see from the sample screenshots, a few questions can be solved by a direct "eval", whereas others require some processing. Basic operations like addition, subtraction, multiplication and division can be solved using "eval", whereas questions like "10% of 20" or "square root of 9" need some processing first. That's what the following if-else block does:
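A minimal sketch of such a dispatch is shown below; it is not the original block. It assumes the OCR text looks exactly like "10% of 20", "square root of 9" or "6 x 7" (the real OCR output may differ), and the function name `solve` is hypothetical. The digits-only branch follows the squares hack described in the comments below: ocrad reads 4 squared as "42", so text without any operator is treated as a square.

```python
import math
import re

NUMBER = r'(\d+(?:\.\d+)?)'


def solve(question):
    """Evaluate one OCR'd question string and return a number."""
    text = question.strip().lower()

    # "10% of 20" -> 10 * 20 / 100
    match = re.match(r'^' + NUMBER + r'\s*%\s*of\s*' + NUMBER + r'$', text)
    if match:
        return float(match.group(1)) * float(match.group(2)) / 100

    # "square root of 9" -> sqrt(9)
    match = re.match(r'^square root of\s*' + NUMBER + r'$', text)
    if match:
        return math.sqrt(float(match.group(1)))

    # No operator at all: assume a squared number whose superscript 2
    # was read as a trailing digit ("42" means 4 squared).
    if re.match(r'^\d{2,}$', text):
        return int(text[:-1]) ** 2

    # Plain arithmetic: map "x" to "*" (an assumption about the OCR
    # output) and only eval strings made of digits and operators.
    expression = text.replace('x', '*')
    if re.match(r'^[\d\s.+\-*/()]+$', expression):
        return eval(expression, {'__builtins__': {}}, {})

    raise ValueError('unrecognised question: %r' % question)
```

The character whitelist in front of `eval` is there so that a misread screenshot raises an error instead of executing arbitrary text.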



Now that the expression is evaluated and we have the result in hand, all that's left is to go through the result character by character and simulate touch events at the corresponding positions on the screen. I identified the co-ordinates of each number on the screen by trial and error and hard-coded those values in two functions named getx() and gety(), which take a character and return its x and y co-ordinates respectively. The touch simulation then uses those values. Here is the code snippet:
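The idea can be sketched as follows. The keypad layout and every pixel value here are invented for illustration (the real ones were found by trial and error on one particular phone), and `type_answer` is a hypothetical helper; only the names getx() and gety() come from the script.

```python
# Hypothetical keypad: four rows of three keys, with the centre of the
# top-left key at (KEY_LEFT, KEY_TOP). All numbers are made up.
KEY_ROWS = ['123', '456', '789', '.0-']
KEY_LEFT, KEY_TOP = 80, 400
KEY_WIDTH, KEY_HEIGHT = 160, 100


def _row_col(char):
    """Return the (row, column) of the key showing char."""
    for row, keys in enumerate(KEY_ROWS):
        col = keys.find(char)
        if col >= 0:
            return row, col
    raise ValueError('no key for %r' % char)


def getx(char):
    return KEY_LEFT + _row_col(char)[1] * KEY_WIDTH


def gety(char):
    return KEY_TOP + _row_col(char)[0] * KEY_HEIGHT


def type_answer(device, answer, touch_type):
    """Tap the answer in, one character at a time."""
    for char in str(answer):
        device.touch(getx(char), gety(char), touch_type)


# Under monkeyrunner:
#   from com.android.monkeyrunner import MonkeyDevice
#   type_answer(device, '42', MonkeyDevice.DOWN_AND_UP)
```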



To orchestrate the whole process and play the game fully automatically, a few other additions are needed, like coping with the frame rate of the phone and taking care of screenshot/OCR lags. These are handled by minor if conditions and sleeps for very small amounts of time.

The end result is as you see in the screenshot below :-P
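Such a loop can be sketched roughly as below. It assumes two callables: one that returns the OCR'd text of the current screen and one that solves and types the answer. The names `play`, `max_polls` and `sleep` are hypothetical; `max_polls` only exists so that the sketch can stop, since the script itself simply runs until Ctrl+C. Skipping a question that is identical to the previous one is the check described in the comments below.

```python
import time


def play(read_question, answer_question, delay=0.2, max_polls=None,
         sleep=time.sleep):
    """Poll the screen and answer each new question once.

    Returns the number of questions answered.
    """
    last = None
    answered = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        question = read_question()
        # An empty read or the same text as last time means the screen
        # has not moved on yet, so wait a little and poll again.
        if question and question != last:
            answer_question(question)
            last = question
            answered += 1
        sleep(delay)
    return answered
```

One consequence of this design is that a question which genuinely appears twice in a row is never answered the second time.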



Here is a video of what the game looks like when it is being played by my script:


Though these steps seem a bit expensive computationally, in practice I found them to be really fast. The script was able to answer approximately 2 questions per second (with an explicit sleep of 0.2 seconds between two questions, which works out to 2 questions every 0.8 seconds). A C/C++ program might run faster than this, but I stopped here as I had accomplished what I wanted. Overall it was a fun-filled Sunday! :-)

Here is a link to the full source code of the automated script: auto_math_workout.py (you can find ocr.sh in the gist above on this page; the rest of the source code is in the link)

Any comments or feedback are welcome! :-)

-Vignesh

9 comments:

  1. Now, thanks to you, the developer will have to resort to forcing users to enter a captcha after each answer..

  2. Actually no. The problem with this app was that it has a white background with black text in a standard font. So this turned out to be the best case for the OCR resulting in 100% accuracy. If this is changed then the OCR will struggle and my script will start losing (as every wrong answer is penalised by the app)

  3. What can I say? You code monkeys!!! :)

  4. He he. Intelligent monkeys ;-)

  5. Great job. Very impressive. Almost tempts me to buy an android phone (currently owning a dumb but real phone)

    Few doubts.

    Why have you used ocrad ? I have heard of some other OCR projects that directly work on PNG formats (like pytesser (from google), gocr, ImageMagick (iirc it has ocr support) etc.) But I _guess_ a big benefit of these modern libraries will be speed due to zero disk i/o. you can stream the in-memory image file to these libraries directly and get the string output, avoiding the intermediate disk storage.

    Where is the code to delete the temporary file for storing the screenshot image ?

    In the block 41-48, the 2nd is for % calculation and the third for eval-ing the expression. I understood that, but what is the first part ?

    Do we really need the block at line 54 ? For instance, if a question is "1.6 + 1.4" we are not going to answer it as "3.0" but just as 3, right ?

    What is the block at line 56 ?

    "getx and gety" - I believe these functions are hard-coded based on the formfactor of your device, Is it so ? Or will these work in a different screen size also (screen-agnostic like gotoxy () )?

    What is the exit condition of this program ? I am not able to follow how it terminates. Can you explain that a little ?


    To repeat again, amazing work; Congratulations :-)

  6. > Great job. Very impressive. Almost tempts me to
    > buy an android phone (currently owning a dumb
    > but real phone)

    Thanks a lot. Get one soon. Being a linux kernel guy, an android phone is something you should definitely own! ;-)

    > Few doubts.
    >
    > Why have you used ocrad ? I have heard of some
    > other OCR projects that directly work on PNG
    > formats (like pytesser (from google), gocr,
    > ImageMagick (iirc it has ocr support) etc.) But
    > I _guess_ a big benefit of these modern
    > libraries will be speed due to zero disk i/o.
    > you can stream the in-memory image file to
    > these libraries directly and get the string
    > output, avoiding the intermediate disk storage.

    ocrad was simply very fast and very accurate. I have got a 100% success rate so far. Yeah, it doesn't involve any disk i/o. I tried using pytesser, but it wasn't as fast as spawning ocrad. ImageMagick doesn't do OCR as far as I explored.


    And this is just a weekend script that is functionally working. Many of your questions below are from the perspective of generalizing it. I haven't focussed much on generalization. Anyway, I've tried to address all your questions below, but mostly the answer will be: yeah, that's a hack and not a general form ;-)

    >
    > Where is the code to delete the temporary file
    > for storing the screenshot image ?

    The images are just replaced on the next run; I am not deleting them.

    >
    > In the block 41-48, the 2nd is for %
    > calculation and the third for eval-ing the
    > expression. I understood that, but what is the
    > first part ?

    The first part is again a hack. Ocrad doesn't distinguish between normal characters and superscripts. So if the question is 4^2 (4 squared), ocrad recognizes it as 42. So I treat text that has no operators in it as a square: I simply discard the last digit (which is the 2) and square the rest.

    >
    > Do we really need the block at line 54 ? For
    > instance, if a question is "1.6 + 1.4" we are
    > not going to answer it as "3.0" but just as 3,
    > right ?

    Nope. The game expects us to enter 3.0, hence that logic.

    >
    > What is the block at line 56 ?

    That is again a hack. When I use math.sqrt to find a square root, it returns the result with a ".0" suffix (like 4.0), but the app expects us to enter just 4. So I'm clipping off the ".0" with the regex.

    >
    > "getx and gety" - I believe these functions are
    > hard-coded based on the formfactor of your
    > device, Is it so ? Or will these work in a
    > different screen size also (screen-agnostic
    > like gotoxy () )?

    Yes. They are hard-coded to my screen. Although, it should be fairly straightforward to generalize it based on screen size.

    >
    > What is the exit condition of this program ? I
    > am not able to follow how it terminates. Can
    > you explain that a little ?

    For now the exit condition is Ctrl+C ;-) As you may have noticed from the check on line 50, in order to cope with the frame rate and the screenshot capturing rate, if the same question is repeated, I simply ignore it. So it is kind of like polling for the next question, and we cannot deterministically determine the exit condition based on the number of questions encountered so far. One way to exit the program: the app shows a screen that has the text "complete", so we can recognize this text and bail out of the loop.

    > To repeat again, amazing work; Congratulations :-)

    Thanks a lot again! It really means a lot when it comes from someone like you! :-) :-)

  7. Excellent idea and well executed !!

  8. Good work .. will try it on my android . ;)

  9. Hi,
    Can someone put step by step instructions for performing the above act. i'm new 2 sdk. thanks in advance
