Cropping the detected text image

Vision can detect many pieces of text on the screen, but for simplicity let’s deal with the biggest one, since that’s the region that usually contains the expression we care about. To get the frame for the cropped image, we multiply normalisedRect’s coordinates by the captured image size.
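Here’s a minimal sketch of that cropping step. The helper name and the assumption that normalisedRect is a VNTextObservation’s boundingBox (normalized, origin at the bottom-left) are mine, so adapt it to your own project:

```swift
import UIKit
import Vision

// Sketch: crop the captured image to a detected text region.
// Assumes `normalisedRect` is a normalized bounding box (e.g. from
// VNTextObservation.boundingBox) whose origin is at the bottom-left.
func cropImage(_ image: UIImage, to normalisedRect: CGRect) -> UIImage? {
    guard let cgImage = image.cgImage else { return nil }
    let width = CGFloat(cgImage.width)
    let height = CGFloat(cgImage.height)

    // Multiply the normalized coordinates by the image size,
    // flipping the y-axis because Vision's origin is at the bottom-left.
    let frame = CGRect(
        x: normalisedRect.origin.x * width,
        y: (1 - normalisedRect.origin.y - normalisedRect.height) * height,
        width: normalisedRect.width * width,
        height: normalisedRect.height * height
    )

    guard let croppedCGImage = cgImage.cropping(to: frame) else { return nil }
    return UIImage(cgImage: croppedCGImage)
}
```

The returned image is the croppedImage used in the rest of the article.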

The croppedImage should contain the text. You can check with Quick Look in Xcode: select the croppedImage variable in the bottom variables panel, then click the eye icon to open Quick Look for the image.

Text recognition with an OCR framework

I personally like solutions that work well with Swift, so I tried SwiftOCR first. It’s written purely in Swift, and the API is easy to get started with. The benchmark in the README even states that it performs better than Tesseract.

For some reason, SwiftOCR didn’t work well. It might be because I used the font “Lato” in my initial Sketch design. SwiftOCR allows custom training for new fonts, but because I was lazy, I tried Tesseract instead.


Tesseract “is an optical character recognition engine for various operating systems. It is free software, released under the Apache License, Version 2.0, and development has been sponsored by Google since 2006”.

The iOS port is open sourced on GitHub and has CocoaPods support. Simply add pod 'TesseractOCRiOS' to your Podfile, run pod install, and you’re good to go.

As explained in the README and in the TestsProject, tessdata is needed; it contains the language data that makes Tesseract work. Without tessdata, the TesseractOCR framework will give you a warning about a missing TESSDATA_PREFIX.

Strict requirement on language files existing in a referenced “tessdata” folder.

Download the tessdata here, and add it as a reference to your Xcode project. The blue folder icon indicates that the folder has been added as a reference rather than as a group.

You may also need to add libstdc++.dylib and CoreImage.framework to your target. Additionally, this library isn’t compatible with Bitcode in Xcode, so you’ll need to disable Bitcode in your target settings.

Tesseract

Using Tesseract is easy. Remember to import TesseractOCR, not TesseractOCRiOS:
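Here’s a minimal sketch of the recognition step, wired up with the settings described just below; the function name and the "eng" language choice are my assumptions:

```swift
import UIKit
import TesseractOCR

// Sketch: run Tesseract on the cropped image.
// Assumes the English tessdata folder has been added as a folder reference.
func recognizeText(in croppedImage: UIImage) -> String? {
    guard let tesseract = G8Tesseract(language: "eng") else { return nil }
    tesseract.engineMode = .tesseractCubeCombined        // most accurate, but slowest
    tesseract.pageSegmentationMode = .singleBlock        // the expression is a single block of text
    tesseract.image = croppedImage.g8_blackAndWhite()    // boost contrast before recognition
    tesseract.recognize()
    return tesseract.recognizedText
}
```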

g8_blackAndWhite is a convenient filter that increases the contrast of the image for easier detection. For pageSegmentationMode, I use singleBlock, since our expression should appear in a uniform block of text (you can also try singleLine mode). Lastly, we set engineMode to tesseractCubeCombined, which is the most accurate but can take some time; you can set it to tesseractOnly or cubeOnly to trade accuracy for speed. In my tests it recognizes handwritten text well, as well as text I put on the screen in popular fonts like Arial, Lato, and Helvetica.

If you need to support more languages and fonts, head over to the Training Tesseract wiki page to learn more. I also hope that Apple provides a proper OCR model to use together with Vision and Core ML, since text recognition is a popular task for mobile apps.

With the above captured image, Tesseract should be able to recognize the string “(1+2)*3”. Let’s validate that it’s a proper math expression and try to solve it.

Validating the expression

Sometimes Tesseract includes a newline or other malformed characters in the result, so you should validate the expression properly. For now let’s perform a simple validation: in the demo, we support simple calculations with the digits 0…9 and the math operators +, -, *, /. You can build on this code to handle more complex expressions involving powers, logarithms, or sigma notation.
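Here is one way that simple validation might look; this is my own sketch rather than the demo’s exact code:

```swift
import Foundation

// Sketch: strip the whitespace/newlines Tesseract sometimes adds and
// accept only the digits 0...9, the four operators, and parentheses.
func validate(_ recognizedText: String) -> String? {
    let cleaned = recognizedText
        .components(separatedBy: .whitespacesAndNewlines)
        .joined()
    let allowed = CharacterSet(charactersIn: "0123456789+-*/()")
    guard !cleaned.isEmpty,
          cleaned.unicodeScalars.allSatisfy({ allowed.contains($0) }) else {
        return nil
    }
    return cleaned
}
```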

Solving the expression

Now to the fun part — the math 😀. To solve this expression, we’ll need to use Reverse Polish notation. According to Wikipedia,

Reverse Polish notation (RPN), also known as Polish postfix notation or simply postfix notation, is a mathematical notation in which operators follow their operands, in contrast to Polish notation (PN), in which operators precede their operands.

Basically, an infix expression like (1+2)*3 needs to be transformed to a postfix expression, which would be 12+3*. Below is the pseudo-code:

for each token in the postfix expression:
    if token is an operator:
        operand_2 ← pop from the stack
        operand_1 ← pop from the stack
        result ← evaluate token with operand_1 and operand_2
        push result back onto the stack
    else if token is an operand:
        push token onto the stack
result ← pop from the stack

The algorithm is pretty straightforward. Starting from the beginning of the string 12+3*, we first push 1 and 2 onto the stack. Then we encounter the operator +, pop 1 and 2 from the stack, and evaluate 1+2. The result 3 is pushed back onto the stack. We then push the next operand 3 onto the stack. Finally, when we reach the operator *, we pop the two operands (both 3) from the stack, evaluate 3*3, and get 9 as the result.
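Translated into Swift, the evaluation step might look like this (a sketch that assumes single-digit operands, as in the example above):

```swift
// Sketch: evaluate a postfix expression with single-digit operands,
// e.g. "12+3*" evaluates to 9.
func evaluatePostfix(_ postfix: String) -> Double? {
    var stack: [Double] = []
    for token in postfix {
        if let digit = token.wholeNumberValue {
            // Operand: push it onto the stack.
            stack.append(Double(digit))
        } else {
            // Operator: pop two operands, evaluate, push the result back.
            guard let operand2 = stack.popLast(),
                  let operand1 = stack.popLast() else { return nil }
            switch token {
            case "+": stack.append(operand1 + operand2)
            case "-": stack.append(operand1 - operand2)
            case "*": stack.append(operand1 * operand2)
            case "/": stack.append(operand1 / operand2)
            default: return nil
            }
        }
    }
    return stack.count == 1 ? stack[0] : nil
}
```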

But how do we convert an infix expression to a postfix expression? There’s an algorithm called Shunting-yard for that.

In computer science, the shunting-yard algorithm is a method for parsing mathematical expressions specified in infix notation. It can produce either a postfix notation string, also known as Reverse Polish notation (RPN), or an abstract syntax tree (AST).

For now, let’s use a snippet from NSString-Reverse-Polish-Notation. It’s in Objective-C, but we only need to add a MathSolver-Bridging-Header.h, since Swift and Objective-C interoperate.
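If you’d rather stay in Swift, a minimal shunting-yard sketch for the digits and four operators used in the demo could look like this (again my own sketch, not the linked snippet):

```swift
// Sketch: convert an infix expression such as "(1+2)*3" into the
// postfix form "12+3*", supporting single digits, + - * / and parentheses.
func infixToPostfix(_ infix: String) -> String? {
    let precedence: [Character: Int] = ["+": 1, "-": 1, "*": 2, "/": 2]
    var output = ""
    var operators: [Character] = []

    for token in infix {
        if token.isNumber {
            output.append(token)                        // operands go straight to the output
        } else if let current = precedence[token] {
            // Pop operators with higher or equal precedence before pushing.
            while let top = operators.last,
                  let topPrecedence = precedence[top],
                  topPrecedence >= current {
                output.append(operators.removeLast())
            }
            operators.append(token)
        } else if token == "(" {
            operators.append(token)
        } else if token == ")" {
            // Pop until the matching opening parenthesis.
            while let top = operators.last, top != "(" {
                output.append(operators.removeLast())
            }
            guard operators.popLast() == "(" else { return nil }
        } else {
            return nil                                  // unexpected character
        }
    }
    while let top = operators.popLast() {
        guard top != "(" else { return nil }            // mismatched parenthesis
        output.append(top)
    }
    return output
}
```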

When we run the app, it recognizes all possible texts on the screen, but we only need to focus on our expression, which is the biggest text. Point the camera at the expression, and the app should be able to detect, recognize, and solve it. In case you’re still in doubt, the result of the expression (1+2)*3/5 is 1.8.

We can of course add support for more operators such as power, sine, cosine, sigma, or even predefined functions.

While this particular app simply demonstrates the workflow, the same approach could be adapted to more practical uses, like checking meeting room availability, tracing phone numbers, or scanning orders at the post office.

Where to go from here

I hope this tutorial has provided some valuable insights about Vision and Tesseract. Here are a few more links to help you get started with your text detection journey on iOS:

Discuss this post on Hacker News and Reddit.