Let’s squash some bugs

Here are a few gotchas that you’d better think about sooner rather than later, instead of believing everything’s working and then spending hours debugging like I did. Error messages from Tesseract tend to be very cryptic, and it is hard to even recognize them as coming from Tesseract.

DataCloneError: The object could not be cloned

After all the work of downloading TesseractJS and the traineddata language files and setting up a new Ionic project, this error message could mean anything. On the contrary, it actually indicates that everything’s working, or almost, at least. Tesseract is alive and breathing, but you passed it some image data it didn’t understand. Whoops, talk about clear error messages.

If you happen to get your image using @ViewChild('myImg') img; make sure to provide Tesseract with this.img.nativeElement rather than this.img itself, which is only an ElementRef wrapper around the actual element.
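To guard against passing the wrapper by accident, a small helper can unwrap it first. This is only a sketch: ElementRefLike and unwrapElement are made-up names, and the interface is a stand-in for Angular’s ElementRef so the snippet runs on its own.

```typescript
// Minimal stand-in for Angular's ElementRef, so the sketch has no Angular dependency.
interface ElementRefLike<T> {
  nativeElement: T;
}

// Hypothetical helper: returns the wrapped element when given an
// ElementRef-style object, and the value itself otherwise.
function unwrapElement<T extends object>(source: T | ElementRefLike<T>): T {
  const candidate = (source as { nativeElement?: T }).nativeElement;
  return candidate !== undefined ? candidate : (source as T);
}
```

With this in place, a call such as this.tesseract.recognize(unwrapElement(this.img)) hands Tesseract the element whether img is the ElementRef or the element itself.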

Tesseract.recognize does not return a Promise

It looks so familiar with its catch and then callback methods, but along with progress it is not quite a regular Promise. I had some mixed results trying to chain it into a regular Promise, so I ended up wrapping it like this:

return new Promise<string>((resolve, reject) => {
  this.tesseract.recognize(image)
    .progress((v) => console.log(v))
    .catch((err) => reject(err))
    .then((result) => resolve(result.text));
});

Cannot enlarge memory arrays

Again, it is really hard to find any specific references to this error. My conclusion was that the Tesseract core is compiled with memory arrays of a fixed size. I tried to npm run build manually with various suggested arguments that were supposed to configure the memory array size, but nothing helped.

In the end I just used a combination of scaling and cropping on the image taken by the camera. Obviously we don’t want to compromise the image quality too much, in the interest of keeping Tesseract’s predictions accurate, but in my case I knew the format of the medium that the user would be photographing, so I built a frame around it. Once the user had aligned the text within the frame, I could programmatically crop the image into three pieces and run each piece through Tesseract one at a time.
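The cropping step boils down to computing a few rectangles. Here is a rough sketch of that arithmetic, assuming the three pieces are horizontal strips stacked top to bottom (the split direction is my assumption for illustration). CropRect and splitIntoStrips are made-up names.

```typescript
interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Splits an image of the given pixel size into `parts` horizontal strips,
// stacked top to bottom, that together cover the full height with no gaps.
function splitIntoStrips(width: number, height: number, parts: number): CropRect[] {
  if (parts < 1 || Math.floor(parts) !== parts) {
    throw new RangeError('parts must be a positive integer');
  }
  const rects: CropRect[] = [];
  for (let i = 0; i < parts; i++) {
    // Integer boundaries, so rounding never drops or duplicates a pixel row.
    const top = Math.floor((i * height) / parts);
    const bottom = Math.floor(((i + 1) * height) / parts);
    rects.push({ x: 0, y: top, width, height: bottom - top });
  }
  return rects;
}
```

Each rectangle can then be drawn onto its own canvas with the nine-argument form of drawImage (source rectangle first, destination rectangle second), and each canvas is passed to Tesseract in turn. Note that a strip boundary can cut through a line of text unless the frame guarantees where the lines fall.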

If anyone figures out how to enlarge the memory arrays, please let me know!

SyntaxError: Failed to load worker script at “assets/lib/tesseract.js-worker_1.0.10.js”

So this “SyntaxError” is one of those cryptic messages that will really have you pulling your hair out. When you copy/paste the path from the error into the browser, the file loads just fine. What Tesseract.js is trying to tell you is that it prefers a full path to the local files.

The following snippet builds a host string that you can use as a prefix for each of the three paths in Tesseract.create.

const host = window.location.protocol + '//'
  + window.location.hostname
  + (window.location.port ? ':' + window.location.port : '')
  + '/';

Whoops, this doesn’t work on Android.

I’m personally struggling to find a good way to determine the full path to our beloved assets directory across platforms! The above works fine with ionic serve, but since Android’s “host” looks more like file:///android_asset/, where the hostname is empty, the above host will just be file:///.

Here’s an ugly host that works for Android, iOS and ionic serve :

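const href = window.location.href;
// Drop the hash fragment, if there is one. (Passing null as the length to
// substr() would yield an empty string, so fall back to the full href instead.)
const hashIndex = href.indexOf('#');
const base = hashIndex > -1 ? href.substring(0, hashIndex) : href;
// Keep everything up to and including the last slash.
const host = base.substring(0, base.lastIndexOf('/') + 1);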
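The same idea can be written as a pure function of the URL string, which makes it easy to check against the different platforms without a device. This is a sketch; assetBaseFromHref is a made-up name and the sample URLs are only illustrative.

```typescript
// Returns everything up to and including the last slash that appears
// before any hash fragment, i.e. the directory the page was loaded from.
function assetBaseFromHref(href: string): string {
  const hashIndex = href.indexOf('#');
  const withoutHash = hashIndex > -1 ? href.substring(0, hashIndex) : href;
  return withoutHash.substring(0, withoutHash.lastIndexOf('/') + 1);
}

// ionic serve style URL (illustrative)
console.log(assetBaseFromHref('http://localhost:8100/#/home'));
// → http://localhost:8100/

// Android style URL (illustrative)
console.log(assetBaseFromHref('file:///android_asset/www/index.html#/home'));
// → file:///android_asset/www/
```

In the app you would call it with window.location.href and prefix the result to each path, for example assetBaseFromHref(window.location.href) + 'assets/lib/tesseract.js-worker_1.0.10.js' for the worker script.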

Traineddata language files disappear on Android

Another issue with very little documentation around it, not just with regard to Tesseract, but for Android in general! Seemingly Android doesn’t like .gz files, and for some people they prevented the project from building altogether.

In Ionic/Cordova this problem seems to be fixed for us by stripping the .gz extension from such files. So the project builds successfully, and only when we launch the app do we realize that the traineddata language file was renamed and can no longer be loaded by Tesseract.

I looked everywhere on how to include .gz files in Android apps, eventually giving up and discovering some good news! The traineddata.gz files are not loaded by the monstrous Tesseract core, but rather by the non-minified worker.js .

It’s sitting right there in plain sight on line 8421:

8420: var lang = req.options.lang;
8421: var langfile = lang + '.traineddata.gz'; // Hello, friend!
8422: var url = req.workerOptions.langPath + langfile;

The solution is obvious! I renamed my traineddata language files, stripping away the .gz extension myself so that Android couldn’t do it for me. Then I stripped away .gz on line 8421 of worker.js as well, and voilà!

tesseract.js-eng.traineddata.gz becomes tesseract.js-eng.traineddata

(Just remember this is still a gzipped file).
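To make the effect of the edit concrete, here is the URL-building logic from those worker lines rewritten as a tiny standalone function. This is a sketch, not the actual worker code: langFileUrl is a made-up name, and its two parameters stand in for req.workerOptions.langPath and req.options.lang.

```typescript
// Mirrors the edited worker lines: the language file name no longer
// carries the .gz suffix, although the file content is still gzipped.
function langFileUrl(langPath: string, lang: string): string {
  const langfile = lang + '.traineddata'; // was: lang + '.traineddata.gz'
  return langPath + langfile;
}
```

For example, with a hypothetical langPath of 'assets/lib/tesseract.js-' and the language 'eng', the function returns 'assets/lib/tesseract.js-eng.traineddata', which matches the renamed file. Whatever langPath you use, it is worth checking that the resulting URL matches the file name on disk exactly.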

If anyone knows what’s up with .gz-files on Android, please let me know!