Development Notes

This app is written in Flutter using these technologies:

- The Flutter UI toolkit for crafting the user interface.
- Google ML Kit Vision text recognition (optical character recognition), provided for Flutter by the firebase_ml_vision package.
- The image_picker package to access your device's photo gallery.
- The url_launcher package to enable web searching from the app.
- The flutter_tags package to allow editing the words found in your photos before searching for them.
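Since the app's search feature is built on url_launcher, here is a minimal sketch of launching a web search for the selected words. The helper function and the Google search URL are illustrative, not the app's actual code, and the calls assume the url_launcher 5.x API (canLaunch/launch).

```dart
import 'package:url_launcher/url_launcher.dart';

// Illustrative helper: launch a web search for the chosen words.
Future<void> searchWeb(List<String> words) async {
  final String url =
      'https://www.google.com/search?q=' + Uri.encodeComponent(words.join(' '));
  // canLaunch/launch are the url_launcher 5.x entry points.
  if (await canLaunch(url)) {
    await launch(url);
  }
}
```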

Versions of packages used:

Flutter 1.9.1+hotfix.2

firebase_ml_vision: ^0.9.2+1

image_picker: ^0.6.1+4

flutter_tags: ^0.4.3

url_launcher: ^5.1.2
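In a Flutter project, these package versions are declared in the dependencies section of pubspec.yaml. A sketch of that section, assuming the versions listed above:

```yaml
dependencies:
  flutter:
    sdk: flutter
  firebase_ml_vision: ^0.9.2+1
  image_picker: ^0.6.1+4
  flutter_tags: ^0.4.3
  url_launcher: ^5.1.2
```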

The code is available here. The app is 7.3MB in size when built in release mode.

User Interface

The app has a horizontal three-tab user interface. The first tab is for selecting the image, the second for editing the words that are found, and the third for searching. These tabs can be selected by tapping the labels at the top of the app screen or by swiping from side to side.

The contents of each tab can be scrolled vertically as needed. The tab for editing the words has an expansion tile that allows setting the font and scroll direction of the words. This layout functionality is enabled by the NestedScrollView widget.
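The three-tab layout described above can be sketched with Flutter's standard tab widgets. This is a minimal sketch, not the app's actual widget tree (which also wraps the tab contents in a NestedScrollView); the tab labels and placeholder bodies are illustrative.

```dart
import 'package:flutter/material.dart';

// Minimal sketch of a three-tab, swipeable layout.
class HomeTabs extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return DefaultTabController(
      length: 3,
      child: Scaffold(
        appBar: AppBar(
          // Tapping a label or swiping side-to-side switches tabs.
          bottom: TabBar(tabs: [
            Tab(text: 'Image'),
            Tab(text: 'Edit'),
            Tab(text: 'Search'),
          ]),
        ),
        body: TabBarView(children: [
          Center(child: Text('Select an image')),  // placeholder bodies
          Center(child: Text('Edit the words')),
          Center(child: Text('Search the web')),
        ]),
      ),
    );
  }
}
```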

Flutter Widget hierarchy


Notes on image processing

When selecting an image for text extraction, I could have used ImageSource.camera to use the active device camera, but I decided to use ImageSource.gallery as it provides a better user experience by presenting thumbnails of existing photos to select for processing.

I experimented with the maxHeight and maxWidth settings of ImagePicker.pickImage to balance app performance with the quality of text recognition. A native photo on my LG K20 phone is 4160 x 3120 and is 6MB in size. Specifying a maxHeight and maxWidth of 2000.0 in the app provides good performance and accurate text recognition. Depending on orientation, these settings usually result in a photo size of 2000 x 1500 processed by the ML Vision kit.
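A sketch of the picker call under these settings, assuming the static image_picker 0.6.x API; the 2000.0 bounds are the values discussed above, and the function name is illustrative.

```dart
import 'dart:io';
import 'package:image_picker/image_picker.dart';

// Pick a photo from the gallery, scaled to at most 2000 px per side.
// The picker preserves aspect ratio, so a 4160 x 3120 photo
// comes back at roughly 2000 x 1500 depending on orientation.
Future<File> pickPhoto() async {
  return await ImagePicker.pickImage(
    source: ImageSource.gallery,
    maxWidth: 2000.0,
    maxHeight: 2000.0,
  );
}
```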

As stated in the “Input Image Guidelines”: “Ideally, for Latin text, each character should be at least 16x16 pixels. For Chinese, Japanese, and Korean text (only supported by the cloud-based APIs), each character should be 24x24 pixels. For all languages, there is generally no accuracy benefit for characters to be larger than 24x24 pixels.”

Google ML Kit Vision processing

In order to use text recognition, you must create a Firebase project and follow these configuration steps. I opted for the free Firebase Spark plan. I did not need a Cloud Firestore database for this app.

The following code performs text recognition on the selected image:

List<String> _items = [];
File pickedImage;
FirebaseVisionImage ourImage;
...
Future readText() async {
  // Create an on-device text recognizer and wrap the picked image file.
  TextRecognizer recognizeText = FirebaseVision.instance.textRecognizer();
  ourImage = FirebaseVisionImage.fromFile(pickedImage);
  VisionText visionText = await recognizeText.processImage(ourImage);
  // Flush the list, as this might be a subsequent image being processed.
  if (_items.length > 0) {
    _items.removeRange(0, _items.length);
  }
  // Walk the recognized text hierarchy: blocks -> lines -> elements (words).
  for (TextBlock block in visionText.blocks) {
    for (TextLine line in block.lines) {
      // Same getters as TextBlock
      for (TextElement element in line.elements) {
        if (element.text != null) {
          _items.add(element.text);
        }
      }
    }
  }
  if (_items.length == 0) {
    _items.add("No text returned from ML Vision for this image.");
  }
  setState(() {});
}

When running the app, I can see console messages that indicate images are processed in about 25 milliseconds:

I/native (19950): timer.cc:71 PhotoOcrEngine::Init (recognizer): 21.2633 ms (elapsed)

I/native (19950): timer.cc:71 Init: 24.7678 ms (elapsed)

On-device text detection

I am able to perform text detection with wifi and mobile data turned off. As advertised, an ideal use case for the on-device API is “Recognizing sparse text in images”.

That’s magical!

Photo by Almos Bechtold on Unsplash

Some caveats on the Magic

My testing was performed on Android only (no iOS).

Sometimes the text recognition works when not expected, and vice versa. I had limited success in recognizing hand-drawn block letters. However, handwriting recognition is available as a paid feature of the Cloud Vision API.

As of this writing, Firebase ML Kit is still in beta and its API might change. The same is true of the Flutter packages firebase_ml_vision, image_picker, and flutter_tags.