a- Creating the “dialog flow” for our agent

Since the “chatbot” part is not our main focus in this post, we will “Keep it simple, stupid” and design a quick conversation in DialogFlow, as follows:

1. Create an intent “read”.
2. Add a couple of user expressions, e.g. “read this text” or “extract the text”.
3. Add a “read” action.
4. Enable the use of the webhook (see the fulfilment below).

b- Implementing the agent logic

Let’s now code the logic for our agent that will actually take the picture.

First, we’ll need two utility functions:

captureImage: a function that captures an image using the user’s camera.

uploadImage: a function that uploads that image to Google Cloud Storage (GCS).

Here is the implementation of the captureImage function. It uses imagesnap, a system utility available on macOS, to access the camera, capture the image and store the image file under /tmp/google-actions-reader-${Date.now()}.png . The function then returns both the file name and the file content in base64:

const fs = require('fs');
const child_process = require('child_process');

/**
 * Capture the image from the user computer's camera.
 *
 * @returns {Promise<{base64: string, file: string}>} The image content and path.
 */
function captureImage() {
  return new Promise((res, rej) => {
    const file = `/tmp/google-actions-reader-${Date.now()}.png`;
    try {
      // imagesnap is a macOS utility; -w 1 lets the camera warm up for 1 second.
      child_process.execSync(`imagesnap -w 1 ${file}`);
      // readFileSync already returns a Buffer, so no extra wrapping is needed.
      const bitmap = fs.readFileSync(file);
      res({
        base64: bitmap.toString('base64'),
        file
      });
    } catch (err) { rej(err); }
  });
}
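
To try the return shape of captureImage on a machine without a camera or imagesnap, here is a minimal sketch. The helper readAsPayload and the sample file are assumptions made for illustration; they are not part of the original agent code.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: reads a file and returns the same { base64, file }
// shape that captureImage resolves with.
function readAsPayload(file) {
  const bitmap = fs.readFileSync(file); // readFileSync returns a Buffer
  return { base64: bitmap.toString('base64'), file };
}

// Stand-in for a captured image: a tiny temp file with known contents.
const sample = path.join(os.tmpdir(), `google-actions-reader-${Date.now()}.png`);
fs.writeFileSync(sample, 'hello');

const payload = readAsPayload(sample);
console.log(payload.base64); // aGVsbG8=
```

The base64 string can be turned back into bytes with Buffer.from(payload.base64, 'base64') if the content is needed again later.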

The next function uploadImage will simply upload that image to GCS in the cloud-function-ocr-demo__image bucket:

const child_process = require('child_process');

/**
 * Uploads the file to GCS.
 *
 * @param {object} data The GCP payload metadata.
 * @param {string} data.file The path of the file to upload.
 * @returns {string} The file name without its directory.
 */
function uploadImage(data) {
  child_process.execSync(
    `gsutil cp ${data.file} gs://cloud-function-ocr-demo__image`
  );
  return data.file.split('/').pop();
}

Please note the name of the bucket, cloud-function-ocr-demo__image ; we will need it later.
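
As a possible variation on uploadImage, the sketch below passes the gsutil arguments as an array through execFileSync, so the file path is never parsed by a shell. The names buildUploadArgs and uploadImageSafely, and the injectable run parameter, are assumptions for illustration only; the stub runner lets the example run without gsutil installed.

```javascript
const child_process = require('child_process');
const path = require('path');

// Builds the argument list for `gsutil cp <file> gs://<bucket>`.
function buildUploadArgs(file, bucket) {
  return ['cp', file, `gs://${bucket}`];
}

/**
 * Hypothetical variant of uploadImage that avoids shell parsing.
 * `run` is injectable so the function can be exercised without gsutil.
 */
function uploadImageSafely(data, bucket, run = child_process.execFileSync) {
  run('gsutil', buildUploadArgs(data.file, bucket));
  return path.basename(data.file);
}

// Dry run: the stub runner only records what would have been executed.
const recorded = [];
const objectName = uploadImageSafely(
  { file: '/tmp/google-actions-reader-1.png' },
  'cloud-function-ocr-demo__image',
  (cmd, args) => recorded.push([cmd, ...args])
);
console.log(objectName, recorded);
```

Dropping the third argument makes the function call the real gsutil binary, which then needs to be installed and authenticated.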

Now that we have our two utility functions captureImage and uploadImage ready, let’s use them inside the read intent logic (remember this intent in the dialog from above?):

/**
 * The "read" intent that will trigger the capturing and uploading
 * the image to GCS.
 *
 * @param {object} app DialogflowApp instance object.
 */
function readIntent(app) {
  captureImage()
    .then(uploadImage)
    .then(() => {
      app.tell(`I sent you an SMS with your content.`);
    })
    .catch(e => app.ask(`[ERROR] ${e}`));
}

This readIntent will basically capture and then upload the image to GCS.
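
To see how the promise chain behaves on success and on failure, here is a small sketch that can run without a camera, gsutil or the actions-on-google library. Everything in it (fakeCapture, fakeUpload, makeFakeApp, readIntentWith and the reply text) is a hypothetical stand-in, not the real agent code.

```javascript
// Hypothetical stand-ins so the chain can run without a camera or gsutil.
const fakeCapture = () =>
  Promise.resolve({ base64: 'aGVsbG8=', file: '/tmp/sample.png' });
const fakeUpload = data => data.file.split('/').pop();

// Minimal fake exposing the two app methods that readIntent uses.
function makeFakeApp() {
  const calls = [];
  return {
    calls,
    tell: msg => calls.push(['tell', msg]),
    ask: msg => calls.push(['ask', msg])
  };
}

// Same shape as readIntent, with the utilities passed in and the promise returned.
function readIntentWith(app, capture, upload) {
  return capture()
    .then(upload)
    .then(name => app.tell(`Uploaded ${name}.`))
    .catch(e => app.ask(`[ERROR] ${e}`));
}

const okApp = makeFakeApp();
const failApp = makeFakeApp();
const done = Promise.all([
  readIntentWith(okApp, fakeCapture, fakeUpload),
  readIntentWith(failApp, () => Promise.reject(new Error('no camera')), fakeUpload)
]).then(() => {
  console.log(okApp.calls);
  console.log(failApp.calls);
});
```

The value returned by the upload step is what the next then receives, and a rejection anywhere in the chain skips straight to the catch handler.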

Now that we have all of the agent’s logic implemented, let’s create the main Cloud Function that will process DialogFlow’s requests:

const aog = require('actions-on-google');
const DialogflowApp = aog.DialogflowApp;

/**
 * Handles the agent (chatbot) logic. Triggered from an HTTP call.
 *
 * @param {object} request Express.js request object.
 * @param {object} response Express.js response object.
 */
module.exports.assistant = (request, response) => {
  const app = new DialogflowApp({ request, response });
  const actions = new Map();
  actions.set('read', readIntent);
  app.handleRequest(actions);
};

The assistant Cloud Function will be triggered from an HTTP call. DialogFlow makes this call when the user says, for example, “read this text”, which is one of the expressions defined in the read intent above.
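
The idea of mapping an action name to a handler can be illustrated without the library. The dispatch function below is a simplified, hypothetical model of that lookup; it is not how actions-on-google implements handleRequest, and the handler and app object are placeholders.

```javascript
// Hypothetical, simplified dispatcher: looks up the handler for an action name.
function dispatch(actions, actionName, app) {
  const handler = actions.get(actionName);
  if (!handler) {
    throw new Error(`No handler registered for action "${actionName}"`);
  }
  return handler(app);
}

const actions = new Map();
// Placeholder handler standing in for readIntent.
actions.set('read', app => `read handled for ${app.id}`);

const result = dispatch(actions, 'read', { id: 'demo' });
console.log(result); // read handled for demo
```

Adding another intent to the agent then amounts to registering one more entry in the map.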

c- Deploying the assistant Cloud Function

This section will serve as an example for the rest of this guide.

In order to deploy a Cloud Function, we can use the gcloud command with the following arguments:

gcloud beta functions deploy <function-label> \
  <trigger-type> \
  --source <source-code> \
  --entry-point <function-name>

<function-label> is a label for the function; it can be the same as or different from <function-name> .

<trigger-type> is how your function is going to be triggered (topic, http, storage, etc.).

<source-code> is the Google Cloud Repository where the source code of the function is hosted. This can’t be some other public Git repository!

<function-name> is the actual exported function name (in your code).

You can also use a Google Cloud Storage bucket to host the source code of your function, but we won’t cover this here.

Oh, by the way…

Hosting your source code in a Google Cloud Repository (a Git repo) is a good idea if you have a continuous delivery strategy in your organisation.

In our case, here is the full command:

gcloud beta functions deploy ocr-assistant \
  --source https://source.developers.google.com/projects/... \
  --trigger-http \
  --entry-point assistant

In case you are wondering, the Google Cloud Repository source has the following format:

https://source.developers.google.com/projects/<project-id>/repos/<repo-id>/moveable-aliases/<branch-name>
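
If you deploy several functions from the same repository, the URL format above can be filled in programmatically. The sketch below is only an illustration: repoSourceUrl and deployArgs are hypothetical helpers, and 'my-project', 'my-repo' and 'master' are placeholder values that do not come from this guide.

```javascript
// Hypothetical helper: fills in the source URL format shown above.
function repoSourceUrl(projectId, repoId, branch) {
  return `https://source.developers.google.com/projects/${projectId}` +
    `/repos/${repoId}/moveable-aliases/${branch}`;
}

// Hypothetical helper: argument list for an HTTP-triggered deployment.
function deployArgs(label, entryPoint, source) {
  return [
    'beta', 'functions', 'deploy', label,
    '--source', source,
    '--trigger-http',
    '--entry-point', entryPoint
  ];
}

// Placeholder project, repository and branch names.
const sourceUrl = repoSourceUrl('my-project', 'my-repo', 'master');
const command = ['gcloud', ...deployArgs('ocr-assistant', 'assistant', sourceUrl)];
console.log(command.join(' '));
```

The printed line has the same shape as the full command shown earlier and can be pasted into a terminal once the placeholders are replaced.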

Once deployed, your function should be ready to be triggered: