The Full-Stack Guide to Actions for Google Assistant

Plus: How We Taught Google Assistant to Teach You Spanish

If you’re currently building or thinking about developing a custom Google Assistant Action, this post is for you!

My friend Daniel Gwerzman and I recently released Spanish Lesson, an Action for Google Assistant that helps you learn Spanish by teaching you several new words every day and reading you sample sentences in Spanish to translate into English.

Since we had so much fun building the app, we thought we would show you how we did it, including some interesting code snippets. Our goal is to help you save some precious time when developing your own Actions, and to show you how fun they are to create!

The Spanish Lesson Logo :)

So how did this whole thing get started?

Some months ago, my life partner Ariella got a new Google Home device. She was very excited about it and tried all sorts of things. At some point, I heard her ask the device, “Hey Google, teach me Spanish,” to which the device responded: “I’m sorry, I don’t know how to help with that yet.”

Coincidentally, earlier that same day, I had read Daniel Gwerzman’s article explaining why now is a good time to build Actions on Google. He made some very good points: people are lazy and would rather talk than type, and since the platform is still in its early days, there are many opportunities to make a big impact in the technology space.

So when the Assistant responded to Ariella that it can’t teach her Spanish “yet,” it struck me: why don’t I make that happen?!

The next day, I pinged Daniel and asked him if he was ready to embark on a new adventure. Daniel was very excited about the idea, and we started collaborating. We sat together and designed a persona, which became the basis for the texts and possible dialog flows. We learned a lot throughout this process, and we will probably publish another post from the product/UX point of view in a few weeks.

However, the focus of this post is the technical side of creating an Assistant Action: the challenges we faced, the stack we chose, and the architecture of a complete solution for a real-life, complex Action on Google Assistant.

We are going to cover the tech decisions that we made, and share our experience with the outcomes of our choices.

Template, Dialogflow, or Actions SDK?

There are currently three approaches to building Assistant Actions: ready-made templates, Dialogflow, and Actions SDK.

The ready-made templates are great for use cases such as creating a Trivia Game or Flash Cards app. With a template, you don’t need to write a single line of code: you just fill in a spreadsheet, and the Action is created for you from the information you provide. This can be very useful for schoolteachers, who can easily create games for their students.

In our case, however, we needed more power: we wanted to keep track of each user’s progress, and mixing Spanish and English in a single app turned out to be quite a challenge, as you will see in a minute. So we had to choose between Dialogflow and Actions SDK.

Dialogflow gives you a nice user interface for building conversation flows (in some cases you can even get away without writing any code), and it uses machine learning to figure out the user’s intent from what they say.

Actions SDK gives you “bare-bones” access to user input, and it is up to you to provide a backend that parses that input and generates the appropriate responses.
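To make the difference concrete, here is a minimal sketch of the work an Actions SDK backend takes on itself: it receives the user’s words as a raw string and must decide what they mean. The names `handleRawInput`, `Reply`, and `normalize` are hypothetical, and the sketch deliberately omits the actual Actions SDK request and response format; it only illustrates that every phrasing you want to support has to be handled in your own code.

```typescript
// Minimal sketch of an Actions SDK-style backend: the raw utterance arrives
// as a plain string and our own code must work out what it means.
// All names are hypothetical; the real SDK request/response format is omitted.

interface Reply {
  speech: string; // what the Assistant should say
  endConversation: boolean; // true to close the mic and end the session
}

// Lowercase, replace punctuation with spaces, and collapse whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s']/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Without an NLU layer, every supported phrasing needs explicit handling here.
function handleRawInput(rawText: string, expectedAnswer: string): Reply {
  const said = normalize(rawText);

  if (said === "stop" || said === "goodbye") {
    return { speech: "¡Adiós! See you tomorrow.", endConversation: true };
  }
  if (said === normalize(expectedAnswer)) {
    return { speech: "¡Muy bien! That's correct.", endConversation: false };
  }
  return {
    speech: `Not quite. The answer is: ${expectedAnswer}`,
    endConversation: false,
  };
}

// Usage
console.log(handleRawInput("The cat is black.", "The cat is black").speech);
// prints "¡Muy bien! That's correct."
```

Each additional phrasing, such as “bye” or “I’m done,” would need another branch here, which is exactly the kind of work Dialogflow’s intent matching can take off your hands.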

We decided to go with Dialogflow, as it could handle some of the flows for us (e.g. asking the user a yes or no question, and understanding the user response), and it would also let us prototype quickly.

Dialogflow’s built-in capabilities proved very useful to us. For instance, if users didn’t know how to translate the sentence they were given, they could say “I don’t know” to get the answer and skip to the next one.
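Here is a hedged sketch of the fulfillment logic behind that flow: once an “I don’t know” intent is matched, the backend reveals the translation and moves on to the next sentence. The `Sentence`, `LessonState`, and `TurnResult` types and the `handleDontKnow` function are hypothetical, not the Spanish Lesson source code, and persisting the state between turns (for example in a database) is left out.

```typescript
// Hypothetical lesson data: a Spanish sentence and its English translation.
interface Sentence {
  spanish: string;
  english: string;
}

// Hypothetical per-user state. A real webhook would persist this between
// turns (e.g. in a database); that part is omitted here.
interface LessonState {
  sentences: Sentence[];
  index: number; // which sentence the user is currently translating
}

interface TurnResult {
  speech: string;
  state: LessonState;
}

// Called after the "I don't know" intent is matched: reveal the answer,
// advance to the next sentence, and ask it (or wrap up the lesson).
function handleDontKnow(state: LessonState): TurnResult {
  const current = state.sentences[state.index];
  if (current === undefined) {
    return { speech: "We've finished today's sentences.", state };
  }

  const nextIndex = state.index + 1;
  const nextState: LessonState = { ...state, index: nextIndex };
  const reveal = `No problem. "${current.spanish}" means "${current.english}".`;

  const next = state.sentences[nextIndex];
  if (next === undefined) {
    return { speech: `${reveal} That was the last sentence for today.`, state: nextState };
  }
  return {
    speech: `${reveal} Next one: "${next.spanish}". What does it mean in English?`,
    state: nextState,
  };
}

// Usage
const lesson: LessonState = {
  sentences: [
    { spanish: "El gato es negro", english: "The cat is black" },
    { spanish: "Tengo hambre", english: "I am hungry" },
  ],
  index: 0,
};
console.log(handleDontKnow(lesson).speech);
// prints: No problem. "El gato es negro" means "The cat is black". Next one: "Tengo hambre". What does it mean in English?
```

Returning a new state object instead of mutating the old one keeps the handler easy to test, since each turn is a pure function of the previous state.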

Shortly after we published the app, we realized users had many ways of saying they didn’t know the sentence: “I have no idea,” “I forgot,” “what is it,” and even the good old “idk.”
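In Dialogflow, the natural remedy is to add such phrasings as training phrases on the same intent. The sketch below is only a naive, exact-match stand-in that shows the idea of many utterances mapping to one intent; `DONT_KNOW_PHRASES`, `normalize`, and `isDontKnow` are hypothetical names. Dialogflow’s real matcher is machine-learning based, so it can also recognize phrasings that are close to, but not identical with, its training phrases.

```typescript
// The phrasings mentioned in the post. In Dialogflow these would be entered
// as training phrases on a single "I don't know" intent.
const DONT_KNOW_PHRASES: string[] = [
  "I don't know",
  "I have no idea",
  "I forgot",
  "what is it",
  "idk",
];

// Lowercase, unify curly apostrophes, strip punctuation, collapse whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/\u2019/g, "'")
    .replace(/[^a-z0-9\s']/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

const normalizedPhrases = DONT_KNOW_PHRASES.map(normalize);

// Naive exact-match stand-in for intent matching (NOT how Dialogflow works
// internally; its ML-based matcher generalizes beyond the listed phrases).
function isDontKnow(utterance: string): boolean {
  const said = normalize(utterance);
  return normalizedPhrases.some((phrase) => phrase === said);
}

console.log(isDontKnow("IDK!")); // true
console.log(isDontKnow("I don\u2019t know")); // true (curly apostrophe)
console.log(isDontKnow("The cat is black")); // false
```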