Today I’m hosting a guest post from skill developer Jo Jaquinta. In it, he goes into depth on the design and development of his Alexa skill “StarLanes”.

I’ve invited Jo to share this information because StarLanes stands out among the skills currently in release, and in order to build it and get it certified Jo had to address some technical requirements I suspect many other skill developers would like to tackle.

Take it away, Jo!

The StarLanes skill is one of the most advanced available on the Alexa platform. It is a multi-player interactive game, with factions, improvements, badges, a leaderboard, an in-game currency, and a reactive and adaptive audio interface. Pushing the Alexa platform to these limits required no small amount of development, innovation, and imaginative use of the APIs Amazon provides. As a guest poster on April’s blog, I would like to pass on some of the tricks and wisdom learned along the path I took in creating StarLanes.

Anatomy of a Skill

Whether you are writing a single skill or a series of skills, starting with a structure is a good step. It helps you organize your code, organize your thoughts, and gives you the focus necessary to produce a high-quality, easy-to-maintain skill. As TsaTsaTzu developed its repertoire of skills, we developed a pattern for them. This was largely based on the structure used by StarLanes. Although most were not as complicated as StarLanes, they still benefited from that structure, and it is now policy to design our skills along these lines.

StarLanes is written in Java and runs as a servlet using the Alexa Skills Kit libraries. However, the techniques used are broadly applicable to a servlet-based skill in any language. Lambda skills are slightly different, and I will call out the points where that difference needs to be taken into account.

Classes

Our pattern for skills breaks down into five classes. Two are those required by the Alexa Skills Kit. Two are for the processing and business logic of the skill. The final one is a data structure that maintains state and assembles the skill’s response.

The Java version of the Alexa Skills Kit contains a base Servlet class and a base Speechlet class. Typically you subclass the base servlet and tie it to your extension of the Speechlet. (In JavaScript on Lambda these are combined.) The end result is that the Speechlet has an entry point for each of the four ways Alexa can call your skill: Session Start, Launch, Intent, and Session End. For StarLanes these were modified as little as possible. Other than some instrumentation for logging, they only serve to translate from the specifics of the transaction environment (be it Lambda or the web) into objects defined by the Alexa Skills Kit.

The advantage of having this as a layer is that you insulate the rest of your code from the transactional environment. Before we settled on a web service, StarLanes was originally implemented as a Lambda function. This structure allowed us to make the move when our needs required it, without having to refactor much of anything. Even if you do not feel you will need to move between transaction environments, I would still recommend it as good practice. It keeps your code clean and small, and that makes debugging and maintenance easier.

The Speechlet class (and its JavaScript equivalent) has the four entry points mentioned above. However, in StarLanes we add simple pass-through code to each of these entry points that maps to a single method in our Application class. This may seem counterintuitive, since the Application class will, at some point, have to work out which invocation it is dealing with and react accordingly. But a lot is gained from having a single entry point, as we will see.
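A minimal sketch of that pass-through pattern might look like the following. The Request and Response types here are simplified stand-ins I made up for illustration, not the real Alexa Skills Kit classes:

```java
// Simplified stand-ins for the Alexa Skills Kit request/response classes.
class Request {
    final String type; // "SessionStart", "Launch", "Intent", or "SessionEnd"
    Request(String type) { this.type = type; }
}

class Response {
    String speech; // what Alexa should say back
}

class Application {
    // Single entry point: every invocation type lands here.
    Response handle(Request req) {
        Response resp = new Response();
        resp.speech = "Handled " + req.type;
        return resp;
    }
}

// The Speechlet layer only translates the transaction environment into
// a call on the Application; it contains no business logic of its own.
class StarLanesSpeechlet {
    private final Application app = new Application();

    Response onSessionStarted(Request req) { return app.handle(req); }
    Response onLaunch(Request req)         { return app.handle(req); }
    Response onIntent(Request req)         { return app.handle(req); }
    Response onSessionEnded(Request req)   { return app.handle(req); }
}
```

Because every entry point funnels into `Application.handle`, any cross-cutting concern (logging, error handling) has exactly one place to live.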

The role of the Application class is to manage the processing for the skill. Much as the Servlet/Speechlet classes provide a layer to contain the transaction environment, the application class provides a layer to handle the Alexa specifics. It works hand-in-hand with the Control class, which is where the business logic for the skill is contained.

The entry point in the Application class receives the invocation from Alexa, performs some pre-processing, then does selective processing based on the invocation type, and finally performs some post-processing to compose the final response for Alexa.
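The three phases can be sketched as below. The class and field names here are illustrative assumptions, not the actual StarLanes code or Alexa Skills Kit types:

```java
// Hypothetical stand-ins for the Alexa request and response objects.
class AlexaRequest {
    String type;   // "Launch" or "Intent"
    String intent; // set when type is "Intent"
}

class AlexaResponse {
    String speech;
}

class App {
    AlexaResponse handle(AlexaRequest req) {
        // 1. Pre-processing: gather everything the logic will need
        //    (state class, session variables, persistent data) in one place.
        StringBuilder speech = new StringBuilder();

        // 2. Selective processing based on the invocation type.
        if ("Launch".equals(req.type)) {
            speech.append("Welcome to StarLanes!");
        } else if ("Intent".equals(req.type)) {
            speech.append("Handling intent ").append(req.intent);
        }

        // 3. Post-processing: compose the final response for Alexa.
        AlexaResponse resp = new AlexaResponse();
        resp.speech = speech.toString();
        return resp;
    }
}
```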

Class Instantiation & Processing

In the pre-processing phase, the state class is instantiated. Invocation-specific values you need to care about, such as the UserID, are transferred from the Alexa-specific objects into fields on the state class. If session variables are being used, these are likewise transferred from the Alexa session object to the state class. If persistent storage is being used, the data is retrieved at this point as well.

At the end of pre-processing, you now have all the information necessary to compute your skill’s reaction in one single place. You have one single parameter to pass from function to function, which greatly simplifies the argument profile of your functions. When you decide a deeper function needs additional information, you don’t need to change all the function signatures on the path to it; you just add a member to the state class. The logic of information retrieval is not spread throughout your application: it has one defined place to live. When your needs change, you have one place to maintain the code to adapt to that difference.
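For example, a state class and its pre-processing could be sketched as follows. The field and method names (`userId`, `sessionAttrs`, `PreProcessor.build`) are assumptions for illustration, not the real StarLanes fields:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative state class: one object carries everything the logic needs.
class SkillState {
    String userId;                                      // from the Alexa request
    Map<String, String> sessionAttrs = new HashMap<>(); // session variables
    String speech;                                      // accumulated reply text
}

class PreProcessor {
    // Pre-processing: gather invocation-specific values into the state class.
    static SkillState build(String userId, Map<String, String> session) {
        SkillState state = new SkillState();
        state.userId = userId;
        state.sessionAttrs.putAll(session);
        return state;
    }
}

// Deeper functions take a single parameter; adding a new piece of
// information means adding a field, not changing every signature.
class Greeter {
    static void greet(SkillState state) {
        state.speech = "Hello, commander " + state.userId;
    }
}
```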

The central part of the entry point determines which type of invocation this call represents, and branches to different paths depending on whether it is a launch invocation or an intent-based invocation. In both cases, the logic in the Application class extracts the values from the Alexa objects. If your skill is state-machine based, there is probably an additional branch here based on the current state (retrieved from the state class).

You will usually then have more branching (or a big switch statement) based on the intent. That’s a lot of branching. You can push branches of this logic tree into separate methods to keep things clean, but our approach has been to stop at that point. Once you’ve gotten there, it’s time to make a call into the Control class. Any business logic to be done can be done there. Having a clean line of separation between Alexa processing logic and business logic is very handy.
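A sketch of that hand-off might look like this; the intent names and control methods are made up for illustration:

```java
// Business logic lives in the control class, free of Alexa specifics.
class GameControl {
    String jump(String destination) {
        return "Jumping to " + destination;
    }
    String scan() {
        return "Three star systems in range";
    }
}

class IntentRouter {
    private final GameControl control = new GameControl();

    // Alexa-side branching: map each intent to a control-class call.
    String route(String intentName, String slotValue) {
        switch (intentName) {
            case "JumpIntent": return control.jump(slotValue);
            case "ScanIntent": return control.scan();
            default:           return "Sorry, I didn't understand that.";
        }
    }
}
```

Note that `GameControl` never sees an intent name or a slot object, so it could be driven by any front end.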

Lastly, in the post-processing phase, there are two things to be done. The first is the reverse of the pre-processing phase: if there is anything in your state class that you need to persist, whether to session variables or to offline storage, now is the time to write it. As with pre-processing, that puts all your persistence logic in one place. The second is composing your reply. Alexa wants information in a certain format. As your state object gets passed around your logic, it is the storehouse for assembling that information, and here is where it can be extracted. Additionally, common logic can be applied. For example, if a card title or reprompt has not been set during logic execution, generic ones can be set at this point. That way you can maintain standards so the user gets a consistent reply.
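The defaulting step is easy to sketch. The field names and fallback strings here are hypothetical, not the actual StarLanes values:

```java
// State fields relevant to composing the reply; names are illustrative.
class ReplyState {
    String speech;
    String cardTitle; // may be left unset by the business logic
    String reprompt;  // may be left unset by the business logic
}

class PostProcessor {
    // If the logic didn't set a card title or reprompt, fall back to
    // generic ones so the user always gets a consistent reply.
    static void applyDefaults(ReplyState state) {
        if (state.cardTitle == null) state.cardTitle = "StarLanes";
        if (state.reprompt == null) state.reprompt = "What would you like to do?";
    }
}
```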

Since we’ve covered all the Alexa processing, from invocation handling to response composition, in our Application class, that frees our Control class to focus solely on our application’s business logic. This is the guts of what makes your skill your skill. Whether you are consulting a database of information, calling a third-party web service, or blowing up star systems, you can do all of that here without worrying too much about Alexa specifics. More complicated skills may have several control classes, with the logic spread across them. (StarLanes has about 15!)

Re-Usable Web Service Code

If you have a service which is going to be served up through more than just Alexa, much of your code in the Control class will be re-usable. For example, StarLanes provides a user-to-user and user-to-faction messaging feature. Since Alexa is not really good at free-form text input, a small web interface was created for this. This web front end posts to a servlet that is part of the same application as the Alexa endpoint, and makes use of the same control classes for its functionality.
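The sharing works as sketched below: one control instance, two front ends. `MessageControl` and its methods are hypothetical names standing in for the real messaging code:

```java
// The shared business logic: knows nothing about Alexa or HTTP.
class MessageControl {
    String send(String from, String to, String text) {
        return from + " -> " + to + ": " + text;
    }
}

// Alexa front end: wraps the result in spoken-reply phrasing.
class AlexaFrontEnd {
    private final MessageControl control;
    AlexaFrontEnd(MessageControl c) { control = c; }
    String onSendIntent(String from, String to, String text) {
        return "Message sent. " + control.send(from, to, text);
    }
}

// Web front end: would render the same result as HTML in a real servlet.
class WebFrontEnd {
    private final MessageControl control;
    WebFrontEnd(MessageControl c) { control = c; }
    String onPost(String from, String to, String text) {
        return control.send(from, to, text);
    }
}
```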

Summary

To recap: the Servlet/Speechlet classes encapsulate the transactional environment, protecting deeper code from the specifics of how your skill is invoked. The Application class encapsulates the Alexa-specific code; it performs the primary branching that determines what you are doing. The Control class encapsulates the business logic of your skill: the how of what you are doing. The State class is the storehouse for all the data you need to compute your response, and picks up, over the course of execution, the data necessary to compose your response.

This structure has proved effective for applications as complicated as StarLanes, and as simple as Knock-Knock jokes. I hope it proves versatile enough to help you with your skills.

Jo Jaquinta is an avid developer on the Alexa platform with several skills published in the marketplace, and is a co-author of the best-selling book “How To Program: Amazon Echo”. He brings several decades of industry experience to the platform, the last two of which have been at IBM. However, he does not represent IBM, and any opinions expressed here are his alone, not those of IBM.
