Defining Utterances for the Alexa Skills Kit

When writing an app, or “skill,” for the Amazon Echo, you have to provide examples of how your users might phrase their requests. Lots and lots of examples. Doing this by hand can be a ton of work, especially when you’re still developing and you have to go back and edit all of your examples.

So I created an online tool that makes it easier and faster to build your utterance examples. Read on to learn more about defining example utterances and how the online tool can make it painless.

What is an Utterance?

Your app for the Amazon Echo is going to have one or more capabilities or features. In the nomenclature of the Alexa Skills Kit, these are called Intents. To activate an Intent in your app, your users will ask the Echo a question or give it a command. Thanks to the flexibility of English, there could be several variations in the ways that these questions or commands might be phrased. A spoken phrase that activates an Intent is called an Utterance, and there may be many valid Utterances that you will want to accept for each of your app’s Intents.

The Amazon documentation recommends that you create an example Utterance for every possible way that your users might invoke your Intent. It also lists 30 example ways that an Utterance question might begin, and this page outlines many other formats. When you combine those with the potential variations in the rest of the users’ sentences, the total number of possible Utterances grows huge pretty quickly.

Depending on what your app does, commands and questions may include input that your app needs to work. Sometimes the set of valid inputs will be predictable and finite, like the cities with major airports or the names of hockey teams. Other times, the set of valid inputs is too large to be completely predictable, like the names of movies or historical figures. You will define your Utterances with a kind of fill-in-the-blank syntax that marks where these inputs will appear. Amazon calls these inputs Slots.

When you have Slots that can accept a finite set of values, Amazon tells you to provide each example Utterance for each possible Slot value. For instance, an app that knows the schedule of NHL games might support a question asking when the next game will be for a specified team. Using just a single way of phrasing the question, your Utterances would include:

when is the next Red Wings game?
when is the next Sharks game?
when is the next Rangers game?
... and so on for all 30 teams.

Assuming you came up with just these three phrasings:

when is the next Sharks game?
when do the Sharks play next?
what is the next Sharks game?

and each phrasing had, say, 8 small variations like this:

when is the next Sharks game?
when's the next Sharks game?
when is the next Sharks hockey game?
when's the next Sharks hockey game?
when is the Sharks game?
when's the Sharks game?
when is the Sharks hockey game?
when's the Sharks hockey game?

then your total number of Utterances to handle this simple question is 30 teams * 3 phrasings * 8 variations = 720. Once you start thinking about all the slightly different ways that a person might ask this question, you’ll find that there are a bunch of small word differences that could be used. You can easily find yourself having to enumerate thousands of example Utterances.
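To make the multiplication concrete, here is a minimal Python sketch that builds the variations of the first phrasing from its independent word choices. The variable names and the three-team sample list are assumptions made for illustration. With all 30 teams and three phrasings, the same product would give 30 * 3 * 8 = 720.

```python
from itertools import product

# Each list is one independent choice within the first phrasing.
openers = ["when is", "when's"]
articles = ["the next", "the"]
game_words = ["game", "hockey game"]

# Illustrative sample only; the real list would hold all 30 teams.
teams = ["Red Wings", "Sharks", "Rangers"]

# Every combination of one option from each list is a distinct Utterance.
utterances = [
    f"{opener} {article} {team} {game}?"
    for opener, article, team, game in product(openers, articles, teams, game_words)
]

print(len(utterances) // len(teams))  # variations per team: 2 * 2 * 2
print(len(utterances))                # variations across the sample teams
```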

A Smarter Way to Manage Utterance Variations

If you look at even the short examples above, you’ll notice there is an awful lot of repetition. Why not specify each alternative only once and let the computer construct all the combinations? That’s what my Amazon Echo Utterance Expander tool does.

So instead of the eight variations shown above, you just need to write:

(when is/when's) the (/next) Sharks (/hockey) game?

Paste that into the Utterance Expander and it will generate the full list of eight variations.

The syntax is simple. Wherever you want to support multiple alternatives for words or phrases, enclose the set of alternatives in parentheses and separate them with slashes. When there may or may not be a word or phrase somewhere, you can include a blank alternative. So

(/please) tell me a (/funny) joke

expands into

please tell me a funny joke
please tell me a joke
tell me a funny joke
tell me a joke
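The expansion of non-nested alternatives can be sketched in a few lines of Python. This is not the Utterance Expander's actual code, just one way the syntax described above could be implemented. The function name `expand_flat` is hypothetical, and sorting the results is an assumption based on the order of the example output.

```python
import re
from itertools import product

# Matches one parenthesized group that contains no nested parentheses.
GROUP = re.compile(r"\(([^()]*)\)")

def expand_flat(template):
    """Expand (a/b) groups in a template; nested groups are not supported."""
    # With one capturing group, re.split alternates literal text (even
    # indexes) and group contents (odd indexes).
    parts = GROUP.split(template)
    choices = [
        part.split("/") if index % 2 == 1 else [part]
        for index, part in enumerate(parts)
    ]
    results = set()
    for combo in product(*choices):
        # Blank alternatives leave doubled spaces behind, so collapse them.
        results.add(" ".join("".join(combo).split()))
    return sorted(results)

for line in expand_flat("(/please) tell me a (/funny) joke"):
    print(line)
```

Collecting the results in a set also drops duplicates, which can appear when two alternatives in a group are identical.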

Nested Variations

It is legal to include nested sets of alternatives:

Tell me a (story/(/funny) joke)

expands into

Tell me a funny joke
Tell me a joke
Tell me a story
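Nesting needs a little more than a regular expression, because a group can contain other groups. The following is a minimal recursive sketch in Python, not the tool's real implementation. The function names are hypothetical, and the sorted output is an assumption based on the examples. Braces and the pipe character are treated as ordinary text.

```python
def expand(template):
    """Expand nested (a/b) alternatives into a sorted list of unique strings."""
    variants, pos = _parse_sequence(template, 0)
    if pos != len(template):
        # A '/' or ')' outside any group stopped the parse early.
        raise ValueError(f"unexpected {template[pos]!r} at position {pos}")
    # Blank alternatives leave doubled spaces behind, so collapse them.
    return sorted({" ".join(v.split()) for v in variants})

def _parse_sequence(text, pos):
    """Read literal text and groups until '/', ')' or the end of the text."""
    variants = [""]
    while pos < len(text) and text[pos] not in "/)":
        if text[pos] == "(":
            options, pos = _parse_group(text, pos + 1)
            variants = [v + o for v in variants for o in options]
        else:
            variants = [v + text[pos] for v in variants]
            pos += 1
    return variants, pos

def _parse_group(text, pos):
    """Read slash-separated alternatives up to the matching ')'."""
    options = []
    while True:
        sequence, pos = _parse_sequence(text, pos)
        options.extend(sequence)
        if pos >= len(text):
            raise ValueError("missing ')'")
        if text[pos] == ")":
            return options, pos + 1
        pos += 1  # skip the '/' between alternatives

for line in expand("Tell me a (story/(/funny) joke)"):
    print(line)
```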

Handling Slots

Alternatives work for Slot examples as well.

What date is next {(Sunday/Monday/Tuesday/Wednesday/Thursday/Friday/Saturday)|dayofweek}?

expands into

What date is next {Friday|dayofweek}?
What date is next {Monday|dayofweek}?
What date is next {Saturday|dayofweek}?
What date is next {Sunday|dayofweek}?
What date is next {Thursday|dayofweek}?
What date is next {Tuesday|dayofweek}?
What date is next {Wednesday|dayofweek}?

Intents, Utterances, and Multiple Phrasings

When you provide your example Utterances in the Amazon Developer Console, each Utterance needs to be preceded by the name of the Intent it goes with. All of your Utterances for all of your Intents get combined into a single field that you paste into the form. So remember to include the Intent name at the beginning of each of your Utterances that you feed into the Utterance Expander.

You will also find that some Intents can be expressed using multiple phrases that are more than just small word differences. In these cases, it is more efficient to list them separately, one on each line, than to try to combine them into a single Utterance with multiple alternatives. Here’s a trimmed-down example of an app with two Intents and what the input to the Utterance Expander might be:

NextHockeyGameForTeam when is the next {(Sharks/Rangers/Red Wings)|team} game?
NextHockeyGameForTeam when do the {(Sharks/Rangers/Red Wings)|team} play (/next)?
NextTelevisedHockeyGame when is the next (/hockey) game on (t. v./television)?
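Before pasting expanded output into the Developer Console, it can help to confirm that every line really does start with an Intent name. The Python sketch below groups already-expanded lines by Intent. It assumes the Intent name is the first space-delimited word on each line, as in the examples above. The function name `group_by_intent` and the sample lines are made up for illustration.

```python
from collections import defaultdict

def group_by_intent(text):
    """Map each Intent name to the list of Utterances that follow it."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: the first word is the Intent name, the rest is the Utterance.
        intent, _, utterance = line.partition(" ")
        if not utterance:
            raise ValueError(f"no utterance after intent name: {line!r}")
        grouped[intent].append(utterance)
    return dict(grouped)

# Illustrative, already-expanded lines.
sample = """\
NextHockeyGameForTeam when is the next {Sharks|team} game?
NextHockeyGameForTeam when do the {Sharks|team} play next?
NextTelevisedHockeyGame when is the next game on television?
"""

for intent, utterances in group_by_intent(sample).items():
    print(intent, len(utterances))
```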



