Designing a voice interface? Here's a useful list of lists: as many guiding principles as we could find, all in one place. List compiled and edited by Ben Sauer @bensauer.

Voice Platforms

Amazon Alexa

Design checklist

  • Make it clear how customers can benefit from your skill
  • Make sure customers can find your skill
  • Design for natural language conversation
  • Use good interaction design practices
  • Handle unexpected user utterances gracefully
  • Watch customers try to use your skill

How Alexa responds

  • Be brief
  • Speak and write naturally
  • Prompt with guidance for the user
  • Use conversation markers
  • Add variety
  • Use parallel language
  • Remember what was said
  • Provide definitive choices
  • Use brevity, arrangement, and pacing when listing options
  • Handle problems
  • Provide contextual help

Legacy guidelines

Google's Conversation Design Guidance

Principles

  • Keep it short. Respect users' time. Get to the point and get out of the way. 
  • Give users credit. People know how to talk. Don’t put words in their mouth. 
  • Be relevant and sensitive to context. 
  • Delight the ear without distracting the mind. When adding personality, be sure it's not over the top.
  • Engage beginners and attract experts. Designing for many people doesn't mean designing for the lowest common denominator.
  • Take turns. Just asked a question? Stop talking. 
  • Don't read minds. Give them the facts and let them decide.

Greetings and goodbyes

  • Tell users who you are
  • Give the right amount of information
  • End conversations appropriately

Conversational dialogs

  • Sound natural
  • Be cooperative
  • Take turns

Conversation repair

  • Prevent errors by expecting variations
  • Provide helpful reprompts or pivot to another question
  • Be prepared to help at any time
  • Let users replay information
  • Fail gracefully

Persona

  • Reflect your unique brand and identity
  • Keep users coming back
  • Stay consistent

Google's Conversation Design Best Practices

Be Co-operative, like your users

  • Understand recognition grammars and repair prompting
  • Accommodate diverse user speaking styles
  • Let people know what they can say, intuitively

Unlocking the power of spoken language

  • Communicate what the system understood
  • Offer meaningful examples when letting people know what they can say
  • Avoid stating the obvious
  • Give instructions only if needed

Instilling user confidence through confirmations and acknowledgements

  • Use explicit confirmation for clarity around high-risk requests
  • Use implicit confirmation for speed around simple requests
  • Avoid "Go back" instructions
  • Leverage acknowledgers to help reassure people they've been heard
  • Randomize acknowledgers to avoid monotony and gimmickry
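
The confirmation guidance above can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: the intent names, the set of "high-risk" intents, and the acknowledger phrases are all assumptions made up for the example.

```python
import random

# Hypothetical acknowledgers and risk labels, chosen only for illustration.
ACKNOWLEDGERS = ["Okay", "Sure", "Got it", "All right"]
HIGH_RISK_INTENTS = {"transfer_money", "delete_account"}


def pick_acknowledger(last_used=None, rng=random):
    """Pick an acknowledger at random, avoiding an immediate repeat."""
    candidates = [a for a in ACKNOWLEDGERS if a != last_used]
    return rng.choice(candidates)


def confirm(intent, action_phrase, last_ack=None, rng=random):
    """Return (prompt, needs_reply, acknowledger_used).

    action_phrase is what the system understood, e.g. "setting a timer".
    """
    if intent in HIGH_RISK_INTENTS:
        # Explicit confirmation: ask, then wait for a yes/no before acting.
        prompt = f"Just to check: {action_phrase}. Is that right?"
        return prompt, True, None
    # Implicit confirmation: acknowledge, echo what was understood, carry on.
    ack = pick_acknowledger(last_ack, rng)
    return f"{ack}, {action_phrase}.", False, ack
```

Passing the previously used acknowledger back in as `last_ack` is one simple way to keep the same word from being spoken twice in a row.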

In conversation, there are no errors

  • Don't treat technical error "events" as users misbehaving
  • Handle different types of error events with the appropriate strategy
  • Prevent errors by providing help in the moment
  • Know when to give up
  • Make the success path more robust to "disguise" errors
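
As a rough sketch of "handle different types of error events with the appropriate strategy" and "know when to give up", the function below escalates its reprompts and stops after a fixed number of consecutive errors. The event names, the prompts, and the limit of three attempts are assumptions for illustration, not values taken from Google's guidance.

```python
# Hypothetical threshold and prompts, for illustration only.
MAX_ATTEMPTS = 3


def next_prompt(event, attempt):
    """Choose a response for an error event.

    event: "no_input" (silence) or "no_match" (speech was not understood).
    attempt: 1 for the first consecutive error, 2 for the second, and so on.
    Returns (prompt, keep_going).
    """
    if event not in ("no_input", "no_match"):
        raise ValueError(f"unknown error event: {event}")
    if attempt >= MAX_ATTEMPTS:
        # Know when to give up: end politely instead of looping forever.
        return "Sorry, I can't help with that right now.", False
    if event == "no_input":
        if attempt == 1:
            # First silence: simply ask again, briefly.
            return "Which city would you like the weather for?", True
        # Second silence: add help in the moment.
        return "You can say a city name, like Bristol or Leeds.", True
    if attempt == 1:
        # First mismatch: a short reprompt, without blaming the user.
        return "Sorry, which city?", True
    return "I didn't catch that. You can say a city name, like Bristol.", True
```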

Google's principles of human conversation

  • Give your VUI a personality. All voices project a persona whether you plan for one or not.
  • Move the conversation forward. Try to anticipate moments when your VUI can keep the conversation going by offering more information and recognizing informative answers from users.
  • Be brief, be relevant. Keep messages short and relevant. Let users take their turn. Don’t go into heavy-handed details until or unless the user will clearly benefit.
  • Leverage context. Keeping track of the conversation and remaining “aware” of the user’s context will effectively advance the perception of human intelligence.
  • Direct the user’s focus through word order and stress. To focus the user’s attention on what’s important, leverage their expectations of word order and stress placement.
  • Don’t teach “commands”. Speaking is intuitive. If you have to explain a command, something’s wrong.

Apple's Human Interface Guidelines: Siri

  • Strive for a voice-driven experience that doesn’t require touching or looking at the screen. 
  • Respond quickly and minimize interaction. 
  • Take people directly to content. 
  • Be relevant and accurate. 
  • Be appropriate. 
  • When a request has a financial impact, default to the safest and least expensive option. 
  • Increase accuracy with custom vocabulary. 
  • Provide example requests. 
  • Don’t advertise. 

Microsoft's Cortana: Voice Design Best Practices

  • Design for the most common scenario. 
  • Tasks should feel quick and easy. 
  • Keep tasks glanceable.
  • Help guide the user's attention to the current focus of the conversation. 
  • Clearly communicate forward progress during task completion. 

IBM: Conversation design guidelines

Achieve mutual understanding

  • Recipient design. Adapt the dialogue as we do in everyday conversation: different topics or levels of detail depending on the person.  
  • Minimization. Strive to minimize details for the user without sacrificing understandability.
  • Repair. Build robust repair mechanisms, so that your conversational agent does not need to always get it right on the first try.

Practices

  • Onboard users. Conversational agents should always be able to talk about what they can do or what they know.
  • Progressive disclosure. Provide next steps sequentially and break down a process into bite-sized chunks. 
  • History. Relay the current state of the conversation to the user. In voice-based interactions, use repetition tactfully to not only provide feedback but also to mark location.
  • Artifacts. Whenever possible leverage the medium to facilitate the conversation. For example: a map with an X can better relay complicated instructions than words.
  • Multimodal feedback. Give the user feedback through the conversation and the visual user interface (if present) to illustrate whether or not a request was heard or an action took place.
  • Fail gracefully. Don’t be afraid to let the conversational agent admit a lack of understanding. Sometimes humans don’t understand each other either.
  • Personality. Construct the persona of your agent somewhat like you would for your user. How serious or professional do you want your agent to be?

Voysis

  • Have a compelling reason to use voice. Voice is a relatively new way to interface with apps and devices, and users already have established habits with those form factors. The solution should present a compelling reason to use voice over those existing habits (e.g. touch, click, etc.).
  • Set user expectations and build trust. The user should understand what is and isn’t possible with the system and if the system doesn’t understand or can’t respond it should handle those situations in an empathetic, honest and helpful way.
  • Make it naturally discoverable. Traditional GUIs have labels and constraints that help the user to understand where to go and what the system can do. VUIs need to be more flexible and at the same time allow users to understand what is possible when they are trying to complete a task through natural discovery.
  • Mimic how people naturally speak. A good way to help users complete their task is to ensure, as completely as possible, the system can understand and respond using natural language. Most systems are fairly constrained in what they can process and how they can respond.
  • Create context through modality and state. If a GUI is available it is essential to take advantage of it to help create a good voice experience. Modality allows the user to go back and forth between using voice and traditional interaction methods and gives the user visual feedback. It is also important that the user understands what state they are in. Is the system listening, processing, or responding? Whether audibly or visually, it’s important to give the user an indication.

Books

Designing Voice User Interfaces

by Cathy Pearl

  • Conversation design. Humans rarely have conversations that only last one turn. Design beyond that one turn; imagine what users might want to do next.
  • Set user expectations. Don’t ask a question if you won’t be able to understand the answer.
  • Confirmations. Make sure that users feel understood, and let them know when they weren't.
  • Conversational markers. Let the user know where they’re at in the conversation.
  • Error handling. Design for when things go wrong, because something will always go wrong.
  • Don't blame the user
  • Novice and expert users. Adapt to the experience and expertise of the users. 
  • Keep track of context. People don't repeat terms in conversation, they use pronouns like 'she' after the subject has been established. Make sure your system understands the context of user input. 
  • Help and other universals. Include a set of universals at every state: e.g. repeat, main menu, help, operator, and goodbye.
  • Latency. Use audible or visual cues to communicate unavoidable system delays to the user.
  • Disambiguation. If a user gives ambiguous information, use contextual clues to make a smart guess or ask for clarity. 
  • Accessibility. Design experiences for everyone, no matter their abilities. Make interactions time-efficient, provide context, and prioritise personalization over personality. 
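
To make the "universals at every state" idea concrete, here is a minimal sketch in Python. It is not from the book: the state names, the handler and help-text dictionaries, and the response wording are all hypothetical. The point is only that universals are checked before any state-specific handling, so they work everywhere.

```python
def handle(utterance, state, last_prompt, handlers, help_text):
    """Route an utterance: universals first, then the current state's handler.

    handlers maps a state name to a function taking the normalised text and
    returning (next_state, prompt). help_text maps a state name to its help.
    Returns (next_state, prompt).
    """
    text = utterance.strip().lower()
    if text == "repeat":
        # Replay the last prompt without changing state.
        return state, last_prompt
    if text == "main menu":
        return "main_menu", "Main menu. What would you like to do?"
    if text == "help":
        # Contextual help: depends on where the user is in the conversation.
        return state, help_text.get(state, "You can say repeat, main menu, or goodbye.")
    if text == "operator":
        return "operator", "Connecting you to a person."
    if text == "goodbye":
        return "end", "Goodbye."
    # Not a universal: hand over to whatever the current state expects.
    return handlers[state](text)
```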

Don't Make Me Tap! Notes on Design 

by Bouzid / Ma

Respect

  • Respect the user's time
  • Respect the user's freedom
  • Don't lie to the user
  • Don't blame the user
  • Never terminate an interaction unilaterally
  • Tell the user what you are going to do
  • Don't switch modalities on the user without telling them

Intelligence

  • Know the user's preferences
  • Know the user's level of expertise
  • Anticipate the user's requests
  • Detect and act on request spikes

Consistency...

  • ...in language
  • ...in voice
  • ...in modality
  • ...across exchanges
  • ...across contexts

Grice's maxims

"These maxims may also be understood as describing the assumptions listeners normally make about the way speakers will talk, rather than prescriptions for how one ought to talk."

  • Maxim of quality: Try to make your contribution one that is true.
  • Maxim of quantity: Make your contribution as informative as is required (and no more).
  • Maxim of relation: be relevant.
  • Maxims of manner: Avoid obscurity of expression; be brief and orderly. 

BBC: Principles for designing a voice experience for children

Asking questions

  • If you're asking a question, end the sentence with it and listen for an answer immediately.
  • Don't tell children to 'say this' or 'say that', simply ask the question.
  • Don't ask rhetorical questions; children will answer them.
  • Ask questions that have distinctive, easy-to-say answers.

Listening for answers

  • When offering a choice, provide no more than three options.
  • Strive to present options that are balanced in their appeal to children.
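
One way to enforce the three-option limit is to offer a longer list in batches. The sketch below is an assumption about how that might look, not BBC code; the function name and phrasing are made up. It also ends the prompt with the question itself, in line with the advice under "Asking questions".

```python
MAX_OPTIONS = 3  # the limit suggested in the list above


def offer_choices(options, start=0):
    """Offer at most three options, starting at index `start`.

    Returns (prompt, next_start). next_start is the index of the next
    unoffered option, or None when nothing is left to offer.
    """
    batch = options[start:start + MAX_OPTIONS]
    if not batch:
        raise ValueError("no options left to offer")
    if len(batch) == 1:
        listed = batch[0]
    else:
        listed = ", ".join(batch[:-1]) + " or " + batch[-1]
    next_start = start + len(batch)
    if next_start >= len(options):
        next_start = None
    # The question comes last, so listening can begin straight away.
    return f"Would you like {listed}?", next_start
```

If the child picks none of the first three, the caller can pass `next_start` back in to offer the next batch.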

Handling errors

  • Don't keep children stuck in error loops. Turn a bad situation good by progressing them even when they are misunderstood.
  • Don’t use language or tone to make the child feel as though they are to blame.

Writing content

  • Use a real voice to speak to children in a warm, friendly tone – avoid cold, monotone, synthesised voices.

Fjord: Six principles for designing for voice UI

  • Conversation as user interface. Conversations require cooperation between participants and mutual acceptance, understanding and respect. Same with a voice interface.
  • The interface of least resistance. A user will typically look for the easiest way to complete a task, but their definition of “easy” can vary wildly depending on the context and situation.
  • Everything happens in sequence. Effective conversational design follows familiar sequences and implements conversational structure and familiar cues to increase empathy, maintain a relationship with the user and motivate further engagement.
  • Context is key. Voice interfaces are most useful in more private, controlled environments, such as the car or home.
  • What's your name again? Using emotive expressions to address the user and conversational context to recognize them helps build trust.
  • Mind your manners. The importance of helpful words, and all that implies, applies to voice UI, too. 

Fjord: six principles for designing engaging voice interfaces

by Michael Levy

  • Create a conversation bubble. Build a small pool of shared memory between the user and the system.
  • Keep it simple. Users look for the path of least resistance.
  • Guide users from A to B. Use repeatable patterns of behaviour to build familiarity into interactions. 
  • Know when to be seen, not heard. Understanding context is key to creating an engaging voice UI.
  • Build empathy through personality. An increased level of empathy is beneficial to both parties.
  • Open access to all. A truly engaging voice interface is one that doesn’t feel like it’s for some people and not others.

Jellyvision's Jack Principles

Maintaining Pacing

  1. Give the user only one task to accomplish at a time 
  2. Limit the number of choices the user has at any one time 
  3. Give the user only meaningful choices 
  4. Make sure the user knows what to do at every moment 
  5. Focus the user’s attention on the task at hand 
  6. Use the most efficient manner of user input 
  7. Make the user aware that the program is waiting 
  8. Pause, quit or move on without the user’s response if it doesn’t come soon enough.
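
Points 7 and 8 describe a simple timing policy, sketched below in Python. The thresholds (4 and 10 seconds) and the action names are assumptions picked for illustration; the Jack Principles do not specify numbers.

```python
def on_silence(seconds_waiting, nudge_after=4.0, move_on_after=10.0):
    """Decide what to do while waiting for the user's response.

    Returns "wait", "nudge" (remind the user the program is waiting),
    or "move_on" (pause, quit or continue without a response).
    """
    if seconds_waiting >= move_on_after:
        return "move_on"
    if seconds_waiting >= nudge_after:
        return "nudge"
    return "wait"
```

For example, `on_silence(5.0)` returns `"nudge"` with the default thresholds.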

Creating the Illusion of Awareness

Specifically Respond with Human Intelligence and Emotion to: 

  1. The user’s actions 
  2. The user’s inactions 
  3. The user’s past actions
  4. A series of the user’s actions
  5. The actual time and space that the user is in
  6. The comparison of different users’ situations and actions

Maintaining the Illusion of Awareness

  1. Use dialogue that conveys a sense of intimacy 
  2. Make sure characters act appropriately while the user is interacting
  3. Make sure dialogue never seems to repeat 
  4. Be aware of the number of simultaneous users
  5. Be aware of the gender of the users
  6. Make sure the performance of dialogue is seamless
  7. Avoid the presence of characters when user input cannot be evaluated

Designing Voice Experiences: Guiding Principles

by Lyndon Cerejo

  • Onboard the user and help them get started
  • Keep conversation exchanges brief to reduce cognitive load
  • Examples work better than instructions
  • Delight without interfering with the task
  • Use explicit confirmations for important actions, and implicit for less risky
  • Design for failure
  • Respect the user's privacy and security 

The Interaction Design Foundation's Guidelines for Designing Voice User Interfaces

  • Provide users with information about what they can do.
  • Help users understand where they are in the system.
  • Express intentions in examples.
  • Limit the amount of information.
  • Use visual feedback. 

NNG: Classic Usability Principles for Voice Interaction 

By Kathryn Whitenton

(These are taken from the complete list of usability heuristics.)

  • Error prevention: systems should prevent errors from occurring in the first place.
  • Flexibility and efficiency of use: allow for flexible use of the device (e.g. wherever the user is); make interactions efficient (e.g. avoid forcing the user to speak repetitive commands that they wouldn't say in conversation).

Voice User Interface Principles

by Stephen Gay 

  1. Get to know each other: introduce yourself and establish communication preferences.
  2. Be approachable: make it easy for users to start the conversation at any time.
  3. Listen closely: show that you're listening, through both words and actions.
  4. Mind your manners: be responsive and socially sensitive in your interactions.
  5. Talk like a native: convey meaning by what you say and how you say it.
  6. Adapt your speaking style: anticipate what kind of conversation will suit the situation.

CX Partners: 7 reasons people aren’t using your voice UI

by Fabien Marry

  1. Be clear about what it can do.
  2. Don't ignore privacy and social contexts.
  3. Take care who is speaking.
  4. Remember the context of conversation: what did the user just say?
  5. Don't rely on voice alone.
  6. Consider the environmental audio (e.g. car, kitchen, music playing)

10 Best Practices when Designing for Voice

by Jess Williams

  1. Manage users' expectations.
  2. There doesn't need to be a hierarchy!
  3. Consider the linguistics. A great voice app needs to cater for differences in the way people express themselves.
  4. Keep Alexa's responses short.
  5. Don't have too many steps in the conversation.
  6. Try not to answer a question with a question.
  7. Spend time on the edge cases and half happy paths.
  8. Minimize choice.
  9. Minimize pressure. Always give the user an option that buys them more deciding time.
  10. AUDIO, AUDIO, AUDIO. The best skills on Alexa have audio content in them — a piece of audio that isn’t her voice.

Basic Guidelines for Successful Voice Design

by Cheryl Platz

  1. Reincorporation is key. When the customer has given you data in their utterance, reincorporate it to confirm recognition.
  2. GUI parity is not the goal. Speech interfaces are good at certain things: search, frequently repeated actions, sets of unique values. Speech interfaces are bad at screen-by-screen navigation and data-heavy interactions.
  3. Brevity is the soul of voice UI. Every word of your response will increase the time your customer must spend listening. Be particularly strong-handed with edits on frequent responses.
  4. Choose personality moments wisely. Only insert personality if you believe your customer has time to spare. Avoid in repetitive tasks. Best used in response to open-ended questions, e.g. "How are you?"
  5. Use questions to guide multi-turn interactions. Don't just open up the mic and hope for the best. If you don't have enough information to act, give the customer a starting point in the form of a question to set them up for success.
  6. Test drive your sample dialogs in audio form. Your TTS system may mispronounce common words or generate odd intonations, and your utterances may be awkward when spoken. If possible, generate audio comps with both sides of the conversation recorded.
  7. Consider earcons, but use sparingly. Earcons (audio icons) can lead to more streamlined interactions, especially for repeated tasks. There are additional considerations like speaker quality, cohesion and acoustics that are best discussed with a sound designer.
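
Guidelines 1 and 5 combine naturally: echo back what was heard, then ask for whatever is still missing. The Python sketch below assumes a hypothetical drinks-ordering task; the slot names, questions, and wording are invented for the example.

```python
# Hypothetical slots and follow-up questions for a drinks order.
REQUIRED = ["size", "drink"]
QUESTIONS = {
    "size": "What size would you like?",
    "drink": "What would you like to drink?",
}


def respond(slots):
    """Reincorporate recognised values, then ask for the first missing one."""
    heard = " ".join(slots[name] for name in REQUIRED if name in slots)
    missing = [name for name in REQUIRED if name not in slots]
    if missing:
        question = QUESTIONS[missing[0]]
        if heard:
            # Echo what was understood so the customer knows it was recognised.
            return f"{heard.capitalize()}, got it. {question}"
        # Nothing recognised yet: give a starting point as a question.
        return question
    return f"One {heard}, coming up."
```

Keeping the echo to a word or two also respects guideline 3, since every extra word adds listening time.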

Notes

Why?

I realised pretty early on in my reading about voice interfaces that the available design resources are somewhat out-of-date, and few and far between. I wanted a list of useful guidance in one place, inspired by my colleague Jeremy's list of design principles: https://principles.adactio.com

If there’s anything new out there, please let me know. If you're one of the authors and you have feedback or corrections, do get in touch: I'm ben@redbeard.org.uk / @bensauer.

Conversational UI: bots vs. voice 

Some principles apply to both bot design and to VUI, some don't. I've chosen to focus on voice, although I hope it's useful whatever you're working on.

About design principles

Design principles and guidelines are statements that help you to make design decisions. Lots has been written about what makes good design principles, but even when something doesn't meet the criteria, it might still be useful. 

Caveats on the editing process

Given the above, I've taken some liberties regarding what constitutes a guiding principle in compiling this list. 

  • Some are more like rules, rather than principles
  • I've not included anything that's too platform specific
  • Where a principle is too short to understand by itself, or needs explanation, I've added context
  • Some have been trimmed for brevity
  • Unedited definitions are available via the links; please read the original content.
