Why voice assistants don’t understand people who stutter

We’re talking to more robots—but the technology leaves behind people with speech differences

When Apple released Siri in 2011, Marc Winski was excited. Here was a new way to play songs, make phone calls, and save time. All one needed to do was say the magic words: “Hey Siri, do this...” Winski didn’t expect his stutter to defeat the time-saving purpose of this technology. But it did.

“As soon as you pause or stop over a word, [Siri] stops listening,” said Winski, an actor living in Manhattan. “Something that was created to save time has created more stress.”

While these interactions are frustrating for Winski and the roughly 3 million people in the United States who stutter, voice assistants and voice recognition technology are here to stay. Whether it’s relaying your name to a non-human operator or telling Google to turn up the lights in your home, we’re talking to more robots—but the technology is leaving people who stutter behind.

Speech-recognition software does not always parse stuttered speech, because it has not yet been trained to account for the extra sounds created when someone stutters. According to Frank Rudzicz, a University of Toronto computer scientist who studies speech technology for people with speech disabilities, any given voice assistant is making observations about your speech upward of 16,000 times a second, looking for phonemes, or the sounds that, when combined, create words.

Phonemes help computers, and our brains, differentiate between words like “tan” and “pan.” The “T” and “P” sounds are the phonemes that tell us which of those words is a color and which is a cooking tool. When you talk to a voice assistant, its software identifies the phonemes in your speech and uses them to infer which words you are saying. While phonemes help convey meaning, speech-recognition software does not account for unintended repetitions of them, according to Rudzicz.

A voice assistant listening to someone stuttering on a “D” sound might think that the user is repeating the word “the” over and over again—as opposed to a clear command—because a stuttered “D” itself has no meaning. A human listener, on the other hand, is more adept at parsing stuttered speech.
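To make the idea concrete, here is a minimal Python sketch of one naive way software could account for a repeated sound: merging consecutive copies of the same phoneme before trying to match words. The phoneme labels and the `collapse_repeats` helper are hypothetical, chosen only for illustration; this is not how Siri, Alexa, or any real recognizer is known to work.

```python
from itertools import groupby


def collapse_repeats(phonemes):
    """Merge each run of identical consecutive phonemes into one occurrence."""
    return [symbol for symbol, _run in groupby(phonemes)]


# Hypothetical phoneme sequence for "d-d-d-dim the lights" with a stuttered "D".
stuttered = ["D", "D", "D", "D", "IH", "M", "DH", "AH", "L", "AY", "T", "S"]

# The same command spoken without the repetition.
fluent = ["D", "IH", "M", "DH", "AH", "L", "AY", "T", "S"]

print(collapse_repeats(stuttered) == fluent)
```

A rule this simple only handles clean repetitions. It would do nothing for prolonged sounds or blocks, and it could wrongly merge sounds that are meant to repeat across a word boundary, as in “bad day.”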

When telling Siri to play Celine Dion’s “My Heart Will Go On,” Winski will sometimes pause after “heart,” due to his stutter, and Siri will proceed to play something by Heart, the ’70s rock band.

Leonard Peng

“I mean, Heart’s a great band, but that’s not what I want right now,” said Winski.
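The cutoff Winski describes can be illustrated with a small Python sketch of a listener that stops at the first pause longer than a fixed limit. Everything here is an assumption made for illustration: the `endpoint` function, the pause lengths, and the 0.7-second limit are invented, and the article does not say how Siri actually decides when a speaker is done.

```python
def endpoint(words, max_pause_s):
    """Return the words heard before the first pause longer than max_pause_s.

    `words` is a list of (word, pause_before_in_seconds) pairs.
    """
    heard = []
    for word, pause_before in words:
        # Once something has been heard, a long enough pause ends listening.
        if heard and pause_before > max_pause_s:
            break
        heard.append(word)
    return heard


# Hypothetical timings: a 1.5-second block falls after "heart".
command = [
    ("play", 0.0), ("my", 0.2), ("heart", 0.2),
    ("will", 1.5), ("go", 0.2), ("on", 0.2),
]

print(endpoint(command, max_pause_s=0.7))  # ['play', 'my', 'heart']
print(endpoint(command, max_pause_s=2.0))  # all six words
```

In this toy version, a more patient limit lets the whole request through, though a longer wait would also slow down responses for every other speaker.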

Rudzicz said he believes that voice-assistant technology is about “90 percent of the way there” to fully understanding people with non-fluent speech. The last 10 percent will come when the technology can create and memorize individual speech models for every user. It will know how a “D” comes out, an “S,” a “T.” This solution would work for people with cerebral palsy, who usually speak atypically but consistently, according to Rudzicz. For people who stutter, however, the solution becomes a bit more complicated, because there is yet more to learn about stuttering.

Stuttering is defined by the National Institute on Deafness and Other Communication Disorders as “a speech disorder characterized by repetition of sounds, syllables, or words; prolongation of sounds; and interruptions in speech known as blocks.” These blocks can vary greatly within one person; in stuttering, as in life, there are good days and bad days. According to Joseph Klein, a speech pathologist and associate professor of communication sciences and disorders at Appalachian State University, stuttering is so unpredictable that it can even disappear in some contexts, like when the stutterer is alone.

“What’s so hard with stuttering is that it’s so variable,” said Klein. “With computers, you can’t say ‘I’m stuttering a lot today, can you wait a second?’”

Manufacturers of voice assistants are tackling the problem of understanding disfluent speech in different ways. To help those who stutter when they say “Hey Alexa,” Amazon is trying to take the “voice” out of voice assistant. According to an Amazon spokesperson, the company “recently launched Tap to Alexa which enables input to Alexa for customers who have difficulty interacting with Alexa with their voice.” The feature provides a touchscreen interface as a way of using Alexa, but it is limited to Amazon’s Echo Show, which was released this June.

Google, on the other hand, is continuing to collect data from a range of speakers to make its recognition technology more precise.

“We’re actively working to improve the quality of our speech recognizer to include more users, including users who have non-standard speech patterns or speak with an accent,” said Johan Schalkwyk, vice president and engineering fellow of speech at Google. “This is a long-term research challenge that we’re committed to.”

Despite efforts from companies like Google and Amazon, some people who stutter are resigned to the idea that voice assistants simply aren’t for them. Andy Fitzenrider is a data specialist with the Seattle Police Department. He said he has tried to use an Amazon Alexa at a friend’s house, but his efforts went nowhere.

“It’s kind of like in Star Wars when Han Solo says, ‘Chewie, I don’t think they had wookies in mind when they made her,’” said Fitzenrider about the difficulty of using Alexa as a person who stutters. While he doesn’t think he’ll ever seriously use Alexa, he likes to discuss voice assistants in Facebook groups with other people who stutter.

And Fitzenrider is a longtime user of another voice assistant, one that doesn’t have the high profile of Alexa or Google Home. The assistant service Fitzenrider uses is called Speech-to-Speech, and it connects individuals with speech disorders to a trained operator who can navigate phone calls for them. The caller tells the operator exactly what to say to the other party, whether it be a person or a computer asking for a name or series of numbers. According to Fitzenrider, Speech-to-Speech makes phone calls much easier. He’s been using it for over 20 years.

“For a long time I had other people make calls for me, just because the phone was such a frustrating experience, and I had so much anxiety about it,” said Fitzenrider. “I mean, especially calling people I didn’t know, because I never knew… Are they going to hang up on me? Are they going to be patient?”

For other stutterers, like Pedro Pena III, of Houston, the problem of voice assistants begins with branding.

“I don’t think I can do it with all the g’s, the a’s, the s’s. They need to start with letters I can actually say,” said Pena, referring to the names of Google’s Assistant, Amazon’s Alexa, and Apple’s Siri; many people who stutter consistently experience disfluency on particular sounds. “I would love to have the ability to do it… It would make my life so much easier, but I am also a realist.”

Pena has gone to speech therapy for the past 20 years, and he hosts a podcast, My Stuttering Life. Through sharing life experiences, Pena hopes the show can ultimately show other stutterers that they “are not alone.”

“I love technology; it’s supposed to make our lives much easier,” said Pena. “It’s wonderful… if you can use it.” One of Pena’s October episodes addresses voice-assistant technology, and he talks about how a voice-activated interface came with his new Ford Explorer. It goes unused.

Kevin Wheeler is a freelance writer and radio producer. He attends the Craig Newmark Graduate School of Journalism at CUNY.
