Engineering · May 16, 2025 · 3 min read

The clearest thing the resident said all day, and the transcriber heard mush.

Dysarthria — slurred, weak, imprecise speech from stroke, Parkinson's, or ALS — is common in skilled nursing and brutal for speech-to-text. The acoustics are real signal; the model just wasn't trained to receive them.

A resident on one of our pilot units had a stroke two years ago. His mind is sharp and his sentences are well-formed, but the left side of his mouth doesn't move the way it used to, so his consonants soften and his vowels run together. When he tells Companion "I need to use the bathroom," a stock speech-to-text model frequently returns "I need to lose the bad room," or nothing at all. He is speaking clearly, by his own enormous effort. The machine is the one that can't keep up.

Why dysarthria breaks ASR specifically

Dysarthria is a motor speech disorder: the articulators — lips, tongue, jaw, soft palate, vocal folds — are weak, slow, or poorly coordinated. The result is not random. It is a systematic distortion of exactly the acoustic features an ASR acoustic model leans on hardest.

Imprecise consonants. Stop consonants like /p/, /t/, /k/ lose their sharp burst. The model's phoneme boundaries smear, and a single substituted phoneme can flip a whole word.
Reduced loudness and breathiness. Parkinsonian speech in particular is hypophonic — quiet and fading. Low energy at the bedside means low SNR before the words even reach the decoder.
Slow or variable rate. ALS and cerebellar dysarthria stretch and compress timing. Streaming models that expect typical phone durations mis-segment, and the duration assumptions built into the decoder actively hurt.
Hypernasality and imprecise vowels. A weak soft palate leaks air through the nose, blurring the vowel formants that distinguish "bed" from "bad" from "bid."

On our internal dysarthric-speech slice, word error rate (WER) for an off-the-shelf model runs roughly 35–55%, against single-digit WER on matched non-dysarthric elders. That is not merely a degraded experience. Above about 40% WER, the transcript is closer to a hallucination than a recording, and any downstream summary built on it is worse than no summary at all.
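
For readers who want the arithmetic behind those percentages, here is a minimal sketch of the standard word error rate calculation: word-level edit distance (substitutions, insertions, deletions) divided by the number of words in the reference. The function name and the lowercase-and-split normalization are illustrative assumptions, not a description of any production scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here is an assumption: lowercase, split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


said = "I need to use the bathroom"
heard = "I need to lose the bad room"
print(word_error_rate(said, heard))  # 0.5
```

On the example from the top of this post, two substitutions and one insertion against a six-word reference give a WER of 0.5. An empty transcript scores 1.0, because every reference word counts as a deletion.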

What actually helps

There is no single fix, and we are honest about that. What moves the number is a stack of small, boring decisions that each respect the fact that this resident's acoustics are signal, not noise to be scrubbed.

Per-resident provider routing. Our adapter layer scores OpenAI Realtime, ElevenLabs ConvAI, and Grok on each resident's own speech and pins the device to whichever decodes them best. Dysarthria is heterogeneous; the winner differs person to person.
Aggressive vocabulary biasing. We prime the decoder with the small set of words this resident actually says — bathroom, pain, daughter, their medications — so a smeared utterance has the right candidates within reach.
Confirm, don't guess. When confidence drops on a high-stakes turn, Companion repeats back what it heard and waits. A two-second confirmation beats acting on "lose the bad room."
Lean on the language model, not the acoustics alone. Multi-turn context narrows the space: a resident who just mentioned discomfort is far more likely to be asking for the bathroom than a bad room.
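
The first three decisions in the list above can be sketched in a few lines. This is a hedged illustration, not the adapter layer itself: every name here (`pick_provider`, `rescore_with_vocabulary`, `next_action`, `Turn`), the provider labels, the bonus value, and the 0.80 threshold are hypothetical. The vocabulary step is shown as n-best rescoring, a stand-in technique, because how a decoder is actually primed depends on each provider's API.

```python
from dataclasses import dataclass
from statistics import mean


def pick_provider(wer_by_provider: dict[str, list[float]]) -> str:
    """Pin the provider with the lowest mean WER on this resident's samples."""
    scored = {name: mean(wers) for name, wers in wer_by_provider.items() if wers}
    if not scored:
        raise ValueError("no provider has scored samples for this resident")
    return min(scored, key=scored.get)


def rescore_with_vocabulary(
    nbest: list[tuple[str, float]],  # (hypothesis text, decoder score; higher is better)
    vocabulary: set[str],            # the words this resident actually says
    bonus: float = 0.5,              # hypothetical per-word boost
) -> str:
    """Return the hypothesis that scores best after a bonus per vocabulary hit."""
    def biased_score(item: tuple[str, float]) -> float:
        text, score = item
        hits = sum(1 for word in text.lower().split() if word in vocabulary)
        return score + bonus * hits
    return max(nbest, key=biased_score)[0]


@dataclass(frozen=True)
class Turn:
    text: str
    confidence: float  # 0.0 to 1.0; assumed to be available per turn
    high_stakes: bool  # e.g. toileting, pain, falls


CONFIRM_THRESHOLD = 0.80  # hypothetical; would need tuning per resident


def next_action(turn: Turn, threshold: float = CONFIRM_THRESHOLD) -> str:
    """Repeat back and wait when a high-stakes turn is low-confidence."""
    if turn.high_stakes and turn.confidence < threshold:
        return f'confirm: Did you say "{turn.text}"?'
    return f"act: {turn.text}"


# Made-up numbers, for illustration only.
print(pick_provider({"provider_a": [0.5, 0.4], "provider_b": [0.2, 0.3]}))
print(rescore_with_vocabulary(
    [("i need to lose the bad room", -4.0), ("i need to use the bathroom", -4.3)],
    {"bathroom", "pain", "daughter"},
))
print(next_action(Turn("I need to lose the bad room", 0.41, True)))
```

The design choice worth noticing is that none of these steps touches the audio. Each one changes which transcript is trusted, or whether it is trusted at all, which leaves the resident's acoustics intact for the decoder.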

We also resist the temptation to "clean up" his voice with heavy noise suppression, because the same filters that flatten hiss also flatten the faint consonant cues he has left. Restraint is part of the engineering.

The man in that room fought hard to say a clear sentence. The least we can build is a device that meets the effort halfway instead of throwing it away — so that when he asks for the bathroom at 2am, someone comes, because the machine finally heard him.

Tags: stt, dysarthria, speech
