She left her dentures in the cup, and her S's disappeared.
Dentures, dry mouth, and missing teeth distort exactly the high-frequency sibilants and fricatives that speech-to-text relies on to tell words apart. It's a mundane, deeply eldercare-specific failure that general ASR was never tuned for.
A resident asks Companion for "something for the pressure," and the transcript reads "for the pleasure." An hour earlier, the same resident was transcribed perfectly. The difference: at night she takes her dentures out, and without them her /s/, /sh/, and /f/ collapse into a soft, indistinct hiss. This is not an exotic edge case. In a skilled-nursing wing it is a nightly event, on a schedule, for a large fraction of the floor, and almost no general-purpose STT model was built with it in mind.
The physics of a missing fricative
Fricatives (/f/, /th/, and the sibilant subclass /s/, /z/, /sh/) are made by forcing air through a tight constriction, often against the teeth. Much of their distinguishing energy sits high, roughly 4–8 kHz. Three things common in eldercare attack exactly that band:
- Missing teeth. No teeth, no constriction and no hard edge for the air jet to strike. The turbulent noise that makes an /s/ an /s/ barely forms, so sip, ship, and tip converge acoustically.
- Ill-fitting dentures. A plate that shifts adds clicks and changes the oral cavity's shape, smearing the formants the model uses to place vowels.
- Xerostomia (dry mouth). Extremely common in older adults, both with age and as a side effect of many medications. Saliva is part of how fricatives sound; a dry mouth makes them weak, clicky, and inconsistent from one utterance to the next.
Compounding it, our pipeline is 16 kHz mono PCM by design. That is fine for speech, but it puts the Nyquist ceiling at 8 kHz, the top edge of the band where sibilant energy lives. We are not throwing away much, but we have zero headroom above the exact band that is already degraded. The margin for error is thinnest precisely where these residents need it widest.
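To make the band argument concrete, here is a minimal sketch that measures what fraction of a frame's energy falls between 4 kHz and the 8 kHz Nyquist ceiling of a 16 kHz signal. The frames are synthetic stand-ins (a 250 Hz tone for voicing plus a 6 kHz tone for frication), not real denture-out recordings, and `band_energy_fraction` is an illustrative name, not part of Companion's pipeline.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16_000          # the pipeline's rate, per the text above
NYQUIST_HZ = SAMPLE_RATE_HZ / 2  # 8000.0 Hz: the highest representable frequency


def band_energy_fraction(frame, sample_rate, lo_hz, hi_hz):
    """Fraction of a real-valued frame's spectral energy between lo_hz and hi_hz.

    Uses a direct DFT (O(n^2)) so the sketch needs only the standard library;
    a production pipeline would use an FFT instead.
    """
    n = len(frame)
    total = 0.0
    in_band = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        # Bins other than DC (and Nyquist, for even n) stand in for a
        # matching negative-frequency bin, so they count twice.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        power = abs(coeff) ** 2 * (1 if is_edge else 2)
        total += power
        if lo_hz <= k * sample_rate / n <= hi_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def tone(freq_hz, amplitude, n, sample_rate=SAMPLE_RATE_HZ):
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


N = 512  # 32 ms at 16 kHz; bin spacing is 16000 / 512 = 31.25 Hz
voicing = tone(250, 1.0, N)  # low-frequency stand-in for voicing
crisp = [v + s for v, s in zip(voicing, tone(6000, 0.8, N))]  # strong "sibilant"
weak = [v + s for v, s in zip(voicing, tone(6000, 0.1, N))]   # collapsed "sibilant"

crisp_ratio = band_energy_fraction(crisp, SAMPLE_RATE_HZ, 4000, NYQUIST_HZ)
weak_ratio = band_energy_fraction(weak, SAMPLE_RATE_HZ, 4000, NYQUIST_HZ)
print(f"crisp: {crisp_ratio:.3f}  weak: {weak_ratio:.3f}")
```

Because both tones land exactly on DFT bins, the crisp frame puts about 39% of its energy above 4 kHz and the weak one about 1%. A ratio like this is one plausible way to flag frames whose frication has collapsed, but real sibilants are broadband noise, so any threshold would have to be fitted on real speech.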
Why it's worse than it looks on paper
Sibilants are not just any phonemes — they do enormous lexical and grammatical work. English marks plurals, possessives, and third-person verbs with /s/ and /z/. It distinguishes minimal pairs all over the high-stakes vocabulary: pressure vs pleasure, sore vs more, sick vs thick. On our denture-out slice, WER climbs from single digits to the 20–30% range, and the errors cluster on exactly the content words a clinical summary depends on.
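WER is the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. A minimal sketch follows. `slice_wer` pools edits across a slice so short utterances do not weigh as much as long ones, and `missed_terms` is a hypothetical, cruder check for the claim that errors cluster on clinical content words. The transcripts are invented examples, not data from our slices.

```python
def edit_distance(ref_words, hyp_words):
    """Minimum substitutions + insertions + deletions turning ref into hyp."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hypothesis.lower().split()) / len(ref)


def slice_wer(pairs):
    """Corpus-style WER over a slice: total edits / total reference words."""
    edits = words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        edits += edit_distance(ref, hypothesis.lower().split())
        words += len(ref)
    if words == 0:
        raise ValueError("slice has no reference words")
    return edits / words


def missed_terms(reference, hypothesis, terms):
    """Hypothetical check: clinical terms in the reference but not the hypothesis."""
    ref = set(reference.lower().split())
    hyp = set(hypothesis.lower().split())
    return sorted((ref & terms) - hyp)


print(word_error_rate("something for the pressure", "something for the pleasure"))
```

One substituted word in a four-word bedside request is already a 25% WER for that utterance, which shows how few collapsed sibilants it takes to reach the range above when requests are short.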
What we do about it
- Time-of-day-aware biasing. We know roughly when a resident's dentures come out. After that, we lean harder on vocabulary hints and context priors, because the acoustics alone are no longer trustworthy.
- Minimal-pair-aware confirmation. When a transcript lands on a sibilant-sensitive word in a clinical context — pressure, pain, a medication — and confidence is soft, Companion confirms rather than acting on a coin flip.
- Restraint on high-frequency emphasis. It is tempting to boost the high band to recover sibilants. We don't boost it aggressively, because amplifying 4–8 kHz also amplifies HVAC hiss and denture clicks. We tune the boost per resident, not globally.
- Context over acoustics. A resident who has been talking about her blood pressure is asking about pressure, not pleasure. The multi-turn language model is doing more work than the microphone here, and that's correct.
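The time-of-day, confirmation, and context bullets above can be sketched as one small decision step. Everything here is hypothetical: the `DentureSchedule` type, the minimal-pair table, the 2.0/0.5 context weights, and the log-2 confirmation margin are illustrative assumptions, not Companion's actual parameters, and the flat "context bonus" is a deliberately crude stand-in for a real multi-turn language model.

```python
import math
from dataclasses import dataclass
from datetime import time

# Hypothetical minimal-pair table: clinically important word -> words a
# denture-out mouth is likely to be transcribed as instead.
MINIMAL_PAIRS = {
    "pressure": {"pleasure"},
    "sore": {"more"},
    "sick": {"thick"},
}
SENSITIVE_WORDS = set(MINIMAL_PAIRS).union(*MINIMAL_PAIRS.values())
CONFIRM_MARGIN = math.log(2.0)  # hypothetical: winner must be ~2x as likely


@dataclass
class DentureSchedule:
    out_at: time  # hypothetical per-resident schedule, e.g. from care notes
    in_at: time

    def likely_out(self, now: time) -> bool:
        if self.out_at <= self.in_at:  # window within one day
            return self.out_at <= now < self.in_at
        return now >= self.out_at or now < self.in_at  # wraps past midnight


def rescore(candidates, recent_words, context_weight):
    """Rank words by log acoustic confidence plus a flat bonus if recently said."""
    scored = [
        (word, math.log(conf) + (context_weight if word in recent_words else 0.0))
        for word, conf in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def decide(candidates, recent_words, dentures_out):
    """Return (word, action), where action is 'act' or 'confirm'."""
    # After the dentures come out, trust context more and acoustics less.
    context_weight = 2.0 if dentures_out else 0.5
    ranked = rescore(candidates, recent_words, context_weight)
    best_word, best_score = ranked[0]
    margin = best_score - ranked[1][1] if len(ranked) > 1 else math.inf
    # Only sibilant-sensitive words with a soft lead trigger a confirmation.
    if best_word in SENSITIVE_WORDS and margin < CONFIRM_MARGIN:
        return best_word, "confirm"
    return best_word, "act"


night = DentureSchedule(out_at=time(21, 0), in_at=time(7, 0))
acoustic = {"pleasure": 0.55, "pressure": 0.45}
print(decide(acoustic, {"blood", "pressure"}, night.likely_out(time(23, 30))))
print(decide(acoustic, set(), night.likely_out(time(23, 30))))
```

With the dentures out and blood pressure already in the conversation, a 0.55/0.45 acoustic split in favor of "pleasure" is overturned and the sketch acts on "pressure"; with no such context, the same split is too close to call on a sensitive word, so it asks.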
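A per-resident high-frequency boost can be as simple as a first-order FIR tilt. The filter, the gain table, and the cap below are assumptions for illustration, not the tuning we ship; the point is that the boost is a per-resident parameter that defaults to zero and has a hard ceiling, rather than one global setting.

```python
# Hypothetical per-resident gains; 0.0 means "leave the audio alone".
RESIDENT_HF_GAIN = {"resident-a": 0.0, "resident-b": 0.5}
MAX_HF_GAIN = 1.0  # cap: gain 1.0 doubles amplitude at 8 kHz (about +6 dB)


def hf_emphasis(samples, gain):
    """Tilt the spectrum upward: y[n] = x[n] + gain * (x[n] - x[n-1]) / 2.

    Magnitude response is exactly 1 at DC and exactly (1 + gain) at the
    8 kHz Nyquist frequency, rising monotonically in between. Everything up
    there gets boosted, including HVAC hiss and denture clicks, which is why
    the gain is clamped.
    """
    gain = max(0.0, min(gain, MAX_HF_GAIN))
    out = []
    prev = samples[0] if samples else 0.0  # start from x[0] to avoid a start-up step
    for x in samples:
        out.append(x + gain * (x - prev) / 2)
        prev = x
    return out


def emphasize_for(resident_id, samples):
    """Apply the resident's configured boost; unknown residents get none."""
    return hf_emphasis(samples, RESIDENT_HF_GAIN.get(resident_id, 0.0))


print(emphasize_for("resident-b", [1.0, -1.0, 1.0, -1.0]))
```

An alternating signal sits at the Nyquist frequency, so with a gain of 0.5 every sample after the first is scaled by 1.5, while a constant (DC) signal passes through unchanged.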
It is a humbling problem, because the fix is not a clever algorithm — it is taking the unglamorous reality of an aging mouth seriously as an engineering input. The woman with her dentures in the cup still deserves to be understood when she says pressure. Building for her bedside means building for the body she actually has tonight.