As her dementia advanced, she stopped speaking the language the device was set to.
Multilingual residents code-switch mid-sentence, and dementia often peels back the layers to a first language learned decades ago. Locking ASR to one language ID is how you build a device that slowly stops understanding someone.
A resident on one of our units spoke English fluently for sixty years. As her dementia progressed, her English thinned and her childhood Polish came back — not as a choice but as a reversion. Some mornings she greets the nurse in Polish, asks for water in English, and names her late husband in Polish again, all in one breath. A speech-to-text system pinned to en-US transcribes the English fragments and turns the rest into nonsense words. The device doesn't understand her less because she changed. It misunderstands her because we assumed she wouldn't.
Two distinct hard problems
"Multilingual support" sounds like one feature. At the bedside it is at least two separate problems, and they fail differently:
- Language identification. Before you can transcribe, you have to know what language is being spoken — and for short, quiet, accented utterances, language ID is itself unreliable. A two-word request gives the classifier almost nothing to work with.
- Intra-sentence code-switching. Even with perfect language ID, a sentence that switches mid-clause defeats any model running in a single fixed language. "I want to call my córka" (Polish for "daughter") needs two languages decoded in one utterance, with the switch boundary detected on the fly.
Most production ASR forces a per-session language choice. That model assumes a person picks a language and stays in it. Bilingual elders — especially those with cognitive decline — simply don't, and the assumption fails silently, which is the worst way for it to fail.
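One way to picture the alternative to a per-session flag is to tag language per span rather than per session. The sketch below is illustrative only: `Token`, `Span`, and `to_spans` are hypothetical names, and the per-token language tags and confidence values are made up, standing in for whatever an upstream language-ID step would actually produce.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Token:
    text: str
    lang: str          # language tag guessed for this token (assumed input)
    confidence: float  # illustrative language-ID confidence, 0.0 to 1.0


@dataclass(frozen=True)
class Span:
    lang: str
    text: str
    min_confidence: float  # weakest token in the span


def to_spans(tokens: list[Token]) -> list[Span]:
    """Group consecutive same-language tokens; each group boundary is a switch."""
    spans = []
    for lang, group in groupby(tokens, key=lambda t: t.lang):
        items = list(group)
        spans.append(
            Span(
                lang=lang,
                text=" ".join(t.text for t in items),
                min_confidence=min(t.confidence for t in items),
            )
        )
    return spans


# The example sentence from above, with made-up confidences.
utterance = [
    Token("I", "en", 0.97), Token("want", "en", 0.98), Token("to", "en", 0.95),
    Token("call", "en", 0.96), Token("my", "en", 0.94), Token("córka", "pl", 0.71),
]
spans = to_spans(utterance)
```

Here the utterance comes back as two spans, English then Polish, instead of one English transcript with a nonsense word at the end. The hard part, which this sketch assumes away, is getting trustworthy per-token tags in the first place.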
Why dementia makes this non-optional
Language reversion in dementia is well documented and clinically meaningful. The first language, learned in childhood, is the most durable; later languages erode first. So the residents most likely to revert are precisely the ones least able to adapt to a device that only speaks their second language. They cannot meet the machine halfway. The machine has to come to them, and it has to keep moving as they change.
A device that only works in the language a resident learned second is a device that works until the day they need it most.
What we do about it
- A per-resident language profile, not a session flag. We store the languages a resident actually uses in their Firestore config — ordered, with weights — so the pipeline expects the right two or three from the first word instead of guessing fresh each turn.
- Provider routing by language capability. Our adapter layer knows which backend — OpenAI Realtime, ElevenLabs, Grok — handles a given language pair best, and routes that resident's device accordingly. There is no single best multilingual model.
- Code-switch-tolerant decoding and confirmation. We bias toward the resident's languages jointly rather than forcing one, and when an utterance straddles a switch with low confidence, Companion confirms intent rather than dropping the foreign-language span.
- Profiles that drift with the resident. When we observe a resident reverting, the language weights shift over time. The cohort changes; the config follows. A static profile would re-create the original problem six months later.
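A minimal sketch of the profile, drift, and confirmation ideas above, under loud assumptions: `LanguageProfile`, `observe`, `should_confirm`, the 0.1 update rate, and the 0.6 threshold are all hypothetical, and a plain dict stands in for the stored config. The drift rule shown is a simple exponential moving average, chosen for illustration rather than taken from our pipeline.

```python
from dataclasses import dataclass


@dataclass
class LanguageProfile:
    """Per-resident language weights; a stand-in for the stored config document."""
    weights: dict[str, float]  # language tag -> weight, expected to sum to 1.0

    def ordered(self) -> list[str]:
        """Languages from most to least expected."""
        return sorted(self.weights, key=lambda lang: self.weights[lang], reverse=True)

    def observe(self, lang: str, rate: float = 0.1) -> None:
        """Nudge weights toward a language the resident was just heard using."""
        langs = set(self.weights) | {lang}
        moved = {
            l: (1.0 - rate) * self.weights.get(l, 0.0) + (rate if l == lang else 0.0)
            for l in langs
        }
        total = sum(moved.values())  # renormalize to guard against float drift
        self.weights = {l: w / total for l, w in moved.items()}


def should_confirm(span_lang: str, confidence: float, profile: LanguageProfile,
                   threshold: float = 0.6) -> bool:
    """Confirm intent, rather than drop the span, when a span in one of the
    resident's own languages was decoded with low confidence."""
    return span_lang in profile.weights and confidence < threshold


# A resident admitted as mostly English-speaking who keeps being heard in Polish.
profile = LanguageProfile({"en": 0.8, "pl": 0.2})
for _ in range(20):
    profile.observe("pl")
```

After those twenty observations Polish ranks first in `profile.ordered()`, so the pipeline would expect it before English. A low-confidence Polish span, say `should_confirm("pl", 0.45, profile)`, triggers a confirmation instead of being discarded. A real update rule would likely weigh observations by duration and confidence; this one treats every observation equally.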
None of this is solved. Joint multilingual decoding is genuinely hard, and we still miss switches. But the woman drifting back into Polish should not have to claw her way back into English to be heard at her own bedside. Meeting her where her language actually lives — today, not the day she was admitted — is the whole point.