Engineering · April 11, 2026 · 3 min read

If it can't say her name, nothing else it says counts.

TTS engines mangle names and drug names confidently and constantly. Custom lexicons and phoneme overrides are how Companion gets Mrs. Okafor — and metoprolol — right every time.

The first thing Companion says to a new resident is her name. If it says it wrong — flattens Okafor, anglicizes Nguyen, stresses the wrong syllable of Siobhan — the relationship starts in a hole. To the resident, a device that can't say her name is a device that doesn't know her, and she's right. Everything warm and clever the system does afterward is discounted against that first small disrespect.

Why TTS gets names wrong, confidently

Text-to-speech converts graphemes (letters) to phonemes (sounds) using a model trained on a language's average spelling-to-sound patterns. Names break the average on purpose. They come from every language, they don't follow English orthography, and the same spelling has different correct pronunciations for different people. The engine has no way to know that this Okafor is Igbo and stresses the second syllable. It guesses from English defaults, and it guesses with total confidence — no hesitation, no flag, just a wrong name said smoothly.

Medical terms are the same failure with higher stakes. Drug names are coined, not inherited — metoprolol, furosemide, escitalopram — and a synthesizer will routinely misplace stress or swallow a syllable. In a medication reminder, a garbled drug name is worse than a missing one, because it sounds authoritative while being unintelligible.

Lexicons and phoneme overrides

The fix is to stop letting the model guess for the words that matter. We maintain a pronunciation lexicon, a mapping from specific written forms to explicit phoneme strings, and feed it to the synthesizer so it speaks our pronunciation, not its statistical default.

<!-- IPA override so the engine stops guessing -->
<speak>
  <phoneme alphabet="ipa" ph="oʊˈkɑː.fɔːr">Okafor</phoneme>
  is due for
  <phoneme alphabet="ipa" ph="mɛˈtoʊ.prə.lɒl">metoprolol</phoneme>
</speak>
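Markup like the snippet above is tedious to write by hand, so it is usually generated from the lexicon. The following is a minimal sketch of that step, not Companion's actual code: the `LEXICON` dict and the `to_ssml` function are hypothetical names, the two entries are the IPA strings from the example above, and matching is assumed to be case-sensitive on whole words.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lexicon: written form -> IPA string (values from the example above).
LEXICON = {
    "Okafor": "oʊˈkɑː.fɔːr",
    "metoprolol": "mɛˈtoʊ.prə.lɒl",
}


def to_ssml(text: str, lexicon: dict[str, str]) -> str:
    """Wrap each whole-word lexicon match in an IPA <phoneme> override."""
    if not lexicon:
        return f"<speak>{escape(text)}</speak>"
    # Longest entries first, so a longer entry wins over one that is its prefix.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Text between matches is passed through, XML-escaped.
        parts.append(escape(text[last:match.start()]))
        word = match.group(1)
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(lexicon[word])}>'
            f"{escape(word)}</phoneme>"
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


print(to_ssml("Mrs. Okafor is due for metoprolol.", LEXICON))
```

Escaping the surrounding text matters as much as the override itself: an unescaped `&` in a reply would produce malformed SSML, and some engines may reject or misread the whole utterance.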

The lexicon is built and maintained in layers:

- A shared clinical lexicon of drug names, conditions, and procedures, written once and reused across every facility.
- A per-resident name entry, captured at onboarding and ideally confirmed with the resident or family, because they are the authority on their own name, not us.
- Per-provider phoneme dialects: not every TTS engine accepts the same phoneme alphabet, so the same lexicon entry compiles down differently depending on which provider the adapter routed this device to.
- A fallback path for engines that don't support <phoneme> at all, where we approximate with respelling (oh-KAH-for), which is cruder but better than the default guess.
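The layering can be sketched roughly as follows. This is an illustration under stated assumptions rather than the production design: `Entry`, `render`, and the data are hypothetical, the metoprolol respelling is invented for the example, and the only alphabets modeled are IPA, X-SAMPA, and "no phoneme support". The IPA-to-X-SAMPA table covers only the symbols these two entries use.

```python
from collections import ChainMap
from dataclasses import dataclass
from typing import Mapping, Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class Entry:
    ipa: str         # canonical pronunciation, stored once
    respelling: str  # crude fallback for engines without <phoneme>


# Layer 1: shared clinical lexicon (the respelling here is illustrative only).
CLINICAL = {"metoprolol": Entry("mɛˈtoʊ.prə.lɒl", "meh-TOH-pruh-lol")}
# Layer 2: per-resident names, captured at onboarding.
RESIDENT = {"Okafor": Entry("oʊˈkɑː.fɔːr", "oh-KAH-for")}
# Resident entries are consulted first, then the shared clinical layer.
LEXICON = ChainMap(RESIDENT, CLINICAL)

# Partial IPA -> X-SAMPA table, covering only the symbols used above.
IPA_TO_XSAMPA = str.maketrans({
    "ʊ": "U", "ˈ": '"', "ɑ": "A", "ː": ":",
    "ɔ": "O", "ɛ": "E", "ə": "@", "ɒ": "Q",
})


def render(word: str, lexicon: Mapping[str, Entry], alphabet: Optional[str]) -> str:
    """Compile one word for an engine; alphabet=None means no <phoneme> support."""
    entry = lexicon.get(word)
    if entry is None:
        return escape(word)  # not an overridden word: let the engine guess
    if alphabet == "ipa":
        ph = entry.ipa
    elif alphabet == "x-sampa":
        ph = entry.ipa.translate(IPA_TO_XSAMPA)
    else:
        return escape(entry.respelling)  # fallback path: respelling
    # quoteattr picks a quote style that is safe for the X-SAMPA stress mark (").
    return f'<phoneme alphabet="{alphabet}" ph={quoteattr(ph)}>{escape(word)}</phoneme>'


for alphabet in ("ipa", "x-sampa", None):
    print(render("Okafor", LEXICON, alphabet))
```

Storing one canonical IPA string per entry and deriving the other forms at render time keeps the authoritative pronunciation in a single place. The respelling cannot be derived reliably, so in this sketch it is stored alongside the IPA.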

That last point is the recurring tax of working across providers: a correct pronunciation isn't one fact, it's one fact times the number of engines, each with its own markup support and its own quiet failures. The lexicon abstraction is what keeps that from leaking into every reply.

So Companion greets the resident in 214B by name — Good morning, Mrs. Okafor — and gets it right, the way her granddaughter says it, on the first morning and every morning after. She has no idea there's an IPA string behind it or that we had to spell it three ways for three vendors. She just knows the voice in her room knows who she is. That is where trust starts, and it starts with one word.
