Engineering · May 7, 2026 · 3 min read

Hydralazine and hydroxyzine are one transcription error apart, and they do opposite things.

Drug names are long, rare, foreign-rooted, and full of near-homophones that general speech-to-text was never trained to tell apart. A single wrong med name in a SOAP note is not a typo — it's a patient-safety event.

A resident mentions a new pill, and the speech-to-text layer has to decide between hydralazine (lowers blood pressure) and hydroxyzine (an antihistamine and sedative). They differ by a few phonemes and they do nearly opposite things in a frail body. A general ASR model has heard the word "the" a billion times and "hydralazine" almost never. When it guesses, it guesses toward the common word, and in medicine the common word is frequently the wrong one. This is the failure mode that keeps us most honest, because here a transcription error becomes a clinical one.

Why medical vocabulary is uniquely hostile to ASR

Drug names are rare by construction. They are coined, often Latin- or Greek-rooted, and almost absent from the conversational corpora ASR is trained on. The language model's prior actively fights the correct answer.
They cluster into near-homophones. Hydralazine / hydroxyzine, Celebrex / Celexa, Klonopin / clonidine, Zantac / Xanax. The set of confusable names is a known, published patient-safety hazard precisely because humans mishear them too.
They're long and multisyllabic. A four- or five-syllable word gives noise, dysarthria, or a denture-softened consonant many places to corrupt — and one corrupted syllable can land you on a different real drug.
Elderly speakers mangle the hard part. The distinguishing syllable is often a fricative or stop the resident can no longer produce cleanly, so the exact phoneme that separates two drugs is the one most likely lost.

The asymmetry of cost is the whole point. A wrong conjunction is invisible. A wrong drug name in a nurse-reviewed SOAP note is a hazard that a busy human might not catch on the second read.

Biasing is necessary and not sufficient

The standard tool is biasing: hand the recognizer a list of phrase hints so rare terms become candidates it will actually consider. We do this aggressively, but it is a scalpel, not a cure. Bias too broadly — load the entire formulary — and you make every common word sound like a drug, inventing medications out of ordinary speech. Bias too narrowly and you miss the one the resident actually takes. The list has to be small, specific, and right.
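To make "small, specific, and right" concrete, here is a minimal sketch of building a hint list from one resident's medications instead of a whole formulary. Everything named here (`Medication`, `build_phrase_hints`, the `max_hints` cap of 50) is a hypothetical illustration, not our production code, and the output is just a list of strings that would then be passed to whatever phrase-hint mechanism the recognizer offers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Medication:
    """One entry on a resident's medication list (hypothetical shape)."""
    name: str                          # generic name, e.g. "furosemide"
    brand_names: Tuple[str, ...] = ()  # optional brand names, e.g. ("Lasix",)


def build_phrase_hints(meds: Iterable[Medication], max_hints: int = 50) -> List[str]:
    """Return a deduplicated, lowercase hint list drawn only from `meds`.

    The cap is an assumed guardrail against over-broad biasing. Exceeding it
    raises instead of truncating, so a medication is never silently dropped.
    """
    seen = set()
    hints: List[str] = []
    for med in meds:
        for term in (med.name, *med.brand_names):
            key = term.strip().lower()
            if key and key not in seen:
                seen.add(key)
                hints.append(key)
    if len(hints) > max_hints:
        raise ValueError(
            f"{len(hints)} hints exceeds cap of {max_hints}; "
            "the list is probably too broad to bias on safely"
        )
    return hints


if __name__ == "__main__":
    resident_meds = [
        Medication("Furosemide", ("Lasix",)),
        Medication("hydralazine"),
    ]
    print(build_phrase_hints(resident_meds))
```

Raising on an oversized list is a deliberate choice in this sketch: a hint list that has grown past the cap is a sign that someone loaded too much, and quietly trimming it could remove the one drug the resident actually takes.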

The safest med name is the one the recognizer was never allowed to guess at — because we confirmed it instead.

What we do about it

Per-resident formulary biasing. Each device is primed with that resident's actual medication list, not the whole pharmacopeia. The candidates are the drugs they really take, so furosemide is reachable and a thousand irrelevant homophones are not.
Confusable-pair awareness. When a transcript lands on a name with a known dangerous twin, we treat it as low-confidence by default and confirm out loud, regardless of the model's stated confidence.
Never silently commit a drug name. Companion does not write a medication into a SOAP note from a single ambiguous utterance. It confirms with the resident, and the note is still nurse-reviewed before it counts.
Events, not diagnosis. We capture that a resident mentioned a medication and how, as a reviewable event — we are not autonomously reconciling their meds. The human stays in the loop on exactly the words where being wrong is most expensive.
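The confirmation rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not Companion's actual logic: the names (`MedMention`, `Action`, `next_action`), the 0.9 confidence threshold, and the hard-coded pair table are all hypothetical, and the pairs are only the four mentioned in this post. A real deployment would draw on a maintained confusable-name list.

```python
from dataclasses import dataclass
from enum import Enum

# Confusable pairs named in this post; illustrative only, not a complete list.
CONFUSABLE_PAIRS = [
    ("hydralazine", "hydroxyzine"),
    ("celebrex", "celexa"),
    ("klonopin", "clonidine"),
    ("zantac", "xanax"),
]

# Map each name to its dangerous twin, in both directions.
TWIN = {}
for a, b in CONFUSABLE_PAIRS:
    TWIN[a] = b
    TWIN[b] = a


class Action(Enum):
    CONFIRM_WITH_RESIDENT = "confirm_with_resident"
    QUEUE_FOR_NURSE_REVIEW = "queue_for_nurse_review"


@dataclass
class MedMention:
    """A reviewable event: a drug name was heard, with some ASR confidence."""
    heard: str
    asr_confidence: float
    confirmed_by_resident: bool = False


def next_action(mention: MedMention, threshold: float = 0.9) -> Action:
    """Decide the next step for a heard drug name.

    There is deliberately no "write to the note" action: every path ends in
    nurse review, and unconfirmed risky names go back to the resident first.
    """
    name = mention.heard.strip().lower()
    if mention.confirmed_by_resident:
        return Action.QUEUE_FOR_NURSE_REVIEW
    if name in TWIN:
        # Known dangerous twin: confirm regardless of stated confidence.
        return Action.CONFIRM_WITH_RESIDENT
    if mention.asr_confidence < threshold:
        return Action.CONFIRM_WITH_RESIDENT
    return Action.QUEUE_FOR_NURSE_REVIEW


if __name__ == "__main__":
    print(next_action(MedMention("hydralazine", 0.99)))
    print(next_action(MedMention("furosemide", 0.95)))
```

Note that the confusable-pair check runs before the confidence check, so a high score from the model cannot bypass confirmation for a name with a known twin.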

We would rather Companion ask "Did you mean hydralazine?" one extra time than write hydroxyzine into the record once. For the resident in 214B, the cost of a clarifying question is two seconds. The cost of the wrong med name in her chart is the kind of thing we built this whole system to never let happen.
