She isn't finished. She's looking for the word.
For our residents, the difference between a thinking pause and the end of a turn is the whole problem. Interrupting a word search is the cruelest failure Companion can commit — so we built around it.
"I wanted to ask about my, oh, what do you call it..." The resident in 214B stops. Three seconds pass. To a typical consumer voice assistant, three seconds of silence means the turn is over and it's time to talk. But she isn't finished. She's searching for a word, the way an 88-year-old with mild cognitive decline searches for a word, and if Companion talks now it has interrupted her mid-thought. This is the most important turn-detection problem we have, because for our population the thinking pause is not an edge case. It's the normal case.
Why this is the case that matters
Younger, fluent speakers pause briefly and predictably; their gaps cluster tightly, so a fixed timeout mostly works. Elder speech does not behave that way. Word-finding difficulty, slower lexical retrieval, breath control, and the cognitive load of assembling a sentence all stretch mid-utterance pauses out to two, three, sometimes four seconds, and those long pauses are inside the turn, not at the end of it. For our residents, the pause lengths that mean 'still thinking' and the pause lengths that mean 'done' overlap almost entirely. That overlap is the hard part. You cannot separate the two populations of pauses with a single threshold, because in terms of length they aren't two populations. They're the same lengths of silence meaning opposite things.
And the stakes are lopsided in a way that should govern every design choice: getting cut off mid-word-search is not a minor annoyance for this resident. It's a small humiliation, a confirmation that she's too slow, and a reliable way to make her stop talking to the device entirely. We would rather wait too long a hundred times than cut her off once.
Adaptive per-resident windows
So we stopped looking for the right global timeout and started learning a window per resident. Over a resident's first days on Companion, we measure her own pause distribution from nurse-reviewed event logs — not audio recordings — and set her silence window from her own behavior, not the wing's average.
- A resident whose mid-thought pauses run long gets a generous window (we've set individuals as high as 3.5s), and we accept the latency as the cost of never cutting her off.
- A resident who speaks in short, fluent bursts gets a tighter window so she isn't left waiting on dead air.
- The window is contextual, too: it widens after a known filler ("um," "oh," "let me see") and after question words that signal a longer sentence is coming, and narrows after a clearly complete, confidently delivered clause.
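The per-resident and contextual rules above can be sketched in a few lines. This is a minimal illustration, not Companion's implementation: the 95th-percentile rule, the floor and ceiling, the multipliers, the question-word list, and the function names `base_window` and `contextual_window` are all assumptions made for the example, and `clause_complete` stands in for whatever upstream signal judges a clause finished.

```python
from statistics import quantiles

# Every constant here is an illustrative assumption, not a real Companion setting.
MIN_WINDOW_S = 1.0   # assumed floor so fluent speakers are not cut off by noise
MAX_WINDOW_S = 4.0   # assumed ceiling on the learned window
FILLERS = ("um", "oh", "let me see")
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how"}


def base_window(mid_turn_pauses_s: list[float]) -> float:
    """Derive one resident's window from her own mid-turn pauses (needs >= 2)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(mid_turn_pauses_s, n=20)[-1]
    return max(MIN_WINDOW_S, min(MAX_WINDOW_S, p95))


def ends_with_filler(text: str) -> bool:
    """True if the final word(s) are a known filler, matched on word boundaries."""
    return any(text == f or text.endswith(" " + f) for f in FILLERS)


def contextual_window(base_s: float, last_words: str, clause_complete: bool) -> float:
    """Widen or narrow the resident's base window using what she just said."""
    text = last_words.lower().strip(" .,!?")
    words = text.split()
    if ends_with_filler(text):
        return base_s * 1.5    # a filler signals a word search: wait longer
    if words and words[-1] in QUESTION_WORDS:
        return base_s * 1.25   # a question word suggests more is coming
    if clause_complete:
        return base_s * 0.75   # a settled, complete clause: respond sooner
    return base_s
```

Under these assumptions, a resident whose logged pauses run around three seconds ends up with a window several times longer than a fluent speaker's, and a trailing "oh" widens it further for that one pause.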
The averages are exactly what we're refusing to trust. A wing-wide median window can be perfect for the wing and wrong for the one resident with aphasia whose pauses are twice everyone else's — and she is precisely the resident the system most needs to get right.
Filler and breath as 'keep listening' cues
The richest evidence that a pause is a thinking pause often arrives just before the silence. People rarely trail off into a word search cleanly; they signal it. A drawn-out "uhh," an audible breath, a small "hmm," a rising-then-held pitch that hasn't resolved: these are acoustic flags for "I'm not done, give me a second." We treat them as explicit hold cues: detect one and the silence window that follows is extended, not started fresh, because the resident has already told us she's still composing. A falling, settled pitch into silence is the opposite cue, and only then do we let the window run short.
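As a rough sketch of how hold cues might feed the end-of-turn decision: the cue names, the extension length, the shortening factor, and the functions `silence_deadline` and `turn_is_over` below are hypothetical, chosen only to show the shape of the logic. Detecting the cues from audio is assumed to happen upstream.

```python
from enum import Enum


class Cue(Enum):
    """Last acoustic cue heard before the silence began (hypothetical labels)."""
    FILLER = "filler"                # drawn-out "uhh", small "hmm"
    BREATH = "breath"                # audible breath
    HELD_PITCH = "held_pitch"        # rising-then-held pitch, unresolved
    FALLING_PITCH = "falling_pitch"  # falling, settled pitch
    NONE = "none"


HOLD_CUES = {Cue.FILLER, Cue.BREATH, Cue.HELD_PITCH}
HOLD_EXTENSION_S = 1.5  # assumed extra time granted after a hold cue
SETTLED_FACTOR = 0.5    # assumed shortening after a settled, falling pitch


def silence_deadline(base_window_s: float, cue: Cue) -> float:
    """How long to wait in silence before treating the turn as finished."""
    if cue in HOLD_CUES:
        # Extend the resident's window rather than starting a fresh one.
        return base_window_s + HOLD_EXTENSION_S
    if cue is Cue.FALLING_PITCH:
        # Only a settled pitch lets the window run short.
        return base_window_s * SETTLED_FACTOR
    return base_window_s


def turn_is_over(silence_s: float, base_window_s: float, cue: Cue) -> bool:
    """True once the silence has outlasted the cue-adjusted deadline."""
    return silence_s >= silence_deadline(base_window_s, cue)
```

With a 2.5s base window, three seconds of silence after a filler would still count as thinking in this sketch, while the same three seconds with no cue would end the turn.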
None of this is a single knob. It's a posture: with this population, silence is the default state of thinking, and the burden of proof is on the system to show she's finished before it dares to speak. The payoff is quiet and complete: the resident in 214B reaches for a word, finds it, and finishes her own sentence. Companion was still there, having waited, and answers her as if it had all the time in the world. Which, for her, it does.