A snappy assistant feels rude to an 86-year-old.
Elders process speech more slowly and pause longer mid-thought. Tuning Companion's pacing — silence windows, speaking rate, and prosody — is most of what makes it feel kind.
The first version of any voice assistant is built for the person who built it: a thirty-something who talks fast, pauses briefly, and wants the answer now. Point that same assistant at an 86-year-old in a hospital bed and it feels less like help and more like being rushed by a stranger. The technology is identical. The pacing is wrong, and pacing is most of what "kind" means in a conversation.
Pauses are not silence
Older adults often need longer to retrieve a word and longer to assemble a sentence, and the gap mid-thought ("I wanted to ask about, oh, what is it...") can stretch for seconds. A standard end-of-turn detector reads that gap as "the resident is done" and starts talking. Now Companion has interrupted someone who was still thinking, which is the specific discourtesy that makes people stop using a device.
So our silence window at the bedside is deliberately generous compared to a consumer assistant. We would rather wait a beat too long than cut someone off mid-search-for-a-word. The cost is a little latency; the payoff is that the resident gets to finish their own sentence, every time. That tradeoff only makes sense because we also lean on warm acknowledgments — the system can afford to wait when it has cheap ways to signal it's still listening.
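To make the tradeoff concrete, here is a minimal sketch of a silence-window end-of-turn check. It is an illustration, not Companion's actual detector: the names `EndOfTurnConfig` and `end_of_turn_ms` are hypothetical, the 700 ms and 2500 ms windows are made-up values, and the input is assumed to be one speech/no-speech boolean per fixed-length frame from some upstream VAD.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EndOfTurnConfig:
    """Hypothetical settings; the numbers are illustrative, not tuned values."""
    frame_ms: int = 20             # duration of one VAD frame
    silence_window_ms: int = 2500  # how much trailing silence ends the turn


def end_of_turn_ms(frames: Iterable[bool], config: EndOfTurnConfig) -> Optional[int]:
    """Return the time (ms) at which the turn is declared over, or None.

    `frames` holds one boolean per frame: True means the VAD heard speech.
    Silence before the first speech frame is ignored, and any speech frame
    resets the silence counter, so a mid-thought pause shorter than the
    window never ends the turn.
    """
    heard_speech = False
    silent_ms = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_ms = 0
        elif heard_speech:
            silent_ms += config.frame_ms
            if silent_ms >= config.silence_window_ms:
                return (index + 1) * config.frame_ms
    return None


# 1 s of speech, a 1.5 s search for a word, 1 s more speech, then 3 s of quiet.
utterance = [True] * 50 + [False] * 75 + [True] * 50 + [False] * 150

consumer = EndOfTurnConfig(silence_window_ms=700)
bedside = EndOfTurnConfig(silence_window_ms=2500)

print(end_of_turn_ms(utterance, consumer))  # fires during the mid-thought pause
print(end_of_turn_ms(utterance, bedside))   # fires only after the sentence is finished
```

With the shorter window the detector fires inside the 1.5-second pause, which is exactly the interruption described above. The longer window rides through the pause and pays for it with extra latency after the sentence really ends.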
Slower on the way out, too
Pacing cuts both directions. A reply delivered at brisk podcast speed is hard to follow for someone with age-related hearing changes and slower auditory processing. So we shape Companion's outbound speech as much as its listening:
- Slower speaking rate: measurably fewer words per minute than the synthesizer's default, so each word has room to land.
- Real punctuation, real pauses: short sentences with breaks between clauses, because a wall of speech with no gaps is exhausting to parse.
- One idea per turn: we don't stack three questions into a single breath; we ask one thing and wait.
- Warm, even prosody: a steady, lower-energy delivery reads as calm and respectful; an upbeat consumer-assistant cadence reads, to many elders, as flippant.
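One way the first two items in that list could be expressed is with standard SSML, using `<prosody rate>` for the speaking rate and `<break>` for pauses between sentences. The sketch below assumes the TTS engine accepts SSML at all, which the article does not say; the helper name `to_paced_ssml` and the 85% rate and 400 ms pause are hypothetical. Engines also differ in which SSML attributes they honor, so treat the output as a starting point.

```python
from typing import Iterable
from xml.sax.saxutils import escape


def to_paced_ssml(sentences: Iterable[str],
                  rate_percent: int = 85,
                  pause_ms: int = 400) -> str:
    """Wrap short sentences in SSML with a reduced rate and explicit pauses.

    rate_percent and pause_ms are illustrative defaults, not tuned values.
    """
    if rate_percent <= 0 or pause_ms < 0:
        raise ValueError("rate_percent must be positive and pause_ms non-negative")
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape each sentence so characters like "&" cannot break the markup.
    body = pause.join(
        f"<s>{escape(sentence.strip())}</s>"
        for sentence in sentences
        if sentence.strip()
    )
    return f'<speak><prosody rate="{rate_percent}%">{body}</prosody></speak>'


print(to_paced_ssml(["Good morning.", "Did you sleep well?"]))
```

Passing sentences as a list rather than one long string keeps the "short sentences with breaks" rule visible in the calling code. Nothing here enforces one idea per turn; that belongs in the prompt rather than the markup.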
None of this is a single setting. It's a posture that touches the VAD, the prompt, and the TTS voice config together, and we tune it against real elder voices rather than our own. The thing we are optimizing is not response time. It is the feeling of being given time.
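Because the posture spans three subsystems, one option is to keep the values in a single profile and derive each subsystem's settings from it, so they cannot drift apart. The sketch below is an assumption about how that might look, not a description of Companion's configuration: `PacingProfile`, its fields, the `BEDSIDE` values, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PacingProfile:
    """Hypothetical single source of truth for pacing across subsystems."""
    silence_window_ms: int
    speaking_rate_percent: int
    sentence_pause_ms: int
    prompt_rules: Tuple[str, ...]

    def vad_settings(self) -> Dict[str, int]:
        # What the end-of-turn detector would read.
        return {"silence_window_ms": self.silence_window_ms}

    def tts_settings(self) -> Dict[str, int]:
        # What the speech synthesizer would read.
        return {
            "rate_percent": self.speaking_rate_percent,
            "sentence_pause_ms": self.sentence_pause_ms,
        }

    def prompt_suffix(self) -> str:
        # Plain-text rules to append to the language model's instructions.
        return "\n".join(f"- {rule}" for rule in self.prompt_rules)


# Illustrative values only; real ones would come from tuning on elder voices.
BEDSIDE = PacingProfile(
    silence_window_ms=2500,
    speaking_rate_percent=85,
    sentence_pause_ms=400,
    prompt_rules=(
        "Ask one question per turn, then wait.",
        "Use short sentences.",
    ),
)

print(BEDSIDE.vad_settings())
print(BEDSIDE.tts_settings())
print(BEDSIDE.prompt_suffix())
```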
The resident in 214B trails off looking for a word, finds it, finishes — and Companion was still there, having waited, and answers gently and unhurried. She never thinks about pacing. She just feels, correctly, that the voice in the room is in no rush to be done with her.