Companion listens for the resident, not the hallway.
Why CareOS runs voice activity detection on the CoreS3 itself — and how a simple RMS gate keeps a bedside conversation feeling like a conversation.
A resident in a hospital bed cannot reach for a button. Companion is hands-free by design — no wake word, no tap-to-talk. The resident just speaks. Which means the device has to figure out, on its own, when a sentence begins and when it ends, while ignoring the TV down the hall, a roommate snoring, and two aides talking in the doorway.
A small gate on a small CPU
The CoreS3 firmware runs a lightweight voice activity detector in front of the realtime session. Every 20 ms frame of 16 kHz PCM gets an RMS energy estimate. Above an RMS threshold of ~300 we treat the frame as speech and start streaming. Below it we count silence. After 2 seconds of continuous silence we close the utterance and let the model reply. A short hysteresis hold keeps brief mid-sentence pauses (the breath between "I was thinking" and "about my daughter") from cutting the resident off.
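A minimal sketch of that gate, written in Python for readability rather than as firmware code. The frame size, the ~300 threshold, and the 2-second close come from the description above. Everything else is an assumption made for illustration: signed 16-bit samples, a 200 ms hold, the names frame_rms and RmsGate, and the choice to treat held frames as speech so the silence count starts only after the hold runs out.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE_HZ = 16_000
FRAME_MS = 20
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 320 samples per frame
RMS_THRESHOLD = 300.0     # approximate value quoted in the article
SILENCE_CLOSE_MS = 2_000  # close the utterance after 2 s of silence
HOLD_MS = 200             # ASSUMPTION: the article only says "short"


def frame_rms(samples):
    """Root-mean-square energy of one frame of PCM samples (assumed int16)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class RmsGate:
    """Hypothetical per-frame gate: 'idle', 'stream', or 'close'."""
    threshold: float = RMS_THRESHOLD
    silence_close_frames: int = SILENCE_CLOSE_MS // FRAME_MS  # 100 frames
    hold_frames: int = HOLD_MS // FRAME_MS                    # 10 frames
    streaming: bool = False
    hold_left: int = 0       # held frames remaining after the last loud frame
    silence_frames: int = 0  # consecutive quiet frames counted after the hold

    def push(self, samples):
        """Feed one 20 ms frame and return what to do with it."""
        loud = frame_rms(samples) > self.threshold

        if loud:
            # Speech (re)starts: open the utterance and re-arm the hold.
            self.streaming = True
            self.hold_left = self.hold_frames
            self.silence_frames = 0
            return "stream"

        if not self.streaming:
            return "idle"  # quiet room, nothing open, send nothing

        if self.hold_left > 0:
            # Hysteresis hold: a brief dip is still treated as speech.
            self.hold_left -= 1
            return "stream"

        self.silence_frames += 1
        if self.silence_frames >= self.silence_close_frames:
            # Enough continuous silence: end the utterance and reset.
            self.streaming = False
            self.silence_frames = 0
            return "close"
        return "stream"  # keep the utterance open while counting silence
```

Feeding the gate one frame at a time, a loud frame followed by a one-second pause and more speech stays inside a single utterance. Under these assumed values, "close" arrives only after the hold plus 100 quiet frames have passed with no speech.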
Tradeoffs you can hear
VAD is a tuning problem with two failure modes. A false positive — the TV triggers a session — wastes tokens and makes Companion talk over the room. A false negative — the device misses a soft-spoken question — feels rude, like being ignored. We bias the threshold toward the resident's voice at bedside distance and let the silence window do the rest of the work.
Running VAD on-device matters. If we shipped every microphone frame upstream, every snore and every game show would open a realtime session, burn audio minutes, and add round-trip latency to the moments that are actually speech. The gate stays local; only real utterances reach CareOS.
The product effect is the part that matters. A resident speaks, pauses to think, finishes — and Companion replies on the rhythm of the conversation, not after a robotic timeout.