Engineering · August 4, 2025 · 2 min read

Why every audio frame in CareOS is 16 kHz mono PCM.

Speech tops out around 8 kHz of bandwidth. 16 kHz is the honest sample rate — anything higher just burns facility wifi.

Every voice path in CareOS — the mic on Companion, the WebSocket frames flowing to the realtime provider, the TTS bytes coming back for playback — is 16 kHz mono 16-bit little-endian PCM. One format, end to end. The choice is not arbitrary, and it is not nostalgia for a phone codec.
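Here is a minimal sketch of what that one format means at the byte level. It uses only Python's standard struct module. The helper names (float_to_pcm16le, pcm16le_to_float, frame_bytes) and the 20 ms frame length are illustrative assumptions, not taken from CareOS. The post does not say how CareOS chunks its frames.

```python
import struct

# Canonical voice format as described in the post.
SAMPLE_RATE_HZ = 16_000   # 16 kHz
CHANNELS = 1              # mono
SAMPLE_WIDTH_BYTES = 2    # signed 16-bit, little-endian

def float_to_pcm16le(samples: list[float]) -> bytes:
    """Clamp float samples to [-1.0, 1.0] and pack them as signed 16-bit little-endian PCM."""
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    # "<" forces little-endian regardless of the host; "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)

def pcm16le_to_float(data: bytes) -> list[float]:
    """Unpack signed 16-bit little-endian PCM back into floats in roughly [-1.0, 1.0]."""
    if len(data) % SAMPLE_WIDTH_BYTES:
        raise ValueError("PCM16 payload must contain whole 2-byte samples")
    count = len(data) // SAMPLE_WIDTH_BYTES
    return [v / 32767 for v in struct.unpack(f"<{count}h", data)]

def frame_bytes(duration_ms: int) -> int:
    """Bytes in one mono frame of the given duration (20 ms below is a hypothetical choice)."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * CHANNELS * SAMPLE_WIDTH_BYTES

print(frame_bytes(20))  # 640
```

At 16 kHz mono 16-bit, a 20 ms frame is 320 samples, or 640 bytes, whether it is leaving the mic or arriving from TTS.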

Nyquist, and where speech actually lives

Nyquist's theorem says a sample rate of Fs can faithfully represent frequencies below Fs/2. Anything above that has to be filtered out before sampling, or it folds back down into the audible band as aliasing. Human speech intelligibility (the formants that distinguish vowels, the high-frequency energy in fricatives like s and f) sits almost entirely under 8 kHz. Sample at 16 kHz and you capture all of it. Music is a different problem: cymbals and the air around a violin live up at 15–20 kHz, which is why 44.1 and 48 kHz exist. A resident asking for help at 2 a.m. is not a string quartet.
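To make the Nyquist limit concrete, here is a small Python sketch. It shows that a tone above Fs/2 produces exactly the same samples as a tone folded below it. The helpers alias_frequency and sampled_tone are illustrative, not part of any CareOS API.

```python
import math

def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a pure tone of f_hz appears after sampling at fs_hz."""
    # Fold f into [0, fs/2] by subtracting the nearest integer multiple of fs.
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

def sampled_tone(f_hz: float, fs_hz: float, n: int) -> list[float]:
    """The first n samples of a unit cosine at f_hz, sampled at fs_hz."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]

# A 9 kHz tone sampled at 16 kHz produces the same samples as a 7 kHz tone.
above = sampled_tone(9_000, 16_000, 32)
below = sampled_tone(7_000, 16_000, 32)
print(alias_frequency(9_000, 16_000))  # 7000
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(above, below)))  # True
```

The same helper shows why an 8 kHz path has to low-pass everything above 4 kHz first. alias_frequency(6000, 8000) is 2000, so unfiltered hiss at 6 kHz would reappear as a spurious 2 kHz component.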

Going down to 8 kHz, the old telephone rate, throws away everything above 4 kHz. Sibilance collapses. Six and fix start to sound alike, and the whole conversation acquires that tinny, far-away quality people associate with a bad phone line. That is exactly the feeling we are trying to avoid at the bedside.

Why uniform beats higher

Most production ASR and TTS models — including the ones we use through the realtime provider abstraction — are trained and served at 16 kHz. Sending 48 kHz audio means the provider resamples it down to 16 kHz before inference anyway. We would triple our upstream bandwidth and add resampling latency on both ends for audio the model literally throws away.
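The "triple" follows directly from the arithmetic for uncompressed PCM: bytes per second is sample rate × channels × bytes per sample. Here is a small sketch of that calculation. pcm_byte_rate is a hypothetical helper, and the figures ignore WebSocket framing overhead and any compression the transport might apply.

```python
def pcm_byte_rate(sample_rate_hz: int, channels: int = 1, sample_width_bytes: int = 2) -> int:
    """Bytes per second of uncompressed PCM (defaults: mono, 16-bit)."""
    return sample_rate_hz * channels * sample_width_bytes

rate_16k = pcm_byte_rate(16_000)  # 32,000 bytes/s
rate_48k = pcm_byte_rate(48_000)  # 96,000 bytes/s
print(f"16 kHz: {rate_16k * 8 / 1000:.0f} kbit/s")  # 16 kHz: 256 kbit/s
print(f"48 kHz: {rate_48k * 8 / 1000:.0f} kbit/s, {rate_48k / rate_16k:.0f}x the upstream")
# 48 kHz: 768 kbit/s, 3x the upstream
```

At 16 kHz that works out to roughly 1.9 MB for each minute of conversation. At 48 kHz it is three times that, and the extra two-thirds is audio the provider discards when it resamples.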

Keeping one rate across mic, network, and playback means no FFmpeg libswresample (swr_convert) or libsamplerate calls in the hot path, no fractional-delay artifacts, and no buffer-size mismatches between capture and TTS. The conversation sounds close and warm, and it still works in the basement room with one bar of wifi, because we are sending the bytes that matter and none of the ones that do not.
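With only one format in play, the edges can check audio instead of converting it. Here is a minimal sketch, assuming audio sometimes arrives wrapped in a WAV container (for example, a recorded test fixture). It uses Python's standard wave module. require_canonical_wav and make_wav are hypothetical helpers, not taken from CareOS.

```python
import io
import wave

# (channels, sample width in bytes, frames per second): 16 kHz mono 16-bit.
CANONICAL = (1, 2, 16_000)

def require_canonical_wav(data: bytes) -> bytes:
    """Return the PCM payload of a WAV blob, or raise if it is not 16 kHz mono 16-bit."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        params = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if params != CANONICAL:
            # Reject instead of resampling, so no converter sneaks into the hot path.
            raise ValueError(f"expected (channels, width, rate) {CANONICAL}, got {params}")
        return wav.readframes(wav.getnframes())

def make_wav(pcm: bytes, rate_hz: int = 16_000, channels: int = 1) -> bytes:
    """Wrap 16-bit PCM in a WAV container (used here to build test inputs)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Failing loudly at the boundary means a misconfigured 44.1 kHz device shows up as an error in testing. The alternative is a resampler quietly added to the live path.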

audio · format · speech
