Engineering · May 6, 2025 · 2 min read

One wire protocol, many voices.

Why Companion firmware speaks only OpenAI Realtime — and how the server quietly swaps the model behind it.

Voice models move fast. ElevenLabs ships a warmer turn-taking model on a Tuesday; OpenAI lowers Realtime latency a week later; Grok lands a new voice the following month. If the ESP32-S3 inside every Companion is wired directly to one vendor's protocol, every one of those improvements becomes an OTA campaign across hundreds of bedside devices. That is the wrong place for the seam.

So we drew the seam on the server. The CoreS3 firmware speaks exactly one wire format — the OpenAI Realtime event schema — over a WebSocket to our Go API. That is the only protocol it will ever know. The API is the translator.
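
To make "one wire format" concrete, here is a minimal sketch of how a server might read a single frame from the device. It assumes the JSON shape of the Realtime client event `input_audio_buffer.append` (a `type` discriminator plus a base64 `audio` field); the `Envelope`, `AudioAppend`, and `DecodeAudio` names are ours for illustration, not the production code.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Envelope holds the one field every Realtime-style event shares:
// a "type" string that says which event this is.
type Envelope struct {
	Type string `json:"type"`
}

// AudioAppend mirrors the client event "input_audio_buffer.append".
// Hypothetical struct name; only the JSON field names matter on the wire.
type AudioAppend struct {
	Type  string `json:"type"`
	Audio string `json:"audio"` // base64-encoded audio bytes
}

// DecodeAudio returns the raw audio bytes when frame is an audio-append
// event. For any other well-formed event it returns ok=false and no error.
func DecodeAudio(frame []byte) (pcm []byte, ok bool, err error) {
	var env Envelope
	if err := json.Unmarshal(frame, &env); err != nil {
		return nil, false, fmt.Errorf("decode envelope: %w", err)
	}
	if env.Type != "input_audio_buffer.append" {
		return nil, false, nil
	}
	var ev AudioAppend
	if err := json.Unmarshal(frame, &ev); err != nil {
		return nil, false, fmt.Errorf("decode audio event: %w", err)
	}
	pcm, err = base64.StdEncoding.DecodeString(ev.Audio)
	if err != nil {
		return nil, false, fmt.Errorf("decode audio payload: %w", err)
	}
	return pcm, true, nil
}

func main() {
	// A frame as the device might send it over the WebSocket.
	frame := []byte(`{"type":"input_audio_buffer.append","audio":"AAEC"}`)
	pcm, ok, err := DecodeAudio(frame)
	fmt.Println(pcm, ok, err)
}
```

Peeking at `type` first and decoding the full event second keeps the hot path cheap: events the server does not care about are skipped without allocating their payloads.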

The adapter layer

Behind the socket, every upstream provider implements a single Go interface, Realtime, with methods for session lifecycle, audio in, audio out, tool calls, and interruption. ElevenLabs ConvAI, OpenAI Realtime, and Grok each have their own adapter that maps our internal events to their dialect and back. It is the adapter pattern wearing hexagonal clothes: the Realtime interface is the port, and the provider adapters are interchangeable implementations behind it.

Routing is per-device. When a Companion connects, we resolve its provider config from Firestore, cache it in Redis, and instantiate the matching adapter for that session. A facility piloting ElevenLabs and one on OpenAI can sit in the same rack. Switching a device is a Firestore write, not a firmware push.
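
The connect-time flow described above (look up config, cache it, pick an adapter) can be sketched as below. This is illustrative only: `ConfigStore` stands in for the Firestore lookup, a mutex-guarded map stands in for Redis, and `Router`, `Register`, `AdapterFor`, and `Invalidate` are hypothetical names. How the real cache expires or is invalidated is not stated here, so the explicit `Invalidate` call is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ConfigStore stands in for the Firestore lookup of a device's provider.
type ConfigStore interface {
	ProviderFor(deviceID string) (string, error)
}

// mapStore is an in-memory ConfigStore used for this sketch.
type mapStore map[string]string

func (m mapStore) ProviderFor(id string) (string, error) {
	p, ok := m[id]
	if !ok {
		return "", errors.New("unknown device " + id)
	}
	return p, nil
}

// Adapter is a placeholder for the per-provider Realtime implementation.
type Adapter interface{ Name() string }

type namedAdapter string

func (n namedAdapter) Name() string { return string(n) }

// Router resolves a device to a fresh adapter for each session.
type Router struct {
	store     ConfigStore
	factories map[string]func() Adapter // provider name -> constructor
	mu        sync.Mutex
	cache     map[string]string // deviceID -> provider; stands in for Redis
}

func NewRouter(store ConfigStore) *Router {
	return &Router{
		store:     store,
		factories: make(map[string]func() Adapter),
		cache:     make(map[string]string),
	}
}

// Register adds a provider. Intended to be called at startup, before
// any sessions are served, so it does not take the lock.
func (r *Router) Register(provider string, f func() Adapter) {
	r.factories[provider] = f
}

// AdapterFor checks the cache, falls back to the store on a miss,
// then instantiates the adapter registered for that provider.
func (r *Router) AdapterFor(deviceID string) (Adapter, error) {
	r.mu.Lock()
	provider, hit := r.cache[deviceID]
	r.mu.Unlock()
	if !hit {
		p, err := r.store.ProviderFor(deviceID)
		if err != nil {
			return nil, fmt.Errorf("resolve provider: %w", err)
		}
		provider = p
		r.mu.Lock()
		r.cache[deviceID] = provider
		r.mu.Unlock()
	}
	f, ok := r.factories[provider]
	if !ok {
		return nil, fmt.Errorf("no adapter registered for %q", provider)
	}
	return f(), nil
}

// Invalidate drops the cached entry so a config change is picked up
// on the device's next connection (assumed mechanism).
func (r *Router) Invalidate(deviceID string) {
	r.mu.Lock()
	delete(r.cache, deviceID)
	r.mu.Unlock()
}

func main() {
	store := mapStore{"companion-001": "openai"}
	r := NewRouter(store)
	r.Register("openai", func() Adapter { return namedAdapter("openai") })
	r.Register("elevenlabs", func() Adapter { return namedAdapter("elevenlabs") })

	a, err := r.AdapterFor("companion-001")
	if err != nil {
		fmt.Println("route:", err)
		return
	}
	fmt.Println("first session uses", a.Name())

	// Switching the device is a config write plus a cache drop.
	store["companion-001"] = "elevenlabs"
	r.Invalidate("companion-001")
	a, err = r.AdapterFor("companion-001")
	if err != nil {
		fmt.Println("route:", err)
		return
	}
	fmt.Println("next session uses", a.Name())
}
```

Adding a provider in this shape is one more `Register` call next to a new adapter type, which matches the "new adapter, one deploy" goal: nothing on the device changes.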

Why centralize the translation

On-device abstraction would have meant shipping three SDKs to a microcontroller with finite flash, and re-shipping them every time a provider revs their schema. Centralizing on the server keeps the firmware boring and the cloud sharp. New provider? New adapter, one deploy.

For the resident in 214B, none of this is visible. The night a vendor releases a more natural voice, the Companion on her overbed table sounds a little warmer when she asks for water. No update prompt, no downtime, no one in the hallway with a laptop. Just a better conversation, the same evening.

realtime · architecture · voice
