EngineeringAugust 26, 2025·3 min read

Companion has to remember yesterday without re-reading it every turn.

Multi-turn memory for one resident, across a conversation and across days — what we keep verbatim, what we summarize, and what we deliberately throw away.

The cruelest failure for a bedside companion is forgetting. A resident tells Companion on Monday that her sister passed; on Tuesday Companion cheerfully asks how her sister is doing. There is no clever recovery from that. So memory isn't a feature we add for polish — it's the thing that decides whether the resident believes there is anyone there at all. And it fights, directly, with cost and latency.

Three layers, three lifetimes

We don't have one memory. We have three, each with a different lifetime and a different storage strategy.

Working turn buffer — the last several exchanges, kept verbatim in the live Realtime session. This is what makes pronouns and and then? work. It lives only for the conversation.
Resident profile — durable facts that should never decay: name, room, who the family is, hard preferences, standing clinical flags. Small, structured, in Firestore, hot in Redis.
Conversation summaries — a few sentences distilled from each day's talk. Asked about her daughter Carol's visit. Mentioned shoulder pain on the left side. Mood low in the evening. These are the bridge between days.

When a session opens, Companion is primed with the profile plus the last handful of summaries — not the raw transcripts. The resident experiences continuity; the context window only ever carries a few hundred tokens of history instead of a week of dialogue.

Why we summarize instead of just keeping everything

It is tempting to stuff the full history into every prompt — models have long context now. We don't, for three reasons. Cost: every token of history is re-billed on every single turn, and a chatty resident over a month is a lot of turns. Latency: a longer prompt is a slower time-to-first-token, and we are already fighting an 800ms budget. Quality: buried in ten thousand tokens of small talk, the one line that mattered — my chest felt tight last night — gets diluted. A tight summary keeps the signal near the top where the model actually attends to it.

Summarization is its own risk surface, so it isn't fully autonomous. The summary the model writes is an internal artifact for continuity, not a clinical record. Anything that looks like a symptom or a safety signal is also captured as a structured event and flows into the nurse-reviewed SOAP pipeline — because paraphrased by a model is not a standard we'd accept for a chart.

What we throw away on purpose

Forgetting is also a design choice. We do not retain audio — only events and summaries — so most of any given conversation simply evaporates, which is the privacy posture we want at a bedside. And we let day-to-day chatter decay: what she watched on TV last Tuesday should fade exactly the way it would for a human friend. The art is keeping the things a person would keep — Carol's name, the bad shoulder, the sister — and letting the rest go, so that the resident in 214B feels remembered rather than recorded.

conversational-aimemorycontext

Three layers, three lifetimes

Why we summarize instead of just keeping everything

What we throw away on purpose

30 days. One wing. Your numbers.