- Short-term buffer: Retain the last N turns (token-budget aware). Used for immediate coherence.
- Conversation summarizer: Rolling summaries compressed every M turns to reduce context size while preserving intent and entities.
- Long-term store: Semantic vectors in pgvector/Qdrant keyed by user/session, with metadata (topics, referenced URLs/files, task types), plus user preference embeddings for tone/style. Retrieval pulls the top-K matches by similarity plus recency-weighted items.
- Entity store: Lightweight table for canonical entities (people, projects, repos) to keep names/pronouns consistent, with user persona facets (tone, verbosity, formality, interests).
- Artifacts cache: Links to generated images/meshes and code snippets with fingerprints for reuse.
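
The token-budget-aware short-term buffer can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names `Turn`, `ShortTermBuffer`, `max_turns`, and `token_budget` are hypothetical, and `count_tokens` is a whitespace-based stand-in for whatever tokenizer the target model actually uses.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough proxy for model tokens.
    return len(text.split())


@dataclass
class ShortTermBuffer:
    max_turns: int = 8        # the "last N turns" limit
    token_budget: int = 200   # hypothetical budget for the buffer
    turns: deque = field(default_factory=deque)

    def total_tokens(self) -> int:
        return sum(count_tokens(t.text) for t in self.turns)

    def append(self, turn: Turn) -> None:
        self.turns.append(turn)
        self._trim()

    def _trim(self) -> None:
        # Drop the oldest turns until both limits hold.
        # The newest turn is always kept, even if it alone exceeds the budget.
        while len(self.turns) > 1 and (
            len(self.turns) > self.max_turns
            or self.total_tokens() > self.token_budget
        ):
            self.turns.popleft()
```

Evicted turns would normally be handed to the conversation summarizer rather than discarded; that hand-off is omitted here.
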
- Definition: A checkpoint is a named snapshot containing: conversation summary, latest user goals, selected entities, retrieved docs, and pending tasks.
- Operations:
  - save_checkpoint(name, notes?) → pins current snapshot.
  - load_checkpoint(name) → restores snapshot into active memory and surfaces notes to the model.
  - list_checkpoints() → returns available names + timestamps.
  - delete_checkpoint(name) → optional cleanup.
- Storage layout: Checkpoints are stored as relational DB rows with JSONB payloads; large docs are referenced by vector ID to avoid duplication.
- Client UX: The UI exposes a dropdown/command palette to select or save checkpoints; the CLI accepts a --checkpoint <name> flag. Users can also select a persona (e.g., “concise”, “playful”, “formal”) and pin manual preference notes into checkpoints.
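
The checkpoint operations above can be sketched with an in-memory store. This is an illustrative sketch under stated assumptions: `Snapshot` and `CheckpointStore` are hypothetical names, a dict stands in for the relational table, and a JSON string stands in for the JSONB payload. Unlike the `save_checkpoint(name, notes?)` signature listed above, this version takes the snapshot as an explicit argument so the example stays self-contained.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Snapshot:
    summary: str = ""
    goals: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    doc_vector_ids: list = field(default_factory=list)  # references, not full docs
    pending_tasks: list = field(default_factory=list)


class CheckpointStore:
    def __init__(self):
        # name -> row; a real deployment would use a database table instead.
        self._rows = {}

    def save_checkpoint(self, name, snapshot, notes=None):
        self._rows[name] = {
            "payload": json.dumps(asdict(snapshot)),  # stand-in for JSONB
            "notes": notes,
            "created_at": time.time(),
        }

    def load_checkpoint(self, name):
        # Raises KeyError if the checkpoint does not exist.
        row = self._rows[name]
        return Snapshot(**json.loads(row["payload"])), row["notes"]

    def list_checkpoints(self):
        # Returns (name, timestamp) pairs sorted by name.
        return sorted((name, row["created_at"]) for name, row in self._rows.items())

    def delete_checkpoint(self, name):
        # Returns True if a checkpoint was removed, False if it did not exist.
        return self._rows.pop(name, None) is not None
```

Storing only `doc_vector_ids` in the payload mirrors the storage layout described above: large documents stay in the vector store and the checkpoint carries references to them.
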
- Decide if external context is required (classifier or heuristic: unknown entities, stale data, news-like queries).
- Issue web search via provider; fetch top results with scraper that strips boilerplate and enforces allow/deny lists.
- Chunk, embed, and store fetched docs with provenance (URL, title, timestamp).
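
The first and third steps above can be sketched as follows; the search and scraping step is provider-specific and is left out. Everything here is an assumption for illustration: the `NEWS_HINTS` keyword list, the capitalized-word entity heuristic, the `Chunk` type, and fixed-size word chunking are placeholders for whatever classifier and chunker the system actually uses.

```python
import string
from dataclasses import dataclass

# Hypothetical keyword list for "news-like" or freshness-sensitive queries.
NEWS_HINTS = ("today", "latest", "news", "this week", "current")


def needs_external_context(query: str, known_entities: set) -> bool:
    lowered = query.lower()
    if any(hint in lowered for hint in NEWS_HINTS):
        return True
    # Crude entity guess: capitalized words after the first word of the query.
    candidates = {
        word.strip(string.punctuation)
        for word in query.split()[1:]
        if word[:1].isupper()
    }
    candidates.discard("")
    # Any candidate we have never seen suggests we should search.
    return bool(candidates - known_entities)


@dataclass
class Chunk:
    text: str
    url: str
    title: str
    fetched_at: str  # ISO 8601 timestamp of the fetch


def chunk_with_provenance(text, url, title, fetched_at, max_words=50):
    # Fixed-size word windows; every chunk carries the same provenance fields.
    words = text.split()
    return [
        Chunk(" ".join(words[i:i + max_words]), url, title, fetched_at)
        for i in range(0, len(words), max_words)
    ]
```

Each `Chunk` would then be embedded and written to the long-term store with its provenance fields as metadata.
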
- At response time, merge:
  - Short-term buffer
  - Rolling summary
  - Retrieved docs (RAG)
  - Checkpoint snapshot (if provided)
  - User preference/persona vectors to steer tone, verbosity, and stylistic choices
- Track citations/provenance for transparency in UI.
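
One possible shape for the merge step is sketched below. The function name `assemble_context`, the `RetrievedDoc` type, the bracketed section labels, and the section ordering are all assumptions, not a prescribed prompt format. Persona vectors are assumed to have been resolved upstream into a plain-text style hint.

```python
from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    text: str
    url: str
    title: str


def assemble_context(short_term, summary, docs,
                     checkpoint_summary=None, persona_hint=None):
    """Merge memory sources into one prompt string plus a citation list."""
    sections = []
    citations = []
    if persona_hint:
        sections.append(f"[style]\n{persona_hint}")
    if checkpoint_summary:
        sections.append(f"[checkpoint]\n{checkpoint_summary}")
    if summary:
        sections.append(f"[summary]\n{summary}")
    for i, doc in enumerate(docs, start=1):
        # Number each doc so the model can cite it and the UI can link it.
        sections.append(f"[doc {i}] {doc.title}\n{doc.text}")
        citations.append({"id": i, "url": doc.url, "title": doc.title})
    if short_term:
        # Recent turns go last so they sit closest to the model's response.
        sections.append("[recent turns]\n" + "\n".join(short_term))
    return "\n\n".join(sections), citations
```

The returned citation list is what the UI would render for provenance; a production version would also enforce an overall token budget across sections.
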
- PostgreSQL for checkpoints, entity store, and metadata; Redis for hot buffers.
- Background jobs to age out stale memories, rebuild summaries, and re-embed changed checkpoints.
- Multi-tenant isolation: Namespace user/org IDs in vector and relational stores; encrypt sensitive fields at rest.
- Memory regression tests: Synthetic dialogs to ensure checkpoint restore injects correct entities/goals.
- Latency metrics: Track retrieval + embedding timing; alert on p95 spikes.
- Quality probes: Evaluate answers with/without checkpoints to verify uplift.
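
The p95 latency alert can be sketched with a nearest-rank percentile. The function names and the threshold are hypothetical; a real deployment would more likely rely on its metrics backend's percentile aggregation than compute this in application code.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of timings."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), avoiding float rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_alert(samples_ms, threshold_ms):
    # Alert when the p95 of retrieval + embedding time exceeds the threshold.
    return p95(samples_ms) > threshold_ms
```

With 20 samples the rank is 19, so a single slow outlier does not trigger an alert but two do, which is the behavior usually wanted from a p95 check.
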