Abby 2.0 Phase 1: The Memory Foundation — She Remembers Who You Are
Abby now builds a persistent research profile for every user, tracks conversation topics across turns, and assembles context through a ranked, budget-aware pipeline. Phase 1 of the Abby 2.0 cognitive architecture lays the memory foundation — moving from stateless Q&A to a personalized research assistant that gets better with every interaction.
The Problem: Amnesia After Every Session
Abby 1.0 was impressive — RAG-enabled, database-aware, streaming, 18 page-specific personas. But she had a fundamental flaw: she forgot everything between sessions. ChromaDB stored conversations with a 90-day TTL, but that data was only used for retrieval similarity, not for understanding who you are.
Every conversation started from zero. A senior epidemiologist got the same explanations as a first-year student. A researcher who asked about diabetes every day never got "Welcome back — still working on that T2DM cohort?" The context pipeline dumped everything into the prompt without prioritization — help docs, RAG results, live queries, page data — hoping the LLM would sort it out within a 4K token window.
The Architecture: Four-Tier Memory with Budget-Aware Assembly
Phase 1 introduces a cognitive memory system inspired by human memory models:
Working Memory (session) → Intent stack + scratch pad
Episodic Memory (per-user) → Research profile + conversation archive
Semantic Memory (domain) → Existing RAG collections (unchanged)
Institutional Memory (org) → Phase 6 (future)
1. Intent Stack — Tracking What You're Actually Doing
Conversations aren't random — they have threads. When you ask "What's the diabetes prevalence?" and then "Break it down by age," the second question is about diabetes, not a fresh topic.
The IntentStack is a bounded stack (max depth 3) that tracks active conversation topics using domain keyword detection across 10 clinical research areas. Topics expire after 10 turns of inactivity, and explicit topic changes ("Now let's look at hypertension") clear the stack.
```python
stack = IntentStack(max_depth=3, expiry_turns=10)
stack.push("diabetes", turn=1)
stack.push("metformin", turn=2)
stack.get_context_string()
# → "Active conversation topics: diabetes, metformin"
```
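A bounded stack like this fits in a few lines. The sketch below is an illustrative reconstruction under the constraints described above (max depth 3, 10-turn expiry, explicit clearing), not the shipped implementation; the `_Topic` helper and the `expire()`/`clear()` method names are assumptions:

```python
from dataclasses import dataclass


@dataclass
class _Topic:
    name: str
    last_turn: int


class IntentStack:
    """Bounded stack of active conversation topics (sketch)."""

    def __init__(self, max_depth=3, expiry_turns=10):
        self.max_depth = max_depth
        self.expiry_turns = expiry_turns
        self._topics = []

    def push(self, topic, turn):
        # Refresh the turn counter if the topic is already active.
        for t in self._topics:
            if t.name == topic:
                t.last_turn = turn
                return
        self._topics.append(_Topic(topic, turn))
        # Evict the oldest topic when the stack overflows max_depth.
        if len(self._topics) > self.max_depth:
            self._topics.pop(0)

    def expire(self, current_turn):
        # Drop topics inactive for more than expiry_turns.
        self._topics = [t for t in self._topics
                        if current_turn - t.last_turn <= self.expiry_turns]

    def clear(self):
        # Explicit topic changes ("Now let's look at hypertension") clear the stack.
        self._topics = []

    def get_context_string(self):
        if not self._topics:
            return ""
        names = ", ".join(t.name for t in self._topics)
        return f"Active conversation topics: {names}"
```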
2. Research Profile Learning — Understanding Who You Are
Every conversation teaches Abby something about you. The ProfileLearner extracts research interests, interaction preferences, and expertise levels using keyword matching and pattern detection — no LLM calls required.
```python
# After a few conversations, Abby learns:
# - Research interests: diabetes, cardiovascular
# - Prefers: terse responses (detected "just give me the SQL")
# - Expertise: epidemiology (0.82)
# - Frequently used: T2DM concept sets, metformin cohorts
```
Key design decisions:
- Immutable updates — `learn_from_conversation()` returns a new profile object, never mutates the input
- Calibration requires 5+ interactions — a single basic question doesn't downgrade an expert
- Exponential decay weighting — recent interactions matter more than old ones
- User-visible and editable — the "My Research Profile" panel shows what Abby has learned, with a reset button
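The immutable-update and decay-weighting decisions can be sketched together. The code below illustrates only the interest-learning step under those constraints; the `ResearchProfile` fields, the `DECAY` factor, and the tiny `INTEREST_KEYWORDS` map are assumptions for illustration, not the shipped `ProfileLearner`:

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)  # frozen: updates must build a new object
class ResearchProfile:
    interests: dict = field(default_factory=dict)  # topic -> weight
    interaction_count: int = 0


DECAY = 0.9  # assumed decay factor; older interests fade on each update

INTEREST_KEYWORDS = {  # assumed keyword map (real one covers 10 domains)
    "diabetes": {"diabetes", "t2dm", "metformin", "hba1c"},
    "cardiovascular": {"hypertension", "statin", "cardiovascular"},
}


def learn_from_conversation(profile: ResearchProfile, text: str) -> ResearchProfile:
    """Return a NEW profile; the input profile is never mutated."""
    words = set(text.lower().split())
    # Exponential decay: every existing interest loses weight first.
    interests = {topic: w * DECAY for topic, w in profile.interests.items()}
    # Keyword matching: topics mentioned this turn gain weight.
    for topic, keywords in INTEREST_KEYWORDS.items():
        if words & keywords:
            interests[topic] = interests.get(topic, 0.0) + 1.0
    return replace(profile,
                   interests=interests,
                   interaction_count=profile.interaction_count + 1)
```

Because `ResearchProfile` is frozen, `learn_from_conversation()` cannot mutate its input even accidentally; recent mentions dominate because old weights shrink by `DECAY` on every update.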
3. Context Assembly Pipeline — Every Token Counts
The old approach: concatenate help docs + RAG results + live queries + page data and hope for the best.
The new approach: a ranked, budget-aware assembler that scores every piece of context by relevance and allocates tokens within strict per-tier budgets:
| Tier | MedGemma Budget | Priority |
|---|---|---|
| Working Memory | 1,500 tokens | Highest — always included |
| Page Context | 500 tokens | High |
| Live Database | 800 tokens | High (when triggered) |
| Episodic Memory | 400 tokens | Medium |
| Domain Knowledge | 600 tokens | Medium |
| Institutional | 200 tokens | Lower |
Safety-critical context (data quality warnings) gets guaranteed minimum allocation regardless of budget pressure. The assembler produces clean, sectioned prompts:
```
## Working Memory
Active conversation topics: diabetes, metformin

## User History
Research interests: diabetes, cardiovascular; Expertise: epidemiology (82%)

## Domain Knowledge
[Top RAG results by relevance...]

## Live Database Context
[Intent-triggered query results...]
```
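A budget-aware assembler along these lines can be sketched as follows. The per-tier budgets come from the table above; the `estimate_tokens` heuristic, the candidate tuple format, and the guaranteed safety section are illustrative assumptions, not the shipped `ContextAssembler`:

```python
# Tier -> token budget, in priority order (from the MedGemma table above).
BUDGETS = {
    "Working Memory": 1500,
    "Page Context": 500,
    "Live Database": 800,
    "Episodic Memory": 400,
    "Domain Knowledge": 600,
    "Institutional": 200,
}


def estimate_tokens(text: str) -> int:
    # Crude estimate: ~4 characters per token (assumption).
    return max(1, len(text) // 4)


def assemble(candidates, safety_notes=()):
    """candidates: list of (tier, relevance_score, text) tuples.

    Highest-relevance items are admitted first within each tier's budget;
    safety notes are always included regardless of budget pressure.
    Returns a clean, sectioned prompt string.
    """
    sections, spent = {}, {tier: 0 for tier in BUDGETS}
    for tier, _score, text in sorted(candidates, key=lambda c: -c[1]):
        cost = estimate_tokens(text)
        if spent.get(tier, 0) + cost <= BUDGETS.get(tier, 0):
            sections.setdefault(tier, []).append(text)
            spent[tier] += cost
    parts = [f"## Safety\n{note}" for note in safety_notes]
    for tier in BUDGETS:  # emit sections in fixed priority order
        if tier in sections:
            parts.append(f"## {tier}\n" + "\n".join(sections[tier]))
    return "\n\n".join(parts)
```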
4. PostgreSQL-Backed Conversation Archive
ChromaDB's 90-day TTL conversations are migrated to PostgreSQL with pgvector embeddings. The abby_messages table now carries a vector(384) column with an HNSW cosine index for fast similarity search.
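With pgvector, a cosine similarity lookup over that column is a single `ORDER BY` on the distance operator. The sketch below keeps the SQL as a plain string and injects the execution callable so it works with any driver; the table and column names follow the post, while the SQL shape, parameter names, and `execute` signature are assumptions:

```python
# pgvector's <=> operator is cosine distance, which the HNSW index accelerates.
SIMILARITY_SQL = """
SELECT id, content, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM abby_messages
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def find_similar(execute, query_embedding, k=5):
    """execute: callable(sql, params) -> rows, e.g. a thin cursor wrapper.

    Serializes the embedding into pgvector's '[x,y,...]' literal form.
    """
    vec = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"
    return execute(SIMILARITY_SQL, {"query": vec, "k": k})
```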
During the 2-week migration period, the MigrationBridge queries PostgreSQL first and falls back to ChromaDB for older data, with automatic deduplication across sources.
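The dual-read pattern reduces to a few lines once the two stores are behind callables. This is a sketch of the idea, not the shipped `MigrationBridge`: `pg_fetch` and `chroma_fetch` are assumed injected functions returning `(message_id, text)` pairs, and deduplication here is by message id:

```python
def dual_read(pg_fetch, chroma_fetch, query, limit=10):
    """Query PostgreSQL first; fall back to ChromaDB for older data,
    deduplicating across sources by message id (sketch)."""
    results = list(pg_fetch(query, limit))
    if len(results) < limit:
        seen = {msg_id for msg_id, _ in results}
        for msg_id, text in chroma_fetch(query, limit):
            if msg_id in seen:
                continue  # same message already served from PostgreSQL
            results.append((msg_id, text))
            if len(results) >= limit:
                break
    return results
```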
What's New in the Stack
Database
- `abby_user_profiles` table with `TEXT[]` and `JSONB` columns for flexible profile storage
- `embedding vector(384)` column + HNSW index on `abby_messages` for semantic search
- Profile API: GET/PUT/POST reset under `/api/v1/abby/profile`
Python Memory Module (7 components)
- `IntentStack` — bounded topic tracking with turn-based expiry
- `ScratchPad` — session-scoped artifact storage (SQL drafts, cohort specs)
- `ContextAssembler` — ranked, budget-aware prompt construction
- `ProfileLearner` — keyword-based research interest and preference extraction
- `ConversationStore` — PostgreSQL + pgvector similarity search
- `ConversationSummarizer` — context window management via turn compression
- `MigrationBridge` — dual-read for the ChromaDB-to-PostgreSQL transition
Frontend
- My Research Profile panel in the Abby chat interface — shows learned interests as teal tags, expertise as progress bars, and interaction preferences
Test Coverage
41 new unit tests across all 7 memory components + 3 integration tests verifying end-to-end pipeline behavior. 114 tests passing across the Python AI service.
What's Next: Phase 2 — Intelligence Upgrade
Phase 1 gives Abby memory. Phase 2 gives her a bigger brain.
The hybrid LLM architecture will route simple queries to MedGemma (local, fast) and complex reasoning to Claude API (cloud, powerful) through a two-stage classifier. A PHI sanitization layer ensures no patient data leaves the network, and a cost control system with circuit breaker prevents budget overruns.
The context assembly pipeline is already designed for extension — ContextAssembler.for_model("claude") will unlock a 128K-token budget profile with cloud-safety filtering.
Six phases total. Memory is the foundation everything else builds on.
