Abby 2.0 Phase 1: The Memory Foundation — She Remembers Who You Are
Abby now builds a persistent research profile for every user, tracks conversation topics across turns, and assembles context through a ranked, budget-aware pipeline. Phase 1 of the Abby 2.0 cognitive architecture lays the memory foundation — moving from stateless Q&A to a personalized research assistant that gets better with every interaction.
The Problem: Amnesia After Every Session
Abby 1.0 was impressive — RAG-enabled, database-aware, streaming, 18 page-specific personas. But she had a fundamental flaw: she forgot everything between sessions. ChromaDB stored conversations with a 90-day TTL, but that data was only used for retrieval similarity, not for understanding who you are.
Every conversation started from zero. A senior epidemiologist got the same explanations as a first-year student. A researcher who asked about diabetes every day never got "Welcome back — still working on that T2DM cohort?" The context pipeline dumped everything into the prompt without prioritization — help docs, RAG results, live queries, page data — hoping the LLM would sort it out within a 4K token window.
The Architecture: Four-Tier Memory with Budget-Aware Assembly
Phase 1 introduces a cognitive memory system inspired by human memory models:
Working Memory (session) → Intent stack + scratch pad
Episodic Memory (per-user) → Research profile + conversation archive
Semantic Memory (domain) → Existing RAG collections (unchanged)
Institutional Memory (org) → Phase 6 (future)
1. Intent Stack — Tracking What You're Actually Doing
Conversations aren't random — they have threads. When you ask "What's the diabetes prevalence?" and then "Break it down by age," the second question is about diabetes, not a fresh topic.
The IntentStack is a bounded stack (max depth 3) that tracks active conversation topics using domain keyword detection across 10 clinical research areas. Topics expire after 10 turns of inactivity, and explicit topic changes ("Now let's look at hypertension") clear the stack.
```python
stack = IntentStack(max_depth=3, expiry_turns=10)
stack.push("diabetes", turn=1)
stack.push("metformin", turn=2)
stack.get_context_string()
# → "Active conversation topics: diabetes, metformin"
```
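A bounded stack like this fits in a few lines. The sketch below is an illustrative reconstruction under the constraints described above (max depth 3, 10-turn expiry, explicit clearing), not the shipped implementation; the `_Topic` helper and the `expire()`/`clear()` method names are assumptions:

```python
from dataclasses import dataclass


@dataclass
class _Topic:
    name: str
    last_turn: int


class IntentStack:
    """Bounded stack of active conversation topics (sketch)."""

    def __init__(self, max_depth=3, expiry_turns=10):
        self.max_depth = max_depth
        self.expiry_turns = expiry_turns
        self._topics = []

    def push(self, topic, turn):
        # Refresh the turn counter if the topic is already active.
        for t in self._topics:
            if t.name == topic:
                t.last_turn = turn
                return
        self._topics.append(_Topic(topic, turn))
        # Evict the oldest topic when the stack overflows max_depth.
        if len(self._topics) > self.max_depth:
            self._topics.pop(0)

    def expire(self, current_turn):
        # Drop topics inactive for more than expiry_turns.
        self._topics = [t for t in self._topics
                        if current_turn - t.last_turn <= self.expiry_turns]

    def clear(self):
        # Explicit topic changes ("Now let's look at hypertension") clear the stack.
        self._topics = []

    def get_context_string(self):
        if not self._topics:
            return ""
        names = ", ".join(t.name for t in self._topics)
        return f"Active conversation topics: {names}"
```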
2. Research Profile Learning — Understanding Who You Are
Every conversation teaches Abby something about you. The ProfileLearner extracts research interests, interaction preferences, and expertise levels using keyword matching and pattern detection — no LLM calls required.
```python
# After a few conversations, Abby learns:
# - Research interests: diabetes, cardiovascular
# - Prefers: terse responses (detected "just give me the SQL")
# - Expertise: epidemiology (0.82)
# - Frequently used: T2DM concept sets, metformin cohorts
```
Key design decisions:
- Immutable updates — `learn_from_conversation()` returns a new profile object, never mutates the input
- Calibration requires 5+ interactions — a single basic question doesn't downgrade an expert
- Exponential decay weighting — recent interactions matter more than old ones
- User-visible and editable — the "My Research Profile" panel shows what Abby has learned, with a reset button
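The immutable-update and decay-weighting decisions can be sketched together. The code below illustrates only the interest-learning step under those constraints; the `ResearchProfile` fields, the `DECAY` factor, and the tiny `INTEREST_KEYWORDS` map are assumptions for illustration, not the shipped `ProfileLearner`:

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)  # frozen: updates must build a new object
class ResearchProfile:
    interests: dict = field(default_factory=dict)  # topic -> weight
    interaction_count: int = 0


DECAY = 0.9  # assumed decay factor; older interests fade on each update

INTEREST_KEYWORDS = {  # assumed keyword map (real one covers 10 domains)
    "diabetes": {"diabetes", "t2dm", "metformin", "hba1c"},
    "cardiovascular": {"hypertension", "statin", "cardiovascular"},
}


def learn_from_conversation(profile: ResearchProfile, text: str) -> ResearchProfile:
    """Return a NEW profile; the input profile is never mutated."""
    words = set(text.lower().split())
    # Exponential decay: every existing interest loses weight first.
    interests = {topic: w * DECAY for topic, w in profile.interests.items()}
    # Keyword matching: topics mentioned this turn gain weight.
    for topic, keywords in INTEREST_KEYWORDS.items():
        if words & keywords:
            interests[topic] = interests.get(topic, 0.0) + 1.0
    return replace(profile,
                   interests=interests,
                   interaction_count=profile.interaction_count + 1)
```

Because `ResearchProfile` is frozen, `learn_from_conversation()` cannot mutate its input even accidentally; recent mentions dominate because old weights shrink by `DECAY` on every update.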
3. Context Assembly Pipeline — Every Token Counts
The old approach: concatenate help docs + RAG results + live queries + page data and hope for the best.
The new approach: a ranked, budget-aware assembler that scores every piece of context by relevance and allocates tokens within strict per-tier budgets:
| Tier | MedGemma Budget | Priority |
|---|---|---|
| Working Memory | 1,500 tokens | Highest — always included |
| Page Context | 500 tokens | High |
| Live Database | 800 tokens | High (when triggered) |
| Episodic Memory | 400 tokens | Medium |
| Domain Knowledge | 600 tokens | Medium |
| Institutional | 200 tokens | Lower |
Safety-critical context (data quality warnings) gets guaranteed minimum allocation regardless of budget pressure. The assembler produces clean, sectioned prompts:
```
## Working Memory
Active conversation topics: diabetes, metformin

## User History
Research interests: diabetes, cardiovascular; Expertise: epidemiology (82%)

## Domain Knowledge
[Top RAG results by relevance...]

## Live Database Context
[Intent-triggered query results...]
```
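A budget-aware assembler along these lines can be sketched as follows. The per-tier budgets come from the table above; the `estimate_tokens` heuristic, the candidate tuple format, and the guaranteed safety section are illustrative assumptions, not the shipped `ContextAssembler`:

```python
# Tier -> token budget, in priority order (from the MedGemma table above).
BUDGETS = {
    "Working Memory": 1500,
    "Page Context": 500,
    "Live Database": 800,
    "Episodic Memory": 400,
    "Domain Knowledge": 600,
    "Institutional": 200,
}


def estimate_tokens(text: str) -> int:
    # Crude estimate: ~4 characters per token (assumption).
    return max(1, len(text) // 4)


def assemble(candidates, safety_notes=()):
    """candidates: list of (tier, relevance_score, text) tuples.

    Highest-relevance items are admitted first within each tier's budget;
    safety notes are always included regardless of budget pressure.
    Returns a clean, sectioned prompt string.
    """
    sections, spent = {}, {tier: 0 for tier in BUDGETS}
    for tier, _score, text in sorted(candidates, key=lambda c: -c[1]):
        cost = estimate_tokens(text)
        if spent.get(tier, 0) + cost <= BUDGETS.get(tier, 0):
            sections.setdefault(tier, []).append(text)
            spent[tier] += cost
    parts = [f"## Safety\n{note}" for note in safety_notes]
    for tier in BUDGETS:  # emit sections in fixed priority order
        if tier in sections:
            parts.append(f"## {tier}\n" + "\n".join(sections[tier]))
    return "\n\n".join(parts)
```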
4. PostgreSQL-Backed Conversation Archive
ChromaDB's 90-day TTL conversations are migrated to PostgreSQL with pgvector embeddings. The abby_messages table now carries a vector(384) column with an HNSW cosine index for fast similarity search.
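With pgvector, a cosine similarity lookup over that column is a single `ORDER BY` on the distance operator. The sketch below keeps the SQL as a plain string and injects the execution callable so it works with any driver; the table and column names follow the post, while the SQL shape, parameter names, and `execute` signature are assumptions:

```python
# pgvector's <=> operator is cosine distance, which the HNSW index accelerates.
SIMILARITY_SQL = """
SELECT id, content, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM abby_messages
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def find_similar(execute, query_embedding, k=5):
    """execute: callable(sql, params) -> rows, e.g. a thin cursor wrapper.

    Serializes the embedding into pgvector's '[x,y,...]' literal form.
    """
    vec = "[" + ",".join(f"{x:.6f}" for x in query_embedding) + "]"
    return execute(SIMILARITY_SQL, {"query": vec, "k": k})
```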
During the 2-week migration period, the MigrationBridge queries PostgreSQL first and falls back to ChromaDB for older data, with automatic deduplication across sources.
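The dual-read pattern reduces to a few lines once the two stores are behind callables. This is a sketch of the idea, not the shipped `MigrationBridge`: `pg_fetch` and `chroma_fetch` are assumed injected functions returning `(message_id, text)` pairs, and deduplication here is by message id:

```python
def dual_read(pg_fetch, chroma_fetch, query, limit=10):
    """Query PostgreSQL first; fall back to ChromaDB for older data,
    deduplicating across sources by message id (sketch)."""
    results = list(pg_fetch(query, limit))
    if len(results) < limit:
        seen = {msg_id for msg_id, _ in results}
        for msg_id, text in chroma_fetch(query, limit):
            if msg_id in seen:
                continue  # same message already served from PostgreSQL
            results.append((msg_id, text))
            if len(results) >= limit:
                break
    return results
```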
What's New in the Stack
Database
- `abby_user_profiles` table with `TEXT[]` and `JSONB` columns for flexible profile storage
- `embedding vector(384)` column + HNSW index on `abby_messages` for semantic search
- Profile API: GET/PUT/POST reset under `/api/v1/abby/profile`
Python Memory Module (7 components)
- `IntentStack` — bounded topic tracking with turn-based expiry
- `ScratchPad` — session-scoped artifact storage (SQL drafts, cohort specs)
- `ContextAssembler` — ranked, budget-aware prompt construction
- `ProfileLearner` — keyword-based research interest and preference extraction
- `ConversationStore` — PostgreSQL + pgvector similarity search
- `ConversationSummarizer` — context window management via turn compression
- `MigrationBridge` — dual-read for the ChromaDB-to-PostgreSQL transition
Frontend
- My Research Profile panel in the Abby chat interface — shows learned interests as teal tags, expertise as progress bars, and interaction preferences
Test Coverage
41 new unit tests across all 7 memory components + 3 integration tests verifying end-to-end pipeline behavior. 114 tests passing across the Python AI service.
What's Next: Phase 2 — Intelligence Upgrade
Phase 1 gives Abby memory. Phase 2 gives her a bigger brain.
The hybrid LLM architecture will route simple queries to MedGemma (local, fast) and complex reasoning to Claude API (cloud, powerful) through a two-stage classifier. A PHI sanitization layer ensures no patient data leaves the network, and a cost control system with circuit breaker prevents budget overruns.
The context assembly pipeline is already designed for extension — ContextAssembler.for_model("claude") will unlock a 128K-token budget profile with cloud-safety filtering.
Six phases total. Memory is the foundation everything else builds on.
