
Abby 2.0 Phase 1: The Memory Foundation — She Remembers Who You Are

Creator, Parthenon · AI Development Assistant · 5 min read

Abby now builds a persistent research profile for every user, tracks conversation topics across turns, and assembles context through a ranked, budget-aware pipeline. Phase 1 of the Abby 2.0 cognitive architecture lays the memory foundation — moving from stateless Q&A to a personalized research assistant that gets better with every interaction.


The Problem: Amnesia After Every Session

Abby 1.0 was impressive — RAG-enabled, database-aware, streaming, 18 page-specific personas. But she had a fundamental flaw: she forgot everything between sessions. ChromaDB stored conversations with a 90-day TTL, but that data was only used for retrieval similarity, not for understanding who you are.

Every conversation started from zero. A senior epidemiologist got the same explanations as a first-year student. A researcher who asked about diabetes every day never got "Welcome back — still working on that T2DM cohort?" The context pipeline dumped everything into the prompt without prioritization — help docs, RAG results, live queries, page data — hoping the LLM would sort it out within a 4K token window.


The Architecture: Four-Tier Memory with Budget-Aware Assembly

Phase 1 introduces a cognitive memory system inspired by human memory models:

```text
Working Memory (session)     → Intent stack + scratch pad
Episodic Memory (per-user)   → Research profile + conversation archive
Semantic Memory (domain)     → Existing RAG collections (unchanged)
Institutional Memory (org)   → Phase 6 (future)
```

1. Intent Stack — Tracking What You're Actually Doing

Conversations aren't random — they have threads. When you ask "What's the diabetes prevalence?" and then "Break it down by age," the second question is about diabetes, not a fresh topic.

The IntentStack is a bounded stack (max depth 3) that tracks active conversation topics using domain keyword detection across 10 clinical research areas. Topics expire after 10 turns of inactivity, and explicit topic changes ("Now let's look at hypertension") clear the stack.

```python
stack = IntentStack(max_depth=3, expiry_turns=10)
stack.push("diabetes", turn=1)
stack.push("metformin", turn=2)
stack.get_context_string()
# → "Active conversation topics: diabetes, metformin"
```
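The bounded-depth and expiry semantics described above can be sketched in a few lines. This is a minimal illustration, not Abby's actual implementation; the internal representation and method names beyond those shown in the post are assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class IntentStack:
    """Bounded stack of active conversation topics with turn-based expiry."""
    max_depth: int = 3
    expiry_turns: int = 10
    _topics: list = field(default_factory=list)  # list of (topic, last_turn)

    def push(self, topic: str, turn: int) -> None:
        # Re-pushing an active topic refreshes its turn counter.
        self._topics = [(t, n) for t, n in self._topics if t != topic]
        self._topics.append((topic, turn))
        # Enforce the bounded depth by dropping the oldest topic.
        if len(self._topics) > self.max_depth:
            self._topics.pop(0)

    def prune(self, current_turn: int) -> None:
        # Topics expire after `expiry_turns` turns of inactivity.
        self._topics = [(t, n) for t, n in self._topics
                        if current_turn - n <= self.expiry_turns]

    def clear(self) -> None:
        # Explicit topic changes ("Now let's look at hypertension") reset the stack.
        self._topics = []

    def get_context_string(self) -> str:
        if not self._topics:
            return ""
        return "Active conversation topics: " + ", ".join(t for t, _ in self._topics)
```

With `expiry_turns=10`, a topic last touched on turn 1 drops out once the conversation reaches turn 12, while more recent topics survive.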

2. Research Profile Learning — Understanding Who You Are

Every conversation teaches Abby something about you. The ProfileLearner extracts research interests, interaction preferences, and expertise levels using keyword matching and pattern detection — no LLM calls required.

```python
# After a few conversations, Abby learns:
# - Research interests: diabetes, cardiovascular
# - Prefers: terse responses (detected "just give me the SQL")
# - Expertise: epidemiology (0.82)
# - Frequently used: T2DM concept sets, metformin cohorts
```

Key design decisions:

  • Immutable updates: learn_from_conversation() returns a new profile object, never mutates the input
  • Calibration requires 5+ interactions — a single basic question doesn't downgrade an expert
  • Exponential decay weighting — recent interactions matter more than old ones
  • User-visible and editable — the "My Research Profile" panel shows what Abby has learned, with a reset button
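The first and third of those decisions combine naturally: a frozen profile plus a decay factor applied on every update. The sketch below shows one way that might look; the dataclass fields and the `decay` value are assumptions, not Abby's actual ProfileLearner API:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ResearchProfile:
    """Immutable snapshot of what Abby has learned about a user."""
    interests: dict          # topic -> decayed weight
    interaction_count: int = 0


def learn_from_conversation(profile: ResearchProfile, topics: list,
                            decay: float = 0.9) -> ResearchProfile:
    """Return a NEW profile; the input is never mutated."""
    # Exponential decay: every existing weight shrinks on each update,
    # so recent interactions outweigh old ones.
    interests = {t: w * decay for t, w in profile.interests.items()}
    for t in topics:
        interests[t] = interests.get(t, 0.0) + 1.0
    # Calibration logic (e.g. expertise scoring) would only kick in once
    # interaction_count passes the 5-interaction threshold.
    return replace(profile, interests=interests,
                   interaction_count=profile.interaction_count + 1)
```

Because the profile is frozen, callers holding the old object see no change, which makes the "reset" button in the profile panel a simple pointer swap.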

3. Context Assembly Pipeline — Every Token Counts

The old approach: concatenate help docs + RAG results + live queries + page data and hope for the best.

The new approach: a ranked, budget-aware assembler that scores every piece of context by relevance and allocates tokens within strict per-tier budgets:

| Tier | MedGemma Budget | Priority |
|------|-----------------|----------|
| Working Memory | 1,500 tokens | Highest — always included |
| Page Context | 500 tokens | High |
| Live Database | 800 tokens | High (when triggered) |
| Episodic Memory | 400 tokens | Medium |
| Domain Knowledge | 600 tokens | Medium |
| Institutional | 200 tokens | Lower |

Safety-critical context (data quality warnings) gets a guaranteed minimum allocation regardless of budget pressure. The assembler produces clean, sectioned prompts:

```text
## Working Memory
Active conversation topics: diabetes, metformin

## User History
Research interests: diabetes, cardiovascular; Expertise: epidemiology (82%)

## Domain Knowledge
[Top RAG results by relevance...]

## Live Database Context
[Intent-triggered query results...]
```
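A greedy, priority-ordered allocator captures the core idea: safety-critical chunks reserve their guaranteed minimum first, then each tier spends up to its own budget, capped by what remains of the overall window. The function and constant names below are illustrative, not Abby's actual API, and the safety minimum is an assumed number:

```python
# Per-tier token budgets for the local MedGemma profile (from the table above),
# listed in priority order.
BUDGETS = {
    "working_memory": 1500,
    "page_context": 500,
    "live_database": 800,
    "episodic_memory": 400,
    "domain_knowledge": 600,
    "institutional": 200,
}
SAFETY_MINIMUM = 150  # assumed reserve for data-quality warnings


def assemble(candidates, safety_chunks=(), total_budget=4000):
    """Select context chunks within per-tier and overall budgets.

    candidates: {tier: [(relevance_score, token_cost, text), ...]}
    safety_chunks: [(token_cost, text), ...] with a guaranteed minimum.
    """
    selected, remaining = [], total_budget

    # Safety-critical context is reserved first, regardless of budget pressure.
    reserved = min(SAFETY_MINIMUM, remaining)
    for cost, text in safety_chunks:
        if cost <= reserved:
            selected.append(text)
            reserved -= cost
            remaining -= cost

    # Tiers are walked in priority order; within a tier, best-scored first.
    for tier, tier_budget in BUDGETS.items():
        spend = min(tier_budget, remaining)
        for score, cost, text in sorted(candidates.get(tier, []), reverse=True):
            if cost <= spend:
                selected.append(text)
                spend -= cost
                remaining -= cost
    return selected
```

The key property: a low-priority tier can never starve a high-priority one, because allocation happens strictly in tier order.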

4. PostgreSQL-Backed Conversation Archive

Conversations formerly held in ChromaDB under the 90-day TTL are migrated to PostgreSQL with pgvector embeddings. The abby_messages table now carries a vector(384) column with an HNSW cosine index for fast similarity search.
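The `<=>` cosine-distance operator and HNSW `vector_cosine_ops` indexes are standard pgvector features; the table and column names come from the post, while the helper itself and its column list are illustrative:

```python
def build_similarity_query(limit: int = 5) -> str:
    """SQL for nearest-neighbor search over abby_messages.

    pgvector's `<=>` operator computes cosine distance; with an HNSW index
    created USING hnsw (embedding vector_cosine_ops), the ORDER BY below
    becomes an index scan instead of a full-table distance computation.
    """
    return (
        "SELECT id, content, embedding <=> %(query)s::vector AS distance "
        "FROM abby_messages "
        "ORDER BY embedding <=> %(query)s::vector "
        f"LIMIT {limit}"
    )
```

The query embedding is passed as a bound parameter (`%(query)s`) so the 384-dimensional vector never needs to be interpolated into the SQL string.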

During the 2-week migration period, the MigrationBridge queries PostgreSQL first and falls back to ChromaDB for older data, with automatic deduplication across sources.
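The dual-read pattern is simple enough to sketch. The callable signatures and return shape here are assumptions; MigrationBridge's real interface may differ:

```python
def dual_read(query_embedding, postgres_search, chroma_search, limit=10):
    """Dual-read during migration: PostgreSQL first, ChromaDB as fallback,
    deduplicated by message id across the two sources.

    `postgres_search` / `chroma_search` are callables returning
    [(message_id, text), ...] ordered by similarity.
    """
    results, seen = [], set()
    for source in (postgres_search, chroma_search):
        for msg_id, text in source(query_embedding, limit):
            if msg_id not in seen:      # automatic deduplication
                seen.add(msg_id)
                results.append((msg_id, text))
        if len(results) >= limit:       # PostgreSQL alone may satisfy the query
            break
    return results[:limit]
```

Once the 2-week window closes, the ChromaDB branch is dropped and the bridge collapses to a plain PostgreSQL read.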


What's New in the Stack

Database

  • abby_user_profiles table with TEXT[] and JSONB columns for flexible profile storage
  • embedding vector(384) + HNSW index on abby_messages for semantic search
  • Profile API: GET/PUT/POST reset under /api/v1/abby/profile

Python Memory Module (7 components)

  • IntentStack — bounded topic tracking with turn-based expiry
  • ScratchPad — session-scoped artifact storage (SQL drafts, cohort specs)
  • ContextAssembler — ranked, budget-aware prompt construction
  • ProfileLearner — keyword-based research interest and preference extraction
  • ConversationStore — PostgreSQL + pgvector similarity search
  • ConversationSummarizer — context window management via turn compression
  • MigrationBridge — dual-read for ChromaDB to PostgreSQL transition

Frontend

  • My Research Profile panel in the Abby chat interface — shows learned interests as teal tags, expertise as progress bars, and interaction preferences

Test Coverage

41 new unit tests across all 7 memory components + 3 integration tests verifying end-to-end pipeline behavior. 114 tests passing across the Python AI service.


What's Next: Phase 2 — Intelligence Upgrade

Phase 1 gives Abby memory. Phase 2 gives her a bigger brain.

The hybrid LLM architecture will route simple queries to MedGemma (local, fast) and complex reasoning to Claude API (cloud, powerful) through a two-stage classifier. A PHI sanitization layer ensures no patient data leaves the network, and a cost control system with circuit breaker prevents budget overruns.

The context assembly pipeline is already designed for extension — ContextAssembler.for_model("claude") will unlock a 128K-token budget profile with cloud-safety filtering.
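One way such a factory might plug in, sketched under heavy assumptions: the profile dictionary, field names, and every number except the 128K total are invented for illustration, and the PHI filtering is reduced to a flag:

```python
MODEL_PROFILES = {
    # Per-model context budgets. Only the 128K Claude total comes from the
    # post; the MedGemma figure and the flag names are assumptions.
    "medgemma": {"total": 4_000, "cloud_safe_only": False},
    "claude": {"total": 128_000, "cloud_safe_only": True},
}


class ContextAssembler:
    def __init__(self, total_budget: int, cloud_safe_only: bool):
        self.total_budget = total_budget
        # When True, the PHI sanitization layer filters context before
        # anything leaves the network.
        self.cloud_safe_only = cloud_safe_only

    @classmethod
    def for_model(cls, model: str) -> "ContextAssembler":
        profile = MODEL_PROFILES[model]
        return cls(profile["total"], profile["cloud_safe_only"])
```

The point of the factory shape is that Phase 2 adds a dictionary entry, not a new assembler.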

Six phases total. Memory is the foundation everything else builds on.