Abby AI Assistant

Abby is Parthenon's built-in AI research assistant, powered by MedGemma and backed by a ChromaDB vector knowledge base. Abby helps researchers with concept search, cohort building suggestions, clinical result interpretation, genomic variant summarization, and general platform guidance — all within the context of the current page.

Overview

Abby combines a large language model (MedGemma 1.5 4B via Ollama) with retrieval-augmented generation (RAG) to provide accurate, context-aware responses. Rather than relying solely on the model's training data, Abby retrieves relevant knowledge from four specialized ChromaDB collections before generating each response. This ensures answers are grounded in Parthenon's documentation, the user's conversation history, community knowledge, and clinical reference data.

Air-gapped deployments

Because Abby runs entirely through Ollama and ChromaDB (both on-premises services), no data ever leaves your infrastructure. This makes Abby suitable for air-gapped healthcare environments with strict data governance requirements.

Knowledge Layers

Abby's ChromaDB brain is organized into four vector collections, each serving a distinct purpose:

1. Documentation

Property         Value
---------------  ------------------------------------------------------------
Collection       parthenon_docs
Content          Project documentation, user manual chapters, API references
Size             39,000+ chunks
Ingestion        Auto-ingested on AI service startup
Embedding Model  sentence-transformers

The documentation collection is the primary knowledge source. On startup, the AI service scans all Markdown and MDX files in the project, splits them into overlapping chunks, generates embeddings, and stores them in ChromaDB. As a result, every Parthenon feature, configuration option, and workflow described in the docs is retrievable by Abby, and the collection can be refreshed at any time via the re-ingestion endpoint described below.
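
A minimal sketch of that ingestion flow is shown below. The docs path, chunk size, overlap, and ChromaDB connection details are illustrative assumptions, not the service's actual configuration.

# Sketch: chunk Markdown/MDX files, embed them, and store them in ChromaDB.
from pathlib import Path
import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.HttpClient(host="localhost", port=8000)       # on-prem ChromaDB
collection = client.get_or_create_collection("parthenon_docs")
embedder = SentenceTransformer("all-MiniLM-L6-v2")               # 384-dim vectors

def chunk(text, size=1000, overlap=200):
    """Split a document into overlapping character windows."""
    for start in range(0, len(text), size - overlap):
        yield text[start:start + size]

for doc_path in Path("docs").rglob("*.md*"):                     # .md and .mdx files
    pieces = list(chunk(doc_path.read_text(encoding="utf-8")))
    if not pieces:
        continue
    collection.add(
        ids=[f"{doc_path}:{i}" for i in range(len(pieces))],
        documents=pieces,
        embeddings=embedder.encode(pieces).tolist(),
        metadatas=[{"source": str(doc_path)} for _ in pieces],
    )

The /chromadb/ingest-docs endpoint described later triggers the equivalent re-ingestion on demand.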

2. Conversation Memory

Property         Value
---------------  -----------------------------------
Collection       conversation_memory
Content          Per-user question-and-answer pairs
TTL              90 days (auto-pruned)
Embedding Model  sentence-transformers

Every interaction with Abby is stored as a vector embedding tied to the user's ID. When a user asks a follow-up question or revisits a topic, Abby retrieves relevant prior exchanges to maintain continuity. Conversation memory is scoped per-user — users never see each other's history. Entries older than 90 days are automatically pruned to keep the collection performant.
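
The mechanics can be sketched roughly as follows; the metadata fields (user_id, timestamp) and helper names are assumptions rather than the service's actual schema.

# Sketch: per-user conversation memory with a 90-day TTL.
import time
import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.HttpClient(host="localhost", port=8000)
memory = client.get_or_create_collection("conversation_memory")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def remember(user_id, question, answer):
    """Store a Q&A pair as one embedded entry tied to the user."""
    text = f"Q: {question}\nA: {answer}"
    memory.add(
        ids=[f"{user_id}:{time.time_ns()}"],
        documents=[text],
        embeddings=[embedder.encode(text).tolist()],
        metadatas=[{"user_id": user_id, "timestamp": time.time()}],
    )

def recall(user_id, query, k=3):
    """Retrieve this user's most relevant prior exchanges (never anyone else's)."""
    return memory.query(
        query_embeddings=[embedder.encode(query).tolist()],
        n_results=k,
        where={"user_id": user_id},                  # per-user scoping
    )

def prune(ttl_days=90):
    """Remove entries older than the TTL."""
    cutoff = time.time() - ttl_days * 86400
    memory.delete(where={"timestamp": {"$lt": cutoff}})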

3. Shared FAQ

Property         Value
---------------  ------------------------------------------------------------------
Collection       shared_faq
Content          Frequently asked questions promoted from individual conversations
Embedding Model  sentence-transformers

When multiple users ask similar questions, the system can promote those Q&A pairs into the shared FAQ collection. This creates an organization-wide knowledge base that benefits all users. FAQ promotion can be triggered manually via the management API or configured to run automatically based on frequency thresholds.
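
Promotion might look roughly like the sketch below; the similarity cutoff, user threshold, and helper names are assumptions, and the production logic sits behind the /chromadb/promote-faq endpoint.

# Sketch: promote a Q&A pair once enough distinct users have asked something similar.
import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.HttpClient(host="localhost", port=8000)
memory = client.get_or_create_collection("conversation_memory")
faq = client.get_or_create_collection("shared_faq")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def maybe_promote(question, answer, min_users=3, max_distance=0.35):
    """Copy a Q&A pair into shared_faq if several users asked similar questions."""
    vec = embedder.encode(question).tolist()
    hits = memory.query(query_embeddings=[vec], n_results=50)
    close = [meta for meta, dist in zip(hits["metadatas"][0], hits["distances"][0])
             if dist <= max_distance]
    if len({meta["user_id"] for meta in close}) < min_users:
        return False
    faq.add(ids=[f"faq:{abs(hash(question))}"],
            documents=[f"Q: {question}\nA: {answer}"],
            embeddings=[vec])
    return True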

Building institutional knowledge

The shared FAQ collection grows organically as your team uses Abby. Common questions about your specific OMOP CDM configuration, local data quirks, or institutional workflows get captured and reused — reducing repetitive support requests.

4. Clinical Reference

Property         Value
---------------  ----------------------------------------------
Collection       clinical_reference
Content          OMOP concepts embedded with clinical semantics
Embedding Model  SapBERT (clinical domain)

The clinical reference collection contains OMOP concept embeddings generated using SapBERT, a biomedical language model specifically trained on UMLS concepts. This enables Abby to understand clinical terminology at a semantic level — for example, recognizing that "heart attack" and "acute myocardial infarction" refer to the same clinical concept, even when the exact term doesn't appear in the vocabulary tables.
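
To make that concrete, here is a small sketch of SapBERT embeddings; the CLS-token pooling mirrors common SapBERT usage, and the printed similarity is indicative only.

# Sketch: embed clinical terms with SapBERT and compare them semantically.
import torch
from transformers import AutoModel, AutoTokenizer

name = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed(terms):
    """Return a [CLS] embedding per clinical term."""
    batch = tokenizer(terms, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return out.last_hidden_state[:, 0, :]            # one vector per term

vecs = embed(["heart attack", "acute myocardial infarction"])
sim = torch.nn.functional.cosine_similarity(vecs[0], vecs[1], dim=0)
print(f"heart attack ~ acute MI similarity: {sim.item():.2f}")   # expected to be high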

How RAG Works

Retrieval-Augmented Generation (RAG) is the technique that connects Abby's knowledge base to the language model. Here is how a typical interaction flows:

  1. User asks a question — The question is sent to the Python FastAPI AI service along with the current page context.
  2. Retrieval — The AI service queries all four ChromaDB collections in parallel, using the question as a semantic search query. Each collection returns its most relevant chunks ranked by cosine similarity.
  3. Context assembly — Retrieved chunks are deduplicated, ranked by relevance, and injected into the system prompt alongside the page context.
  4. Generation — MedGemma receives the enriched prompt and generates a response grounded in the retrieved knowledge.
  5. Memory storage — The Q&A pair is embedded and stored in the user's conversation memory for future retrieval.
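
The condensed sketch below walks through those steps. The model name, prompt wording, and result counts are assumptions; conversation_memory and clinical_reference are only noted in comments because one is filtered per user and the other expects SapBERT-encoded query vectors.

# Sketch: retrieve -> assemble -> generate, mirroring the five steps above.
import requests
import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.HttpClient(host="localhost", port=8000)
embedder = SentenceTransformer("all-MiniLM-L6-v2")
TEXT_COLLECTIONS = ["parthenon_docs", "shared_faq"]

def answer(question, page_context, model="medgemma"):
    # 2. Retrieval (the service queries all four collections in parallel;
    #    conversation_memory is additionally filtered to the current user and
    #    clinical_reference is queried with a SapBERT vector)
    query_vec = embedder.encode(question).tolist()
    chunks = []
    for name in TEXT_COLLECTIONS:
        hits = client.get_collection(name).query(query_embeddings=[query_vec], n_results=3)
        chunks.extend(hits["documents"][0])
    # 3. Context assembly: dedupe and fold the chunks into the prompt
    context = "\n---\n".join(dict.fromkeys(chunks))
    prompt = (f"You are Abby, a research assistant on the {page_context} page.\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    # 4. Generation via Ollama's local REST API
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": model, "prompt": prompt, "stream": False})
    reply = resp.json()["response"]
    # 5. Memory storage: the Q&A pair would be embedded into conversation_memory here
    return reply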

Page-Aware Context

Abby adapts her responses based on which page the user is currently viewing. The system defines 22 page contexts that tailor Abby's behavior, expertise framing, and suggested follow-up actions. A selection is shown below:

Page Context         Abby's Focus
-------------------  --------------------------------------------------------------------------
Vocabulary Browser   Concept search strategies, hierarchy navigation, semantic matching
Concept Set Builder  Inclusion/exclusion logic, descendant flags, concept mapping
Cohort Builder       Cohort expression construction, criteria logic, temporal constraints
Cohort Generation    Generation status, error troubleshooting, performance tips
Characterization     Feature extraction setup, baseline characteristics, covariate selection
Incidence Rates      Rate calculation methodology, time-at-risk configuration
Treatment Pathways   Pathway analysis design, event sequencing, sunburst interpretation
PLE / PLP            Estimation methods, propensity scores, prediction model evaluation
Data Explorer        Achilles results interpretation, data quality metrics
Data Ingestion       Schema mapping guidance, concept mapping suggestions
Genomics             Variant interpretation, ClinVar annotations, tumor board summaries
Imaging              DICOM viewer guidance, PACS connectivity
HEOR                 Cost-effectiveness modeling, care gap analysis
FHIR Integration     SMART auth configuration, bulk export troubleshooting
Administration       System configuration, user management, health monitoring

When no specific page context is detected, Abby defaults to a general research assistant persona with broad platform knowledge.
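
As a purely hypothetical illustration of how a detected page context might select a persona preamble (the 22 real contexts and their wording live inside the AI service):

# Hypothetical mapping from page context to persona preamble.
PAGE_PERSONAS = {
    "vocabulary_browser": "Focus on concept search strategies and hierarchy navigation.",
    "cohort_builder": "Focus on cohort expression construction and criteria logic.",
    "genomics": "Focus on variant interpretation and ClinVar annotations.",
}
DEFAULT_PERSONA = "You are a general research assistant with broad platform knowledge."

def persona_for(page):
    """Fall back to the general persona when no page context is detected."""
    return PAGE_PERSONAS.get(page or "", DEFAULT_PERSONA)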

Dual Embedding Models

Abby uses two specialized embedding models to handle different types of content:

Model                  Purpose                                    Used By
---------------------  -----------------------------------------  ----------------------------------------------
sentence-transformers  General-purpose text embeddings             Documentation, Conversation Memory, Shared FAQ
SapBERT                Biomedical / clinical concept embeddings    Clinical Reference

sentence-transformers (specifically all-MiniLM-L6-v2) provides fast, high-quality embeddings for general text — documentation paragraphs, user questions, and FAQ entries. It runs efficiently on CPU and produces 384-dimensional vectors.
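
A quick check of those figures, assuming a local sentence-transformers installation:

# Sketch: confirm the general-purpose model's vector dimensionality.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode("How do I add descendant concepts to a concept set?")
print(vec.shape)        # (384,): the 384-dimensional vectors mentioned above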

SapBERT (cambridgeltl/SapBERT-from-PubMedBERT-fulltext) is a biomedical language model pre-trained on UMLS concept relationships. It understands clinical synonymy, abbreviations, and hierarchical relationships between medical terms. This model is used exclusively for the clinical reference collection to ensure that concept searches capture semantic meaning, not just lexical overlap.

Why two models?

A general-purpose embedding model excels at matching natural language questions to documentation text. However, clinical terminology has unique properties — abbreviations (MI = myocardial infarction), synonymy (heart attack = acute MI), and hierarchical relationships (ibuprofen IS-A NSAID) — that require a domain-specific model to capture accurately.

Management Endpoints

The AI service exposes five ChromaDB management endpoints for administrative use. These are accessible via the Python FastAPI service (default port 8002).

Endpoint                       Method  Description
-----------------------------  ------  ----------------------------------------------------------------------------------------
/chromadb/health               GET     Returns ChromaDB connection status, collection counts, and total vectors stored
/chromadb/ingest-docs          POST    Triggers re-ingestion of all project documentation into the parthenon_docs collection
/chromadb/ingest-clinical      POST    Ingests OMOP concepts into the clinical_reference collection using SapBERT embeddings
/chromadb/promote-faq          POST    Promotes frequent Q&A pairs from conversation memory into the shared FAQ collection
/chromadb/prune-conversations  POST    Removes conversation memory entries older than the configured TTL (default: 90 days)

Usage Examples

# Check ChromaDB health
curl http://localhost:8002/chromadb/health

# Re-ingest documentation after updating docs
curl -X POST http://localhost:8002/chromadb/ingest-docs

# Ingest clinical concepts (run after vocabulary update)
curl -X POST http://localhost:8002/chromadb/ingest-clinical

# Promote frequently asked questions to shared FAQ
curl -X POST http://localhost:8002/chromadb/promote-faq

# Prune old conversation memory
curl -X POST http://localhost:8002/chromadb/prune-conversations

Re-ingestion timing

The ingest-docs and ingest-clinical endpoints perform a full re-ingestion, which can take several minutes depending on the volume of content. Run them during maintenance windows, since retrieval quality may be temporarily degraded while ingestion is in progress.

System Health

ChromaDB is monitored as part of the Parthenon health dashboard at Admin > System > Health. The health check verifies:

  • ChromaDB service is reachable
  • All four collections exist and are queryable
  • Total vector count is within expected range
  • Embedding model endpoints are responsive

If ChromaDB is unreachable, Abby gracefully degrades to operating without RAG context — responses will still be generated by MedGemma but without the benefit of retrieved knowledge. A yellow status appears on the health dashboard when ChromaDB is degraded, and red when it is completely unreachable.
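
Inside the AI service, that fallback can be sketched as follows; the helper name and empty-context behaviour are assumptions based on the description above.

# Sketch: retrieve context if ChromaDB is healthy, otherwise degrade gracefully.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_context(question):
    """Return retrieved context, or an empty string if ChromaDB is unavailable."""
    try:
        client = chromadb.HttpClient(host="localhost", port=8000)
        client.heartbeat()                          # raises if the service is unreachable
        hits = client.get_collection("parthenon_docs").query(
            query_embeddings=[embedder.encode(question).tolist()], n_results=3)
        return "\n---\n".join(hits["documents"][0])
    except Exception:
        return ""                                   # MedGemma still answers, just without RAG context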

After vocabulary updates

When you update the OMOP vocabulary via Admin > System > Vocabulary, remember to re-ingest clinical concepts by calling the /chromadb/ingest-clinical endpoint. This ensures Abby's clinical reference collection reflects the latest concept additions and deprecations.