Abby Architecture

Abby's architecture spans four layers: the frontend component library, the Python AI service, the ChromaDB vector database, and the Ollama language model runtime. Each layer is independently scalable and gracefully degrades if a dependency is unavailable.

System Overview

*[Diagram: Abby RAG Architecture]*

Component Stack

| Component | Technology | Role |
|---|---|---|
| Frontend | React 19, TypeScript | Chat UI, typing indicators, source attribution, feedback |
| Backend Proxy | Laravel 11, Sanctum | Authentication, rate limiting, request routing |
| AI Service | Python 3.12, FastAPI | RAG pipeline, embedding, prompt construction, chat orchestration |
| Vector Store | ChromaDB | Embedding storage, semantic search, collection management |
| Embeddings | sentence-transformers + SapBERT | Dual-model embedding for general and clinical content |
| Language Model | MedGemma 1.5 (4B) via Ollama | Response generation grounded in retrieved context |
| Acceleration | Apache Solr | Pre-computed 3D projections for the vector explorer |
| Cache | Redis | Conversation state, embedding cache |

Request Flow

When a user asks Abby a question, the following sequence executes:

1. The React frontend sends the message to the Laravel proxy, which authenticates the request via Sanctum and applies rate limiting.
2. The proxy routes the request to the FastAPI AI service.
3. The AI service embeds the query — with sentence-transformers for general collections, SapBERT for clinical ones.
4. ChromaDB runs a semantic search and returns the most relevant chunks.
5. The AI service constructs a prompt grounded in the retrieved context.
6. Ollama generates the response with MedGemma, and the frontend renders it with source attribution.
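The pipeline can be sketched end to end in a few functions. This is a minimal illustration, not the service's real API: `embed_query`, `retrieve`, `build_prompt`, and `generate` are assumed names, and the bodies are stubs standing in for the sentence-transformers, ChromaDB, and Ollama calls.

```python
# Sketch of Abby's RAG request flow. Function names and payloads are
# illustrative assumptions, not the actual FastAPI service's API.

def embed_query(question: str) -> list[float]:
    # Stands in for all-MiniLM-L6-v2 (384-dim) or SapBERT (768-dim).
    return [0.0] * 384

def retrieve(vector: list[float], k: int = 4) -> list[str]:
    # Stands in for a ChromaDB semantic search over the collections.
    return ["OMOP CDM overview ...", "Vocabulary mapping guide ..."][:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    # Ground the model in retrieved context before asking the question.
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    # Stands in for the Ollama call to MedGemma.
    return "stubbed answer"

def ask_abby(question: str) -> str:
    vector = embed_query(question)
    chunks = retrieve(vector)
    return generate(build_prompt(question, chunks))
```

Each stub maps to one layer of the stack above, so a failure in any layer is isolated to a single function boundary.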

Docker Services

Abby's backend is composed of four Docker services defined in the project's docker-compose.yml (abridged excerpt below):

```yaml
# AI Service (FastAPI)
python-ai:
  build:
    context: .
    dockerfile: docker/python/Dockerfile
  ports: ["8002:8000"]
  volumes:
    - ./ai:/app
    - ./docs:/app/docs:ro
    - ./OHDSI-scraper/ohdsi_corpus:/app/ohdsi_corpus:ro
    - ./OHDSI-scraper/book_of_ohdsi:/app/book_of_ohdsi:ro
    - ./OHDSI-scraper/hades_vignettes:/app/hades_vignettes:ro
    - ./OHDSI-scraper/ohdsi_forums:/app/ohdsi_forums:ro

# ChromaDB
chromadb:
  image: chromadb/chroma:latest
  environment:
    - IS_PERSISTENT=TRUE
  volumes:
    - chromadb-data:/chroma/chroma

# Ollama (external, host-mounted)
# Accessed via OLLAMA_BASE_URL=http://host.docker.internal:11434
```
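Inside the AI service, the Ollama endpoint is resolved from that environment variable. A small sketch, assuming the helper names (the default matches the compose comment, and `/api/generate` is Ollama's standard generation endpoint):

```python
import os

def ollama_base_url() -> str:
    # Fall back to the host-mounted default from docker-compose.yml.
    return os.environ.get("OLLAMA_BASE_URL", "http://host.docker.internal:11434")

def generate_endpoint() -> str:
    # Ollama exposes text generation under /api/generate.
    return ollama_base_url().rstrip("/") + "/api/generate"
```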

Embedding Models

Abby uses two embedding models, each optimized for a different type of content:

General Embedder: sentence-transformers (384-dim)

  • Model: all-MiniLM-L6-v2
  • Used by: docs, conversations_user_*, faq_shared
  • Strengths: Fast inference, excellent for matching natural language questions to documentation text
  • Index: HNSW with cosine similarity

Clinical Embedder: SapBERT (768-dim)

  • Model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
  • Used by: clinical_reference, ohdsi_papers
  • Strengths: Understands clinical synonymy (MI = myocardial infarction), abbreviations, and UMLS hierarchical relationships
  • Training: PubMedBERT fine-tuned on UMLS synonym pairs

Why two models?

Clinical terminology has unique properties — abbreviations, synonymy, and hierarchical relationships — that general-purpose embedders miss. SapBERT captures that "ibuprofen IS-A NSAID" and "heart attack = acute myocardial infarction" are semantically equivalent, which is critical for research literature retrieval.
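Routing between the two embedders can be as simple as a lookup on the collection name. A minimal sketch using the collection and model names listed above (the `model_for` helper is an assumption, not the service's actual code):

```python
GENERAL_MODEL = "all-MiniLM-L6-v2"  # 384-dim
CLINICAL_MODEL = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"  # 768-dim

# Collections embedded with SapBERT, per the lists above.
CLINICAL_COLLECTIONS = {"clinical_reference", "ohdsi_papers"}

def model_for(collection: str) -> str:
    # docs, faq_shared, and conversations_user_* share the general embedder.
    if collection in CLINICAL_COLLECTIONS:
        return CLINICAL_MODEL
    return GENERAL_MODEL
```

Because the two models produce different dimensionalities (384 vs 768), each collection must be queried with the same model that embedded it; routing by collection name enforces that invariant.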

Graceful Degradation

Each component degrades independently:

| Failure | Impact | Fallback |
|---|---|---|
| ChromaDB down | No RAG context | MedGemma generates from training data only |
| Ollama down | No response generation | Error message with suggestion to check AI service |
| Solr down | No 3D vector map | Live PCA+UMAP computed on demand (~8-10s) |
| SapBERT model unavailable | No clinical embeddings | Clinical collections skipped in retrieval |
| Redis down | No conversation cache | Stateless responses (no memory persistence) |
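The ChromaDB row, for example, amounts to a try/except around retrieval. A sketch with stubbed dependencies (`retrieve`, `generate`, and the exception type are placeholders, not the service's real functions):

```python
class RetrievalUnavailable(Exception):
    """Raised when the vector store cannot be reached."""

def retrieve(question: str) -> list[str]:
    # Placeholder for the ChromaDB query; here it simulates an outage.
    raise RetrievalUnavailable

def generate(question: str, chunks: list[str]) -> str:
    # Placeholder for MedGemma via Ollama.
    if chunks:
        return f"grounded answer ({len(chunks)} sources)"
    return "ungrounded answer (no RAG context)"

def answer(question: str) -> str:
    try:
        chunks = retrieve(question)
    except RetrievalUnavailable:
        # ChromaDB down: fall back to the model's training data alone.
        chunks = []
    return generate(question, chunks)
```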

The health dashboard at Admin > System > Health monitors all components and displays yellow/red status indicators when degradation occurs.
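A status roll-up like the dashboard's can be expressed as a pure function over per-component liveness. The component names and the choice of Ollama as the only hard dependency are assumptions inferred from the fallback table, not the dashboard's actual logic:

```python
def overall_status(components: dict[str, bool]) -> str:
    # Ollama down means no responses at all (red); everything else in the
    # fallback table degrades gracefully (yellow).
    hard_dependencies = {"ollama"}
    down = {name for name, up in components.items() if not up}
    if down & hard_dependencies:
        return "red"
    if down:
        return "yellow"
    return "green"
```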