
Abby 2.0 Phase 2: The Intelligence Upgrade — A Hybrid Brain with Safety Rails

· 5 min read
Creator, Parthenon
AI Development Assistant

Abby now has two brains. Simple queries stay local on MedGemma (fast, free). Complex reasoning escalates to Claude via API (powerful, cloud). A PHI sanitizer blocks any patient data from leaving the network, and a cost tracker with circuit breaker keeps spending within budget. Researchers get smarter answers on hard questions without compromising privacy or breaking the bank.

[Image: Abby AI assistant]

The Problem: One Brain Isn't Enough

MedGemma 1.5 4B is fast and medically literate — perfect for "What is concept 201826?" or "How many patients in our CDM?" But ask it to design a complex cohort with temporal logic, critique a study methodology for immortal time bias, or synthesize information across multiple memory tiers into a coherent research recommendation — and it hits its ceiling.

The answer isn't to replace MedGemma. It's to route the right questions to the right brain.


The Architecture: Two-Stage Routing

Every message flows through a two-stage router before hitting any LLM:

User Message
|
Stage 1: Deterministic Rules (<1ms, zero cost)
| "Create a cohort..." → CLOUD (action word detected)
| "Hello Abby" → LOCAL (greeting detected)
| "What is concept X?" → LOCAL (simple lookup)
| 200+ chars with multiple clauses → CLOUD (complexity signal)
|
Stage 2: Bootstrap Scoring (when Stage 1 is uncertain)
| Complexity indicators → boost cloud score
| Simplicity indicators → boost local score
| Tie? → Cloud wins (err toward capability)
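The two stages above can be sketched in a few lines of Python. This is a minimal illustration, not the production router: the rule lists, keyword sets, and the `route` function name are assumed for the example.

```python
# Minimal sketch of the two-stage router. Rules and keywords are illustrative.
import re

LOCAL, CLOUD = "local", "cloud"

ACTION_WORDS = ("create", "design", "build", "critique")  # assumed examples
GREETINGS = ("hello", "hi", "hey")

def route(message: str) -> tuple[str, str]:
    """Return (destination, reason) for a user message."""
    text = message.strip().lower()

    # Stage 1: deterministic rules (<1 ms, zero cost)
    if any(text.startswith(g) for g in GREETINGS):
        return LOCAL, "greeting detected"
    if any(w in text.split() for w in ACTION_WORDS):
        return CLOUD, "action word detected"
    if re.match(r"what is concept \S+", text):
        return LOCAL, "simple lookup"
    if len(message) > 200 and message.count(",") >= 2:
        return CLOUD, "complexity signal"

    # Stage 2: bootstrap scoring when Stage 1 is uncertain
    cloud_score = sum(k in text for k in ("cohort", "methodology", "bias"))
    local_score = sum(k in text for k in ("what is", "how many", "where"))
    # Tie goes to cloud: err toward capability.
    if cloud_score >= local_score:
        return CLOUD, "bootstrap score"
    return LOCAL, "bootstrap score"
```

Because Stage 1 rules are checked in order and cost nothing, the scoring pass only runs on the ambiguous remainder.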

~70% of requests stay local. Greetings, lookups, navigation help, simple questions — all handled by MedGemma in under 2 seconds at zero cost. The remaining 30% — cohort design, methodology review, multi-step reasoning, analysis interpretation — go to Claude.

Users never see the routing. Abby is one persona. The response just gets smarter when the question warrants it.

Bootstrap → Classifier Pipeline

The router starts with deterministic rules and heuristic scoring. As users provide thumbs-up/down feedback, each rating is stored alongside the routing decision (which model answered, why it was routed there). Once 500+ labeled samples accumulate, a fine-tuned DistilBERT classifier can replace the bootstrap scoring for even better routing accuracy.
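A rough sketch of that feedback loop, using an in-memory SQLite table as a stand-in; the table and field names here are illustrative, not the real schema:

```python
# Sketch of the feedback loop: each thumbs rating is stored with its routing
# decision; once 500+ labeled samples exist, a learned classifier can take over.
import sqlite3

BOOTSTRAP_MIN_SAMPLES = 500  # threshold mentioned in the post

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE routing_feedback (
        message_hash TEXT,
        model        TEXT,     -- which model answered
        reason       TEXT,     -- why it was routed there
        rating       INTEGER   -- +1 thumbs-up, -1 thumbs-down
    )
""")

def record_feedback(message_hash, model, reason, rating):
    conn.execute("INSERT INTO routing_feedback VALUES (?, ?, ?, ?)",
                 (message_hash, model, reason, rating))

def classifier_ready():
    """True once enough labeled samples exist to fine-tune DistilBERT."""
    (n,) = conn.execute("SELECT COUNT(*) FROM routing_feedback").fetchone()
    return n >= BOOTSTRAP_MIN_SAMPLES

record_feedback("abc123", "medgemma", "greeting detected", 1)
```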


PHI Protection: Defense in Depth

Clinical data must never leave the network. We enforce this at two levels:

Primary Defense: Architectural Boundary

The Context Assembly Pipeline has an explicit allowlist. Only approved data sources can enter a cloud-bound prompt:

| Allowed (cloud-safe) | Blocked (never sent) |
| --- | --- |
| User's natural language query | cdm.person (individual records) |
| Aggregate statistics | cdm.visit_occurrence |
| Concept/vocabulary names | cdm.condition_occurrence |
| Cohort definition structure | cdm.drug_exposure |
| Help documentation | cdm.measurement |
| Institutional knowledge | All individual-level clinical tables |
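An allowlist filter of this kind is simple to express. The sketch below is illustrative; the source-type strings and function name are assumptions, not the actual CloudSafetyFilter API:

```python
# Sketch of an allowlist-based safety filter: only approved source types may
# enter a cloud-bound prompt. Source-type names mirror the table above.
CLOUD_SAFE_SOURCES = {
    "user_query",
    "aggregate_statistics",
    "concept_names",
    "cohort_definition",
    "help_docs",
    "institutional_knowledge",
}

def filter_for_cloud(context_pieces):
    """Keep only allowlisted pieces; anything else never leaves the network."""
    allowed = [p for p in context_pieces if p["source"] in CLOUD_SAFE_SOURCES]
    blocked = [p for p in context_pieces if p["source"] not in CLOUD_SAFE_SOURCES]
    return allowed, blocked

allowed, blocked = filter_for_cloud([
    {"source": "user_query", "text": "Design a cohort for T2DM"},
    {"source": "cdm.person", "text": "patient row ..."},  # individual-level: blocked
])
```

The key design choice is deny-by-default: an unrecognized source is blocked, so adding a new data tier can never silently leak it to the cloud.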

Secondary Defense: PHI Sanitizer

A PHISanitizer runs on every cloud-bound prompt as defense-in-depth:

  • Regex detection: SSN patterns, MRN (with medical record context), phone numbers (with contact context), email addresses, dates of birth (with DOB context)
  • NER detection: spaCy en_core_web_sm PERSON entity recognition for name detection
  • Clinical context guard: Prevents false positives on OMOP concept IDs (e.g., "Concept ID 4329847" is NOT a medical record number)
  • Circuit breaker: If PHI is detected, the cloud request is blocked entirely (not just redacted) and falls back to MedGemma locally
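A regex-only sketch of the sanitizer's shape (the real component adds spaCy NER for person names, omitted here). Patterns and the clinical-context guard are deliberately simplified:

```python
# Simplified PHI scan: regex detectors plus a context-aware MRN pattern so
# OMOP concept IDs are not flagged as medical record numbers.
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# MRN only counts with medical-record context, so "Concept ID 4329847" is safe.
MRN = re.compile(r"\b(?:MRN|medical record (?:number|#))[:\s]*\d{5,}\b", re.I)
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def scan_for_phi(prompt: str) -> list[str]:
    """Return the kinds of PHI detected; non-empty means the cloud call is blocked."""
    hits = []
    if SSN.search(prompt):
        hits.append("ssn")
    if MRN.search(prompt):
        hits.append("mrn")
    if EMAIL.search(prompt):
        hits.append("email")
    return hits

def cloud_allowed(prompt: str) -> bool:
    # Circuit breaker: any detection blocks the request entirely (no redaction);
    # the router then falls back to local MedGemma.
    return not scan_for_phi(prompt)
```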

Audit Trail

Every cloud API call is logged to abby_cloud_usage with: timestamp, user_id, department, token counts, cost, SHA-256 hash of the sanitized payload, redaction count, and routing reason.
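One audit record could look like the sketch below. Field names follow the list above; the function itself is illustrative. Note that only the SHA-256 hash of the sanitized payload is stored, never the text:

```python
# Sketch of one abby_cloud_usage audit record (field names follow the post).
import hashlib
from datetime import datetime, timezone

def audit_record(user_id, department, sanitized_prompt,
                 tokens_in, tokens_out, cost_usd, redactions, reason):
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "department": department,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "cost_usd": cost_usd,
        # Hash of the sanitized payload: verifiable without retaining content.
        "payload_sha256": hashlib.sha256(sanitized_prompt.encode()).hexdigest(),
        "redaction_count": redactions,
        "routing_reason": reason,
    }

row = audit_record("u42", "epidemiology", "sanitized prompt ...",
                   812, 430, 0.011, 0, "action word detected")
```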


Cost Controls: Budget Enforcement with Circuit Breaker

Cloud API costs are tracked per-request and enforced per-month:

  • Multi-tier alerting: Alerts at 50%, 80%, and 95% of monthly budget
  • Circuit breaker at 95%: Cloud routing is disabled. All requests fall back to MedGemma with a user-visible confidence indicator showing "low" confidence
  • Degraded mode: When budget is exhausted, Abby prepends: "Note: This response was generated locally due to usage limits. For a more thorough analysis, try again later."
  • Per-user tracking: Every cloud call records the user and department for cost attribution
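The circuit breaker reduces to a few lines. This is a sketch with the thresholds from the post; the class and method names are illustrative:

```python
# Monthly budget circuit breaker: alert at 50/80/95%, disable cloud at 95%.
class CostTracker:
    ALERT_TIERS = (0.50, 0.80, 0.95)

    def __init__(self, monthly_budget_usd: float):
        self.budget = monthly_budget_usd
        self.spent = 0.0

    def record(self, cost_usd: float) -> None:
        self.spent += cost_usd

    def fraction_used(self) -> float:
        return self.spent / self.budget

    def alerts(self):
        """Alert tiers crossed so far."""
        return [t for t in self.ALERT_TIERS if self.fraction_used() >= t]

    def cloud_enabled(self) -> bool:
        # Circuit breaker at 95%: all traffic falls back to MedGemma.
        return self.fraction_used() < 0.95

tracker = CostTracker(monthly_budget_usd=100.0)
tracker.record(96.0)  # month nearly exhausted: breaker trips
```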

Confidence & Attribution

Every response now carries:

  • Confidence indicator: high (Claude + context available), medium (MedGemma + context), low (degraded mode / budget exhausted)
  • Routing metadata: Which model answered, why it was routed there, which stage decided
  • Source attribution: Context pieces are tagged with tier and source for traceability
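The confidence mapping can be expressed as a small function. The three tiers come from the list above; the fallback for a context-free answer is an assumption of this sketch:

```python
# Sketch of the confidence mapping described above (tier names from the post).
def confidence(model: str, has_context: bool, degraded: bool) -> str:
    if degraded:
        return "low"          # budget exhausted, forced-local answer
    if model == "claude" and has_context:
        return "high"
    if model == "medgemma" and has_context:
        return "medium"
    return "low"              # assumed fallback when no context was assembled
```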

What Shipped

| Component | Tests | Purpose |
| --- | --- | --- |
| RuleRouter | 14 | Two-stage routing with deterministic rules + bootstrap scoring |
| ClaudeClient | 5 | Anthropic SDK wrapper with cost estimation |
| PHISanitizer | 15 | Regex + spaCy NER detection with clinical context guard |
| CloudSafetyFilter | 7 | Allowlist-based content protection |
| CostTracker | 6 | Budget enforcement with circuit breaker |
| Context Assembler (Claude profile) | 9 | 28K token budget for Claude's 128K context window |
| Pipeline Integration | — | Hybrid routing wired into chat endpoint |
| Integration Tests | 6 | End-to-end routing, safety, and budget verification |

169 tests passing across the Python AI service.


What's Next: Phase 3 — Semantic Knowledge Graph

Phase 2 gave Abby a bigger brain. Phase 3 gives her deeper understanding.

Abby will gain a structured understanding of OMOP concept relationships — not just keyword matching, but true inference. She'll know that metformin is a drug for Type 2 diabetes mellitus, which is a subtype of diabetes mellitus. She'll know which domains have sparse data in your CDM and warn you before you build a cohort on a foundation of missing records.

Materialized views over the existing concept_relationship and concept_ancestor tables, a Redis cache for hot paths, and a KnowledgeGraphService with hierarchy traversal — no separate graph database needed.
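As a preview of what hierarchy traversal could look like, here is a hedged sketch over an in-memory stand-in for OMOP's concept_ancestor table (the actual plan uses materialized views plus a Redis cache; the service API below is hypothetical):

```python
# Toy hierarchy traversal over a dict standing in for OMOP concept_ancestor.
# Concept names are real OMOP-style examples; the data structure is illustrative.
ANCESTORS = {
    # descendant -> direct parents
    "Type 2 diabetes mellitus": {"Diabetes mellitus"},
    "Diabetes mellitus": {"Disorder of glucose metabolism"},
}

def all_ancestors(concept: str) -> set[str]:
    """Walk the hierarchy upward, as concept_ancestor pre-computes in OMOP."""
    seen, stack = set(), [concept]
    while stack:
        for parent in ANCESTORS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

In production, concept_ancestor already materializes the transitive closure, so the traversal becomes a single indexed lookup rather than a walk.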