Parthenon Blog

Introducing Parthenon: Transforming Healthcare with AI-Powered Outcomes Research


Pinned Post | Originally published March 7, 2026

Outcomes research has evolved alongside the broader arc of healthcare analytics infrastructure. Early siloed clinical systems produced fragmented administrative and claims data with limited analytic utility: adequate for billing, but structurally unsuitable for longitudinal cohort construction or comparative effectiveness work. The Meaningful Use era expanded the availability of structured clinical data. Yet interoperability failures meant that patient journeys remained fractured across institutional boundaries, undermining the real-world evidence studies that outcomes researchers depend on. The shift to integrated analytics platforms, particularly the adoption of common data models like OMOP/OHDSI, marked a genuine inflection point: federated network studies, standardized phenotyping, and reproducible retrospective analyses became operationally feasible at scale. Now a fourth generation is taking shape, one in which AI-augmented clinical intelligence moves outcomes research from retrospective description toward prospective, near-real-time evidence generation. It enables dynamic cohort surveillance, treatment heterogeneity detection, and value-based care signal identification, which were previously impractical outside of narrow clinical trial settings.

Parthenon is built for this fourth generation.

Why We Built This

The problems with traditional healthcare analytics infrastructure are well-documented but stubbornly persistent. Data fragmentation scatters patient information across EHR, laboratory, radiology, and claims platforms with inconsistent terminologies. Analytics teams are overwhelmed with routine reporting demands, leaving limited capacity for the strategic analysis that actually improves outcomes. And the insights that do emerge are retrospective — care gaps identified too late for optimal impact, interventions that are reactive rather than proactive.

The OHDSI community addressed part of this problem brilliantly. The OMOP Common Data Model standardizes clinical data across institutions. HADES packages encode decades of pharmacoepidemiology methodology. Atlas provides a visual interface for cohort building and analysis design. But the toolchain has grown to 15+ disconnected applications — Atlas, WebAPI, Achilles, DQD, CohortGenerator, CohortMethod, PatientLevelPrediction, and more — each with its own deployment, its own UI paradigm, and its own learning curve.

Parthenon replaces all of them with a single application.

What Parthenon Does

At its core, Parthenon is a unified outcomes research platform built on OMOP CDM v5.4. A researcher can move through the entire real-world evidence lifecycle without leaving the browser: explore vocabularies and build concept sets, construct patient cohorts with a visual builder, then run characterization, incidence rates, treatment pathways, population-level estimation, patient-level prediction, self-controlled case series, and evidence synthesis.

But Parthenon extends well beyond what Atlas ever offered.

Genomics — Upload VCF files, annotate variants against ClinVar, browse mutations in an interactive variant browser, and convene virtual tumor boards with AI-assisted interpretation. This bridges the gap between population-level observational research and precision medicine.

Medical Imaging — View DICOM studies with a built-in Cornerstone3D viewer, connect to PACS systems via WADO-RS, and incorporate imaging criteria directly into cohort definitions. Radiogenomics analysis becomes possible within the same platform where you run your epidemiological studies.

Health Economics & Outcomes Research — Model cost-effectiveness, identify care gaps across populations, and run economic analytics. The care gap module tracks screening compliance, flags missed interventions, and quantifies the financial impact of closing gaps at various capture rates.

FHIR R4 Integration — Connect to EHR systems using SMART Backend Services for automated bulk export and incremental sync. Clinical data flows from production EHR systems into your OMOP CDM without manual ETL intervention.

AI-Assisted Analysis — An integrated AI service powered by Ollama and MedGemma provides semantic concept search, natural-language cohort suggestions, clinical result interpretation, and genomic variant summarization. The AI doesn't replace the researcher — it reduces the time between question and insight.

The Architecture

Parthenon is a containerized multi-service application orchestrated with Docker Compose. The frontend is React 19 with TypeScript strict mode, Tailwind CSS, and Zustand for state management. The backend is Laravel 11 with PHP 8.4, using Sanctum authentication and Spatie role-based access control. A Python FastAPI service handles AI capabilities — MedGemma through Ollama, pgvector embeddings for semantic search. An R Plumber API executes HADES analyses — CohortMethod, PatientLevelPrediction, SelfControlledCaseSeries — against the CDM. PostgreSQL 16 stores both application data and the OMOP CDM across multiple schemas. Redis powers the job queue via Laravel Horizon. Solr provides full-text vocabulary search.

Eight Docker services, one docker compose up -d command. A Python installer walks you through configuration in nine phases — from preflight checks through admin account creation — with optional Eunomia demo data so you can start exploring immediately.

The AI Imperative

The PDF that inspired this platform — Transforming Healthcare Delivery: Next-Generation Clinical Analytics Powered by Artificial Intelligence — makes the business case quantitatively. Six in ten Americans live with chronic disease, driving $4.1 trillion in annual healthcare costs. Traditional monitoring of conditions like CKD achieves just 3% compliance across all seven recommended measures. AI-enhanced approaches have demonstrated 267% improvement in compliance, prevention of 15-20 dialysis cases per year, and $3-4 million in annual cost savings per 10,000 patients.

These aren't theoretical projections. They're the measurable outcomes that become possible when you combine standardized clinical data (OMOP CDM), validated analytical methods (HADES), and machine learning that identifies patterns humans can't see at scale.

Parthenon's care gap module, population risk scoring, and predictive analytics are designed to deliver exactly this kind of impact — clinical decision support that anticipates patient needs rather than simply responding to events.

Building in Public

This blog will serve as a daily development journal. Every day, we'll document what was built, what broke, what we learned, and what's next. The first technical post — about the five bugs we had to fix before HADES analyses would run in production — is already live. It's the kind of hard-won knowledge that doesn't appear in any documentation, and we think sharing it openly makes the entire OHDSI ecosystem stronger.

We're also automating this process. A Claude Code agent reviews the day's git history every night and generates a narrative dev log post — not just a commit list, but a story about what the code changes mean and why they matter.

What's Next

The platform's roadmap follows a four-phase journey. The foundation phase establishes data integration and baseline analytics. Core analytics introduces care bundles for high-impact conditions. Advanced capabilities bring full population health management with HCC coding optimization and clinical decision support integration. The transformation phase enables value-based care analytics, precision medicine, and continuously learning systems.

We're deep in the foundation and core analytics phases right now, shipping features daily. Follow this blog to watch it happen.

Parthenon is open-source and available at github.com/sudoshi/Parthenon. Built by Acumenus Data Sciences.

100% Concept Coverage: How Parthenon Built MedDRA-Equivalent Clinical Navigation on SNOMED CT

Sun, 05 Apr 2026 00:00:00 GMT

Parthenon's Vocabulary Search now provides navigational coverage of 105,299 of the 105,324 standard SNOMED CT Condition concepts (100.0% at one decimal place) through 27 curated clinical groupings. This achieves functional parity with MedDRA's System Organ Class navigation while preserving SNOMED's superior clinical granularity. This is the story of four steps:

- diagnosing the SNOMED-OMOP domain boundary problem
- engineering a cross-domain hierarchy builder
- curating a clinically intelligent grouping layer
- systematically closing coverage gaps until only 25 concepts with broken or anchor-free ancestry remained outside the groupings

The Problem: A Hierarchy Browser That Made No Clinical Sense

Parthenon has a Browse Hierarchy tab in its Vocabulary Search page. It is a tree-style navigator that lets clinical researchers drill from high-level categories down to specific medical concepts. When we built it on April 3rd, we materialized the OMOP concept_ancestor relationships into a vocab.concept_tree table with 527,000 edges across six OMOP domains.

It looked correct. It wasn't.

When a clinical researcher clicked "Conditions," they saw this:

What They Expected	What They Got
Cardiovascular disorders	Abnormal feces
Respiratory disorders	Abulia
Neurological disorders	Anxiety
Gastrointestinal disorders	Biliuria
... (20-30 clinically organized categories)	... (174 alphabetically sorted orphan concepts)

The Measurement domain was worse: 1,223 flat concepts (questionnaire scores, lab test names, and clinical observations) were dumped at the top level with zero hierarchy. Observation had 633. The Browse Hierarchy was functionally a flat alphabetical list for three of six domains. Procedure, with only 12 roots, was largely usable. Drug (14 ATC categories) and Visit (19) worked because they use non-SNOMED hierarchies that don't cross domain boundaries.

Root Cause: SNOMED Doesn't Respect OMOP Domain Boundaries

This is a fundamental tension in the OMOP CDM that every OHDSI implementer faces but rarely has to solve at the navigation layer.

How OMOP Assigns Domains

OMOP assigns every vocabulary concept to exactly one domain: Condition, Observation, Measurement, Procedure, Drug, Visit, etc. This assignment determines which clinical data table a concept belongs in — a concept in the Condition domain goes into condition_occurrence, one in Measurement goes into measurement, and so on.
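Below is a minimal Python sketch of that routing rule. The mapping uses the standard OMOP CDM v5.4 event tables for the domains discussed in this post, plus Device, so it is a subset rather than a complete router. The function name target_table is hypothetical and is not part of Parthenon or any OHDSI package.

```python
# Standard OMOP CDM v5.4 event tables keyed by concept domain_id.
# Illustrative subset: only the domains discussed in this post, plus Device.
CDM_TABLE_BY_DOMAIN = {
    "Condition": "condition_occurrence",
    "Procedure": "procedure_occurrence",
    "Measurement": "measurement",
    "Observation": "observation",
    "Drug": "drug_exposure",
    "Device": "device_exposure",
    "Visit": "visit_occurrence",
}


def target_table(domain_id: str) -> str:
    """Return the CDM table that a record coded with a concept of this domain belongs in.

    Hypothetical helper: it only encodes the domain-to-table rule described above.
    """
    try:
        return CDM_TABLE_BY_DOMAIN[domain_id]
    except KeyError:
        raise ValueError(f"no event table mapped for domain {domain_id!r}") from None
```

Under this rule, a record coded with "Cardiovascular finding" (an Observation-domain concept) lands in observation, even though SNOMED files it next to heart disease.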

How SNOMED Organizes Concepts

SNOMED CT is a polyhierarchical ontology. Its condition and finding concepts all descend from the top-level concept "Clinical finding" (OMOP concept_id 441840). That hierarchy is organized by finding type and body system, not by OMOP domain. The children of "Clinical finding" include:

Clinical finding (441840, domain = Condition)
├── Disease (4274025, domain = Condition)
│   └── Disorder of body system (4180628, domain = Condition)
│       └── Disorder of cardiovascular system (134057, domain = Condition)
│           └── Heart disease → Coronary arteriosclerosis → ...
├── Cardiovascular finding (4023995, domain = Observation)    ← CROSS-DOMAIN!
│   └── Heart disease (321588, domain = Condition)
├── Respiratory finding (4024567, domain = Condition)
│   └── Dyspnea (312437, domain = Condition)
├── Functional finding (4041284, domain = Observation)        ← CROSS-DOMAIN!
│   └── Difficulty walking (36714126, domain = Condition)
└── ... 120+ more children spanning 4 domains

"Cardiovascular finding" is the natural parent of many Condition-domain heart diseases, but OMOP assigns it to the Observation domain. "Functional finding" parents hundreds of Condition-domain concepts like difficulty walking and impaired cognition, but lives in Observation. This is not a data quality issue — it's by design. OMOP's domain assignment reflects what table the data goes in, while SNOMED's hierarchy reflects clinical relationships.

The Severed Hierarchy

Our original HierarchyBuilderService built the tree per-domain, filtering concept_ancestor edges so both parent and child had to share the same domain_id:

-- THE BUG: both parent and child must be in same domain
WHERE parent.domain_id = 'Condition'
  AND child.domain_id = 'Condition'

This severed every cross-domain link. "Heart disease" (Condition) couldn't find its SNOMED parent "Cardiovascular finding" (Observation). Every concept whose nearest SNOMED parent lived in a different domain became an orphan — dumped directly under the virtual domain root with no organizing structure.
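The failure is easy to reproduce on a toy graph. This Python sketch uses concept IDs and parent-child edges from the SNOMED tree diagram earlier in this post. It is a small illustrative fragment, not the real vocabulary, and the helper names same_domain_edges and roots are hypothetical. In this fragment, Difficulty walking's only parent is the Observation-domain Functional finding. The same-domain filter drops that edge, so the concept surfaces as a second top-level root next to Clinical finding.

```python
# Toy SNOMED fragment from the tree diagram: concept_id -> OMOP domain_id.
DOMAIN = {
    441840: "Condition",     # Clinical finding
    4274025: "Condition",    # Disease
    4180628: "Condition",    # Disorder of body system
    134057: "Condition",     # Disorder of cardiovascular system
    4024567: "Condition",    # Respiratory finding
    312437: "Condition",     # Dyspnea
    4041284: "Observation",  # Functional finding
    36714126: "Condition",   # Difficulty walking
}

# Direct parent -> child edges (min_levels_of_separation = 1).
EDGES = [
    (441840, 4274025), (4274025, 4180628), (4180628, 134057),
    (441840, 4024567), (4024567, 312437),
    (441840, 4041284), (4041284, 36714126),
]


def same_domain_edges(domain):
    """The original, buggy filter: parent AND child must both be in `domain`."""
    return [(p, c) for p, c in EDGES if DOMAIN[p] == domain and DOMAIN[c] == domain]


def roots(domain, edges):
    """Concepts of `domain` with no incoming edge; the UI shows these at the top level."""
    has_parent = {child for _, child in edges}
    return sorted(cid for cid, d in DOMAIN.items() if d == domain and cid not in has_parent)
```

`roots("Condition", same_domain_edges("Condition"))` returns `[441840, 36714126]`: the intended root plus the orphaned Difficulty walking.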

The numbers told the story:

Domain	Top-Level Roots	Cause
Measurement	1,223	Almost entirely cross-domain. Most measurement-domain findings have Observation-domain parents in SNOMED.
Observation	633	Observation concepts parented by Procedure or Condition concepts in SNOMED.
Condition	174	80 concepts with Observation parents + 93 with Measurement parents.
Procedure	12	Mostly self-contained in SNOMED.
Drug	14	Uses ATC hierarchy, not SNOMED.
Visit	19	Uses CMS Place of Service / NUCC / UB04.

The Fix: Cross-Domain SNOMED Tree Builder

Phase 1: Remove the Domain Filter on Parents

The core fix was a single SQL change with cascading architectural implications. We replaced buildSnomedDomain() (which built one domain at a time) with buildUnifiedSnomedTree() that processes all four SNOMED domains together:

-- FIXED: no domain filter on parent — follow SNOMED's actual hierarchy
INSERT INTO vocab.concept_tree (parent_concept_id, child_concept_id, domain_id, ...)
SELECT ca.ancestor_concept_id, ca.descendant_concept_id, child.domain_id, ...
FROM vocab.concept_ancestor ca
JOIN vocab.concept parent ON parent.concept_id = ca.ancestor_concept_id
JOIN vocab.concept child ON child.concept_id = ca.descendant_concept_id
WHERE ca.min_levels_of_separation = 1
  AND parent.vocabulary_id = 'SNOMED' AND parent.standard_concept = 'S'
  AND child.vocabulary_id = 'SNOMED' AND child.standard_concept = 'S'
  AND child.domain_id IN ('Condition', 'Procedure', 'Measurement', 'Observation')
-- Note: NO parent.domain_id filter!

Each edge is tagged with the child's domain_id, so domain-scoped tree queries still work. The primary key was expanded from (parent_concept_id, child_concept_id) to (parent_concept_id, child_concept_id, domain_id) to support the same edge appearing in multiple domain contexts.
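To show what the expanded key allows, here is a hypothetical in-memory analogue of buildUnifiedSnomedTree() in Python; it is not Parthenon's PHP implementation. The toy rows reuse concept IDs from the tree diagram earlier in this post and imitate concept_ancestor. They include one two-level row that the min_levels_of_separation = 1 filter should skip.

```python
# Toy fragment: concept_id -> (vocabulary_id, standard_concept, domain_id).
CONCEPTS = {
    441840: ("SNOMED", "S", "Condition"),     # Clinical finding
    4041284: ("SNOMED", "S", "Observation"),  # Functional finding
    36714126: ("SNOMED", "S", "Condition"),   # Difficulty walking
    4023995: ("SNOMED", "S", "Observation"),  # Cardiovascular finding
}
# (ancestor, descendant, min_levels_of_separation) rows, shaped like concept_ancestor.
ANCESTOR_ROWS = [
    (441840, 4041284, 1), (4041284, 36714126, 1), (441840, 36714126, 2),
    (441840, 4023995, 1),
]
TREE_DOMAINS = {"Condition", "Procedure", "Measurement", "Observation"}


def build_unified_edges():
    """Direct edges between standard SNOMED concepts, filtered on the CHILD's
    domain only and tagged with it; the parent's domain is deliberately ignored."""
    edges = set()  # (parent, child, domain_id): the expanded primary key
    for ancestor, descendant, levels in ANCESTOR_ROWS:
        if levels != 1:
            continue
        p_vocab, p_std, _ = CONCEPTS[ancestor]
        c_vocab, c_std, c_domain = CONCEPTS[descendant]
        if (p_vocab, p_std, c_vocab, c_std) == ("SNOMED", "S", "SNOMED", "S") \
                and c_domain in TREE_DOMAINS:
            edges.add((ancestor, descendant, c_domain))
    return edges
```

Because domain_id is part of the key, a later step can add `(441840, 4041284, "Condition")` without colliding with the Observation-tagged copy of the same edge.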

Phase 2: Propagate Cross-Domain Parent Chains

The initial fix produced 839 Condition roots instead of 174 — worse, not better. Here's why:

Removing the parent domain filter correctly added edges like (Cardiovascular finding → Heart disease) tagged as Condition. But "Cardiovascular finding" itself had no incoming Condition-tagged edge — its parent "Clinical finding" → "Cardiovascular finding" was tagged Observation. So "Cardiovascular finding" became an orphan root in the Condition tree.

We needed to propagate cross-domain parent chains upward iteratively. The propagateCrossDomainParents() algorithm:

Find cross-domain roots — concepts under the virtual domain root whose actual OMOP domain differs from the tree they're in
Walk up their SNOMED parents via concept_ancestor — add parent→child edges tagged with the target domain
Remove from virtual root — the concept now has a real parent in the domain tree
Re-discover new roots — the newly added parents may themselves be cross-domain
Repeat until no cross-domain roots remain (typically 3-5 iterations)
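The loop can be sketched in Python as follows. The sketch assumes the tree is held as a set of (parent, child, domain_id) edges and that direct SNOMED parents are available as a lookup. It is an illustrative reconstruction of the steps listed above, not the actual propagateCrossDomainParents() code. The toy concepts come from the diagram earlier in this post, and the function names are hypothetical. Step 3 needs no explicit code: once a concept gains an incoming edge tagged with the tree's domain, it stops being a root.

```python
# Toy fragment from the tree diagram: concept_id -> OMOP domain_id.
DOMAIN = {
    441840: "Condition",     # Clinical finding
    4041284: "Observation",  # Functional finding
    36714126: "Condition",   # Difficulty walking
    4023995: "Observation",  # Cardiovascular finding
}

# Direct SNOMED parents (min_levels_of_separation = 1), keyed by child.
SNOMED_PARENTS = {}
for parent, child in [(441840, 4041284), (4041284, 36714126), (441840, 4023995)]:
    SNOMED_PARENTS.setdefault(child, set()).add(parent)


def roots(domain, edges):
    """Nodes of `domain`'s tree with no incoming edge tagged `domain`."""
    nodes = {cid for cid, d in DOMAIN.items() if d == domain}
    nodes |= {n for p, c, d in edges if d == domain for n in (p, c)}
    children = {c for _, c, d in edges if d == domain}
    return nodes - children


def propagate_cross_domain_parents(domain, edges, max_iterations=20):
    """Pull SNOMED parents of cross-domain roots into `domain`'s tree until no
    cross-domain root gains a parent. Mutates `edges` in place and returns the
    number of passes that added edges."""
    for iteration in range(max_iterations):
        # Step 1: roots whose own OMOP domain differs from the tree they sit in.
        cross = [r for r in roots(domain, edges) if DOMAIN[r] != domain]
        # Step 2: add each direct SNOMED parent edge, tagged with the tree's domain.
        new_edges = {(p, r, domain) for r in cross for p in SNOMED_PARENTS.get(r, ())} - edges
        if not new_edges:   # Step 5: nothing left to propagate
            return iteration
        edges |= new_edges  # Step 4: new parents are re-examined on the next pass
    raise RuntimeError("cross-domain propagation did not converge")
```

The starting point is the Phase 1 edge (Functional finding → Difficulty walking) tagged Condition. One pass pulls in Clinical finding as the parent of Functional finding, and the next pass finds no cross-domain roots left.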

The result was transformative:

Domain	Before	After Phase 1	After Phase 2
Condition	174 orphans	839 (worse!)	2 roots
Measurement	1,223 flat	620	5 roots
Observation	633 flat	822	57 roots
Procedure	12	48	1 root

The 2 Condition roots are "Clinical finding" (with 121 immediate children) and "Situation with explicit context" (with 1). Drilling into "Clinical finding" now shows exactly what a clinician expects: Disease, Musculoskeletal finding, Bleeding, Neurological finding, Digestive system finding, Respiratory finding — the natural SNOMED organizing categories.

Layer 2: Clinical Groupings — Our MedDRA SOC Equivalent

MedDRA (Medical Dictionary for Regulatory Activities) provides five levels of curated clinical navigation:

SOC (27)    → System Organ Class (Cardiac disorders, Respiratory disorders, ...)
HLGT (~337) → High Level Group Term
HLT (~1738) → High Level Term
PT (~24000) → Preferred Term
LLT (~83000)→ Lowest Level Term

Every level is curated by human medical terminologists with consistent granularity. A researcher navigating from "Cardiac disorders" through "Coronary artery disorders" to "Myocardial infarction" experiences a smooth, predictable narrowing at each step.

SNOMED's hierarchy, while clinically correct, is organized by ontological category (Disease → Disorder of body system → Disorder of cardiovascular system), not by clinical intuition (Cardiac disorders → Heart failure syndromes → Congestive heart failure). The depth varies from 2 to 13 levels. Intermediate nodes mix organizational axes — anatomical, etiological, temporal, age-based — in a single level.

We needed a curated navigation layer that provides MedDRA SOC-equivalent entry points while leveraging SNOMED's superior concept hierarchy underneath.

The Clinical Groupings Table

We created app.clinical_groupings — a curated metadata table that lives in the application schema (never modifying the read-only vocabulary tables):

CREATE TABLE app.clinical_groupings (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,         -- "Cardiovascular"
    description TEXT,                    -- "Heart, blood vessel disorders and findings"
    domain_id VARCHAR(20) NOT NULL,     -- "Condition"
    anchor_concept_ids INTEGER[] NOT NULL, -- SNOMED concept_ids defining this group
    sort_order INTEGER DEFAULT 0,
    icon VARCHAR(50),
    color VARCHAR(7),                   -- Hex color for UI
    parent_grouping_id INTEGER REFERENCES app.clinical_groupings(id)
);

Each grouping has one or more anchor concept IDs — SNOMED concepts whose entire descendant tree (via concept_ancestor) defines the grouping's coverage. When a user clicks "Cardiovascular," they navigate to the anchor concept's subtree in the SNOMED hierarchy.
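Under this design, membership reduces to a set union over the anchors' descendant sets. The Python sketch below assumes a precomputed ancestor-to-descendants lookup standing in for concept_ancestor. Its rows for standard concepts include a zero-level self row, so each anchor belongs to its own grouping. ClinicalGrouping, grouping_members, and groupings_for are hypothetical names, and the toy descendant sets use concept IDs from this post.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClinicalGrouping:
    """In-memory stand-in for a row of app.clinical_groupings (subset of columns)."""
    name: str
    domain_id: str
    anchor_concept_ids: Tuple[int, ...]


# Toy concept_ancestor closure: ancestor -> descendants, including itself.
DESCENDANTS = {
    134057: {134057, 321588},      # Disorder of cardiovascular system -> Heart disease
    4023995: {4023995, 321588},    # Cardiovascular finding -> Heart disease
    320136: {320136},              # Disorder of respiratory system
    4024567: {4024567, 312437},    # Respiratory finding -> Dyspnea
}


def grouping_members(grouping):
    """Union of every anchor's descendant set: the concepts the grouping covers."""
    members = set()
    for anchor in grouping.anchor_concept_ids:
        members |= DESCENDANTS.get(anchor, set())
    return members


def groupings_for(concept_id, groupings):
    """Names of the groupings whose anchor subtrees contain `concept_id`."""
    return [g.name for g in groupings if concept_id in grouping_members(g)]
```

Heart disease (321588) sits under both Cardiovascular anchors, but the union counts it once. This is how a multi-anchor grouping absorbs SNOMED's polyhierarchy.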

Closing Every Coverage Gap

This is where clinical informatics meets systematic engineering. We didn't stop at "good enough" — we measured coverage and closed every gap.
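Coverage here means the share of standard Condition concepts that fall under at least one anchor's descendant set. Here is a minimal Python sketch, assuming the descendant sets have already been loaded from concept_ancestor; the function names are hypothetical.

```python
def coverage(standard_ids, anchor_ids, descendants):
    """Count standard concepts reachable from at least one grouping anchor.

    standard_ids: set of standard Condition concept_ids (the denominator)
    anchor_ids:   every anchor across all groupings
    descendants:  ancestor -> set of descendants (self included), from concept_ancestor
    Returns (covered_count, total_count, uncovered_ids).
    """
    covered = set()
    for anchor in anchor_ids:
        covered |= descendants.get(anchor, set())
    covered &= standard_ids  # ignore descendants outside the denominator
    return len(covered), len(standard_ids), standard_ids - covered


def pct(covered, total):
    """Format a coverage ratio the way this post reports it (one decimal place)."""
    return f"{100 * covered / total:.1f}%"
```

Applied to the counts reported below, pct gives 77.3%, 98.4%, and 100.0%. At one decimal place, the final 25 uncovered concepts no longer show up in the percentage.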

Iteration 1: Disorder Anchors Only (77.3% coverage)

Our first 20 Condition groupings used "Disorder of X system" concepts as anchors — the same approach Atlas and most OHDSI tools take. This covered 81,453 of 105,324 standard Condition concepts.

The missing 22.7% revealed a critical insight: SNOMED separates disorders (diseases) from the other clinical findings (signs, symptoms, observations). Both are assigned to the Condition domain in OMOP, but they sit on different branches of SNOMED's hierarchy:

Clinical finding
├── Disease
│   └── Disorder of body system
│       └── Disorder of cardiovascular system ← Our anchor (covers disorders)
└── Cardiovascular finding                    ← NOT covered (findings branch)
    └── Heart murmur, Blood pressure finding, ECG abnormality, etc.

"Heart murmur" is a Condition-domain concept. It's clinically related to cardiovascular disorders. But it's not under "Disorder of cardiovascular system" — it's under "Cardiovascular finding." MedDRA handles this via multi-axiality (a concept can appear under multiple SOCs). We needed to cover both branches.

Iteration 2: Disorder + Finding Siblings (98.4% coverage)

We added the SNOMED "finding" sibling of every organ-system disorder anchor:

Grouping	Disorder Anchor	Finding Anchor Added
Cardiovascular	Disorder of cardiovascular system (134057)	+ Cardiovascular finding (4023995)
Respiratory	Disorder of respiratory system (320136)	+ Respiratory finding (4024567)
Neurological	Disorder of nervous system (376337)	+ Neurological finding (4011630) + CNS finding (4086181)
Dermatological	Disorder of skin (4317258)	+ Skin AND/OR mucosa finding (4212577)
...	...	...

We also added 7 new MedDRA SOC-equivalent groupings that were entirely missing:

New Grouping	MedDRA SOC Equivalent	Anchor Concepts
Vascular	SOC 27 — Vascular disorders	Vascular disorder (443784)
Hepatobiliary	SOC 9 — Hepatobiliary disorders	Disorder of liver and/or biliary tract (1244824) + Biliary tract (197917) + Jaundice (137977)
Renal & Urinary	SOC 21 — Renal and urinary	Disorder of urinary system (75865) + Urine finding (437382)
Reproductive & Breast	SOC 22 — Reproductive system	Female (4180154) + Male (196738) + Breast (77030)
Investigations	SOC 13 — Investigations	Evaluation finding (40480457) + Finding by method (4041287)
General Signs & Symptoms	SOC 8 — General disorders	Bleeding (437312) + Mass (4102111) + Edema (433595) + Fever (437663) + Disease (4274025) + 16 more
Body Region Findings	N/A (SNOMED-specific)	Trunk (4117930) + Limb (138239) + Head (4247371) + Back (4213101) + Neck (4184252)

Coverage jumped from 77.3% to 98.4% — 103,629 of 105,324 concepts.

Iteration 3: Systematic Gap Closure (100.0% coverage)

The remaining 1.6% (1,695 concepts) fell into specific SNOMED categories that sit outside the disorder/finding dichotomy. We used MedGemma to analyze the 66 parent-level groups and map each to the most clinically appropriate existing grouping, then verified every concept_id against vocab.concept.

Key expansions in this final pass:

Expansion	Concepts Captured	Clinical Rationale
Neoplasm + Finding of lesion, Clinical stage finding	+349	Tumor staging (Gleason grades, TNM), morphology, and oncology assessment findings belong with neoplasms
Neurological + Speech finding, Coordination finding	+261	Speech pathology and motor coordination are neurological subspecialties
Hematologic + Blood/lymphatics/immune system finding	+185	Anemias (under "Disorder of cellular component of blood") were missed because they're not under "Disorder of hematopoietic structure" in SNOMED — a non-obvious hierarchy gap
Injury, Poisoning & Procedural + Wound finding, Device finding	+100	Wound assessment, procedural complications, and device-related findings
Congenital & Genetic + Carrier of disorder	+53	Genetic carrier states (e.g., "Carrier of cystic fibrosis") are findings, not disorders
Functional Impairment (new grouping)	+473	Impaired cognition, difficulty walking, ADL limitations — these are cross-domain from Observation but clinically critical Condition concepts

Final result: 105,299 of 105,324 standard SNOMED Condition concepts covered (99.98%, which rounds to 100.0%). The 25 uncovered concepts fall into two groups. Three are true orphans with no ancestors in concept_ancestor, a vocabulary data quality issue. The other 22 are reachable only through paths that don't intersect any grouping anchor.

MedDRA SOC Parity Map

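The final 27 Condition groupings align with MedDRA's System Organ Classes as follows. The mapping is not strictly one-to-one: the Endocrine and Metabolism SOCs share one grouping, two SOCs are covered by other OMOP domains, and four groupings are Parthenon additions with no SOC counterpart.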

MedDRA SOC	Parthenon Grouping	Anchors
Blood and lymphatic system disorders	Hematologic	Hematopoietic structure + Cellular blood + Blood/lymph/immune finding
Cardiac disorders	Cardiovascular	Cardiovascular system + Cardiovascular finding
Congenital, familial and genetic disorders	Congenital & Genetic	Congenital disease + Genetic disease + Carrier of disorder
Ear and labyrinth disorders	Ear & Hearing	Disorder of ear + ENT finding
Endocrine disorders + Metabolism and nutrition	Endocrine & Metabolic	Metabolic disease + Endocrine system + Metabolic/endocrine findings
Eye disorders	Eye & Vision	Disorder of eye region + Eye/vision finding
Gastrointestinal disorders	Gastrointestinal	Digestive system + Digestive finding + Stool finding
General disorders and administration site conditions	General Signs & Symptoms	Bleeding + Mass + Edema + Fever + Vital signs + 16 more
Hepatobiliary disorders	Hepatobiliary	Liver/biliary tract + Jaundice
Immune system disorders	Immune System	Immune function + Hypersensitivity + Adverse reaction propensity
Infections and infestations	Infectious Disease	Infectious disease + Inactive TB + Susceptibility
Injury, poisoning and procedural complications	Injury, Poisoning & Procedural	Traumatic injury + Poisoning + Procedural complications + Wound + Device
Investigations	Investigations	Evaluation finding + Method finding + Body product finding
Musculoskeletal and connective tissue disorders	Musculoskeletal	MSK system + MSK finding + Muscle finding
Neoplasms benign, malignant and unspecified	Neoplasm	Malignant + Benign + Uncertain behavior + Lesion finding + Clinical staging
Nervous system disorders	Neurological	Nervous system + Neurological finding + CNS finding + Coordination + Speech
Pregnancy, puerperium and perinatal conditions	Pregnancy & Perinatal	Pregnancy + Childbirth finding + Neonatal + Perinatal + Fetal + Development
Psychiatric disorders	Mental & Behavioral	Mental disorder + Psych finding + Delusion
Renal and urinary disorders	Renal & Urinary	Urinary system + Urine finding + Micturition
Reproductive system and breast disorders	Reproductive & Breast	Female reproductive + Male genital + Breast
Respiratory, thoracic and mediastinal disorders	Respiratory	Respiratory system + Respiratory finding + Respiratory measurements
Skin and subcutaneous tissue disorders	Dermatological	Skin + Mucosa finding + Soft tissue + Color + Integumentary + Swelling
Social circumstances	Observation domain	Social context finding (covered in Observation groupings)
Surgical and medical procedures	Procedure domain	Covered by Procedure groupings (Surgical, Evaluation, Therapeutic, Rehab, Preventive)
Vascular disorders	Vascular	Vascular disorder
N/A — Parthenon additions	Nutritional	Nutritional disorder + Eating/feeding finding
N/A — Parthenon additions	Pain Syndromes	Pain
N/A — Parthenon additions	Functional Impairment	Functional finding
N/A — Parthenon additions	Body Region Findings	Trunk + Limb + Head + Back + Neck + Face + Posture

MedDRA SOCs 25 (Social circumstances) and 26 (Surgical and medical procedures) are covered by our Observation and Procedure domain groupings respectively. This is architecturally correct, because these concepts live in different OMOP domains.

The Anchor Verification Problem

One of the harder lessons from this work: SNOMED concept IDs are not guessable, and ILIKE is not a concept resolver.

Our initial seeder used ILIKE pattern matching against vocab.concept to resolve anchor names to concept_ids:

SELECT concept_id FROM vocab.concept
WHERE concept_name ILIKE 'Disorder of ear'
  AND vocabulary_id = 'SNOMED' AND standard_concept = 'S'
ORDER BY concept_id LIMIT 1

This produced catastrophically wrong results for 22 of 39 initial groupings:

Grouping	Intended Concept	ILIKE Resolved To	concept_id
Pain Syndromes	Pain	Dementia	4182210
Ear & Hearing	Disorder of ear	Multiple sclerosis	374919
Genitourinary	Disorder of genitourinary system	Urethritis	195862
Immune System	Immune system disorder	Malignant lymphoma	432571
Neoplasm (2nd anchor)	Benign neoplasm	Passing flatus	4091513
Cardiac Testing	Cardiac measure	Dipipanone overdose	4173533
Pulmonary Function	Respiratory measure	Eustrongylides tubifex	4206896
Preventive (Procedure)	Prophylactic procedure	Syndrome of inappropriate vasopressin secretion	4207539

The problem: ILIKE is a pattern matcher, not a concept resolver. Without wildcards, it is only a case-insensitive equality test, so a name that is missing or spelled differently silently finds nothing. With % wildcards, it matches any substring, and SNOMED has 350,000+ concepts. A pattern that reaches "ear" matches "Disorder of ear" (concept_id 378161) and "Early onset cerebellar ataxia" alike, and which one wins depends on which concept_id sorts first. The ORDER BY concept_id LIMIT 1 made the result deterministic but not correct.
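A small Python emulation makes these semantics concrete. It assumes standard ILIKE behavior: case-insensitive, matched against the whole string, with % matching any run of characters and _ matching exactly one. It ignores escape characters. The ilike and first_match helpers are hypothetical, and concept_id 100001 is a made-up ID chosen only to sort ahead of Disorder of ear (378161).

```python
import re


def ilike(value: str, pattern: str) -> bool:
    """Emulate SQL ILIKE: case-insensitive and matched against the WHOLE string,
    where % matches any run of characters and _ matches exactly one.
    Escape characters are not handled in this sketch."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def first_match(pattern, concepts):
    """Mimic WHERE concept_name ILIKE pattern ORDER BY concept_id LIMIT 1."""
    hits = sorted(cid for cid, name in concepts.items() if ilike(name, pattern))
    return hits[0] if hits else None


# 378161 is Disorder of ear; 100001 is a made-up ID for illustration only.
CONCEPTS = {378161: "Disorder of ear", 100001: "Early onset cerebellar ataxia"}
```

`first_match("Disorder of ear", CONCEPTS)` returns 378161. `first_match("%ear%", CONCEPTS)` returns 100001 because it sorts first: deterministic, but wrong.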

Our fix was to reverse the resolver priority: verified hardcoded IDs first, name matching as fallback only. Every anchor concept_id in the seeder was individually verified against vocab.concept with an exhaustive audit query:

WITH seeder_ids(intended_name, hardcoded_id) AS (VALUES
  ('Disorder of cardiovascular system', 134057),
  ('Pain', 4329041),
  -- ... all 119 anchor IDs
)
SELECT
  CASE WHEN c.concept_id IS NULL THEN '✗ MISSING: no such concept_id'
       WHEN c.concept_name = s.intended_name THEN '✓ MATCH'
       ELSE '✗ WRONG: ' || c.concept_name
  END AS status,
  s.intended_name, s.hardcoded_id
FROM seeder_ids s
LEFT JOIN vocab.concept c ON c.concept_id = s.hardcoded_id
-- IS DISTINCT FROM treats a missing concept (NULL name) as a mismatch
WHERE c.concept_name IS DISTINCT FROM s.intended_name;

-- Result: (0 rows) — all 119 anchors verified

This audit query is now part of our verification protocol. Every time we add or modify clinical groupings, we run it to confirm zero mismatches before seeding.
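The same check can run as a gate in front of the seeder. In this hypothetical Python sketch, `expected` holds (intended_name, concept_id) pairs like the VALUES list above. `vocabulary` maps concept_id to concept_name as loaded from vocab.concept. Neither function is Parthenon's actual code.

```python
def audit_anchors(expected, vocabulary):
    """Return (concept_id, intended_name, actual_name) for every anchor whose
    hardcoded ID is missing from the vocabulary (actual_name is None) or names
    a different concept. An empty list means every anchor verified."""
    problems = []
    for intended_name, concept_id in expected:
        actual_name = vocabulary.get(concept_id)
        if actual_name != intended_name:
            problems.append((concept_id, intended_name, actual_name))
    return problems


def require_clean_audit(expected, vocabulary):
    """Refuse to seed unless the audit returns zero mismatches."""
    problems = audit_anchors(expected, vocabulary)
    if problems:
        details = "; ".join(
            f"{cid}: expected {want!r}, found {got!r}" for cid, want, got in problems
        )
        raise ValueError(f"{len(problems)} anchor(s) failed verification: {details}")
```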

Several groupings require multiple SNOMED anchors (the record is Pregnancy & Perinatal with 9 anchors). When a user clicks a multi-anchor grouping, they see a sub-level listing each anchor concept:

Conditions > Endocrine & Metabolic
├── Metabolic disease (46 subcategories) →
├── Disorder of endocrine system (51 subcategories) →
├── Metabolic finding (22 subcategories) →
├── Endocrine finding (17 subcategories) →
└── Finding of secondary sexual characteristics (2 subcategories) →

Single-anchor groupings drill directly into the SNOMED subtree. A "Show all concepts" toggle lets power users bypass the grouping layer and see the raw tree roots.

The groupings API returns resolved anchor details (concept ID, name, and domain) so the frontend can display meaningful labels without additional queries:

{
  "name": "Endocrine & Metabolic",
  "anchors": [
    { "concept_id": 436670, "concept_name": "Metabolic disease", "domain_id": "Condition" },
    { "concept_id": 31821, "concept_name": "Disorder of endocrine system", "domain_id": "Condition" },
    { "concept_id": 432455, "concept_name": "Metabolic finding", "domain_id": "Condition" },
    { "concept_id": 444107, "concept_name": "Endocrine finding", "domain_id": "Observation" },
    { "concept_id": 4306009, "concept_name": "Finding of secondary sexual characteristics", "domain_id": "Observation" }
  ]
}
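A backend needs only a single concept lookup per grouping to produce that shape. The Python sketch below is a language-neutral illustration, since Parthenon's API is Laravel. It assumes a concept_id → (concept_name, domain_id) mapping already fetched from vocab.concept. grouping_payload and grouping_json are hypothetical names.

```python
import json


def grouping_payload(name, anchor_ids, concepts):
    """Resolve each anchor ID to its name and domain, in the response shape shown above.

    concepts maps concept_id -> (concept_name, domain_id), e.g. fetched from
    vocab.concept in one query. A KeyError means an anchor ID missing from the vocabulary.
    """
    anchors = []
    for concept_id in anchor_ids:
        concept_name, domain_id = concepts[concept_id]
        anchors.append(
            {"concept_id": concept_id, "concept_name": concept_name, "domain_id": domain_id}
        )
    return {"name": name, "anchors": anchors}


def grouping_json(name, anchor_ids, concepts):
    """Serialize the payload as the API would return it."""
    return json.dumps(grouping_payload(name, anchor_ids, concepts), indent=2)
```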

Why This Matters

For Clinical Researchers

Before this work, browsing SNOMED conditions in Parthenon was functionally impossible. A researcher looking for cardiovascular conditions would see 174 orphan concepts and have to use keyword search instead. Now they click Cardiovascular, see 79 subcategories (57 disorders + 22 findings), and drill to any level of SNOMED's 13-deep hierarchy.

For Cohort Building

The groupings layer makes Parthenon the first open-source OHDSI tool to provide MedDRA-equivalent navigation for cohort definition concept selection. When building a cohort that needs "all cardiovascular conditions," a researcher can start from the Cardiovascular grouping, expand its anchors, and use includeDescendants to capture the full SNOMED subtree — something that previously required knowing the exact SNOMED concept_id to search for.
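As a sketch of that workflow, the hypothetical Python helper below turns a grouping's resolved anchors into a concept set expression with includeDescendants set on every item. The item layout follows Atlas's exported concept set JSON as we understand it: a concept object with upper-case keys, plus isExcluded, includeDescendants, and includeMapped flags. Check it against your Atlas/WebAPI version before relying on it. The anchors are assumed to be standard SNOMED concepts, which the verification step guarantees.

```python
def concept_set_from_anchors(anchors):
    """Build an Atlas-style concept set expression in which every grouping anchor
    pulls in its full SNOMED subtree via includeDescendants.

    anchors: dicts with concept_id, concept_name, and domain_id, as returned by
    the groupings API. Vocabulary and standard flags assume verified SNOMED anchors.
    """
    return {
        "items": [
            {
                "concept": {
                    "CONCEPT_ID": a["concept_id"],
                    "CONCEPT_NAME": a["concept_name"],
                    "DOMAIN_ID": a["domain_id"],
                    "VOCABULARY_ID": "SNOMED",
                    "STANDARD_CONCEPT": "S",
                },
                "isExcluded": False,
                "includeDescendants": True,
                "includeMapped": False,
            }
            for a in anchors
        ]
    }
```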

For the OHDSI Ecosystem

The SNOMED-OMOP domain boundary problem affects every tool in the OHDSI ecosystem. Atlas's concept hierarchy viewer suffers from the same orphan-root issue (though it uses a different codebase). Our cross-domain tree builder and clinical groupings layer are architectural patterns that could be adopted by the broader OHDSI community. The propagateCrossDomainParents() algorithm in particular solves a problem that, as far as we can determine, no other OHDSI tool has addressed — following SNOMED's actual polyhierarchical structure across OMOP domain boundaries.

For Vocabulary Governance

The app.clinical_groupings table establishes infrastructure for ongoing vocabulary curation. The parent_grouping_id foreign key supports future HLGT/HLT-equivalent sub-groupings — the next two levels of MedDRA's five-level navigation. The anchor-based architecture means groupings stay valid across SNOMED vocabulary updates as long as the anchor concepts aren't retired, and the seeder's verification protocol catches breakages automatically.
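A post-refresh check for retired anchors might look like the Python sketch below. It relies on two real columns of the OMOP concept table: standard_concept, and invalid_reason (NULL for valid concepts, 'D' for deleted, 'U' for upgraded). ConceptRow and stale_anchors are hypothetical names.

```python
from typing import NamedTuple, Optional


class ConceptRow(NamedTuple):
    """The vocab.concept columns this check needs."""
    concept_id: int
    concept_name: str
    standard_concept: Optional[str]  # 'S', 'C', or None
    invalid_reason: Optional[str]    # None = valid, 'D' = deleted, 'U' = upgraded


def stale_anchors(anchor_ids, vocab):
    """Map each anchor that no longer works as a standard entry point to the reason.

    vocab: concept_id -> ConceptRow, loaded after a vocabulary refresh.
    """
    stale = {}
    for cid in anchor_ids:
        row = vocab.get(cid)
        if row is None:
            stale[cid] = "missing from vocab.concept"
        elif row.invalid_reason is not None:
            stale[cid] = f"invalid_reason={row.invalid_reason}"
        elif row.standard_concept != "S":
            stale[cid] = "no longer standard"
    return stale
```

Any non-empty result means a grouping would silently lose coverage, so its anchors need re-curation before the next seed.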

Technical Summary

Metric	Value
Condition concept coverage	100.0% (105,299 / 105,324)
Total clinical groupings	46 (27 Condition + 8 Measurement + 6 Observation + 5 Procedure)
Total anchor concepts	119 (all verified against vocab.concept)
Concept tree edges	538,424 across 6 domains
Max hierarchy depth	16 (Observation), 13 (Condition), 12 (Procedure/Measurement)
Hierarchy build time	~30 seconds (full rebuild with results population)
Cross-domain propagation	3-5 iterations per domain
MedDRA SOC parity	25 of 27 SOCs directly mapped (2 covered by other domains)

What's Next

The clinical groupings layer is designed for three future enhancements:

HLGT/HLT-equivalent sub-groupings — The parent_grouping_id column supports hierarchical groupings. Under "Cardiovascular," we could add sub-groupings like "Coronary artery disorders," "Heart failure syndromes," "Arrhythmias," and "Valvular heart disease" — matching MedDRA's HLGT level. This would require ~300-400 curated sub-groupings, which is a substantial but bounded clinical curation task.
Data prevalence overlay — Show person count and record count from Achilles results alongside each grouping card, so researchers can immediately see which clinical categories have the most data in their CDM sources. This turns the grouping browser from a navigation tool into a data discovery tool.
AI-assisted curation — We've already demonstrated the pattern: use a medical LLM (now II-Medical-8B, replacing MedGemma 4B) for clinical reasoning about concept relationships, paired with database queries for concept_id verification. This pipeline could semi-automate the creation of HLGT-level sub-groupings, with human review as the quality gate.

Today, Parthenon's vocabulary browser provides the navigational quality of MedDRA with the clinical depth of SNOMED CT. No other open-source OHDSI tool offers this combination. For the first time, a clinical researcher can browse from "Cardiovascular" to "Coronary arteriosclerosis" through a clinically intuitive path — without knowing a single concept_id, without switching tools, and without leaving Parthenon.

This work was completed on April 5, 2026. The cross-domain SNOMED tree builder, clinical groupings layer, and Browse Hierarchy UI are all available in the current Parthenon release. The clinical grouping definitions are seeded via ClinicalGroupingSeeder and can be customized for institution-specific navigation needs.

Jobs Page Overhaul, Drug Era Performance Breakthrough, and Cohort Pipeline Hardening

Sat, 04 Apr 2026 00:00:00 GMT

A landmark day for platform observability and data pipeline reliability. We shipped a fully wired Jobs monitoring page that surfaces all 13+ tracked job types, broke through a major ETL performance ceiling on the SynPUF dataset (17 hours → 14 minutes for drug_era builds), and closed out a cohort generation audit that uncovered eight discrete bugs across the SQL builders, API layer, and frontend.

Jobs Page: From Partial View to Full Platform Visibility

The single biggest user-facing win today was landing commit 5e29c3a4e — a ground-up rework of the Jobs monitoring experience.

What Was Broken

The Jobs page was only surfacing 8 of the system's 13+ tracked job types. Achilles runs, FHIR Sync, Care Gap evaluations, GIS Boundary loads, and Poseidon ETL runs were all dispatching correctly through Horizon and writing to their tracking models — they were just completely invisible in the UI. The JobController::index() method simply never queried them.

The detail drawer had its own problem: the show endpoint was hardcoded to AnalysisExecution route model binding, meaning clicking any non-analysis job (cohort generation, ingestion, DQD, etc.) returned a 404.

Several secondary bugs compounded visibility further:

Stale cohort generation jobs appeared under the wrong status filter due to a DB-filter/display-status mismatch
FHIR Export was leaking its raw "processing" status string instead of normalizing it to "running"
N+1 Source::find() calls inside DQD, Heel, and Achilles map loops
SCCS and Evidence Synthesis type filters were returning all analysis types instead of scoping to their specific morph class

What We Shipped

JobController.php gained nearly 1,000 lines across two key changes:

Five new job collectors in index():

Type	Model	Scope
`fhir_sync`	`FhirSyncRun`	System
`care_gap`	`CareGapEvaluation`	User
`gis_boundary`	`GisDataset`	User
`poseidon`	`PoseidonRun`	System

(The finngen type was removed — it lives in the workbench app, not core job tracking.)

Polymorphic show endpoint: Route model binding is gone. The new show(Request $request, int $jobId) dispatches through 14 type-specific detail builders via a ?type= query param. Each builder returns the standard job envelope plus a details object with type-specific metadata and a timeline array of execution events — giving the rich detail drawer real data to render for every job type in the system.
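A type-keyed registry is a common way to structure this kind of ?type= dispatch. The sketch below is Python for illustration only (the actual controller is PHP); the envelope fields, the two stub builders, and their detail keys are hypothetical.

```python
from typing import Any, Callable

JobRecord = dict[str, Any]
DetailBuilder = Callable[[JobRecord], dict[str, Any]]

# Hypothetical registry: one detail builder per job type (?type=...)
DETAIL_BUILDERS: dict[str, DetailBuilder] = {}


def builder(job_type: str) -> Callable[[DetailBuilder], DetailBuilder]:
    """Decorator that registers a detail builder under a job type key."""
    def register(fn: DetailBuilder) -> DetailBuilder:
        DETAIL_BUILDERS[job_type] = fn
        return fn
    return register


@builder("fhir_sync")
def fhir_sync_details(job: JobRecord) -> dict[str, Any]:
    return {"records_synced": job.get("records_synced", 0)}


@builder("poseidon")
def poseidon_details(job: JobRecord) -> dict[str, Any]:
    return {"source_id": job.get("source_id")}


def show(job_type: str, job: JobRecord) -> dict[str, Any]:
    """Standard envelope plus type-specific details and a time-ordered timeline."""
    build = DETAIL_BUILDERS.get(job_type)
    if build is None:
        # Unknown type: explicit error instead of a 404 from the wrong model lookup
        raise KeyError(f"unknown job type: {job_type}")
    return {
        "id": job["id"],
        "type": job_type,
        "status": job["status"],
        "details": build(job),
        "timeline": sorted(job.get("events", []), key=lambda e: e["at"]),
    }
```

Keeping the registry as data means a new job type is one new builder, and an unrecognized type fails with a clear error rather than falling through to a single hardcoded model binding.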

ETL Performance: `drug_era` Goes From Overnight to Minutes

Commit a084b84f6 delivered one of the most dramatic performance wins we've had in the ETL layer. The drug_era build step on the 2.3M-patient SynPUF dataset was taking 17 hours. It now runs in 14 minutes.

The fix was a two-phase build strategy. Previously the pipeline attempted to compute drug eras in a single monolithic pass, which collapsed under the weight of the dataset's join complexity and row volume. The rewrite splits the work: phase one materializes an intermediate exposure table with appropriate indexes, and phase two performs the era consolidation logic against that pre-built structure. The intermediate materialization pays for itself immediately by giving the query planner something it can actually reason about.
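The post does not spell out the consolidation rules. As a sketch of what phase two typically computes, the Python below merges one person's exposures to one ingredient into eras, assuming the conventional OHDSI 30-day persistence window (an exposure that starts within 30 days of the current era's end extends that era). The sorted input list stands in for phase one's indexed intermediate table; the real pipeline does this in SQL.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Exposure(NamedTuple):
    start: date
    end: date


class Era(NamedTuple):
    start: date
    end: date
    exposure_count: int


def build_eras(exposures: list[Exposure], persistence_days: int = 30) -> list[Era]:
    """Merge exposures (one person, one ingredient) into eras.

    Assumes exposures arrive sorted by start date, as an indexed
    intermediate table would return them.
    """
    eras: list[Era] = []
    for exp in exposures:
        if eras and exp.start <= eras[-1].end + timedelta(days=persistence_days):
            # Gap within the persistence window: extend the current era
            last = eras[-1]
            eras[-1] = Era(last.start, max(last.end, exp.end), last.exposure_count + 1)
        else:
            # Gap too long (or first exposure): open a new era
            eras.append(Era(exp.start, exp.end, 1))
    return eras
```

Because phase two only ever looks at the previous era for the same person and ingredient, it is a single ordered scan, which is the kind of work a pre-sorted, indexed intermediate table makes cheap.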

This was preceded by 1eb297148, which rewrote the SynPUF enrichment parallelism to eliminate the OOM crashes and deadlocks that were causing enrichment runs to fail intermittently on large datasets. Both fixes together mean the SynPUF 2.3M enrichment pipeline is now stable and fast end-to-end.

Cohort Generation: Eight Bugs Closed

Commit 6b4012262 documents a focused audit of the cohort generation pipeline that surfaced and fixed eight bugs spanning three layers:

SQL builders — edge cases in inclusion criteria handling that produced incorrect cohort membership under specific date range configurations
API layer — response shape inconsistencies that caused the frontend to silently drop data
Frontend — patient list navigation was routing to malformed profile URLs (9d79ffe37), and breadcrumbs weren't context-aware when entering from the cohort view

The risk scores feature also got two targeted fixes: 1297db01b corrected the recommend endpoint to return a structured response with the full patient profile attached (it was previously returning a bare score), and 6b4012262 fixed a useParams / route definition mismatch where the component was reading scoreId but the route defined :id.

AI Memory & Infrastructure Housekeeping

On the AI side, 6035d6d65 streamlined the Chroma memory path resolution — a small but meaningful cleanup that removes ambiguity in how the vector store locates its persistence directory across different deployment environments.

Infrastructure received a round of tuning in 9f24a2ca2: Docker Compose configuration was tightened, Horizon queue configuration was cleaned up, monitoring alerts were added for key pipeline stages, and the CI workflow was updated to reflect the current test surface.

What's Next

With the Jobs page now surfacing the full job graph, the natural follow-on is real-time status streaming — replacing the current polling approach with server-sent events so operators get live feedback on long-running ETL and analysis jobs without hammering the API.

The drug_era two-phase build is a pattern worth generalizing. The condition_era and observation_period builders have similar structural characteristics and are candidates for the same treatment.

Cohort generation is in a much cleaner state after today's audit, which unblocks work on cohort comparison views — a feature that's been waiting on a stable generation pipeline before we could build on top of it confidently.

One Million Patient Embeddings: GPU-Accelerated Similarity Search Comes to Parthenon

Sat, 04 Apr 2026 00:00:00 GMT

Two days ago, we shipped the Patient Similarity Engine — a multi-modal system that scores patients across six clinical dimensions on OMOP CDM. The architecture was sound. The algorithms worked. But there was a problem hiding in plain sight: none of our patients had embeddings.

The embedding pipeline had been silently failing since day one. Three type mismatches between our PHP backend and Python AI service meant that every embedding request returned a validation error, was caught by a try/catch block, and logged as a warning that nobody read. The feature vectors were all there — conditions, drugs, measurements, procedures — but the 512-dimensional dense vectors that would make similarity search fast at scale? Zero. For every source. For every patient.

Tonight, we fixed all three bugs, refactored the embedding pipeline from CPU-only SapBERT to GPU-accelerated Ollama, upgraded from 512 to 768 dimensions, introduced batch deduplication that delivered a 123x throughput improvement, and generated embeddings for 1,007,007 patients across three CDM sources. This is the story of what broke, what we built, and what it unlocks.

The Silent Failure

The Patient Similarity Engine has two search modes. Interpretable mode computes per-dimension Jaccard and z-score similarity in real time — it's explainable but requires loading candidate patients into memory and scoring each one. Embedding mode uses pgvector's IVFFlat index for approximate nearest neighbor (ANN) search — sub-second lookups across a million patients, with interpretable scoring applied only to the top candidates.
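Interpretable mode's two building blocks can be sketched as follows. The Jaccard function is the standard set-overlap formula; how Parthenon turns two lab z-score profiles into a similarity is not described here, so `zscore_similarity` (1 / (1 + mean absolute difference over shared labs)) is an assumption for illustration.

```python
from __future__ import annotations


def jaccard(a: set[int], b: set[int]) -> float:
    """|A ∩ B| / |A ∪ B| over concept IDs; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def zscore_similarity(a: dict[str, float], b: dict[str, float]) -> float | None:
    """Hypothetical lab similarity over the labs both patients have.

    Returns None when there is no overlap, so the caller can skip the dimension.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    mean_diff = sum(abs(a[k] - b[k]) for k in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)
```

Both functions cost time proportional to the size of one patient's profile, which is cheap per pair but adds up when every candidate in a million-patient source has to be scored.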

Embedding mode requires pre-computed dense vectors stored in a vector(N) column on patient_feature_vectors. These vectors are generated by a Laravel queue job (ComputePatientFeatureVectors) that:

1. Extracts clinical features from CDM tables (conditions, drugs, measurements, procedures, genomics)
2. Stores them as JSONB in the feature vector table
3. Sends batches to the Python AI service for encoding
4. Writes the resulting vectors back to pgvector

Steps 1 and 2 worked perfectly. Step 3 failed on every single call. Step 4 never executed.

Bug #1: Integer Concepts vs. String Validation

The Pydantic model for the /patient-similarity/embed endpoint declared concept lists as list[str]:

from pydantic import BaseModel, Field


class PatientFeatures(BaseModel):
    condition_concepts: list[str] = Field(default_factory=list)
    drug_concepts: list[str] = Field(default_factory=list)
    procedure_concepts: list[str] = Field(default_factory=list)

But OMOP concept IDs are integers. PHP's PatientFeatureVector::toArray() serializes the JSONB condition_concepts column as [4120002, 4045900, 4031047] — a list of integers. FastAPI's Pydantic validation rejected every request with:

{"detail": [{"type": "string_type", "loc": ["body", "condition_concepts", 0],
  "msg": "Input should be a valid string", "input": 4120002}]}

The EmbeddingClient caught the 422, logged a warning, and returned an empty array. The job continued to the next batch.

Bug #2: Dict Lab Vector vs. List Validation

The lab_vector field was declared as list[float], but PHP sends a JSONB dictionary mapping concept IDs to z-scores:

{"3025315": -0.1184, "3036277": -0.0578, "3036832": 1.5564}

Same failure pattern — Pydantic rejected the dict, the client swallowed the error.

Bug #3: Batch Response Key Mismatch

Even if the first two bugs hadn't existed, the batch endpoint wouldn't have worked. The Python API returns:

{"embeddings": [{"person_id": 142763, "embedding": [...], "dimension": 768}, ...]}

But EmbeddingClient::embedBatch() iterated the response as a flat array:

foreach ($embeddings as $pid => $embedding) {
    // $pid = 0, 1, 2... (array indices, not person IDs)
    // $embedding = {"person_id": 142763, ...} (object, not float array)
}

The UPDATE would then have tried to write a JSON object as a pgvector value, keyed by array index 0 instead of the patient's real person_id.

The Compound Effect

Three bugs, three layers, one outcome: zero embeddings for any patient in any source. The job always "succeeded" — it just never wrote any vectors. The interpretable search mode worked fine as a fallback, masking the problem entirely. We only discovered it because we asked a simple question: why does the embedding column show NULL for every row?

Lesson learned: Silent failure in pipeline stages is worse than a crash. The EmbeddingClient should have thrown on non-200 responses, or at minimum, the job should have asserted that count($embeddings) > 0 per batch.
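A minimal Python sketch of that stricter contract: raise on any non-2xx status, re-key the batch response by person_id, and treat an empty result as an error. The function name and exception type are illustrative; Parthenon's actual EmbeddingClient is PHP.

```python
from typing import Any


class EmbeddingBatchError(RuntimeError):
    """Raised instead of silently returning an empty result."""


def parse_embed_batch(status: int, payload: dict[str, Any], expected: int) -> dict[int, list[float]]:
    """Validate one /patient-similarity/embed batch response."""
    if not 200 <= status < 300:
        raise EmbeddingBatchError(f"embed batch failed with HTTP {status}: {payload}")

    # Re-key by person_id: the response is a list of objects, not an id-indexed map
    result = {
        int(entry["person_id"]): entry["embedding"]
        for entry in payload.get("embeddings", [])
        if "person_id" in entry and "embedding" in entry
    }
    if not result:
        raise EmbeddingBatchError(f"0 of {expected} embeddings returned")
    return result
```

With this shape, the 422 from Bug #1 and the empty result from Bug #3 both stop the job on the first batch instead of producing a run that "succeeds" with no vectors.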

The Fixes

Type Coercion at the API Boundary

The Pydantic model now accepts what PHP actually sends:

from pydantic import BaseModel, Field


class PatientFeatures(BaseModel):
    condition_concepts: list[int | str] = Field(default_factory=list)
    lab_vector: list[float] | dict[str, float] = Field(default_factory=list)
    drug_concepts: list[int | str] = Field(default_factory=list)
    procedure_concepts: list[int | str] = Field(default_factory=list)
    variant_genes: list[str | dict] = Field(default_factory=list)

The variant_genes field deserves special mention. In the OMOP genomics extension, variant data is stored as [{"gene": "KRAS", "pathogenicity": "Pathogenic"}, ...]. The embedding service now extracts the gene field from dict entries and passes it to the encoder:

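# raw_genes: the request's variant_genes list, which mixes plain gene strings
# and {"gene": ..., "pathogenicity": ...} dicts
variant_genes = [
    g["gene"] if isinstance(g, dict) else str(g)
    for g in raw_genes
]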

This means patients with genomic data — like those in the Pancreatic Cancer Corpus with KRAS, BRCA1, and TP53 variants — get genomics-aware embeddings.

Lab Vector Dict Handling

The _encode_measurements function now accepts both forms:

if isinstance(lab_vector, dict):
    values = list(lab_vector.values())  # Extract z-scores from {concept_id: zscore}
else:
    values = lab_vector  # Already a list of floats

The z-scores are clipped to [-5, 5] and normalized to [-1, 1], then packed into the 96-dimensional measurements slice of the patient vector.
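A sketch of that packing step. Dividing the clipped value by 5 is the simplest linear mapping from [-5, 5] to [-1, 1]; truncating to 96 values when a patient has more labs, and zero-padding when they have fewer, are assumptions the post does not spell out.

```python
from __future__ import annotations

MEASUREMENT_DIMS = 96
Z_CLIP = 5.0


def encode_measurements(lab_vector: dict[str, float] | list[float]) -> list[float]:
    """Clip z-scores to [-5, 5], scale to [-1, 1], and fit them into 96 slots."""
    if isinstance(lab_vector, dict):
        values = list(lab_vector.values())  # {concept_id: zscore}, insertion order
    else:
        values = list(lab_vector)

    scaled = [max(-Z_CLIP, min(Z_CLIP, z)) / Z_CLIP for z in values]
    scaled = scaled[:MEASUREMENT_DIMS]  # assumed: drop anything past 96 labs
    return scaled + [0.0] * (MEASUREMENT_DIMS - len(scaled))  # assumed: zero-pad
```

For example, a z-score of 7.0 is clipped to 5.0 and stored as 1.0, while -0.1184 is stored as about -0.0237.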

Batch Response Re-keying

The PHP EmbeddingClient::embedBatch() now properly maps the response:

$raw = $response->json('embeddings', []);
$result = [];
foreach ($raw as $entry) {
    if (isset($entry['person_id'], $entry['embedding'])) {
        $result[(int) $entry['person_id']] = $entry['embedding'];
    }
}
return $result;

The caller gets a person_id => float[] map instead of an indexed array, and the UPDATE statement writes the correct vector to the correct patient.

From CPU to GPU: The Ollama Migration

With the type bugs fixed, embeddings started generating — but slowly. The original pipeline used SapBERT (cambridgeltl/SapBERT-from-PubMedBERT-fulltext), a 110M-parameter biomedical language model running on CPU inside the Docker container. SapBERT is an excellent model for clinical concept encoding, but CPU inference is not how you want to embed a million patients.

Meanwhile, Ollama was already running on the host machine with full GPU access, serving MedGemma for Abby's conversational AI. Three embedding models were loaded and ready:

Model	Parameters	Dimension	Use Case
`nomic-embed-text`	137M	768	General-purpose embedding, fast
`embeddinggemma:300m`	300M	768	Google's embedding model
`text-embedding-3-large`	—	768	OpenAI-compatible embedding

All three produce 768-dimensional embeddings, matching SapBERT's native output dimension. We chose nomic-embed-text for its speed: 27 concepts/second in batch mode, with the GPU doing the heavy lifting.

The Embedding Service Refactor

The sapbert.py service was refactored to try Ollama first, falling back to CPU-based SapBERT only if Ollama is unavailable:

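import httpx


class OllamaEmbeddingService:
    """GPU-accelerated embedding via Ollama's /api/embed endpoint."""

    # __init__ (which sets self._base_url and self._model) and the
    # is_available startup probe are not shown here.

    def encode(self, texts: list[str]) -> list[list[float]]:
        # One request embeds the whole list; Ollama replies {"embeddings": [[...], ...]}
        resp = httpx.post(
            f"{self._base_url}/api/embed",
            json={"model": self._model, "input": texts},
            timeout=60.0,
        )
        resp.raise_for_status()  # non-2xx raises instead of returning an empty result
        return resp.json()["embeddings"]


# SapBERTService and the module-level _ollama_service / _sapbert_service
# instances are defined elsewhere in sapbert.py.
def get_sapbert_service() -> OllamaEmbeddingService | SapBERTService:
    """Return the best available embedding service."""
    if _ollama_service.is_available:
        return _ollama_service
    return _sapbert_service  # CPU fallback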

On startup, the service probes Ollama with a single test embedding. If it responds, all subsequent calls go to the GPU. If Ollama is down, the service falls back to loading the SapBERT model into CPU memory — slower, but functional. The interface is identical: both have an .encode(texts: list[str]) -> list[list[float]] method.
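The selection logic is small enough to test in isolation. In this stand-alone sketch the services are stubs, `probe()` stands in for the single test embedding sent at startup, and the CPU service is built lazily through a factory so the SapBERT model is only loaded when the probe fails; all names here are illustrative.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Encoder(Protocol):
    def encode(self, texts: list[str]) -> list[list[float]]: ...


def probe(service: Encoder) -> bool:
    """Send one test embedding; any error or a malformed reply counts as unavailable."""
    try:
        return len(service.encode(["probe"])) == 1
    except Exception:
        return False


def select_encoder(gpu: Encoder, load_cpu: Callable[[], Encoder]) -> Encoder:
    """Prefer the GPU service; only load the CPU model if the probe fails."""
    return gpu if probe(gpu) else load_cpu()


class OfflineService:
    """Stub for an unreachable Ollama."""

    def encode(self, texts: list[str]) -> list[list[float]]:
        raise ConnectionError("Ollama unreachable")


class StubService:
    """Stub encoder that returns zero vectors of a fixed width."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def encode(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] * self.dim for _ in texts]
```

Here `select_encoder(OfflineService(), lambda: StubService(768))` returns the CPU stand-in, while a reachable GPU service is returned as-is and the factory is never called.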

768 Dimensions: The Full Encoder Width

The original design used 512-dimensional patient embeddings, partitioning the vector into six slices that truncated the encoder's 768-dim output:

Old (512-dim):
[0-32]:   Demographics (32)
[32-160]: Conditions (128)   ← truncated from 768
[160-224]: Measurements (64)
[224-352]: Drugs (128)       ← truncated from 768
[352-448]: Procedures (96)   ← truncated from 768
[448-512]: Genomics (64)     ← truncated from 768

Truncation discards information. The SapBERT and Ollama encoders spread semantic meaning across all 768 dimensions, so every dimension cut from a slice throws away part of the signal that distinguishes similar-but-different concepts.

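With the move to Ollama, we expanded the patient vector to 768 dimensions, the encoder's native width. Each encoded slice is still narrower than the 768-dim concept embedding it comes from, but every slice gains capacity: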

New (768-dim):
[0-32]:    Demographics (32)
[32-224]:  Conditions (192)   ← 50% more capacity
[224-320]: Measurements (96)  ← 50% more capacity
[320-512]: Drugs (192)        ← 50% more capacity
[512-672]: Procedures (160)   ← 67% more capacity
[672-768]: Genomics (96)      ← 50% more capacity

The pgvector column was altered from vector(512) to vector(768), the IVFFlat index was rebuilt, and the migration file was updated to reflect the new dimension for fresh installs.
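The 768-dim layout can be written down as a table of slices and checked mechanically. This sketch assumes each encoded slice keeps the leading dimensions of its pooled 768-dim vector (the post says "truncated" without specifying which dimensions are kept), and it applies the final L2 normalization listed in the pipeline; slice names are illustrative.

```python
import math

# [start, stop) offsets for the 768-dim layout
LAYOUT: dict[str, tuple[int, int]] = {
    "demographics": (0, 32),
    "conditions": (32, 224),
    "measurements": (224, 320),
    "drugs": (320, 512),
    "procedures": (512, 672),
    "genomics": (672, 768),
}
TOTAL_DIMS = 768


def check_layout(layout: dict[str, tuple[int, int]], total: int) -> None:
    """Slices must be contiguous, non-overlapping, and cover [0, total)."""
    cursor = 0
    for name, (start, stop) in sorted(layout.items(), key=lambda kv: kv[1][0]):
        assert start == cursor and stop > start, f"gap or overlap at {name}"
        cursor = stop
    assert cursor == total, "layout does not cover the full vector"


def assemble(parts: dict[str, list[float]]) -> list[float]:
    """Write each part into its slice (truncate / zero-pad), then L2-normalize."""
    vec = [0.0] * TOTAL_DIMS
    for name, (start, stop) in LAYOUT.items():
        values = parts.get(name, [])[: stop - start]  # assumed: keep leading dims
        vec[start : start + len(values)] = values
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec  # all-zero input stays all-zero
```

Missing parts simply stay zero, which matches the zero-padding behaviour for patients without data in a dimension.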

Demographics and Measurements: Numeric, Not Encoded

Two of the six dimensions don't use the language model at all:

Demographics (32 dims): Age is normalized (age_bucket / 20), gender is encoded as +1 (male) / -1 (female) / 0 (unknown), and race uses one-hot encoding in dims 2-31 mapped to OMOP race concept IDs (8516=Black, 8527=White, 8515=Asian, etc.). This is simple, deterministic, and doesn't need a language model.
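A sketch of the demographics encoder as described. The gender concepts use the standard OMOP IDs 8507 (male) and 8532 (female); only three race concept IDs are named in the post, so the slot assigned to each race concept within dims 2-31 is an assumption.

```python
DEMOGRAPHIC_DIMS = 32
GENDER_SIGN = {8507: 1.0, 8532: -1.0}  # OMOP MALE / FEMALE; anything else -> 0.0
# Assumed slot assignment within dims 2-31 (the post lists 8516, 8527, 8515, "etc.")
RACE_SLOTS = {8516: 2, 8527: 3, 8515: 4}


def encode_demographics(age_bucket: int, gender_concept_id: int, race_concept_id: int) -> list[float]:
    vec = [0.0] * DEMOGRAPHIC_DIMS
    vec[0] = age_bucket / 20  # normalized age, as described above
    vec[1] = GENDER_SIGN.get(gender_concept_id, 0.0)
    slot = RACE_SLOTS.get(race_concept_id)
    if slot is not None:  # unmapped race leaves dims 2-31 all zero
        vec[slot] = 1.0
    return vec
```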

Measurements (96 dims): Lab z-scores are clipped to [-5, 5], normalized to [-1, 1], and packed directly into the vector. The z-scores come from population-level statistics computed per source: for each measurement concept, we compute the mean and standard deviation across all patients, then express each patient's value as the number of standard deviations it lies from the population mean. The same hemoglobin of 7.2 g/dL sits only modestly below a population averaging 8.5 g/dL (as in a critically ill cohort), but far below one averaging 14.0 g/dL, where it marks severe anemia.
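A minimal version of the per-source statistics: compute each measurement concept's mean and standard deviation across the source, then convert one patient's values into the {concept_id: z_score} shape stored in lab_vector. Using the population standard deviation, keeping one value per concept per patient (a later row overwrites an earlier one), and skipping concepts with zero spread are assumptions of this sketch.

```python
import statistics
from collections import defaultdict


def population_stats(rows: list[tuple[int, str, float]]) -> dict[str, tuple[float, float]]:
    """rows: (person_id, concept_id, value) for one source -> {concept_id: (mean, sd)}."""
    by_concept: dict[str, list[float]] = defaultdict(list)
    for _, concept_id, value in rows:
        by_concept[concept_id].append(value)
    return {
        c: (statistics.fmean(v), statistics.pstdev(v))
        for c, v in by_concept.items()
        if len(v) > 1
    }


def lab_vector(person_rows: list[tuple[int, str, float]],
               stats: dict[str, tuple[float, float]]) -> dict[str, float]:
    """One patient's {concept_id: z_score}."""
    out: dict[str, float] = {}
    for _, concept_id, value in person_rows:
        mean, sd = stats.get(concept_id, (0.0, 0.0))
        if sd > 0:  # constant or unseen labs carry no signal
            out[concept_id] = (value - mean) / sd
    return out
```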

The remaining four dimensions — conditions, drugs, procedures, and genomics — are encoded through Ollama. OMOP concept IDs (integers) are passed as text strings to the embedding model, which maps them into dense semantic space. Related concepts cluster together: metformin and insulin share neighborhood structure; KRAS and TP53 occupy nearby regions of the genomics subspace. Mean pooling across all concepts in a dimension produces a single representative vector for that clinical aspect of the patient.

The 123x Speedup: Batch Deduplication

The original pipeline processed patients one at a time. Each patient triggered four Ollama calls — one per encoded dimension (conditions, drugs, procedures, genomics). For a batch of 200 patients, that's 800 Ollama calls.

But patients share concepts. In a cancer registry, most patients have the same core ICD-10 codes, the same standard-of-care medications, and the same diagnostic procedures. A batch of 200 patients with roughly 10 conditions each carries about 2,000 condition references, yet might contain only 15 unique condition concepts.

The batch-optimized path exploits this:

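import numpy as np


def compute_patient_embeddings_batch(patients: list[dict], svc, dim_configs) -> np.ndarray:
    """4 Ollama calls per batch, not 4 per patient. dim_configs holds (field, slice)
    pairs for the LM-encoded dimensions; demographics/measurements are filled separately."""
    embeddings = np.zeros((len(patients), 768), dtype=np.float32)

    for field, slc in dim_configs:
        # Concept texts per patient (genomics dicts reduced to their gene name)
        patient_texts = [[c["gene"] if isinstance(c, dict) else str(c) for c in p.get(field, [])]
                         for p in patients]
        # Collect ALL unique concepts across ALL patients (order-preserving dedup)
        all_unique_texts = list(dict.fromkeys(t for texts in patient_texts for t in texts))
        if not all_unique_texts:
            continue
        text_to_idx = {t: i for i, t in enumerate(all_unique_texts)}
        # ONE encoding call for all unique concepts
        all_embeddings = np.asarray(svc.encode(all_unique_texts), dtype=np.float32)
        # Mean-pool per patient using shared lookup, then cut to the slice width
        width = slc.stop - slc.start
        for i, texts in enumerate(patient_texts):
            if texts:  # patients without this dimension stay zero-padded
                indices = [text_to_idx[t] for t in texts]
                embeddings[i, slc] = all_embeddings[indices].mean(axis=0)[:width]

    return embeddings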

Instead of 800 Ollama calls for 200 patients, we make 4 calls (one per dimension) with 15-50 unique texts each. The encoding is done once; the per-patient work is just numpy indexing and mean pooling.

Benchmark results:

Approach	Batch of 200	Rate	Ollama Calls
Per-patient (old)	14.1s	14 patients/sec	800
Batch dedup (new)	0.1s	1,743 patients/sec	4
Speedup	123x		200x fewer calls

The actual throughput for the full Acumenus CDM run settled at ~130 patients/sec sustained, lower than the benchmark because real patient data has more concept diversity than synthetic test data and the database UPDATE operations add I/O overhead. Even so, 130/sec over a million patients is roughly 2 hours, compared to the ~20 hours the per-patient approach would have needed at its benchmarked 14 patients/sec.

The Production Run: 1,007,007 Patients

With all fixes in place, we generated embeddings for three CDM sources:

Source	Patients	Time	Rate	Notes
IRSF Natural History Study	1,858	22s	84/sec	Rare disease cohort
Pancreatic Cancer Corpus	361	4s	90/sec	Multimodal cancer registry
OHDSI Acumenus CDM	1,005,788	~2 hours	130/sec	Full clinical data warehouse
Total	1,007,007

The IRSF and Pancreas sources completed in under 30 seconds each. The Acumenus CDM required multiple runs because chunk() loops executed inside artisan tinker stop after ~500 iterations regardless of timeout settings. We ran the embedding loop five times, each run picking up where the previous one left off via whereNull('embedding').
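The resume-by-NULL pattern is what makes those restarts safe: each pass selects only rows that still lack a vector, and each batch commits on its own, so an interruption loses at most one batch. A self-contained sketch using sqlite3 in place of PostgreSQL/pgvector, with the vector stored as JSON text and the `embed` callable standing in for the AI-service call; the table and column names mirror the post, everything else is illustrative.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Callable


def embed_pending(
    conn: sqlite3.Connection,
    embed: Callable[[list[int]], dict[int, list[float]]],
    batch_size: int = 500,
    max_batches: int | None = None,
) -> int:
    """Fill NULL embeddings batch by batch; safe to re-run after an interruption."""
    done = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        ids = [row[0] for row in conn.execute(
            "SELECT person_id FROM patient_feature_vectors "
            "WHERE embedding IS NULL ORDER BY person_id LIMIT ?", (batch_size,))]
        if not ids:
            break  # nothing left to embed
        vectors = embed(ids)
        if not vectors:
            raise RuntimeError("embedding batch returned nothing")  # fail loudly
        conn.executemany(
            "UPDATE patient_feature_vectors SET embedding = ? WHERE person_id = ?",
            [(json.dumps(v), pid) for pid, v in vectors.items()],
        )
        conn.commit()  # each batch is durable on its own
        done += len(vectors)
        batches += 1
    return done
```

The same query also covers incremental updates: newly loaded patients arrive with a NULL embedding and are picked up by the next run.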

One patient — person_id 1005788 — required special handling. With 51 condition concepts, 12 procedures, and genomic variants (KRAS pathogenic), the full payload triggered a timeout in the batch endpoint. We embedded him individually with his complete clinical profile, ensuring his KRAS variant was encoded in the genomics dimension alongside his full comorbidity burden.

Data Richness Across Sources

The feature vectors capture meaningfully different clinical profiles across sources:

Source	Conditions per patient	Drugs per patient	Labs per patient	Has Genomics
IRSF	3-10	5-15	18-22 z-scores	No
Pancreas	5-51	3-8	5-11 z-scores	Yes (KRAS, BRCA1, TP53)
Acumenus	0-50+	0-30+	0-50 z-scores	Selected patients

The Pancreatic Cancer Corpus is the richest per patient — small cohort, deep phenotyping, genomic annotation. IRSF has consistent depth across a rare disease population. Acumenus is the long tail: a million patients with highly variable data completeness, from single-visit records to decades of longitudinal care.

What This Enables

Sub-Second Similar Patient Search

Before embeddings, similarity search for a patient in the Acumenus CDM required loading all 1M candidates into memory (impractical) or SQL-based pre-screening with PHP re-scoring. With pgvector's IVFFlat index, finding the 20 most similar patients is a single cosine distance query:

SELECT person_id, embedding <=> $1 AS distance
FROM patient_feature_vectors
WHERE source_id = 47
ORDER BY embedding <=> $1
LIMIT 20;

This returns in milliseconds. The interpretable scoring (per-dimension Jaccard, z-score comparison) is then applied only to these 20 candidates, giving the user both fast results and explainable scores.
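The two-stage shape (cheap vector ranking, then explainable scoring on the short list) can be sketched in plain Python. Here exact cosine distance over every row stands in for the IVFFlat approximate search, and Jaccard on condition concepts stands in for the full interpretable re-score; both substitutions are for illustration only, and the vectors are assumed non-zero.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same quantity as pgvector's <=> operator: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def jaccard(a: set[int], b: set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(index_pid: int, patients: dict[int, dict], k: int = 20) -> list[tuple[int, float, float]]:
    """Return (person_id, distance, rescore) for the k nearest neighbours."""
    query = patients[index_pid]
    # Stage 1: rank every candidate by vector distance (what the ANN index accelerates)
    ranked = sorted(
        (cosine_distance(query["embedding"], p["embedding"]), pid)
        for pid, p in patients.items()
        if pid != index_pid
    )[:k]
    # Stage 2: explainable scoring only on the short list
    return [
        (pid, dist, jaccard(query["conditions"], patients[pid]["conditions"]))
        for dist, pid in ranked
    ]
```

Stage 2 cost scales with k rather than with the source size, which is why re-scoring 20 candidates stays cheap even against a million rows.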

Cross-Source Phenotypic Matching

All three sources share the same 768-dimensional embedding space with the same encoding model. A clinician studying a rare disease patient in IRSF can ask: "are there any patients in the million-patient Acumenus CDM who look like this?" The vector search doesn't care about source boundaries — it finds the nearest neighbors across the entire embedding space.

This is especially powerful for rare disease research, where individual institutions may have only a handful of cases. Cross-source similarity expands the searchable population from hundreds to millions.

Cohort Discovery via Centroid Search

The search-from-cohort endpoint computes the centroid (average embedding) of a defined cohort and finds individual patients nearest to it. Define a cohort of 50 confirmed cases, compute their centroid, and discover 500 more patients with similar clinical profiles who weren't captured by the original inclusion criteria. This is phenotype-driven cohort expansion — the computational equivalent of a clinician saying "find me more patients like these."
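Centroid search reduces to two steps: average the cohort's vectors, then rank non-members by distance to that average. Excluding existing cohort members is an assumption of this sketch. Because cosine distance ignores vector length, the averaged centroid does not need to be re-normalized before ranking.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of the cohort's embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def expand_cohort(cohort: set[int], embeddings: dict[int, list[float]], limit: int) -> list[int]:
    """Non-members nearest to the cohort centroid, closest first."""
    center = centroid([embeddings[pid] for pid in cohort])
    candidates = [pid for pid in embeddings if pid not in cohort]
    candidates.sort(key=lambda pid: cosine_distance(center, embeddings[pid]))
    return candidates[:limit]
```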

Embedding-Powered Analytics

With every patient represented as a point in vector space, standard machine learning techniques become applicable:

Clustering: K-means or HDBSCAN on patient embeddings reveals natural phenotypic subgroups without pre-specifying features. A cluster analysis of the Pancreatic Cancer Corpus might reveal subtypes that correlate with survival — not from genomics alone, but from the full clinical picture.
Outlier Detection: Patients far from any cluster centroid may represent rare phenotypes, coding errors, or unusual disease presentations. In a quality improvement context, outliers in a supposedly homogeneous cohort warrant chart review.
Temporal Trajectories: Re-embedding patients at different time windows (diagnosis, 6 months, 12 months) traces how their clinical profile evolves. Patients whose trajectories diverge despite similar starting points are natural candidates for outcome analysis.
Treatment Response Similarity: Find patients who looked similar pre-treatment, then compare outcomes. This is observational causal inference bootstrapped by embedding similarity — less rigorous than propensity score matching, but vastly more scalable.

Genomics-Aware Similarity

Patients with genomic data get embeddings that encode molecular profiles alongside clinical features. The 96-dimensional genomics slice captures gene-level similarity through the language model's understanding of gene names and their relationships. KRAS and NRAS cluster together; BRCA1 and BRCA2 share embedding structure.

This makes the similarity engine directly useful for molecular tumor board workflows: given an index patient with a pathogenic KRAS variant, find clinically similar patients who also carry RAS pathway mutations — even if they have different specific variants.

Foundation for Federated Similarity

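Patient embeddings are a more privacy-preserving representation than the records they summarize: the language-model slices pool concepts into dense vectors, so concept lists such as a patient's medications are not stored directly. The vectors are not free of clinical information, though. The demographics and measurement slices are direct numeric encodings (age bucket, gender, race, clipped lab z-scores), so those values can be read back out of the vector. That makes embeddings a reasonable candidate for sharing across institutional boundaries in a federated learning network, provided the exchange protocol accounts for the directly encoded slices.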

In the Hive Networks architecture, participating sites could share patient embeddings without sharing PHI. A query like "find patients similar to this one across the network" becomes a vector search across sites — each site returns only the embedding distances, never the underlying data. The requesting site gets a ranked list of similar patients by site, enabling multi-institutional rare disease research without a central data repository.

Architecture: The Final Pipeline

CDM Tables (person, condition_occurrence, drug_exposure,
            measurement, procedure_occurrence, genomic_variant)
    │
    ▼
SimilarityFeatureExtractor (Laravel)
    ├── Demographics: age_bucket, gender, race
    ├── Conditions: ancestor-rolled concept IDs (3 levels)
    ├── Measurements: z-score normalized lab values
    ├── Drugs: ingredient-level concept IDs
    ├── Procedures: procedure concept IDs
    └── Genomics: gene names with pathogenicity tier
    │
    ▼
patient_feature_vectors (PostgreSQL, app schema)
    ├── source_id, person_id
    ├── condition_concepts (JSONB)
    ├── lab_vector (JSONB: {concept_id: z_score, ...})
    ├── drug_concepts (JSONB)
    ├── procedure_concepts (JSONB)
    ├── variant_genes (JSONB: [{gene, pathogenicity}, ...])
    └── embedding (pgvector, vector(768))
    │
    ▼
EmbeddingClient (Laravel → Python AI Service)
    │
    ▼
OllamaEmbeddingService (GPU, nomic-embed-text, 768-dim)
    ├── Batch deduplication: 4 calls per batch, not per patient
    ├── Per-dimension encoding:
    │   ├── Conditions → 192 dims (mean-pooled)
    │   ├── Drugs → 192 dims (mean-pooled)
    │   ├── Procedures → 160 dims (mean-pooled)
    │   └── Genomics → 96 dims (mean-pooled)
    ├── Direct encoding (no LM):
    │   ├── Demographics → 32 dims (numeric)
    │   └── Measurements → 96 dims (z-scores)
    └── L2 normalization → unit vector
    │
    ▼
pgvector IVFFlat Index (cosine distance, 100 lists)
    │
    ▼
PatientSimilarityService.search()
    ├── Embedding mode: ANN search → top K → interpretable re-score
    └── Interpretable mode: full dimension-wise scoring (fallback)

Fallback Guarantees

The system degrades gracefully at every layer:

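Ollama down? Falls back to CPU-based SapBERT. Slower, and although the embeddings have the same dimension, they come from a different model, so they are not directly comparable with nomic-embed-text vectors in the same index.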
No embeddings computed? Falls back to interpretable-only search. No ANN, but full scoring across all dimensions.
Source too small for IVFFlat? (< 100 patients) Skips index creation; pgvector does exact scan.
Patient missing a dimension? Zero-padded in the embedding; interpretable scoring skips that dimension and re-weights the others.
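The re-weighting rule in the last point can be sketched as a weighted mean over only the dimensions that have a score, which rescales the remaining weights so they sum to one. The weight values below are placeholders, not Parthenon's configuration.

```python
from __future__ import annotations

# Placeholder weights for the six dimensions (illustrative only; they sum to 1.0)
WEIGHTS = {
    "demographics": 0.10,
    "conditions": 0.30,
    "measurements": 0.15,
    "drugs": 0.20,
    "procedures": 0.15,
    "genomics": 0.10,
}


def combined_score(dimension_scores: dict[str, float | None],
                   weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of per-dimension scores, skipping dimensions scored as None."""
    present = {d: s for d, s in dimension_scores.items() if s is not None and d in weights}
    total = sum(weights[d] for d in present)
    if total == 0:
        return 0.0  # nothing comparable between the two patients
    return sum(weights[d] * s for d, s in present.items()) / total
```

For example, with conditions scoring 0.5, drugs 1.0, and genomics missing, the result is (0.30 × 0.5 + 0.20 × 1.0) / 0.50 = 0.7.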

Performance Characteristics

Embedding Generation

Metric	Value
Encoding backend	Ollama (nomic-embed-text) on GPU
Embedding dimension	768
Batch size (PHP → Python)	500 patients
Batch dedup calls per batch	4 (one per encoded dimension)
Sustained throughput	~130 patients/sec
Time for 1M patients	~2 hours
Peak GPU utilization	~40% (Ollama, batch encoding)
Peak DB write throughput	~500 UPDATEs/sec (CASE/WHEN batch)

Similarity Search (Embedding Mode)

Metric	Value
Index type	IVFFlat (100 lists)
Distance metric	Cosine
ANN candidates	20 (configurable)
Search latency (1M patients)	< 50ms
Interpretable re-scoring	~5ms per candidate
Total search time	< 150ms

Storage

Source	Rows	Embedding Storage	Total Table Size
Acumenus	1,005,788	~2.9 GB (768 × float32 × 1M)	~8.2 GB
IRSF	1,858	~5.4 MB	~12 MB
Pancreas	361	~1.1 MB	~3.5 MB
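The embedding column sizes follow from simple arithmetic: 768 float32 values are 3,072 bytes per patient, and pgvector documents its on-disk cost as 4 × dimensions + 8 bytes per vector (the extra 8 bytes are a header). A quick way to reproduce the figures:

```python
DIM = 768
BYTES_PER_VECTOR = 4 * DIM + 8  # pgvector: 4 bytes per dimension + 8-byte header

SOURCES = {"Acumenus": 1_005_788, "IRSF": 1_858, "Pancreas": 361}


def embedding_bytes(rows: int, with_header: bool = True) -> int:
    """Bytes used by the embedding column alone (no indexes, no other columns)."""
    per_row = BYTES_PER_VECTOR if with_header else 4 * DIM
    return rows * per_row


for name, rows in SOURCES.items():
    raw_mib = embedding_bytes(rows, with_header=False) / 2**20
    stored_mib = embedding_bytes(rows) / 2**20
    print(f"{name}: {raw_mib:,.1f} MiB raw float32, {stored_mib:,.1f} MiB with pgvector headers")
```

The gap between these numbers and the total table size is the JSONB feature columns, row overhead, and indexes.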

Lessons Learned

1. Silent Failures Are Architecture Bugs

The embedding pipeline "worked" for weeks without generating a single embedding. The EmbeddingClient caught exceptions and returned empty arrays. The job logged warnings that scrolled past in a sea of other output. The search engine fell back to interpretable mode without complaint.

Every pipeline stage should either succeed visibly or fail loudly. A try/catch that returns a default value without raising an alert is not error handling — it's evidence suppression.

2. Validate at the Seam

The type mismatch between PHP (integers) and Python (strings) lived at the service boundary — the HTTP API between Laravel and FastAPI. Both sides were internally correct: PHP correctly serialized OMOP concept IDs as integers; Python correctly expected concept identifiers as strings. Neither side was wrong in isolation.

Service boundaries need explicit contracts. The Pydantic model should have been generated from or validated against the PHP serialization format. In a multi-language architecture, the API schema is the source of truth — not either implementation.

3. Deduplication Beats Parallelism

Our first instinct for performance was to increase batch sizes and add worker parallelism. The 123x speedup came instead from observing that patients share concepts. In a batch of 200 oncology patients, there might be 15 unique condition concepts. Encoding 15 texts once is faster than encoding 2,000 texts (even on a GPU) because the bottleneck is Ollama's tokenization and inference, not the network call.

The general principle: before parallelizing work, check if the work is redundant. Deduplication is free; parallelism has coordination costs.

4. Tinker Has a Hidden Iteration Limit

Laravel's artisan tinker (built on PsySH) silently stops chunk() iteration after approximately 500 calls, regardless of max_execution_time settings. For bulk operations over large datasets, use a proper artisan command or a raw PHP script, not an interactive REPL with undocumented safety limits.

5. One Patient Can Break the Pipeline

Patient 1005788 — with 51 conditions, 12 procedures, and KRAS/BRCA genomic variants — was the single holdout in a million-patient run. The variant_genes field stored as [{"gene": "KRAS", "pathogenicity": "Pathogenic"}] didn't match the list[str] Pydantic type. One patient, one type mismatch, one silent failure.

Robust pipelines handle edge cases in the data model, not in the exception handler. The fix wasn't a special case for patient 1005788 — it was accepting the actual data shape in the Pydantic model and converting dicts to gene names in the encoder.
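A stdlib-only sketch of the encoder-side conversion, assuming the two shapes shown above (plain gene strings and `{"gene": ..., "pathogenicity": ...}` dicts). In the FastAPI service the Pydantic field would also need widening, for example to something like `list[str | dict]`; `variant_gene_names` is a hypothetical name.

```python
from typing import List, Union

VariantEntry = Union[str, dict]

def variant_gene_names(raw: List[VariantEntry]) -> List[str]:
    """Accept both observed shapes of variant_genes and return plain gene names."""
    genes: List[str] = []
    for entry in raw or []:
        if isinstance(entry, str):
            genes.append(entry)
        elif isinstance(entry, dict) and entry.get("gene"):
            genes.append(str(entry["gene"]))
        else:
            # Fail loudly rather than silently skipping the patient (lesson 1).
            raise ValueError(f"Unrecognized variant entry: {entry!r}")
    return genes
```

Any third shape raises instead of returning an empty list, so the next surprise shows up as an error rather than as one missing patient.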

What's Next

The embedding infrastructure is now production-ready. The immediate roadmap:

IVFFlat Index Rebuild: Create the index on the full 1M-row table with a tuned lists parameter for the best recall/speed tradeoff.
Embedding-Mode Search in the UI: The frontend currently defaults to interpretable mode. With embeddings available, the search controller can route to ANN search for large sources (> 5,000 patients) and fall back to interpretable for small ones.
Cohort Centroid Visualization: Display the centroid embedding of a cohort as a radar chart across the six dimensions, showing where the cohort's "center of mass" lies in clinical space.
Incremental Embedding Updates: New patients added through ETL should trigger embedding generation without reprocessing the entire source. The whereNull('embedding') pattern already supports this — we just need to hook it into the ingestion pipeline.
SynPUF and Eunomia: Two remaining sources (2.3M CMS SynPUF patients and 2.7K Eunomia demo patients) need feature extraction and embedding. SynPUF at 2.3M patients will take approximately 5 hours at current throughput.
Federated Embedding Exchange: Design the protocol for sharing patient embeddings across Hive Network sites — embedding format, distance computation, privacy guarantees, and consent models.
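For the IVFFlat rebuild, the pgvector README suggests starting points of `lists = rows / 1000` up to 1M rows, `lists = sqrt(rows)` beyond that, and `probes = sqrt(lists)` at query time. The helper below simply encodes those heuristics; the function names are hypothetical, and the numbers are places to start measuring recall from, not tuned values.

```python
import math
from typing import Tuple

def ivfflat_params(row_count: int) -> Tuple[int, int]:
    """Starting lists/probes values per the pgvector README heuristics."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = int(math.sqrt(row_count))
    probes = max(1, int(math.sqrt(lists)))
    return lists, probes

def ivfflat_ddl(table: str, column: str, row_count: int) -> str:
    """Index DDL for cosine distance (<=>) plus the session-level probes setting."""
    lists, probes = ivfflat_params(row_count)
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {lists});\n"
        f"SET ivfflat.probes = {probes};"
    )
```

For the 1M-row Acumenus table this yields `lists = 1000` and `probes = 31`; for 2.3M SynPUF rows it yields `lists = 1516` and `probes = 38`.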

The Patient Similarity Engine now has its index. A million patients, each reduced to 768 numbers that capture their demographics, conditions, labs, medications, procedures, and genomic variants. The question "which patients are most like this one?" is no longer a research project. It's a query.

Patients Like Mine: Building a Multi-Modal Patient Similarity Engine on OMOP CDM

Thu, 02 Apr 2026 00:00:00 GMT

For twenty years, the question "which patients are most like this one?" has haunted clinical informatics. Molecular tumor boards want to know: of the 361 patients in our pancreatic cancer corpus, which ones had the same pathogenic variants, the same comorbidity profile, the same treatment history — and what happened to them? Population health researchers want to seed cohort definitions not from abstract inclusion criteria but from a concrete index patient. And every clinician who has ever stared at a complex case has wished for a button that says show me others like this.

Today, Parthenon ships that button. The Patient Similarity Engine is a multi-modal matching system that scores patients across six clinical dimensions — demographics, conditions, measurements, drugs, procedures, and genomic variants — with user-adjustable weights, dual algorithmic modes, bidirectional cohort integration, and tiered privacy controls. It works across any OMOP CDM source in the platform, from the 361-patient Pancreatic Cancer Corpus to the million-patient Acumenus CDM.

This post tells the story of why it was needed, what we studied before building it, how it works under the hood, and what we learned along the way.

The Gap: From Genomic Identity to Clinical Phenotype

Parthenon already had a form of patient similarity. The Molecular Tumor Board (TumorBoardService) could find patients sharing pathogenic or likely-pathogenic variants in the same gene. If your index patient carried a BRCA1 p.C61G variant classified as pathogenic by ClinVar, the tumor board would surface every other patient in the corpus with a pathogenic BRCA1 variant, compute median survival, and tally drug exposure patterns among those matches.

It was useful. It was also binary. You either shared a pathogenic variant or you didn't. There was no notion of degree of similarity, no consideration of clinical phenotype, no way to ask: "this 62-year-old woman with pancreatic adenocarcinoma, Type 2 diabetes, and BRCA1 — who else in our data looks like her, not just genomically, but clinically?"

The gap matters because clinical decisions are rarely made on genomics alone. Two patients with identical BRCA1 mutations but different comorbidity burdens, different lab profiles, and different treatment histories will have vastly different expected outcomes. Precision medicine requires precision context — and that context spans every clinical dimension in the OMOP CDM.

Landscape Research: What Exists and What Doesn't

Before writing a single line of code, we studied the landscape. What we found was a fragmented ecosystem where no single system solved the complete problem on OMOP CDM.

The Oracle Approach: Weighted PageRank

Oracle Healthcare Translational Research offers a "Patients Like Mine" feature that uses Weighted Personalized PageRank (PPR) over a bipartite graph of patients and clinical events. Users adjust weights on clinical categories, and the algorithm performs biased random walks personalized toward a seed patient. The output is a ranked list with drill-down comparison views and Kaplan-Meier survival curves.

The design insights worth borrowing: user-adjustable dimension weights (clinicians know what matters for their case), one-to-one comparison views, and integrated survival analysis on the similar cohort.

The Academic Frontier: Embeddings and Meta-Paths

The research literature offered several promising methodologies:

Approach	Key Paper	Insight
Patient2Vec	Zhang et al., IEEE Access 2018	LSTM + attention over longitudinal EHR produces personalized patient embeddings. 0.799 AUC. MIT-licensed.
S-PathSim	PMC8456037	Annotated Heterogeneous Information Networks prevent false associations. nDCG 0.791 on 53K patients.
Transformer Embeddings	Nature Digital Medicine, 2025	Treat each patient as a "sentence" of medical concepts. Enables stratification and progression analysis.
Patient Similarity Networks	Multi-modal fusion, Frontiers AI 2025	Graph neural networks with early/intermediate/late fusion strategies. Multi-modal significantly outperforms single-modality.
Phe2vec	Patterns, 2021	Unsupervised phenotype embeddings from EHR co-occurrence patterns.

The OHDSI Ecosystem

The OHDSI community has related tools but nothing purpose-built for patient similarity:

CohortMethod uses propensity score matching — similar in spirit but designed for treatment effect estimation, not general similarity search.
ComparatorSelectionExplorer computes cosine similarity across drug comparator feature vectors — closer, but drug-only and designed for study design, not clinical matching.

What Was Missing

No open-source system combined these properties:

OMOP-native — works directly on standard CDM tables without custom ETL
Multi-modal — fuses demographics, conditions, labs, drugs, procedures, and genomics
User-weighted — clinicians adjust dimension weights per search
Interpretable — every score decomposes into per-dimension explanations
Source-agnostic — works across any CDM source in the platform
Cohort-integrated — bidirectional flow between similarity and cohort definitions

We decided to build it.

Architecture: Split Responsibility

The biggest design decision was how to divide work between Parthenon's two backend stacks: Laravel (PHP) for application logic and FastAPI (Python) for AI services. After evaluating three architectural options, we chose Split Responsibility — each language does what it's best at:

┌────────────────┐     ┌────────────────┐     ┌──────────────────┐
│   Frontend     │     │   Laravel      │     │   Python AI      │
│   React SPA    │────▶│   API          │────▶│   Service        │
│                │     │                │     │                  │
│ Weight sliders │     │ Auth/RBAC      │     │ SapBERT encode   │
│ Score bars     │     │ Extraction     │     │ Mean pooling     │
│ Compare view   │     │ Scoring        │     │ 512-dim vectors  │
│ Cohort export  │     │ Orchestration  │     │ Batch embed      │
└────────────────┘     └──────┬─────────┘     └────────┬─────────┘
                              │                        │
                       ┌──────▼────────────────────────▼─────────┐
                       │          PostgreSQL + pgvector          │
                       │                                         │
                       │  patient_feature_vectors (JSONB)        │
                       │  patient_feature_vectors.embedding      │
                       │  source_measurement_stats               │
                       │  similarity_dimensions                  │
                       │  patient_similarity_cache               │
                       └─────────────────────────────────────────┘

Laravel owns feature extraction (reusing the existing PatientFeatureExtractor and FeatureBuilder patterns), interpretable scoring, auth/RBAC, and caching. Python owns SapBERT embedding generation and dense vector computation. PostgreSQL + pgvector stores both structured features (JSONB) and dense embeddings (vector(512)) for approximate nearest-neighbor search.

The critical benefit: interpretable mode works without the Python service. If the AI container is down, researchers still get full patient similarity via the Jaccard/Euclidean scoring path. The embedding mode adds semantic power when available, but the system degrades gracefully.

The Six Dimensions

Every patient in a CDM source gets a feature vector extracted across six clinical dimensions. Each extraction is tailored to the OMOP data model:

1. Demographics

From the person table: age (bucketed into 5-year intervals), gender concept, race concept. Scoring uses a composite: 40% age proximity + 40% gender match + 20% race match.
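The post does not define "age proximity", so the sketch below assumes a linear decay that reaches zero four 5-year buckets apart; that rule, `max_bucket_gap`, and the `Demographics` type are all hypothetical. Only the 40/40/20 weighting comes from the description above.

```python
from dataclasses import dataclass

@dataclass
class Demographics:
    age_bucket: int          # age // 5, i.e. 5-year intervals
    gender_concept_id: int
    race_concept_id: int

def age_bucket(age_years: int) -> int:
    return age_years // 5

def demographics_score(a: Demographics, b: Demographics, max_bucket_gap: int = 4) -> float:
    # Hypothetical proximity rule: linear decay to 0 at max_bucket_gap buckets apart.
    gap = abs(a.age_bucket - b.age_bucket)
    age_proximity = max(0.0, 1.0 - gap / max_bucket_gap)
    gender_match = 1.0 if a.gender_concept_id == b.gender_concept_id else 0.0
    race_match = 1.0 if a.race_concept_id == b.race_concept_id else 0.0
    # Composite weighting from the post: 40% age, 40% gender, 20% race.
    return 0.4 * age_proximity + 0.4 * gender_match + 0.2 * race_match
```

Under this rule a 62-year-old and a 72-year-old of the same gender but different race score 0.4 × 0.5 + 0.4 + 0 = 0.6.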

2. Conditions

From condition_occurrence, rolled up through concept_ancestor by up to three levels of separation in the SNOMED hierarchy. This means "Essential hypertension" and "Hypertensive heart disease" both map to their shared ancestor "Hypertensive disorder", capturing clinical relatedness rather than just exact code matches.

Scoring uses Jaccard similarity on the ancestor-rolled concept sets: |A ∩ B| / |A ∪ B|. Two patients whose combined ancestor sets contain 50 distinct concepts, 40 of them shared by both, score 40/50 = 0.80.
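The Jaccard formula in a few lines of Python, as a sketch over sets of concept IDs; the same function would serve the drug (ingredient-level) and procedure dimensions. Returning 0.0 for two empty sets is our own guard; in practice that pair would be excluded by the missing-dimension rule before scoring.

```python
from typing import AbstractSet

def jaccard(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """|A ∩ B| / |A ∪ B| over concept-ID sets; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

Two sets of 45 concepts that overlap on 40 have a 50-concept union, which reproduces the 0.80 example.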

3. Measurements / Labs

From measurement, taking the most recent value per measurement type per patient. Values are z-score normalized against source-level population statistics (stored in source_measurement_stats), so a hemoglobin of 14 g/dL means different things in a source with mean 13.5 vs. 15.0.

Scoring uses inverse Euclidean distance on the z-scored values, computed only over measurement types present in both patients:

score = 1 / (1 + √(mean_squared_diff))
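A sketch of the z-scoring and the inverse-Euclidean score. The `(mean, stddev)` tuple per measurement concept is a hypothetical stand-in for a `source_measurement_stats` row, and returning `None` when two patients share no measurement types is our assumption, chosen so the pair falls under the missing-dimension rule.

```python
import math
from typing import Dict, Optional, Tuple

def z_scores(latest: Dict[int, float], stats: Dict[int, Tuple[float, float]]) -> Dict[int, float]:
    """Normalize each latest value against its source-level (mean, stddev)."""
    return {
        concept: (value - stats[concept][0]) / stats[concept][1]
        for concept, value in latest.items()
        if concept in stats and stats[concept][1] > 0
    }

def measurement_score(za: Dict[int, float], zb: Dict[int, float]) -> Optional[float]:
    """1 / (1 + sqrt(mean squared z-difference)) over measurement types both patients have."""
    shared = za.keys() & zb.keys()
    if not shared:
        return None  # no overlap: treat the dimension as unavailable for this pair
    msd = sum((za[c] - zb[c]) ** 2 for c in shared) / len(shared)
    return 1.0 / (1.0 + math.sqrt(msd))
```

Identical z-scores on every shared measurement give 1.0; the score falls toward 0 as the root-mean-square difference grows.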

4. Drugs

From drug_exposure, rolled up to the ingredient level via concept_ancestor joined to concept where concept_class_id = 'Ingredient'. This collapses brand names, formulations, and dosage forms into their active ingredients — "Metformin 500 mg tablet" and "Glucophage XR 1000 mg" both become Metformin.

Scoring: Jaccard on ingredient-level concept sets.

5. Procedures

From procedure_occurrence, using distinct procedure concept IDs. No rollup — procedure hierarchies are flatter than condition or drug hierarchies, and exact procedure matching is clinically meaningful.

Scoring: Jaccard on procedure concept sets.

6. Genomic Variants

From genomic_variants (Parthenon's app-schema table linking VCF-parsed variants to OMOP person IDs). Each variant is represented as a (gene, pathogenicity) tuple.

Scoring uses pathogenicity-tiered weighted overlap: pathogenic variants score 3x, likely-pathogenic 2x, VUS 1x. Two patients sharing a pathogenic BRCA1 variant is a stronger match than sharing a VUS in a less actionable gene.

The Missing-Dimension Problem (and Its Elegant Solution)

Not every CDM source has every dimension. SynPUF (CMS synthetic data) has conditions, drugs, and procedures but no lab values and no genomic data. The Pancreatic Cancer Corpus has conditions, drugs, and measurements but no procedures and no genomics (yet). Acumenus CDM has everything except genomics.

A naive approach would give SynPUF patients a 0 on measurements and genomics, penalizing them unfairly. Our approach: missing dimensions reduce the denominator, not the score.

available_dims = dimensions where BOTH patients have data
score = Σ(weight × dim_score) / Σ(weight)    for dims in available_dims

Each patient's feature vector carries a dimensions_available array tracking which dimensions have data. When comparing two patients, the scorer only includes dimensions that are available to both — and the weighted average divides only by the weights of those included dimensions.

This means a SynPUF patient with perfect condition/drug overlap and the same demographics can score 0.95 against another SynPUF patient, even though neither has lab values or genomic data. The score honestly represents the similarity across the data that exists.
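The denominator rule in Python, as a sketch: a dimension contributes only if both patients have it (score is not `None`) and its weight is positive, so zeroing a weight removes it exactly as a missing dimension would. The function name is hypothetical. Assuming equal default weights, the per-dimension scores reported for person 341 in interpretable mode (1.0, 0.898, 0.60, 1.0, with procedures and genomics missing) average to about 0.87.

```python
from typing import Dict, Optional

def overall_score(
    dim_scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> Optional[float]:
    """Weighted mean over dimensions both patients have; missing ones leave the denominator."""
    used = {
        d: s for d, s in dim_scores.items()
        if s is not None and weights.get(d, 0.0) > 0
    }
    total_weight = sum(weights[d] for d in used)
    if total_weight == 0:
        return None  # nothing comparable for this pair
    return sum(weights[d] * s for d, s in used.items()) / total_weight
```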

Dual Scoring Modes: Interpretable vs. Embedding

The engine supports two algorithmic modes, togglable in the UI:

Interpretable Mode (Pure SQL)

Every candidate in the source is scored against the seed patient using the six dimension scorers described above. This is a brute-force scan: for each candidate, compute the Jaccard/Euclidean scores for every available dimension, combine them into the weighted average, and rank. On the Pancreatic Cancer Corpus (361 patients), this takes ~200ms. On Acumenus (1M patients), it's slower but still feasible for pre-filtered queries.

Why it matters: every score is fully decomposable. A researcher can see that patient 341 scored 0.87 because demographics were a perfect match (1.0), conditions overlapped 89.8%, labs were moderately similar (0.60), and drugs were identical (1.0). There is no black box.

Embedding Mode (pgvector ANN + Re-ranking)

For larger populations, the engine offers a two-stage approach:

Stage 1: Approximate Nearest Neighbors. Each patient's structured features are sent to the Python AI service, which encodes them into a 512-dimensional dense vector using SapBERT concept embeddings. Demographics get 32 dimensions, conditions get 128, measurements get 64, drugs get 128, procedures get 96, and genomics get 64. The resulting vector is L2-normalized and stored in pgvector with an IVFFlat index for cosine distance search.
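The 512-dimension layout can be sketched as a fixed-order concatenation followed by L2 normalization. How each block is reduced from SapBERT's 768-dimensional vectors is not described here, so this sketch assumes the blocks arrive already sized. Zero-filling a missing dimension is also our assumption, and `assemble_embedding` is a hypothetical name.

```python
import math
from typing import Dict, List, Optional

# Block sizes stated in the post; they sum to 512.
BLOCK_SIZES = {
    "demographics": 32, "conditions": 128, "measurements": 64,
    "drugs": 128, "procedures": 96, "genomics": 64,
}

def assemble_embedding(blocks: Dict[str, Optional[List[float]]]) -> List[float]:
    """Concatenate blocks in a fixed order, zero-fill missing ones, then L2-normalize."""
    vector: List[float] = []
    for name, size in BLOCK_SIZES.items():  # dicts preserve insertion order
        block = blocks.get(name) or [0.0] * size
        if len(block) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(block)}")
        vector.extend(block)
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector
```

A fixed order matters: cosine distance compares positions, so every patient's conditions block must occupy the same 128 slots (here, offsets 32 to 159).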

A single pgvector ANN query retrieves the top 200 candidates in milliseconds, even at 1M patients:
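```sql
-- Positional bindings, in order: seed embedding, source_id, seed person_id, seed embedding.
-- <=> is pgvector's cosine distance operator, so 1 - distance is cosine similarity.
SELECT person_id, 1 - (embedding <=> ?::vector) AS cosine_similarity
FROM patient_feature_vectors
WHERE source_id = ? AND person_id != ? AND embedding IS NOT NULL
ORDER BY embedding <=> ?::vector
LIMIT 200;
```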
Stage 2: Re-ranking. The 200 ANN candidates are re-ranked using the same interpretable scorers from mode 1. This means the final results have identical per-dimension score breakdowns regardless of which mode was used. The only difference is how candidates were selected — brute-force scan vs. ANN approximation.
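The two stages can be sketched in memory. Here an exact cosine top-k stands in for the pgvector ANN query, and `interpretable_score` is a hypothetical callback for the weighted per-dimension scorer. The point the sketch makes is that stage 1 bounds recall: a patient outside the candidate pool cannot be rescued by stage 2, however well it would re-rank.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def two_stage_search(
    seed_id: int,
    embeddings: Dict[int, Sequence[float]],
    interpretable_score: Callable[[int, int], float],
    candidates: int = 200,
    limit: int = 25,
) -> List[Tuple[int, float]]:
    seed = embeddings[seed_id]
    # Stage 1: exact cosine top-k, standing in for the pgvector ANN query.
    pool = heapq.nlargest(
        candidates,
        (pid for pid in embeddings if pid != seed_id),
        key=lambda pid: cosine(seed, embeddings[pid]),
    )
    # Stage 2: re-rank the pool with the interpretable scorer.
    rescored = [(pid, interpretable_score(seed_id, pid)) for pid in pool]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:limit]
```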

The SapBERT encoding is what makes embedding mode genuinely better than fast Jaccard for semantic matching. SapBERT (a PubMedBERT-based biomedical language model) encodes concept names into 768-dimensional vectors where semantically related concepts are close — "Type 2 diabetes mellitus" and "Insulin resistance" have high cosine similarity even though they share no OMOP ancestor concepts. By mean-pooling SapBERT embeddings across a patient's condition set, the resulting vector captures the gestalt of their clinical profile, not just the discrete concepts.
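Mean pooling itself is just an element-wise average. The sketch below assumes one SapBERT vector per concept in the patient's set; `mean_pool` is an illustrative name. The same operation applied to member embeddings also gives the embedding-mode cohort centroid.

```python
from typing import List, Sequence

def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of concept embeddings (e.g. one SapBERT vector per condition)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must share one dimensionality")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```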

Bidirectional Cohort Integration

Patient similarity doesn't live in a vacuum — it feeds into and draws from Parthenon's cohort system.

Similarity → Cohort (Export)

After running a similarity search, researchers can click "Export as Cohort" to save the result set as a new cohort definition. They set a minimum similarity score threshold, name the cohort, and the engine writes the matching person_ids into results.cohort. From there, the cohort is available for characterization, estimation, prediction, pathways — every analysis tool in Parthenon.

This enables a workflow that wasn't possible before: start with a patient, find similar ones, export them as a cohort, run a Kaplan-Meier analysis on that cohort. Clinical hypothesis generation driven by concrete clinical intuition rather than abstract inclusion criteria.

Cohort → Similarity (Seed)

The reverse flow is equally powerful. Instead of "find patients similar to this person," researchers can ask "find patients similar to this cohort." The engine computes a centroid — the average feature vector across all cohort members — and searches for patients near that centroid.

The centroid is constructed differently for each mode:

Interpretable: Union of member conditions/drugs/procedures, mean of lab z-scores, median age, mode gender/race. A "virtual patient" representing the cohort's composite phenotype.
Embedding: Mean of member 512-dim embeddings, which is by definition the cohort's centroid in embedding space.

This supports cohort enrichment: start with a small, well-characterized cohort, find similar patients to expand it, validate the expanded cohort against the original inclusion criteria.
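The interpretable centroid sketched under stated assumptions: `Features` is a hypothetical in-memory stand-in for a `patient_feature_vectors` row. Each lab z-score is averaged only over members who have that measurement, which is our reading of "mean of lab z-scores". Ties in the modal gender/race go to the first-seen value.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Set

@dataclass
class Features:
    age: float
    gender_concept_id: int
    race_concept_id: int
    conditions: Set[int] = field(default_factory=set)
    drugs: Set[int] = field(default_factory=set)
    procedures: Set[int] = field(default_factory=set)
    lab_z: Dict[int, float] = field(default_factory=dict)

def interpretable_centroid(members: List[Features]) -> Features:
    """Union of concept sets, mean lab z-scores, median age, modal gender/race."""
    if not members:
        raise ValueError("cohort has no members")
    lab_values: Dict[int, List[float]] = {}
    for m in members:
        for concept, z in m.lab_z.items():
            lab_values.setdefault(concept, []).append(z)
    return Features(
        age=median(m.age for m in members),
        gender_concept_id=Counter(m.gender_concept_id for m in members).most_common(1)[0][0],
        race_concept_id=Counter(m.race_concept_id for m in members).most_common(1)[0][0],
        conditions=set().union(*(m.conditions for m in members)),
        drugs=set().union(*(m.drugs for m in members)),
        procedures=set().union(*(m.procedures for m in members)),
        lab_z={c: sum(v) / len(v) for c, v in lab_values.items()},
    )
```

The result has the same shape as a single patient's features, so the ordinary per-dimension scorers can compare candidates against it unchanged.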

Tiered Privacy: HIPAA-Friendly by Default

Parthenon handles OMOP CDM data that may include PHI under HIPAA. The similarity engine respects this with tiered access control:

Default (patient-similarity.view): Results show overall and per-dimension scores, age/gender summaries, and condition/drug counts — but no person_ids, no named conditions, no named drugs. Aggregate-level similarity without patient identification.
With profiles.view: Full person-level results including person_ids (clickable to Patient Profile), named shared conditions, named shared drugs, and the Compare view for head-to-head analysis.

The tiering is enforced at the controller level — the service always computes full results, but the controller strips person-level fields before responding to users without profiles.view permission.
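The real check lives in a Laravel controller. This Python sketch only illustrates the logic: compute full rows, then drop person-level fields unless the caller holds `profiles.view`. The field names in `PERSON_LEVEL_FIELDS` and the function name are hypothetical.

```python
from typing import Any, Dict, List, Set

# Hypothetical names for person-level fields in a result row.
PERSON_LEVEL_FIELDS = {"person_id", "shared_conditions", "shared_drugs"}

def redact_results(rows: List[Dict[str, Any]], permissions: Set[str]) -> List[Dict[str, Any]]:
    """Return rows unchanged for profiles.view holders; otherwise strip person-level fields."""
    if "profiles.view" in permissions:
        return rows
    return [{k: v for k, v in row.items() if k not in PERSON_LEVEL_FIELDS} for row in rows]
```

Stripping at the response boundary keeps one scoring path for everyone. The trade-off is that the permission check must never be bypassed on any endpoint that returns these rows.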

The Data Model

Four new tables in the app schema:

patient_feature_vectors     — One row per patient per source. Demographics,
                              condition/drug/procedure concept arrays (JSONB),
                              z-scored lab vector, genomic variants, 512-dim
                              pgvector embedding, dimensions_available.
                              Unique on (source_id, person_id).

source_measurement_stats    — Population-level measurement statistics per source.
                              Mean, stddev, n_patients, quartiles per measurement
                              concept. Used for z-score normalization.

similarity_dimensions       — Admin-configurable dimension definitions with default
                              weights. Six seeded dimensions, extensible.

patient_similarity_cache    — Result caching keyed on (source, person, mode,
                              weights_hash). 1-hour TTL. Prevents redundant
                              computation for identical queries.
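One way to build the cache key and enforce the TTL, sketched under assumptions: `weights_hash` is taken here as SHA-256 over canonical JSON (sorted keys, weights coerced to float so `1` and `1.0` hash alike). The actual hashing scheme isn't specified, and all names are hypothetical.

```python
import hashlib
import json
import time
from typing import Any, Dict, Optional, Tuple

TTL_SECONDS = 3600  # 1-hour TTL

def weights_hash(weights: Dict[str, float]) -> str:
    # Canonical JSON so logically equal weight maps produce the same hash.
    canonical = json.dumps({k: float(v) for k, v in weights.items()},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

CacheKey = Tuple[int, int, str, str]

def cache_key(source_id: int, person_id: int, mode: str, weights: Dict[str, float]) -> CacheKey:
    return (source_id, person_id, mode, weights_hash(weights))

def get_fresh(cache: Dict[CacheKey, Tuple[float, Any]], key: CacheKey,
              now: Optional[float] = None) -> Optional[Any]:
    """Return the cached result if it is younger than the TTL, else None."""
    entry = cache.get(key)
    if entry is None:
        return None
    stored_at, result = entry
    current = time.time() if now is None else now
    return result if current - stored_at < TTL_SECONDS else None
```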

The patient_feature_vectors table carries both structured data (for interpretable scoring) and the dense embedding (for ANN search) in the same row. This co-location means a single query can filter by demographics, retrieve the embedding for ANN, and return structured features for re-ranking — no joins required.

Feature Extraction at Scale

The ComputePatientFeatureVectors Horizon job processes patients in batches of 500. For each batch:

Demographics from person table — age bucketed into 5-year intervals
Conditions from condition_occurrence joined to concept_ancestor (0-3 levels of separation) — ancestor rollup
Measurements from measurement — latest value per concept, z-scored against source_measurement_stats
Drugs from drug_exposure joined to concept_ancestor and concept — ingredient-level rollup
Procedures from procedure_occurrence — distinct procedure concepts
Genomics from genomic_variants — gene/pathogenicity tuples

On the Pancreatic Cancer Corpus (361 patients), full extraction takes 3 seconds. The Acumenus CDM (1 million patients) processes at approximately 8,000 patients per minute — around 2 hours for the full population. The measurement statistics (top 50 measurement types by patient count, minimum 10 patients, non-zero standard deviation) are computed once upfront and take 5-10 seconds.
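The selection rules for `source_measurement_stats` can be sketched in Python. Several details here are our assumptions rather than the post's: the input is already reduced to each patient's latest value, the top-50 cap is applied after the minimum-patient and non-zero-stddev filters, and the sample standard deviation is used. The function name is hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple

def select_measurement_stats(
    latest_values: Iterable[Tuple[int, int, float]],  # (person_id, concept_id, latest value)
    top_n: int = 50,
    min_patients: int = 10,
) -> Dict[int, Dict[str, Any]]:
    by_concept: Dict[int, Dict[int, float]] = defaultdict(dict)
    for person_id, concept_id, value in latest_values:
        by_concept[concept_id][person_id] = value  # one value per patient per concept
    # Rank measurement types by how many patients have them.
    ranked = sorted(by_concept.items(), key=lambda item: len(item[1]), reverse=True)
    stats: Dict[int, Dict[str, Any]] = {}
    for concept_id, per_person in ranked:
        values = list(per_person.values())
        if len(values) < min_patients:
            continue
        sd = statistics.stdev(values)
        if sd == 0:
            continue  # constant values cannot be z-scored
        stats[concept_id] = {
            "n_patients": len(values),
            "mean": statistics.fmean(values),
            "stddev": sd,
            "quartiles": statistics.quantiles(values, n=4),
        }
        if len(stats) == top_n:
            break
    return stats
```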

The job is idempotent — it uses updateOrCreate on the (source_id, person_id) unique key, so re-running it on the same source updates existing vectors and adds new patients without duplicates.

What We Reused (and Why It Matters)

One of the most satisfying aspects of this build was how much existing Parthenon infrastructure we could leverage:

Existing Component	How We Reused It
`PatientFeatureExtractor` (PopulationRisk)	Pattern for demographics/conditions/measurements extraction
`FeatureBuilderInterface` (Analysis/Features)	Modular feature extraction pattern with 6 implementations
`SapBERT service` (ai/services/sapbert.py)	Core of embedding generation — encode concept names to 768-dim vectors
`pgvector` + `search_nearest` pattern	Already deployed for concept embeddings, extended for patient embeddings
`SourceContext`	Dynamic schema isolation — one codebase works across all CDM sources
`ConceptResolutionService`	Ancestor concept rollup for condition/drug hierarchies
Horizon queue infrastructure	Background job processing with monitoring via dashboard
`PatientProfileService`	Integrated for contextual "Find Similar" entry point
Spatie RBAC	Permission-based tiered access (patient-similarity.view, profiles.view)

We didn't build a similarity engine from scratch. We built a new composition of capabilities that Parthenon had been developing for months — embeddings, vector search, feature extraction, schema isolation, RBAC — and surfaced them through a new lens.

The API Surface

Seven endpoints behind auth:sanctum with Spatie RBAC:

Method	Endpoint	Permission	Purpose
POST	`/patient-similarity/search`	patient-similarity.view	Single-patient similarity search
POST	`/patient-similarity/search-from-cohort`	patient-similarity.view + cohorts.view	Cohort-seeded similarity search
GET	`/patient-similarity/compare`	patient-similarity.view + profiles.view	Head-to-head patient comparison
POST	`/patient-similarity/export-cohort`	patient-similarity.view + cohorts.create	Export results as cohort definition
GET	`/patient-similarity/dimensions`	patient-similarity.view	List configurable dimensions
POST	`/patient-similarity/compute`	patient-similarity.compute	Trigger feature extraction
GET	`/patient-similarity/status/{sourceId}`	patient-similarity.view	Extraction status + staleness

The search endpoint accepts user-adjustable weights:

{
  "person_id": 1,
  "source_id": 47,
  "mode": "interpretable",
  "weights": {
    "demographics": 1.0,
    "conditions": 3.0,
    "measurements": 2.0,
    "drugs": 1.0,
    "procedures": 1.0,
    "genomics": 5.0
  },
  "limit": 25,
  "min_score": 0.3
}

Boosting genomics to 5.0 makes the engine prioritize shared variant profiles. Zeroing out demographics removes age/gender/race from the scoring entirely. The weights are fully user-controlled, per-search.

Validation Results

Pancreatic Cancer Corpus (361 patients, 4 dimensions)

Metric	Value
Extraction time	3 seconds
Search latency (interpretable)	~200ms
Dimensions available	demographics, conditions, measurements, drugs
Top match for person_id=1	Person 341: 0.87 overall (demo 1.0, conditions 0.90, labs 0.60, drugs 1.0)
Missing dimensions	procedures (null), genomics (null) — correctly excluded from scoring

Custom weight validation: boosting conditions to 3.0 correctly reranked Person 141 (95.9% condition overlap) above Person 341 (89.8% conditions but perfect demographics).

Acumenus CDM (1M patients, 5 dimensions)

Metric	Value
Extraction rate	~8,000 patients/minute
Dimensions available	demographics, conditions, measurements, drugs, procedures
Top match for person_id=1	Person 985: 0.72 overall (demo 0.80, conditions 0.82, labs 0.53, drugs 0.58, procedures 0.86)
Missing dimensions	genomics (null) — correctly excluded

The lower overall scores on Acumenus are expected — with a million diverse patients, even the best match will have more divergence than in a specialized 361-patient corpus.

The Frontend Experience

The Patient Similarity page follows Parthenon's dark clinical theme and offers:

Search form with source selector, patient ID input, and dimension weight sliders (0-5, step 0.5)
Mode toggle between Interpretable and Embedding
Results table with ranked patients, overall score, and per-dimension score bars (teal >0.7, gold >0.4, grey below)
Compare link on each result row for head-to-head patient analysis
Staleness indicator showing when feature vectors were last computed with a "Recompute" action
Search mode toggle between "Single Patient" and "From Cohort" for both entry workflows
Export as Cohort button for saving result sets as cohort definitions

Contextual entry points are embedded throughout the platform:

Patient Profile page: a "Find Similar Patients" button pre-fills the search with the current patient and source
Cohort Definitions: a "Find Similar to Cohort" action opens the similarity page in cohort-seed mode

What's Next

The engine ships today with Phases 1-4 complete. Phase 5 remains as a backlog of advanced capabilities:

Temporal similarity — consider when conditions, treatments, and events occurred relative to each other, not just which ones
Imaging radiomics — tumor volumetrics and radiomic features from DICOM via Orthanc
Clinical notes NLP — embed note_nlp content for text-based phenotype matching
Learned patient embeddings — train a Patient2Vec or transformer model on Parthenon's CDM data for temporal-aware embeddings
Weighted Personalized PageRank — implement the Oracle PLM graph algorithm as an alternative to vector-based scoring
Cross-source federated similarity — find patients in the Pancreatic Cancer Corpus who are similar to an Acumenus patient, blending data across CDM sources without co-locating patient records
Tumor Board integration — the Molecular Tumor Board's existing genomic matching will be unified with the similarity engine, so clinicians see genomic and clinical similarity in one view

Conclusion

The Patient Similarity Engine is the kind of feature that seems obvious in retrospect — of course a research platform should let you find similar patients across all available clinical dimensions. But implementing it correctly required solving a specific, non-trivial composition of challenges: multi-modal feature extraction from OMOP CDM, missing-dimension tolerance, dual algorithmic approaches with shared interpretability, bidirectional cohort integration, HIPAA-conscious tiered access, and source-agnostic architecture that works from 361 patients to 1 million.

What makes this a milestone for Parthenon isn't the similarity scoring itself — Jaccard and cosine distance have been around for decades. It's the integration. Patient similarity is woven into the cohort builder, the patient profile, the tumor board, and the permission system. It's not a standalone tool bolted onto the side. It's a new lens through which every other capability in the platform becomes more powerful.

Thirteen commits. Four phases. Six dimensions. One button: Find Similar Patients.

Poseidon and Vulcan: The Gods of Continuous Data Ingestion

Sat, 28 Mar 2026 18:00:00 GMT

Healthcare data does not arrive in neat packages. It streams — continuously, chaotically, from dozens of transactional systems that were never designed to talk to each other. EHR encounters appear as HL7 ADT messages. Lab results materialize through OBX segments hours after the draw. Radiology reports surface from PACS archives with inconsistent coding. Claims trickle in from clearinghouses days or weeks after the visit. Genomic panels arrive as VCF files from external laboratories with their own nomenclatures and timelines.

Transforming this unruly sea of clinical data into a coherent, research-ready OMOP Common Data Model is the central engineering challenge of any outcomes research platform. And until now, Parthenon handled it the same way most platforms do: as a series of one-time bulk loads. Upload a file. Map the concepts. Write the CDM. Move on.

That era is over.

Today we introduce two new engines to the Parthenon pantheon — Vulcan and Poseidon — purpose-built for the reality of continuous healthcare data integration.

Vulcan: God of the FHIR

In Roman mythology, Vulcan was the god of fire and the forge — the divine craftsman who shaped raw materials into instruments of power. His forge burned at the heart of Mount Etna, transforming crude ore into the weapons and tools that the other gods depended on.

In Parthenon, Vulcan occupies an analogous role. He is the FHIR integration engine — the system that connects directly to EHR servers, extracts clinical data through standardized FHIR R4 interfaces, and forges it into OMOP CDM records ready for analysis.

What Vulcan Does

Vulcan operates through a connection-backed bulk sync architecture. Data stewards attach a registered FHIR server connection to an ingestion project, then trigger incremental or full exports. The pipeline handles the rest:

FHIR Bulk Data Access ($export) — Vulcan initiates SMART Backend Services or anonymous bulk export requests against FHIR R4 servers. It manages the asynchronous polling lifecycle — submitting the export, monitoring the status endpoint, downloading NDJSON files when ready, and handling the inevitable timeouts and retries that bulk exports entail.

Connection Management — Each FHIR connection is a named configuration: server URL, authentication mode (SMART Backend Services with JWKS, client credentials, or anonymous for public test servers), target resource types, group identifiers for filtered exports, and incremental sync tracking. Connections are registered once by administrators and reused across projects.

Incremental Sync — After the initial full export, subsequent syncs request only resources modified since the last successful run. Vulcan tracks the _since parameter per connection, ensuring that each sync captures new admissions, updated lab results, and corrected diagnoses without re-processing the entire dataset.

Workspace Operations Console — The Vulcan workspace provides real-time visibility into sync operations: connection status, last sync time, record counts, mapping coverage percentages, and a full history of sync runs with extraction and mapping metrics. Sync controls are immediate — one button for incremental refresh, another for full re-export.

NDJSON Bundle Sandbox — For ad-hoc validation, Vulcan includes a sandbox mode where individual FHIR bundles or NDJSON files can be uploaded directly for concept mapping spot-checks — useful for verifying that a new server's coding conventions map cleanly before committing to a full sync.
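Vulcan's own code isn't shown here. The sketch below follows the FHIR Bulk Data Access specification to build a kick-off request: system-level `$export` or group-level `Group/[id]/$export`, with `_type` limiting resource types and `_since` making the export incremental. A natural value for `since` is the `transactionTime` from the previous run's completion manifest. `build_export_request` is a hypothetical name.

```python
from typing import Dict, Optional, Sequence, Tuple
from urllib.parse import urlencode

def build_export_request(
    base_url: str,
    resource_types: Sequence[str],
    since: Optional[str] = None,     # e.g. previous manifest's transactionTime
    group_id: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Kick-off URL and headers for a FHIR Bulk Data $export request."""
    base = base_url.rstrip("/")
    path = f"{base}/Group/{group_id}/$export" if group_id else f"{base}/$export"
    params = {"_type": ",".join(resource_types)}
    if since:
        params["_since"] = since  # only resources changed after this instant
    # The spec requires asynchronous processing for bulk export.
    headers = {"Accept": "application/fhir+json", "Prefer": "respond-async"}
    return f"{path}?{urlencode(params)}", headers
```

A successful kick-off returns `202 Accepted` with a `Content-Location` header pointing at the status endpoint the job then polls.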

The Architecture of FHIR at Scale

Vulcan's design reflects the operational reality of FHIR bulk data access. Public test servers like HAPI R4 and Firely are useful for development but unreliable for sustained bulk exports. Production Epic, Cerner, and MEDITECH deployments behave differently: they enforce rate limits, require SMART Backend Services authentication with rotating JWKS keys, and produce NDJSON files that can run to multiple gigabytes for large patient populations.

Vulcan handles this through a queue-driven architecture. Each sync run dispatches a RunFhirSyncJob onto Laravel Horizon's Redis-backed queue. The job manages the full export lifecycle asynchronously — polling status endpoints, downloading resources, mapping FHIR codes to OMOP concepts, and writing CDM records — while the frontend auto-refreshes every 10 seconds to reflect progress. If the export fails or times out, the run is marked with a clear error message and the connection remains ready for retry.
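The status-polling part of that lifecycle, sketched in Python against the Bulk Data spec. A `202` means the export is still running (or `429`, throttled), and `200` carries a JSON manifest whose `output` lists the NDJSON files. Anything else is surfaced as an error instead of being retried silently. The transport is injected via a hypothetical `fetch_status` callback so the logic stays testable; it is not RunFhirSyncJob's actual code. `Retry-After` is honored only when given in seconds.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

StatusResponse = Tuple[int, Dict[str, str], Optional[dict]]  # (HTTP status, headers, JSON body)

class ExportFailed(Exception):
    pass

def poll_export(
    status_url: str,
    fetch_status: Callable[[str], StatusResponse],
    sleep: Callable[[float], None] = time.sleep,
    default_wait: float = 10.0,
    max_polls: int = 360,
) -> List[Dict[str, str]]:
    """Poll a Bulk Data status endpoint until the manifest is ready; return its output entries."""
    for _ in range(max_polls):
        status, headers, body = fetch_status(status_url)
        if status == 200:
            return (body or {}).get("output", [])  # [{"type": ..., "url": ...}, ...]
        if status in (202, 429):
            retry_after = headers.get("Retry-After", "")
            sleep(float(retry_after) if retry_after.isdigit() else default_wait)
            continue
        raise ExportFailed(f"status endpoint returned HTTP {status}: {body}")
    raise ExportFailed(f"export not complete after {max_polls} polls")
```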

The key insight: FHIR integration in healthcare is inherently asynchronous and failure-prone. Vulcan's architecture embraces this rather than fighting it. Every operation is resumable, every failure is visible, and every run produces an auditable record of what was extracted, what was mapped, and what was written.

Poseidon: Ruler of the Data Seas

Where Vulcan commands the fire of FHIR, Poseidon rules the seas — the vast, churning ocean of transactional data that flows from every clinical system in the enterprise.

In mythology, Poseidon wielded his trident to control the waves, calm storms, and shake the earth itself. In Parthenon, Poseidon is the CDM refresh orchestration engine — powered by dbt (Data Build Tool) for SQL-based transformations and Dagster for dependency-aware scheduling and observability. He takes the raw data that Aqueduct stages and transforms it into a living, breathing OMOP CDM that stays current as the underlying sources change.

Why Poseidon Exists

Aqueduct — Parthenon's existing ingestion pipeline — handles the initial ETL brilliantly: file upload, profiling, AI-assisted concept mapping, schema mapping, and CDM writing. But Aqueduct operates on a batch paradigm. You upload data, map it, write it, and the job is done.

Healthcare data sources are not batch systems. EHR databases accumulate new encounters hourly. LIMS systems process lab results continuously. PACS archives ingest imaging studies around the clock. Claims feeds arrive on weekly or monthly cycles. Each of these sources produces data that must flow into the CDM incrementally — without duplicating existing records, without violating foreign key constraints, and without requiring a full rebuild every time.

This is the problem Poseidon was designed to solve.

The dbt Transformation Layer

At Poseidon's core is a dbt project — a collection of SQL-based models that define how raw staged data transforms into OMOP CDM tables. Each CDM table (person, visit_occurrence, condition_occurrence, drug_exposure, measurement, observation, procedure_occurrence, and more) is a dbt model with:

Incremental Materialization — Poseidon's models use dbt's incremental materialization strategy with merge semantics. On each run, only new or modified records are processed. The WHERE modified_date > last_run_date filter ensures that a nightly refresh of a million-patient CDM processes only the day's new encounters — not the entire history.
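In dbt, this filter usually lives inside an `{% if is_incremental() %}` block that compares each row against the maximum timestamp already in `{{ this }}`, with the model's `unique_key` driving the merge. The Python below is only an in-memory analogue of that logic. It assumes ISO 8601 date strings, which compare correctly as text. The function and column names are hypothetical:

```python
def incremental_merge(target, source_rows, key, updated_col):
    """Merge staged rows newer than the target's high-water mark into target.

    target: dict mapping primary key -> row (a stand-in for the CDM table),
    updated in place. Returns the number of rows inserted or updated.
    """
    # Only rows modified after the newest timestamp already in the target count.
    watermark = max((row[updated_col] for row in target.values()), default=None)
    changed = 0
    for row in source_rows:
        if watermark is not None and row[updated_col] <= watermark:
            continue  # already loaded by an earlier run
        target[row[key]] = row  # merge: insert new keys, overwrite existing ones
        changed += 1
    return changed
```

A strict "newer than" watermark is simple but can miss late-arriving rows stamped with exactly the watermark value. Real pipelines often subtract a small lookback window to compensate.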

Dependency-Aware Execution — dbt understands the directed acyclic graph (DAG) of table dependencies. person must load before visit_occurrence. Visits must exist before condition_occurrence can reference them via foreign keys. observation_period depends on the union of all clinical event tables. Poseidon respects this dependency graph automatically — no manual ordering, no failed runs from FK violations.
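dbt infers this graph from the `ref()` calls inside each model. The sketch below reproduces the ordering idea with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). It uses a deliberately simplified, hypothetical subset of CDM dependencies, and real models will have more edges:

```python
from graphlib import TopologicalSorter

# Simplified, illustrative dependency map: model -> models it depends on.
CDM_DEPENDENCIES = {
    "person": set(),
    "visit_occurrence": {"person"},
    "condition_occurrence": {"person", "visit_occurrence"},
    "drug_exposure": {"person", "visit_occurrence"},
    "measurement": {"person", "visit_occurrence"},
    "observation_period": {"visit_occurrence", "condition_occurrence",
                           "drug_exposure", "measurement"},
}


def build_order(dependencies):
    """Return a load order in which every model follows all of its dependencies.

    Raises graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())
```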

Schema Tests — Every CDM model carries built-in data quality assertions: not-null constraints on required fields, uniqueness checks on primary keys, foreign key relationships validated against the vocabulary and person tables, accepted-value checks on concept IDs, and temporal plausibility tests (no events before birth, no events after death). These tests run as part of every refresh, catching data quality issues before they propagate into analyses.
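dbt ships `not_null`, `unique`, `accepted_values`, and `relationships` as built-in generic tests. A temporal plausibility rule needs a custom test. The Python sketch below shows the logic such a check would encode, together with the orphan check a `relationships` test performs. It assumes full birth and death dates are available. OMOP `person` frequently stores only `year_of_birth`, so a real test would need a coarser comparison. Names are hypothetical:

```python
from datetime import date
from typing import Iterable, Optional


def temporal_violations(
    events: Iterable[tuple[int, int, date]],
    persons: dict[int, tuple[date, Optional[date]]],
) -> list[int]:
    """Return ids of events dated before birth, after death, or for unknown persons.

    events: (event_id, person_id, event_date) tuples.
    persons: person_id -> (birth_date, death_date or None).
    """
    bad = []
    for event_id, person_id, event_date in events:
        if person_id not in persons:
            bad.append(event_id)  # orphaned row: a relationships test would fail too
            continue
        birth, death = persons[person_id]
        if event_date < birth or (death is not None and event_date > death):
            bad.append(event_id)
    return bad
```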

Vocabulary-Aware Transformations — Poseidon's custom macros (concept_lookup, standard_concept) perform source-to-standard concept mapping within dbt SQL. Source codes from EHR systems are resolved to standard OMOP concepts through the shared vocabulary schema — the same vocabulary that powers Parthenon's Concept Explorer and Hecate semantic search.
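The standard OMOP route from a source code to a standard concept takes two steps. First, `concept` finds the source concept by `vocabulary_id` and `concept_code`. Then `concept_relationship` rows with `relationship_id = 'Maps to'` lead to the standard target. Below is a minimal in-memory sketch of that two-step lookup, with dictionaries standing in for the two tables. The function name is hypothetical, and this is not the `concept_lookup` macro itself:

```python
def map_to_standard(source_codes, concept_by_code, maps_to):
    """Resolve (vocabulary_id, concept_code) pairs to standard concept IDs.

    concept_by_code: (vocabulary_id, concept_code) -> source concept_id
    maps_to: source concept_id -> list of standard concept_ids ("Maps to" rows)
    Unmapped codes resolve to [0], OMOP's "No matching concept" placeholder.
    """
    resolved = {}
    for vocab_id, code in source_codes:
        source_id = concept_by_code.get((vocab_id, code))
        targets = maps_to.get(source_id, []) if source_id is not None else []
        resolved[(vocab_id, code)] = targets or [0]
    return resolved
```

A source concept can map to more than one standard concept, which is why each result is a list.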

Schema Routing — A custom generate_schema_name macro routes each model to the correct PostgreSQL schema per source. The same dbt models can produce CDM tables in omop, synpuf, irsf, pancreas, or any other source schema — controlled by a single variable at run time.

The Dagster Orchestration Layer

dbt handles the what — Dagster handles the when, how, and what-if:

Software-Defined Assets — Every CDM table is a Dagster asset backed by a dbt model. Dagster tracks the materialization state of each asset — when it was last refreshed, whether the refresh succeeded, and what downstream assets depend on it. The asset graph provides a complete lineage view from staging tables through intermediate transformations to final CDM tables.

Per-Source Scheduling — Different data sources have different cadences. EHR feeds might refresh nightly at 2 AM. LIMS data might arrive hourly. Claims feeds might land weekly. Poseidon supports per-source cron schedules, each with its own cadence, dbt selector (e.g., tag:ehr or source:staging_acumenus), and activation state.

Event-Driven Sensors — Beyond cron schedules, Poseidon can watch for events: new rows in a staging table, a FHIR webhook notification from Vulcan, or a file drop in a monitored directory. When the sensor fires, Poseidon automatically triggers the appropriate refresh pipeline.
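Dagster sensors can persist a cursor between evaluations. That is how a "new rows in a staging table" trigger avoids firing twice for the same data. Below is a minimal, framework-free sketch of that decision. It assumes staging rows carry a monotonically increasing id. The function name and the id-based cursor are assumptions, not Poseidon's sensor code:

```python
def evaluate_staging_sensor(cursor, current_max_id):
    """Decide whether new staged rows warrant a refresh.

    cursor: highest staging row id seen on the previous evaluation (None at first).
    current_max_id: highest id now in the staging table (None if it is empty).
    Returns (should_run, new_cursor).
    """
    if current_max_id is None:
        return False, cursor  # empty staging table: nothing to do
    if cursor is None or current_max_id > cursor:
        return True, current_max_id  # new rows arrived since the last check
    return False, cursor
```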

Manual Triggers — Data stewards can trigger incremental or full refreshes on demand through the Poseidon operations console — useful for ad-hoc loads, post-mapping corrections, or testing new source integrations.
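One way to picture how a schedule record, a dbt selector, the run-time schema variable, and the incremental-or-full choice come together is as a dbt command line. The sketch below assumes a hypothetical `SourceSchedule` record and a var named `source_schema`. `--select`, `--vars`, and `--full-refresh` are standard `dbt build` options. The post does not say how Poseidon actually invokes dbt, so treat this only as an illustration:

```python
import json
from dataclasses import dataclass


@dataclass
class SourceSchedule:
    """Hypothetical per-source schedule record."""
    source_schema: str  # target schema, e.g. "omop" or "synpuf"
    cron: str           # e.g. "0 2 * * *" = every day at 02:00
    selector: str       # dbt selector, e.g. "tag:ehr"
    active: bool = True


def dbt_build_args(schedule: SourceSchedule, full_refresh: bool = False) -> list[str]:
    """Build the dbt CLI arguments for one scheduled or manual run."""
    args = [
        "dbt", "build",
        "--select", schedule.selector,
        # One run-time variable selects the target schema; the name is assumed.
        "--vars", json.dumps({"source_schema": schedule.source_schema}),
    ]
    if full_refresh:
        args.append("--full-refresh")  # rebuild incremental models from scratch
    return args
```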

The Operations Console

Poseidon's frontend is a single-page operations console designed for the daily reality of data stewardship — not a DevOps dashboard, but a clinical data control tower:

Overview Metrics — Active schedules, runs in progress, success/failure counts at a glance.

Source Schedules — Each configured source shows its schedule type (cron, sensor, or manual), cron expression, last run time, next scheduled run, and run count. Activate, pause, or trigger runs directly from the schedule card.

Recent Runs — A live table of recent pipeline executions with source, run type, status, trigger method, and duration. Click any run to expand inline details: rows inserted, rows updated, models materialized, tests passed and failed, and full error messages for failed runs.

CDM Freshness — A grid view of every CDM asset with its last materialization timestamp. Stale assets (not refreshed in 24+ hours) are highlighted in gold — immediately visible, immediately actionable.
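The 24-hour staleness rule is simple enough to state as code. Below is a minimal sketch that assumes timezone-aware timestamps per asset, with `None` meaning the asset was never materialized. The function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=24)


def stale_assets(last_materialized, now=None):
    """Return stale asset names: never-materialized first, then oldest first.

    last_materialized: asset name -> timezone-aware datetime, or None.
    An asset is stale once 24 hours or more have passed since materialization.
    """
    now = now or datetime.now(timezone.utc)
    stale = [(ts, name) for name, ts in last_materialized.items()
             if ts is None or now - ts >= STALE_AFTER]
    # False sorts before True, so None timestamps come first; then by age.
    stale.sort(key=lambda pair: (pair[0] is not None, pair[0] or now))
    return [name for _, name in stale]
```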

Asset Lineage — A tiered dependency view showing the flow from staging through intermediate transformations to CDM tables and quality models. Not a decorative graph — a diagnostic tool for understanding impact when a source fails or a model changes.

How They Work Together

Vulcan and Poseidon are not competing systems. They occupy different positions in the data lifecycle and are designed to complement each other:

EHR / FHIR Server
      |
      v
  [ Vulcan ]  ------>  FHIR Bulk Export  ------>  Staged Data
                                                      |
Flat Files / DB                                       |
      |                                               v
      v                                        [ Poseidon ]
  [ Aqueduct ]  ------>  Profiling + Mapping         |
      |                                               v
      v                                     Incremental CDM
  Staged Data  -------------------------------->  Refresh
                                                      |
                                                      v
                                              OMOP CDM Tables
                                                      |
                                                      v
                                            Achilles / DQD / Analyses

Vulcan handles the FHIR-specific integration layer: connecting to servers, managing authentication, handling the bulk export lifecycle, and staging FHIR resources as relational data. Once staged, the data enters the same pipeline as any other source.

Poseidon handles the transformation layer: taking staged data from any source (Vulcan, Aqueduct file uploads, direct database connections) and maintaining the CDM through incremental, dependency-aware, vocabulary-mapped, quality-tested refreshes.

Aqueduct remains the one-time bulk ETL tool: file upload, profiling, AI-assisted concept mapping, schema mapping, and initial CDM writing. It is the craftsman's workshop where new data sources are onboarded. Once the mappings are confirmed, Poseidon takes over for ongoing maintenance.

Together, they transform Parthenon from a platform that receives data to one that continuously integrates it — a living analytical environment where the CDM reflects the current state of the clinical enterprise, not a snapshot from the last quarterly load.

The Naming Convention

Parthenon's feature naming follows the architecture of classical mythology:

Feature	Namesake	Domain
Parthenon	The temple of Athena	The platform itself: wisdom through evidence
Aqueduct	Roman water engineering	Bulk data ingestion and ETL pipelines
Vulcan	God of fire and the forge	FHIR integration: forging interoperability standards into CDM
Poseidon	God of the sea	Continuous data orchestration: commanding the waves of transactional data
Achilles	Greatest Greek warrior of the Trojan War	Data characterization: relentless, thorough, exhaustive
Hecate	Goddess of crossroads	Semantic vocabulary search: navigating the intersections of meaning
Abby	(Athena's owl)	AI assistant: intelligence through accumulated knowledge
Ares	God of war	Data quality dashboard: aggressive defense of data integrity

Each name is chosen not just for flavor but for functional resonance. Vulcan forges raw FHIR resources into structured CDM records. Poseidon governs the tidal rhythms of data flow. The names tell you what each system does if you know the mythology — and they make the platform memorable for those who don't.

What This Means for Research

The practical impact of continuous ingestion is profound:

Near-real-time cohort surveillance — Cohort definitions that previously reflected quarterly snapshots now reflect yesterday's admissions. Researchers can monitor recruitment criteria as patients enter the system, not after the fact.

Faster time to analysis — When a new data source is onboarded through Aqueduct and handed off to Poseidon, subsequent updates are automatic. The analyst's CDM stays current without manual intervention.

Reduced data engineering burden — Data stewards configure a schedule once. Poseidon handles the recurring execution, monitors for failures, and surfaces freshness issues. The human role shifts from executing pipelines to overseeing them.

Improved data quality — Every Poseidon refresh runs dbt's built-in schema tests and custom quality assertions. Data quality is validated on every load, not as an afterthought.

Auditable provenance — Every sync run, every CDM refresh, every test outcome is recorded. When a researcher asks "when was this data last updated?" or "did any quality checks fail?", the answer is one click away.

Looking Ahead

Vulcan and Poseidon represent Phase 1 and Phase 5 of a six-phase implementation plan. The remaining phases will add:

Core dbt models covering all 20+ OMOP CDM clinical tables with incremental materialization
Dagster sensors and schedules for fully automated, event-driven pipeline execution
Aqueduct-to-Poseidon handoff — confirmed mappings automatically generate dbt models
Production hardening — retry policies, alerting, run history management, and Dagit UI proxy

The gods have taken their stations. The data flows.

Vulcan and Poseidon are available now in Parthenon's Data Ingestion module. Navigate to the Poseidon or Vulcan tabs to begin configuring continuous ingestion for your data sources.

Building a Clinically Intelligent Risk Scoring Engine on OMOP CDM

Sat, 28 Mar 2026 12:00:00 GMT

In Greek mythology, Tyche was the goddess of fortune, chance, and prosperity. Depicted with a cornucopia of abundance and the wheel of fate, she governed the unpredictable forces that determined whether a city would flourish or fall. The ancient Greeks understood that outcomes are shaped by forces beyond individual control — health, circumstance, and probability. In the Parthenon pantheon, Tyche presides over population risk scoring: the quantification of clinical probability, the stratification of patients by the likelihood of outcomes they cannot fully control, and the transformation of uncertainty into actionable intelligence.

We built a population risk scoring engine that runs 20 validated clinical risk calculators against any OMOP CDM dataset — then immediately realized the approach was wrong. This post covers what we built, why we tore it apart, and the v2 architecture that replaced "run everything on everyone" with cohort-scoped, recommendation-driven clinical risk analysis.

The Problem with "Run All"

Clinical risk scores are precision instruments. The Framingham Risk Score was designed for adults aged 30-74 without prior cardiovascular events. CHA2DS2-VASc only applies to patients with atrial fibrillation. MELD is for liver disease severity. Running all 20 scores against a pancreatic cancer cohort produces a page full of "low" and "uncomputable": clinically meaningless results that make the platform look naive.

But that's exactly what v1 did. We implemented 20 risk calculators, wired them to a "Run All" button, and watched the results pour in. Framingham returned "uncomputable" for 66% of our cancer patients (no lipid panels). CHA2DS2-VASc returned 0 for everyone (no atrial fibrillation). Charlson returned a mean CCI of 0.37 for a cohort where every single patient has cancer, because the concept IDs were wrong.

That last part was the wake-up call.

Where Hallucinated Concepts Go to Die

Our first Charlson implementation used concept ID 4178681 for "any malignancy." It seemed right. The code was clean. The SQL ran without errors. The score computed to 0.37 for a cohort of 361 pancreatic cancer patients who should all score at least 2.

We queried the vocabulary:

SELECT concept_id, concept_name FROM vocab.concept WHERE concept_id = 4178681;

concept_id	concept_name
4178681	Dermatological complication of procedure

Not malignancy. A dermatological complication. The concept ID was fabricated — confidently wrong, plausibly formatted, and catastrophically misleading. Every patient in our cancer cohort was being matched against a skin procedure concept. Of course the CCI was near zero.

This wasn't an edge case. Ten of our twenty score implementations had the same problem: concept IDs pulled from training data rather than queried from the actual OMOP vocabulary. Some were close enough to pass a cursory review. Others were entirely fictional.

The fix was straightforward but non-negotiable: every concept ID must be verified against vocab.concept at development time, and resolved via concept_ancestor at runtime. No exceptions. No "I'm pretty sure this is right." Query the vocabulary or don't write the code.
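The development-time half of that rule can be automated. Below is a minimal sketch that uses an in-memory SQLite table as a stand-in for `vocab.concept`; the real schema lives in PostgreSQL. `verify_concepts` is a hypothetical helper. It flags any concept ID whose stored name differs from the name the code assumes, including IDs that do not exist at all:

```python
import sqlite3


def verify_concepts(conn: sqlite3.Connection, expected: dict[int, str]) -> dict:
    """Return {concept_id: actual_name} for every ID whose name is not as expected.

    actual_name is None when the concept ID does not exist in the table.
    """
    mismatches = {}
    for concept_id, expected_name in expected.items():
        row = conn.execute(
            "SELECT concept_name FROM concept WHERE concept_id = ?", (concept_id,)
        ).fetchone()
        actual = row[0] if row else None
        if actual != expected_name:
            mismatches[concept_id] = actual
    return mismatches
```

Run as a unit test against the loaded vocabulary, a check like this would flag 4178681 as soon as the code assumed it meant "Malignant neoplastic disease".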

The Vocabulary Is the Source of Truth

OMOP CDM's strength is its standardized vocabulary. Concept hierarchies, ancestor relationships, and cross-vocabulary mappings are the foundation that makes population-level analytics work. Ignoring them — or approximating them from memory — defeats the purpose.

Here's what the correct Charlson malignancy lookup looks like:

-- "Malignant neoplastic disease" (443392) is the verified ancestor
SELECT concept_id, concept_name FROM vocab.concept WHERE concept_id = 443392;
-- Returns: Malignant neoplastic disease

-- Verify our PDAC concept is a descendant
SELECT min_levels_of_separation
FROM vocab.concept_ancestor
WHERE ancestor_concept_id = 443392
  AND descendant_concept_id = 4180793; -- Malignant tumor of pancreas
-- Returns: 3 (three levels of separation — it IS a descendant)

One query. Definitive answer. Our pancreatic cancer concept (4180793) sits three levels below the general malignancy ancestor (443392) in the SNOMED hierarchy. Every patient with PDAC now correctly matches the Charlson "any malignancy" condition group.

We verified all 17 Charlson condition groups this way:

Group	Ancestor	Verified Concept
MI	4329847	Myocardial infarction
CHF	319835	Congestive heart failure
Malignancy	443392	Malignant neoplastic disease
Metastatic tumor	432851	Metastatic malignant neoplasm
Diabetes	201820	Diabetes mellitus
COPD	255573	Chronic obstructive pulmonary disease
Renal disease	46271022	Chronic kidney disease
HIV/AIDS	439727	Human immunodeficiency virus infection
...	...	...

With verified ancestors and runtime descendant resolution, the Charlson now correctly scores our pancreatic cancer cohort: 226 patients at CCI=2 (cancer only), 135 patients at CCI=3 (cancer + Type 2 diabetes).

From "Run All" to Recommendation-Driven

The concept ID fix was necessary but not sufficient. The fundamental design was still wrong: presenting 20 scores to every user for every cohort. A researcher studying pancreatic cancer doesn't need CURB-65 (pneumonia severity) or STOP-BANG (sleep apnea risk). Showing them alongside Charlson creates noise and erodes trust.

v2 Architecture: Cohort-Scoped Risk Analysis

The redesigned engine is built around a simple principle: risk scores are only meaningful when applied to the right population. The system should know which scores apply and recommend them.

The flow:

Researcher selects a target cohort (e.g., "All PDAC Patients" — 361 subjects)
The recommendation engine profiles the cohort: demographics, condition prevalence, measurement availability
Based on the profile, it recommends applicable scores with relevance reasons:
- Charlson CCI: Recommended ("100% of cohort has malignancy conditions; 37% have diabetes")
- FIB-4 Index: Recommended ("Liver function relevant for chemo hepatotoxicity monitoring; labs available")
- CHA2DS2-VASc: Not applicable ("Less than 1% atrial fibrillation prevalence in cohort")
Researcher confirms selection
Scores execute scoped to the cohort membership, storing patient-level results

Score Eligibility Criteria

Each score declares its eligibility as structured criteria, not just a human-readable string:

public function eligibilityCriteria(): array
{
    return [
        'population_type' => 'universal',
        // Universal scores (Charlson, Elixhauser) apply to any cohort.
        // Condition-specific scores (CHADS2-VASc, MELD) require prerequisite conditions.
        // Age-restricted scores (Framingham, SCORE2) need patients in the right age range.
    ];
}

The recommendation engine uses these criteria plus the cohort's actual clinical profile to make intelligent suggestions. A cardiovascular screening cohort gets Framingham and Pooled Cohort Equations. A liver disease cohort gets MELD and Child-Pugh. A cancer cohort gets Charlson, Elixhauser, and Multimorbidity Burden.
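Below is a minimal sketch of how structured criteria and a cohort profile could combine into recommendations with reasons. The `population_type` key comes from the interface above. Every other key (`condition`, `min_age`, `max_age`, `measurements`), the profile shape, and the 1% and 50% thresholds are assumptions for illustration:

```python
def _population_fit(criteria, profile, min_prevalence):
    """Check the cohort against a score's population criteria -> (ok, reason)."""
    kind = criteria["population_type"]
    if kind == "universal":
        return True, "applies to any cohort"
    if kind == "condition":
        name = criteria["condition"]
        share = profile["prevalence"].get(name, 0.0)
        if share >= min_prevalence:
            return True, f"{share:.0%} of cohort has {name}"
        return False, f"less than {min_prevalence:.0%} {name} prevalence"
    if kind == "age":
        youngest, oldest = profile["age_range"]
        if youngest <= criteria["max_age"] and oldest >= criteria["min_age"]:
            return True, "cohort overlaps the validated age range"
        return False, "cohort is outside the validated age range"
    raise ValueError(f"unknown population_type: {kind}")


def recommend(score_criteria, profile, min_prevalence=0.01, min_lab_coverage=0.5):
    """Split scores into (recommended, not_applicable), each mapping score -> reason."""
    recommended, not_applicable = {}, {}
    for score, criteria in score_criteria.items():
        ok, reason = _population_fit(criteria, profile, min_prevalence)
        if ok:
            # An eligible population still needs the score's inputs to exist.
            for lab in criteria.get("measurements", []):
                coverage = profile.get("lab_coverage", {}).get(lab, 0.0)
                if coverage < min_lab_coverage:
                    ok, reason = False, f"{lab} available for only {coverage:.0%} of cohort"
                    break
        if ok:
            recommended[score] = reason
        else:
            not_applicable[score] = reason
    return recommended, not_applicable
```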

Runtime Concept Resolution

Instead of hardcoded concept IDs in SQL templates, v2 scores declare clinical condition groups with verified ancestor concepts:

public function conditionGroups(): array
{
    return [
        ['label' => 'Myocardial infarction', 'ancestor_concept_id' => 4329847, 'weight' => 1],
        ['label' => 'Malignant neoplastic disease', 'ancestor_concept_id' => 443392, 'weight' => 2],
        ['label' => 'Metastatic malignant neoplasm', 'ancestor_concept_id' => 432851, 'weight' => 6],
        // ...
    ];
}

At execution time, the ConceptResolutionService resolves each ancestor to its full descendant set via concept_ancestor. This means:

Different vocabulary versions produce correct results automatically
No hardcoded concept IDs in scoring logic
The vocabulary is always the source of truth, queried live

Results are cached for one hour to avoid redundant ancestor lookups across multiple score executions.
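Below is a sketch of runtime descendant resolution with a one-hour cache, using SQLite as a stand-in for `vocab.concept_ancestor`. `ConceptResolver` is a hypothetical class, not the actual ConceptResolutionService. OMOP's `concept_ancestor` also records each standard concept as its own descendant (at level 0), so the returned set normally includes the ancestor itself:

```python
import sqlite3
import time


class ConceptResolver:
    """Resolve an ancestor concept to its descendant IDs, caching each result.

    Hypothetical stand-in for a resolution service; ttl_s=3600 mirrors the
    one-hour cache described above.
    """

    def __init__(self, conn: sqlite3.Connection, ttl_s: float = 3600.0,
                 clock=time.monotonic):
        self.conn = conn
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so expiry can be tested
        self._cache: dict[int, tuple[float, frozenset]] = {}

    def descendants(self, ancestor_id: int) -> frozenset:
        cached = self._cache.get(ancestor_id)
        if cached is not None and cached[0] > self.clock():
            return cached[1]  # still fresh: skip the database round trip
        rows = self.conn.execute(
            "SELECT descendant_concept_id FROM concept_ancestor "
            "WHERE ancestor_concept_id = ?",
            (ancestor_id,),
        ).fetchall()
        ids = frozenset(row[0] for row in rows)
        self._cache[ancestor_id] = (self.clock() + self.ttl_s, ids)
        return ids
```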

Pure Computation, Separate Data Access

v1 scores were SQL templates — the scoring logic was tangled with data access. A Charlson score was a 200-line SQL CTE chain that both fetched conditions and computed weights. Debugging meant reading SQL. Testing meant running against a database.

v2 separates these concerns:

PatientFeatureExtractor — queries condition_occurrence, measurement, and person tables for the entire cohort in one efficient batch
Score.compute() — a pure PHP function that receives extracted features and returns a score. No database access. Testable with mock data.

interface PopulationRiskScoreV2Interface
{
    // $patientData contains: age, gender, conditions (as ancestor IDs), measurements
    // Returns: score value, risk tier, confidence, completeness, missing components
    public function compute(array $patientData): array;
}

This makes each score independently testable, debuggable, and auditable. The Charlson compute() method is 50 lines of clear PHP logic with explicit supersession rules (metastatic trumps malignancy, severe liver trumps mild liver).
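The supersession rules are the part of Charlson that is easiest to get wrong, so here they are as a pure function in the same spirit as compute(). This is a sketch, not the Parthenon class. It uses readable group labels instead of ancestor IDs and covers only a subset of the 17 groups, with their standard Charlson weights. The "high" tier for scores of 5 or more is an assumption:

```python
# Illustrative subset of Charlson weights keyed by hypothetical group labels.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "copd": 1,
    "mild_liver": 1,
    "diabetes": 1,
    "diabetes_complicated": 2,
    "renal": 2,
    "malignancy": 2,
    "severe_liver": 3,
    "metastatic": 6,
}

# When the more severe group is present, the milder one is not counted.
SUPERSEDES = {
    "metastatic": "malignancy",
    "severe_liver": "mild_liver",
    "diabetes_complicated": "diabetes",
}


def charlson(conditions):
    """Pure Charlson computation over a set of condition group labels."""
    groups = {g for g in conditions if g in CHARLSON_WEIGHTS}  # ignore unknown labels
    for severe, mild in SUPERSEDES.items():
        if severe in groups:
            groups.discard(mild)
    score = sum(CHARLSON_WEIGHTS[g] for g in groups)
    # 0-2 low and 3-4 moderate follow the results table; 5+ "high" is assumed.
    tier = "low" if score <= 2 else "moderate" if score <= 4 else "high"
    return {"score": score, "risk_tier": tier}
```

Because charlson() never touches a database, each rule can be checked with a literal set: {"malignancy", "diabetes"} yields a score of 3 in the moderate tier.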

Patient-Level Persistence

v1 stored only population summaries — mean scores and tier counts. Useful for dashboards, useless for research. v2 stores every patient's individual score:

SELECT person_id, score_value, risk_tier, confidence
FROM app.risk_score_patient_results
WHERE score_id = 'RS005' AND risk_tier = 'moderate'
ORDER BY score_value DESC;

This enables:

Patient-level drill-through from any risk tier to the Patient Profile
Using risk scores as cohort inclusion criteria (future: "Charlson >= 3" as a cohort filter)
Exporting patient-level risk stratification for downstream analysis
Comparing risk distributions across cohorts

The 20 Scores

Parthenon ships with 20 validated clinical risk calculators spanning six clinical domains:

Category	Scores	Key Use Case
Cardiovascular	Framingham, Pooled Cohort Equations, CHA2DS2-VASc, HAS-BLED, SCORE2, TIMI, GRACE, CHADS2, RCRI	CV event prediction, stroke risk in AF, bleeding risk, pre-operative cardiac risk
Comorbidity	Charlson CCI, Elixhauser, Multimorbidity Burden	Overall disease burden, mortality prediction, resource utilization
Hepatic	MELD, Child-Pugh, FIB-4	Liver transplant priority, cirrhosis severity, fibrosis staging
Pulmonary	CURB-65, STOP-BANG	Pneumonia severity, sleep apnea screening
Metabolic	Metabolic Syndrome Score, DCSI	Metabolic risk clustering, diabetes complications
Musculoskeletal	FRAX	Osteoporotic fracture risk

Every v2 score implements the same interface. Adding a new score means writing one PHP class of roughly 100 lines: eligibility criteria, condition/measurement groups, risk tiers, and a compute() method.

Running It on the Pancreatic Cancer Corpus

Our test dataset: 361 patients with pancreatic ductal adenocarcinoma (PDAC) across three sub-cohorts — 21 PANCREAS-CT imaging patients, 168 CPTAC-PDA pathology patients, and 172 TCGA-PAAD genomics patients. Full clinical trajectories: visits, labs, drugs, conditions, procedures, specimens, 1,227 clinical notes, and genomic mutation profiles (KRAS/TP53/SMAD4/CDKN2A).

We ran the recommendation engine against the "All PDAC Patients" cohort:

Recommended:

Charlson CCI: universal applicability, 100% have malignancy conditions
Elixhauser Index: universal, captures T2DM, cachexia, DVT
Multimorbidity Burden: broad comorbidity assessment
FIB-4: liver function labs available, relevant for chemotherapy hepatotoxicity monitoring

Not applicable:

CHA2DS2-VASc, CHADS2: less than 1% atrial fibrillation
MELD, Child-Pugh: no primary liver disease
CURB-65: no pneumonia diagnoses
Framingham, PCE, SCORE2: missing lipid panels for most patients

This is exactly what a clinical researcher would expect. The engine's recommendations align with clinical judgment because they're derived from the actual data, not from assumptions about what a cancer cohort "probably" needs.

Charlson CCI Results

Tier	Patients	Mean CCI	Interpretation
Low (0-2)	226	2.0	Cancer only — no additional comorbidities
Moderate (3-4)	135	3.0	Cancer + one comorbidity (typically T2DM)

All 361 patients correctly score at least 2 (any malignancy, weight 2). The 37% with Type 2 diabetes score 3 (malignancy at weight 2 plus diabetes at weight 1). No patient scores below 2. No patient is "uncomputable." The vocabulary hierarchy resolution works.

What's Next

The v2 backend is complete. Remaining work:

Frontend analysis creator — cohort selector with recommendation cards, score selection, execution modal (replicating the Achilles UX pattern)
Results visualization — tier distribution charts, patient drill-through tables
Score migration — converting the remaining 19 scores from v1 SQL templates to v2 pure-compute implementations
Cohort builder integration — using risk scores as cohort inclusion criteria ("Charlson >= 3 AND KRAS mutant" as a single cohort definition)

The architectural lesson: clinical analytics tools must respect clinical context. A risk score without population awareness is just a number. A risk score that knows when it's relevant — and when it's not — is a clinical decision support tool.

Technical Summary

Component	Technology
Score Engine	Laravel 11 / PHP 8.4
Vocabulary Resolution	vocab.concept_ancestor (runtime, cached)
Feature Extraction	Bulk SQL with DISTINCT ON, PostgreSQL ANY()
Patient Storage	app.risk_score_patient_results (indexed by cohort + person)
Execution Tracking	AnalysisExecution polymorphism + RiskScoreRunStep
Score Interface	PopulationRiskScoreV2Interface with pure compute()
Database	PostgreSQL 17, OMOP CDM v5.4

All 20 scores, the recommendation engine, and the execution pipeline are open source under Apache 2.0.

The Magical Ladies of Parthenon

Fri, 27 Mar 2026 00:00:00 GMT

In Greek mythology, the great temple atop the Acropolis housed not just Athena, but an entire pantheon of divine figures — each wielding a unique gift. Parthenon, our unified OHDSI outcomes research platform, follows the same philosophy. Behind the scenes, four mythological women power the intelligence layer that transforms raw clinical data into actionable research: Hecate, Phoebe, Ariadne, and Arachne.

From left to right: Hecate (torch-bearer of hidden knowledge), Ariadne (thread-spinner of vocabulary mappings), Phoebe (oracle of concept relationships), and Arachne (weaver of the federated network).

Each of these engines appears throughout the Parthenon interface as a distinctive "Powered by" pill — teal for Hecate, gold for Phoebe, crimson for Ariadne, and violet for Arachne. They aren't cosmetic labels. They represent four fundamentally different approaches to the same grand challenge: helping researchers find the right concepts, build complete concept sets, map between vocabularies, and execute studies across a distributed network of clinical databases.

This post tells the story of who they are, what they do, and how they came to life.

Hecate: The Torch-Bearer of Hidden Knowledge

Color: Teal (#2DD4BF) | Domain: Semantic concept search | Technology: Vector embeddings + Qdrant

In mythology, Hecate stood at crossroads with a torch in each hand, illuminating paths hidden from mortal sight. In Parthenon, she does the same for clinical concepts.

The Problem She Solves

Traditional vocabulary search is keyword-based. Search for "heart attack" and you'll find concepts named "heart attack" — but you might miss myocardial infarction, STEMI, acute coronary syndrome, or troponin elevation. Clinical researchers think in medical concepts, not in exact vocabulary strings. The gap between how a researcher thinks about a condition and how OMOP CDM encodes it can mean the difference between a complete cohort and a dangerously incomplete one.

How She Works

Hecate operates through a three-layer architecture:

Embedding Layer (Ollama + EmbeddingGemma-300M): Every standard concept in the OMOP vocabulary (1,968,694 of them) is passed through EmbeddingGemma, a general-purpose text embedding model, running locally via Ollama. Each concept name becomes a 768-dimensional vector that captures its semantic meaning, not just its characters.
Vector Index (Qdrant): These ~2 million vectors are stored in a Qdrant collection called meddra, with cosine similarity indexing. When a researcher types a query, Hecate embeds the query text through the same model and performs approximate nearest-neighbor search against the full vocabulary.
Concept Resolution (PostgreSQL): The nearest vectors map back to OMOP concept IDs through a pairs file (1.94 million unique concept names), and the full concept metadata (domain, vocabulary, class, standard status) is resolved from PostgreSQL.
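Qdrant answers these queries with an approximate (HNSW) index over 768-dimensional vectors. The underlying similarity is easier to see in an exact, brute-force version. The sketch below uses toy three-dimensional vectors, and its function names are hypothetical:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Exact nearest-neighbour search over {concept_id: vector}.

    Returns [(concept_id, similarity)] sorted by descending similarity.
    """
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Brute force costs one comparison per stored vector, which is why an approximate index is needed at roughly two million concepts.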

What Makes Her Special

Search for "sugar disease" and Hecate returns Diabetes mellitus (concept 201820) at 0.93 similarity. Search for "broken hip" and she returns Fracture of neck of femur alongside Hip fracture and Intertrochanteric fracture. She handles medical synonymy, abbreviations, and even casual descriptions, because the embedding model places semantically related phrases close together even when they share no words.

She also powers the autocomplete in Parthenon's vocabulary browser, the concept search within the ETL mapping tool (Aqueduct), and the concept picker in cohort definitions.

The Numbers

Metric	Value
Total concepts embedded	1,968,694
Phase 1 (Clinical)	705,294 concepts
Phase 2 (Drug/RxNorm)	1,263,400 concepts
Embedding dimension	768
Model	EmbeddingGemma-300M (local)
Index	Qdrant v1.17, cosine similarity
Query latency	~50ms typical

Phoebe: The Oracle of Concept Relationships

Color: Gold (#C9A227) | Domain: Concept set recommendations | Technology: Pre-computed co-occurrence network from 22 global data sources

Phoebe was the Titan of prophecy and radiant intellect — grandmother of Apollo and Artemis, keeper of the Oracle at Delphi before Apollo claimed it. In Parthenon, she whispers to researchers: "You're building a concept set for diabetes — have you considered these 733 related concepts?"

The Problem She Solves

Building a comprehensive concept set is one of the hardest tasks in observational research. A researcher creating a cohort for "Type 2 Diabetes" needs to decide: should I include Diabetes mellitus type 2 without complication? What about Diabetic neuropathy? Insulin resistance? HbA1c measurement? The OMOP vocabulary contains millions of concepts with complex hierarchical and lateral relationships. Missing a critical concept can bias an entire study.

How She Works

Phoebe is powered by the OHDSI concept_recommended dataset — a pre-computed network of 3,768,447 concept-to-concept recommendation pairs, derived from analyzing concept usage patterns across 22 real-world healthcare databases spanning 6 countries and 272 billion clinical records.

The recommendations come in five relationship types:

Relationship	Count	What It Captures
Lexical via standard	1,383,892	Concepts with similar names in standard vocabularies
Ontology-descendant	1,111,848	Child concepts in the vocabulary hierarchy
Ontology-parent	1,095,982	Parent concepts in the vocabulary hierarchy
Patient context	135,033	Concepts that co-occur in the same patients across databases
Lexical via source	41,692	Concepts with similar names in source vocabularies

The Patient context relationships are the most valuable — they represent real-world clinical co-occurrence patterns. If patients with Diabetes mellitus frequently also have records for Diabetic retinopathy screening, that relationship is captured even though the two concepts are in different domains and different vocabulary hierarchies.

What Makes Her Special

When a researcher selects concept 201820 (Diabetes mellitus), Phoebe returns 733 recommended concepts spanning complications (neuropathy, retinopathy, nephropathy), related measurements (HbA1c, fasting glucose), medications (metformin, insulin), and associated conditions (metabolic syndrome, obesity). She surfaces concepts that a researcher should consider based on how the global OHDSI network actually uses them together.

She's integrated into Parthenon's Concept Set Editor — as you add concepts to your set, Phoebe aggregates recommendations across all included concepts, deduplicates them, and ranks by relevance. The panel is collapsible and non-intrusive, but when expanded, it's a revelation.
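The aggregation step could be sketched as follows. This assumes recommendations are available as a mapping from each concept to its recommended concept IDs, for example loaded from vocab.phoebe. It ranks by how many concepts in the set recommend each candidate. That ranking rule is an assumption, not Phoebe's documented relevance measure, and the function name is hypothetical:

```python
from collections import Counter


def aggregate_recommendations(concept_set, recommendations):
    """Merge per-concept recommendations into one ranked, de-duplicated list.

    recommendations: concept_id -> iterable of recommended concept_ids.
    Concepts already in the set are never recommended back.
    """
    votes = Counter()
    for source_id in concept_set:
        # set() so each source concept votes at most once per candidate
        for rec_id in set(recommendations.get(source_id, ())):
            if rec_id not in concept_set:
                votes[rec_id] += 1
    # Most-recommended first; ties broken by concept_id for stable output.
    return sorted(votes, key=lambda cid: (-votes[cid], cid))
```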

The Data Pipeline

The concept_recommended dataset is published by OHDSI through the Broadsea project and is based on the ConceptPrevalence study led by Anna Ostropolets. We load it into a vocab.phoebe table and query it directly — no external service dependency, sub-millisecond response times.

Ariadne: The Thread-Spinner of Vocabulary Mappings

Color: Crimson (#9B1B30 / #E85A6B) | Domain: AI-assisted source-to-standard concept mapping | Technology: RAG pipeline + LLM reasoning

Ariadne gave Theseus a ball of thread to navigate the Labyrinth and slay the Minotaur. In Parthenon, she gives data engineers a thread through the labyrinth of source-to-standard vocabulary mapping — arguably the most labor-intensive step in any OMOP ETL pipeline.

The Problem She Solves

When a hospital's EHR uses the code "DM2" for Type 2 Diabetes, someone needs to map that to OMOP concept 201826 (Type 2 diabetes mellitus). When a lab system reports "GLU-F" for fasting glucose, someone needs to find LOINC concept 2345-7 (Glucose [Mass/volume] in Serum or Plasma). A typical ETL project involves mapping thousands of source codes, and each mapping requires domain expertise, vocabulary knowledge, and careful judgment.

How She Works

Ariadne operates as an AI mapping assistant in Parthenon's Mapping Assistant page. She combines:

Hecate's semantic search to find candidate standard concepts for each source code
Vocabulary context from concept hierarchies, relationships, and domain constraints
LLM reasoning to evaluate candidates and suggest the best mapping with a confidence score and rationale

The researcher sees a side-by-side interface: source codes on the left, Ariadne's suggestions on the right. Each suggestion includes the recommended standard concept, a confidence percentage, the mapping type (direct, lookup, transform), and a natural-language explanation of why this mapping makes sense.

What Makes Her Special

Ariadne doesn't just pattern-match strings. She understands that "BP systolic" should map to a Measurement domain concept, not a Condition. She knows that drug mappings should target RxNorm Clinical Drug concepts, not ingredient-level concepts. She respects the OMOP conventions for concept class, domain, and standard status, because the vocabulary structure itself (hierarchies, relationships, and domain constraints) is part of the context she reasons over.
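Convention checks like these can also run as a deterministic guardrail around the LLM's ranking. The sketch below is an assumption about how such a filter might look, not Ariadne's implementation. The Candidate fields mirror OMOP concept table columns (domain_id, concept_class_id, standard_concept). The expected domain is assumed to come from earlier in the pipeline. The drug rule is simplified: Ingredient-level candidates are kept only when no Clinical Drug candidate exists.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    concept_id: int
    concept_name: str
    domain_id: str
    concept_class_id: str
    standard_concept: str  # "S" marks standard concepts in the OMOP vocabulary tables

def apply_omop_conventions(candidates, expected_domain):
    """Keep only standard concepts in the expected domain.

    For drugs, prefer Clinical Drug concepts over Ingredient-level ones when both
    are present (a simplified stand-in for the granularity rule described above).
    """
    kept = [c for c in candidates
            if c.standard_concept == "S" and c.domain_id == expected_domain]
    if expected_domain == "Drug":
        clinical = [c for c in kept if c.concept_class_id == "Clinical Drug"]
        if clinical:
            kept = clinical
    return kept
```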

She also learns from the mappings you accept. As you work through a mapping project, the patterns you confirm help her make better suggestions for subsequent codes. She's a tireless assistant who gets smarter as you work.

Arachne: The Weaver of the Federated Network

Color: Violet (#8B5CF6 / #A78BFA) | Domain: Federated study execution | Technology: OHDSI Arachne Central integration

Arachne was the mortal weaver who challenged Athena herself — her tapestries so perfect that the goddess transformed her into a spider, forever weaving intricate webs that connect distant points. In Parthenon, Arachne weaves a web of federated data nodes, enabling studies to execute across multiple institutions without centralizing patient data.

The Problem She Solves

The fundamental tension in multi-site clinical research: you need data from many hospitals to achieve statistical power, but you can't (and shouldn't) move patient data to a central location. HIPAA, GDPR, and institutional policies all tightly restrict it. The traditional solution (months of IRB negotiations, data use agreements, and manual result aggregation) makes large-scale studies impractical.

How She Works

Arachne integrates with OHDSI Arachne Central, a federated execution platform. The workflow:

Study Design (Parthenon): A researcher designs their study — cohort definitions, analysis packages, outcome measures — entirely within Parthenon's study workspace.
Node Discovery (Arachne): Parthenon queries Arachne Central for available data nodes — institutions that have registered their OMOP CDM databases and agreed to participate in federated analyses.
Distribution (Arachne): With one click, the researcher distributes their analysis package to selected nodes. Arachne Central handles authentication, package delivery, and execution coordination.
Execution (Remote): Each data node runs the analysis locally against its own OMOP CDM database. Patient-level data never leaves the institution. Only aggregate results (counts, statistics, effect estimates) are returned.
Aggregation (Parthenon): Results flow back through Arachne Central into Parthenon, where they're displayed in a unified results viewer with per-node breakdowns.
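To make the Aggregation step concrete, here is a minimal sketch that combines per-node aggregates without touching patient-level data. It sums counts and pools per-node log hazard ratios with a fixed-effect, inverse-variance meta-analysis. That pooling method is a standard technique chosen here for illustration. The post doesn't say how Parthenon combines estimates, and OHDSI's EvidenceSynthesis package, for example, offers several methods. The NodeResult structure is hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeResult:
    node: str
    persons: int       # aggregate count only, never row-level data
    log_hr: float      # log hazard ratio estimated locally at the node
    se_log_hr: float   # its standard error

def pool_fixed_effect(results):
    """Inverse-variance fixed-effect pooling of per-node log hazard ratios."""
    weights = [1.0 / r.se_log_hr ** 2 for r in results]
    pooled_log = sum(w * r.log_hr for w, r in zip(weights, results)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return {
        "total_persons": sum(r.persons for r in results),
        "hr": math.exp(pooled_log),
        "ci95": (math.exp(pooled_log - 1.96 * pooled_se),
                 math.exp(pooled_log + 1.96 * pooled_se)),
    }
```

Nodes with tighter standard errors get proportionally more weight. Only the four aggregate fields per node ever cross the network.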

What Makes Her Special

Arachne makes the federated model invisible to the researcher. You don't need to know which hospitals are participating, what their IRB requirements are, or how to package an R script for remote execution. You design your study, click "Distribute," and watch results arrive from across the network.

The Federated Execution tab in Parthenon's study workspace shows real-time status for each node — queued, running, completed, or failed — with the ability to drill into per-node results. It transforms what used to be a months-long coordination effort into a same-day operation.

The Pantheon Together

These four engines are independent but complementary. A typical research workflow touches all of them:

Hecate helps you find the concepts you're looking for, even when you don't know the exact vocabulary terms
Phoebe helps you complete your concept set by recommending related concepts you might have missed
Ariadne helps you map your source data to the OMOP standard, so your local data is compatible with the global network
Arachne helps you execute your study across that global network, bringing federated evidence to bear on your research question

They're named after figures from Greek mythology not as a whimsical branding exercise, but because each figure's mythological role maps precisely to its function in the platform. Hecate illuminates hidden paths. Phoebe prophesies connections. Ariadne provides the thread through the labyrinth. Arachne weaves the web that connects distant nodes.

Together, they make Parthenon more than a tool — they make it an intelligent research companion that understands clinical vocabularies, anticipates researcher needs, and bridges the gap between local data and global evidence.

The Magical Ladies of Parthenon are all open-source, built on OHDSI standards, and running in production at Acumenus Data Sciences. If you'd like to learn more about any of them, explore the Parthenon documentation or reach out to the team.

Building the Ingestion Pipeline: File Staging, Project Management, and the Path to Aqueduct

Thu, 26 Mar 2026 00:00:00 GMT

A massive day on the ingestion front — 87 commits landed in Parthenon today, almost entirely focused on building out a brand-new end-to-end data ingestion pipeline. We now have a fully wired system for creating ingestion projects, uploading raw files, staging them into a schema-isolated PostgreSQL environment, and handing off to Aqueduct for ETL. This has been a long time coming.

The Ingestion Pipeline: From Zero to Staged Data

The headline work today is the ingestion subsystem — a cohesive feature that takes a researcher from "I have some CSV files" to "my data is staged and ready for CDM mapping," all within the Parthenon UI.

Project Model and Access Control

Everything starts with IngestionProject — a new Eloquent model and accompanying Laravel policy (aacf41c93). Projects act as the top-level container for a researcher's raw data, tracking lifecycle state from initial creation through file upload, staging, and ultimately a ready status that unlocks downstream actions. The policy enforces ownership and role-based access from the start, ensuring researchers only see and act on their own projects.

A dedicated set of form requests and a full IngestionProjectController (f48992b5b) wire up the REST surface — create, list, show, and status-transition endpoints — all sitting behind properly scoped middleware. Notably, a follow-up fix (60bd93bf7) patched a gap where the ingestion routes were missing permission middleware entirely; that's now resolved and serves as a reminder to audit new route groups at the point of creation rather than after.
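The project lifecycle (creation, file upload, staging, ready) maps naturally onto a small state machine. In that design, the status-transition endpoints reject any move that isn't in an allowed-transition table. The state names and the failed/retry path below are assumptions for illustration, not the actual IngestionProject statuses. The sketch is also in Python rather than the PHP the model is written in.

```python
from enum import Enum

class ProjectStatus(Enum):
    DRAFT = "draft"
    UPLOADING = "uploading"
    STAGING = "staging"
    READY = "ready"
    FAILED = "failed"

# Hypothetical allowed transitions; anything not listed is rejected.
ALLOWED = {
    ProjectStatus.DRAFT: {ProjectStatus.UPLOADING},
    ProjectStatus.UPLOADING: {ProjectStatus.STAGING},
    ProjectStatus.STAGING: {ProjectStatus.READY, ProjectStatus.FAILED},
    ProjectStatus.FAILED: {ProjectStatus.UPLOADING},  # retry by re-uploading
    ProjectStatus.READY: set(),
}

def transition(current, target):
    """Return the new status, or raise if the move isn't allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```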

Queue-Based File Staging

The core of the pipeline is StageFileJob (58ed82726), a queued Laravel job that handles the heavy lifting of getting uploaded files into a usable database structure. Each file is dispatched as its own job, so multi-file uploads can process in parallel (given more than one queue worker) without blocking the UI. The job hands off to StagingService (28797e458), which is responsible for:

Schema creation: Each ingestion project gets its own isolated PostgreSQL schema, preventing cross-project data bleed during the staging phase.
Data loading: Reads uploaded files and bulk-loads rows into the staging schema, handling type inference at the column level.
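Column-level type inference can be as simple as trying progressively looser types until one fits every non-empty value. The sketch below shows that idea with a hypothetical type ladder (BIGINT, DOUBLE PRECISION, DATE, then TEXT). The post doesn't describe StagingService's actual rules, and a real loader would likely sample large files rather than scan every row.

```python
from datetime import date

def _fits(value, kind):
    """Check whether a raw string parses as the given kind."""
    try:
        if kind == "BIGINT":
            int(value)
        elif kind == "DOUBLE PRECISION":
            float(value)
        elif kind == "DATE":
            date.fromisoformat(value)
        return True
    except ValueError:
        return False

def infer_column_type(values):
    """Pick the strictest PostgreSQL type that every non-empty value fits."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    for kind in ("BIGINT", "DOUBLE PRECISION", "DATE"):
        if non_empty and all(_fits(v, kind) for v in non_empty):
            return kind
    return "TEXT"  # fallback that always loads
```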

Alongside staging, we introduced a column and table name sanitizer (aacf41c93) that handles the unglamorous but critical job of cleaning arbitrary user-supplied headers into valid SQL identifiers. It handles reserved word collisions, strips illegal characters, and deduplicates columns — exactly the kind of defensive logic that prevents subtle downstream failures when researchers upload files with headers like "order", "select", or "patient id (v2) [final]".
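Here is one way such a sanitizer could work, sketched in Python rather than taken from the commit. The reserved-word list is a small hypothetical subset (PostgreSQL reserves many more keywords). The suffixing scheme for reserved words and duplicates is also an assumption. The 63-character cap reflects PostgreSQL's default maximum identifier length.

```python
import re

# Illustrative subset only; PostgreSQL reserves many more keywords.
RESERVED = {"order", "select", "from", "where", "group", "table", "user", "limit"}
MAX_IDENT = 63  # PostgreSQL's default maximum identifier length (NAMEDATALEN - 1)

def sanitize_identifier(raw):
    """Turn an arbitrary header into a lowercase, SQL-safe identifier."""
    name = re.sub(r"[^a-z0-9_]+", "_", raw.strip().lower())  # illegal chars -> _
    name = re.sub(r"_+", "_", name).strip("_")               # collapse and trim
    if not name:
        name = "column"
    if name[0].isdigit():
        name = f"c_{name}"      # unquoted identifiers can't start with a digit
    if name in RESERVED:
        name = f"{name}_col"    # avoid reserved-word collisions
    return name[:MAX_IDENT]

def sanitize_headers(headers):
    """Sanitize every header and deduplicate with numeric suffixes."""
    used, result = set(), []
    for header in headers:
        base = sanitize_identifier(header)
        name, n = base, 2
        while name in used:  # loop so a suffix never collides with a real header
            suffix = f"_{n}"
            name = base[:MAX_IDENT - len(suffix)] + suffix
            n += 1
        used.add(name)
        result.append(name)
    return result
```

For example, the headers "order", "select", and "patient id (v2) [final]" from the paragraph above come out as order_col, select_col, and patient_id_v2_final.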

Frontend: Project List, Detail View, and Multi-File Upload

The UI side kept pace with the backend. New React hooks and API bindings (01a657dd0) wrap all the ingestion endpoints, and a project list component gives researchers a dashboard view of their active and completed ingestion projects. The Upload Files tab was restructured (a7b2c59d4) to support multi-file selection with per-file status indicators — upload progress, staging status, and any errors surface inline rather than in a toast that disappears.

The project detail view is the centerpiece here: it shows project metadata and file status and, once the project reaches ready, an Open in Aqueduct button.

Auto-Creation and Aqueduct Handoff

Two commits tie the lifecycle together neatly. When a project transitions to ready status (all files staged without error), the system automatically creates a staging Source record (e0efbb89b) — the entity that Aqueduct uses to know where to pull data from. No manual configuration step required.

The Open in Aqueduct button (fbea80b04) then deep-links into Aqueduct with that source pre-selected, dropping the researcher directly into the ETL mapping workflow with their data already wired up. This is the kind of cross-tool integration that makes the platform feel like a platform rather than a collection of loosely related tools.

On the Horizon: Abby 2.0 Phase 3

While today's work was all ingestion, the devlog notes from last week signal what's coming next on the AI side. Abby 2.0 Phase 3 — the Semantic Knowledge Graph — is in active planning. The design calls for a KnowledgeGraphService that traverses concept_ancestor and concept_relationship tables with Redis-backed caching, paired with a DataProfileService that builds a living coverage profile of the institution's CDM: temporal range, domain density, vocabulary completeness, and proactive gap warnings.

The goal is to give Abby genuine relational understanding of clinical concepts — so when a researcher asks about a condition with thin data at this institution, she warns them before they build a cohort on a foundation of 12 patients. That work will touch ai/app/knowledge/, the live context pipeline in chroma/live_context.py, and the context assembler. Expect those commits to start landing soon.
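Phase 3 is still in planning, so the following is only a sketch of the kind of traversal a KnowledgeGraphService might perform. It uses Python's functools.lru_cache as a stand-in for the planned Redis layer. An in-memory parent-to-children map with illustrative IDs stands in for hierarchy rows in concept_relationship. In a real OMOP database, the pre-computed concept_ancestor table already answers plain descendant queries in a single read. A traversal like this mainly matters for relationship types that concept_ancestor doesn't cover.

```python
from collections import deque
from functools import lru_cache

# Toy hierarchy with illustrative IDs: parent concept -> child concepts.
CHILDREN = {
    201820: (201826, 201254),
    201826: (443730,),
}

@lru_cache(maxsize=10_000)  # stand-in for the planned Redis cache
def descendants(concept_id):
    """Breadth-first traversal returning every descendant of a concept."""
    found, queue = set(), deque([concept_id])
    while queue:
        for child in CHILDREN.get(queue.popleft(), ()):
            if child not in found:  # also guards against cycles
                found.add(child)
                queue.append(child)
    return frozenset(found)
```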

What's Next

Ingestion error handling: Surface per-row staging errors back to the UI, and define retry semantics for StageFileJob on transient failures.
Schema lifecycle management: Staged schemas need a cleanup path — either on project deletion or after successful CDM load in Aqueduct.
Abby Phase 3 kickoff: KnowledgeGraphService and DataProfileService implementation, starting with the OMOP hierarchy traversal and Redis caching layer.
Staging source permissions: Review whether auto-created Sources inherit project-level ACLs correctly or need explicit permission wiring.

Solid day. The ingestion pipeline has been a missing piece for researchers who want to bring their own data into the platform without going through a manual DBA-assisted ETL setup. Today's work makes that self-service path real.

Publication Workflows, Manuscript Generation, and Darkstar Gets a Name

Thu, 26 Mar 2026 00:00:00 GMT

A massive day on Parthenon with 193 commits landing across the platform. The headlining work: a near-complete publication/manuscript workflow that takes study analyses all the way to a formatted, auto-numbered document preview, plus a long-overdue rename of the R Analytics Runtime to Darkstar — the name it's been running under in Docker all along.

Publication Workflow: From Study Results to Manuscript

The most substantial feature push today was on the publish module, which is rapidly becoming a first-class citizen in the Parthenon platform. The goal is to let researchers go from completed study analyses directly to a publication-ready manuscript — without leaving the platform.

Manuscript Structure Overhaul

The section editor previously organized content around analysis types (cohort, characterization, PLP, etc.). That framing made sense from an engineering perspective but doesn't match how manuscripts are actually written. Today's refactor (b7411cd78) replaced that structure with a research-question-driven manuscript layout — Introduction, Methods, Results, Discussion — which is how journals and regulatory submissions expect content to be organized.

This is a subtle but important shift: the platform now speaks the language of the researcher, not the pipeline.

Element Toggles and Section Configurability

Two commits (2efc99095, 94bf9eb15) wired up the full toggle system between DocumentConfigurator and SectionEditor. Each section can now independently show or hide tables, narrative text, and diagrams. The configurator acts as the source of truth, propagating toggle state down to the section editors — a clean unidirectional data flow that should make this easy to extend as more element types are added.

ResultsTable Component

A new ResultsTable component (c2406012b) handles publication-style rendering of analysis results — think formatted cells, appropriate significant figures, and layout that maps to what you'd see in a journal table. Crucially, tables and figures in the preview are now auto-numbered (8a85a80e6), so Table 1, Table 2, Figure 1, etc. update dynamically as sections are toggled on or off. Anyone who's manually renumbered tables in a Word document at midnight before a submission deadline knows why this matters.
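The renumbering behavior boils down to a single pass over the visible sections in document order, with separate counters for tables and figures. This Python sketch assumes a simple section/element structure invented for illustration. The actual preview is a React component whose data model isn't shown in the post.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str   # "table" or "figure"
    title: str

@dataclass
class Section:
    name: str
    visible: bool = True
    elements: list = field(default_factory=list)

def number_elements(sections):
    """Label visible tables and figures sequentially, skipping hidden sections."""
    counters = {"table": 0, "figure": 0}
    labels = []
    for section in sections:
        if not section.visible:
            continue  # toggled-off sections don't consume numbers
        for element in section.elements:
            counters[element.kind] += 1
            prefix = "Table" if element.kind == "table" else "Figure"
            labels.append(f"{prefix} {counters[element.kind]}: {element.title}")
    return labels
```

Hiding a section and re-running the pass is all it takes for later tables and figures to shift down.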

Analysis Picker Improvements

The analysis picker (c2406012b) gained two quality-of-life improvements: a Select All per study checkbox, and automatic pre-selection of the studyId when navigating to the publish page from a specific study. The latter pairs with a new Generate Manuscript button added to the Studies page (f208b2e52) — one click takes you to the publish workflow with your study already in context.

Narrative Generation and Bug Fixes

Two fix commits (dc4d19e05, 3b4f21103) addressed real issues surfacing during end-to-end testing of the publish workflow:

Study analyses now load with their associated executions, which is required for the publish workflow to have the data it needs to generate content.
Narrative generation is now properly wired end-to-end, 95% confidence intervals are included in result summaries, unlisted analysis types are handled gracefully, and several test failures introduced during the refactor were resolved.
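For ratio estimates such as hazard ratios, a 95% confidence interval is typically computed on the log scale and then exponentiated. Below is a minimal sketch of how a result summary line could include it. The function name and output format are illustrative, not the publish module's.

```python
import math

def summarize_hr(log_hr, se):
    """Format a hazard ratio with its 95% CI (normal approximation on the log scale)."""
    z = 1.959964  # 97.5th percentile of the standard normal distribution
    lower = math.exp(log_hr - z * se)
    upper = math.exp(log_hr + z * se)
    return f"HR {math.exp(log_hr):.2f} (95% CI {lower:.2f}-{upper:.2f})"
```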

These aren't glamorous fixes, but they're the difference between a feature that demos well and one that actually works.

Darkstar: The R Analytics Runtime Gets Its Name

The R Analytics Runtime has been called "Darkstar" in Docker configurations for a while, but the System Health admin UI and backend still referred to it by the service key r and the display name "R Analytics Runtime." Today's work (b3a265ecb and the associated devlog) brought everything into alignment.

Backend and API

SystemHealthController.php now uses the service key darkstar (matching the Docker service name) and the display name "Darkstar." The health card message is more informative too — instead of a generic status, it now shows something like "R 4.4.2, 20 HADES packages loaded" at a glance.

The getDarkstarMetrics() method replaces the old getRMetrics() and returns structured package version groups alongside runtime diagnostics (memory usage, JVM status, JDBC connectivity). On the R side, darkstar/api/health.R bumped to version 0.3.0 and now enumerates 20 OHDSI HADES packages and 12 Posit/CRAN infrastructure packages using utils::packageVersion() with per-package error handling — so a missing package surfaces cleanly rather than crashing the health endpoint.
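The per-package error handling in health.R is a pattern worth copying in any health endpoint. Look each package up independently, and record a failure instead of letting one missing package crash the whole response. The R code isn't shown in the post, so here is the same idea as a Python analogue. It relies on importlib.metadata.version, which raises PackageNotFoundError for a missing distribution. The package names passed in would be placeholders, not Darkstar's list.

```python
from importlib.metadata import PackageNotFoundError, version

def package_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # surface the gap instead of failing the whole check
    return versions
```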

Frontend: DarkstarPackagesPanel

The ServiceDetailPage.tsx component gained a new DarkstarPackagesPanel that renders both package groups as 4-column grids showing package name and installed version. The panel is intentionally excluded from the generic nested metrics renderer to avoid double-rendering, while flat metrics (R version, uptime, memory, JVM/JDBC) continue to display in the standard Metrics section.

For anyone debugging environment drift between deployments — "why is CohortMethod 5.2.1 on prod but 5.3.0 on staging?" — having this surfaced directly in the admin UI is a meaningful operational improvement.

OHDSI HADES packages tracked: SqlRender, DatabaseConnector, Andromeda, Cyclops, FeatureExtraction, ResultModelManager, EmpiricalCalibration, ParallelLogger, CohortMethod, PatientLevelPrediction, SelfControlledCaseSeries, EvidenceSynthesis, CohortGenerator, CohortDiagnostics, DeepPatientLevelPrediction, CohortIncidence, Characterization, Strategus, and more.
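Once each deployment exposes its package versions, spotting the kind of CohortMethod drift described above is a simple dictionary diff. Here is a minimal sketch. It assumes both sides have already been fetched as {package: version} maps from their health endpoints; the fetch itself is omitted.

```python
def version_drift(baseline, current):
    """Compare two {package: version} maps and report every difference."""
    drift = {}
    for name in sorted(set(baseline) | set(current)):
        before, after = baseline.get(name), current.get(name)
        if before != after:
            drift[name] = (before, after)  # None means missing on that side
    return drift
```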

What's Next

The publish workflow is close to a functional end-to-end demo — the remaining gaps are around export (PDF/DOCX rendering) and integrating narrative generation with live analysis results rather than mocked data. That's the next frontier.

On the Darkstar side, the package version display is a foundation for something more useful: version pinning, environment validation, and potentially automated alerts when package versions drift from a known-good baseline. The data is now there; the tooling around it can follow.

It was a good day to be building outcomes research infrastructure.

The Arrival of Ares to Parthenon

Wed, 25 Mar 2026 00:00:00 GMT

If you've worked in the OHDSI ecosystem, you know the pain: Atlas for cohort definitions, Achilles Results Viewer for characterization, a DQD dashboard for data quality, spreadsheets for feasibility assessments, and a prayer that everyone's looking at the same release of the same data. Ares changes that. Today we're announcing Ares v2 — Parthenon's network-level data observatory — a single unified module that replaces the fragmented constellation of OHDSI data characterization tools with 10 purpose-built analytical panels, 60+ API endpoints, and a clinical UI designed for researchers who need answers, not workarounds.

This is the biggest feature release in Parthenon's history.

What Ares Replaces

To appreciate what Ares does, consider what a typical OHDSI site coordinator juggles today:

Atlas + WebAPI for browsing data source reports and Achilles results
Achilles Results Viewer (an R Shiny app) for characterization dashboards
DQD Dashboard (another Shiny app, or raw CSVs) for data quality trending
Custom R scripts for cross-source comparison of concept prevalence
Spreadsheets for tracking which sources have which domains, when they were last refreshed, and whether they're suitable for a given study
Email threads for annotating data events and coordinating between data stewards and researchers
No tooling at all for cost analytics, diversity assessments, or FDA Diversity Action Plan compliance

Each tool has its own authentication, its own data model, its own release cycle, and its own way of defining "source." Ares collapses all of this into a single tab within Parthenon's Data Explorer, backed by the same PostgreSQL database, the same RBAC system, and the same API infrastructure that powers every other module.

The 10 Panels

Ares is organized as a hub with 10 analytical panels, each addressing a distinct research operations question. Here's what we built and why.

1. Network Overview — Situational Awareness in 5 Seconds

The first thing a data coordinator needs every morning is a status board. Network Overview provides exactly that: one row per data source, with DQ trend sparklines, freshness indicators (color-coded with STALE badges for sources >30 days without a refresh), domain coverage rings, and person counts. An auto-generated alert banner surfaces the three most common operational emergencies — DQ score drops >5%, stale data, and unmapped code spikes — before you even start looking.
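The alert rules above are simple enough to sketch directly. The 30-day freshness window and the 5% DQ-drop threshold come from the post. Several details below are assumptions: the DQ drop is treated as percentage points, an "unmapped spike" means unmapped records more than doubling since the previous release, and the SourceSnapshot fields are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class SourceSnapshot:
    name: str
    last_refresh: date
    dq_score: float            # current pass rate, 0-100
    prev_dq_score: float       # pass rate at the previous release
    unmapped_records: int
    prev_unmapped_records: int

def network_alerts(sources, today, stale_days=30, dq_drop=5.0, spike_factor=2.0):
    """Return human-readable alerts for stale data, DQ drops, and unmapped spikes."""
    alerts = []
    for s in sources:
        age = (today - s.last_refresh).days
        if age > stale_days:
            alerts.append(f"{s.name}: STALE ({age} days since refresh)")
        if s.prev_dq_score - s.dq_score > dq_drop:
            alerts.append(f"{s.name}: DQ score dropped "
                          f"{s.prev_dq_score - s.dq_score:.1f} points")
        if s.unmapped_records > spike_factor * max(s.prev_unmapped_records, 1):
            alerts.append(f"{s.name}: unmapped code spike")
    return alerts
```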

The DQ Radar toggle overlays Kahn framework dimensions (completeness, conformance value, conformance relational, plausibility atemporal, plausibility temporal) as a radar chart per source. Comparing radar "shapes" across sources immediately reveals dimensional weaknesses that aggregate scores hide. A source with 95% overall DQ but 40% plausibility temporal has a very different problem than one with 85% across all dimensions evenly.

2. Concept Comparison — The Question Every Network Study Starts With

"How prevalent is Type 2 Diabetes across our network?" is the single most common question in OHDSI network research. Concept Comparison answers it with four view modes:

Single Concept: Bar chart showing rate per 1,000 persons across all sources, with confidence interval error bars
Multi-Concept: Grouped bar chart comparing 2-5 concepts side-by-side
Attrition Funnel: TriNetX-style horizontal funnel showing patient attrition as criteria are layered
Temporal: Line chart tracking prevalence across releases over time

The killer feature here is the Crude / Age-Sex Adjusted toggle. Comparing a pediatric hospital's diabetes rate against a Medicare claims database using crude rates is meaningless — the demographics are completely different. When you toggle to age-sex standardized rates (using the US Census 2020 reference population), the comparisons become valid. A footnote documents the standardization method for reproducibility.
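Age-sex standardization of this kind is usually direct standardization. First compute the rate in each age-sex stratum of a source. Then average those stratum rates, weighting each by that stratum's share of the reference population. The sketch below shows the arithmetic only. It does not embed actual US Census 2020 figures, and skipping empty strata with renormalized weights is a simplifying assumption of ours. A production version would also need confidence intervals.

```python
def direct_standardized_rate(cases, persons, reference, per=1000):
    """Directly standardized rate per `per` persons.

    cases, persons: {stratum: count} observed in one data source.
    reference: {stratum: population} for the standard population.
    Strata with no persons in the source are skipped and the remaining
    reference weights are renormalized (a simplifying assumption).
    """
    usable = [s for s in reference if persons.get(s, 0) > 0]
    total_ref = sum(reference[s] for s in usable)
    rate = sum(
        (cases.get(s, 0) / persons[s]) * (reference[s] / total_ref) for s in usable
    )
    return rate * per
```

Two sources with the same stratum-specific rates get the same standardized rate even if one skews young and the other old. That property is what makes cross-source comparisons meaningful when the age-sex mixes differ.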

We also added CDC Benchmark Lines — when national prevalence data is available, a dashed reference line shows where each source sits relative to the expected rate. And you can compare entire Concept Sets, not just individual concepts — "all T2DM medications" across the network in one chart.

3. DQ History — Quality is a Trajectory, Not a Snapshot

A DQ score at a single point in time tells you almost nothing. Was it always this bad? Did it get worse after the last ETL? Did someone fix the completeness issues from Q3?

DQ History tracks quality over time with four tabs:

Trends: Line chart of overall DQ pass rate per release, with background zones (green >90%, amber 80-90%, red <80%). Click any release point to open a delta table showing every check that changed status.
Heatmap: Category-by-release grid, color-coded by pass rate. Instantly spot which quality categories are degrading over time.
Cross-Source: Overlay DQ trend lines from multiple sources on one chart for direct comparison.
SLA: Admin-only view where data stewards set minimum pass rate targets per DQ category. Compliance bars show actual vs. target with error budget remaining — like an SRE error budget, but for data quality.

Each DQ check also gets its own 6-point sparkline showing its individual pass/fail history. Annotations from team members appear as markers on the trend chart, providing institutional context for data events.
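The SLA tab's error-budget framing translates directly. With a target pass rate of 95%, the budget is the 5% of checks allowed to fail. The remaining budget is whatever part of that 5% the current failures haven't used. Below is a minimal sketch of that arithmetic. The function is illustrative, since the post doesn't give Ares's exact formula.

```python
def error_budget_remaining(target_pass_rate, actual_pass_rate):
    """Fraction of the failure budget still unused (negative means the SLA is breached).

    Rates are percentages, e.g. a target of 95.0 allows 5.0 points of failures.
    """
    budget = 100.0 - target_pass_rate
    if budget <= 0:
        raise ValueError("a 100% target leaves no error budget")
    consumed = 100.0 - actual_pass_rate
    return (budget - consumed) / budget
```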

4. Coverage Matrix — What Data Do You Actually Have?

The coverage matrix is a domain-by-source grid that answers the most fundamental question in study design: does this source have the data I need?

Three view modes (record counts, per-person density, and temporal date ranges) give different perspectives. The Expected vs. Actual toggle is particularly powerful — it compares what domains a source type (claims vs. EHR vs. registry) should have against what's actually present, flagging gaps as MISS and unexpected domains as BONUS.
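The Expected vs. Actual comparison is a set difference taken in both directions. This sketch uses a hypothetical expectations table keyed by source type. The domain lists are illustrative, not Ares's configuration.

```python
# Illustrative expectations only; real profiles would be configurable.
EXPECTED_DOMAINS = {
    "claims": {"condition_occurrence", "drug_exposure", "procedure_occurrence",
               "visit_occurrence", "observation_period"},
    "ehr": {"condition_occurrence", "drug_exposure", "procedure_occurrence",
            "visit_occurrence", "measurement", "observation_period"},
}

def coverage_flags(source_type, present_domains):
    """Flag expected-but-absent domains as MISS and unexpected ones as BONUS."""
    expected = EXPECTED_DOMAINS[source_type]
    present = set(present_domains)
    flags = {domain: "MISS" for domain in expected - present}
    flags.update({domain: "BONUS" for domain in present - expected})
    return flags
```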

The observation_period column gets a gold accent border because it's the single most important domain for study design — everything downstream depends on it.

5. Feasibility — Can Your Network Support This Study?

Feasibility assessment is where Ares goes from descriptive to prescriptive. Define your study criteria — required domains, concepts, visit types, date ranges, minimum patient count — and Ares evaluates every source against them.

Results include per-criterion scores with weighted composite scoring (domain 20%, concept 30%, visit 15%, date 15%, patient 20%) and a clear ELIGIBLE/INELIGIBLE verdict. But the real value is in the Impact Analysis waterfall chart, which shows which single criterion eliminates the most sources. When you need to relax a constraint to reach your enrollment target, this tells you which constraint to relax.
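The composite score is a weighted sum of the per-criterion scores, using the weights listed above. The sketch below also shows one way to compute the Impact Analysis ranking: count, for each criterion, how many sources fail it. We assume each criterion yields both a 0-1 score and a pass/fail flag. The function names are illustrative.

```python
WEIGHTS = {"domain": 0.20, "concept": 0.30, "visit": 0.15, "date": 0.15, "patient": 0.20}

def composite_score(scores):
    """Weighted composite of per-criterion scores, each in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def impact_analysis(results):
    """Rank criteria by how many sources they eliminate.

    results: {source: {criterion: passed_bool}}.
    Returns [(criterion, sources_failed)], most eliminating first.
    """
    failures = {name: 0 for name in WEIGHTS}
    for checks in results.values():
        for name, passed in checks.items():
            if not passed:
                failures[name] += 1
    return sorted(failures.items(), key=lambda item: (-item[1], item[0]))
```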

The CONSORT Flow diagram visualizes progressive source exclusion through each criterion gate — the same format used in clinical trial publications, now applied to site selection.

And for sources that pass feasibility, the Patient Arrival Forecast projects monthly patient accrual with confidence intervals, showing when you'll reach your target enrollment. It's the difference between "this source is eligible" and "this source will get you 500 patients by September."

Criteria sets can be saved as templates and shared across the research team — define your study's requirements once, reuse them as the network evolves.

6. Diversity — FDA Diversity Action Plans Built In

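The FDA's 2024 draft guidance on Diversity Action Plans raised the bar for clinical trial enrollment. Sponsors of many late-stage studies are expected to set enrollment goals by race, ethnicity, sex, and age group, and to show how they will meet them. That puts pressure on sites and networks to demonstrate, quantitatively, that their data sources represent diverse populations. Ares provides this out of the box.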

The Overview tab shows Simpson's Diversity Index per source (0-1 scale, higher = more diverse), with gender/race/ethnicity breakdowns and benchmark overlay lines. The DAP Gap tab lets you set enrollment targets by demographic dimension and see which sources meet or miss them in a red/green matrix.
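A 0-1 Simpson's index where higher means more diverse is commonly the Gini-Simpson form: 1 minus the sum of squared category proportions. It equals the probability that two people drawn at random (with replacement) fall into different categories. We're assuming that's the variant Ares reports. Here is a minimal sketch.

```python
def simpson_diversity(counts):
    """Gini-Simpson index: 1 - sum(p_i^2) over category proportions."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

A source drawn entirely from one group scores 0.0. Two equal groups score 0.5, and four equal groups score 0.75.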

The Geographic tab goes deeper: state-level distribution bars, number of states covered, and — critically — an Area Deprivation Index (ADI) histogram showing socioeconomic representation. A network that covers 30 states but only draws from affluent ZIP codes isn't truly diverse. The ADI data quantifies this.

The Pooled view lets you select multiple sources and see combined demographics across the pooled population, which is essential for multi-site study planning.

7. Releases — Version Control for Data

Every ETL run produces a new release of a data source. Ares tracks these with per-source release cards showing CDM version, vocabulary version, ETL version, and notes. Each card has an expandable diff panel showing what changed: person count deltas, record count deltas, DQ score changes, vocabulary version changes, and domain-level deltas.

The Swimlane timeline puts all sources on one horizontal axis with release dots positioned by date — immediately revealing which sources are updated regularly and which are falling behind. The Calendar view (GitHub contributions-style heatmap) shows release density by day across the network.

ETL provenance metadata — who ran it, what code version, how long it took — is captured when available, providing an audit trail for regulatory and reproducibility purposes.

8. Unmapped Codes — AI-Assisted Vocabulary Remediation

Unmapped source codes are one of the biggest data quality problems in OMOP CDM implementations. Ares prioritizes them using an impact score (record count multiplied by domain weight: condition codes weighted 1.0, drug 0.9, procedure 0.8) so you focus mapping effort where it matters most.

The Pareto chart makes the 80/20 pattern visible: unmapped records are heavily concentrated, and the top 20 unmapped codes typically account for 80%+ of them. The Treemap view shows unmapped codes grouped by vocabulary, revealing whether the problem is concentrated in a single vocabulary or spread across many.
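Both the impact-score prioritization and the Pareto share can be computed from a list of unmapped codes with record counts. The condition, drug, and procedure weights come from the post. The 0.5 default for other domains is a placeholder of ours, and the tuple layout is assumed.

```python
DOMAIN_WEIGHT = {"Condition": 1.0, "Drug": 0.9, "Procedure": 0.8}
DEFAULT_WEIGHT = 0.5  # placeholder for domains the post doesn't list

def prioritize(unmapped):
    """Sort (code, domain, record_count) tuples by impact = records x domain weight."""
    return sorted(
        unmapped,
        key=lambda row: row[2] * DOMAIN_WEIGHT.get(row[1], DEFAULT_WEIGHT),
        reverse=True,
    )

def top_n_share(unmapped, n=20):
    """Share of all unmapped records covered by the n codes with the most records."""
    counts = sorted((row[2] for row in unmapped), reverse=True)
    total = sum(counts)
    return sum(counts[:n]) / total if total else 0.0
```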

The standout feature is AI Mapping Suggestions: expand any unmapped code row to see the top 5 standard concept suggestions ranked by confidence (0-100%), powered by pgvector concept embedding similarity. Click Accept to stage a mapping — it doesn't write to the CDM directly; an admin must promote approved mappings. This is the same AI mapping infrastructure that powers Parthenon's Aqueduct ETL module, now integrated directly into the data quality workflow.
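A pgvector similarity search amounts to ranking standard concepts by how close their embeddings are to the source code's embedding. In SQL, pgvector's <=> operator returns cosine distance, so ordering by it and limiting to 5 performs this ranking inside PostgreSQL. The pure-Python sketch below shows the same ranking with cosine similarity. It maps similarity onto a 0-100 confidence by clamping negative values to zero, which is our own simplification. The post doesn't say how Ares converts similarity into confidence, and real embeddings would come from a model rather than toy vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_suggestions(source_vec, concept_vecs, k=5):
    """Return up to k (concept_id, confidence %) pairs, most similar first."""
    scored = [(cid, cosine(source_vec, vec)) for cid, vec in concept_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(cid, round(max(sim, 0.0) * 100, 1)) for cid, sim in scored[:k]]
```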

Export in Usagi-compatible CSV format means teams using OHDSI's standard mapping tool can seamlessly integrate Ares's prioritized list into their existing workflows.

9. Annotations — Institutional Memory for Data Events

Data events happen constantly: ETL runs complete, schema changes deploy, quality scores drop, researchers discover unexpected patterns. Without a structured way to capture this context, institutional knowledge lives in email threads and Slack messages that nobody can find six months later.

Ares Annotations provides a structured note system with four tag types:

Data Event (teal) — something happened in the data
Research Note (gold) — researcher observation or insight
Action Item (crimson) — something that needs to be done
System (indigo) — auto-generated by the platform

Annotations support threaded discussions (one level of nesting) for data steward-to-researcher conversations, and can be created directly from chart interactions — click a data point on a DQ trend chart and add context without leaving the visualization.

10. Cost Analysis — Healthcare Economics at Network Scale

Cost data in OMOP CDM is notoriously tricky. The cost table contains multiple cost types (charged, paid, allowed) that can differ by 3-10x, and mixing them in the same analysis is one of the most common errors in cost studies. Ares addresses this head-on with a cost type filter that applies globally across all cost views, with an amber warning banner when multiple types exist.

Six tabs cover the full cost analytics workflow: summary cards with Per Patient Per Year (PPPY) metrics, box-and-whisker distributions per domain (revealing the skewness that averages hide), care setting breakdowns, cross-source comparisons, top cost driver concepts, and monthly trends.
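The two safeguards above can be expressed in a few lines of code. Filter to a single cost type before aggregating, warn when the raw data mixes types, and normalize by observed person-time rather than headcount to get PPPY. The sketch assumes cost rows have already been joined to person-level observation time. The CostRow fields are illustrative, not OMOP column names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CostRow:
    person_id: int
    cost_type: str   # e.g. "charged", "paid", "allowed"
    amount: float

def pppy(rows, person_years, cost_type):
    """Per Patient Per Year cost for one cost type, plus a mixed-type warning.

    rows: iterable of CostRow.
    person_years: {person_id: observed years} for the population of interest.
    """
    rows = list(rows)
    types_present = {r.cost_type for r in rows}
    warning = (f"multiple cost types present: {sorted(types_present)}"
               if len(types_present) > 1 else None)
    total = sum(r.amount for r in rows
                if r.cost_type == cost_type and r.person_id in person_years)
    years = sum(person_years.values())
    return (total / years if years else 0.0), warning
```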

60+ API Endpoints, One Authentication Layer

Every panel is backed by a RESTful API under /api/v1/, split into network-scoped endpoints (cross-source analytics) and source-scoped endpoints (per-source detail). All endpoints require Sanctum authentication and RBAC permission checks — no public access to clinical data characterization.

Network-scoped endpoints include:

GET  /network/ares/overview              — Network health KPIs
GET  /network/ares/alerts                — Auto-generated alerts
GET  /network/ares/compare               — Single concept prevalence
GET  /network/ares/compare/standardized  — Age-sex adjusted rates
GET  /network/ares/coverage              — Domain x source matrix
GET  /network/ares/diversity             — Demographics + Simpson's index
GET  /network/ares/diversity/geographic  — State distribution + ADI
POST /network/ares/diversity/dap-check   — FDA DAP gap analysis
POST /network/ares/feasibility           — Run assessment
GET  /network/ares/cost/compare          — Cross-source cost comparison

Source-scoped endpoints cover DQ history, unmapped codes with AI suggestions, cost analytics, release management, annotations, and more: over 30 endpoints, each operating on a single source.

Computationally expensive operations, such as age-sex standardization, concept set comparisons, and patient arrival forecasts, sit behind rate-limited (throttled) endpoints.

Role-Based Access

Ares respects Parthenon's RBAC hierarchy:

Capability	Viewer	Researcher	Data Steward	Admin
View all panels	Yes	Yes	Yes	Yes
Run feasibility assessments	-	Yes	Yes	Yes
Create annotations	-	Yes	Yes	Yes
Accept AI mapping suggestions	-	-	Yes	Yes
Set DQ SLA targets	-	-	Yes	Yes
Promote mappings to CDM	-	-	-	Yes
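The table is strictly hierarchical, so each capability can be expressed as a minimum role. Here is a sketch of that check. The capability keys are hypothetical, and their values are transcribed from the table.

```python
# Each role has every capability of the roles listed before it in the table.
ROLE_RANK = {"viewer": 0, "researcher": 1, "data_steward": 2, "admin": 3}

# Minimum role per capability (hypothetical keys, values from the table).
MIN_ROLE = {
    "view_panels": "viewer",
    "run_feasibility": "researcher",
    "create_annotations": "researcher",
    "accept_ai_mappings": "data_steward",
    "set_dq_sla": "data_steward",
    "promote_mappings": "admin",
}

DEFAULT_ROLE = "viewer"  # new users start with least privilege

def can(role: str, capability: str) -> bool:
    """True if `role` is at or above the minimum role required for `capability`."""
    return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[capability]]
```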

New users get the Viewer role by default: they can see everything but can't modify anything. This follows Parthenon's principle of least privilege.

Why "Ares"?

In Greek mythology, Ares is the god of war — but also of courage, strategy, and the willingness to confront hard truths. In OHDSI, data characterization is exactly that: confronting the hard truths about your data before you bet a clinical study on it. A network overview that hides quality problems isn't helping anyone. A feasibility assessment that ignores demographic bias produces misleading results. Ares doesn't sugarcoat — it shows you the DQ radar with its lopsided dimensions, the unmapped codes with their Pareto distribution, the diversity gaps with their ADI histograms.

The name also fits architecturally. In the Parthenon — both the building and the platform — Ares stands alongside Athena (wisdom, represented by our Abby AI assistant), Apollo (prediction, represented by the analytics engine), and Asclepius (healing, represented by the clinical data model). Each deity governs a domain. Ares governs the hard operational truths that make everything else possible.

What This Means for the OHDSI Community

Ares v2 in Parthenon represents something that hasn't existed in the OHDSI ecosystem before: a unified, multi-source data observatory with modern web UI, AI-assisted mapping, standardized rate comparisons, feasibility assessment with arrival forecasting, FDA DAP compliance checking, cost analytics, and institutional annotation — all in one authenticated application with role-based access control.

The individual capabilities aren't new to the community. Achilles has characterized data for years. DQD has tracked quality. Atlas has browsed results. What's new is having all of it in one place, backed by a single API, with cross-source analytics that work at network scale rather than one-source-at-a-time.

For network study coordinators: you no longer need five tools and three spreadsheets to answer "which sites should participate in this study."

For data stewards: you can track quality trajectories, set SLA targets, and monitor unmapped code remediation in the same interface where researchers browse characterization results.

For researchers: feasibility assessment with patient arrival forecasting means you can make quantitative enrollment projections, not just "this source has enough patients."

For compliance teams: FDA Diversity Action Plan gap analysis is built in, with geographic and socioeconomic diversity metrics that go beyond simple demographic breakdowns.

What's Next

Ares v2 ships with the full 10-panel suite, but there's more on the roadmap:

Automated quality alerting — Push notifications (email, Slack) when DQ scores drop below SLA targets or sources go stale
Federated Ares — Cross-institution characterization without moving data, leveraging Parthenon's federated study framework
Longitudinal concept tracking — Automated detection of concept prevalence anomalies (sudden spikes or drops that may indicate coding practice changes or ETL errors)
Cost modeling — Predictive cost modeling for study budgeting based on historical cost distributions and enrollment projections

Ares is live now at parthenon.acumenus.net under Data Explorer > Ares. Log in, click the tab, and see your network's data like you've never seen it before.

Ares v2 was developed as part of Parthenon's mission to replace the fragmented OHDSI tool ecosystem with a single, unified platform for outcomes research. For questions, feedback, or feature requests, reach out to the Acumenus team.

Achilles Reliability Hardening: A Big Day for OHDSI Analytics

Wed, 25 Mar 2026 00:00:00 GMT

Today was one of those satisfying days where two major workstreams converged: we pushed the Ares data quality module from skeleton to a fully featured analytics suite with four distinct intelligence phases, and we permanently fixed a cluster of compounding bugs that had been making Achilles characterization runs fragile on large real-world datasets. Both efforts move Parthenon meaningfully closer to being a production-grade OHDSI research platform.

Ares Parity+ Milestone: From Stub to Suite

The headline work today was the completion of the Ares Parity+ milestone — a multi-phase build that brings Ares data quality analytics into Parthenon as a first-class citizen. The full design spec and devlog were committed alongside the code (docs(ares): add devlog and design specs), so future contributors have a clear paper trail for every architectural decision.

Backend Foundation

The backend work started with AresController, which wires up the release and annotation API routes, and the ares:backfill-releases Artisan command, which migrates legacy release data into the new schema. Together, these two pieces mean Parthenon can ingest historical Ares output and track new releases going forward without any manual data surgery.

Frontend Shell: Hub Dashboard, Releases & Annotations

The first frontend phase (feat: add Ares tab frontend) established the hub dashboard, releases list, and annotations views. This is the scaffolding everything else hangs off — a consistent navigation frame and data-loading pattern that the subsequent phase components slot into cleanly.

Phase 2 — Quality Intelligence

Phase 2 (feat(ares): implement Phase 2 Quality Intelligence) delivers the analytical meat that Ares users expect: DQ history trending, unmapped source codes exploration, and domain continuity checks. These views surface the data quality signals that are otherwise buried in raw Ares JSON exports, making them actionable directly inside Parthenon rather than requiring a separate Ares UI instance.

Phase 3 — Network Intelligence

Phase 3 (feat(ares): implement Phase 3 Network Intelligence) adds the collaborative research layer: site comparison, population coverage metrics, demographic diversity analysis, and feasibility assessment. This is particularly valuable for multi-site OHDSI network studies where understanding which sites have sufficient data for a given research question is half the battle.

Phase 4 — Cost Analysis

Phase 4 (feat(ares): implement Phase 4 Cost Analysis) rounds out the milestone with CostService, dedicated cost endpoints, and CostView. The hub skeletons are in place for further expansion. Healthcare cost data is notoriously messy in CDM mappings, so having a dedicated analysis surface for it — rather than treating it as just another domain — reflects how researchers actually use it.

CI Cleanup

The Ares build also came with a round of CI fixes: a recharts Tooltip formatter cast to any for strict TypeScript compatibility, PHPStan and TypeScript error resolution across Ares components, and a Pint auto-fix pass that also removed a stale AchillesRunSummary import and corrected react-joyride export references. These aren't glamorous commits, but a green CI pipeline is what lets us ship confidently.

Achilles Engine Reliability Hardening

Separately, the devlog for Phase 14 (Achilles Engine Reliability Hardening) documents the root cause analysis and fixes for a set of compounding bugs that had made every characterization run on the SynPUF dataset (source 47, with a measurement table of 100M+ rows) fragile. Smaller datasets like Eunomia never surfaced these issues, which is exactly why production-scale testing matters.

Four bugs were identified and fixed:

Bug 1 (the killer): Non-resumable retries. RunAchillesJob used AchillesRun::create(), which hit a unique constraint on retry after a timeout. Replaced with AchillesRun::updateOrCreate() — the job is now fully idempotent across retry attempts.
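Why does the switch fix the retry problem? At the database level, the pattern behaves like an upsert. The first attempt inserts a row, and every retry updates that same row instead of tripping the unique constraint. Here is a sketch using SQLite. The table, its columns, and the `attempts` counter are hypothetical, and Eloquent's `updateOrCreate()` reaches the same result by its own internal steps rather than this exact SQL.

```python
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per (source_id, run_key), enforced by a unique constraint.
    conn.execute(
        "CREATE TABLE achilles_runs ("
        " source_id INTEGER NOT NULL,"
        " run_key TEXT NOT NULL,"
        " attempts INTEGER NOT NULL DEFAULT 1,"
        " UNIQUE (source_id, run_key))"
    )
    return conn

def create_run(conn, source_id, run_key):
    """Non-idempotent: a retry with the same key violates the unique constraint."""
    conn.execute(
        "INSERT INTO achilles_runs (source_id, run_key) VALUES (?, ?)",
        (source_id, run_key),
    )

def update_or_create_run(conn, source_id, run_key):
    """Idempotent: insert the first time, update the existing row on every retry."""
    conn.execute(
        "INSERT INTO achilles_runs (source_id, run_key) VALUES (?, ?) "
        "ON CONFLICT (source_id, run_key) DO UPDATE SET attempts = attempts + 1",
        (source_id, run_key),
    )
```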

Bug 2: Timeout too short. The 1-hour timeout ($timeout = 3600) was simply not enough — analysis 1811 alone (measurement records by concept by year-month) takes ~116 minutes on SynPUF. Bumped to 3 hours ($timeout = 10800), with $tries increased to 3 and a 30-second backoff.

Bug 3: No analysis-level resume. A run that completed 111 of 127 analyses and then died would restart from analysis 1 on retry, throwing away up to 175 minutes of completed work. The fix adds resume capability to AchillesEngineService so restarts pick up where they left off.
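Analysis-level resume boils down to two steps: persist which analyses have finished, then skip those analyses on the next attempt. Here is a minimal sketch. It assumes completion is recorded only after an analysis succeeds. The function and its arguments are hypothetical and do not represent AchillesEngineService's API.

```python
def run_analyses(analysis_ids, completed, execute):
    """Run every analysis not already in `completed`, recording each success.

    `completed` stands in for persisted progress (for example rows in a results
    table), so a retried job sees what the previous attempt finished.
    `execute` runs one analysis and raises if it fails.
    """
    for analysis_id in analysis_ids:
        if analysis_id in completed:
            continue  # finished by an earlier attempt; skip it
        execute(analysis_id)
        completed.add(analysis_id)  # mark done only after success
    return completed
```

In the run described above, a retry after 111 completed analyses would execute only the 16 that had not yet finished.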

Bug 4: Zombie "running" status. Without a failed() method on RunAchillesJob, any failed run stayed in status=running indefinitely. The UI showed perpetually active jobs with no recovery path. The new failed() handler marks runs as failed with a timestamp, restoring operator visibility.

Worth noting: status remains excluded from $fillable per HIGHSEC 3.1 — all status transitions go through explicit update() calls, not mass assignment. The reliability improvements don't compromise that security invariant.
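The two rules in this section can be sketched together: a failure hook that never leaves a run in `running`, and status changes that bypass mass assignment. This is a Python analogue with assumed names (`FILLABLE`, `ALLOWED`, `on_job_failed`) and an assumed state machine. In Laravel, the real pieces are `$fillable`, explicit `update()` calls, and the job's `failed()` method.

```python
from datetime import datetime, timezone

FILLABLE = {"source_id", "run_key"}  # 'status' is deliberately not mass-assignable

# Allowed status transitions (assumed state machine; 'failed' is terminal here).
ALLOWED = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}

class Run:
    def __init__(self, **attrs):
        rejected = set(attrs) - FILLABLE
        if rejected:
            raise ValueError(f"not mass-assignable: {sorted(rejected)}")
        for key, value in attrs.items():
            setattr(self, key, value)
        self.status = "pending"
        self.failed_at = None

    def transition(self, new_status):
        """Explicit, validated status change (the analogue of an explicit update() call)."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

def on_job_failed(run):
    """Analogue of a job's failed() hook: a dead run must not stay 'running'."""
    if run.status == "running":
        run.transition("failed")
        run.failed_at = datetime.now(timezone.utc)
```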

What's Next

With the Ares Parity+ milestone shipped, the immediate priority is integration testing across all four phases against a real Ares output directory — particularly the network comparison views, which depend on multi-source data being present. We'll also be looking at paginating the cost endpoint responses as cost data can be voluminous.

On the Achilles side, the next step is validating the resume logic under controlled timeout conditions in a staging environment before we consider SynPUF source 47 fully unblocked. Once that's confirmed stable, we can look at parallelizing the slower analyses (1811 in particular) to bring total characterization time down to something more reasonable for routine use.

It was a dense day, but the platform is measurably more capable and more reliable for it.

Full HADES Parity: Parthenon Now Supports All 12 OHDSI Database Dialects

Wed, 25 Mar 2026 00:00:00 GMT

One of OHDSI's greatest strengths is database agnosticism. The HADES ecosystem — via SqlRender and DatabaseConnector — lets researchers write analyses once and run them against SQL Server, PostgreSQL, Oracle, Snowflake, BigQuery, and seven other platforms without modification. Today, Parthenon achieved full parity with that capability: all 12 HADES-supported database dialects are now covered across both the PHP SQL translator and the R runtime.

Why This Matters

OMOP CDM databases live everywhere. Academic medical centers often run Oracle or SQL Server. Cloud-native organizations are increasingly moving to Snowflake or BigQuery. Federated networks span multiple database platforms simultaneously. If you're building a platform that replaces Atlas and WebAPI, you can't afford to be PostgreSQL-only in your SQL rendering — even if your internal database is PostgreSQL.

Parthenon has always used PostgreSQL as its production database, but the SQL translation layer is critical for two capabilities:

Query Library rendering — OHDSI's standard SQL templates are written in T-SQL (SQL Server syntax). When a researcher executes a query from the library, it gets translated to the target source's dialect at render time.
Federated analysis — Each Source in Parthenon can point to a different database with its own dialect. A study might pull cohorts from a local PostgreSQL CDM, run against a collaborator's Snowflake warehouse, and compare with results from an Oracle-backed registry. The HadesBridgeService handles the connection abstraction; the SQL translator handles the syntax.

The 12 Dialects

OHDSI's SqlRender package (the canonical R/Java SQL translation layer) supports these 12 database platforms:

#	Dialect	SQL Family	Typical Deployment
1	SQL Server	T-SQL (canonical source)	Enterprise on-prem, Azure SQL
2	PostgreSQL	ANSI SQL	Academic, cloud, Parthenon internal
3	Oracle	PL/SQL	Large health systems, pharma
4	Redshift	PostgreSQL variant	AWS data warehouses
5	Snowflake	ANSI SQL variant	Cloud analytics
6	BigQuery	GoogleSQL	Google Cloud OMOP deployments
7	Azure Synapse	T-SQL variant	Microsoft cloud OLAP
8	Spark / Databricks	SparkSQL	Big data / lakehouse
9	Apache Hive	HiveQL	Hadoop ecosystems
10	Apache Impala	Impala SQL	Hadoop real-time queries
11	IBM Netezza	PostgreSQL variant	Enterprise data warehouses
12	DuckDB	PostgreSQL variant	Embedded analytics, local dev

Parthenon's Two Translation Layers

Parthenon translates OHDSI SQL in two places, each serving a different part of the stack:

PHP: `OhdsiSqlTranslator`

The PHP translator (backend/app/Services/SqlRenderer/OhdsiSqlTranslator.php) handles server-side SQL rendering for the Query Library, Achilles analysis templates, and any custom SQL that needs to target a non-PostgreSQL source. It converts T-SQL constructs — DATEADD, DATEDIFF, GETDATE(), CHARINDEX, LEN, ISNULL, COUNT_BIG, CONVERT, TOP N, DATEFROMPARTS — into dialect-appropriate equivalents.

The translation groups dialects by SQL family:

PostgreSQL family (PostgreSQL, Redshift, Netezza, DuckDB): INTERVAL arithmetic, EXTRACT, POSITION, COALESCE, LIMIT
Oracle: ADD_MONTHS, MONTHS_BETWEEN, TRUNC(SYSDATE), FETCH FIRST N ROWS ONLY
BigQuery: DATE_ADD with INTERVAL syntax, DATE_DIFF, CURRENT_DATE()
Snowflake: native DATEADD/DATEDIFF, which share T-SQL's function names and argument order
Spark family (Spark, Hive, Impala): DATE_ADD with interval syntax
T-SQL family (SQL Server, Synapse): pass-through (canonical format)
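To show what the PostgreSQL-family rules look like, here is a deliberately small regex-based sketch covering a few of the constructs named above. It does not reproduce OhdsiSqlTranslator's logic. It ignores nesting and string literals, and it skips TOP N, which requires appending a LIMIT clause at the end of the query. A real translator has to handle all of these cases.

```python
import re

# Hypothetical rule set: (pattern, replacement) pairs applied in order.
POSTGRES_RULES = [
    (re.compile(r"\bGETDATE\(\)", re.IGNORECASE), "CURRENT_DATE"),
    (re.compile(r"\bISNULL\(", re.IGNORECASE), "COALESCE("),
    (re.compile(r"\bLEN\(", re.IGNORECASE), "LENGTH("),
    (re.compile(r"\bCOUNT_BIG\(", re.IGNORECASE), "COUNT("),
    # DATEADD(day, n, expr) -> (expr + n * INTERVAL '1 day'), simple arguments only
    (
        re.compile(r"\bDATEADD\(\s*day\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)", re.IGNORECASE),
        r"(\2 + \1 * INTERVAL '1 day')",
    ),
]

def tsql_to_postgresql(sql: str) -> str:
    """Apply each rewrite rule in turn to a T-SQL string."""
    for pattern, replacement in POSTGRES_RULES:
        sql = pattern.sub(replacement, sql)
    return sql
```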

R Runtime: `connection.R`

The Darkstar R runtime (r-runtime/R/connection.R) wraps OHDSI's DatabaseConnector package, which handles JDBC connections to all supported platforms. When Parthenon dispatches a HADES analysis (CohortMethod, PatientLevelPrediction, SCCS), the HadesBridgeService translates the Source model into a connection spec that the R runtime uses to create a DatabaseConnector::connectionDetails object. SqlRender handles the SQL translation natively within R.
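The Source-to-connection-spec step amounts to a field mapping plus a dialect-name lookup. In this sketch, the dialect keys, the `Source` fields, and the spec keys are all hypothetical. The `dbms` values are the strings DatabaseConnector expects, for example `"sql server"` with a space.

```python
from dataclasses import dataclass

# Hypothetical Parthenon dialect keys -> DatabaseConnector `dbms` strings.
DBMS_NAMES = {
    "sqlserver": "sql server",
    "postgresql": "postgresql",
    "oracle": "oracle",
    "redshift": "redshift",
    "snowflake": "snowflake",
    "bigquery": "bigquery",
    "synapse": "synapse",
    "spark": "spark",
    "hive": "hive",
    "impala": "impala",
    "netezza": "netezza",
    "duckdb": "duckdb",
}

@dataclass
class Source:  # hypothetical subset of the Source model
    dialect: str
    server: str
    port: int
    user: str
    cdm_schema: str
    results_schema: str

def to_connection_spec(source: Source) -> dict:
    """Translate a Source into the payload the R runtime would turn into connectionDetails."""
    dbms = DBMS_NAMES.get(source.dialect.lower())
    if dbms is None:
        raise ValueError(f"unsupported dialect: {source.dialect}")
    return {
        "dbms": dbms,
        "server": source.server,
        "port": source.port,
        "user": source.user,
        # Credentials are assumed to be resolved inside the runtime, not serialized here.
        "cdm_schema": source.cdm_schema,
        "results_schema": source.results_schema,
    }
```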

Adding DuckDB: A Three-Line Change

The gap we closed today was DuckDB — supported in the R runtime's DatabaseConnector but missing from the PHP translator. The fix was anticlimactic:

// In the match expression:
'duckdb' => $this->toPostgresql($sql),

// In the supported dialects list:
'duckdb',

DuckDB's SQL dialect is effectively PostgreSQL-compatible. It supports EXTRACT, CURRENT_DATE, INTERVAL arithmetic, LIMIT, COALESCE, LENGTH, POSITION, and CAST — all the constructs our PostgreSQL translator already handles. No new translation methods, no edge cases, no special handling.
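Here is the same aliasing idea expressed as a family lookup table in Python. DuckDB maps straight to the PostgreSQL family. An unknown dialect fails loudly rather than passing SQL through untranslated. The family names and the `translators` dictionary are illustrative and do not mirror the PHP class's structure.

```python
# Dialect -> SQL family; DuckDB reuses the PostgreSQL rules unchanged.
FAMILY = {
    "postgresql": "postgresql", "redshift": "postgresql",
    "netezza": "postgresql", "duckdb": "postgresql",
    "oracle": "oracle", "bigquery": "bigquery", "snowflake": "snowflake",
    "spark": "spark", "hive": "spark", "impala": "spark",
    "sqlserver": "tsql", "synapse": "tsql",
}

def translate(sql, dialect, translators):
    """Dispatch to the translator for the dialect's family, rejecting unknown dialects."""
    family = FAMILY.get(dialect.lower())
    if family is None:
        raise ValueError(f"unsupported dialect: {dialect}")
    return translators[family](sql)
```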

This is by design. DuckDB was built as an embeddable analytical database with a familiar SQL interface. For OHDSI use cases — particularly local development, testing, and lightweight CDM exploration — DuckDB is an excellent option: it runs in-process, requires no server, and handles analytical workloads efficiently.

Dialect Coverage Matrix

Here's the final state of dialect coverage across Parthenon's stack:

Dialect	PHP Translator	R Runtime	Source UI	Status
PostgreSQL	Yes	Yes	Yes	Production-tested
SQL Server	Yes	Yes	Yes	Translated, untested at scale
Oracle	Yes	Yes	Yes	Translated, untested at scale
Redshift	Yes	Yes	Yes	Translated, untested at scale
Snowflake	Yes	Yes	Yes	Translated, untested at scale
BigQuery	Yes	Yes	Yes	Translated, untested at scale
Synapse	Yes	Yes	Yes	Pass-through (T-SQL)
Spark	Yes	Yes	Yes	Translated, untested at scale
Hive	Yes	Yes	Yes	Translated, untested at scale
Impala	Yes	Yes	Yes	Translated, untested at scale
Netezza	Yes	Yes	Yes	Translated, untested at scale
DuckDB	Yes	Yes	Yes	New today

What's Next

Full dialect coverage is table stakes for OHDSI platform interoperability, but coverage and correctness are different things. The next steps are:

Integration testing — We need to validate the PHP translator against real CDM queries on at least SQL Server and Oracle, the two most common non-PostgreSQL OMOP deployments in clinical research networks.
Federated study execution — With the connection plumbing in place, the goal is to demonstrate a study that federates across two different database platforms within Parthenon's study execution framework.
DuckDB for local development — DuckDB could replace the PostgreSQL dependency for developers who want to run Parthenon locally without a full database server. A lightweight CDM loader that writes to a DuckDB file would dramatically simplify onboarding.

The OHDSI ecosystem's commitment to database agnosticism is one of its strongest differentiators. Parthenon now fully inherits that capability — 12 dialects, two translation layers, one unified research platform.

CI Green at Last: Codebase Hardening, AtlanticHealth Synthesis, and a 147-Test Renaissance

Sun, 22 Mar 2026 00:00:00 GMT

After months of a perpetually red CI pipeline, today marks a turning point for Parthenon: 92 commits, a full-spectrum codebase review, a complete AtlanticHealth patient synthesis pipeline, and — most satisfying of all — every CI job green. Here's how we got there.

The CI Pipeline Was Never Green (Until Today)

The most impactful work today was a ~6-hour, five-phase codebase hardening sprint that touched virtually every layer of the stack. The starting state was grim: CI failing on every push, 6% test coverage, and four files well past their size limits. The ending state: all six CI jobs passing, 147 new tests written, and a documented methodology for keeping things that way.

The failure modes were stacking and masking each other, which made the pipeline feel intractable. Once we untangled them, the root causes were addressable one by one:

37 TypeScript errors in the investigation module — mostly Lucide icon casting issues, incorrect property access on PaginatedResponse (.data vs .items), and useRef strict mode violations. Fixed with proper LucideProps typing and a pass to remove dead code.
80+ Pint code style violations — Pint 1.29 quietly introduced the fully_qualified_strict_types rule. We resolved these by running auto-format through a Docker Pint container pinned to the same version as CI, ensuring parity. The final straggler — a single_quote and unary_operator violation in MorpheusPatientService — was cleaned up in commit 7ad77af.
11 PHPStan errors outside the baseline — caused by the strict_types changes shuffling what PHPStan was tracking. Regenerated the baseline (33 → 31 known errors) and committed it cleanly.
6 Python test failures — the FastAPI app was still using the deprecated @app.on_event("startup") pattern. Migrated to the modern lifespan context manager.
CI database schema mismatches — the CI environment was still referencing legacy schema names (vocab, cdm, achilles_results) instead of the current ones (omop, results, gis). A PostGIS extension failure was also aborting migration transactions mid-run.
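For reference, here is the lifespan pattern. FastAPI accepts an async context manager through `FastAPI(lifespan=lifespan)`: code before `yield` runs at startup, and code after it runs at shutdown. To keep the sketch runnable without FastAPI installed, it drives the context manager with a stand-in app object. The `model_loaded` flag is hypothetical.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace

@asynccontextmanager
async def lifespan(app):
    # Startup: replaces the body of @app.on_event("startup")
    app.state.model_loaded = True  # hypothetical startup work
    yield
    # Shutdown: replaces the body of @app.on_event("shutdown")
    app.state.model_loaded = False

# In the real service: app = FastAPI(lifespan=lifespan)
class StandInApp:
    """Tiny stand-in so this sketch runs without FastAPI installed."""

    def __init__(self):
        self.state = SimpleNamespace()

async def serve(app):
    # Roughly what the ASGI server does: enter the lifespan, handle traffic, exit.
    async with lifespan(app):
        return app.state.model_loaded
```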

The fix methodology is now codified as an internal ADR so future contributors have a clear playbook when CI goes red.

AtlanticHealth Synthesis Pipeline: 3,250 Patients, MIMIC-Standard

On the data generation side, we shipped a complete AtlanticHealth synthesis pipeline today. The headline: 3,250 synthetic patients with full MIMIC-standard data, generated end-to-end through a multi-phase pipeline.

Phases 4–7 were added to complete the clinical picture: procedure events, microbiology results, and input/output events. Earlier phases handle the patient cohort, admissions, and diagnoses. The result is a realistic, MIMIC-schema-compatible dataset modeled on AtlanticHealth's structure. Building it required adapting the labevents, chartevents, and transfers queries to match AtlanticHealth's actual schema (commit c5f05e83).

We also cleaned up \N bulk-import artifacts left over from PostgreSQL COPY operations on AtlanticHealth source data (commit 37b871063). These null-sentinel strings were leaking into text fields and causing downstream parsing issues. It was a subtle bug that would have been painful to debug later in the OMOP conversion layer.
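A row-level version of that cleanup is short. This sketch assumes rows arrive as dictionaries and treats only exact `\N` strings as sentinels, so a value that merely contains `\N` is left alone. In SQL, `NULLIF(column, '\N')` does the same job for a single column.

```python
NULL_SENTINEL = "\\N"  # PostgreSQL COPY text format's default NULL marker: backslash + N

def clean_row(row):
    """Return a copy of `row` with exact COPY NULL sentinels replaced by None."""
    return {key: (None if value == NULL_SENTINEL else value) for key, value in row.items()}
```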

This synthetic dataset is foundational: it gives us a realistic, large-scale cohort for testing the Morpheus ETL pipeline without touching any real patient data.

Morpheus UX: Dataset Parameter Persistence

A smaller but user-facing fix worth calling out: the dataset query parameter was being dropped when users switched tabs or clicked breadcrumb navigation inside Morpheus. This meant the UI would silently lose context, forcing users to re-select their dataset. The fix ensures the parameter is persisted through tab switches and breadcrumb navigation — a subtle but frustrating UX regression that's now resolved (commit 36222e5).

Codebase Architecture: ADRs, Docs, and Decomposition

Part of the hardening sprint involved structural improvements that won't show up in feature metrics but matter enormously for maintainability:

8 Architecture Decision Records (ADRs) written, covering decisions that were previously implicit or tribal knowledge.
11 new documentation pages across five previously underdocumented modules.
4 oversized files decomposed — each was more than 3× the project's file size guideline. Breaking these apart improves testability and makes the codebase easier to navigate.
Docker hardening — the development and CI Docker configurations were reviewed and tightened.

Going from zero ADRs to eight in a single session is a significant knowledge capture moment. These documents will pay dividends the next time someone asks "why does it work this way?"

Dependency Updates

We also rolled forward several key dependencies today:

Vite 8 and plugin-react 6 — keeping the frontend build toolchain current.
Ollama 0.6 and LangChain 1 — AI integration libraries bumped to latest stable.
sentence-transformers and transformers — Python AI requirements updated.
laravel/tinker 3.0.0 — bumped from 2.11.1.

None of these are risky upgrades in isolation, but doing them together while CI is green (rather than red) makes it much easier to catch any regressions they introduce.

What's Next

With CI green and a solid synthetic dataset in hand, the immediate priorities are:

OMOP ETL validation — run the AtlanticHealth synthetic cohort through the Morpheus OMOP conversion pipeline and validate concept mapping coverage.
Test coverage growth — 147 new tests is a great start from 6%, but we want to reach a meaningful floor (targeting 40%+) before the next major feature push.
PHPStan baseline reduction — the 31 known errors in the baseline are technical debt. Now that CI is stable, we can chip away at these systematically.
Investigation module hardening — the TypeScript fixes today were correctness patches; a deeper review of the investigation module's data flow is warranted.

Today was a grind in the best sense — the kind of session where you clear out months of accumulated friction and leave the codebase meaningfully better for everyone who touches it next.

Keeping the Lights On: Documentation Sync and Daily Dev Log Infrastructure

Sun, 22 Mar 2026 00:00:00 GMT

A quieter day on the Parthenon platform — today's commits centered on documentation infrastructure rather than feature work, with automated help content synchronization and the daily development blog pipeline keeping the platform's knowledge base fresh and up to date.

Documentation Infrastructure: The Unsung Hero

It's easy to overlook documentation commits in favor of splashier feature work, but today's two commits to the Parthenon repository represent something quietly important: the machinery that keeps developers and users informed is itself being maintained and improved.

Auto-Sync Help Content (`bcb10bbf4`)

The first commit landed just before midnight on March 21st — an auto-sync of the platform's help documentation. This kind of automated synchronization is a cornerstone of keeping a living platform like Parthenon from developing the dreaded documentation drift, where the actual behavior of the system and what the docs say about it gradually diverge until they're barely recognizable as describing the same product.

For a unified OHDSI outcomes research platform serving healthcare analysts, accurate help content isn't just a nice-to-have. Researchers relying on Parthenon to configure cohort definitions, execute population-level effect estimation studies, or interpret characterization outputs need to trust that the guidance they're reading reflects the system they're actually using. An auto-sync pipeline that pulls help content in lockstep with the platform itself is a meaningful safeguard against confusion downstream.

If you're working in this area of the codebase, the auto-sync mechanism is worth understanding well. It ensures that whenever platform behavior changes — whether through a backend update, a new analytics module, or a UI workflow revision — the corresponding help text follows automatically rather than waiting for someone to remember to update it manually.

Daily Dev Blog Post (`cd6dadff6`)

The second commit adds today's daily development blog post to the repository — which is, in a pleasantly recursive way, the post you're reading right now. The dev blog pipeline is part of how the Acumenus team maintains transparency about ongoing development, giving both internal collaborators and external platform users a running narrative of what's changing and why.

Keeping this cadence going on quieter days matters just as much as on the days when major features land. A consistent record of even incremental progress — documentation updates, dependency bumps, infrastructure maintenance — tells a more honest story of how a platform like Parthenon actually evolves over time. It also creates a searchable audit trail that's proven useful more than once when tracking down when a particular behavior changed or why a certain architectural decision was made.

Why Documentation Days Matter

There's a temptation in developer culture to treat documentation work as lesser than "real" engineering. But for a platform operating in the OHDSI ecosystem — where reproducibility, transparency, and methodological rigor are foundational values — the documentation layer is part of the science, not separate from it.

Parthenon aims to be a unified environment where outcomes researchers can move from study design through execution to result interpretation without leaving the platform. Every time a help article is stale, a workflow is undocumented, or a developer blog goes dark, that unified experience frays a little. Today's commits, modest as they are, push in the right direction.

What's Next

With the documentation infrastructure ticking along reliably, the focus will shift back toward feature and platform work in the coming days. A few areas on the near-term radar:

Analytics module development — continued work on expanding Parthenon's native OHDSI study execution capabilities, with particular attention to result visualization and interpretation workflows.
Platform integrations — ongoing coordination across the broader Acumenus suite to ensure Parthenon's analytics outputs connect cleanly with companion tools.
Help content expansion — now that the auto-sync pipeline is running cleanly, there's an opportunity to invest in the content itself, filling gaps in the help documentation for newer platform features.

Quiet days are good days. The foundation stays solid, and tomorrow we build on it.

Welcome to Acropolis: One Command from Clone to Production

Sat, 21 Mar 2026 00:00:00 GMT

Eighteen Docker services. Three environment files. A reverse proxy with auto-TLS. Database admin GUI. Container management dashboard. Enterprise SSO. And if you want the full stack? One command:

python3 install.py --with-infrastructure

This is the story of how we built Acropolis — the infrastructure layer that turns Parthenon from a research application into a production platform — and what we learned when we decided to ship it inside the same repository.

Why Infrastructure Belongs in the Application

For two months, Parthenon ran in production with a manual deployment story. Apache sat in front, configured by hand. There was no auto-TLS, so I renewed certificates manually. There was no container management UI, so if a researcher reported a problem, I SSHed in and ran docker compose ps. There was no centralized log view, so I grepped through container logs one at a time.

This works when you're the only operator. It stops working the moment someone else needs to deploy it.

The OHDSI community has a deployment problem that mirrors ours. Atlas requires a WebAPI backend (Java), an R runtime, a CDM database, and a web server. Each has its own configuration. Most institutions spend weeks getting Atlas running, and many never get past the installation phase. We built Parthenon to collapse that complexity. But we'd only collapsed the application complexity — the infrastructure was still manual.

So we built Acropolis.

What Acropolis Provides

Acropolis is not a separate application. It's a production infrastructure layer that wraps Parthenon with everything an operator needs:

Layer	Service	What It Does
Reverse Proxy	Traefik v3.3	Auto-TLS via Let's Encrypt, subdomain routing for every service, HTTP→HTTPS redirect
Container Management	Portainer CE	Web GUI for Docker — restart containers, view logs, manage volumes
Database Admin	pgAdmin 4	Pre-configured with Parthenon's PostgreSQL connection
Workflow Automation	n8n	ETL pipelines, quality check automation, alerting (Enterprise)
BI Dashboards	Apache Superset 4.1	SQL analytics and visualization over OMOP CDM data (Enterprise)
Data Catalog	DataHub v0.15	Track data lineage from raw sources through OMOP to analysis outputs (Enterprise)
SSO	Authentik 2025.2	SAML/OIDC identity provider for all services (Enterprise)

Two editions: Community (Traefik + Portainer + pgAdmin, free under Apache 2.0) and Enterprise (adds n8n, Superset, DataHub, Authentik — license-gated).

After installation, every service gets a subdomain:

https://parthenon.acumenus.net     — The research platform
https://portainer.acumenus.net     — Container management
https://pgadmin.acumenus.net       — Database administration
https://grafana.acumenus.net       — Monitoring dashboards
https://ai.acumenus.net            — AI service (MedGemma)
https://jupyter.acumenus.net       — JupyterHub notebooks
https://solr.acumenus.net          — Search administration
https://darkstar.acumenus.net      — R analytics runtime
https://n8n.acumenus.net           — Workflow automation (Enterprise)
https://superset.acumenus.net      — BI dashboards (Enterprise)

All with automatic TLS certificates. No nginx config files. No manual cert rotation.

The Two-Repo Problem

Acropolis started as a separate repository. The logic was clean: Parthenon is the application, Acropolis is the infrastructure. Separate concerns, separate repos, separate release cycles.

In practice, this created a coordination nightmare.

The Acropolis installer needed to know Parthenon's container names. Parthenon's compose file defined them. If we renamed a service — say, r-runtime became darkstar — the Acropolis service registry broke silently. Traefik routed to a container that no longer existed.

The Acropolis installer also needed to run Parthenon's installer. We solved this with a subprocess call:

# The old way: Acropolis shelling out to Parthenon's installer
import subprocess

subprocess.run([
    "python3", str(parthenon_path / "install.py"),
    "--defaults-file", str(defaults_file),
])

This meant Acropolis had to clone or locate the Parthenon repo, manage path resolution across the two repos, pass credentials through a temporary JSON file, and then detect what Parthenon's installer had done after the fact. Four topology modes — fresh_install, local, remote, standalone — each with different code paths.

When we tested this on a VM, three bugs surfaced in the first run:

Port detection used bind() instead of connect_ex() — bind() requires elevated privileges on ports below 1024. Acropolis couldn't check if ports 80 and 443 were free on a fresh Ubuntu 24.04 install.
Docker Compose prefixed the network name — Parthenon's compose file declared a network called parthenon, but Docker Compose automatically prepended the project name, creating parthenon_parthenon. Acropolis checked for parthenon and didn't find it.
Internal services flagged as "unknown" — PHP, PostgreSQL, Redis, Horizon, and other non-routable containers showed up in Docker network inspection but weren't in the curated service registry. The installer prompted the user to configure Traefik routes for parthenon-php — a backend container that should never be exposed.
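The first two fixes are small enough to sketch. `connect_ex()` returns 0 only when something is already accepting connections, and unlike `bind()` on ports below 1024, it needs no elevated privileges. The network-name helper encodes Docker Compose's default behavior: a declared network gets the project name as a prefix unless it sets an explicit `name:`. Both function names are hypothetical and are not the installer's actual code.

```python
import socket
from typing import Optional

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something already accepts TCP connections on host:port.

    connect_ex() needs no elevated privileges, unlike bind() on ports below 1024.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0

def compose_network_name(project: str, network: str, explicit_name: Optional[str] = None) -> str:
    """Name Docker Compose creates for a declared network.

    Compose prefixes '<project>_' unless the network sets an explicit `name:`.
    """
    return explicit_name if explicit_name else f"{project}_{network}"
```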

All three were fixable. But they were symptoms of a deeper issue: two repos that had to agree on implementation details but couldn't enforce that agreement at build time.

The clearest example was a port mismatch we discovered during the consolidation itself. The Acropolis service registry listed nginx at port 8082:

CuratedService("nginx", "parthenon-nginx", 8082, "parthenon", "always")

But 8082 is the host-mapped port. Inside the Docker network — where Traefik connects — nginx listens on port 80. The static Traefik config file (traefik/dynamic/parthenon.yml) had the correct port, because it was written by hand. But the auto-generator in routing.py read from the registry and would produce:

# Wrong — 8082 is the host port, not the container port
services:
  parthenon-parthenon:
    loadBalancer:
      servers:
        - url: "http://parthenon-nginx:8082"

The same mismatch existed for three other services: python-ai (8002 vs 8000), morpheus-ingest (8004 vs 8000), and jupyterhub (8888 vs 8000). All had host-mapped ports in the registry where container-internal ports belonged.

This class of bug is invisible in manual testing — the static config works fine. It only surfaces when the auto-generator runs during a fresh installation. And it would have been impossible if both the container definitions and the service registry lived in the same repository.
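To make the host/container distinction concrete, here is a small sketch that parses a Compose-style short port mapping and builds the upstream URL from the container side only. It handles "CONTAINER", "HOST:CONTAINER", and "IP:HOST:CONTAINER" with an optional /tcp or /udp suffix. Port ranges and IPv6 literals are out of scope, and the names parse_port_mapping and upstream_url are illustrative, not taken from routing.py.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]
    host_port: Optional[int]   # what you type in a browser on the host
    container_port: int        # what Traefik must use on the Docker network


def parse_port_mapping(spec: str) -> PortMapping:
    """Parse "80", "8082:80", or "127.0.0.1:8082:80/tcp" (no ranges, no IPv6)."""
    spec = spec.split("/", 1)[0]  # drop an optional /tcp or /udp suffix
    parts = spec.split(":")
    if len(parts) == 1:
        return PortMapping(None, None, int(parts[0]))
    if len(parts) == 2:
        return PortMapping(None, int(parts[0]), int(parts[1]))
    if len(parts) == 3:
        # "127.0.0.1::80" leaves the host port empty (Docker picks one).
        host_port = int(parts[1]) if parts[1] else None
        return PortMapping(parts[0] or None, host_port, int(parts[2]))
    raise ValueError(f"unsupported port spec: {spec!r}")


def upstream_url(container_name: str, spec: str) -> str:
    """Traefik connects container-to-container, so only the container port matters."""
    return f"http://{container_name}:{parse_port_mapping(spec).container_port}"
```

With the nginx mapping from above, upstream_url("parthenon-nginx", "8082:80") yields http://parthenon-nginx:80, the value the hand-written static config already had.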

The Consolidation

We moved everything into Parthenon/acropolis/:

acropolis/
├── installer/              14 Python modules (~2,000 lines)
│   ├── cli.py              Phase orchestrator
│   ├── topology.py         Parthenon detection (simplified to local-only)
│   ├── editions.py         Community / Enterprise selection
│   ├── discovery.py        24-service curated registry
│   ├── config.py           Domain, TLS, credentials collection
│   ├── network.py          Docker network bridging
│   ├── deploy.py           Docker compose orchestration + health polling
│   ├── routing.py          Traefik dynamic config generation
│   ├── generator.py        Day-2 CLI script generator
│   ├── verify.py           Post-install smoke tests
│   ├── preflight.py        System validation
│   ├── state.py            Resume-on-failure state machine
│   └── utils.py            Docker, network, and password utilities
├── docker-compose.base.yml       Traefik + acropolis_network
├── docker-compose.community.yml  Portainer + pgAdmin
├── docker-compose.enterprise.yml n8n + Superset + DataHub + Authentik
├── traefik/                      Static + dynamic route configs
├── config/                       pgAdmin servers, Superset config
├── k8s/                          Helm charts + Kustomize overlays
└── tests/                        6 unit test files + smoke test

The key architectural changes:

Direct Import Instead of Subprocess

The Acropolis installer now imports Parthenon's installer as a Python module:

# The new way: direct import in the same repo
from installer.cli import run as run_parthenon_installer
run_parthenon_installer(pre_seed={
    "admin_email": config.parthenon_admin_email,
    "admin_name": config.parthenon_admin_name,
    "admin_password": config.parthenon_admin_password,
    "app_url": f"https://parthenon.{config.domain}",
    "timezone": config.timezone,
})

No temporary credentials file. No path resolution. No subprocess exit code interpretation. If the Parthenon installer raises an exception, the Acropolis installer catches it in the same process.

Topology Simplified to Local-Only

The four topology modes collapsed into one. In a monorepo, Parthenon is always the parent directory:

from pathlib import Path

ACROPOLIS_ROOT = Path(__file__).resolve().parent.parent  # acropolis/
PARTHENON_ROOT = ACROPOLIS_ROOT.parent                    # Parthenon/

The fresh_install mode (clone Parthenon from GitHub) is gone. The remote mode (connect to Parthenon on another host) is gone. The standalone mode (Acropolis without Parthenon) is gone. The installer detects whether Parthenon's containers are already running and installs them if not.

Stable Network Name

We added name: parthenon to the Docker network definition:

networks:
  parthenon:
    name: parthenon    # Prevents Docker from prefixing as "parthenon_parthenon"
    driver: bridge

This one line eliminated the network detection logic that had to check three candidate names.
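The naming rule itself fits in a few lines. This sketch assumes the project name comes from the checkout directory (Compose's default when neither -p, COMPOSE_PROJECT_NAME, nor a top-level name: is set), and normalize_project_name only approximates Compose's own normalization. Both function names are hypothetical.

```python
import re
from typing import Optional


def normalize_project_name(directory_name: str) -> str:
    """Approximate Compose's default project name: lowercase, [a-z0-9_-] only."""
    return re.sub(r"[^a-z0-9_-]", "", directory_name.lower())


def effective_network_name(project: str, key: str,
                           explicit_name: Optional[str] = None) -> str:
    """Name Docker actually creates for a network declared under `networks:`."""
    if explicit_name:
        return explicit_name          # `name:` is used verbatim, no prefix
    return f"{project}_{key}"         # default: <project>_<network key>
```

For a checkout named Parthenon with a network key of parthenon, the default result is parthenon_parthenon. Passing explicit_name="parthenon" gives back exactly parthenon, which is the behavior the one-line YAML fix relies on.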

The Service Registry: 24 Containers, Mapped

Acropolis maintains a curated registry of every Parthenon container. This registry drives two things: Traefik route generation and post-install health checks.

CURATED_SERVICES = [
    # Routable — exposed through Traefik with subdomains
    CuratedService("nginx",           "parthenon-nginx",           80,    "parthenon",    "always"),
    CuratedService("darkstar",        "parthenon-darkstar",        8787,  "darkstar",     "always"),
    CuratedService("python-ai",       "parthenon-ai",              8000,  "ai",           "always"),
    CuratedService("morpheus-ingest", "parthenon-morpheus-ingest", 8000,  "morpheus",     "always"),
    CuratedService("solr",            "parthenon-solr",            8983,  "solr",         "if_running"),
    CuratedService("jupyterhub",      "parthenon-jupyterhub",      8000,  "jupyter",      "if_running"),
    CuratedService("grafana",         "parthenon-grafana",         3000,  "grafana",      "if_running"),
    CuratedService("study-agent",     "parthenon-study-agent",     8765,  "study-agent",  "if_running"),
    CuratedService("hecate",          "parthenon-hecate",          8080,  "hecate",       "if_running"),
    # ... plus 15 more (reverb, prometheus, whiterabbit, fhir-to-cdm, orthanc, etc.)

    # Internal — recognized but never routed
    CuratedService("php",       "parthenon-php",       9000, "", "internal"),
    CuratedService("postgres",  "parthenon-postgres",  5432, "", "internal"),
    CuratedService("redis",     "parthenon-redis",     6379, "", "internal"),
    CuratedService("horizon",   "parthenon-horizon",   0,    "", "internal"),
    CuratedService("chromadb",  "parthenon-chromadb",  8000, "", "internal"),
    # ... plus 4 more monitoring containers
]

Every port in this registry is the container-internal port — the one Traefik connects to over the Docker network. Not the host-mapped port. This distinction cost us four bugs before we learned it.

The registry also drives auto-discovery. When the installer scans the Docker network, it matches running containers against this list. Known containers get their predefined subdomain. Unknown containers prompt the user: "Expose parthenon-custom-service through Traefik?"
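Here is a sketch of how a registry like this can drive both route generation and the unknown-container prompt. Several details are assumptions: the CuratedService field names, the parthenon-<subdomain> router/service key (chosen to match the parthenon-parthenon service in the earlier YAML), and returning a plain dict that would then be written out as Traefik YAML (for example with PyYAML). None of these are confirmed as the actual routing.py.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class CuratedService:
    # Field order mirrors the registry entries above; field names are assumed.
    name: str
    container: str
    port: int        # container-internal port, never the host-mapped one
    subdomain: str
    mode: str        # "always", "if_running", or "internal"


def plan_routes(registry: Iterable[CuratedService], running: Set[str],
                domain: str) -> Tuple[dict, List[str]]:
    """Return (traefik_dynamic_config, unknown_running_containers)."""
    registry = list(registry)
    known = {svc.container for svc in registry}
    routers, services = {}, {}
    for svc in registry:
        if svc.mode == "internal":
            continue  # recognized, never exposed
        if svc.mode == "if_running" and svc.container not in running:
            continue  # optional service that isn't up
        key = f"parthenon-{svc.subdomain}"
        routers[key] = {"rule": f"Host(`{svc.subdomain}.{domain}`)", "service": key}
        services[key] = {"loadBalancer": {
            "servers": [{"url": f"http://{svc.container}:{svc.port}"}]}}
    # Anything running that the registry doesn't know about gets a user prompt.
    unknown = sorted(c for c in running if c not in known)
    return {"http": {"routers": routers, "services": services}}, unknown
```

Keeping "internal" entries in the registry is what prevents parthenon-php from landing in the unknown list, which was the third bug found on the VM.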

The Network Bridge

Parthenon and Acropolis services run on separate Docker networks. This is intentional — Parthenon's internal services (PHP, PostgreSQL, Redis) should not be accessible from the Acropolis network, and vice versa.

The bridge works through selective attachment. During Phase 6 (Network Setup), the installer connects only the routable Parthenon containers to acropolis_network:

┌─────────────────────────────────────────────────────────┐
│                    acropolis_network                      │
│                                                          │
│  traefik ──→ parthenon-nginx:80                          │
│          ──→ parthenon-darkstar:8787                     │
│          ──→ parthenon-ai:8000                           │
│          ──→ parthenon-grafana:3000                      │
│          ──→ parthenon-solr:8983                         │
│                                                          │
│  portainer ──→ /var/run/docker.sock                      │
│  pgadmin   ──→ host.docker.internal:5432                 │
│                                                          │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│                    parthenon (internal)                   │
│                                                          │
│  nginx ↔ php ↔ postgres ↔ redis ↔ horizon               │
│  python-ai ↔ chromadb ↔ study-agent                     │
│  darkstar ↔ solr ↔ hecate ↔ qdrant                     │
│                                                          │
└─────────────────────────────────────────────────────────┘

Containers like parthenon-nginx exist on both networks simultaneously. They can reach internal services via the parthenon network and receive external traffic from Traefik via acropolis_network.

The network attachment is rolled back automatically if the installation fails: the rollback disconnects each container from acropolis_network and removes the network if it was created during installation.
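Here is a sketch of selective attachment with rollback, built on the Docker CLI's network create, connect, disconnect, and rm subcommands. The function names and the injectable run parameter (which lets the logic be exercised without a Docker daemon) are assumptions, not the installer's actual network.py.

```python
import subprocess
from typing import Callable, Iterable, List

Runner = Callable[[List[str]], None]


def docker(args: List[str]) -> None:
    """Run `docker <args>`, raising CalledProcessError if it fails."""
    subprocess.run(["docker", *args], check=True, capture_output=True)


def rollback_bridge(attached: List[str], network: str, created: bool,
                    run: Runner = docker) -> None:
    """Best-effort undo: detach containers in reverse order, then drop the network."""
    for name in reversed(attached):
        try:
            run(["network", "disconnect", network, name])
        except Exception:
            pass  # keep going so one stuck container doesn't block the rest
    if created:
        try:
            run(["network", "rm", network])
        except Exception:
            pass


def bridge_network(routable: Iterable[str], network: str = "acropolis_network",
                   create: bool = True, run: Runner = docker) -> List[str]:
    """Attach only the routable containers, undoing everything if a step fails."""
    attached: List[str] = []
    if create:
        run(["network", "create", network])
    try:
        for name in routable:
            run(["network", "connect", network, name])
            attached.append(name)
    except Exception:
        rollback_bridge(attached, network, create, run)
        raise
    return attached
```

The returned list is what a later phase would hand to rollback_bridge if, say, the deploy step failed after the network was already bridged.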

The Unified Installer

The final product is a single entry point with two modes:

# Application only — local development, no infrastructure overhead
python3 install.py

# Full stack — application + infrastructure (Traefik, Portainer, pgAdmin, Enterprise)
python3 install.py --with-infrastructure

The --with-infrastructure flag runs the Acropolis orchestrator, which in turn calls the Parthenon installer internally. The combined flow:

Acropolis Phase 1 — Preflight: Docker version, daemon running, ports 80/443 free, disk space.

Acropolis Phase 2 — Topology: Detect whether Parthenon is already running. If yes, skip Parthenon installation. If no, run it after configuration.

Acropolis Phase 3 — Edition: Community or Enterprise. Enterprise requires a license key (ACRO-XXXX-XXXX-XXXX format).

Acropolis Phase 4 — Service Discovery: Enumerate running Parthenon containers and match against the 24-service registry.

Acropolis Phase 5 — Configuration: Domain, TLS mode (Let's Encrypt, self-signed, or none), timezone, per-service credentials (pgAdmin, Portainer, and optionally n8n, Superset, DataHub, Authentik).

Parthenon Phases 1-9 — If Parthenon isn't running: preflight, configuration (pre-seeded from Acropolis), Docker pull/build/start, Laravel bootstrap (composer, migrate, seed), Eunomia demo data, frontend build, Solr indexing, admin account creation.

Acropolis Phase 6 — Network: Create acropolis_network, connect routable Parthenon containers.

Acropolis Phase 7 — Deploy: docker compose up -d for infrastructure services, health polling with live-updating terminal table.

Acropolis Phase 8 — Routing: Generate Traefik dynamic configs for every discovered service. WebSocket support for Laravel Reverb. Security headers and compression middleware.

Acropolis Phase 9 — Verification: Smoke test every service. Generate acropolis.sh day-2 operations script. Display URL matrix with credentials.

State persistence through .install-state.json means any phase can fail and the installer resumes from the last completed phase. Credentials never touch the state file — they're stored separately in .install-credentials with chmod 0600.
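Here is a minimal sketch of the resume-on-failure pattern, assuming a JSON file that lists completed phases and a separate credentials file created with mode 0600. The phase names, the InstallState class, and the handler dictionary are illustrative, not the contents of state.py.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, List

PHASES = ["preflight", "topology", "edition", "discovery", "configuration",
          "parthenon", "network", "deploy", "routing", "verification"]


class InstallState:
    """Tracks completed phases; credentials are kept out of the state file."""

    def __init__(self, root: Path) -> None:
        self.state_file = root / ".install-state.json"
        self.cred_file = root / ".install-credentials"
        self.completed: List[str] = []
        if self.state_file.exists():
            self.completed = json.loads(self.state_file.read_text())["completed"]

    def pending(self) -> List[str]:
        return [p for p in PHASES if p not in self.completed]

    def mark_done(self, phase: str) -> None:
        self.completed.append(phase)
        # Write then rename, so a crash never leaves a half-written JSON file.
        tmp = self.state_file.with_suffix(".tmp")
        tmp.write_text(json.dumps({"completed": self.completed}))
        os.replace(tmp, self.state_file)

    def save_credentials(self, creds: Dict[str, str]) -> None:
        # Create the file as 0600 up front, and chmod in case it already existed.
        fd = os.open(self.cred_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as fh:
            json.dump(creds, fh)
        os.chmod(self.cred_file, 0o600)


def run_installer(root: Path, handlers: Dict[str, Callable[[], None]]) -> None:
    state = InstallState(root)
    for phase in state.pending():
        handlers[phase]()       # an exception stops here; earlier phases stay recorded
        state.mark_done(phase)
```

Re-running run_installer after a failure skips every phase already recorded and starts again at the one that raised.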

Day-2 Operations

After installation, the generated acropolis.sh script handles ongoing operations:

./acropolis.sh up              # Start infrastructure services
./acropolis.sh down            # Stop everything
./acropolis.sh status          # Health overview of all services
./acropolis.sh logs [service]  # Follow logs
./acropolis.sh urls            # Print full URL matrix
./acropolis.sh backup          # Backup all volumes to timestamped archives
./acropolis.sh smoke-test      # Re-run health checks
./acropolis.sh update          # Pull latest images and restart

This script is standalone bash with embedded configuration — no Python dependency for day-2 ops. It knows which compose files to use (base, community, enterprise) and which domain to display in URLs.
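Here is a sketch of how a generator could bake configuration into such a script. It covers only a subset of the commands (backup, smoke-test, and update are omitted), and render_day2_script is a hypothetical name, not generator.py's API. The compose files go into a bash array so that paths containing spaces survive word splitting.

```python
import shlex
from typing import List


def render_day2_script(domain: str, compose_files: List[str]) -> str:
    """Render a standalone bash script with domain and compose files embedded."""
    files = " ".join(f"-f {shlex.quote(f)}" for f in compose_files)
    return f"""#!/usr/bin/env bash
set -euo pipefail
DOMAIN={shlex.quote(domain)}
COMPOSE=(docker compose {files})

case "${{1:-}}" in
  up)     "${{COMPOSE[@]}}" up -d ;;
  down)   "${{COMPOSE[@]}}" down ;;
  status) "${{COMPOSE[@]}}" ps ;;
  logs)   "${{COMPOSE[@]}}" logs -f ${{2:+"$2"}} ;;
  urls)   echo "https://parthenon.$DOMAIN" ;;
  *)      echo "usage: $0 {{up|down|status|logs|urls}}" >&2; exit 1 ;;
esac
"""
```

The generated file needs only bash and the Docker CLI at run time, which matches the no-Python-for-day-2 goal.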

What We Learned

Host ports and container ports are different. Obvious in hindsight. Docker's 8082:80 mapping means port 8082 on the host, port 80 in the container. Traefik, running inside Docker, connects via the Docker network to port 80. We got this wrong in four services and found it only during consolidation — the static Traefik config had been hand-written with the correct ports, masking the bug in the auto-generator.

Docker Compose network names are not what you declare. A network declared as parthenon in a compose file whose project name is also parthenon becomes parthenon_parthenon, because Compose prefixes networks with the project name by default. Adding name: parthenon to the network definition forces the exact name. One YAML line saved us a detection function that checked three candidate names.

Monorepos enforce interface contracts. When the service registry and the container definitions live in the same repository, a rename shows up in git diff. When they're in separate repos, it shows up in production. We caught four port mismatches and one network name issue, and we would have caught the container rename from r-runtime to darkstar automatically if we'd been in a monorepo from the start.

Subprocess calls across repos are fragile. Path resolution, environment variable inheritance, exit code semantics, credential passing through temporary files — every one of these is a failure mode. A direct Python import eliminates all of them.

Infrastructure as code means infrastructure in the same repo as the code. Not in a separate repo that references the code. Not in a wiki page. In the same git blame history, the same CI pipeline, the same pull request review.

What's Next

The Acropolis layer is functional but young. On the roadmap:

Authentik SSO integration with Parthenon's Sanctum auth — OIDC provider in Authentik, client in Laravel, so researchers sign in once for the entire platform.
Pre-built Superset dashboards for OMOP CDM — demographic breakdowns, condition prevalence, drug utilization, and data quality metrics, all pointing at Parthenon's PostgreSQL.
n8n workflow templates — automated Achilles runs on new data loads, DQD quality gates, Slack notifications on analysis completion.
Kubernetes Helm chart finalization — the chart structure exists but values need pinning for production HA deployments.

The Acropolis-v2 repository is now deprecated. All development continues in the Parthenon monorepo under acropolis/.

If you're deploying Parthenon and want the full infrastructure stack:

git clone https://github.com/sudoshi/Parthenon.git
cd Parthenon
python3 install.py --with-infrastructure

That's it. One repo, one command, one platform.

The Rise of Darkstar: How We Rebuilt the OHDSI R Runtime for Production

Fri, 20 Mar 2026 23:59:00 GMT

Every platform has a weak link. For Parthenon, it was the R container.

PHP handled 200 concurrent API requests without breaking a sweat. Python served AI inference with async workers. PostgreSQL managed million-row queries across six schemas. Redis cached sessions at sub-millisecond latency. And then there was R — single-threaded, fragile, running bare Rscript as PID 1 with no supervision, no timeouts, and a health check that lied.

This is the story of how we tore it down and built Darkstar — a production-grade R analytics engine that runs OHDSI HADES analyses concurrently, recovers from crashes automatically, and executes 35% faster than the container it replaced.

The Inheritance

Parthenon didn't start from scratch. We inherited the R runtime architecture from OHDSI Broadsea, the community's standard Docker deployment for the OMOP CDM analytics stack. Broadsea ships a single R container running Plumber v1 — the venerable HTTP API framework for R that's been the community standard since 2017.

And for what Broadsea was designed to do — run a single analysis at a time on a researcher's laptop — Plumber v1 is perfectly fine. It's simple, well-documented, and every OHDSI tutorial uses it.

But Parthenon isn't a single-user research tool. It's a multi-tenant clinical research platform serving 18 users across multiple institutions, running CohortMethod estimations, PatientLevelPrediction models, Self-Controlled Case Series analyses, Cohort Diagnostics, and Characterization reports against a million-patient OMOP CDM database. Simultaneously.

That's where things fell apart.

The Breaking Point

The first sign of trouble was a Slack message from a researcher: "My estimation has been running for 20 minutes. Is the system down?"

It wasn't down. Another user had kicked off a CohortMethod propensity score matching job five minutes earlier. Because Plumber v1 is single-threaded, every subsequent request — health checks, status queries, the second user's estimation — queued behind that first analysis with zero feedback.

Here's what was actually happening inside the container:

┌──────────────────────────────────────────────────────────┐
│  PID 1: Rscript plumber_api.R                            │
│         └─ plumber v1 (SINGLE THREAD)                    │
│              ├─ /health         → BLOCKED (behind job)   │
│              ├─ /estimation/run → RUNNING (20 min)       │
│              ├─ /prediction/run → QUEUED (no feedback)   │
│              └─ /status         → BLOCKED                │
│                                                          │
│  Docker health check: curl localhost:8787/health          │
│    interval: 600s (TEN MINUTES between checks)           │
│    Response: {"status":"ok"} (even if JVM is dead)       │
│                                                          │
│  No JDBC timeouts. No process supervision.               │
│  No garbage collection. No crash recovery.               │
└──────────────────────────────────────────────────────────┘

Over the following weeks, I cataloged five distinct failure modes:

Blocked health checks. Docker's health probe couldn't reach the /health endpoint because the single thread was locked in a Cox regression. After 5 retries at 600-second intervals (50 minutes!), Docker finally marked the container unhealthy. But by then, the analysis had probably finished — and the restart killed the cleanup.

Ghost containers. With 10-minute health check intervals, a crashed R process sat undetected. The Laravel backend got connection refused errors and returned generic 500s. Users saw "analysis failed" with no explanation.

Hung JDBC connections. Twice I watched the R process freeze completely — not crashed, not high-CPU, just stuck. strace showed it blocked on a socket read to PostgreSQL with no timeout set. The database had closed the connection during a long-running covariate extraction, but R didn't know. The only fix was docker compose restart r-runtime, which killed any active analysis.

Unsafe disconnects. DatabaseConnector::disconnect() throws if the connection is already dead. Several endpoint files had bare disconnect() calls in their cleanup code. A disconnect error would mask the actual analysis result and return a 500 to the user — even though the analysis had completed successfully. The results were computed, stored in R memory, and then lost because the HTTP response errored on cleanup.

Memory creep. Long-running sessions accumulated R objects across requests with no GC strategy. The default JVM garbage collector would pause unpredictably — sometimes 2-5 seconds — during large covariate matrix operations. Eventually the heap ran out and rJava calls started throwing OutOfMemoryError.

I spent weeks applying band-aids: extending health check intervals to avoid false-positive restarts, adding retry logic in Laravel's RService, telling users to "wait for the current analysis to finish." But the core problem was architectural. Plumber v1 is single-threaded by design. No amount of application-level workarounds fixes that.

The Decision

On March 17, 2026, I decided to stop patching and start rebuilding. The goal was simple:

Replace the entire R runtime infrastructure with something that can handle concurrent requests, recover from crashes, and not lie about its health.

The constraints were equally clear:

Every HADES analysis that worked before must work identically after. Zero breaking changes.
The 12 existing API endpoint files must be portable. We're not rewriting CohortMethod integration.
Memory budget: 32GB container limit, shared between R and the JVM.
Cold start under 2 minutes (HADES package loading is unavoidably heavy).

Phase 1: Stop the Bleeding (March 4)

Before the big rewrite, I made two immediate changes to buy time.

First, the health check interval dropped from 600 seconds to 30. Three failures at 30-second intervals means the container is marked unhealthy in 90 seconds instead of 50 minutes. I also added start_period: 120s to account for HADES package loading — without this, Docker would kill the container before R even finished booting.
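The detection-time arithmetic can be written out directly. Docker runs each check interval seconds after the previous check completes, and a hanging probe is abandoned after its timeout, so every consecutive failure costs up to interval + timeout. The function name is ours, and the post does not state a probe timeout, so it defaults to zero here.

```python
def seconds_until_unhealthy(interval: float, retries: int, timeout: float = 0.0) -> float:
    """Worst-case delay between the service breaking (just after a passing
    probe) and Docker marking the container unhealthy.

    Failures during start_period don't count toward retries, so that
    window is deliberately left out of this calculation.
    """
    return retries * (interval + timeout)
```

The old settings give 5 × 600 = 3,000 seconds (50 minutes). The new ones give 3 × 30 = 90 seconds, or somewhat more if each failing probe runs until a nonzero timeout.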

Second, I tuned the JVM. The default garbage collector pauses unpredictably during large operations. Switching to G1GC with MaxGCPauseMillis=200 gives the collector a 200 ms pause-time target (a goal, not a hard guarantee). Combined with R_MAX_VSIZE=24Gb for the R vector heap, this eliminated the OOM crashes and GC stalls:

environment:
  - _JAVA_OPTIONS=-Xmx8g -Xms2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200
  - R_MAX_VSIZE=24Gb

Phase 2: An Honest Health Check (March 7)

The old health check was four lines of R that returned {"status":"ok"} unconditionally. The JVM could be dead, memory at 95%, JDBC driver missing — and it would still say "ok."

I replaced it with a deep validation endpoint that checks five things on every 30-second probe:

HADES packages loadable — requireNamespace() for CohortMethod, PatientLevelPrediction, DatabaseConnector. Catches corrupted installs.
JVM alive — actually creates a Java object via rJava::.jnew(). If the heap is exhausted, this fails.
Memory usage — gc() returns current consumption. Alerts at 87% of the 32GB limit.
JDBC driver present — verifies /opt/jdbc/postgresql-42.7.3.jar exists. (This was a real bug — the volume mount at /app was clobbering the driver.)
Uptime tracking — detects unexpected restarts. If uptime drops to zero when nobody restarted the container, something crashed.

{
  "status": "ok",
  "service": "parthenon-r-runtime",
  "version": "0.2.0",
  "r_version": "4.4.3",
  "uptime_seconds": 3847,
  "checks": {
    "packages": true,
    "jvm": true,
    "memory_used_mb": 4821.3,
    "memory_ok": true,
    "jdbc_driver": true
  }
}

When any check fails, status changes to "degraded". Docker still gets a 200 (so it doesn't restart mid-analysis), but the Laravel backend knows not to submit new work.
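The endpoint itself is written in R. Here is a language-neutral sketch of its decision rule in Python. The parameter names are assumptions, and "32GB" is read as 32 × 1024 MB with the 87% alert threshold applied to that figure.

```python
from typing import Tuple


def health_payload(packages_ok: bool, jvm_ok: bool, jdbc_driver_ok: bool,
                   memory_used_mb: float, uptime_seconds: int,
                   memory_limit_mb: float = 32 * 1024,
                   memory_alert_ratio: float = 0.87) -> Tuple[int, dict]:
    """Return (http_status, body) for a deep health check.

    HTTP status is always 200 so Docker never restarts the container
    mid-analysis; the caller reads body["status"] before submitting work.
    """
    memory_ok = memory_used_mb < memory_alert_ratio * memory_limit_mb
    checks = {
        "packages": packages_ok,
        "jvm": jvm_ok,
        "memory_used_mb": round(memory_used_mb, 1),
        "memory_ok": memory_ok,
        "jdbc_driver": jdbc_driver_ok,
    }
    healthy = all([packages_ok, jvm_ok, memory_ok, jdbc_driver_ok])
    body = {"status": "ok" if healthy else "degraded",
            "uptime_seconds": uptime_seconds, "checks": checks}
    return 200, body
```

Separating "is the process alive" (HTTP 200) from "should it take new work" (status) lets Docker and the Laravel backend act on different signals from a single probe.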

Phase 3: JDBC Timeouts (March 8)

The hung-connection problem was insidious. R would issue a SQL query, the database would close the connection during a long covariate extraction, and R would sit on a socket read forever. No timeout. No error. Just silence.

I added explicit JDBC timeouts to every PostgreSQL connection string:

socketTimeout=300        # Kill queries hung at socket level (5 min)
connectTimeout=30        # Fail fast if DB unreachable
loginTimeout=30          # Fail fast if auth hangs
tcpKeepAlive=true        # Detect dead connections via TCP probes
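Those four settings are PostgreSQL JDBC driver connection properties (the timeouts are in seconds) and travel as query parameters on the JDBC URL. Here is a small sketch of centralizing them so no connection can omit them. The helper name and the host and database values are placeholders.

```python
from urllib.parse import urlencode

# pgJDBC connection properties from the list above; timeouts are in seconds.
JDBC_RESILIENCE = {
    "socketTimeout": "300",   # abort reads stuck on a dead socket
    "connectTimeout": "30",   # fail fast if the database is unreachable
    "loginTimeout": "30",     # fail fast if authentication hangs
    "tcpKeepAlive": "true",   # let TCP keepalives detect dropped peers
}


def postgres_jdbc_url(host: str, port: int, database: str, **extra: str) -> str:
    """Build a PostgreSQL JDBC URL that always carries the resilience settings."""
    params = {**JDBC_RESILIENCE, **extra}  # explicit overrides win
    return f"jdbc:postgresql://{host}:{port}/{database}?{urlencode(params)}"
```

For example, postgres_jdbc_url("postgres", 5432, "ohdsi") produces jdbc:postgresql://postgres:5432/ohdsi?socketTimeout=300&connectTimeout=30&loginTimeout=30&tcpKeepAlive=true.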

Then I wrapped every DatabaseConnector::disconnect() call in a tryCatch. There were 10 disconnect call sites across 6 endpoint files. Each one got a safe_disconnect() wrapper that logs the error but doesn't throw — so a dead connection during cleanup never masks a successful analysis result.
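The R fix is a tryCatch around disconnect(). The same pattern in Python is a try/finally whose cleanup only logs. This is an illustrative analog: safe_disconnect and run_analysis here are hypothetical helpers, and conn stands for any object with a close() method.

```python
import logging
from typing import Any, Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("darkstar.sketch")


def safe_disconnect(conn: Any) -> None:
    """Close `conn`, logging instead of raising if it is already dead."""
    try:
        conn.close()
    except Exception as exc:  # a dead connection must not mask the result
        log.warning("disconnect failed, ignoring: %s", exc)


def run_analysis(connect: Callable[[], Any], analyze: Callable[[Any], T]) -> T:
    conn = connect()
    try:
        return analyze(conn)
    finally:
        safe_disconnect(conn)  # runs after success and after failure alike
```

A finished analysis is returned even when closing the connection fails, while a genuine analysis error still propagates.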

Phase 4: Async Job Registry (March 10)

Even with health check and timeout improvements, the fundamental problem remained: Plumber v1 is single-threaded. While I planned the full migration to plumber2, I built an interim solution using callr::r_bg().

The idea: instead of blocking the HTTP thread for 20 minutes, dispatch the analysis to a background R subprocess and return a job ID immediately. The Laravel backend polls for completion.

POST /jobs/submit   → dispatch to callr::r_bg(), return {job_id}
GET  /jobs/status/X → check if background process finished
POST /jobs/cancel/X → kill background process

Each job runs in its own R process with full HADES environment. The main Plumber thread stays free for health checks and status queries. Job results are cached in memory with a 5-minute TTL.

This was a stopgap, but it proved the pattern that Darkstar would later implement properly with mirai daemons.
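To illustrate the submit/poll/cancel shape, here is a Python analog of that registry. It is a sketch, not the R code. subprocess.Popen stands in for callr::r_bg(), and the uuid job IDs, the "unknown" status, and the choice to start the 5-minute TTL when completion is first observed are all assumptions. The three methods correspond to POST /jobs/submit, GET /jobs/status/X, and POST /jobs/cancel/X.

```python
import subprocess
import tempfile
import time
import uuid
from dataclasses import dataclass
from typing import IO, Dict, List, Optional


@dataclass
class Job:
    proc: subprocess.Popen
    out: IO[str]                       # temp file receiving the job's stdout
    finished_at: Optional[float] = None
    output: Optional[str] = None


class JobRegistry:
    """In-memory registry of background jobs with a result TTL (default 5 min)."""

    def __init__(self, ttl: float = 300.0) -> None:
        self.ttl = ttl
        self.jobs: Dict[str, Job] = {}

    def submit(self, cmd: List[str]) -> str:
        # stdout goes to a temp file, not a pipe, so a chatty job can never
        # stall on a full pipe buffer that nobody is reading.
        out = tempfile.TemporaryFile(mode="w+")
        proc = subprocess.Popen(cmd, stdout=out)
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = Job(proc, out)
        return job_id  # returned immediately; the caller polls status()

    def status(self, job_id: str) -> dict:
        self._evict_expired()
        job = self.jobs.get(job_id)
        if job is None:
            return {"job_id": job_id, "status": "unknown"}
        if job.finished_at is None:
            if job.proc.poll() is None:
                return {"job_id": job_id, "status": "running"}
            # First time the exit is observed: read output, start the TTL clock.
            job.out.seek(0)
            job.output = job.out.read()
            job.out.close()
            job.finished_at = time.monotonic()
        state = "completed" if job.proc.returncode == 0 else "failed"
        return {"job_id": job_id, "status": state, "output": job.output}

    def cancel(self, job_id: str) -> bool:
        job = self.jobs.pop(job_id, None)
        if job is None:
            return False
        if job.proc.poll() is None:
            job.proc.kill()
            job.proc.wait()
        if not job.out.closed:
            job.out.close()
        return True

    def _evict_expired(self) -> None:
        now = time.monotonic()
        expired = [jid for jid, job in self.jobs.items()
                   if job.finished_at is not None and now - job.finished_at > self.ttl]
        for jid in expired:
            del self.jobs[jid]
```

Each job costs a fresh process start, the same overhead a per-job callr::r_bg() process pays, which is what a pool of persistent workers avoids.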

Phase 5: The Big Migration (March 17-19)

Three days. Complete infrastructure overhaul.

Plumber v1 → Plumber2 0.2.0. Plumber2 is the async-first successor to Plumber, designed for production workloads. It uses the httpuv2 event loop and supports native integration with mirai for concurrent execution.

mirai 2.6.1 with 3 daemon workers. mirai ("future" in Japanese) provides persistent R worker processes. Instead of spawning a new callr::r_bg() process per job, mirai maintains 3 pre-warmed daemon workers that share the HADES package load. Each daemon is a separate R process with ~3GB memory footprint (R heap + JVM heap).

s6-overlay for process supervision. The legacy container ran bare Rscript as PID 1. If the process crashed, Docker's restart policy would recreate the container — a 60-second cold start including HADES package loading. With s6-overlay, PID 1 is a proper init system. If the Plumber process crashes, s6 restarts it inside the same container in seconds. The JVM stays warm. The JDBC driver stays loaded.

The new architecture:

┌──────────────────────────────────────────────────────────────┐
│  Docker: parthenon-darkstar (s6-overlay as PID 1)            │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  s6-overlay (init system, signal handling, supervision) │  │
│  │    └─ plumber2 event loop (non-blocking)               │  │
│  │         ├─ /health      → instant (deep validation)    │  │
│  │         ├─ /estimation  → dispatched to mirai daemon   │  │
│  │         ├─ /prediction  → dispatched to mirai daemon   │  │
│  │         └─ /sccs        → dispatched to mirai daemon   │  │
│  │                                                        │  │
│  │  mirai daemon pool:                                    │  │
│  │    ├─ daemon 1: [IDLE]     ← ready for work            │  │
│  │    ├─ daemon 2: [RUNNING CohortMethod, 12min elapsed]  │  │
│  │    └─ daemon 3: [IDLE]     ← ready for work            │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  Memory: 32GB limit (~3GB per daemon + 3GB event loop)       │
│  JDBC: socketTimeout=300s, connectTimeout=30s, tcpKeepAlive  │
│  JVM: G1GC, -Xmx8g, MaxGCPauseMillis=200ms                 │
│  Health: 30s interval, deep validation, degraded state       │
│  Crash recovery: s6 auto-restart, exit code/signal logging   │
└──────────────────────────────────────────────────────────────┘

The migration required rewriting all 12 endpoint files from Plumber v1 syntax to Plumber2's router API. The core analysis logic — the HADES function calls, the SQL generation, the result transformation — remained untouched. Only the HTTP layer changed.

The Dockerfile went from a simple install.packages("plumber") to a multi-stage, 7-layer build:

Layer	Contents	Purpose
1	plumber2, mirai, rJava, duckdb	Native compilation (Rust toolchain for plumber2's waysign dependency)
2	DatabaseConnector, SqlRender, Andromeda	OHDSI connectivity
3	Cyclops, FeatureExtraction	Analytics core
4	CohortMethod, PLP, SCCS, EvidenceSynthesis	HADES analysis packages
5	DeepPatientLevelPrediction	Deep learning (optional)
6	CohortDiagnostics, CohortGenerator	Cohort tools
7	Strategus	Study orchestration

Each layer is cached independently. A code change in the R API files only rebuilds the final application stage — a 30-second rebuild instead of the 45-minute full HADES compilation.

Phase 6: Namespace Warmup (March 19)

One last optimization. Cold start time was ~60 seconds because R lazy-loads package namespaces on first use. The first health check after boot would take 8 seconds instead of 118ms because it triggered CohortMethod compilation.

I added a build-time warmup step that forces all HADES packages to compile their bytecode during docker build:

RUN Rscript -e " \
  suppressMessages({ \
    library(rJava); .jinit(); \
    library(DatabaseConnector); \
    library(CohortMethod); \
    library(PatientLevelPrediction); \
    library(SelfControlledCaseSeries); \
    library(EvidenceSynthesis); \
  }); \
"

This moved the compilation cost from runtime to build time. Cold start dropped from 60 seconds to ~40 seconds.

The Benchmark

On March 19, I ran the legacy container (Plumber v1, pre-hardening commit c76884236) and Darkstar side by side against the same OMOP CDM database, executing the same CohortMethod estimation spec.

Health Probe Responsiveness During Analysis

Both containers ran a 2-minute analysis. I probed /health every 5 seconds during execution.

Metric	Legacy	Darkstar	Change
Health probes OK	13/24 (54%)	17/24 (71%)	+31%
Probes blocked	11/24 (46%)	7/24 (29%)	36% fewer
Max consecutive blocked	11 (55s dark)	7 (35s)	20s faster recovery

The legacy container went completely dark for 55 seconds straight — nearly a minute where no request of any kind could be served. Darkstar recovered responsiveness 20 seconds sooner. The remaining 35-second blocking window happens during the synchronous JDBC connection establishment and initial SQL burst, which locks the R process handling the request. Once CohortMethod transitions to its computation phase, the plumber2 event loop regains control and health probes resume.
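The probe harness isn't shown in the post. Here is a stdlib sketch of one way to collect and summarize such a run: 24 probes, one every 5 seconds, with a probe counted as blocked if no HTTP 200 arrives within the 5-second window. That definition of "blocked" and all function names are assumptions.

```python
import http.client
import time
import urllib.request
from typing import Callable, List


def probe(url: str, timeout: float = 5.0) -> bool:
    """One /health probe: True only for an HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):  # refused, timed out, 5xx, ...
        return False


def collect(url: str, count: int = 24, interval: float = 5.0,
            check: Callable[[str, float], bool] = probe) -> List[bool]:
    results = []
    for _ in range(count):
        started = time.monotonic()
        results.append(check(url, interval))
        # Hold a fixed cadence even when a probe hangs until its timeout.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
    return results


def summarize(results: List[bool], interval: float = 5.0) -> dict:
    longest = run = 0
    for ok in results:
        run = 0 if ok else run + 1   # length of the current blocked streak
        longest = max(longest, run)
    ok_count = sum(results)
    return {
        "ok": ok_count,
        "blocked": len(results) - ok_count,
        "ok_pct": round(100 * ok_count / len(results)),
        "max_consecutive_blocked": longest,
        "longest_dark_s": longest * interval,
    }
```

The "dark" figures in the table follow the same arithmetic: 11 consecutive blocked probes × 5 s = 55 s, and 7 × 5 s = 35 s.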

Execution Performance

Both containers ran the identical pipeline: data extraction, covariate building, propensity score fitting. Both hit the same clinical error at the same point (high covariate-treatment correlation — a study design issue, not a container issue).

Metric	Legacy	Darkstar	Change
R execution time	102.8s	66.3s	35% faster
Wall time	168s	159s	5% faster

We attribute the 35% speedup to three sources: G1GC reducing GC pause overhead, namespace warmup eliminating first-request compilation, and the larger JVM heap reducing garbage collection frequency.

Cold Start

Container	Cold Start
Legacy	2s
Darkstar	4s

Darkstar is 2 seconds slower due to s6-overlay init and mirai daemon startup. This is a one-time cost at container creation — an acceptable tradeoff for crash recovery and process supervision.

The Bugs We Found Along the Way

Building Darkstar wasn't just an infrastructure project. Running real HADES analyses against real clinical data surfaced bugs that would have been invisible in a test environment.

1. Silent covariate exclusion bypass. CohortMethod::createCovariateSettings(excludedConceptIds = c(1234)) was being silently ignored because we were passing the IDs in the wrong argument position. Patients were getting propensity scores contaminated by the exposure concept.

2. CohortMethod v6 API break. Between v5 and v6, every function switched from positional arguments to Args objects: createPs(cohortMethodData, population) became createPs(cohortMethodData, population, createPsArgs = createCreatePsArgs()). Every endpoint needed updating.

3. jsonlite auto-unboxing. R's jsonlite::toJSON(auto_unbox = TRUE) was converting single-element vectors into scalar values. A cohort with one patient would serialize as "person_id": 42 instead of "person_id": [42]. Laravel's JSON decoder would then treat it as an integer instead of an array, breaking downstream processing.

4. PLP non-serializable objects. PatientLevelPrediction returns S3 objects with custom print methods, environment closures, and external pointers that jsonlite can't serialize. We had to write custom extraction functions to pull the numeric results out of the PLP result objects.

5. SCCS anchor normalization. The SCCS package expects era_start as an anchor value, but our frontend sent era start (no underscore). R silently accepted the invalid anchor and computed results with a different reference point.

Production Validation

Between March 7 and March 20, Darkstar processed 5 original research studies with 37 cohort definitions and 29 analysis configurations against a million-patient OMOP CDM:

CKD Progression Study — ACEi vs CCB comparative effectiveness on renal outcomes. 73K propensity-score matched pairs. HR=0.989. 9-14 minute execution.
Post-MI Secondary Prevention — Aspirin vs Clopidogrel on recurrent MACE. Stratified Cox regression with 12 negative control outcomes.
Prediabetes Metformin Study — Metformin vs watchful waiting on T2DM progression. PS-stratified Cox with 8 outcome definitions.
Statin Primary vs Secondary Prevention — IHD vs no-IHD composite MACE risk in statin users.
Hypertension vs Metabolic Syndrome — Multi-cohort MACE risk comparison with PS stratification.

Every analysis completed successfully. Every result was clinically plausible. Every execution was tracked through the Jobs page with live progress bars.

The Name

On March 20, 2026, we renamed parthenon-r to parthenon-darkstar. The old name was descriptive — "R runtime." The new name reflects what it became: a hardened, production-grade engine that runs in the background, processes the heaviest workloads in the stack, and never asks for attention.

Sixteen files changed. Zero breaking changes. The HADES packages don't know. The OMOP CDM doesn't know. The researchers don't know. They just see their analyses finish faster and more reliably than before.

What Darkstar Is

Capability	Legacy	Darkstar
Concurrent requests	1 (everything queues)	3 mirai daemons + event loop
Health monitoring	10-min interval, trivial check	30s interval, deep validation
Process supervision	None (bare Rscript as PID 1)	s6-overlay auto-restart
JDBC resilience	No timeouts	300s socket, 30s connect, TCP keepalive
Crash recovery	Docker restart → 60s cold start	s6 in-container restart → seconds
GC strategy	Default JVM GC (2-5s pauses)	G1GC with 200ms pause target
Memory management	Default R limits	24GB vector heap, 8GB JVM
Cold start	60s	40s (namespace warmup)
R execution speed	Baseline	35% faster

What's Next

Darkstar currently runs HADES analyses that were originally designed for batch execution on a single workstation. The next frontier is volcano plots for CodeWAS — running per-concept logistic regressions across thousands of OMOP concepts to generate effect estimates and p-values for phenome-wide association studies.

The infrastructure is ready. CohortMethod already produces hazard ratios and p-values per outcome. The mirai daemon pool can handle the concurrent workload. The async job registry can track thousands of sub-analyses. The D3 visualization layer in the frontend has forest plot patterns ready to extend.
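
The fan-out described here can be sketched independently of R: many sub-analyses, a fixed number of workers, and a registry that records every job's state. The TypeScript below is a hypothetical illustration of the scheduling logic only. runPool, SubAnalysis and the registry shape are invented names, the pool size of 3 simply mirrors the three mirai daemons in the table above, and each task's run function stands in for whatever call actually dispatches work to Darkstar.

```typescript
// Hypothetical sketch: fan out sub-analyses through a fixed-size worker pool
// while a registry records the state of every job.
type JobStatus = "queued" | "running" | "done" | "failed";

interface JobRecord<R> {
  id: string;
  status: JobStatus;
  result?: R;
  error?: string;
}

interface SubAnalysis<R> {
  id: string;
  run: () => Promise<R>; // stands in for a call to the R runtime
}

async function runPool<R>(tasks: SubAnalysis<R>[], limit: number): Promise<Map<string, JobRecord<R>>> {
  const registry = new Map<string, JobRecord<R>>();
  for (const task of tasks) registry.set(task.id, { id: task.id, status: "queued" });

  let next = 0; // claimed synchronously, so two workers never take the same task
  const worker = async (): Promise<void> => {
    while (next < tasks.length) {
      const task = tasks[next++];
      const record = registry.get(task.id)!;
      record.status = "running";
      try {
        record.result = await task.run();
        record.status = "done";
      } catch (err) {
        // Record the failure instead of aborting the batch or dropping it silently.
        record.status = "failed";
        record.error = err instanceof Error ? err.message : String(err);
      }
    }
  };

  // Never start more workers than there are tasks (but always at least one).
  const size = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: size }, () => worker()));
  return registry;
}
```

Storing failures in the registry rather than throwing means one bad concept cannot abort thousands of siblings; it only avoids failing silently if the caller then surfaces the failed records.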

Darkstar was built to handle exactly this kind of workload: computationally expensive, highly parallelizable, and too important to fail silently.

Darkstar is open source as part of Parthenon. The container definition, plumber2 API, and s6-overlay configuration are in docker/r/ and r-runtime/ in the Parthenon repository.

Workbench Launcher and the Single-Database Migration Plan

Fri, 20 Mar 2026 00:00:00 GMT

A big architectural day on Parthenon: we shipped the new Workbench launcher experience and drafted the formal plan to collapse our multi-database mess into a single, schema-isolated parthenon database. Once the migration lands, the codebase should be noticeably cleaner.

The old FinnGen entry point was awkward — a toolset dropdown buried inside the FinnGen UI that never made much sense to new users. Today we replaced it with a proper Workbench launcher.

What changed

Routing was restructured cleanly:

/workbench → new WorkbenchLauncherPage (the hub)
/workbench/finngen → FinnGen tool (previously at /workbench)

This gives us a scalable pattern. Every future tool gets its own sub-route under /workbench/*, and the launcher is the natural entry point rather than an afterthought.

WorkbenchLauncherPage renders a responsive toolset grid. Each tile is a ToolsetCard component that displays the tool's name, description, status badge (e.g., Active, Beta, Coming Soon), and an accent glow tied to the tool's color identity. The cards pull from a central toolset registry — a typed ToolsetDescriptor array that will be the single source of truth as we add more tools. Adding a new tool to the Workbench is now a one-liner in the registry.
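
A typed registry along these lines might look like the sketch below. Only ToolsetDescriptor, the /workbench/* routing pattern, and the badge labels come from the post; the field names, the helper functions, and both example entries are assumptions for illustration.

```typescript
// Hypothetical shape of a Workbench toolset registry.
type ToolsetStatus = "Active" | "Beta" | "Coming Soon";

interface ToolsetDescriptor {
  slug: string; // becomes the sub-route under /workbench/*
  name: string;
  description: string;
  status: ToolsetStatus; // rendered as the card's status badge
  accentColor: string; // drives the card's accent glow
}

// Single source of truth: adding a tool means adding one entry here.
// Both entries are placeholders, not the real registry contents.
const TOOLSETS: readonly ToolsetDescriptor[] = [
  { slug: "finngen", name: "FinnGen", description: "Placeholder description", status: "Active", accentColor: "#2563eb" },
  { slug: "example-tool", name: "Example Tool", description: "Placeholder for a planned tool", status: "Coming Soon", accentColor: "#9333ea" },
];

// Derive launcher routes from the registry instead of maintaining them by hand.
function toolsetRoutes(registry: readonly ToolsetDescriptor[]): string[] {
  return registry.map((tool) => `/workbench/${tool.slug}`);
}

// "Coming Soon" cards can render in the grid but should not link anywhere yet.
function isLaunchable(tool: ToolsetDescriptor): boolean {
  return tool.status !== "Coming Soon";
}
```

Deriving routes from the same array the grid renders keeps the launcher and the router from drifting apart as tools are added.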

Sidebar navigation was updated to always show the Workbench link regardless of context. Previously it only appeared when you were already inside a Workbench tool, which made discoverability poor. Workbench is now a first-class citizen in the nav.

Inside FinnGen, the toolset dropdown has been replaced with a simple back-to-Workbench link. The UI is significantly less cluttered.

Studies: Phase B Integration Test Complete

The studies module passed its Phase B integration checkpoint today. Seven cohorts were generated and their counts were verified against data exploration results. This is exactly the kind of manual validation checkpoint that catches discrepancies between the cohort generation pipeline and what's actually in the CDM — worth calling out explicitly in the log. Phase B passing means we're on track for the next milestone.
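
The manual check described above can also be expressed as a small reconciliation step, so a future run flags drift automatically. This is a hypothetical TypeScript sketch: the input shape (cohort name mapped to a count) and the exact-match rule are assumptions, not a description of how the Phase B check was actually performed.

```typescript
// Hypothetical sketch: compare generated cohort counts with counts seen in data exploration.
interface CountMismatch {
  cohort: string;
  generated: number | undefined;
  explored: number | undefined;
}

// Read only own properties, so names like "constructor" are never picked up from the prototype.
function ownCount(counts: Record<string, number>, key: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(counts, key) ? counts[key] : undefined;
}

function reconcileCohortCounts(
  generated: Record<string, number>,
  explored: Record<string, number>,
): CountMismatch[] {
  // Union of cohort names from both sides, without duplicates.
  const names = Object.keys(generated)
    .concat(Object.keys(explored))
    .filter((name, i, all) => all.indexOf(name) === i);

  const mismatches: CountMismatch[] = [];
  for (const cohort of names) {
    const g = ownCount(generated, cohort);
    const e = ownCount(explored, cohort);
    // A cohort missing on either side is a discrepancy too, not just a wrong number.
    if (g === undefined || e === undefined || g !== e) {
      mismatches.push({ cohort, generated: g, explored: e });
    }
  }
  return mismatches.sort((a, b) => a.cohort.localeCompare(b.cohort));
}
```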

The Single-Database Migration Plan

This is the most consequential architectural decision documented today, even though the implementation work starts tomorrow. The plan lives in single-database-migration-plan.md and the core idea is straightforward: one database, multiple schemas, full schema isolation.

The problem it solves

Parthenon currently ships with two physical databases (ohdsi and a secondary) and seven named Laravel connections, with search paths scattered across fifteen-plus .env variables. This configuration has caused repeated data-loss incidents in the past — usually because an environment was misconfigured and a query landed in the wrong schema. It's also a documentation and onboarding nightmare.

The target architecture

Everything moves into a single parthenon database with schemas doing the isolation work:

Schema	Purpose
`app`	Users, roles, cohorts, sources, studies, analyses
`omop`	CDM + vocabulary (standard OHDSI layout)
`results`	Achilles / DQD output
`gis`	Geospatial tables
`eunomia`	Demo dataset
`eunomia_results`	Demo Achilles results
`public`	Laravel internals (migrations, jobs, cache)

Seven connections collapse to five, and all five point at the same database. The cdm and vocab connections merge into omop. The docker_pg connection goes away entirely.

The .env simplification

Before: 15+ database variables, some redundant, some subtly wrong across environments. After:

DB_HOST=pgsql.acumenus.net
DB_PORT=5432
DB_DATABASE=parthenon
DB_USERNAME=smudoshi
DB_PASSWORD=acumenus

Search paths are hardcoded in database.php because they're structural — a given connection always hits the same schema regardless of environment. The only thing that legitimately varies per environment is the host, credentials, and database name. Separating structural config from environmental config is the right call here and will prevent an entire class of misconfiguration bugs.
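
The split between structural and environmental settings is language-agnostic, so it can be sketched outside Laravel. The TypeScript below illustrates the principle and is not a port of database.php: search paths are fixed constants per connection, and only the five DB_* variables are read from the environment. The connection names and their search paths are assumptions loosely based on the schema table above, since the post does not enumerate the final five connections.

```typescript
// Structural config: fixed per connection, identical in every environment.
// Connection names and search paths are hypothetical.
const SEARCH_PATHS = {
  app: ["app", "public"],
  omop: ["omop"],
  results: ["results"],
  gis: ["gis"],
  eunomia: ["eunomia", "eunomia_results"],
} as const;

type ConnectionName = keyof typeof SEARCH_PATHS;

interface ConnectionConfig {
  host: string;
  port: number;
  database: string;
  username: string;
  password: string;
  searchPath: string;
}

// Environmental config: the only values allowed to vary between deployments.
const REQUIRED_ENV = ["DB_HOST", "DB_PORT", "DB_DATABASE", "DB_USERNAME", "DB_PASSWORD"] as const;

function buildConnections(env: Record<string, string | undefined>): Record<ConnectionName, ConnectionConfig> {
  const missing = REQUIRED_ENV.filter((key) => !env[key]);
  if (missing.length > 0) throw new Error(`Missing database env vars: ${missing.join(", ")}`);
  const port = Number(env["DB_PORT"]);
  if (!Number.isInteger(port)) throw new Error(`DB_PORT is not an integer: ${env["DB_PORT"]}`);

  const connections = {} as Record<ConnectionName, ConnectionConfig>;
  for (const name of Object.keys(SEARCH_PATHS) as ConnectionName[]) {
    // Every connection shares host and credentials; only the search path differs.
    connections[name] = {
      host: env["DB_HOST"]!,
      port,
      database: env["DB_DATABASE"]!,
      username: env["DB_USERNAME"]!,
      password: env["DB_PASSWORD"]!,
      searchPath: SEARCH_PATHS[name].join(","),
    };
  }
  return connections;
}
```

Because a missing variable fails at startup with the variable's name, a misconfigured environment cannot quietly route queries to the wrong schema.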

What's Next

Single-database migration implementation — update database.php, migrate .env templates, write the schema consolidation migrations, and test against the full connection matrix.
Workbench registry expansion — now that the ToolsetDescriptor pattern is in place, start populating the registry with upcoming tools currently in planning.
Studies Phase C — Phase B is green, Phase C begins.
FinnGen UX polish — the back-to-Workbench link is functional but the transition animation needs work.

Today felt like a good cleanup-and-foundation day. The Workbench has real bones, and we have a credible plan for a database architecture that won't bite us anymore.

Evidence Investigation Goes Full-Stack: FinnGen Retirement, Multi-Dataset Morpheus, and the Road to Volcano Plots

Fri, 20 Mar 2026 00:00:00 GMT

A massive 116-commit push today centered almost entirely on maturing the Evidence Investigation workbench, from retiring the old FinnGen UI to hardening the investigation experience with proper navigation, KPI metrics, URL-synced state, and ARIA accessibility. We also landed multi-dataset support in Morpheus and set the stage for one of the most requested features on the roadmap: volcano plots powered by the newly renamed Darkstar R runtime.

FinnGen Workbench Retirement

The legacy FinnGen workbench has been officially decommissioned. The dedicated FinnGen card on the workbench landing page (c41f7afbc) now launches Evidence Investigation instead, and the old FinnGen workbench code has been removed entirely (a667f94ca). This isn't just a cleanup — it's a consolidation of intent. Evidence Investigation is the unified surface for exploring GWAS signals, phenotype associations, and concept-level evidence, and FinnGen data fits naturally within that framing rather than deserving its own siloed experience.

If you're working on workbench routing, note that the card rewiring lives in the workbench feature directory and the deprecated component has been fully pruned, so there's no dead code to worry about.

Evidence Investigation: A Day of Hardening

The bulk of today's commits were focused on making Evidence Investigation feel like a production-grade tool rather than a prototype. Here's what changed:

Navigation & Layout (`ab7530fe4`)

The investigation view now has a proper top bar with a title, breadcrumb trail, and back navigation. This sounds small but it's critical UX — users were getting lost when drilling into sub-views with no clear path back to the workbench. The breadcrumbs also provide context for where a particular evidence thread lives within the broader investigation.

KPI Metrics & ContextCard (`0b2a2185c`)

The ContextCard component was significantly enhanced to surface KPI metrics — high-level summary statistics that orient the analyst before they dive into domain-level evidence. URL-synced sub-tabs were also wired in here, meaning deep links into specific sub-views now work correctly and browser history behaves as expected.

URL-Synced Domain, Sidebar States & Error Handling (`fcf5c919c`)

Domain selection is now reflected in the URL, so sharing a link to "I'm looking at the Drug domain for concept X" actually works. Sidebar loading states were added to prevent the jarring empty-panel flash during data fetches, and error handling around query execution ensures analysts see a meaningful message rather than a silent failure when a backend query goes wrong.
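
A minimal sketch of URL-synced view state using the standard URLSearchParams API. The parameter names (domain, tab), the domain list, and the fallback values are assumptions; in a React app these helpers would typically sit behind the router's search-params hook rather than being called directly.

```typescript
// Hypothetical domain list; values in a stale or hand-edited link are validated against it.
const DOMAINS = ["Condition", "Drug", "Measurement", "Procedure"] as const;
type Domain = (typeof DOMAINS)[number];

interface InvestigationViewState {
  domain: Domain;
  tab: string;
}

const DEFAULT_VIEW: InvestigationViewState = { domain: "Condition", tab: "overview" };

// Parse the query string into view state, tolerating missing or unknown parameters.
function readViewState(search: string): InvestigationViewState {
  const params = new URLSearchParams(search);
  const rawDomain = params.get("domain");
  const known = DOMAINS as readonly string[];
  const domain = known.indexOf(rawDomain ?? "") !== -1 ? (rawDomain as Domain) : DEFAULT_VIEW.domain;
  return { domain, tab: params.get("tab") || DEFAULT_VIEW.tab };
}

// Serialize view state back while preserving unrelated parameters (e.g. a concept id).
function writeViewState(search: string, state: InvestigationViewState): string {
  const params = new URLSearchParams(search);
  params.set("domain", state.domain);
  params.set("tab", state.tab);
  return `?${params.toString()}`;
}
```

An unknown domain falls back to the default instead of rendering an empty panel, and browser history works because every state change is just a new query string.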

LeftRail, ARIA & Responsive Layout (`6b3b25811`)

The LeftRail component received attention on three fronts: clickable counts (so analysts can click a domain count to navigate directly to it), a sidebar badge showing active evidence pins, and a full pass of ARIA roles for screen reader compatibility. Responsive layout fixes round this out — the investigation view now holds together on narrower viewports.

GWAS Catalog Endpoints & EvidencePinService (`7514f14e6`, `d1e310592`)

Two targeted fixes corrected the GWAS Catalog API calls to use the proper findByDiseaseTrait and findByGene endpoints, and EvidencePinService was updated to correctly thread concept_ids and gene_symbols through to those calls. These were silent failures before — the UI looked fine but no GWAS data was actually being fetched.

Morpheus: Multi-Dataset Support (`f86ec2342`)

Morpheus gained a dataset selector, parameterized queries, and a registry table today. Previously, Morpheus queries ran against a single implicit dataset — a significant limitation for any platform claiming to be multi-CDM. The dataset selector allows analysts to choose which CDM they're querying against, the queries are now parameterized accordingly, and the registry table tracks which datasets have been analyzed. This is foundational infrastructure for the cross-CDM comparison workflows that are coming later this quarter.
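
One detail worth getting right in a multi-dataset query layer: bind parameters cover values, not identifiers, so the CDM schema for the selected dataset cannot be passed as a $1 placeholder. A common approach, sketched below with hypothetical names, is to resolve the schema from the registry, validate it against a strict identifier pattern, quote it, and bind everything else. The registry rows, dataset keys, and the visit query are invented for illustration.

```typescript
interface DatasetEntry {
  key: string; // value sent by the dataset selector
  cdmSchema: string; // schema holding that dataset's CDM tables
}

// Hypothetical registry rows; in the app these would come from the registry table.
const DATASET_REGISTRY: readonly DatasetEntry[] = [
  { key: "demo", cdmSchema: "eunomia" },
  { key: "primary", cdmSchema: "omop" },
];

interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

const SAFE_IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

function visitCountQuery(datasetKey: string, visitConceptId: number): ParameterizedQuery {
  const entry = DATASET_REGISTRY.find((d) => d.key === datasetKey);
  if (!entry) throw new Error(`Unknown dataset: ${datasetKey}`);
  // Identifiers cannot be bound, so validate and quote the schema name explicitly.
  if (!SAFE_IDENTIFIER.test(entry.cdmSchema)) throw new Error(`Unsafe schema name: ${entry.cdmSchema}`);
  return {
    text: `SELECT count(*) AS n FROM "${entry.cdmSchema}".visit_occurrence WHERE visit_concept_id = $1`,
    values: [visitConceptId], // user-controlled values always travel as parameters
  };
}
```

The { text, values } shape happens to match the query config object accepted by node-postgres, but nothing here depends on a particular driver.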

PostgreSQL Numeric Type Fix (`aa02db2be`)

A subtle but painful bug: durationHours was coming back from PostgreSQL as a string-typed numeric, causing downstream arithmetic to silently produce NaN. Wrapping it in Number() is a one-line fix, but finding it required actually debugging a Morpheus duration calculation that was returning nonsense values. Worth noting for anyone writing queries against PostgreSQL columns that look like numbers but arrive as strings in certain ORM/driver configurations.
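
Number() fixes the immediate symptom, but it has quiet failure modes of its own: Number(null) and Number("") both return 0, and anything unparseable becomes NaN. For context, node-postgres returns numeric and bigint columns as strings by default, so the string case is common. A slightly stricter helper, sketched below under a hypothetical name, converts numeric strings while refusing to turn missing or malformed data into a plausible-looking number.

```typescript
// Hypothetical helper: coerce values such as PostgreSQL numerics that arrive as strings.
function toFiniteNumber(value: unknown, field: string): number {
  if (typeof value === "number" && Number.isFinite(value)) return value;
  if (typeof value === "string" && value.trim() !== "") {
    const parsed = Number(value);
    if (Number.isFinite(parsed)) return parsed;
  }
  // null, undefined, "", "abc", NaN and Infinity land here instead of silently becoming 0 or NaN.
  throw new TypeError(`${field} is not a finite number: ${String(value)}`);
}
```

Called as toFiniteNumber(row.durationHours, "durationHours") (row being whatever object the query returns), a bad value surfaces immediately as a named error. The guard matters beyond NaN: with a string operand, "12.5" + 1 evaluates to "12.51" because + concatenates.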

On the Horizon: Volcano Plots via Darkstar

Today's work also laid groundwork, documented in volcano-plot-darkstar-handoff.md, for what comes next. The CodeWASResults.tsx component currently renders a placeholder where an interactive volcano plot will live. The blocker hasn't been the visualization layer; it's been the data. The current CodeWAS backend returns only {label, count} aggregate signals, with no per-concept statistical significance data.

That changes with Darkstar. The R runtime container (recently renamed from parthenon-r to parthenon-darkstar, service name darkstar in docker-compose) already computes per-outcome {log_hr, p_value, ci_95_lower, ci_95_upper} via CohortMethod in r-runtime/api/estimation.R. The plumbing to call it from Laravel is straightforward — config('services.r_runtime.url') resolves to http://darkstar:8787. The implementation task is connecting CodeWAS results to a new Darkstar endpoint and rendering the volcano plot with those coordinates.
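
Mapping Darkstar's per-outcome estimates onto volcano-plot axes is mostly arithmetic: effect size on x, -log10(p) on y, and a horizontal line at the significance threshold. The TypeScript below is a hypothetical sketch. It uses the field names quoted above plus an invented concept_id key, assumes log_hr is a natural log (as Cox model coefficients are), and applies a Bonferroni correction as one conventional choice when thousands of concepts are tested; the post does not specify a threshold.

```typescript
// Hypothetical sketch: turn per-outcome estimates into volcano-plot coordinates.
interface OutcomeEstimate {
  concept_id: number; // assumed identifier field
  log_hr: number; // assumed natural-log hazard ratio
  p_value: number;
  ci_95_lower: number;
  ci_95_upper: number;
}

interface VolcanoPoint {
  conceptId: number;
  x: number; // effect size (log HR)
  y: number; // -log10(p)
  significant: boolean;
}

function toVolcano(
  estimates: OutcomeEstimate[],
  alpha = 0.05,
): { points: VolcanoPoint[]; yThreshold: number } {
  // Bonferroni correction: one test per concept.
  const threshold = alpha / Math.max(estimates.length, 1);
  const points = estimates
    // Drop rows the model could not estimate (NaN/Infinity, p outside [0, 1]).
    .filter((e) => Number.isFinite(e.log_hr) && e.p_value >= 0 && e.p_value <= 1)
    .map((e) => ({
      conceptId: e.concept_id,
      x: e.log_hr,
      // p can underflow to 0; clamp so the point stays on the chart.
      y: -Math.log10(Math.max(e.p_value, Number.MIN_VALUE)),
      significant: e.p_value < threshold,
    }));
  // Horizontal guide line for the significance threshold.
  return { points, yThreshold: -Math.log10(threshold) };
}
```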

What's Next

Volcano plot implementation — wire CodeWASResults.tsx to Darkstar's estimation endpoint and render a proper interactive log_HR vs -log10(p) scatter plot with significance thresholds
Cross-CDM comparison in Morpheus — the dataset registry table sets up the UI; the backend aggregation layer needs to follow
Evidence Investigation polish — the pin/unpin workflow and evidence export are the two remaining rough edges before this can be considered feature-complete
Darkstar endpoint expansion — PatientLevelPrediction feature importance scores are available in the container but not yet surfaced anywhere in the frontend; a feature importance panel for PLP models is a natural next step

Today was a grind in the best sense — lots of small fixes that collectively make Evidence Investigation feel solid enough to hand to a real analyst. The foundation is there. Now we build upward.

Fortifying Parthenon: Codebase Health Audit, E2E Regression Guards, and the StudyAgent Fork

Thu, 19 Mar 2026 00:00:00 GMT

A big day on the quality and resilience front: 34 commits landed in Parthenon focused on a comprehensive codebase health audit, a major expansion of our Playwright E2E test suite, and a fork of the StudyAgent submodule. No flashy new features today — instead, we did the unglamorous but essential work of making sure what we've already built actually works, is safe to change, and won't silently break in production.

Full Codebase Health Audit

The day started with a full audit of the Parthenon codebase, documented in e2e-regression-guard-plan.md and the accompanying devlog entry. The audit surfaced several deferred issues across type safety, modal consistency, error handling, and empty state guidance — the kind of paper cuts that accumulate invisibly until they become real user-facing bugs.

Four of those audit items were resolved today, most of them in a single focused fix commit (f3359b5a5):

Type safety gaps — tightened TypeScript types in areas where `any` or loose inference had crept in
Modal consistency — standardized modal open/close behavior across components that had drifted from the shared pattern
Empty state guidance — added meaningful empty states where components were previously rendering blank space or [object Object] to end users
Error handling deduplication — Phase 6 of the audit cleanup (5e621c8db) extracted a shared getErrorMessage utility to replace scattered, inconsistent error-to-string coercions throughout the app

The getErrorMessage refactor is small but high-leverage: previously, different parts of the codebase handled thrown errors differently (some assumed Error objects, others assumed strings, some did nothing). Centralizing that logic means we get consistent, human-readable error messages everywhere without thinking about it.
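
The post does not show the utility itself, so the following is a hypothetical TypeScript sketch of what a getErrorMessage helper along these lines might do: prefer Error.message, accept thrown strings, look one level into API-style envelopes, and never return "[object Object]". The envelope shapes ({ message } and { error: { message } }) and the fallback text are assumptions.

```typescript
// Hypothetical sketch of a shared getErrorMessage utility; the real one may differ.
function getErrorMessage(error: unknown, fallback = "Something went wrong."): string {
  // Thrown Error instances (including subclasses) carry the best message.
  if (error instanceof Error && error.message) return error.message;
  // Some code throws or rejects with plain strings.
  if (typeof error === "string" && error.trim() !== "") return error;
  if (typeof error === "object" && error !== null) {
    // Assumed API envelope shapes: { message } or { error: { message } }.
    const envelope = error as { message?: unknown; error?: { message?: unknown } };
    if (typeof envelope.message === "string" && envelope.message) return envelope.message;
    const nested = envelope.error;
    if (nested && typeof nested === "object" && typeof nested.message === "string" && nested.message) {
      return nested.message;
    }
  }
  // Never fall back to String(error), which is how "[object Object]" reaches the UI.
  return fallback;
}
```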

E2E Test Suite Expansion

The audit made one thing painfully clear: several production bugs (a crashing FHIR Export page, ingestion jobs rendering as [object Object], hardcoded sourceId values in genomics, and gene filter buttons that didn't actually filter) would have been caught immediately by E2E tests. They weren't caught because those E2E tests didn't exist.

We fixed that today with three test commits:

Smoke Suite: 53 Tests, 8 New Routes (`f0c10e804`)

The smoke suite now covers 53 routes, up from the previous 29. New routes added include /admin/fhir-export (now tested in its "coming soon" state), /admin/solr, and several others that had been added to the app but never wired into the test harness. Critically, we added [object Object] detection — every page load now asserts that the rendered text does not contain the literal string [object Object], catching serialization bugs before they reach users.
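
In Playwright, the page-level check can be a single web-first assertion such as expect(page.locator("body")).not.toContainText("[object Object]"). When the same check is wanted in unit tests or log scanning, a framework-free helper that reports where the leak occurs makes failures easier to act on. The helper below is a hypothetical sketch; its name and the context width are arbitrary choices.

```typescript
const LEAK_MARKER = "[object Object]";

// Hypothetical helper: return a short excerpt around every occurrence of the marker.
// An empty array means the text is clean.
function findObjectLeaks(text: string, context = 30): string[] {
  const excerpts: string[] = [];
  let from = 0;
  for (;;) {
    const at = text.indexOf(LEAK_MARKER, from);
    if (at === -1) break;
    const start = Math.max(0, at - context);
    const end = Math.min(text.length, at + LEAK_MARKER.length + context);
    // Collapse whitespace so multi-line page text yields a readable one-line excerpt.
    excerpts.push(text.slice(start, end).replace(/\s+/g, " ").trim());
    from = at + LEAK_MARKER.length;
  }
  return excerpts;
}
```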

Regression Guard Specs: 7 Specs for Audit Findings (`7c7054683`)

Each bug surfaced in the audit now has a dedicated regression guard spec. The spec table maps directly to the audit findings:

Bug	Guard test
Ingestion API envelope not unwrapped	Verify job list renders real text, not `[object Object]`
Gene buttons don't filter	Click gene → assert ClinVar input contains gene name
History loses query metadata	Generate query → open history → assert explanation is non-empty
Dashboard rows not keyboard-accessible	Tab to row → Enter → assert navigation occurred

These aren't happy-path tests — they're specifically designed to catch regressions of known past failures.

Cross-Feature Journey Tests: 12 Tests Across 5 Specs (`eca464bd4`)

Beyond regression guards, we added 5 cross-feature journey specs covering end-to-end user workflows that span multiple modules. These are the tests most likely to catch integration breakage when two independently-working features interact unexpectedly.

One important housekeeping fix also landed here: 15e7ea23d ensures that admin@acumenus.net is never used in auth E2E tests. Using a shared admin account in parallel test runs is a classic source of flaky, order-dependent failures. Auth tests now use isolated test credentials.

Implementation Plan Documented (`f0c10e804`)

The full three-phase E2E regression guard rollout plan is now documented in docs/e2e-regression-guard-plan.md. Phase 1 (fix and baseline existing tests) is largely complete. Phases 2 and 3 — Page Object Model implementation and CI enforcement — are queued for the coming days.

StudyAgent Submodule Fork

f4cec79c5 forks the study-agent submodule from its upstream source to sudoshi/StudyAgent. This gives us full control over the StudyAgent codebase — we can apply Parthenon-specific changes, pin dependencies, and iterate without waiting on upstream. The fork is now the canonical submodule reference going forward.

What's Next

Phase 2 of the E2E plan: Implement the Page Object Model architecture outlined in E2E_TEST_PLAN.md. The infrastructure is in place; we need to lift the raw selector strings in specs into reusable page objects.
CI enforcement: Wire the expanded smoke and regression guard suites into the CI pipeline so no PR can merge if E2E tests are red. Today we proved the tests find real bugs — next step is making them mandatory.
StudyAgent integration: With the fork in place, begin adapting StudyAgent to Parthenon's auth and data model conventions.
Remaining audit items: The health audit flagged more than the four items fixed today. We'll work through the backlog systematically, using the regression guard framework to ensure fixes stay fixed.
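
For Phase 2, the core idea of a page object is that specs call intent-level methods while selectors live in one place. The sketch below is hypothetical: it programs against a minimal PageLike interface (a small subset of Playwright's Page: goto, click and inputValue) so it can run without a browser, and the route and data-testid selectors for the gene-filter guard are invented for illustration.

```typescript
// Minimal slice of Playwright's Page API used below (an assumption for this sketch,
// kept as an interface so the example runs without a browser).
interface PageLike {
  goto(url: string): Promise<unknown>;
  click(selector: string): Promise<void>;
  inputValue(selector: string): Promise<string>;
}

// Hypothetical page object for the gene-filter regression guard.
// The route and data-testid selectors are invented for illustration.
class GenomicsPage {
  static readonly selectors = {
    geneButton: (gene: string) => `[data-testid="gene-button-${gene}"]`,
    clinvarInput: '[data-testid="clinvar-gene-input"]',
  };

  readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async open(): Promise<void> {
    await this.page.goto("/genomics");
  }

  async selectGene(gene: string): Promise<void> {
    await this.page.click(GenomicsPage.selectors.geneButton(gene));
  }

  async clinvarFilterValue(): Promise<string> {
    return this.page.inputValue(GenomicsPage.selectors.clinvarInput);
  }
}
```

A spec then reads as intent (select a gene, assert the ClinVar filter value), and a selector change touches one class instead of every spec that used the raw string.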

Today was a day of paying down technical debt with receipts — every bug we documented got a test, every inconsistency got a fix, and the codebase came out measurably more trustworthy than it started.

