<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Parthenon Blog</title>
        <link>http://localhost:8082/docs/blog</link>
        <description>Parthenon Blog</description>
        <lastBuildDate>Thu, 31 Dec 2099 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Introducing Parthenon: Transforming Healthcare with AI-Powered Outcomes Research]]></title>
            <link>http://localhost:8082/docs/blog/introducing-parthenon</link>
            <guid>http://localhost:8082/docs/blog/introducing-parthenon</guid>
            <pubDate>Thu, 31 Dec 2099 00:00:00 GMT</pubDate>
            <description><![CDATA[Pinned — The founding vision for Parthenon, a next-generation unified outcomes research platform built on OMOP CDM v5.4.]]></description>
            <content:encoded><![CDATA[<blockquote>
<p><strong>Pinned Post</strong> | Originally published March 7, 2026</p>
</blockquote>
<p>Outcomes research has evolved alongside the broader arc of healthcare analytics infrastructure. Early siloed clinical systems produced fragmented administrative and claims data with limited analytic utility. That data was adequate for billing but structurally unsuitable for longitudinal cohort construction or comparative effectiveness work. The Meaningful Use era expanded the availability of structured clinical data. Interoperability failures, however, kept patient journeys fractured across institutional boundaries, which undermined the real-world evidence studies that outcomes researchers depend on. The shift to integrated analytics platforms, particularly the adoption of common data models like the OHDSI OMOP CDM, marked a genuine inflection point: federated network studies, standardized phenotyping, and reproducible retrospective analyses became operationally feasible at scale. Now a fourth generation is taking shape. In it, AI-augmented clinical intelligence moves outcomes research from retrospective description toward prospective, near-real-time evidence generation. That enables dynamic cohort surveillance, treatment heterogeneity detection, and value-based care signal identification, capabilities that were previously impractical outside narrow clinical trial settings.</p>
<p>Parthenon is built for this fourth generation.</p>
<div style="border-radius:12px;overflow:hidden;margin-bottom:2rem"><img src="http://localhost:8082/docs/img/parthenon-hero.jpg" alt="The Parthenon" style="width:100%;display:block"></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-we-built-this">Why We Built This<a href="http://localhost:8082/docs/blog/introducing-parthenon#why-we-built-this" class="hash-link" aria-label="Direct link to Why We Built This" title="Direct link to Why We Built This">​</a></h2>
<p>The problems with traditional healthcare analytics infrastructure are well-documented but stubbornly persistent. Data fragmentation scatters patient information across EHR, laboratory, radiology, and claims platforms with inconsistent terminologies. Analytics teams are overwhelmed with routine reporting demands, leaving limited capacity for the strategic analysis that actually improves outcomes. And the insights that do emerge are retrospective — care gaps identified too late for optimal impact, interventions that are reactive rather than proactive.</p>
<p>The OHDSI community addressed part of this problem brilliantly. The OMOP Common Data Model standardizes clinical data across institutions. The HADES R packages encode decades of pharmacoepidemiology methodology. Atlas provides a visual interface for cohort building and analysis design. But the toolchain has grown to 15+ separately deployed applications and packages, including Atlas, WebAPI, Achilles, DQD, CohortGenerator, CohortMethod, and PatientLevelPrediction. Each has its own deployment, its own interface, and its own learning curve.</p>
<p>Parthenon replaces all of them with a single application.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-parthenon-does">What Parthenon Does<a href="http://localhost:8082/docs/blog/introducing-parthenon#what-parthenon-does" class="hash-link" aria-label="Direct link to What Parthenon Does" title="Direct link to What Parthenon Does">​</a></h2>
<p>At its core, Parthenon is a unified outcomes research platform built on OMOP CDM v5.4. A researcher can move through the entire real-world evidence lifecycle without leaving the browser: explore vocabularies and build concept sets, construct patient cohorts with a visual builder, then run characterization, incidence rates, treatment pathways, population-level estimation, patient-level prediction, self-controlled case series, and evidence synthesis.</p>
<p>But Parthenon extends well beyond what Atlas ever offered.</p>
<p><strong>Genomics</strong> — Upload VCF files, annotate variants against ClinVar, browse mutations in an interactive variant browser, and convene virtual tumor boards with AI-assisted interpretation. This bridges the gap between population-level observational research and precision medicine.</p>
<p><strong>Medical Imaging</strong> — View DICOM studies with a built-in Cornerstone3D viewer, connect to PACS archives via DICOMweb (WADO-RS), and incorporate imaging criteria directly into cohort definitions. Radiogenomics analysis becomes possible within the same platform where you run your epidemiological studies.</p>
<p><strong>Health Economics &amp; Outcomes Research</strong> — Model cost-effectiveness, identify care gaps across populations, and run economic analytics. The care gap module tracks screening compliance, flags missed interventions, and quantifies the financial impact of closing gaps at various capture rates.</p>
<p><strong>FHIR R4 Integration</strong> — Connect to EHR systems using SMART Backend Services authorization and FHIR Bulk Data export for automated bulk extraction and incremental sync. Clinical data flows from production EHR systems into your OMOP CDM without manual ETL intervention.</p>
<p><strong>AI-Assisted Analysis</strong> — An integrated AI service powered by Ollama and MedGemma provides semantic concept search, natural-language cohort suggestions, clinical result interpretation, and genomic variant summarization. The AI doesn't replace the researcher — it reduces the time between question and insight.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-architecture">The Architecture<a href="http://localhost:8082/docs/blog/introducing-parthenon#the-architecture" class="hash-link" aria-label="Direct link to The Architecture" title="Direct link to The Architecture">​</a></h2>
<p>Parthenon is a containerized multi-service application orchestrated with Docker Compose. The frontend is React 19 with TypeScript strict mode, Tailwind CSS, and Zustand for state management. The backend is Laravel 11 with PHP 8.4, using Sanctum authentication and Spatie role-based access control. A Python FastAPI service handles AI capabilities — MedGemma through Ollama, pgvector embeddings for semantic search. An R Plumber API executes HADES analyses — CohortMethod, PatientLevelPrediction, SelfControlledCaseSeries — against the CDM. PostgreSQL 16 stores both application data and the OMOP CDM across multiple schemas. Redis powers the job queue via Laravel Horizon. Solr provides full-text vocabulary search.</p>
<p>Eight Docker services, one <code>docker compose up -d</code> command. A Python installer walks you through configuration in nine phases — from preflight checks through admin account creation — with optional Eunomia demo data so you can start exploring immediately.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-ai-imperative">The AI Imperative<a href="http://localhost:8082/docs/blog/introducing-parthenon#the-ai-imperative" class="hash-link" aria-label="Direct link to The AI Imperative" title="Direct link to The AI Imperative">​</a></h2>
<p>The PDF that inspired this platform, <em>Transforming Healthcare Delivery: Next-Generation Clinical Analytics Powered by Artificial Intelligence</em>, makes the business case quantitatively. Six in ten American adults live with a chronic disease, driving $4.1 trillion in annual healthcare costs. Traditional monitoring of conditions like chronic kidney disease (CKD) achieves just 3% compliance across all seven recommended measures. According to the paper, AI-enhanced approaches have demonstrated a 267% improvement in compliance, prevention of 15–20 dialysis cases per year, and $3–4 million in annual cost savings per 10,000 patients.</p>
<p>The paper presents these as measured outcomes, not theoretical projections. They are the kind of results that become possible when you combine standardized clinical data (OMOP CDM), validated analytical methods (HADES), and machine learning that identifies patterns humans can't see at scale.</p>
<p>Parthenon's care gap module, population risk scoring, and predictive analytics are designed to deliver exactly this kind of impact — clinical decision support that anticipates patient needs rather than simply responding to events.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="building-in-public">Building in Public<a href="http://localhost:8082/docs/blog/introducing-parthenon#building-in-public" class="hash-link" aria-label="Direct link to Building in Public" title="Direct link to Building in Public">​</a></h2>
<p>This blog will serve as a daily development journal. Every day, we'll document what was built, what broke, what we learned, and what's next. The first technical post — about the <a href="http://localhost:8082/docs/blog/ohdsi-hades-r-runtime-lessons">five bugs we had to fix</a> before HADES analyses would run in production — is already live. It's the kind of hard-won knowledge that doesn't appear in any documentation, and we think sharing it openly makes the entire OHDSI ecosystem stronger.</p>
<p>We're also automating this process. A Claude Code agent reviews the day's git history every night and generates a narrative dev log post — not just a commit list, but a story about what the code changes mean and why they matter.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/introducing-parthenon#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>The platform's roadmap follows a four-phase journey. The foundation phase establishes data integration and baseline analytics. Core analytics introduces care bundles for high-impact conditions. Advanced capabilities bring full population health management with HCC coding optimization and clinical decision support integration. The transformation phase enables value-based care analytics, precision medicine, and continuously learning systems.</p>
<p>We're deep in the foundation and core analytics phases right now, shipping features daily. Follow this blog to watch it happen.</p>
<hr>
<p><em>Parthenon is open-source and available at <a href="https://github.com/sudoshi/Parthenon" target="_blank" rel="noopener noreferrer">github.com/sudoshi/Parthenon</a>. Built by <a href="https://www.acumenus.io/" target="_blank" rel="noopener noreferrer">Acumenus Data Sciences</a>.</em></p>]]></content:encoded>
            <category>announcement</category>
            <category>vision</category>
            <category>architecture</category>
            <category>ai</category>
            <category>healthcare</category>
        </item>
        <item>
            <title><![CDATA[100% Concept Coverage: How Parthenon Built MedDRA-Equivalent Clinical Navigation on SNOMED CT]]></title>
            <link>http://localhost:8082/docs/blog/ontology-parity-with-meddra</link>
            <guid>http://localhost:8082/docs/blog/ontology-parity-with-meddra</guid>
            <pubDate>Sun, 05 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Parthenon's Vocabulary Search now provides 100% navigational coverage of all 105,324 standard SNOMED CT Condition concepts through 27 curated clinical groupings — achieving functional parity with MedDRA's System Organ Class navigation while preserving SNOMED's superior clinical granularity. This is the story of diagnosing the SNOMED-OMOP domain boundary problem, engineering a cross-domain hierarchy builder, curating a clinically intelligent grouping layer, and systematically closing every coverage gap until no standard concept was left behind.]]></description>
            <content:encoded><![CDATA[<p>Parthenon's Vocabulary Search now provides <strong>100% navigational coverage</strong> of all 105,324 standard SNOMED CT Condition concepts through 27 curated clinical groupings — achieving functional parity with MedDRA's System Organ Class navigation while preserving SNOMED's superior clinical granularity. This is the story of diagnosing the SNOMED-OMOP domain boundary problem, engineering a cross-domain hierarchy builder, curating a clinically intelligent grouping layer, and systematically closing every coverage gap until no standard concept was left behind.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-problem-a-hierarchy-browser-that-made-no-clinical-sense">The Problem: A Hierarchy Browser That Made No Clinical Sense<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#the-problem-a-hierarchy-browser-that-made-no-clinical-sense" class="hash-link" aria-label="Direct link to The Problem: A Hierarchy Browser That Made No Clinical Sense" title="Direct link to The Problem: A Hierarchy Browser That Made No Clinical Sense">​</a></h2>
<p>Parthenon has a Browse Hierarchy tab in its Vocabulary Search page: a tree-style navigator that lets clinical researchers drill from high-level categories down to specific medical concepts. When we built it on April 3rd, we materialized OMOP's <code>concept_ancestor</code> relationships into a <code>vocab.concept_tree</code> table, producing 527,000 edges across six OMOP domains.</p>
<p>It looked correct. It wasn't.</p>
<p>When a clinical researcher clicked "Conditions," they saw this:</p>
<table><thead><tr><th>What They Expected</th><th>What They Got</th></tr></thead><tbody><tr><td>Cardiovascular disorders</td><td>Abnormal feces</td></tr><tr><td>Respiratory disorders</td><td>Abulia</td></tr><tr><td>Neurological disorders</td><td>Anxiety</td></tr><tr><td>Gastrointestinal disorders</td><td>Biliuria</td></tr><tr><td>... (20-30 clinically organized categories)</td><td>... (174 alphabetically sorted orphan concepts)</td></tr></tbody></table>
<p>The Measurement domain was worse: <strong>1,223 flat concepts</strong>, including questionnaire scores, lab test names, and clinical observations, were dumped at the top level with zero hierarchy. Observation had 633. The Browse Hierarchy was functionally a flat alphabetical list for three of six domains. The other three still worked. Procedure (12 roots) worked because its SNOMED hierarchy is mostly self-contained. Drug (14 ATC categories) and Visit (19) worked because they use non-SNOMED hierarchies that don't cross domain boundaries.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="root-cause-snomed-doesnt-respect-omop-domain-boundaries">Root Cause: SNOMED Doesn't Respect OMOP Domain Boundaries<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#root-cause-snomed-doesnt-respect-omop-domain-boundaries" class="hash-link" aria-label="Direct link to Root Cause: SNOMED Doesn't Respect OMOP Domain Boundaries" title="Direct link to Root Cause: SNOMED Doesn't Respect OMOP Domain Boundaries">​</a></h2>
<p>This is a fundamental tension in the OMOP CDM that every OHDSI implementer faces but rarely has to solve at the navigation layer.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-omop-assigns-domains">How OMOP Assigns Domains<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#how-omop-assigns-domains" class="hash-link" aria-label="Direct link to How OMOP Assigns Domains" title="Direct link to How OMOP Assigns Domains">​</a></h3>
<p>OMOP assigns every vocabulary concept to exactly one <strong>domain</strong>: Condition, Observation, Measurement, Procedure, Drug, Visit, etc. This assignment determines which clinical data table a concept belongs in — a concept in the Condition domain goes into <code>condition_occurrence</code>, one in Measurement goes into <code>measurement</code>, and so on.</p>
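<p>As a minimal sketch, the routing rule is a lookup from <code>domain_id</code> to a CDM v5.4 event table. The <code>DOMAIN_TO_TABLE</code> mapping and the <code>target_table()</code> helper are hypothetical illustrations rather than Parthenon code. They list only the domains discussed in this post plus Device:</p>
<pre><code class="language-python"># Hypothetical lookup (not Parthenon code): OMOP domain_id to the CDM v5.4
# table that stores records coded with a concept from that domain.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Procedure": "procedure_occurrence",
    "Measurement": "measurement",
    "Observation": "observation",
    "Drug": "drug_exposure",
    "Visit": "visit_occurrence",
    "Device": "device_exposure",
}


def target_table(domain_id):
    """Return the CDM table for a domain, or raise if this sketch does not map it."""
    try:
        return DOMAIN_TO_TABLE[domain_id]
    except KeyError:
        raise ValueError(f"no event table mapped for domain {domain_id!r}") from None


print(target_table("Condition"))    # condition_occurrence
print(target_table("Measurement"))  # measurement
</code></pre>
<p>Nothing in this lookup knows about SNOMED's parent-child structure. It only answers where a record goes.</p>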
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-snomed-organizes-concepts">How SNOMED Organizes Concepts<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#how-snomed-organizes-concepts" class="hash-link" aria-label="Direct link to How SNOMED Organizes Concepts" title="Direct link to How SNOMED Organizes Concepts">​</a></h3>
<p>SNOMED CT is a polyhierarchical ontology. A concept can have more than one parent, and every concept descends from the single root, "SNOMED CT Concept." The top-level hierarchy that matters here, "Clinical finding" (concept_id 441840), is organized by <strong>finding type and body system</strong>, not by OMOP domain. The children of "Clinical finding" include:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">Clinical finding (441840, domain = Condition)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Disease (4274025, domain = Condition)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   └── Disorder of body system (4180628, domain = Condition)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│       └── Disorder of cardiovascular system (134057, domain = Condition)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│           └── Heart disease → Coronary arteriosclerosis → ...</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Cardiovascular finding (4023995, domain = Observation)    ← CROSS-DOMAIN!</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   └── Heart disease (321588, domain = Condition)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Respiratory finding (4024567, domain = Condition)</span><br></span><span class="token-line" 
style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   └── Dyspnea (312437, domain = Condition)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Functional finding (4041284, domain = Observation)        ← CROSS-DOMAIN!</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   └── Difficulty walking (36714126, domain = Condition)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└── ... 120+ more children spanning 4 domains</span><br></span></code></pre></div></div>
<p>"Cardiovascular finding" is the natural parent of many Condition-domain heart diseases, but OMOP assigns it to the Observation domain. "Functional finding" parents hundreds of Condition-domain concepts like difficulty walking and impaired cognition, but lives in Observation. This is not a data quality issue — it's by design. OMOP's domain assignment reflects <em>what table the data goes in</em>, while SNOMED's hierarchy reflects <em>clinical relationships</em>.</p>
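<p>To make the mismatch concrete, here is a small Python sketch that flags cross-domain edges in a toy sample built from the diagram above. The concept IDs and domains are copied from that diagram. The data structures and the <code>cross_domain_edges()</code> helper are illustrative assumptions, not a real vocabulary load or Parthenon code:</p>
<pre><code class="language-python"># Toy sample copied from the diagram above; not a real vocabulary extract.
# Each entry maps concept_id to (concept_name, OMOP domain_id).
CONCEPTS = {
    441840: ("Clinical finding", "Condition"),
    4023995: ("Cardiovascular finding", "Observation"),
    321588: ("Heart disease", "Condition"),
    4024567: ("Respiratory finding", "Condition"),
    312437: ("Dyspnea", "Condition"),
    4041284: ("Functional finding", "Observation"),
    36714126: ("Difficulty walking", "Condition"),
}

# Direct SNOMED edges from the diagram, as (parent_id, child_id) pairs.
EDGES = [
    (441840, 4023995),
    (4023995, 321588),
    (441840, 4024567),
    (4024567, 312437),
    (441840, 4041284),
    (4041284, 36714126),
]


def cross_domain_edges(edges, concepts):
    """Return the edges whose parent and child have different OMOP domain_ids."""
    return [
        (parent, child)
        for parent, child in edges
        if concepts[parent][1] != concepts[child][1]
    ]


for parent, child in cross_domain_edges(EDGES, CONCEPTS):
    parent_name, parent_domain = CONCEPTS[parent]
    child_name, child_domain = CONCEPTS[child]
    print(f"{parent_name} ({parent_domain}) is parent of {child_name} ({child_domain})")
</code></pre>
<p>In this sample only the Respiratory finding branch stays inside a single domain. The other four edges connect concepts that OMOP places in different domains.</p>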
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-severed-hierarchy">The Severed Hierarchy<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#the-severed-hierarchy" class="hash-link" aria-label="Direct link to The Severed Hierarchy" title="Direct link to The Severed Hierarchy">​</a></h3>
<p>Our original <code>HierarchyBuilderService</code> built the tree one domain at a time, filtering <code>concept_ancestor</code> edges so that parent and child had to share the same <code>domain_id</code>:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token comment" style="color:hsl(220, 10%, 40%)">-- THE BUG: both parent and child must be in same domain</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> parent</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">domain_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'Condition'</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> child</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">domain_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'Condition'</span><br></span></code></pre></div></div>
<p>This severed every cross-domain link. "Heart disease" (Condition) couldn't find its SNOMED parent "Cardiovascular finding" (Observation). Every concept whose nearest SNOMED parent lived in a different domain became an orphan — dumped directly under the virtual domain root with no organizing structure.</p>
<p>The numbers told the story:</p>
<table><thead><tr><th>Domain</th><th>Orphan Roots</th><th>Cause</th></tr></thead><tbody><tr><td><strong>Measurement</strong></td><td><strong>1,223</strong></td><td>Almost entirely cross-domain. Most measurement-domain findings have Observation-domain parents in SNOMED.</td></tr><tr><td><strong>Observation</strong></td><td><strong>633</strong></td><td>Observation concepts parented by Procedure or Condition concepts in SNOMED.</td></tr><tr><td><strong>Condition</strong></td><td><strong>174</strong></td><td>80 concepts with Observation parents + 93 with Measurement parents.</td></tr><tr><td>Procedure</td><td>12</td><td>Mostly self-contained in SNOMED.</td></tr><tr><td>Drug</td><td>14</td><td>Uses ATC hierarchy, not SNOMED.</td></tr><tr><td>Visit</td><td>19</td><td>Uses CMS Place of Service / NUCC / UB04.</td></tr></tbody></table>
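<p>The failure mode can be reproduced on a toy vocabulary with Python's built-in <code>sqlite3</code> module. This is a sketch under stated assumptions, not Parthenon's <code>HierarchyBuilderService</code>. The tables carry only the columns needed here, and an attached in-memory database stands in for the <code>vocab</code> schema. The rows reuse IDs from the diagram above, and "Difficulty walking" is given only its Observation-domain parent. The hypothetical <code>condition_orphans()</code> helper lists Condition concepts left without a direct parent edge, with and without the same-domain parent filter:</p>
<pre><code class="language-python">import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the vocab schema: SQLite exposes an attached database by name.
conn.execute("ATTACH DATABASE ':memory:' AS vocab")
conn.executescript("""
    CREATE TABLE vocab.concept (
        concept_id   INTEGER PRIMARY KEY,
        concept_name TEXT,
        domain_id    TEXT
    );
    CREATE TABLE vocab.concept_ancestor (
        ancestor_concept_id      INTEGER,
        descendant_concept_id    INTEGER,
        min_levels_of_separation INTEGER
    );
    -- Toy rows using IDs and domains from the diagram; not a real vocabulary.
    INSERT INTO vocab.concept VALUES
        (441840,   'Clinical finding',    'Condition'),
        (4041284,  'Functional finding',  'Observation'),
        (36714126, 'Difficulty walking',  'Condition'),
        (4024567,  'Respiratory finding', 'Condition'),
        (312437,   'Dyspnea',             'Condition');
    INSERT INTO vocab.concept_ancestor VALUES
        (441840,  4041284,  1),
        (4041284, 36714126, 1),
        (441840,  36714126, 2),
        (441840,  4024567,  1),
        (4024567, 312437,   1),
        (441840,  312437,   2);
""")


def condition_orphans(same_domain_parents):
    """Condition concepts (other than the root) left without a direct parent edge."""
    parent_filter = "AND parent.domain_id = 'Condition'" if same_domain_parents else ""
    sql = f"""
        SELECT c.concept_name
        FROM vocab.concept c
        WHERE c.domain_id = 'Condition'
          AND c.concept_id != 441840
          AND NOT EXISTS (
              SELECT 1
              FROM vocab.concept_ancestor ca
              JOIN vocab.concept parent ON parent.concept_id = ca.ancestor_concept_id
              WHERE ca.descendant_concept_id = c.concept_id
                AND ca.min_levels_of_separation = 1
                {parent_filter}
          )
        ORDER BY c.concept_name
    """
    return [row[0] for row in conn.execute(sql)]


print(condition_orphans(same_domain_parents=True))   # ['Difficulty walking']
print(condition_orphans(same_domain_parents=False))  # []
</code></pre>
<p>The two calls differ only in the parent-domain predicate, which is the same condition the original builder applied.</p>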
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-fix-cross-domain-snomed-tree-builder">The Fix: Cross-Domain SNOMED Tree Builder<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#the-fix-cross-domain-snomed-tree-builder" class="hash-link" aria-label="Direct link to The Fix: Cross-Domain SNOMED Tree Builder" title="Direct link to The Fix: Cross-Domain SNOMED Tree Builder">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="phase-1-remove-the-domain-filter-on-parents">Phase 1: Remove the Domain Filter on Parents<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#phase-1-remove-the-domain-filter-on-parents" class="hash-link" aria-label="Direct link to Phase 1: Remove the Domain Filter on Parents" title="Direct link to Phase 1: Remove the Domain Filter on Parents">​</a></h3>
<p>The core fix was a single SQL change with cascading architectural implications. We replaced <code>buildSnomedDomain()</code>, which built one domain at a time, with <code>buildUnifiedSnomedTree()</code>, which processes all four SNOMED-based domains together:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token comment" style="color:hsl(220, 10%, 40%)">-- FIXED: no domain filter on parent — follow SNOMED's actual hierarchy</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">INSERT</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">INTO</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_tree </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">parent_concept_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> child_concept_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> domain_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> ca</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">ancestor_concept_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> ca</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">descendant_concept_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> child</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">domain_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_ancestor ca</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">JOIN</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept parent </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">ON</span><span class="token 
plain"> parent</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> ca</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">ancestor_concept_id</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">JOIN</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept child </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">ON</span><span class="token plain"> child</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> ca</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">descendant_concept_id</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> ca</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">min_levels_of_separation </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span 
class="token plain"> parent</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">vocabulary_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'SNOMED'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> parent</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">standard_concept </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'S'</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> child</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">vocabulary_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'SNOMED'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> child</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">standard_concept </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'S'</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token 
operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> child</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">domain_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">IN</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">'Condition'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'Procedure'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'Measurement'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'Observation'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- Note: NO parent.domain_id filter!</span><br></span></code></pre></div></div>
<p>Each edge is tagged with the <strong>child's</strong> domain_id, so domain-scoped tree queries still work. The primary key was expanded from <code>(parent_concept_id, child_concept_id)</code> to <code>(parent_concept_id, child_concept_id, domain_id)</code> to support the same edge appearing in multiple domain contexts.</p>
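<p>Here is a minimal Python sketch of the Phase 1 rule. It assumes the relevant rows of <code>vocab.concept</code> and <code>vocab.concept_ancestor</code> are already loaded into memory. The function name <code>build_edges</code>, the tuple layouts, and the small integer IDs and domains in the example are hypothetical illustrations, not Parthenon's code. The result is keyed on <code>(parent, child, domain)</code> triples, which mirrors the expanded primary key.</p>

```python
# Hypothetical sketch of the Phase 1 edge rule; not Parthenon's implementation.
# concepts: concept_id -> (vocabulary_id, standard_concept, domain_id)
# ancestor_rows: (ancestor_concept_id, descendant_concept_id, min_levels_of_separation)

TREE_DOMAINS = {"Condition", "Procedure", "Measurement", "Observation"}


def build_edges(concepts, ancestor_rows):
    """Return a set of (parent_id, child_id, domain_id) edges.

    Each edge is tagged with the CHILD's domain. The parent's domain is
    deliberately not checked, so parents from other domains are kept.
    """
    edges = set()
    for parent_id, child_id, levels in ancestor_rows:
        if levels != 1:
            continue  # direct parent/child pairs only
        parent = concepts.get(parent_id)
        child = concepts.get(child_id)
        if parent is None or child is None:
            continue
        # Both ends must be standard SNOMED concepts.
        if parent[:2] != ("SNOMED", "S") or child[:2] != ("SNOMED", "S"):
            continue
        if child[2] not in TREE_DOMAINS:
            continue
        # No filter on parent[2] (the parent's domain): this is the Phase 1 fix.
        edges.add((parent_id, child_id, child[2]))
    return edges


# Illustrative data only (made-up IDs and domains): 1 = "Clinical finding",
# 2 = "Cardiovascular finding", 3 = "Heart disease".
concepts = {
    1: ("SNOMED", "S", "Observation"),
    2: ("SNOMED", "S", "Observation"),
    3: ("SNOMED", "S", "Condition"),
}
rows = [(1, 2, 1), (2, 3, 1), (1, 3, 2)]
print(sorted(build_edges(concepts, rows)))
# [(1, 2, 'Observation'), (2, 3, 'Condition')]
```

<p>The Observation-domain parent 2 still contributes a Condition-tagged edge to its Condition-domain child 3. The two-level row (1, 3) is skipped because only direct edges belong in the tree.</p>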
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="phase-2-propagate-cross-domain-parent-chains">Phase 2: Propagate Cross-Domain Parent Chains<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#phase-2-propagate-cross-domain-parent-chains" class="hash-link" aria-label="Direct link to Phase 2: Propagate Cross-Domain Parent Chains" title="Direct link to Phase 2: Propagate Cross-Domain Parent Chains">​</a></h3>
<p>The initial fix produced 839 Condition roots instead of 174 — worse, not better. Here's why:</p>
<p>Removing the parent domain filter correctly added edges like (Cardiovascular finding → Heart disease) tagged as Condition. But "Cardiovascular finding" itself had no incoming Condition-tagged edge — its parent "Clinical finding" → "Cardiovascular finding" was tagged Observation. So "Cardiovascular finding" became an orphan root in the Condition tree.</p>
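<p>The orphan effect is easy to reproduce on a toy graph. In this minimal sketch, a concept counts as a root of a domain tree when it is a parent in that tree but has no incoming edge with the same domain tag. The helper and the IDs are made up. The real tree hangs roots off a virtual domain root, and this sketch only approximates that.</p>

```python
def domain_roots(edges, domain):
    """Concepts that act as parents in `domain`'s tree but have no
    incoming edge tagged with `domain` (approximating "orphan roots")."""
    tree = [(p, c) for p, c, d in edges if d == domain]
    parents = {p for p, _ in tree}
    children = {c for _, c in tree}
    return parents - children


# Made-up IDs: 1 = "Clinical finding", 2 = "Cardiovascular finding",
# 3 = "Heart disease". The 1 -> 2 edge carries the Observation tag
# because "Cardiovascular finding" is an Observation-domain concept.
edges = {(1, 2, "Observation"), (2, 3, "Condition")}
print(domain_roots(edges, "Condition"))    # {2}
print(domain_roots(edges, "Observation"))  # {1}
```

<p>Concept 2 becomes a Condition root even though it has a valid SNOMED parent, because that parent's edge is tagged for the wrong tree.</p>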
<p>We needed to propagate cross-domain parent chains upward iteratively. The <code>propagateCrossDomainParents()</code> algorithm:</p>
<ol>
<li><strong>Find cross-domain roots</strong> — concepts under the virtual domain root whose actual OMOP domain differs from the tree they're in</li>
<li><strong>Walk up their SNOMED parents</strong> via <code>concept_ancestor</code> — add parent→child edges tagged with the target domain</li>
<li><strong>Remove from virtual root</strong> — the concept now has a real parent in the domain tree</li>
<li><strong>Re-discover new roots</strong> — the newly added parents may themselves be cross-domain</li>
<li><strong>Repeat</strong> until no cross-domain roots remain (typically 3-5 iterations)</li>
</ol>
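<p>The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual <code>propagateCrossDomainParents()</code> code. Direct SNOMED parents are supplied as a plain dictionary; in practice they would come from <code>concept_ancestor</code> rows with <code>min_levels_of_separation = 1</code>. Roots are derived from the edge set rather than read from a virtual root node, and all function names and IDs are hypothetical.</p>

```python
def domain_roots(edges, domain):
    """Parents in `domain`'s tree that have no incoming `domain` edge."""
    tree = [(p, c) for p, c, d in edges if d == domain]
    return {p for p, _ in tree} - {c for _, c in tree}


def propagate_cross_domain_parents(edges, concept_domain, direct_parents,
                                   domain, max_iterations=50):
    """Return (new_edges, iterations_used).

    edges: set of (parent_id, child_id, domain_id) triples
    concept_domain: concept_id -> the concept's own OMOP domain_id
    direct_parents: concept_id -> list of direct SNOMED parent ids
    domain: the tree being repaired, e.g. "Condition"
    """
    edges = set(edges)  # work on a copy; the caller's set is untouched
    for iteration in range(max_iterations):
        # Steps 1 and 4: (re)discover roots whose own domain differs from
        # the tree and that still have a SNOMED parent to climb to.
        cross_roots = {
            r for r in domain_roots(edges, domain)
            if concept_domain.get(r) != domain and direct_parents.get(r)
        }
        if not cross_roots:
            return edges, iteration  # Step 5: nothing left to propagate
        for concept in cross_roots:
            # Step 2: add parent -> child edges tagged with the TARGET domain.
            # Step 3 is implicit here: once it has an incoming edge, the
            # concept is no longer a root of this tree.
            for parent in direct_parents[concept]:
                edges.add((parent, concept, domain))
    raise RuntimeError("cross-domain propagation did not converge")


# Made-up IDs: 1 = "Clinical finding" (no parent in this toy graph),
# 2 = "Cardiovascular finding", 3 = "Heart disease", 4 = "Disease".
phase1_edges = {(1, 2, "Observation"), (2, 3, "Condition"), (1, 4, "Condition")}
concept_domain = {1: "Observation", 2: "Observation", 3: "Condition", 4: "Condition"}
direct_parents = {2: [1], 3: [2], 4: [1]}

repaired, rounds = propagate_cross_domain_parents(
    phase1_edges, concept_domain, direct_parents, "Condition")
print(rounds)                                           # 1
print(sorted(domain_roots(phase1_edges, "Condition")))  # [1, 2]
print(sorted(domain_roots(repaired, "Condition")))      # [1]
```

<p>The original Observation-tagged edge is kept alongside the new Condition-tagged copy. This is why the same parent/child pair has to be allowed in more than one domain context. Concepts with no parent at all, such as the top-level "Clinical finding" here, legitimately remain roots.</p>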
<p>The result was transformative:</p>
<table><thead><tr><th>Domain</th><th>Before</th><th>After Phase 1</th><th>After Phase 2</th></tr></thead><tbody><tr><td><strong>Condition</strong></td><td>174 orphans</td><td>839 (worse!)</td><td><strong>2 roots</strong></td></tr><tr><td><strong>Measurement</strong></td><td>1,223 flat</td><td>620</td><td><strong>5 roots</strong></td></tr><tr><td><strong>Observation</strong></td><td>633 flat</td><td>822</td><td><strong>57 roots</strong></td></tr><tr><td><strong>Procedure</strong></td><td>12</td><td>48</td><td><strong>1 root</strong></td></tr></tbody></table>
<p>The 2 Condition roots are "Clinical finding" (with 121 immediate children) and "Situation with explicit context" (with 1). Drilling into "Clinical finding" now shows exactly what a clinician expects: Disease, Musculoskeletal finding, Bleeding, Neurological finding, Digestive system finding, Respiratory finding — the natural SNOMED organizing categories.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="layer-2-clinical-groupings--our-meddra-soc-equivalent">Layer 2: Clinical Groupings — Our MedDRA SOC Equivalent<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#layer-2-clinical-groupings--our-meddra-soc-equivalent" class="hash-link" aria-label="Direct link to Layer 2: Clinical Groupings — Our MedDRA SOC Equivalent" title="Direct link to Layer 2: Clinical Groupings — Our MedDRA SOC Equivalent">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="why-meddra-navigation-matters">Why MedDRA Navigation Matters<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#why-meddra-navigation-matters" class="hash-link" aria-label="Direct link to Why MedDRA Navigation Matters" title="Direct link to Why MedDRA Navigation Matters">​</a></h3>
<p>MedDRA (Medical Dictionary for Regulatory Activities) provides five levels of curated clinical navigation:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">SOC (27)    → System Organ Class (Cardiac disorders, Respiratory disorders, ...)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">HLGT (~337) → High Level Group Term</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">HLT (~1738) → High Level Term</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">PT (~24000) → Preferred Term</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">LLT (~83000)→ Lowest Level Term</span><br></span></code></pre></div></div>
<p>Every level is curated by human medical terminologists with consistent granularity. A researcher navigating from "Cardiac disorders" through "Coronary artery disorders" to "Myocardial infarction" experiences a smooth, predictable narrowing at each step.</p>
<p>SNOMED's hierarchy, while clinically correct, is organized by <strong>ontological category</strong> (Disease → Disorder of body system → Disorder of cardiovascular system), not by <strong>clinical intuition</strong> (Cardiac disorders → Heart failure syndromes → Congestive heart failure). The depth varies from 2 to 13 levels. Intermediate nodes mix organizational axes — anatomical, etiological, temporal, age-based — in a single level.</p>
<p>We needed a curated navigation layer that provides MedDRA SOC-equivalent entry points while leveraging SNOMED's superior concept hierarchy underneath.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-clinical-groupings-table">The Clinical Groupings Table<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#the-clinical-groupings-table" class="hash-link" aria-label="Direct link to The Clinical Groupings Table" title="Direct link to The Clinical Groupings Table">​</a></h3>
<p>We created <code>app.clinical_groupings</code> — a curated metadata table that lives in the application schema (never modifying the read-only vocabulary tables):</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">CREATE</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">TABLE</span><span class="token plain"> app</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">clinical_groupings </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    id </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">SERIAL</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">PRIMARY</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">KEY</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    name </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">VARCHAR</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token number" style="color:hsl(29, 54%, 61%)">100</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token operator" 
style="color:hsl(207, 82%, 66%)">NOT</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">NULL</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">         </span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- "Cardiovascular"</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    description </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">TEXT</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">                    </span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- "Heart, blood vessel disorders and findings"</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    domain_id </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">VARCHAR</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token number" style="color:hsl(29, 54%, 61%)">20</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">NOT</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">NULL</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">     </span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- "Condition"</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    anchor_concept_ids </span><span class="token keyword" style="color:hsl(286, 60%, 
67%)">INTEGER</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">NOT</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">NULL</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- SNOMED concept_ids defining this group</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    sort_order </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">INTEGER</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">DEFAULT</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">0</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    icon </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">VARCHAR</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token number" style="color:hsl(29, 54%, 61%)">50</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    color </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">VARCHAR</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">(</span><span class="token number" style="color:hsl(29, 54%, 61%)">7</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">                   </span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- Hex color for UI</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    parent_grouping_id </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">INTEGER</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">REFERENCES</span><span class="token plain"> app</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">clinical_groupings</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><br></span></code></pre></div></div>
<p>Each grouping has one or more <strong>anchor concept IDs</strong> — SNOMED concepts whose entire descendant tree (via <code>concept_ancestor</code>) defines the grouping's coverage. When a user clicks "Cardiovascular," they navigate to the anchor concept's subtree in the SNOMED hierarchy.</p>
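<p>A grouping's coverage is the union of its anchors' descendant sets. The sketch below assumes <code>concept_ancestor</code> pairs are already in memory. OMOP's <code>concept_ancestor</code> also holds a zero-level row linking each standard concept to itself, so each anchor is a member of its own grouping. The helper names are hypothetical. The two anchor IDs are the Cardiovascular anchors from the table further down, and descendant IDs 901 and 902 are invented stand-ins.</p>

```python
from collections import defaultdict


def descendant_index(ancestor_pairs):
    """Map ancestor_concept_id -> set of descendant_concept_ids."""
    index = defaultdict(set)
    for ancestor_id, descendant_id in ancestor_pairs:
        index[ancestor_id].add(descendant_id)
    return index


def grouping_members(anchor_ids, index):
    """Union of every anchor's descendant set (anchors included via self rows)."""
    members = set()
    for anchor_id in anchor_ids:
        members |= index.get(anchor_id, set())
    return members


# Anchor IDs from the Cardiovascular grouping; 901 ("Heart disease") and
# 902 ("Heart murmur") are made-up descendant IDs for illustration.
pairs = [
    (134057, 134057), (134057, 901),     # Disorder of cardiovascular system
    (4023995, 4023995), (4023995, 902),  # Cardiovascular finding
]
index = descendant_index(pairs)
print(sorted(grouping_members([134057], index)))           # [901, 134057]
print(sorted(grouping_members([134057, 4023995], index)))  # [901, 902, 134057, 4023995]
```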
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="closing-every-coverage-gap">Closing Every Coverage Gap<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#closing-every-coverage-gap" class="hash-link" aria-label="Direct link to Closing Every Coverage Gap" title="Direct link to Closing Every Coverage Gap">​</a></h3>
<p>This is where clinical informatics meets systematic engineering. We didn't stop at "good enough": we measured coverage against every standard SNOMED Condition concept and closed the gaps iteratively.</p>
<p><strong>Iteration 1: Disorder Anchors Only (77.3% coverage)</strong></p>
<p>Our first 20 Condition groupings used "Disorder of X system" concepts as anchors — the same approach Atlas and most OHDSI tools take. This covered 81,453 of 105,324 standard Condition concepts.</p>
<p>The missing 22.7% revealed a critical insight: SNOMED distinguishes between <strong>disorders</strong> (diseases, conditions) and <strong>clinical findings</strong> (observations, signs, symptoms). Concepts of both kinds are assigned to the Condition domain in OMOP, but they sit on different branches of SNOMED's hierarchy:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">Clinical finding</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Disease</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   └── Disorder of body system</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│       └── Disorder of cardiovascular system ← Our anchor (covers disorders)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└── Cardiovascular finding                    ← NOT covered (findings branch)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    └── Heart murmur, Blood pressure finding, ECG abnormality, etc.</span><br></span></code></pre></div></div>
<p>"Heart murmur" is a Condition-domain concept. It's clinically related to cardiovascular disorders. But it's not under "Disorder of cardiovascular system" — it's under "Cardiovascular finding." MedDRA handles this via multi-axiality (a concept can appear under multiple SOCs). We needed to cover both branches.</p>
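<p>Because groupings are defined by anchor subtrees and SNOMED is polyhierarchical, a concept can fall under more than one grouping. This gives anchor-based groupings a multi-axial behavior similar in spirit to MedDRA's. The sketch below inverts grouping membership to list every entry point for a concept. The function name, grouping contents, and IDs are made up for illustration.</p>

```python
from collections import defaultdict


def groupings_by_concept(members_by_grouping):
    """Invert grouping -> members into concept -> sorted grouping names.

    A concept that sits under several anchors' subtrees is listed under
    each grouping it belongs to.
    """
    index = defaultdict(list)
    for name in sorted(members_by_grouping):
        for concept_id in members_by_grouping[name]:
            index[concept_id].append(name)
    return dict(index)


# Made-up member sets: concept 903 falls under both groupings' anchors.
members = {
    "Cardiovascular": {901, 902, 903},
    "Neurological": {903, 904},
}
print(groupings_by_concept(members)[903])  # ['Cardiovascular', 'Neurological']
```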
<p><strong>Iteration 2: Disorder + Finding Siblings (98.4% coverage)</strong></p>
<p>We added the SNOMED "finding" sibling of every organ-system disorder anchor:</p>
<table><thead><tr><th>Grouping</th><th>Disorder Anchor</th><th>Finding Anchor Added</th></tr></thead><tbody><tr><td>Cardiovascular</td><td>Disorder of cardiovascular system (134057)</td><td>+ Cardiovascular finding (4023995)</td></tr><tr><td>Respiratory</td><td>Disorder of respiratory system (320136)</td><td>+ Respiratory finding (4024567)</td></tr><tr><td>Neurological</td><td>Disorder of nervous system (376337)</td><td>+ Neurological finding (4011630) + CNS finding (4086181)</td></tr><tr><td>Dermatological</td><td>Disorder of skin (4317258)</td><td>+ Skin AND/OR mucosa finding (4212577)</td></tr><tr><td>...</td><td>...</td><td>...</td></tr></tbody></table>
<p>We also added 7 new groupings that had been entirely missing. Six are MedDRA SOC equivalents, and Body Region Findings is SNOMED-specific:</p>
<table><thead><tr><th>New Grouping</th><th>MedDRA SOC Equivalent</th><th>Anchor Concepts</th></tr></thead><tbody><tr><td><strong>Vascular</strong></td><td>SOC 27 — Vascular disorders</td><td>Vascular disorder (443784)</td></tr><tr><td><strong>Hepatobiliary</strong></td><td>SOC 9 — Hepatobiliary disorders</td><td>Disorder of liver and/or biliary tract (1244824) + Biliary tract (197917) + Jaundice (137977)</td></tr><tr><td><strong>Renal &amp; Urinary</strong></td><td>SOC 21 — Renal and urinary</td><td>Disorder of urinary system (75865) + Urine finding (437382)</td></tr><tr><td><strong>Reproductive &amp; Breast</strong></td><td>SOC 22 — Reproductive system</td><td>Female (4180154) + Male (196738) + Breast (77030)</td></tr><tr><td><strong>Investigations</strong></td><td>SOC 13 — Investigations</td><td>Evaluation finding (40480457) + Finding by method (4041287)</td></tr><tr><td><strong>General Signs &amp; Symptoms</strong></td><td>SOC 8 — General disorders</td><td>Bleeding (437312) + Mass (4102111) + Edema (433595) + Fever (437663) + Disease (4274025) + 16 more</td></tr><tr><td><strong>Body Region Findings</strong></td><td>N/A (SNOMED-specific)</td><td>Trunk (4117930) + Limb (138239) + Head (4247371) + Back (4213101) + Neck (4184252)</td></tr></tbody></table>
<p>Coverage jumped from 77.3% to 98.4% — 103,629 of 105,324 concepts.</p>
<p><strong>Iteration 3: Systematic Gap Closure (100.0% coverage)</strong></p>
<p>The remaining 1.6% (1,695 concepts) fell into specific SNOMED categories that sit outside the disorder/finding dichotomy. We used MedGemma to analyze the 66 parent-level groups and map each to the most clinically appropriate existing grouping, then verified every concept_id against <code>vocab.concept</code>.</p>
<p>Key expansions in this final pass:</p>
<table><thead><tr><th>Expansion</th><th>Concepts Captured</th><th>Clinical Rationale</th></tr></thead><tbody><tr><td><strong>Neoplasm</strong> + Finding of lesion, Clinical stage finding</td><td>+349</td><td>Tumor staging (Gleason grades, TNM), morphology, and oncology assessment findings belong with neoplasms</td></tr><tr><td><strong>Neurological</strong> + Speech finding, Coordination finding</td><td>+261</td><td>Speech pathology and motor coordination are neurological subspecialties</td></tr><tr><td><strong>Hematologic</strong> + Blood/lymphatics/immune system finding</td><td>+185</td><td>Anemias (under "Disorder of cellular component of blood") were missed because they're not under "Disorder of hematopoietic structure" in SNOMED — a non-obvious hierarchy gap</td></tr><tr><td><strong>Injury, Poisoning &amp; Procedural</strong> + Wound finding, Device finding</td><td>+100</td><td>Wound assessment, procedural complications, and device-related findings</td></tr><tr><td><strong>Congenital &amp; Genetic</strong> + Carrier of disorder</td><td>+53</td><td>Genetic carrier states (e.g., "Carrier of cystic fibrosis") are findings, not disorders</td></tr><tr><td><strong>Functional Impairment</strong> (new grouping)</td><td>+473</td><td>Impaired cognition, difficulty walking, ADL limitations — these are cross-domain from Observation but clinically critical Condition concepts</td></tr></tbody></table>
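<p>The verification step mentioned above checks every hand-picked concept_id against <code>vocab.concept</code>. Here is a minimal sketch of that check, assuming the vocabulary rows are available as a dictionary. The function name and row layout are hypothetical. The two real IDs and names are quoted from this post, and 999999999 is a deliberately nonexistent ID.</p>

```python
def verify_anchors(expected, concept_rows):
    """Check hard-coded anchors against vocabulary rows.

    expected: list of (grouping, concept_id, expected_name)
    concept_rows: concept_id -> dict with concept_name, vocabulary_id,
                  standard_concept (a stand-in for vocab.concept)
    Returns human-readable problems; an empty list means every anchor passed.
    """
    problems = []
    for grouping, concept_id, expected_name in expected:
        row = concept_rows.get(concept_id)
        if row is None:
            problems.append(f"{grouping}: {concept_id} not found")
            continue
        if row["vocabulary_id"] != "SNOMED" or row["standard_concept"] != "S":
            problems.append(f"{grouping}: {concept_id} is not a standard SNOMED concept")
        if row["concept_name"].casefold() != expected_name.casefold():
            problems.append(
                f"{grouping}: {concept_id} is {row['concept_name']!r}, "
                f"expected {expected_name!r}"
            )
    return problems


# 4182210 is the "Dementia" concept a bad lookup attached to Pain Syndromes.
rows = {
    134057: {"concept_name": "Disorder of cardiovascular system",
             "vocabulary_id": "SNOMED", "standard_concept": "S"},
    4182210: {"concept_name": "Dementia",
              "vocabulary_id": "SNOMED", "standard_concept": "S"},
}
checks = [
    ("Cardiovascular", 134057, "Disorder of cardiovascular system"),
    ("Pain Syndromes", 4182210, "Pain"),
    ("Respiratory", 999999999, "Disorder of respiratory system"),
]
for problem in verify_anchors(checks, rows):
    print(problem)
# Pain Syndromes: 4182210 is 'Dementia', expected 'Pain'
# Respiratory: 999999999 not found
```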
<p>Final result: <strong>105,299 of 105,324 standard SNOMED Condition concepts covered</strong> (99.98%, which rounds to 100.0%). The 25 uncovered concepts are 3 true orphans with no ancestors in <code>concept_ancestor</code> (a vocabulary data quality issue) and 22 concepts reachable only through paths that don't intersect any grouping anchor.</p>
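<p>The coverage figure and the orphan/unreachable split can be computed together. This minimal sketch assumes grouping member sets have already been expanded from their anchors and that each concept's ancestors are available as a set. The function name and toy IDs are hypothetical. The final line reuses the post's own counts.</p>

```python
def coverage_report(domain_concepts, members_by_grouping, ancestors_of):
    """Return (percent_covered, orphans, unreachable) for one domain.

    domain_concepts: set of standard concept_ids in the domain
    members_by_grouping: grouping name -> set of member concept_ids
    ancestors_of: concept_id -> set of ancestor ids from concept_ancestor
                  (may include the concept itself via the zero-level row)
    """
    all_members = set().union(*members_by_grouping.values())
    covered = all_members.intersection(domain_concepts)
    uncovered = domain_concepts - covered
    # True orphans: no ancestor other than the concept itself.
    orphans = {c for c in uncovered if not (ancestors_of.get(c, set()) - {c})}
    # The rest have ancestors, but none of them is inside a grouping's coverage.
    unreachable = uncovered - orphans
    percent = 100.0 * len(covered) / len(domain_concepts)
    return percent, orphans, unreachable


# Toy data with made-up IDs.
domain = {1, 2, 3, 4, 5}
groupings = {"Cardiovascular": {1, 2}, "Respiratory": {3}}
ancestors = {4: {4}, 5: {5, 99}}
pct, orphans, unreachable = coverage_report(domain, groupings, ancestors)
print(pct, orphans, unreachable)  # 60.0 {4} {5}

# The same arithmetic on the final Condition counts:
print(round(100 * 105_299 / 105_324, 2))  # 99.98
```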
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="meddra-soc-parity-map">MedDRA SOC Parity Map<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#meddra-soc-parity-map" class="hash-link" aria-label="Direct link to MedDRA SOC Parity Map" title="Direct link to MedDRA SOC Parity Map">​</a></h3>
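<p>The final 27 Condition groupings map onto MedDRA's System Organ Classes as follows. Endocrine and Metabolism share one grouping, two SOCs are served by other OMOP domains, and four groupings are Parthenon additions with no SOC counterpart:</p>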
<table><thead><tr><th>MedDRA SOC</th><th>Parthenon Grouping</th><th>Anchors</th></tr></thead><tbody><tr><td>Blood and lymphatic system disorders</td><td><strong>Hematologic</strong></td><td>Hematopoietic structure + Cellular blood + Blood/lymph/immune finding</td></tr><tr><td>Cardiac disorders</td><td><strong>Cardiovascular</strong></td><td>Cardiovascular system + Cardiovascular finding</td></tr><tr><td>Congenital, familial and genetic disorders</td><td><strong>Congenital &amp; Genetic</strong></td><td>Congenital disease + Genetic disease + Carrier of disorder</td></tr><tr><td>Ear and labyrinth disorders</td><td><strong>Ear &amp; Hearing</strong></td><td>Disorder of ear + ENT finding</td></tr><tr><td>Endocrine disorders + Metabolism and nutrition</td><td><strong>Endocrine &amp; Metabolic</strong></td><td>Metabolic disease + Endocrine system + Metabolic/endocrine findings</td></tr><tr><td>Eye disorders</td><td><strong>Eye &amp; Vision</strong></td><td>Disorder of eye region + Eye/vision finding</td></tr><tr><td>Gastrointestinal disorders</td><td><strong>Gastrointestinal</strong></td><td>Digestive system + Digestive finding + Stool finding</td></tr><tr><td>General disorders and administration site conditions</td><td><strong>General Signs &amp; Symptoms</strong></td><td>Bleeding + Mass + Edema + Fever + Vital signs + 16 more</td></tr><tr><td>Hepatobiliary disorders</td><td><strong>Hepatobiliary</strong></td><td>Liver/biliary tract + Jaundice</td></tr><tr><td>Immune system disorders</td><td><strong>Immune System</strong></td><td>Immune function + Hypersensitivity + Adverse reaction propensity</td></tr><tr><td>Infections and infestations</td><td><strong>Infectious Disease</strong></td><td>Infectious disease + Inactive TB + Susceptibility</td></tr><tr><td>Injury, poisoning and procedural complications</td><td><strong>Injury, Poisoning &amp; Procedural</strong></td><td>Traumatic injury + Poisoning + Procedural complications + Wound + 
Device</td></tr><tr><td>Investigations</td><td><strong>Investigations</strong></td><td>Evaluation finding + Method finding + Body product finding</td></tr><tr><td>Musculoskeletal and connective tissue disorders</td><td><strong>Musculoskeletal</strong></td><td>MSK system + MSK finding + Muscle finding</td></tr><tr><td>Neoplasms benign, malignant and unspecified</td><td><strong>Neoplasm</strong></td><td>Malignant + Benign + Uncertain behavior + Lesion finding + Clinical staging</td></tr><tr><td>Nervous system disorders</td><td><strong>Neurological</strong></td><td>Nervous system + Neurological finding + CNS finding + Coordination + Speech</td></tr><tr><td>Pregnancy, puerperium and perinatal conditions</td><td><strong>Pregnancy &amp; Perinatal</strong></td><td>Pregnancy + Childbirth finding + Neonatal + Perinatal + Fetal + Development</td></tr><tr><td>Psychiatric disorders</td><td><strong>Mental &amp; Behavioral</strong></td><td>Mental disorder + Psych finding + Delusion</td></tr><tr><td>Renal and urinary disorders</td><td><strong>Renal &amp; Urinary</strong></td><td>Urinary system + Urine finding + Micturition</td></tr><tr><td>Reproductive system and breast disorders</td><td><strong>Reproductive &amp; Breast</strong></td><td>Female reproductive + Male genital + Breast</td></tr><tr><td>Respiratory, thoracic and mediastinal disorders</td><td><strong>Respiratory</strong></td><td>Respiratory system + Respiratory finding + Respiratory measurements</td></tr><tr><td>Skin and subcutaneous tissue disorders</td><td><strong>Dermatological</strong></td><td>Skin + Mucosa finding + Soft tissue + Color + Integumentary + Swelling</td></tr><tr><td>Social circumstances</td><td><em>Observation domain</em></td><td>Social context finding (covered in Observation groupings)</td></tr><tr><td>Surgical and medical procedures</td><td><em>Procedure domain</em></td><td>Covered by Procedure groupings (Surgical, Evaluation, Therapeutic, Rehab, Preventive)</td></tr><tr><td>Vascular 
disorders</td><td><strong>Vascular</strong></td><td>Vascular disorder</td></tr><tr><td>N/A — Parthenon additions</td><td><strong>Nutritional</strong></td><td>Nutritional disorder + Eating/feeding finding</td></tr><tr><td>N/A — Parthenon additions</td><td><strong>Pain Syndromes</strong></td><td>Pain</td></tr><tr><td>N/A — Parthenon additions</td><td><strong>Functional Impairment</strong></td><td>Functional finding</td></tr><tr><td>N/A — Parthenon additions</td><td><strong>Body Region Findings</strong></td><td>Trunk + Limb + Head + Back + Neck + Face + Posture</td></tr></tbody></table>
<p>The Social circumstances SOC and the Surgical and medical procedures SOC are covered by our Observation and Procedure domain groupings, respectively. That split is architecturally correct, because these concepts live in different OMOP domains.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-anchor-verification-problem">The Anchor Verification Problem<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#the-anchor-verification-problem" class="hash-link" aria-label="Direct link to The Anchor Verification Problem" title="Direct link to The Anchor Verification Problem">​</a></h2>
<p>One of the harder lessons from this work: <strong>SNOMED concept IDs are not guessable, and ILIKE is not a concept resolver.</strong></p>
<p>Our initial seeder used ILIKE pattern matching against <code>vocab.concept</code> to resolve anchor names to concept_ids:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> concept_id </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> concept_name </span><span class="token operator" style="color:hsl(207, 82%, 66%)">ILIKE</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'Disorder of ear'</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> vocabulary_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'SNOMED'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> standard_concept </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span 
class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'S'</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">ORDER</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">BY</span><span class="token plain"> concept_id </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">LIMIT</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This produced catastrophically wrong anchors for 22 of the 39 initial groupings. A representative sample:</p>
<table><thead><tr><th>Grouping</th><th>Intended Concept</th><th>ILIKE Resolved To</th><th>concept_id</th></tr></thead><tbody><tr><td>Pain Syndromes</td><td>Pain</td><td><strong>Dementia</strong></td><td>4182210</td></tr><tr><td>Ear &amp; Hearing</td><td>Disorder of ear</td><td><strong>Multiple sclerosis</strong></td><td>374919</td></tr><tr><td>Genitourinary</td><td>Disorder of genitourinary system</td><td><strong>Urethritis</strong></td><td>195862</td></tr><tr><td>Immune System</td><td>Immune system disorder</td><td><strong>Malignant lymphoma</strong></td><td>432571</td></tr><tr><td>Neoplasm (2nd anchor)</td><td>Benign neoplasm</td><td><strong>Passing flatus</strong></td><td>4091513</td></tr><tr><td>Cardiac Testing</td><td>Cardiac measure</td><td><strong>Dipipanone overdose</strong></td><td>4173533</td></tr><tr><td>Pulmonary Function</td><td>Respiratory measure</td><td><strong>Eustrongylides tubifex</strong></td><td>4206896</td></tr><tr><td>Preventive (Procedure)</td><td>Prophylactic procedure</td><td><strong>Syndrome of inappropriate vasopressin secretion</strong></td><td>4207539</td></tr></tbody></table>
<p>The problem is that a name pattern is not a concept identity. With <code>%</code> wildcards, ILIKE matches any name that merely contains the search text. Without wildcards, it is only a case-insensitive equality test. SNOMED has 350,000+ concepts. A substring query for "Disorder of ear" might return "Disorder of ear" itself (concept_id 378161). It might equally return some other concept whose name happens to contain that text, depending on which match has the lowest concept_id. The <code>ORDER BY concept_id LIMIT 1</code> made the result deterministic but not correct.</p>
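<p>To make the failure mode concrete, here is a small, self-contained Python sketch. It uses an in-memory SQLite table in place of <code>vocab.concept</code>; SQLite's <code>LIKE</code> is case-insensitive for ASCII, so it stands in for PostgreSQL's <code>ILIKE</code>. The rows are invented toy data, not real SNOMED IDs.</p>
<ul>
<li>The substring resolver returns whichever matching concept has the lowest ID.</li>
<li>The exact resolver accepts only a single case-insensitive name match.</li>
</ul>
<pre><code class="language-python">import sqlite3

# Toy stand-in for vocab.concept; IDs and names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER, concept_name TEXT, "
    "vocabulary_id TEXT, standard_concept TEXT)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, 'SNOMED', 'S')",
    [
        (100, "Congenital disorder of ear ossicles"),
        (200, "Disorder of ear"),
        (300, "Infective disorder of ear canal"),
    ],
)

FILTER = "AND vocabulary_id = 'SNOMED' AND standard_concept = 'S'"


def resolve_substring(name):
    # Unanchored pattern plus ORDER BY ... LIMIT 1: deterministic, not correct.
    row = conn.execute(
        "SELECT concept_id FROM concept WHERE concept_name LIKE '%' || ? || '%' "
        + FILTER + " ORDER BY concept_id LIMIT 1",
        (name,),
    ).fetchone()
    return row[0] if row else None


def resolve_exact(name):
    # Case-insensitive equality; refuse to guess when the name is ambiguous.
    rows = conn.execute(
        "SELECT concept_id FROM concept WHERE lower(concept_name) = lower(?) " + FILTER,
        (name,),
    ).fetchall()
    return rows[0][0] if len(rows) == 1 else None


print(resolve_substring("Disorder of ear"))  # 100: first substring hit, wrong
print(resolve_exact("Disorder of ear"))      # 200: the intended concept
</code></pre>
<p>In this sketch, exact matching still only serves as a fallback. It returns <code>None</code> rather than picking arbitrarily when zero or several concepts share the name.</p>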
<p>Our fix was to reverse the resolver priority: <strong>verified hardcoded IDs first, name matching as fallback only.</strong> Every anchor concept_id in the seeder was individually verified against <code>vocab.concept</code> with an exhaustive audit query:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">WITH</span><span class="token plain"> seeder_ids</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">intended_name</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> hardcoded_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">AS</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token keyword" style="color:hsl(286, 60%, 67%)">VALUES</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">'Disorder of cardiovascular system'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">134057</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 
71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">'Pain'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">4329041</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- ... all 119 anchor IDs</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> </span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">CASE</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHEN</span><span class="token plain"> c</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_name </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> s</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span 
class="token plain">intended_name </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">THEN</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'✓ MATCH'</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">       </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">ELSE</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'✗ WRONG: '</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">||</span><span class="token plain"> c</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_name</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">END</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">as</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">status</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  s</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">intended_name</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> s</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">hardcoded_id</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token 
plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> seeder_ids s</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">LEFT</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">JOIN</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept c </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">ON</span><span class="token plain"> c</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> s</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">hardcoded_id</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> c</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_name </span><span class="token operator" style="color:hsl(207, 82%, 66%)">IS</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">NULL</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">OR</span><span class="token plain"> c</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_name </span><span class="token operator" style="color:hsl(207, 82%, 66%)">!=</span><span class="token plain"> s</span><span class="token punctuation" 
style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">intended_name</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- Result: (0 rows) — all 119 anchors verified</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This audit query is now part of our verification protocol. Every time we add or modify clinical groupings, we run it to confirm zero mismatches before seeding.</p>
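<p>The "hardcoded first, name matching as fallback" priority can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual <code>ClinicalGroupingSeeder</code> code.</p>
<ul>
<li><code>AnchorSpec</code>, <code>resolve_anchor</code> and <code>audit_anchors</code> are invented names.</li>
<li>Raising an error on a mismatched hardcoded ID, rather than silently falling back to name matching, is our assumption about how a verification gate should behave.</li>
<li>The two concept IDs come from the audit query.</li>
</ul>
<pre><code class="language-python">from dataclasses import dataclass
from typing import Optional


@dataclass
class AnchorSpec:
    # Hypothetical seeder entry: the expected name and, ideally, a verified ID.
    intended_name: str
    hardcoded_id: Optional[int] = None


def resolve_anchor(spec, names_by_id, ids_by_lower_name):
    """Verified hardcoded ID first; exact name lookup only as a fallback."""
    if spec.hardcoded_id is not None:
        actual = names_by_id.get(spec.hardcoded_id)
        if actual == spec.intended_name:
            return spec.hardcoded_id
        # Assumption: a wrong hardcoded ID is an error, never a cue to guess.
        raise ValueError(
            f"{spec.hardcoded_id} is {actual!r}, expected {spec.intended_name!r}"
        )
    matches = ids_by_lower_name.get(spec.intended_name.lower(), [])
    if len(matches) == 1:
        return matches[0]
    raise ValueError(f"{spec.intended_name!r}: {len(matches)} exact matches")


def audit_anchors(specs, names_by_id):
    """Python analogue of the audit query: every anchor whose ID does not
    resolve to exactly its intended name (missing IDs included)."""
    return [
        (s.intended_name, s.hardcoded_id, names_by_id.get(s.hardcoded_id))
        for s in specs
        if s.hardcoded_id is not None
        and names_by_id.get(s.hardcoded_id) != s.intended_name
    ]


names_by_id = {134057: "Disorder of cardiovascular system", 4329041: "Pain"}
specs = [
    AnchorSpec("Disorder of cardiovascular system", 134057),
    AnchorSpec("Pain", 4329041),
]
# Seeding gate: refuse to seed unless the audit comes back empty.
assert audit_anchors(specs, names_by_id) == []
print([resolve_anchor(s, names_by_id, {}) for s in specs])  # [134057, 4329041]
</code></pre>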
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="multi-anchor-navigation-ux">Multi-Anchor Navigation UX<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#multi-anchor-navigation-ux" class="hash-link" aria-label="Direct link to Multi-Anchor Navigation UX" title="Direct link to Multi-Anchor Navigation UX">​</a></h2>
<p>Several groupings require multiple SNOMED anchors (the largest, Pregnancy &amp; Perinatal, has 9). When a user clicks a multi-anchor grouping, they see a sub-level listing each anchor concept with its number of subcategories:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">Conditions &gt; Endocrine &amp; Metabolic</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Metabolic disease (46 subcategories) →</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Disorder of endocrine system (51 subcategories) →</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Metabolic finding (22 subcategories) →</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── Endocrine finding (17 subcategories) →</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└── Finding of secondary sexual characteristics (2 subcategories) →</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path 
fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Single-anchor groupings drill directly into the SNOMED subtree. A "Show all concepts" toggle lets power users bypass the grouping layer and see the raw tree roots.</p>
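<p>The branching rule described in the two paragraphs above fits in a short function. This is a minimal Python sketch, assuming:</p>
<ul>
<li>a grouping is a dict with an <code>anchors</code> list;</li>
<li>the concept tree is a parent-to-children mapping.</li>
</ul>
<p><code>drill_down</code>, both data shapes, and the child IDs 1 to 6 are hypothetical, not Parthenon's API.</p>
<pre><code class="language-python">def drill_down(grouping, children_of):
    """Decide what to render when a user clicks a grouping card.

    grouping: {"anchors": [concept_id, ...]}  (hypothetical shape)
    children_of: {concept_id: [child concept_id, ...]} for the concept tree
    """
    anchors = grouping["anchors"]
    if len(anchors) == 1:
        # Single anchor: skip the sub-level and open the SNOMED subtree directly.
        return {"level": "subtree", "items": children_of.get(anchors[0], [])}
    # Several anchors: one row per anchor, labelled with its subcategory count.
    return {
        "level": "anchors",
        "items": [(a, len(children_of.get(a, []))) for a in anchors],
    }


tree = {436670: [1, 2, 3], 31821: [4, 5], 134057: [6]}
print(drill_down({"anchors": [436670, 31821]}, tree))
# {'level': 'anchors', 'items': [(436670, 3), (31821, 2)]}
print(drill_down({"anchors": [134057]}, tree))
# {'level': 'subtree', 'items': [6]}
</code></pre>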
<p>The groupings API returns resolved anchor details, such as each anchor's concept name and domain, so the frontend can display meaningful labels without additional queries:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"name"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Endocrine &amp; Metabolic"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"anchors"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span 
class="token number" style="color:hsl(29, 54%, 61%)">436670</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_name"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Metabolic disease"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"domain_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Condition"</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">31821</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_name"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Disorder of endocrine system"</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"domain_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Condition"</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">432455</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_name"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Metabolic finding"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"domain_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Condition"</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">444107</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_name"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Endocrine finding"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"domain_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Observation"</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">4306009</span><span 
class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"concept_name"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Finding of secondary sexual characteristics"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"domain_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Observation"</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
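<p>One way to produce that payload without extra round trips is to join anchors to the vocabulary once and group the rows in application code. The sketch below uses an in-memory SQLite database.</p>
<ul>
<li>The <code>grouping_anchors</code> table and its columns are a hypothetical schema; the post does not say how anchors are stored.</li>
<li>The concept rows are the five anchors from the example response.</li>
</ul>
<pre><code class="language-python">import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT, domain_id TEXT);
    -- Hypothetical storage: one row per (grouping, anchor), ordered for display.
    CREATE TABLE grouping_anchors (grouping_name TEXT, concept_id INTEGER, sort_order INTEGER);
    """
)
anchors = [
    (436670, "Metabolic disease", "Condition"),
    (31821, "Disorder of endocrine system", "Condition"),
    (432455, "Metabolic finding", "Condition"),
    (444107, "Endocrine finding", "Observation"),
    (4306009, "Finding of secondary sexual characteristics", "Observation"),
]
conn.executemany("INSERT INTO concept VALUES (?, ?, ?)", anchors)
conn.executemany(
    "INSERT INTO grouping_anchors VALUES ('Endocrine &amp; Metabolic', ?, ?)",
    [(cid, pos) for pos, (cid, _, _) in enumerate(anchors)],
)

# One query: every anchor joined to its vocabulary row, in display order.
rows = conn.execute(
    """
    SELECT g.grouping_name, c.concept_id, c.concept_name, c.domain_id
    FROM grouping_anchors g JOIN concept c ON c.concept_id = g.concept_id
    ORDER BY g.grouping_name, g.sort_order
    """
).fetchall()

payload = {}
for name, cid, cname, domain in rows:
    group = payload.setdefault(name, {"name": name, "anchors": []})
    group["anchors"].append({"concept_id": cid, "concept_name": cname, "domain_id": domain})

print(json.dumps(list(payload.values()), indent=2))
</code></pre>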
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-this-matters">Why This Matters<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#why-this-matters" class="hash-link" aria-label="Direct link to Why This Matters" title="Direct link to Why This Matters">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="for-clinical-researchers">For Clinical Researchers<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#for-clinical-researchers" class="hash-link" aria-label="Direct link to For Clinical Researchers" title="Direct link to For Clinical Researchers">​</a></h3>
<p>Before this work, browsing SNOMED conditions in Parthenon was functionally impossible. A researcher looking for cardiovascular conditions would see 174 orphan concepts and have to fall back on keyword search. Now they click Cardiovascular, see 79 subcategories (57 disorders + 22 findings), and can drill down through all 13 levels of the SNOMED Condition hierarchy.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="for-cohort-building">For Cohort Building<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#for-cohort-building" class="hash-link" aria-label="Direct link to For Cohort Building" title="Direct link to For Cohort Building">​</a></h3>
<p>To our knowledge, the groupings layer makes Parthenon the first open-source OHDSI tool to provide MedDRA-equivalent navigation for concept selection in cohort definitions. When building a cohort that needs "all cardiovascular conditions," a researcher can start from the Cardiovascular grouping, expand its anchors, and use <code>includeDescendants</code> to capture the full SNOMED subtree. Previously, this required knowing the exact SNOMED concept_id to search for.</p>
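<p>OHDSI tools generally resolve <code>includeDescendants</code> through the CDM's <code>concept_ancestor</code> table. That table stores every ancestor/descendant pair and lists each standard concept as its own descendant at zero levels of separation. Below is a minimal sketch against a toy SQLite copy of that table (the IDs are invented); it expands several grouping anchors at once and deduplicates overlapping subtrees.</p>
<pre><code class="language-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept_ancestor (
        ancestor_concept_id INTEGER,
        descendant_concept_id INTEGER,
        min_levels_of_separation INTEGER,
        max_levels_of_separation INTEGER
    );
    -- Toy chain: 1 is the parent of 2, 2 is the parent of 3; 9 is unrelated.
    INSERT INTO concept_ancestor VALUES
        (1, 1, 0, 0), (1, 2, 1, 1), (1, 3, 2, 2),
        (2, 2, 0, 0), (2, 3, 1, 1),
        (3, 3, 0, 0),
        (9, 9, 0, 0);
    """
)


def expand_anchors(anchor_ids):
    """Anchors plus all their descendants, deduplicated across anchors."""
    placeholders = ", ".join("?" for _ in anchor_ids)
    rows = conn.execute(
        "SELECT DISTINCT descendant_concept_id FROM concept_ancestor "
        f"WHERE ancestor_concept_id IN ({placeholders}) ORDER BY 1",
        list(anchor_ids),
    )
    return [r[0] for r in rows]


print(expand_anchors([1]))     # [1, 2, 3]
print(expand_anchors([2, 9]))  # [2, 3, 9]
</code></pre>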
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="for-the-ohdsi-ecosystem">For the OHDSI Ecosystem<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#for-the-ohdsi-ecosystem" class="hash-link" aria-label="Direct link to For the OHDSI Ecosystem" title="Direct link to For the OHDSI Ecosystem">​</a></h3>
<p>The SNOMED-OMOP domain boundary problem affects every tool in the OHDSI ecosystem. Atlas's concept hierarchy viewer suffers from the same orphan-root issue (though it uses a different codebase). Our cross-domain tree builder and clinical groupings layer are architectural patterns that could be adopted by the broader OHDSI community. The <code>propagateCrossDomainParents()</code> algorithm in particular solves a problem that, as far as we can determine, no other OHDSI tool has addressed — following SNOMED's actual polyhierarchical structure across OMOP domain boundaries.</p>
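<p>This section does not spell out how <code>propagateCrossDomainParents()</code> works, so the following is only a hypothetical reading of the idea, not its actual implementation. For each concept in a domain, replace any parent that belongs to a different OMOP domain with that parent's own parents. Repeat until every remaining parent is in-domain or the walk runs out of parents. SNOMED's is-a graph is acyclic, so each hop moves strictly upward and the loop terminates. The function name, data shapes, and toy IDs below are assumptions.</p>
<pre><code class="language-python">def propagate_cross_domain_parents(parents, domain_of, domain):
    """Hypothetical sketch: re-parent one domain's concepts onto their nearest
    in-domain ancestors by hopping over parents from other OMOP domains.

    parents: {concept_id: set of direct SNOMED is-a parents}
    domain_of: {concept_id: OMOP domain_id}
    Returns ({concept_id: set of in-domain parents}, iterations).
    """
    edges = {c: set(ps) for c, ps in parents.items() if domain_of[c] == domain}
    iterations = 0
    changed = True
    while changed:
        iterations += 1
        changed = False
        for concept, current in edges.items():
            expanded = set()
            for p in current:
                if domain_of.get(p) == domain:
                    expanded.add(p)  # already a valid in-domain parent
                else:
                    expanded |= parents.get(p, set())  # hop over foreign parent
            if expanded != current:
                edges[concept] = expanded  # value update only; dict size unchanged
                changed = True
    return edges, iterations


# Condition 10 sits under Observation 20, which sits under Condition 30.
parents = {10: {20}, 11: {30}, 20: {30}, 30: set()}
domain_of = {10: "Condition", 11: "Condition", 20: "Observation", 30: "Condition"}
edges, n = propagate_cross_domain_parents(parents, domain_of, "Condition")
print(edges, n)  # {10: {30}, 11: {30}, 30: set()} 2
</code></pre>
<p>Without the hop, concept 10 would be an orphan root in the Condition tree, because its only direct parent lives in Observation.</p>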
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="for-vocabulary-governance">For Vocabulary Governance<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#for-vocabulary-governance" class="hash-link" aria-label="Direct link to For Vocabulary Governance" title="Direct link to For Vocabulary Governance">​</a></h3>
<p>The <code>app.clinical_groupings</code> table establishes infrastructure for ongoing vocabulary curation. The <code>parent_grouping_id</code> foreign key supports future HLGT/HLT-equivalent sub-groupings, the next two levels of MedDRA's five-level hierarchy (SOC, HLGT, HLT, PT, LLT). The anchor-based architecture means groupings stay valid across SNOMED vocabulary updates as long as the anchor concepts aren't retired, and the seeder's verification protocol catches breakages automatically.</p>
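<p>A check of the "anchors aren't retired" condition after a vocabulary refresh might look like the sketch below. It relies on two standard OMOP vocabulary columns:</p>
<ul>
<li><code>standard_concept</code>: <code>'S'</code> for standard concepts;</li>
<li><code>invalid_reason</code>: NULL when valid, <code>'D'</code> for deleted, <code>'U'</code> for updated or replaced.</li>
</ul>
<p>The function name and the dict standing in for <code>vocab.concept</code> are assumptions for illustration.</p>
<pre><code class="language-python">def stale_anchors(anchor_ids, concepts):
    """Return anchors that are missing, invalidated, or no longer standard.

    concepts: {concept_id: (standard_concept, invalid_reason)}, mirroring
    two columns of vocab.concept (hypothetical in-memory stand-in).
    """
    problems = []
    for cid in anchor_ids:
        row = concepts.get(cid)
        if row is None:
            problems.append((cid, "missing"))
        elif row[1] is not None:
            problems.append((cid, f"invalid_reason={row[1]}"))
        elif row[0] != "S":
            problems.append((cid, "not standard"))
    return problems


concepts = {1: ("S", None), 2: (None, "U"), 3: (None, None)}
print(stale_anchors([1, 2, 3, 4], concepts))
# [(2, 'invalid_reason=U'), (3, 'not standard'), (4, 'missing')]
</code></pre>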
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="technical-summary">Technical Summary<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#technical-summary" class="hash-link" aria-label="Direct link to Technical Summary" title="Direct link to Technical Summary">​</a></h2>
<table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Condition concept coverage</td><td><strong>99.98%</strong> (105,299 / 105,324; 25 concepts unassigned)</td></tr><tr><td>Total clinical groupings</td><td><strong>46</strong> (27 Condition + 8 Measurement + 6 Observation + 5 Procedure)</td></tr><tr><td>Total anchor concepts</td><td><strong>119</strong> (all verified against vocab.concept)</td></tr><tr><td>Concept tree edges</td><td><strong>538,424</strong> across 6 domains</td></tr><tr><td>Max hierarchy depth (levels)</td><td><strong>16</strong> (Observation), 13 (Condition), 12 (Procedure/Measurement)</td></tr><tr><td>Hierarchy build time</td><td>~30 seconds (full rebuild with results population)</td></tr><tr><td>Cross-domain propagation</td><td>3-5 iterations per domain</td></tr><tr><td>MedDRA SOC parity</td><td><strong>25 of 27 SOCs</strong> directly mapped (2 covered by other domains)</td></tr></tbody></table>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/ontology-parity-with-meddra#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>The clinical groupings layer is designed for three future enhancements:</p>
<ol>
<li>
<p><strong>HLGT/HLT-equivalent sub-groupings</strong> — The <code>parent_grouping_id</code> column supports hierarchical groupings. Under "Cardiovascular," we could add sub-groupings like "Coronary artery disorders," "Heart failure syndromes," "Arrhythmias," and "Valvular heart disease" — matching MedDRA's HLGT level. This would require ~300-400 curated sub-groupings, which is a substantial but bounded clinical curation task.</p>
</li>
<li>
<p><strong>Data prevalence overlay</strong> — Show person count and record count from Achilles results alongside each grouping card, so researchers can immediately see which clinical categories have the most data in their CDM sources. This turns the grouping browser from a navigation tool into a data discovery tool.</p>
</li>
<li>
<p><strong>AI-assisted curation</strong> — We've already demonstrated the pattern: use a medical LLM (now II-Medical-8B, replacing MedGemma 4B) for clinical reasoning about concept relationships, paired with database queries for concept_id verification. This pipeline could semi-automate the creation of HLGT-level sub-groupings, with human review as the quality gate.</p>
</li>
</ol>
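<p>One subtlety for the data prevalence overlay is that record counts and person counts roll up differently across a grouping's descendants:</p>
<ul>
<li>Record counts roll up by simple addition, because each condition record carries exactly one concept.</li>
<li>Per-concept person counts cannot simply be added, because a patient with two descendant conditions appears in both counts.</li>
</ul>
<p>This hedged Python sketch assumes per-concept counts shaped like Achilles analysis 400 (persons by <code>condition_concept_id</code>) and analysis 401 (records by <code>condition_concept_id</code>). Without a distinct-person query against the CDM itself, only bounds on the person count are available.</p>
<pre><code class="language-python">def grouping_prevalence(descendant_ids, person_counts, record_counts):
    """Roll per-concept counts up to one grouping card.

    descendant_ids: distinct concept IDs under the grouping's anchors
    person_counts / record_counts: {concept_id: count}, assumed to come from
    Achilles-style per-concept results for one CDM source.
    """
    # Each record has exactly one concept_id, so records sum without overlap.
    records = sum(record_counts.get(c, 0) for c in descendant_ids)
    per_concept = [person_counts.get(c, 0) for c in descendant_ids]
    return {
        "records": records,
        # The union of patient sets is at least as large as its biggest member...
        "persons_at_least": max(per_concept, default=0),
        # ...and no larger than the sum, which double-counts shared patients.
        "persons_at_most": sum(per_concept),
    }


print(grouping_prevalence([1, 2], {1: 10, 2: 7}, {1: 30, 2: 12}))
# {'records': 42, 'persons_at_least': 10, 'persons_at_most': 17}
</code></pre>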
<p>Today, Parthenon's vocabulary browser combines MedDRA's navigability with the clinical depth of SNOMED CT. To our knowledge, no other open-source OHDSI tool offers this combination. For the first time, a clinical researcher can browse from "Cardiovascular" to "Coronary arteriosclerosis" along a clinically intuitive path. They can do so without knowing a single concept_id, without switching tools, and without leaving Parthenon.</p>
<hr>
<p><em>This work was completed on April 5, 2026. The cross-domain SNOMED tree builder, clinical groupings layer, and Browse Hierarchy UI are all available in the current Parthenon release. The clinical grouping definitions are seeded via <code>ClinicalGroupingSeeder</code> and can be customized for institution-specific navigation needs.</em></p>]]></content:encoded>
            <category>vocabulary</category>
            <category>snomed</category>
            <category>meddra</category>
            <category>ontology</category>
            <category>hierarchy</category>
            <category>clinical-groupings</category>
            <category>omop</category>
            <category>architecture</category>
            <category>ohdsi</category>
            <category>informatics</category>
        </item>
        <item>
            <title><![CDATA[Jobs Page Overhaul, Drug Era Performance Breakthrough, and Cohort Pipeline Hardening]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-04-04</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-04-04</guid>
            <pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A landmark day for platform observability and data pipeline reliability. We shipped a fully wired Jobs monitoring page that surfaces all 13+ tracked job types, broke through a major ETL performance ceiling on the SynPUF dataset (17 hours → 14 minutes for drug_era builds), and closed out a cohort generation audit that uncovered eight discrete bugs across the SQL builders, API layer, and frontend.]]></description>
            <content:encoded><![CDATA[<p>A landmark day for platform observability and data pipeline reliability. We shipped a fully wired Jobs monitoring page that surfaces all 13+ tracked job types, broke through a major ETL performance ceiling on the SynPUF dataset (17 hours → 14 minutes for <code>drug_era</code> builds), and closed out a cohort generation audit that uncovered eight discrete bugs across the SQL builders, API layer, and frontend.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="jobs-page-from-partial-view-to-full-platform-visibility">Jobs Page: From Partial View to Full Platform Visibility<a href="http://localhost:8082/docs/blog/dev-diary-2026-04-04#jobs-page-from-partial-view-to-full-platform-visibility" class="hash-link" aria-label="Direct link to Jobs Page: From Partial View to Full Platform Visibility" title="Direct link to Jobs Page: From Partial View to Full Platform Visibility">​</a></h2>
<p>The single biggest user-facing win today was landing commit <code>5e29c3a4e</code> — a ground-up rework of the Jobs monitoring experience.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-was-broken">What Was Broken<a href="http://localhost:8082/docs/blog/dev-diary-2026-04-04#what-was-broken" class="hash-link" aria-label="Direct link to What Was Broken" title="Direct link to What Was Broken">​</a></h3>
<p>The Jobs page was only surfacing 8 of the system's 13+ tracked job types. Achilles runs, FHIR Sync, Care Gap evaluations, GIS Boundary loads, and Poseidon ETL runs were all dispatching correctly through Horizon and writing to their tracking models — they were just completely invisible in the UI. The <code>JobController::index()</code> method simply never queried them.</p>
<p>The detail drawer had its own problem: the <code>show</code> endpoint was hardcoded to <code>AnalysisExecution</code> route model binding, meaning clicking <em>any</em> non-analysis job (cohort generation, ingestion, DQD, etc.) returned a 404.</p>
<p>Several secondary bugs compounded visibility further:</p>
<ul>
<li>Stale cohort generation jobs appeared under the wrong status filter due to a DB-filter/display-status mismatch</li>
<li>FHIR Export was leaking the raw <code>processing</code> status string instead of normalizing to <code>running</code></li>
<li>N+1 <code>Source::find()</code> calls inside DQD, Heel, and Achilles map loops</li>
<li>SCCS and Evidence Synthesis type filters were returning <em>all</em> analysis types instead of scoping to their specific morph class</li>
</ul>
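<p>The N+1 issue in the DQD, Heel, and Achilles map loops, and the usual fix for it, is easy to show in plain Python. This is only an illustration. <code>fetch_sources</code> is a hypothetical stand-in for a single <code>WHERE id IN (...)</code> query and does not represent Parthenon's Eloquent code.</p>

```python
from typing import Callable, Iterable

def attach_sources_batched(
    runs: list[dict],
    fetch_sources: Callable[[Iterable[int]], dict[int, str]],
) -> list[dict]:
    """Resolve every run's source with ONE lookup instead of one query per run (the N+1 pattern)."""
    # Collect the distinct source IDs first, then fetch them together.
    source_ids = {run["source_id"] for run in runs}
    sources = fetch_sources(source_ids)
    # The map loop now does a dictionary lookup rather than a database round trip.
    return [{**run, "source_name": sources.get(run["source_id"])} for run in runs]
```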
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-we-shipped">What We Shipped<a href="http://localhost:8082/docs/blog/dev-diary-2026-04-04#what-we-shipped" class="hash-link" aria-label="Direct link to What We Shipped" title="Direct link to What We Shipped">​</a></h3>
<p><code>JobController.php</code> gained nearly 1,000 lines across two key changes:</p>
<p><strong>Five new job collectors in <code>index()</code>:</strong></p>
<table><thead><tr><th>Type</th><th>Model</th><th>Scope</th></tr></thead><tbody><tr><td><code>fhir_sync</code></td><td><code>FhirSyncRun</code></td><td>System</td></tr><tr><td><code>care_gap</code></td><td><code>CareGapEvaluation</code></td><td>User</td></tr><tr><td><code>gis_boundary</code></td><td><code>GisDataset</code></td><td>User</td></tr><tr><td><code>poseidon</code></td><td><code>PoseidonRun</code></td><td>System</td></tr></tbody></table>
<p>(The <code>finngen</code> type was removed — it lives in the workbench app, not core job tracking.)</p>
<p><strong>Polymorphic <code>show</code> endpoint:</strong> Route model binding is gone. The new <code>show(Request $request, int $jobId)</code> dispatches through 14 type-specific detail builders via a <code>?type=</code> query param. Each builder returns the standard job envelope plus a <code>details</code> object with type-specific metadata and a <code>timeline</code> array of execution events — giving the rich detail drawer real data to render for every job type in the system.</p>
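<p>At its core, the polymorphic <code>show</code> endpoint is a dispatch table keyed by the <code>?type=</code> value. The Python below is a language-neutral sketch and does not reproduce the PHP controller. The builder functions, their field names, and the two sample entries are hypothetical. The only part taken from the description above is the shape of the result: the standard envelope plus <code>details</code> and <code>timeline</code>.</p>

```python
from typing import Any, Callable

JobDetail = dict[str, Any]

def build_fhir_sync(job_id: int) -> JobDetail:
    # Hypothetical builder: a real one would load the FhirSyncRun row and its events.
    return {"details": {"kind": "fhir_sync"}, "timeline": [{"event": "started"}]}

def build_poseidon(job_id: int) -> JobDetail:
    # Hypothetical builder for PoseidonRun jobs.
    return {"details": {"kind": "poseidon"}, "timeline": [{"event": "queued"}]}

# One entry per job type (the endpoint described above has 14).
DETAIL_BUILDERS: dict[str, Callable[[int], JobDetail]] = {
    "fhir_sync": build_fhir_sync,
    "poseidon": build_poseidon,
}

def show(job_id: int, job_type: str) -> JobDetail:
    """Route to the type-specific builder instead of binding every job to one model class."""
    builder = DETAIL_BUILDERS.get(job_type)
    if builder is None:
        # Unknown types fail explicitly rather than falling back to an unrelated model.
        raise KeyError(f"unsupported job type: {job_type}")
    # Shared envelope first, then the type-specific details and timeline.
    return {"id": job_id, "type": job_type, **builder(job_id)}
```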
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="etl-performance-drug_era-goes-from-overnight-to-minutes">ETL Performance: <code>drug_era</code> Goes From Overnight to Minutes<a href="http://localhost:8082/docs/blog/dev-diary-2026-04-04#etl-performance-drug_era-goes-from-overnight-to-minutes" class="hash-link" aria-label="Direct link to etl-performance-drug_era-goes-from-overnight-to-minutes" title="Direct link to etl-performance-drug_era-goes-from-overnight-to-minutes">​</a></h2>
<p>Commit <code>a084b84f6</code> delivered one of the most dramatic performance wins we've had in the ETL layer. The <code>drug_era</code> build step on the 2.3M-patient SynPUF dataset was taking <strong>17 hours</strong>. It now runs in <strong>14 minutes</strong>.</p>
<p>The fix was a two-phase build strategy. Previously the pipeline attempted to compute drug eras in a single monolithic pass, which collapsed under the weight of the dataset's join complexity and row volume. The rewrite splits the work: phase one materializes an intermediate exposure table with appropriate indexes, and phase two performs the era consolidation logic against that pre-built structure. The intermediate materialization pays for itself immediately by giving the query planner something it can actually reason about.</p>
<p>This was preceded by <code>1eb297148</code>, which rewrote the SynPUF enrichment parallelism to eliminate the OOM crashes and deadlocks that had been causing enrichment runs to fail intermittently on large datasets. Together, the two fixes make the 2.3M-patient SynPUF enrichment pipeline stable and fast end to end.</p>
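<p>The two-phase <code>drug_era</code> strategy can be sketched in plain Python, with a sorted list standing in for the indexed intermediate table. Several assumptions apply. The 30-day persistence window is the common OHDSI default and has not been confirmed as Parthenon's setting. The exposure rows are hypothetical. The era logic is also simplified: for example, it does not compute <code>gap_days</code>.</p>

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import groupby
from typing import Iterable

GAP_DAYS = 30  # assumed persistence window (common OHDSI default), not a confirmed Parthenon setting

@dataclass(frozen=True)
class Exposure:
    person_id: int
    ingredient_id: int
    start: date
    end: date

@dataclass(frozen=True)
class DrugEra:
    person_id: int
    ingredient_id: int
    start: date
    end: date
    exposure_count: int

def materialize_exposures(rows: Iterable[tuple]) -> list[Exposure]:
    """Phase 1: build the intermediate exposure set once, ordered like an index on (person, ingredient, start)."""
    return sorted((Exposure(*r) for r in rows), key=lambda e: (e.person_id, e.ingredient_id, e.start))

def consolidate_eras(exposures: list[Exposure], gap_days: int = GAP_DAYS) -> list[DrugEra]:
    """Phase 2: a single linear pass over the pre-sorted exposures, merging overlaps and short gaps."""
    eras: list[DrugEra] = []
    for (pid, ing), group in groupby(exposures, key=lambda e: (e.person_id, e.ingredient_id)):
        first = next(group)
        start, end, count = first.start, first.end, 1
        for e in group:
            if e.start <= end + timedelta(days=gap_days):
                end, count = max(end, e.end), count + 1  # extend the current era
            else:
                eras.append(DrugEra(pid, ing, start, end, count))
                start, end, count = e.start, e.end, 1  # gap too long: start a new era
        eras.append(DrugEra(pid, ing, start, end, count))
    return eras
```

<p>The point of this design is that phase 2 never joins or re-sorts anything. It relies entirely on the ordering that phase 1 established, which is the role the indexed intermediate table plays for the query planner.</p>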
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="cohort-generation-eight-bugs-closed">Cohort Generation: Eight Bugs Closed<a href="http://localhost:8082/docs/blog/dev-diary-2026-04-04#cohort-generation-eight-bugs-closed" class="hash-link" aria-label="Direct link to Cohort Generation: Eight Bugs Closed" title="Direct link to Cohort Generation: Eight Bugs Closed">​</a></h2>
<p>Commit <code>6b4012262</code> documents a focused audit of the cohort generation pipeline that surfaced and fixed eight bugs spanning three layers:</p>
<ul>
<li><strong>SQL builders</strong> — edge cases in inclusion criteria handling that produced incorrect cohort membership under specific date range configurations</li>
<li><strong>API layer</strong> — response shape inconsistencies that caused the frontend to silently drop data</li>
<li><strong>Frontend</strong> — patient list navigation was routing to malformed profile URLs (<code>9d79ffe37</code>), and breadcrumbs weren't context-aware when entering from the cohort view</li>
</ul>
<p>The risk scores feature also got two targeted fixes: <code>1297db01b</code> corrected the recommend endpoint to return a structured response with the full patient profile attached (it was previously returning a bare score), and <code>6b4012262</code> fixed a <code>useParams</code> / route definition mismatch where the component was reading <code>scoreId</code> but the route defined <code>:id</code>.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-memory--infrastructure-housekeeping">AI Memory &amp; Infrastructure Housekeeping<a href="http://localhost:8082/docs/blog/dev-diary-2026-04-04#ai-memory--infrastructure-housekeeping" class="hash-link" aria-label="Direct link to AI Memory &amp; Infrastructure Housekeeping" title="Direct link to AI Memory &amp; Infrastructure Housekeeping">​</a></h2>
<p>On the AI side, <code>6035d6d65</code> streamlined the Chroma memory path resolution — a small but meaningful cleanup that removes ambiguity in how the vector store locates its persistence directory across different deployment environments.</p>
<p>Infrastructure received a round of tuning in <code>9f24a2ca2</code>: Docker Compose configuration was tightened, Horizon queue configuration was cleaned up, monitoring alerts were added for key pipeline stages, and the CI workflow was updated to reflect the current test surface.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-04-04#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>With the Jobs page now surfacing the full job graph, the natural follow-on is <strong>real-time status streaming</strong> — replacing the current polling approach with server-sent events so operators get live feedback on long-running ETL and analysis jobs without hammering the API.</p>
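<p>Server-sent events are plain-text frames sent over a long-lived HTTP response: <code>id:</code>, <code>event:</code>, and <code>data:</code> lines, each frame terminated by a blank line. The following sketch shows how job status updates could be framed. The <code>job-status</code> event name and the payload shape are hypothetical, and the transport (for example, a streamed Laravel response) is omitted.</p>

```python
import json
from typing import Iterable, Iterator, Optional

def sse_frame(event: str, payload: dict, event_id: Optional[int] = None) -> str:
    """Format one server-sent event; the trailing blank line terminates the frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets a reconnecting client resume via Last-Event-ID
    lines.append(f"event: {event}")
    # json.dumps without indent emits a single line, so one data: field suffices.
    lines.append(f"data: {json.dumps(payload, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"

def stream_status(updates: Iterable[dict]) -> Iterator[str]:
    """Turn a sequence of job status updates into SSE frames with increasing IDs."""
    for seq, update in enumerate(updates, start=1):
        yield sse_frame("job-status", update, event_id=seq)
```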
<p>The <code>drug_era</code> two-phase build is a pattern worth generalizing. The <code>condition_era</code> and <code>observation_period</code> builders have similar structural characteristics and are candidates for the same treatment.</p>
<p>Cohort generation is in a much cleaner state after today's audit, which unblocks work on <strong>cohort comparison views</strong> — a feature that's been waiting on a stable generation pipeline before we could build on top of it confidently.</p>]]></content:encoded>
            <category>development</category>
            <category>ohdsi</category>
            <category>analytics</category>
            <category>frontend</category>
            <category>backend</category>
            <category>database</category>
            <category>infrastructure</category>
            <category>ai</category>
        </item>
        <item>
            <title><![CDATA[One Million Patient Embeddings: GPU-Accelerated Similarity Search Comes to Parthenon]]></title>
            <link>http://localhost:8082/docs/blog/patient-embeddings-at-scale</link>
            <guid>http://localhost:8082/docs/blog/patient-embeddings-at-scale</guid>
            <pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Two days ago, we shipped the Patient Similarity Engine — a multi-modal system that scores patients across six clinical dimensions on OMOP CDM. The architecture was sound. The algorithms worked. But there was a problem hiding in plain sight: none of our patients had embeddings.]]></description>
            <content:encoded><![CDATA[<p>Two days ago, we <a href="http://localhost:8082/docs/blog/patient-similarity-engine">shipped the Patient Similarity Engine</a> — a multi-modal system that scores patients across six clinical dimensions on OMOP CDM. The architecture was sound. The algorithms worked. But there was a problem hiding in plain sight: none of our patients had embeddings.</p>
<p>The embedding pipeline had been silently failing since day one. Three type mismatches between our PHP backend and Python AI service meant that every embedding request returned a validation error, was caught by a try/catch block, and logged as a warning that nobody read. The feature vectors were all there — conditions, drugs, measurements, procedures — but the 512-dimensional dense vectors that would make similarity search fast at scale? Zero. For every source. For every patient.</p>
<p>Tonight, we fixed all three bugs, refactored the embedding pipeline from CPU-only SapBERT to GPU-accelerated Ollama, upgraded from 512 to 768 dimensions, introduced batch deduplication that delivered a 123x throughput improvement, and generated embeddings for 1,007,007 patients across three CDM sources. This is the story of what broke, what we built, and what it unlocks.</p>
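<p>Batch deduplication relies on a simple observation: many patients share identical feature payloads, so each distinct payload needs to be encoded only once. The sketch below illustrates the idea. It assumes payloads are JSON-serializable dicts, and <code>encode</code> is a hypothetical stand-in for the call to the embedding service. The sketch does not attempt to reproduce the 123x figure reported above.</p>

```python
import hashlib
import json
from typing import Callable, Sequence

def embed_with_dedup(
    patients: dict[int, dict],
    encode: Callable[[Sequence[dict]], list[list[float]]],
) -> dict[int, list[float]]:
    """Encode each distinct feature payload once, then fan the vectors back out to person_ids."""
    key_for: dict[int, str] = {}
    unique: dict[str, dict] = {}
    for person_id, features in patients.items():
        # Canonical JSON (sorted keys) so equal payloads hash identically; list order still matters.
        key = hashlib.sha256(json.dumps(features, sort_keys=True).encode()).hexdigest()
        key_for[person_id] = key
        unique.setdefault(key, features)
    keys = list(unique)
    vectors = encode([unique[k] for k in keys])  # one call covering only the distinct payloads
    by_key = dict(zip(keys, vectors))
    return {pid: by_key[key] for pid, key in key_for.items()}
```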
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-silent-failure">The Silent Failure<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#the-silent-failure" class="hash-link" aria-label="Direct link to The Silent Failure" title="Direct link to The Silent Failure">​</a></h2>
<p>The Patient Similarity Engine has two search modes. <strong>Interpretable mode</strong> computes per-dimension Jaccard and z-score similarity in real time — it's explainable but requires loading candidate patients into memory and scoring each one. <strong>Embedding mode</strong> uses pgvector's IVFFlat index for approximate nearest neighbor (ANN) search — sub-second lookups across a million patients, with interpretable scoring applied only to the top candidates.</p>
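<p>Here is a sketch of the two per-dimension measures used in interpretable mode: Jaccard similarity for concept sets and a z-score comparison for labs. The post does not give the exact lab formula, so both the <code>1 / (1 + mean absolute z difference)</code> form and the unweighted overall mean are assumptions made for illustration.</p>

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Size of the intersection over size of the union of concept IDs; two empty sets score 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def lab_similarity(za: dict[str, float], zb: dict[str, float]) -> float:
    """Compare z-scores only on labs both patients have; map mean |difference| into (0, 1]."""
    shared = za.keys() & zb.keys()
    if not shared:
        return 0.0
    mean_diff = sum(abs(za[k] - zb[k]) for k in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)

def interpretable_score(p: dict, q: dict) -> dict[str, float]:
    """Per-dimension scores plus an unweighted mean (the weighting is an assumption)."""
    scores = {
        "conditions": jaccard(set(p["conditions"]), set(q["conditions"])),
        "drugs": jaccard(set(p["drugs"]), set(q["drugs"])),
        "labs": lab_similarity(p["labs"], q["labs"]),
    }
    # The right-hand side is evaluated before "overall" is added, so it averages the three dimensions.
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores
```

<p>Each dimension's score remains visible on its own. This is what makes the mode explainable, and it also explains the cost: the scoring has to run separately for every candidate patient.</p>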
<p>Embedding mode requires pre-computed dense vectors stored in a <code>vector(N)</code> column on <code>patient_feature_vectors</code>. These vectors are generated by a Laravel queue job (<code>ComputePatientFeatureVectors</code>) that:</p>
<ol>
<li>Extracts clinical features from CDM tables (conditions, drugs, measurements, procedures, genomics)</li>
<li>Stores them as JSONB in the feature vector table</li>
<li>Sends batches to the Python AI service for encoding</li>
<li>Writes the resulting vectors back to pgvector</li>
</ol>
<p>Steps 1 and 2 worked perfectly. Step 3 failed on every single call. Step 4 never executed.</p>
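<p>For orientation, steps 3 and 4 involve three operations: chunking person IDs, encoding each chunk, and writing the vectors back using pgvector's text form (<code>'[x1,x2,...]'</code>). The following is a sketch based on assumptions. The batch size, the <code>person_id</code> column in the SQL string, and the <code>encode_batch</code> callable are hypothetical, and the SQL is only built here, not executed.</p>

```python
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 500  # hypothetical; tune to the encoder's throughput

def batches(ids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of person IDs (step 3 sends one slice per request)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def to_pgvector(vec: Sequence[float]) -> str:
    """pgvector accepts a bracketed, comma-separated text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

def update_statements(
    ids: Sequence[int],
    encode_batch: Callable[[Sequence[int]], dict[int, list[float]]],
) -> list[tuple[str, tuple[str, int]]]:
    """Step 4: one parameterized UPDATE per returned vector, keyed by person_id."""
    sql = "UPDATE patient_feature_vectors SET embedding = %s::vector WHERE person_id = %s"
    out = []
    for chunk in batches(ids):
        for person_id, vec in encode_batch(chunk).items():
            out.append((sql, (to_pgvector(vec), person_id)))
    return out
```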
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-1-integer-concepts-vs-string-validation">Bug #1: Integer Concepts vs. String Validation<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#bug-1-integer-concepts-vs-string-validation" class="hash-link" aria-label="Direct link to Bug #1: Integer Concepts vs. String Validation" title="Direct link to Bug #1: Integer Concepts vs. String Validation">​</a></h3>
<p>The Pydantic model for the <code>/patient-similarity/embed</code> endpoint declared concept lists as <code>list[str]</code>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">class</span><span class="token plain"> </span><span class="token class-name" style="color:hsl(29, 54%, 61%)">PatientFeatures</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">BaseModel</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    condition_concepts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">default_factory</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" 
style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    drug_concepts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">default_factory</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    procedure_concepts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">default_factory</span><span class="token operator" 
style="color:hsl(207, 82%, 66%)">=</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>But OMOP concept IDs are integers. PHP's <code>PatientFeatureVector::toArray()</code> serializes the JSONB <code>condition_concepts</code> column as <code>[4120002, 4045900, 4031047]</code> — a list of integers. FastAPI's Pydantic validation rejected every request with:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token property" style="color:hsl(355, 65%, 65%)">"detail"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token property" style="color:hsl(355, 65%, 65%)">"type"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"string_type"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"loc"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token string" style="color:hsl(95, 38%, 62%)">"body"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"condition_concepts"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">0</span><span class="token punctuation" 
style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"msg"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"Input should be a valid string"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"input"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">4120002</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The <code>EmbeddingClient</code> caught the 422, logged a warning, and returned an empty array. The job continued to the next batch.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-2-dict-lab-vector-vs-list-validation">Bug #2: Dict Lab Vector vs. List Validation<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#bug-2-dict-lab-vector-vs-list-validation" class="hash-link" aria-label="Direct link to Bug #2: Dict Lab Vector vs. List Validation" title="Direct link to Bug #2: Dict Lab Vector vs. List Validation">​</a></h3>
<p>The <code>lab_vector</code> field was declared as <code>list[float]</code>, but PHP sends a JSONB dictionary mapping concept IDs to z-scores:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token property" style="color:hsl(355, 65%, 65%)">"3025315"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">-0.1184</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"3036277"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">-0.0578</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"3036832"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1.5564</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 
19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Same failure pattern — Pydantic rejected the dict, the client swallowed the error.</p>
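<p>The dict shape itself follows from JSON: object keys are always strings. This is why integer OMOP concept IDs arrive quoted as <code>"3025315"</code>. A receiving service can normalize that shape and fail loudly on anything else, as the sketch below shows. The function name and the choice of integer keys are assumptions, and the sketch does not reproduce the actual Pydantic fix.</p>

```python
import json
from typing import Any

def normalize_lab_vector(raw: Any) -> dict[int, float]:
    """Accept the JSONB object PHP actually sends and key it by integer concept ID."""
    if not isinstance(raw, dict):
        # Fail loudly: a list here means the caller and service disagree on the contract.
        raise TypeError(f"lab_vector must be an object, got {type(raw).__name__}")
    return {int(concept_id): float(z) for concept_id, z in raw.items()}

# JSON round-trips turn integer keys into strings, which is why the IDs arrive quoted.
example = normalize_lab_vector(json.loads('{"3025315": -0.1184, "3036832": 1.5564}'))
```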
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-3-batch-response-key-mismatch">Bug #3: Batch Response Key Mismatch<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#bug-3-batch-response-key-mismatch" class="hash-link" aria-label="Direct link to Bug #3: Batch Response Key Mismatch" title="Direct link to Bug #3: Batch Response Key Mismatch">​</a></h3>
<p>Even if the first two bugs hadn't existed, the batch endpoint wouldn't have worked. The Python API returns:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token property" style="color:hsl(355, 65%, 65%)">"embeddings"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token property" style="color:hsl(355, 65%, 65%)">"person_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">142763</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"embedding"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain">...</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token property" style="color:hsl(355, 65%, 65%)">"dimension"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 
61%)">768</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> ...</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>But <code>EmbeddingClient::embedBatch()</code> iterated the list of result objects as if it were a map from person IDs to vectors:</p>
<div class="language-php codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-php codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">foreach</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token variable" style="color:hsl(207, 82%, 66%)">$embeddings</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">as</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$pid</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$embedding</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token comment" style="color:hsl(220, 10%, 40%)">// $pid = 0, 1, 2... 
(array indices, not person IDs)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token comment" style="color:hsl(220, 10%, 40%)">// $embedding = {"person_id": 142763, ...} (object, not float array)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The UPDATE would then have tried to write a decoded JSON object, rather than a float array, into the pgvector column, using the array index (0, 1, 2, ...) as the person_id.</p>
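<p>One way to make this class of bug fail loudly is to validate the value at the write boundary. The sketch below is a hypothetical Python helper (not part of Parthenon) that formats a float sequence as pgvector's text input form, <code>'[0.1,0.2,...]'</code>, and raises on anything else, including a <code>{"person_id": ..., "embedding": [...]}</code> entry like the one in the loop above.</p>

```python
import math
from numbers import Real


def to_pgvector_literal(embedding, expected_dim=None):
    """Format a sequence of floats as pgvector text input, e.g. '[0.5,1.0]'.

    Hypothetical helper. Raises instead of silently passing along a dict
    (such as a decoded {"person_id": ..., "embedding": [...]} entry),
    a string, or non-numeric components.
    """
    if isinstance(embedding, (dict, str, bytes)) or not hasattr(embedding, "__iter__"):
        raise TypeError(f"expected a sequence of floats, got {type(embedding).__name__}")
    values = list(embedding)
    for v in values:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(v, bool) or not isinstance(v, Real):
            raise TypeError(f"non-numeric vector component: {v!r}")
        # pgvector rejects NaN and infinite components
        if not math.isfinite(v):
            raise ValueError(f"non-finite vector component: {v!r}")
    if expected_dim is not None and len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

<p>The resulting string can be bound as a query parameter and cast with <code>::vector</code>. The <code>expected_dim</code> check is optional here because the patient-vector width is not stated in this section.</p>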
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-compound-effect">The Compound Effect<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#the-compound-effect" class="hash-link" aria-label="Direct link to The Compound Effect" title="Direct link to The Compound Effect">​</a></h3>
<p>Three bugs, three layers, one outcome: zero embeddings for any patient in any source. The job always "succeeded"; it just never wrote any vectors. Because the interpretable search mode kept working as a fallback, nothing looked broken. We only discovered the problem because we asked a simple question: <em>why does the <code>embedding</code> column show NULL for every row?</em></p>
<p><strong>Lesson learned:</strong> Silent failure in pipeline stages is worse than a crash. The <code>EmbeddingClient</code> should have thrown on non-200 responses or, at minimum, the job should have asserted that <code>count($embeddings) &gt; 0</code> per batch.</p>
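<p>A minimal Python sketch of that per-batch assertion, extended with a key check that would also have caught the index-as-person_id bug. <code>EmbeddingBatchError</code>, <code>check_batch</code>, and the <code>dim</code> parameter are hypothetical names; the real job is PHP, so read this as an outline of the checks rather than a drop-in replacement.</p>

```python
class EmbeddingBatchError(RuntimeError):
    """Raised when a batch would otherwise 'succeed' without usable vectors."""


def check_batch(requested_ids, embeddings, dim=None):
    """Validate one batch before any UPDATE runs (hypothetical helper).

    requested_ids: person_ids sent to the embedding service.
    embeddings: mapping of person_id to a list of floats.
    dim: expected vector width, if known.
    Returns the number of vectors that passed.
    """
    if not embeddings:
        raise EmbeddingBatchError(
            f"0 embeddings returned for {len(requested_ids)} requested patients"
        )
    unknown = set(embeddings) - set(requested_ids)
    if unknown:
        # e.g. array indices 0, 1, 2 showing up where person_ids were expected
        raise EmbeddingBatchError(f"unexpected keys: {list(unknown)[:5]}")
    for pid, vec in embeddings.items():
        if not isinstance(vec, list):
            raise EmbeddingBatchError(
                f"person_id {pid}: got {type(vec).__name__}, not a list of floats"
            )
        if dim is not None and len(vec) != dim:
            raise EmbeddingBatchError(f"person_id {pid}: {len(vec)} dims, expected {dim}")
    return len(embeddings)
```

<p>The sketch does not require every requested patient to come back; if the service can never legitimately skip a patient, adding a missing-ID check would make it stricter still.</p>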
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-fixes">The Fixes<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#the-fixes" class="hash-link" aria-label="Direct link to The Fixes" title="Direct link to The Fixes">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="type-coercion-at-the-api-boundary">Type Coercion at the API Boundary<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#type-coercion-at-the-api-boundary" class="hash-link" aria-label="Direct link to Type Coercion at the API Boundary" title="Direct link to Type Coercion at the API Boundary">​</a></h3>
<p>The Pydantic model now accepts what PHP actually sends:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">class</span><span class="token plain"> </span><span class="token class-name" style="color:hsl(29, 54%, 61%)">PatientFeatures</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">BaseModel</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    condition_concepts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">int</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">|</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token 
plain">default_factory</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    lab_vector</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">float</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">|</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">dict</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">float</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">default_factory</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 
14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    drug_concepts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">int</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">|</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">default_factory</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    procedure_concepts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">int</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">|</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token 
plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">default_factory</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    variant_genes</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">|</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">dict</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">default_factory</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path 
fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
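<p>Accepting <code>int | str</code> at the boundary still leaves two key types in play downstream: depending on the Pydantic version and union mode, a numeric string may survive validation as a string. One common source of the dict-shaped <code>lab_vector</code> is PHP's <code>json_encode</code>, which emits a JSON object rather than a list for any array whose keys are not the sequence 0..n-1, so a lab array keyed by concept ID arrives as <code>{"3004410": 1.2}</code>. The hypothetical helper below canonicalizes a mixed concept list to integers; the payload values are made up for illustration.</p>

```python
import json


def normalize_concept_ids(values):
    """Hypothetical helper: canonicalize a list[int | str] field to list[int]."""
    ids = []
    for v in values:
        # bool is a subclass of int; it is never a valid concept id
        if isinstance(v, bool):
            raise TypeError(f"boolean is not a concept id: {v!r}")
        if isinstance(v, int):
            ids.append(v)
        elif isinstance(v, str) and v.strip().isascii() and v.strip().isdigit():
            ids.append(int(v.strip()))
        else:
            raise ValueError(f"not a concept id: {v!r}")
    return ids


# Illustrative payload in the shape described above (values are made up)
payload = json.loads(
    '{"condition_concepts": [201826, "4329847"], "lab_vector": {"3004410": 1.2}}'
)
condition_ids = normalize_concept_ids(payload["condition_concepts"])
```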
<p>The <code>variant_genes</code> field deserves special mention. In the OMOP genomics extension, variant data is stored as <code>[{"gene": "KRAS", "pathogenicity": "Pathogenic"}, ...]</code>. The embedding service now extracts the <code>gene</code> field from dict entries and passes it to the encoder:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">variant_genes </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    g</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token string" style="color:hsl(95, 38%, 62%)">"gene"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">if</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">isinstance</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">g</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">dict</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">else</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span 
class="token plain">g</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">for</span><span class="token plain"> g </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">in</span><span class="token plain"> raw_genes</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This means patients with genomic data, such as those in the Pancreatic Cancer Corpus with KRAS, BRCA1, and TP53 variants, now get genomics-aware embeddings.</p>
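<p>The one-liner above raises <code>KeyError</code> if a dict entry has no <code>gene</code> key. A slightly more defensive variant, sketched here as a hypothetical <code>extract_gene_symbols</code> helper (the service code beyond the snippet is not shown), skips such entries and blank symbols instead of failing the whole patient:</p>

```python
def extract_gene_symbols(raw_genes):
    """Pull gene symbols out of mixed variant entries (hypothetical helper).

    Accepts plain strings ("KRAS") and OMOP-genomics-style dicts
    ({"gene": "KRAS", "pathogenicity": "Pathogenic"}).
    """
    symbols = []
    for g in raw_genes:
        symbol = g.get("gene") if isinstance(g, dict) else g
        if symbol is None:
            continue  # dict entry without a gene symbol
        symbol = str(symbol).strip()
        if symbol:  # drop empty strings after trimming
            symbols.append(symbol)
    return symbols
```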
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="lab-vector-dict-handling">Lab Vector Dict Handling<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#lab-vector-dict-handling" class="hash-link" aria-label="Direct link to Lab Vector Dict Handling" title="Direct link to Lab Vector Dict Handling">​</a></h3>
<p>The <code>_encode_measurements</code> function now accepts both forms:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">if</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">isinstance</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">lab_vector</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">dict</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    values </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">lab_vector</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">values</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain">  </span><span class="token comment" 
style="color:hsl(220, 10%, 40%)"># Extract z-scores from {concept_id: zscore}</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">else</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    values </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> lab_vector  </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Already a list of floats</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The z-scores are clipped to [-5, 5] and normalized to [-1, 1], then packed into the 96-dimensional measurements slice of the patient vector.</p>
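<p>A minimal sketch of that measurement encoding, under stated assumptions: dividing the clipped value by 5 is the linear map from [-5, 5] to [-1, 1]; unused slots are zero-padded and extra values truncated to fit the 96 slots; and dict entries are ordered by numeric concept ID so the slot order does not depend on how the PHP side built the array. The original snippet uses <code>lab_vector.values()</code> (insertion order), so the sorting, padding, and truncation here are this sketch's choices, not confirmed behavior.</p>

```python
def encode_measurements(lab_vector, dim=96, clip=5.0):
    """Sketch: map lab z-scores into a fixed-width slice of the patient vector.

    lab_vector is either a list of z-scores or a {concept_id: zscore} dict.
    Each z-score is clipped to [-clip, clip] and divided by clip, so it lands
    in [-1, 1]. The result is zero-padded or truncated to exactly `dim` values.
    """
    if isinstance(lab_vector, dict):
        # Assumption: keys are numeric concept ids; sort for a deterministic order
        values = [lab_vector[k] for k in sorted(lab_vector, key=int)]
    else:
        values = list(lab_vector)
    scaled = [max(-clip, min(clip, float(z))) / clip for z in values[:dim]]
    return scaled + [0.0] * (dim - len(scaled))
```

<p>Zero padding makes a missing lab indistinguishable from a z-score of 0 (the reference mean); a separate presence mask would be one way to tell the two apart.</p>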
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="batch-response-re-keying">Batch Response Re-keying<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#batch-response-re-keying" class="hash-link" aria-label="Direct link to Batch Response Re-keying" title="Direct link to Batch Response Re-keying">​</a></h3>
<p>The PHP <code>EmbeddingClient::embedBatch()</code> now re-keys the response by <code>person_id</code>:</p>
<div class="language-php codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-php codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token variable" style="color:hsl(207, 82%, 66%)">$raw</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$response</span><span class="token operator" style="color:hsl(207, 82%, 66%)">-&gt;</span><span class="token function" style="color:hsl(207, 82%, 66%)">json</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'embeddings'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token variable" style="color:hsl(207, 82%, 66%)">$result</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token punctuation" 
style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">foreach</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token variable" style="color:hsl(207, 82%, 66%)">$raw</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">as</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$entry</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">if</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token keyword" style="color:hsl(286, 60%, 67%)">isset</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token variable" style="color:hsl(207, 82%, 66%)">$entry</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'person_id'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$entry</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">[</span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'embedding'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$result</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token keyword type-casting" style="color:hsl(286, 60%, 67%)">int</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$entry</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'person_id'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$entry</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'embedding'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">return</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$result</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The caller gets a <code>person_id =&gt; float[]</code> map instead of an indexed array, and the UPDATE statement writes the correct vector to the correct patient.</p>
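<p>For readers following along in Python, here is the same re-keying logic as a hypothetical <code>rekey_embeddings</code> helper operating on the decoded response body; the sample body is invented for illustration. Entries missing either field are skipped, which matches what the <code>isset()</code> guard does for missing or null values.</p>

```python
import json


def rekey_embeddings(response_body):
    """Turn {"embeddings": [{"person_id": ..., "embedding": [...]}, ...]}
    into {person_id: [floats]} (hypothetical Python mirror of the PHP fix)."""
    result = {}
    for entry in response_body.get("embeddings") or []:
        pid = entry.get("person_id")
        vec = entry.get("embedding")
        if pid is not None and vec is not None:  # same effect as isset() on both keys
            result[int(pid)] = vec
    return result


# Invented sample body: one string person_id, one entry without an embedding
body = json.loads(
    '{"embeddings": [{"person_id": 142763, "embedding": [0.1, -0.2]},'
    ' {"person_id": "142764", "embedding": [0.3, 0.4]},'
    ' {"person_id": 142765}]}'
)
by_person = rekey_embeddings(body)
```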
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="from-cpu-to-gpu-the-ollama-migration">From CPU to GPU: The Ollama Migration<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#from-cpu-to-gpu-the-ollama-migration" class="hash-link" aria-label="Direct link to From CPU to GPU: The Ollama Migration" title="Direct link to From CPU to GPU: The Ollama Migration">​</a></h2>
<p>With the type bugs fixed, embeddings started generating — but slowly. The original pipeline used <strong>SapBERT</strong> (cambridgeltl/SapBERT-from-PubMedBERT-fulltext), a 110M-parameter biomedical language model running on CPU inside the Docker container. SapBERT is an excellent model for clinical concept encoding, but CPU inference is not how you want to embed a million patients.</p>
<p>Meanwhile, Ollama was already running on the host machine with full GPU access, serving MedGemma for Abby's conversational AI. Three embedding models were loaded and ready:</p>
<table><thead><tr><th>Model</th><th>Parameters</th><th>Dimension</th><th>Use Case</th></tr></thead><tbody><tr><td><code>nomic-embed-text</code></td><td>137M</td><td>768</td><td>General-purpose embedding, fast</td></tr><tr><td><code>embeddinggemma:300m</code></td><td>300M</td><td>768</td><td>Google's embedding model</td></tr><tr><td><code>text-embedding-3-large</code></td><td>—</td><td>768</td><td>OpenAI-compatible embedding</td></tr></tbody></table>
<p>All three produce 768-dimensional embeddings, matching SapBERT's native output dimension. We chose <code>nomic-embed-text</code> for its speed: 27 concepts/second in batch mode, with the GPU doing the heavy lifting.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-embedding-service-refactor">The Embedding Service Refactor<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#the-embedding-service-refactor" class="hash-link" aria-label="Direct link to The Embedding Service Refactor" title="Direct link to The Embedding Service Refactor">​</a></h3>
<p>The <code>sapbert.py</code> service was refactored to try Ollama first, falling back to CPU-based SapBERT only if Ollama is unavailable:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">class</span><span class="token plain"> </span><span class="token class-name" style="color:hsl(29, 54%, 61%)">OllamaEmbeddingService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:hsl(95, 38%, 62%)">"""GPU-accelerated embedding via Ollama's /api/embed endpoint."""</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">def</span><span class="token plain"> </span><span class="token function" style="color:hsl(207, 82%, 66%)">encode</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">self</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> texts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" 
style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token operator" style="color:hsl(207, 82%, 66%)">&gt;</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">float</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        resp </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> httpx</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">post</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">            </span><span class="token string-interpolation string" style="color:hsl(95, 38%, 62%)">f"</span><span class="token string-interpolation interpolation punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token 
string-interpolation interpolation">self</span><span class="token string-interpolation interpolation punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token string-interpolation interpolation">_base_url</span><span class="token string-interpolation interpolation punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token string-interpolation string" style="color:hsl(95, 38%, 62%)">/api/embed"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">            json</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token string" style="color:hsl(95, 38%, 62%)">"model"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> self</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">_model</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"input"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> texts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">            timeout</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token number" style="color:hsl(29, 54%, 61%)">60.0</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        resp</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">raise_for_status</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">return</span><span class="token plain"> resp</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">json</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token string" style="color:hsl(95, 38%, 62%)">"embeddings"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" 
style="color:hsl(286, 60%, 67%)">def</span><span class="token plain"> </span><span class="token function" style="color:hsl(207, 82%, 66%)">get_sapbert_service</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token operator" style="color:hsl(207, 82%, 66%)">&gt;</span><span class="token plain"> OllamaEmbeddingService </span><span class="token operator" style="color:hsl(207, 82%, 66%)">|</span><span class="token plain"> SapBERTService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:hsl(95, 38%, 62%)">"""Return the best available embedding service."""</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">if</span><span class="token plain"> _ollama_service</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">is_available</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">return</span><span class="token plain"> _ollama_service</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token 
keyword" style="color:hsl(286, 60%, 67%)">return</span><span class="token plain"> _sapbert_service  </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># CPU fallback</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>On startup, the service probes Ollama with a single test embedding. If it responds, all subsequent calls go to the GPU. If Ollama is down, the service falls back to loading the SapBERT model into CPU memory — slower, but functional. The interface is identical: both have an <code>.encode(texts: list[str]) -&gt; list[list[float]]</code> method.</p>
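<p>The probe-and-fallback selection can be sketched as follows. This is a minimal illustration, not Parthenon's code. <code>pick_embedding_service</code>, the <code>EmbeddingService</code> protocol, the <code>"probe"</code> text, and the stand-in services are all hypothetical, and the sketch treats any exception raised during the probe as "unavailable".</p>

```python
from typing import Protocol


class EmbeddingService(Protocol):
    """Shared interface: both backends expose the same encode() signature."""

    def encode(self, texts: list[str]) -> list[list[float]]: ...


def pick_embedding_service(
    primary: EmbeddingService, fallback: EmbeddingService
) -> EmbeddingService:
    """Probe the primary (GPU) service once; fall back if the probe fails."""
    try:
        probe = primary.encode(["probe"])
        # An empty or malformed response also counts as unavailable.
        if probe and probe[0]:
            return primary
    except Exception:
        # Connection refused, timeout, HTTP error from raise_for_status(), etc.
        pass
    return fallback


# Stand-in services for illustration only.
class UnreachableService:
    def encode(self, texts: list[str]) -> list[list[float]]:
        raise ConnectionError("Ollama is not reachable")


class CpuService:
    def encode(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] * 768 for _ in texts]
```

<p>Because selection happens once, callers never branch on the backend; they just call <code>.encode()</code> on whatever was returned.</p>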
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="768-dimensions-the-full-encoder-width">768 Dimensions: The Full Encoder Width<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#768-dimensions-the-full-encoder-width" class="hash-link" aria-label="Direct link to 768 Dimensions: The Full Encoder Width" title="Direct link to 768 Dimensions: The Full Encoder Width">​</a></h2>
<p>The original design used 512-dimensional patient embeddings, partitioned into six slices. The four encoder-backed slices (conditions, drugs, procedures, genomics) each truncated the encoder's 768-dim output:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">Old (512-dim):</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[0-32]:   Demographics (32)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[32-160]: Conditions (128)   ← truncated from 768</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[160-224]: Measurements (64)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[224-352]: Drugs (128)       ← truncated from 768</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[352-448]: Procedures (96)   ← truncated from 768</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[448-512]: Genomics (64)     ← truncated from 768</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 
19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
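<p>Truncation discards information. The SapBERT and Ollama encoders spread semantic signal across all 768 output dimensions, and those dimensions are not ordered by importance the way principal components are. Cutting a vector down to its first 128 values therefore throws away features that distinguish similar-but-different concepts.</p>
<p>With the move to Ollama, we expanded the patient vector to 768 dimensions, the encoder's native width. Each encoder-backed slice is still narrower than a single 768-dim concept embedding, but every resized slice gains 50-67% more capacity:</p>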
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">New (768-dim):</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[0-32]:    Demographics (32)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[32-224]:  Conditions (192)   ← 50% more capacity</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[224-320]: Measurements (96)  ← 50% more capacity</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[320-512]: Drugs (192)        ← 50% more capacity</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[512-672]: Procedures (160)   ← 67% more capacity</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">[672-768]: Genomics (96)      ← 50% more capacity</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 
21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
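<p>One way to keep a layout like this honest is to store it as named slices and assert that they tile the vector exactly. The dictionaries and the <code>check_layout</code> and <code>width</code> helpers below are hypothetical; the post does not show how the pipeline records its layout.</p>

```python
from __future__ import annotations

# Hypothetical constants mirroring the two layouts above.
SLICES_512: dict[str, slice] = {
    "demographics": slice(0, 32),
    "conditions": slice(32, 160),
    "measurements": slice(160, 224),
    "drugs": slice(224, 352),
    "procedures": slice(352, 448),
    "genomics": slice(448, 512),
}

SLICES_768: dict[str, slice] = {
    "demographics": slice(0, 32),
    "conditions": slice(32, 224),
    "measurements": slice(224, 320),
    "drugs": slice(320, 512),
    "procedures": slice(512, 672),
    "genomics": slice(672, 768),
}


def width(s: slice) -> int:
    return s.stop - s.start


def check_layout(layout: dict[str, slice], total: int) -> None:
    """Assert the slices are contiguous, non-overlapping, and fill the vector."""
    cursor = 0
    for name, s in layout.items():
        assert s.start == cursor, f"gap or overlap before {name}"
        cursor = s.stop
    assert cursor == total, f"layout covers {cursor} dims, expected {total}"


check_layout(SLICES_512, 512)
check_layout(SLICES_768, 768)

# Capacity change per slice, old width -> new width.
for name in SLICES_768:
    old, new = width(SLICES_512[name]), width(SLICES_768[name])
    gain = round(100 * (new - old) / old)
    print(f"{name:13s} {old:4d} -> {new:4d}  (+{gain}%)")
```

<p>The loop reports +50% for every resized slice except procedures (+67%), and +0% for the unchanged demographics slice, matching the annotations above.</p>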
<p>The pgvector column was altered from <code>vector(512)</code> to <code>vector(768)</code>, the IVFFlat index was rebuilt, and the migration file was updated to reflect the new dimension for fresh installs.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="demographics-and-measurements-numeric-not-encoded">Demographics and Measurements: Numeric, Not Encoded<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#demographics-and-measurements-numeric-not-encoded" class="hash-link" aria-label="Direct link to Demographics and Measurements: Numeric, Not Encoded" title="Direct link to Demographics and Measurements: Numeric, Not Encoded">​</a></h3>
<p>Two of the six slices don't use the language model at all:</p>
<p><strong>Demographics (32 dims):</strong> Age is normalized (<code>age_bucket / 20</code>), gender is encoded as +1 (male) / -1 (female) / 0 (unknown), and race uses one-hot encoding in dims 2-31 mapped to OMOP race concept IDs (8516=Black, 8527=White, 8515=Asian, etc.). This is simple, deterministic, and doesn't need a language model.</p>
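<p>A minimal sketch of the demographics slice, assuming the inputs are an age bucket plus OMOP gender and race concept IDs. Mapping OMOP gender concepts 8507 (male) and 8532 (female) to +1 / -1, and the specific slot assigned to each race concept, are assumptions for illustration; the post only fixes race to dims 2-31.</p>

```python
from __future__ import annotations

OMOP_MALE, OMOP_FEMALE = 8507, 8532

# Hypothetical slot assignment inside dims 2-31; the real mapping is not shown.
RACE_SLOTS: dict[int, int] = {
    8516: 2,  # Black or African American
    8527: 3,  # White
    8515: 4,  # Asian
}


def encode_demographics(
    age_bucket: int, gender_concept_id: int | None, race_concept_id: int | None
) -> list[float]:
    vec = [0.0] * 32
    vec[0] = age_bucket / 20  # normalized age, as described above
    if gender_concept_id == OMOP_MALE:
        vec[1] = 1.0
    elif gender_concept_id == OMOP_FEMALE:
        vec[1] = -1.0  # unknown gender stays 0.0
    slot = RACE_SLOTS.get(race_concept_id) if race_concept_id is not None else None
    if slot is not None:
        vec[slot] = 1.0  # one-hot race; unmapped races leave dims 2-31 at zero
    return vec
```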
<p><strong>Measurements (96 dims):</strong> Lab z-scores are clipped to [-5, 5], normalized to [-1, 1], and packed directly into the vector. The z-scores come from population-level statistics computed per source: for each measurement concept, we compute the mean and standard deviation across all patients, then express each patient's value as its distance from the population mean in standard deviations. Context matters: a hemoglobin of 7.2 g/dL is only somewhat below average in a critically ill population averaging 8.5 g/dL, but it stands out as severe anemia in a population averaging 14.0 g/dL.</p>
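<p>The measurement encoding reduces to a few lines. In this sketch, the per-concept statistics use the population standard deviation (<code>statistics.pstdev</code>), normalization to [-1, 1] is assumed to be a division by 5, a concept with zero spread encodes as 0, and <code>slot_of</code>, a hypothetical mapping from measurement concept ID to a position in the 96-dim slice, stands in for whatever assignment the pipeline actually uses.</p>

```python
from __future__ import annotations

from statistics import fmean, pstdev


def population_stats(
    values_by_concept: dict[int, list[float]],
) -> dict[int, tuple[float, float]]:
    """Per-concept (mean, population std dev) across all patients in one source."""
    return {
        cid: (fmean(vals), pstdev(vals))
        for cid, vals in values_by_concept.items()
        if vals
    }


def scale_z(value: float, mu: float, sigma: float) -> float:
    """z-score, clipped to [-5, 5], then divided by 5 to land in [-1, 1]."""
    if sigma == 0:
        return 0.0  # no spread in this population: treat as average
    z = (value - mu) / sigma
    return max(-5.0, min(5.0, z)) / 5.0


def encode_measurements(
    labs: dict[int, float],
    stats: dict[int, tuple[float, float]],
    slot_of: dict[int, int],  # hypothetical concept -> slot (0-95) mapping
) -> list[float]:
    vec = [0.0] * 96
    for cid, value in labs.items():
        if cid in stats and cid in slot_of:
            vec[slot_of[cid]] = scale_z(value, *stats[cid])
    return vec
```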
<p>The remaining four slices (conditions, drugs, procedures, and genomics) are encoded through Ollama. OMOP concept IDs (integers) are passed as text strings to the embedding model, which maps them into a dense semantic space. Related concepts cluster together: metformin and insulin share neighborhood structure; KRAS and TP53 occupy nearby regions of the genomics subspace. Mean pooling across all concepts in a slice produces a single representative vector for that clinical aspect of the patient.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-123x-speedup-batch-deduplication">The 123x Speedup: Batch Deduplication<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#the-123x-speedup-batch-deduplication" class="hash-link" aria-label="Direct link to The 123x Speedup: Batch Deduplication" title="Direct link to The 123x Speedup: Batch Deduplication">​</a></h2>
<p>The original pipeline processed patients one at a time. Each patient triggered four Ollama calls — one per encoded dimension (conditions, drugs, procedures, genomics). For a batch of 200 patients, that's 800 Ollama calls.</p>
<p>But patients share concepts. In a cancer registry, most patients carry the same core diagnoses, the same standard-of-care medications, and the same diagnostic procedures. A batch of 200 patients averaging 10 conditions each contains about 2,000 condition references (200 × 10), yet those might resolve to only 15 unique condition concepts.</p>
<p>The batch-optimized path exploits this:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">def</span><span class="token plain"> </span><span class="token function" style="color:hsl(207, 82%, 66%)">compute_patient_embeddings_batch</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">patients</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">dict</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token operator" style="color:hsl(207, 82%, 66%)">&gt;</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">list</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token builtin" style="color:hsl(95, 38%, 62%)">float</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:hsl(95, 38%, 62%)">"""4 Ollama calls per batch, not 4 per patient."""</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">for</span><span class="token plain"> field</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> slc </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">in</span><span class="token plain"> dim_configs</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Collect ALL unique concepts across ALL patients</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        all_unique_texts </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> deduplicate</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">patients</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> field</span><span 
class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># ONE encoding call for all unique concepts</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        all_embeddings </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> svc</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">encode</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">all_unique_texts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Mean-pool per patient using shared lookup</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">for</span><span class="token plain"> i</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> patient 
</span><span class="token keyword" style="color:hsl(286, 60%, 67%)">in</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">enumerate</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">patients</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">            indices </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain">text_to_idx</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain">t</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">for</span><span class="token plain"> t </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">in</span><span class="token plain"> patient_texts</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">            embeddings</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain">i</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> slc</span><span class="token punctuation" 
style="color:hsl(220, 14%, 71%)">]</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> all_embeddings</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain">indices</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">mean</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">axis</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token number" style="color:hsl(29, 54%, 61%)">0</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Instead of 800 Ollama calls for 200 patients, we make 4 calls (one per dimension) with 15-50 unique texts each. The encoding is done once; the per-patient work is just numpy indexing and mean pooling.</p>
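<p>Below is a self-contained version of the batch path that fills in the helpers the excerpt above leaves out. It is a sketch under stated assumptions: <code>embed_batch</code>, the <code>patients</code> dict shape, and the <code>encode</code> callable are hypothetical, and the encoder is assumed to return vectors already sized to each slice (the excerpt does not show how a 768-dim concept embedding is reduced to a narrower slice).</p>

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

import numpy as np

# texts -> array of shape (len(texts), slice_width)
Encoder = Callable[[list[str]], np.ndarray]


def embed_batch(
    patients: Sequence[dict],
    dim_configs: Sequence[tuple[str, slice]],
    encode: Encoder,
    total_dims: int = 768,
) -> np.ndarray:
    """One encode() call per slice for the whole batch, not per patient."""
    out = np.zeros((len(patients), total_dims), dtype=np.float32)
    for field, slc in dim_configs:
        # Per-patient concept lists rendered as text (e.g. concept IDs).
        patient_texts = [[str(c) for c in p.get(field, [])] for p in patients]
        # Deduplicate across the whole batch, keeping first-seen order.
        unique_texts = list(dict.fromkeys(t for texts in patient_texts for t in texts))
        if not unique_texts:
            continue  # nobody in this batch has concepts for this slice
        text_to_idx = {t: i for i, t in enumerate(unique_texts)}
        # ONE encoding call for all unique concepts in this slice.
        all_embeddings = np.asarray(encode(unique_texts), dtype=np.float32)
        # Mean-pool per patient via the shared lookup table.
        for i, texts in enumerate(patient_texts):
            if texts:  # patients with no concepts keep a zero slice
                idx = [text_to_idx[t] for t in texts]
                out[i, slc] = all_embeddings[idx].mean(axis=0)
    return out
```

<p>With a fake encoder that records its calls, a batch of any size triggers exactly one <code>encode</code> call per non-empty slice, which is the property the benchmark below measures.</p>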
<p><strong>Benchmark results:</strong></p>
<table><thead><tr><th>Approach</th><th>Time (200 patients)</th><th>Rate</th><th>Ollama Calls</th></tr></thead><tbody><tr><td>Per-patient (old)</td><td>14.1s</td><td>14 patients/sec</td><td>800</td></tr><tr><td>Batch dedup (new)</td><td>0.1s</td><td>1,743 patients/sec</td><td>4</td></tr><tr><td><strong>Speedup</strong></td><td><strong>123x</strong></td><td></td><td><strong>200x fewer calls</strong></td></tr></tbody></table>
<p>The actual throughput for the full Acumenus CDM run settled at ~130 patients/sec sustained. That is lower than the benchmark because real patient data has more concept diversity than synthetic test data, and the database UPDATE operations add I/O overhead. Still, 130/sec on a million patients works out to roughly 2 hours, compared with the ~20 hours the per-patient approach would have taken at 14 patients/sec.</p>
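<p>The time estimates follow directly from the rates. Using the Acumenus patient count from the production table below:</p>

```python
ACUMENUS_PATIENTS = 1_005_788  # from the production table


def hours_at(rate_per_sec: float, n: int = ACUMENUS_PATIENTS) -> float:
    """Wall-clock hours to embed n patients at a sustained rate."""
    return n / rate_per_sec / 3600


print(f"batch dedup, 130/sec: {hours_at(130):.1f} h")  # -> 2.1 h
print(f"per-patient, 14/sec:  {hours_at(14):.1f} h")   # -> 20.0 h
```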
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-production-run-1007007-patients">The Production Run: 1,007,007 Patients<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#the-production-run-1007007-patients" class="hash-link" aria-label="Direct link to The Production Run: 1,007,007 Patients" title="Direct link to The Production Run: 1,007,007 Patients">​</a></h2>
<p>With all fixes in place, we generated embeddings for three CDM sources:</p>
<table><thead><tr><th>Source</th><th>Patients</th><th>Time</th><th>Rate</th><th>Notes</th></tr></thead><tbody><tr><td>IRSF Natural History Study</td><td>1,858</td><td>22s</td><td>84/sec</td><td>Rare disease cohort</td></tr><tr><td>Pancreatic Cancer Corpus</td><td>361</td><td>4s</td><td>90/sec</td><td>Multimodal cancer registry</td></tr><tr><td>OHDSI Acumenus CDM</td><td>1,005,788</td><td>~2 hours</td><td>130/sec</td><td>Full clinical data warehouse</td></tr><tr><td><strong>Total</strong></td><td><strong>1,007,007</strong></td><td></td><td></td><td></td></tr></tbody></table>
<p>The IRSF and Pancreas sources completed in under 30 seconds each. The Acumenus CDM required multiple runs due to PHP's process limits — <code>artisan tinker</code> chunks stop after ~500 iterations regardless of timeout settings. We ran the embedding loop five times, each picking up where the previous left off via <code>whereNull('embedding')</code>.</p>
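<p>The resume-by-NULL pattern is what makes repeated runs safe: each batch re-queries for rows that still lack an embedding, so a run that stops early loses nothing. The sketch below shows the same idea in Python against SQLite rather than Laravel against PostgreSQL. <code>embed_pending</code>, the <code>embed</code> callable, and the simplified two-column table are hypothetical, and <code>max_batches</code> only simulates a process that dies after a fixed number of iterations.</p>

```python
from __future__ import annotations

import sqlite3
from collections.abc import Callable


def embed_pending(
    conn: sqlite3.Connection,
    embed: Callable[[list[int]], list[str]],  # hypothetical: one serialized vector per id
    batch_size: int = 500,
    max_batches: int | None = None,  # simulates a process that stops early
) -> int:
    """Fill NULL embeddings batch by batch; safe to rerun after an interruption."""
    done = batches = 0
    while max_batches is None or batches < max_batches:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT person_id FROM patient_feature_vectors "
                "WHERE embedding IS NULL ORDER BY person_id LIMIT ?",
                (batch_size,),
            )
        ]
        if not ids:
            break  # nothing left to embed
        conn.executemany(
            "UPDATE patient_feature_vectors SET embedding = ? WHERE person_id = ?",
            zip(embed(ids), ids),
        )
        conn.commit()  # each batch is durable before the next one starts
        done += len(ids)
        batches += 1
    return done
```

<p>One caveat of this pattern: if <code>embed</code> ever writes NULL back for a row, that row is selected again on every iteration, so failures should be recorded separately rather than left as NULL.</p>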
<p>One patient, person_id 1005788, required special handling. With 51 condition concepts, 12 procedures, and genomic variants including a pathogenic KRAS variant, the full payload triggered a timeout in the batch endpoint. We embedded him individually with his complete clinical profile, ensuring his KRAS variant was encoded in the genomics dimension alongside his full comorbidity burden.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="data-richness-across-sources">Data Richness Across Sources<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#data-richness-across-sources" class="hash-link" aria-label="Direct link to Data Richness Across Sources" title="Direct link to Data Richness Across Sources">​</a></h3>
<p>The feature vectors capture meaningfully different clinical profiles across sources:</p>
<table><thead><tr><th>Source</th><th>Conditions per Patient</th><th>Drugs per Patient</th><th>Lab Z-Scores per Patient</th><th>Has Genomics</th></tr></thead><tbody><tr><td>IRSF</td><td>3-10</td><td>5-15</td><td>18-22</td><td>No</td></tr><tr><td>Pancreas</td><td>5-51</td><td>3-8</td><td>5-11</td><td>Yes (KRAS, BRCA1, TP53)</td></tr><tr><td>Acumenus</td><td>0-50+</td><td>0-30+</td><td>0-50</td><td>Selected patients</td></tr></tbody></table>
<p>The Pancreatic Cancer Corpus is the richest per patient — small cohort, deep phenotyping, genomic annotation. IRSF has consistent depth across a rare disease population. Acumenus is the long tail: a million patients with highly variable data completeness, from single-visit records to decades of longitudinal care.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-this-enables">What This Enables<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#what-this-enables" class="hash-link" aria-label="Direct link to What This Enables" title="Direct link to What This Enables">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="sub-second-similar-patient-search">Sub-Second Similar Patient Search<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#sub-second-similar-patient-search" class="hash-link" aria-label="Direct link to Sub-Second Similar Patient Search" title="Direct link to Sub-Second Similar Patient Search">​</a></h3>
<p>Before embeddings, similarity search for a patient in the Acumenus CDM required loading all 1M candidates into memory (impractical) or SQL-based pre-screening with PHP re-scoring. With pgvector's IVFFlat index, finding the 20 most similar patients is a single cosine distance query:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> person_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> embedding </span><span class="token operator" style="color:hsl(207, 82%, 66%)">&lt;=&gt;</span><span class="token plain"> $</span><span class="token number" style="color:hsl(29, 54%, 61%)">1</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">AS</span><span class="token plain"> distance</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> patient_feature_vectors</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> source_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">47</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 
67%)">ORDER</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">BY</span><span class="token plain"> embedding </span><span class="token operator" style="color:hsl(207, 82%, 66%)">&lt;=&gt;</span><span class="token plain"> $</span><span class="token number" style="color:hsl(29, 54%, 61%)">1</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">LIMIT</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">20</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This returns in milliseconds. The interpretable scoring (per-dimension Jaccard, z-score comparison) is then applied only to these 20 candidates, giving the user both fast results and explainable scores.</p>
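<p>The two-stage pattern can be sketched in plain Python. This is an illustration under stated assumptions, not Parthenon's <code>PatientSimilarityService</code>:</p>
<ul>
<li>A brute-force cosine scan stands in for the pgvector IVFFlat query.</li>
<li>Only two dimensions are re-scored, using Jaccard overlap.</li>
<li>The helper names and record layout are hypothetical.</li>
</ul>

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: the quantity pgvector's cosine operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def ann_candidates(query: list[float], patients: list[dict], k: int = 20) -> list[dict]:
    """Stage 1, a brute-force stand-in for the IVFFlat query: k nearest by cosine distance."""
    return sorted(patients, key=lambda p: cosine_distance(query, p["embedding"]))[:k]


def jaccard(a: set, b: set) -> float:
    """Overlap of two concept sets; two empty sets score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rescore(index_patient: dict, candidates: list[dict],
            dims: tuple[str, ...] = ("conditions", "drugs")) -> list[tuple[int, dict]]:
    """Stage 2: explainable per-dimension scores, computed only for the k candidates."""
    return [
        (cand["person_id"], {d: jaccard(index_patient[d], cand[d]) for d in dims})
        for cand in candidates
    ]
```

<p>Stage 2 only touches the <code>k</code> candidates returned by stage 1. Its cost therefore grows with <code>k</code>, not with the size of the source.</p>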
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="cross-source-phenotypic-matching">Cross-Source Phenotypic Matching<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#cross-source-phenotypic-matching" class="hash-link" aria-label="Direct link to Cross-Source Phenotypic Matching" title="Direct link to Cross-Source Phenotypic Matching">​</a></h3>
<p>All three sources are encoded by the same model into the same 768-dimensional embedding space. A clinician studying a rare disease patient in IRSF can ask: "are there any patients in the million-patient Acumenus CDM who look like this?" Drop the <code>source_id</code> filter from the query and the vector search ignores source boundaries. It returns the nearest neighbors across the entire embedding space.</p>
<p>This is especially powerful for rare disease research, where individual institutions may have only a handful of cases. Cross-source similarity expands the searchable population from hundreds to millions.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="cohort-discovery-via-centroid-search">Cohort Discovery via Centroid Search<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#cohort-discovery-via-centroid-search" class="hash-link" aria-label="Direct link to Cohort Discovery via Centroid Search" title="Direct link to Cohort Discovery via Centroid Search">​</a></h3>
<p>The <code>search-from-cohort</code> endpoint computes the centroid (the element-wise average embedding) of a defined cohort and finds the individual patients nearest to it. Define a cohort of 50 confirmed cases, compute their centroid, and surface 500 more patients with similar clinical profiles who weren't captured by the original inclusion criteria. This is phenotype-driven cohort expansion: the computational equivalent of a clinician saying "find me more patients like these." The expanded set is a list of candidates to review, not a set of confirmed cases.</p>
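<p>Below is a minimal sketch of centroid search. It assumes an in-memory dict of embeddings in place of <code>patient_feature_vectors</code>. The names <code>centroid</code> and <code>expand_cohort</code> are hypothetical, not the endpoint's code.</p>
<ul>
<li>Cohort members are excluded, so only new candidates come back.</li>
<li>Cosine distance ignores vector length, so the averaged centroid does not need re-normalizing.</li>
</ul>

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of the cohort's embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def expand_cohort(cohort_ids: list[int], embeddings: dict[int, list[float]], k: int) -> list[int]:
    """Return up to k non-members ranked by cosine distance to the cohort centroid.

    `embeddings` (person_id -> vector) is a hypothetical in-memory stand-in
    for the patient_feature_vectors table.
    """
    members = set(cohort_ids)
    center = centroid([embeddings[pid] for pid in cohort_ids])
    others = [pid for pid in embeddings if pid not in members]
    others.sort(key=lambda pid: cosine_distance(center, embeddings[pid]))
    return others[:k]
```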
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="embedding-powered-analytics">Embedding-Powered Analytics<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#embedding-powered-analytics" class="hash-link" aria-label="Direct link to Embedding-Powered Analytics" title="Direct link to Embedding-Powered Analytics">​</a></h3>
<p>With every patient represented as a point in vector space, standard machine learning techniques become applicable:</p>
<ul>
<li>
<p><strong>Clustering:</strong> K-means or HDBSCAN on patient embeddings reveals natural phenotypic subgroups without pre-specifying features. A cluster analysis of the Pancreatic Cancer Corpus might reveal subtypes that correlate with survival — not from genomics alone, but from the full clinical picture.</p>
</li>
<li>
<p><strong>Outlier Detection:</strong> Patients far from any cluster centroid may represent rare phenotypes, coding errors, or unusual disease presentations. In a quality improvement context, outliers in a supposedly homogeneous cohort warrant chart review.</p>
</li>
<li>
<p><strong>Temporal Trajectories:</strong> Re-embedding patients at different time windows (diagnosis, 6 months, 12 months) traces how their clinical profile evolves. Patients whose trajectories diverge despite similar starting points are natural candidates for outcome analysis.</p>
</li>
<li>
<p><strong>Treatment Response Similarity:</strong> Find patients who looked similar before treatment, then compare their outcomes. This is observational causal inference seeded by embedding similarity. It is less rigorous than propensity score matching, but far more scalable.</p>
</li>
</ul>
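<p>To make the clustering and outlier ideas concrete, here is a small k-means plus a distance-to-centroid outlier check, in plain Python. The k-means step is Lloyd's algorithm with cosine distance.</p>
<ul>
<li>This is a toy sketch. The random seeding, iteration count, and 0.5 threshold are arbitrary assumptions.</li>
<li>A real analysis would use a library implementation of K-means or HDBSCAN.</li>
</ul>

```python
import math
import random


def cos_dist(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms if norms else 1.0


def kmeans(vectors: list[list[float]], k: int, iters: int = 20, seed: int = 0):
    """Lloyd's algorithm with cosine distance; returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each patient joins its nearest centroid.
        assign = [min(range(k), key=lambda c: cos_dist(v, centroids[c])) for v in vectors]
        # Update step: centroid = mean of its members (kept as-is if a cluster empties).
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def outliers(vectors, assign, centroids, threshold: float = 0.5) -> list[int]:
    """Indices of patients farther than `threshold` from their own cluster centroid."""
    return [
        i for i, (v, a) in enumerate(zip(vectors, assign))
        if cos_dist(v, centroids[a]) > threshold
    ]
```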
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="genomics-aware-similarity">Genomics-Aware Similarity<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#genomics-aware-similarity" class="hash-link" aria-label="Direct link to Genomics-Aware Similarity" title="Direct link to Genomics-Aware Similarity">​</a></h3>
<p>Patients with genomic data get embeddings that encode molecular profiles alongside clinical features. The 96-dimensional genomics slice captures gene-level similarity through the language model's learned representations of gene names and their relationships. KRAS and NRAS cluster together, and BRCA1 and BRCA2 share embedding structure.</p>
<p>This makes the similarity engine directly useful for molecular tumor board workflows: given an index patient with a pathogenic KRAS variant, find clinically similar patients who also carry RAS pathway mutations — even if they have different specific variants.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="foundation-for-federated-similarity">Foundation for Federated Similarity<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#foundation-for-federated-similarity" class="hash-link" aria-label="Direct link to Foundation for Federated Similarity" title="Direct link to Foundation for Federated Similarity">​</a></h3>
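<p>Patient embeddings are far less revealing than raw records: a 768-dimensional vector is not a medication list or a lab panel. They are not automatically de-identified, though. The demographics and measurement slices are direct numeric encodings (age bucket, lab z-scores), so some of those values may be approximately recoverable. Learned embeddings in general can also leak information through inversion or membership-inference attacks. With safeguards on top, embeddings are a reasonable candidate for sharing across institutional boundaries in a federated network.</p>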
<p>In the Hive Networks architecture, a query like "find patients similar to this one across the network" could become a distributed vector search. The requesting site sends the index patient's embedding. Each site searches its own index locally and returns only embedding distances (with site-local identifiers), never the underlying data. The requesting site gets a ranked list of similar patients by site. This enables multi-institutional rare disease research without a central data repository and without exchanging raw clinical records.</p>
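<p>One possible shape for that federated query is sketched below, simulated in a single Python process. The protocol is hypothetical, since federated exchange is still on the roadmap. Site names and the token scheme are also assumptions.</p>
<ul>
<li>Each site keeps its vectors and computes cosine distances locally.</li>
<li>Each site returns only <code>(distance, site, token)</code> tuples, keyed by opaque site-local tokens.</li>
<li>The requester merges the already-sorted lists with <code>heapq.merge</code>.</li>
</ul>

```python
import heapq
import math
from itertools import islice


def cos_dist(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def site_search(site: str, local_vectors: dict[str, list[float]],
                query: list[float], k: int) -> list[tuple[float, str, str]]:
    """Runs inside one site. Keys of local_vectors are opaque, site-local tokens
    (not person_ids or MRNs); only (distance, site, token) leaves the site."""
    hits = sorted((cos_dist(query, vec), token) for token, vec in local_vectors.items())
    return [(dist, site, token) for dist, token in hits[:k]]


def network_search(sites: dict[str, dict[str, list[float]]],
                   query: list[float], k: int) -> list[tuple[str, str, float]]:
    """Runs at the requesting site: merge every site's sorted hits into a global top k."""
    per_site = [site_search(name, vecs, query, k) for name, vecs in sites.items()]
    merged = heapq.merge(*per_site)  # each input list is already sorted by distance
    return [(site, token, dist) for dist, site, token in islice(merged, k)]
```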
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="architecture-the-final-pipeline">Architecture: The Final Pipeline<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#architecture-the-final-pipeline" class="hash-link" aria-label="Direct link to Architecture: The Final Pipeline" title="Direct link to Architecture: The Final Pipeline">​</a></h2>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">CDM Tables (person, condition_occurrence, drug_exposure,</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">            measurement, procedure_occurrence, genomic_variant)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ▼</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">SimilarityFeatureExtractor (Laravel)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── Demographics: age_bucket, gender, race</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── Conditions: ancestor-rolled concept IDs (3 levels)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── Measurements: z-score normalized lab values</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── Drugs: ingredient-level concept 
IDs</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── Procedures: procedure concept IDs</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    └── Genomics: gene names with pathogenicity tier</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ▼</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">patient_feature_vectors (PostgreSQL, app schema)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── source_id, person_id</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── condition_concepts (JSONB)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── lab_vector (JSONB: {concept_id: z_score, ...})</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── drug_concepts (JSONB)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── procedure_concepts (JSONB)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── variant_genes (JSONB: [{gene, pathogenicity}, ...])</span><br></span><span class="token-line" style="color:hsl(220, 14%, 
71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    └── embedding (pgvector, vector(768))</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ▼</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">EmbeddingClient (Laravel → Python AI Service)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ▼</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">OllamaEmbeddingService (GPU, nomic-embed-text, 768-dim)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── Batch deduplication: 4 calls per batch, not per patient</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── Per-dimension encoding:</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │   ├── Conditions → 192 dims (SapBERT-pooled)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │   ├── Drugs → 192 dims (SapBERT-pooled)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │   ├── Procedures → 160 dims 
(SapBERT-pooled)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │   └── Genomics → 96 dims (SapBERT-pooled)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ├── Direct encoding (no LM):</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │   ├── Demographics → 32 dims (numeric)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │   └── Measurements → 96 dims (z-scores)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    └── L2 normalization → unit vector</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ▼</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">pgvector IVFFlat Index (cosine distance, 100 lists)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    ▼</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">PatientSimilarityService.search()</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    
├── Embedding mode: ANN search → top K → interpretable re-score</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    └── Interpretable mode: full dimension-wise scoring (fallback)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
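<p>The last two pipeline steps, assembling the per-dimension slices and L2-normalizing the result, can be sketched as follows.</p>
<ul>
<li>The slice widths come from the diagram: 32 + 192 + 96 + 192 + 160 + 96 = 768.</li>
<li>The slice order and the <code>assemble_embedding</code> name are assumptions; the diagram does not specify the layout.</li>
<li>Missing slices are zero-filled, matching the fallback behavior described below.</li>
<li>A malformed slice raises an error instead of being silently skipped.</li>
</ul>

```python
import math

# Slice widths from the pipeline diagram; the ordering here is an assumption.
SLICE_DIMS = {
    "demographics": 32,
    "conditions": 192,
    "measurements": 96,
    "drugs": 192,
    "procedures": 160,
    "genomics": 96,
}
TOTAL_DIM = sum(SLICE_DIMS.values())  # 768


def assemble_embedding(slices: dict[str, list[float]]) -> list[float]:
    """Concatenate per-dimension slices, zero-fill missing ones, then L2-normalize."""
    vec: list[float] = []
    for name, width in SLICE_DIMS.items():
        part = slices.get(name)
        if part is None:
            part = [0.0] * width  # no data for this clinical dimension
        if len(part) != width:
            raise ValueError(f"{name}: expected {width} values, got {len(part)}")
        vec.extend(part)
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("every slice is empty; cannot normalize")
    return [x / norm for x in vec]  # unit vector, as the cosine index expects
```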
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="fallback-guarantees">Fallback Guarantees<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#fallback-guarantees" class="hash-link" aria-label="Direct link to Fallback Guarantees" title="Direct link to Fallback Guarantees">​</a></h3>
<p>The system degrades gracefully at every layer:</p>
<ul>
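<li><strong>Ollama down?</strong> Falls back to CPU-based SapBERT. This is slower, and although the embeddings have identical dimensions, vectors from two different encoders live in different spaces, so a single source should not mix them.</li>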
<li><strong>No embeddings computed?</strong> Falls back to interpretable-only search. No ANN, but full scoring across all dimensions.</li>
<li><strong>Source too small for IVFFlat?</strong> (&lt; 100 patients) Skips index creation; pgvector does exact scan.</li>
<li><strong>Patient missing a dimension?</strong> Zero-padded in the embedding; interpretable scoring skips that dimension and re-weights the others.</li>
</ul>
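<p>The last fallback (skip the missing dimension, re-weight the rest) amounts to renormalizing the user's weights over the dimensions that are actually present. Below is a minimal sketch. It assumes per-dimension scores already lie in [0, 1] and that <code>None</code> marks a missing dimension. The function name and weights are hypothetical.</p>

```python
from typing import Optional


def weighted_similarity(scores: dict[str, Optional[float]],
                        weights: dict[str, float]) -> float:
    """Weighted mean over the dimensions that are present.

    A score of None marks a dimension missing for this pair of patients; its
    weight is dropped and the remaining weights are effectively rescaled to
    sum to 1, so absent data neither helps nor hurts the match.
    """
    present = {d: s for d, s in scores.items()
               if s is not None and weights.get(d, 0.0) > 0.0}
    total = sum(weights[d] for d in present)
    if total == 0.0:
        return 0.0
    return sum(weights[d] * s for d, s in present.items()) / total
```

<p>Consider weights of 2 for conditions and 1 each for drugs and genomics. A pair scoring 0.5 on conditions and 1.0 on drugs, with no genomic data, gets (2 × 0.5 + 1 × 1.0) / 3 ≈ 0.67. The genomics weight simply drops out.</p>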
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance-characteristics">Performance Characteristics<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#performance-characteristics" class="hash-link" aria-label="Direct link to Performance Characteristics" title="Direct link to Performance Characteristics">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="embedding-generation">Embedding Generation<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#embedding-generation" class="hash-link" aria-label="Direct link to Embedding Generation" title="Direct link to Embedding Generation">​</a></h3>
<table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Encoding backend</td><td>Ollama (nomic-embed-text) on GPU</td></tr><tr><td>Embedding dimension</td><td>768</td></tr><tr><td>Batch size (PHP → Python)</td><td>500 patients</td></tr><tr><td>Batch dedup calls per batch</td><td>4 (one per encoded dimension)</td></tr><tr><td>Sustained throughput</td><td>~130 patients/sec</td></tr><tr><td>Time for 1M patients</td><td>~2 hours</td></tr><tr><td>Peak GPU utilization</td><td>~40% (Ollama, batch encoding)</td></tr><tr><td>Peak DB write throughput</td><td>~500 UPDATEs/sec (CASE/WHEN batch)</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="similarity-search-embedding-mode">Similarity Search (Embedding Mode)<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#similarity-search-embedding-mode" class="hash-link" aria-label="Direct link to Similarity Search (Embedding Mode)" title="Direct link to Similarity Search (Embedding Mode)">​</a></h3>
<table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Index type</td><td>IVFFlat (100 lists)</td></tr><tr><td>Distance metric</td><td>Cosine</td></tr><tr><td>ANN candidates</td><td>20 (configurable)</td></tr><tr><td>Search latency (1M patients)</td><td>&lt; 50ms</td></tr><tr><td>Interpretable re-scoring</td><td>~5ms per candidate</td></tr><tr><td>Total search time</td><td>&lt; 150ms</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="storage">Storage<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#storage" class="hash-link" aria-label="Direct link to Storage" title="Direct link to Storage">​</a></h3>
<table><thead><tr><th>Source</th><th>Rows</th><th>Embedding Storage</th><th>Total Table Size</th></tr></thead><tbody><tr><td>Acumenus</td><td>1,005,788</td><td>~2.9 GB (768 × float32 × 1M)</td><td>~8.2 GB</td></tr><tr><td>IRSF</td><td>1,858</td><td>~5.4 MB</td><td>~12 MB</td></tr><tr><td>Pancreas</td><td>361</td><td>~1.1 MB</td><td>~3.5 MB</td></tr></tbody></table>
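<p>The headline figures can be cross-checked with back-of-the-envelope arithmetic. The sketch below uses only numbers from the tables above, plus pgvector's documented storage of 4 bytes per dimension and an 8-byte header per vector. It ignores tuple headers, TOAST, and index overhead, so the storage figure is a lower bound, not a measurement. The function names are illustrative.</p>

```python
DIM = 768
# pgvector stores a vector as 4 bytes per dimension plus an 8-byte header.
BYTES_PER_VECTOR = 4 * DIM + 8


def vector_gib(rows: int) -> float:
    """Raw embedding payload in GiB (ignores tuple headers, TOAST and indexes)."""
    return rows * BYTES_PER_VECTOR / 2**30


def hours_to_embed(patients: int, per_sec: float = 130.0) -> float:
    """Wall-clock hours at the sustained throughput in the generation table."""
    return patients / per_sec / 3600


def search_budget_ms(ann_ms: float = 50.0, k: int = 20, per_candidate_ms: float = 5.0) -> float:
    """ANN lookup plus sequential interpretable re-scoring of k candidates."""
    return ann_ms + k * per_candidate_ms


if __name__ == "__main__":
    for name, rows in [("Acumenus", 1_005_788), ("IRSF", 1_858), ("Pancreas", 361)]:
        print(f"{name}: {vector_gib(rows) * 1024:,.1f} MiB of vectors")
    print(f"1M patients: {hours_to_embed(1_000_000):.1f} h")
    print(f"SynPUF, 2.3M patients: {hours_to_embed(2_300_000):.1f} h")
    print(f"worst-case search: {search_budget_ms():.0f} ms")
```

<p>With these inputs, the results line up with the generation-time, SynPUF, and total-search-time estimates in this post:</p>
<ul>
<li>The Acumenus vector payload comes to roughly 2.9 GiB.</li>
<li>One million patients take a little over two hours to embed, and SynPUF just under five.</li>
<li>The search budget is 50 + 20 × 5 = 150 ms.</li>
</ul>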
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="lessons-learned">Lessons Learned<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#lessons-learned" class="hash-link" aria-label="Direct link to Lessons Learned" title="Direct link to Lessons Learned">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="1-silent-failures-are-architecture-bugs">1. Silent Failures Are Architecture Bugs<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#1-silent-failures-are-architecture-bugs" class="hash-link" aria-label="Direct link to 1. Silent Failures Are Architecture Bugs" title="Direct link to 1. Silent Failures Are Architecture Bugs">​</a></h3>
<p>The embedding pipeline "worked" for weeks without generating a single embedding. The <code>EmbeddingClient</code> caught exceptions and returned empty arrays. The job logged warnings that scrolled past in a sea of other output. The search engine fell back to interpretable mode without complaint.</p>
<p>Every pipeline stage should either succeed visibly or fail loudly. A try/catch that returns a default value without raising an alert is not error handling — it's evidence suppression.</p>
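<p>Below is a minimal sketch of "fail loudly" for a batch embedding call, in Python. The <code>encode_batch</code> callable, the exception class, and the 1% threshold are hypothetical. The point is that two conditions raise an error instead of turning into an empty result that downstream code mistakes for "not computed yet":</p>
<ul>
<li>a short result list, or</li>
<li>a high per-item failure rate.</li>
</ul>

```python
import logging

log = logging.getLogger("embedding")


class EmbeddingBatchError(RuntimeError):
    """Raised when a batch produces too few embeddings to be trusted."""


def embed_or_raise(encode_batch, items: list, max_failure_rate: float = 0.01) -> list:
    """Call encode_batch(items), which returns one vector per item (None on failure).

    Exceptions from encode_batch propagate untouched; nothing is swallowed.
    Per-item failures are logged, and too many of them abort the batch.
    """
    if not items:
        return []
    vectors = encode_batch(items)
    if len(vectors) != len(items):
        raise EmbeddingBatchError(f"expected {len(items)} vectors, got {len(vectors)}")
    failures = sum(1 for v in vectors if v is None)
    if failures:
        log.error("embedding failed for %d of %d items", failures, len(items))
    if failures / len(items) > max_failure_rate:
        raise EmbeddingBatchError(f"{failures}/{len(items)} items failed to embed")
    return vectors
```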
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="2-validate-at-the-seam">2. Validate at the Seam<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#2-validate-at-the-seam" class="hash-link" aria-label="Direct link to 2. Validate at the Seam" title="Direct link to 2. Validate at the Seam">​</a></h3>
<p>The type mismatch between PHP (integers) and Python (strings) lived at the service boundary — the HTTP API between Laravel and FastAPI. Both sides were internally correct: PHP correctly serialized OMOP concept IDs as integers; Python correctly expected concept identifiers as strings. Neither side was wrong in isolation.</p>
<p>Service boundaries need explicit contracts. The Pydantic model should have been generated from or validated against the PHP serialization format. In a multi-language architecture, the API schema is the source of truth — not either implementation.</p>
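<p>An explicit contract can be as small as one shared field-to-type table that the receiving side enforces. The Python sketch below is illustrative only:</p>
<ul>
<li>The field names mirror <code>patient_feature_vectors</code>.</li>
<li><code>CONTRACT</code> and <code>validate_payload</code> are hypothetical stand-ins for a generated Pydantic model or JSON Schema.</li>
<li>The choice of integer concept IDs is an assumption.</li>
</ul>
<p>A mismatch fails at the boundary, with the exact field and value in the error.</p>

```python
# Hypothetical shared contract: field -> (container type, element type or None).
CONTRACT = {
    "person_id": (int, None),
    "condition_concepts": (list, int),
    "drug_concepts": (list, int),
    "procedure_concepts": (list, int),
}


class ContractError(ValueError):
    """Payload does not match the agreed cross-service schema."""


def validate_payload(payload: dict) -> dict:
    """Check every contract field; raise naming the exact field and offending value."""
    for field, (kind, elem) in CONTRACT.items():
        if field not in payload:
            raise ContractError(f"missing field {field!r}")
        value = payload[field]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if not isinstance(value, kind) or isinstance(value, bool):
            raise ContractError(f"{field}: expected {kind.__name__}, got {type(value).__name__}")
        if elem is not None:
            for i, item in enumerate(value):
                if not isinstance(item, elem) or isinstance(item, bool):
                    raise ContractError(f"{field}[{i}]: expected {elem.__name__}, got {item!r}")
    return payload
```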
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="3-deduplication-beats-parallelism">3. Deduplication Beats Parallelism<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#3-deduplication-beats-parallelism" class="hash-link" aria-label="Direct link to 3. Deduplication Beats Parallelism" title="Direct link to 3. Deduplication Beats Parallelism">​</a></h3>
<p>Our first instinct for performance was to increase batch sizes and add worker parallelism. The 123x speedup came instead from observing that patients share concepts. In a batch of 200 oncology patients, there might be 15 unique condition concepts. Encoding 15 texts once is faster than encoding 2,000 texts (even on a GPU) because the bottleneck is Ollama's tokenization and inference, not the network call.</p>
<p>The general principle: before parallelizing work, check whether the work is redundant. Deduplication is nearly free; parallelism has coordination costs.</p>
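<p>The deduplication step can be sketched in a few lines of Python. Here <code>encode</code> is a hypothetical stand-in for one batched request to the embedding backend. Mean-pooling the concept vectors per patient is an assumption for illustration.</p>
<ol>
<li>Collect the batch's concept texts into a unique list.</li>
<li>Encode that list in a single call.</li>
<li>Fan the vectors back out to patients by lookup.</li>
</ol>
<p>Running this once per encoded dimension gives the four calls per batch listed in the performance table.</p>

```python
from typing import Callable, Optional

Vector = list[float]


def embed_dimension(patients: dict[int, list[str]],
                    encode: Callable[[list[str]], list[Vector]]) -> dict[int, Optional[Vector]]:
    """patients maps person_id -> concept texts for ONE dimension (e.g. conditions).

    Encodes each distinct text exactly once, then mean-pools per patient.
    Patients with no concepts get None (later zero-filled in the embedding).
    """
    unique_texts = sorted({t for texts in patients.values() for t in texts})
    if not unique_texts:
        return {pid: None for pid in patients}
    lookup = dict(zip(unique_texts, encode(unique_texts)))  # single backend call
    pooled: dict[int, Optional[Vector]] = {}
    for pid, texts in patients.items():
        if not texts:
            pooled[pid] = None
            continue
        vecs = [lookup[t] for t in texts]
        pooled[pid] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return pooled
```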
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="4-tinker-has-a-hidden-iteration-limit">4. Tinker Has a Hidden Iteration Limit<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#4-tinker-has-a-hidden-iteration-limit" class="hash-link" aria-label="Direct link to 4. Tinker Has a Hidden Iteration Limit" title="Direct link to 4. Tinker Has a Hidden Iteration Limit">​</a></h3>
<p>Laravel's <code>artisan tinker</code> (built on PsySH) silently stopped <code>chunk()</code> iteration after approximately 500 calls in our runs, regardless of <code>max_execution_time</code> settings. For bulk operations over large datasets, use a proper artisan command or a raw PHP script, not an interactive REPL with undocumented limits.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="5-one-patient-can-break-the-pipeline">5. One Patient Can Break the Pipeline<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#5-one-patient-can-break-the-pipeline" class="hash-link" aria-label="Direct link to 5. One Patient Can Break the Pipeline" title="Direct link to 5. One Patient Can Break the Pipeline">​</a></h3>
<p>Patient 1005788 — with 51 conditions, 12 procedures, and KRAS/BRCA genomic variants — was the single holdout in a million-patient run. The <code>variant_genes</code> field stored as <code>[{"gene": "KRAS", "pathogenicity": "Pathogenic"}]</code> didn't match the <code>list[str]</code> Pydantic type. One patient, one type mismatch, one silent failure.</p>
<p>Robust pipelines handle edge cases in the data model, not in the exception handler. The fix wasn't a special case for patient 1005788 — it was accepting the actual data shape in the Pydantic model and converting dicts to gene names in the encoder.</p>
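<p>The fix can be sketched as a normalizer that accepts both shapes the data actually contains. In the service, this logic would live in the Pydantic model or the encoder. The plain-Python function below, and its name, are illustrative assumptions. Unknown shapes still raise rather than being dropped, in keeping with the first lesson.</p>

```python
from typing import Any


def variant_gene_names(raw: list[Any]) -> list[str]:
    """Accept ["KRAS", ...] or [{"gene": "KRAS", "pathogenicity": ...}, ...].

    Returns de-duplicated gene names in first-seen order. Any other entry
    shape is a contract violation and raises instead of being skipped.
    """
    names: list[str] = []
    for entry in raw:
        if isinstance(entry, str):
            gene = entry
        elif isinstance(entry, dict) and isinstance(entry.get("gene"), str):
            gene = entry["gene"]
        else:
            raise TypeError(f"unsupported variant_genes entry: {entry!r}")
        if gene not in names:
            names.append(gene)
    return names
```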
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/patient-embeddings-at-scale#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>The embedding infrastructure is now production-ready. The immediate roadmap:</p>
<ol>
<li>
<p><strong>IVFFlat Index Rebuild:</strong> Rebuild the index on the full 1M-row table, tuning the <code>lists</code> parameter (currently 100) for the best recall/speed tradeoff.</p>
</li>
<li>
<p><strong>Embedding-Mode Search in the UI:</strong> The frontend currently defaults to interpretable mode. With embeddings available, the search controller can route to ANN search for large sources (&gt; 5,000 patients) and fall back to interpretable for small ones.</p>
</li>
<li>
<p><strong>Cohort Centroid Visualization:</strong> Display the centroid embedding of a cohort as a radar chart across the six dimensions, showing where the cohort's "center of mass" lies in clinical space.</p>
</li>
<li>
<p><strong>Incremental Embedding Updates:</strong> New patients added through ETL should trigger embedding generation without reprocessing the entire source. The <code>whereNull('embedding')</code> pattern already supports this — we just need to hook it into the ingestion pipeline.</p>
</li>
<li>
<p><strong>SynPUF and Eunomia:</strong> Two remaining sources (2.3M CMS SynPUF patients and 2.7K Eunomia demo patients) need feature extraction and embedding. SynPUF at 2.3M patients will take approximately 5 hours at current throughput.</p>
</li>
<li>
<p><strong>Federated Embedding Exchange:</strong> Design the protocol for sharing patient embeddings across Hive Network sites — embedding format, distance computation, privacy guarantees, and consent models.</p>
</li>
</ol>
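<p>For the index rebuild, the pgvector README suggests these starting points:</p>
<ul>
<li><code>lists</code> = rows / 1000 for up to 1M rows, and sqrt(rows) beyond that;</li>
<li><code>ivfflat.probes</code> around sqrt(<code>lists</code>) at query time.</li>
</ul>
<p>The Python helper below turns that guidance into DDL. The helper and index names are hypothetical. The numbers are starting points to tune against measured recall, not final settings.</p>

```python
import math


def ivfflat_params(rows: int) -> tuple[int, int]:
    """Starting-point (lists, probes) following pgvector's published guidance."""
    lists = rows // 1000 if rows <= 1_000_000 else math.isqrt(rows)
    lists = max(1, lists)
    probes = max(1, math.isqrt(lists))
    return lists, probes


def ivfflat_ddl(rows: int, table: str = "patient_feature_vectors") -> str:
    """CREATE INDEX for cosine distance plus the matching session probes setting."""
    lists, probes = ivfflat_params(rows)
    return (
        f"CREATE INDEX pfv_embedding_ivfflat ON {table} "
        f"USING ivfflat (embedding vector_cosine_ops) WITH (lists = {lists});\n"
        f"SET ivfflat.probes = {probes};"
    )


if __name__ == "__main__":
    print(ivfflat_ddl(1_005_788))
```

<p>For the 1,005,788-row Acumenus table, this suggests roughly 1,000 lists and 31 probes. That is an order of magnitude more lists than the 100 used today.</p>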
<p>The Patient Similarity Engine now has its index. A million patients, each reduced to 768 numbers that capture their demographics, conditions, labs, medications, procedures, and genomic variants. The question "which patients are most like this one?" is no longer a research project. It's a query.</p>]]></content:encoded>
            <category>patient-similarity</category>
            <category>embeddings</category>
            <category>pgvector</category>
            <category>ollama</category>
            <category>gpu</category>
            <category>omop</category>
            <category>architecture</category>
            <category>ai</category>
            <category>precision-medicine</category>
            <category>performance</category>
        </item>
        <item>
            <title><![CDATA[Patients Like Mine: Building a Multi-Modal Patient Similarity Engine on OMOP CDM]]></title>
            <link>http://localhost:8082/docs/blog/patient-similarity-engine</link>
            <guid>http://localhost:8082/docs/blog/patient-similarity-engine</guid>
            <pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[For twenty years, the question "which patients are most like this one?" has haunted clinical informatics. Molecular tumor boards want to know: of the 300 patients in our pancreatic cancer corpus, which ones had the same pathogenic variants, the same comorbidity profile, the same treatment history — and what happened to them? Population health researchers want to seed cohort definitions not from abstract inclusion criteria but from a concrete index patient. And every clinician who has ever stared at a complex case has wished for a button that says show me others like this.]]></description>
            <content:encoded><![CDATA[<p>For twenty years, the question "which patients are most like this one?" has haunted clinical informatics. Molecular tumor boards want to know: of the 300 patients in our pancreatic cancer corpus, which ones had the same pathogenic variants, the same comorbidity profile, the same treatment history — and what happened to them? Population health researchers want to seed cohort definitions not from abstract inclusion criteria but from a concrete index patient. And every clinician who has ever stared at a complex case has wished for a button that says <em>show me others like this</em>.</p>
<p>Today, Parthenon ships that button. The Patient Similarity Engine is a multi-modal matching system that scores patients across six clinical dimensions — demographics, conditions, measurements, drugs, procedures, and genomic variants — with user-adjustable weights, dual algorithmic modes, bidirectional cohort integration, and tiered privacy controls. It works across any OMOP CDM source in the platform, from the 361-patient Pancreatic Cancer Corpus to the million-patient Acumenus CDM.</p>
<p>This post tells the story of why it was needed, what we studied before building it, how it works under the hood, and what we learned along the way.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-gap-from-genomic-identity-to-clinical-phenotype">The Gap: From Genomic Identity to Clinical Phenotype<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-gap-from-genomic-identity-to-clinical-phenotype" class="hash-link" aria-label="Direct link to The Gap: From Genomic Identity to Clinical Phenotype" title="Direct link to The Gap: From Genomic Identity to Clinical Phenotype">​</a></h2>
<p>Parthenon already had a form of patient similarity. The Molecular Tumor Board (<code>TumorBoardService</code>) could find patients sharing pathogenic or likely-pathogenic variants in the same gene. If your index patient carried a BRCA1 p.C61G variant classified as pathogenic by ClinVar, the tumor board would surface every other patient in the corpus with a pathogenic BRCA1 variant, compute median survival, and tally drug exposure patterns among those matches.</p>
<p>It was useful. It was also binary. You either shared a pathogenic variant or you didn't. There was no notion of <em>degree</em> of similarity, no consideration of clinical phenotype, no way to ask: "this 62-year-old woman with pancreatic adenocarcinoma, Type 2 diabetes, and BRCA1 — who <em>else</em> in our data looks like her, not just genomically, but clinically?"</p>
<p>The gap matters because clinical decisions are rarely made on genomics alone. Two patients with identical BRCA1 mutations but different comorbidity burdens, lab profiles, and treatment histories can have very different expected outcomes. Precision medicine requires precision <em>context</em>, and that context spans every clinical dimension in the OMOP CDM.</p>
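<p>For contrast, the gene-level tumor board match can be sketched in a few lines of Python. This is not the <code>TumorBoardService</code> code; the record layout and ClinVar-style classification strings are assumptions. The match is a yes/no set intersection, so nothing in it measures <em>how</em> similar two patients are.</p>

```python
from collections import Counter
from statistics import median

PATHOGENIC = {"Pathogenic", "Likely pathogenic"}


def tumor_board_matches(index_genes: list[str], patients: list[dict]) -> dict:
    """patients: dicts with 'person_id', 'variants' [(gene, classification)],
    'survival_days' (or None) and 'drugs'. A patient matches iff it carries a
    (likely) pathogenic variant in one of the index genes: a binary test.
    """
    wanted = set(index_genes)
    matches = [
        p for p in patients
        if {gene for gene, cls in p["variants"] if cls in PATHOGENIC} & wanted
    ]
    survival = [p["survival_days"] for p in matches if p["survival_days"] is not None]
    return {
        "person_ids": [p["person_id"] for p in matches],
        "median_survival_days": median(survival) if survival else None,
        # Count each drug once per matched patient.
        "drug_counts": Counter(d for p in matches for d in set(p["drugs"])),
    }
```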
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="landscape-research-what-exists-and-what-doesnt">Landscape Research: What Exists and What Doesn't<a href="http://localhost:8082/docs/blog/patient-similarity-engine#landscape-research-what-exists-and-what-doesnt" class="hash-link" aria-label="Direct link to Landscape Research: What Exists and What Doesn't" title="Direct link to Landscape Research: What Exists and What Doesn't">​</a></h2>
<p>Before writing a single line of code, we studied the landscape. What we found was a fragmented ecosystem where no single system solved the complete problem on OMOP CDM.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-oracle-approach-weighted-pagerank">The Oracle Approach: Weighted PageRank<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-oracle-approach-weighted-pagerank" class="hash-link" aria-label="Direct link to The Oracle Approach: Weighted PageRank" title="Direct link to The Oracle Approach: Weighted PageRank">​</a></h3>
<p>Oracle Healthcare Translational Research offers a "Patients Like Mine" feature that uses <strong>Weighted Personalized PageRank (PPR)</strong> over a bipartite graph of patients and clinical events. Users adjust weights on clinical categories, and the algorithm performs biased random walks personalized toward a seed patient. The output is a ranked list with drill-down comparison views and Kaplan-Meier survival curves.</p>
<p>The design insights worth borrowing: user-adjustable dimension weights (clinicians know what matters for their case), one-to-one comparison views, and integrated survival analysis on the similar cohort.</p>
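<p>To make the PPR idea concrete, here is a generic personalized PageRank computed by power iteration over a small weighted bipartite graph of patients and clinical events. It is not Oracle's implementation. The graph encoding, the edge weights (standing in for user-adjusted category weights), and the damping factor of 0.85 are illustrative assumptions. Because the walk restarts at the seed, probability mass concentrates on patients who share heavily weighted events with it.</p>

```python
from collections import defaultdict


def personalized_pagerank(edges, seed, alpha: float = 0.85, iters: int = 100) -> dict:
    """edges: (patient, event, weight) triples of an undirected bipartite graph.

    Returns node -> stationary probability of a walk that, at each step, follows
    an edge with probability alpha (chosen proportionally to weight) and
    otherwise jumps back to `seed`. The seed must appear in at least one edge.
    """
    nbrs = defaultdict(list)
    for patient, event, w in edges:
        nbrs[patient].append((event, w))
        nbrs[event].append((patient, w))
    out_weight = {n: sum(w for _, w in adj) for n, adj in nbrs.items()}
    rank = {n: 0.0 for n in nbrs}
    rank[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nbrs}
        nxt[seed] += 1.0 - alpha  # restart mass always returns to the seed
        for n, r in rank.items():
            for m, w in nbrs[n]:
                nxt[m] += alpha * r * w / out_weight[n]
        rank = nxt
    return rank
```

<p>Ranking the patient nodes other than the seed by score gives the "patients like mine" list.</p>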
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-academic-frontier-embeddings-and-meta-paths">The Academic Frontier: Embeddings and Meta-Paths<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-academic-frontier-embeddings-and-meta-paths" class="hash-link" aria-label="Direct link to The Academic Frontier: Embeddings and Meta-Paths" title="Direct link to The Academic Frontier: Embeddings and Meta-Paths">​</a></h3>
<p>The research literature offered several promising methodologies:</p>
<table><thead><tr><th>Approach</th><th>Key Paper</th><th>Insight</th></tr></thead><tbody><tr><td><strong>Patient2Vec</strong></td><td>Zhang et al., IEEE Access 2018</td><td>LSTM + attention over longitudinal EHR produces personalized patient embeddings. 0.799 AUC. MIT-licensed.</td></tr><tr><td><strong>S-PathSim</strong></td><td>PMC8456037</td><td>Annotated Heterogeneous Information Networks prevent false associations. nDCG 0.791 on 53K patients.</td></tr><tr><td><strong>Transformer Embeddings</strong></td><td>Nature Digital Medicine, 2025</td><td>Treat each patient as a "sentence" of medical concepts. Enables stratification and progression analysis.</td></tr><tr><td><strong>Patient Similarity Networks</strong></td><td>Multi-modal fusion, Frontiers AI 2025</td><td>Graph neural networks with early/intermediate/late fusion strategies. Multi-modal significantly outperforms single-modality.</td></tr><tr><td><strong>Phe2vec</strong></td><td>Patterns, 2021</td><td>Unsupervised phenotype embeddings from EHR co-occurrence patterns.</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-ohdsi-ecosystem">The OHDSI Ecosystem<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-ohdsi-ecosystem" class="hash-link" aria-label="Direct link to The OHDSI Ecosystem" title="Direct link to The OHDSI Ecosystem">​</a></h3>
<p>The OHDSI community has related tools but nothing purpose-built for patient similarity:</p>
<ul>
<li><strong>CohortMethod</strong> uses propensity score matching — similar in spirit but designed for treatment effect estimation, not general similarity search.</li>
<li><strong>ComparatorSelectionExplorer</strong> computes cosine similarity across drug comparator feature vectors — closer, but drug-only and designed for study design, not clinical matching.</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-was-missing">What Was Missing<a href="http://localhost:8082/docs/blog/patient-similarity-engine#what-was-missing" class="hash-link" aria-label="Direct link to What Was Missing" title="Direct link to What Was Missing">​</a></h3>
<p>No open-source system combined these properties:</p>
<ol>
<li><strong>OMOP-native</strong> — works directly on standard CDM tables without custom ETL</li>
<li><strong>Multi-modal</strong> — fuses demographics, conditions, labs, drugs, procedures, and genomics</li>
<li><strong>User-weighted</strong> — clinicians adjust dimension weights per search</li>
<li><strong>Interpretable</strong> — every score decomposes into per-dimension explanations</li>
<li><strong>Source-agnostic</strong> — works across any CDM source in the platform</li>
<li><strong>Cohort-integrated</strong> — bidirectional flow between similarity and cohort definitions</li>
</ol>
<p>We decided to build it.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="architecture-split-responsibility">Architecture: Split Responsibility<a href="http://localhost:8082/docs/blog/patient-similarity-engine#architecture-split-responsibility" class="hash-link" aria-label="Direct link to Architecture: Split Responsibility" title="Direct link to Architecture: Split Responsibility">​</a></h2>
<p>The biggest design decision was how to divide work between Parthenon's two backend stacks: Laravel (PHP) for application logic and FastAPI (Python) for AI services. After evaluating three architectural options, we chose <strong>Split Responsibility</strong> — each language does what it's best at:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">┌────────────────┐     ┌────────────────┐     ┌──────────────────┐</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   Frontend     │     │   Laravel      │     │   Python AI      │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   React SPA    │────▶│   API          │────▶│   Service        │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                │     │                │     │                  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│ Weight sliders │     │ Auth/RBAC      │     │ SapBERT encode   │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│ Score bars     │     │ Extraction     │     │ Mean pooling     │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│ Compare view   │     │ Scoring        │     │ 512-dim vectors  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│ Cohort export  │     │ Orchestration  │     │ Batch embed      │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└────────────────┘     └──────┬─────────┘     └────────┬─────────┘</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              │                        │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       ┌──────▼────────────────────────▼─────┐</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       │        PostgreSQL + pgvector        │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       │                                     │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       │  patient_feature_vectors (JSONB)    │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       │  patient_feature_vectors.embedding  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       │  source_measurement_stats           │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       │  similarity_dimensions              │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       │  patient_similarity_cache           │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                       └─────────────────────────────────────┘</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p><strong>Laravel</strong> owns feature extraction (reusing the existing <code>PatientFeatureExtractor</code> and <code>FeatureBuilder</code> patterns), interpretable scoring, auth/RBAC, and caching. <strong>Python</strong> owns SapBERT embedding generation and dense vector computation. <strong>PostgreSQL + pgvector</strong> stores both structured features (JSONB) and dense embeddings (vector(512)) for approximate nearest-neighbor search.</p>
<p>The critical benefit: <strong>interpretable mode works without the Python service.</strong> If the AI container is down, researchers still get full patient similarity via the Jaccard/Euclidean scoring path. The embedding mode adds semantic power when available, but the system degrades gracefully.</p>
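<p>Here is a minimal sketch of that degradation. It assumes two things: the embedding path fails with a connection or timeout error, and the caller then falls back automatically. The post does not say whether the fallback is automatic or a mode switch the user makes. The function names and signatures are hypothetical.</p>

```python
from typing import Callable, List, Tuple

# Hypothetical signature: a search function takes a seed person_id and
# returns a ranked list of (person_id, score) pairs.
SearchFn = Callable[[int], List[Tuple[int, float]]]


def similarity_search(seed: int, mode: str, embedding_search: SearchFn,
                      interpretable_search: SearchFn):
    """Return (results, mode_actually_used), degrading to interpretable mode."""
    if mode == "embedding":
        try:
            return embedding_search(seed), "embedding"
        except (ConnectionError, TimeoutError):
            # AI service unreachable (assumed failure modes): fall through to
            # the Jaccard/Euclidean path, which needs only PostgreSQL.
            pass
    return interpretable_search(seed), "interpretable"


def offline_embedding_search(seed):
    raise ConnectionError("AI service is down")


def interpretable(seed):
    return [(341, 0.87)]


print(similarity_search(1, "embedding", offline_embedding_search, interpretable))
# ([(341, 0.87)], 'interpretable')
```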
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-six-dimensions">The Six Dimensions<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-six-dimensions" class="hash-link" aria-label="Direct link to The Six Dimensions" title="Direct link to The Six Dimensions">​</a></h2>
<p>Every patient in a CDM source gets a feature vector extracted across six clinical dimensions. Each extraction is tailored to the OMOP data model:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="1-demographics">1. Demographics<a href="http://localhost:8082/docs/blog/patient-similarity-engine#1-demographics" class="hash-link" aria-label="Direct link to 1. Demographics" title="Direct link to 1. Demographics">​</a></h3>
<p>From the <code>person</code> table: age (bucketed into 5-year intervals), gender concept, race concept. Scoring uses a composite: 40% age proximity + 40% gender match + 20% race match.</p>
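<p>The demographics composite can be sketched as follows. The 40/40/20 split and the 5-year buckets come from the text. Two parts are assumptions: the shape of the age-proximity curve (linear decay to zero at <code>MAX_BUCKET_GAP</code> buckets apart) and all the names. The usage lines use the standard OMOP concepts 8507 (male), 8527 (White) and 8516 (Black or African American).</p>

```python
from dataclasses import dataclass

# Hypothetical: how many 5-year buckets apart two patients may be before age
# proximity reaches zero. The post does not specify this curve.
MAX_BUCKET_GAP = 4


@dataclass(frozen=True)
class Demographics:
    age_bucket: int          # age // 5, e.g. 62 -> 12
    gender_concept_id: int
    race_concept_id: int


def age_bucket(age_years: int) -> int:
    """Bucket an age into 5-year intervals."""
    return age_years // 5


def demographics_score(a: Demographics, b: Demographics) -> float:
    """40% age proximity + 40% gender match + 20% race match."""
    gap = abs(a.age_bucket - b.age_bucket)
    age_proximity = max(0.0, 1.0 - gap / MAX_BUCKET_GAP)  # assumed linear decay
    gender_match = 1.0 if a.gender_concept_id == b.gender_concept_id else 0.0
    race_match = 1.0 if a.race_concept_id == b.race_concept_id else 0.0
    return 0.4 * age_proximity + 0.4 * gender_match + 0.2 * race_match


seed = Demographics(age_bucket(62), 8507, 8527)
cand = Demographics(age_bucket(68), 8507, 8516)
print(round(demographics_score(seed, cand), 2))  # 0.7
```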
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="2-conditions">2. Conditions<a href="http://localhost:8082/docs/blog/patient-similarity-engine#2-conditions" class="hash-link" aria-label="Direct link to 2. Conditions" title="Direct link to 2. Conditions">​</a></h3>
<p>From <code>condition_occurrence</code>, rolled up through <code>concept_ancestor</code> to three levels of the SNOMED hierarchy. This means "Essential hypertension" and "Hypertensive heart disease" both map to their shared ancestor "Hypertensive disorder" — capturing clinical relatedness, not just exact code matches.</p>
<p>Scoring uses <strong>Jaccard similarity</strong> on the ancestor-rolled concept sets: <code>|A ∩ B| / |A ∪ B|</code>. Two patients who share 40 ancestor concepts, out of 50 distinct concepts across both of their records (the union), score 40 / 50 = 0.80.</p>
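<p>Here is a minimal sketch of the rollup and the Jaccard score. It assumes "three levels" means keeping ancestors whose <code>min_levels_of_separation</code> (a column of OMOP's <code>concept_ancestor</code>) is at most 3. The integer IDs are placeholders, not real SNOMED concepts. The rollup raises the score because both hypertension codes meet at a shared ancestor.</p>

```python
# Toy rows shaped like OMOP concept_ancestor:
# (ancestor_concept_id, descendant_concept_id, min_levels_of_separation).
# The integer IDs are hypothetical placeholders, not real SNOMED concept IDs.
CONCEPT_ANCESTOR = [
    (1, 1, 0),    # Essential hypertension (self row)
    (10, 1, 1),   # Hypertensive disorder -> Essential hypertension
    (2, 2, 0),    # Hypertensive heart disease (self row)
    (10, 2, 1),   # Hypertensive disorder -> Hypertensive heart disease
    (20, 2, 1),   # Heart disease -> Hypertensive heart disease
    (3, 3, 0),    # Type 2 diabetes mellitus (self row)
    (30, 3, 1),   # Diabetes mellitus -> Type 2 diabetes mellitus
]


def roll_up(concept_ids, max_levels=3):
    """Replace each concept with itself plus its ancestors up to max_levels away."""
    return {
        ancestor
        for ancestor, descendant, levels in CONCEPT_ANCESTOR
        if descendant in concept_ids and max_levels >= levels
    }


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient_a = {1, 3}   # Essential hypertension, Type 2 diabetes mellitus
patient_b = {2, 3}   # Hypertensive heart disease, Type 2 diabetes mellitus
print(jaccard(patient_a, patient_b))                    # 0.3333333333333333
print(jaccard(roll_up(patient_a), roll_up(patient_b)))  # 0.5
```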
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="3-measurements--labs">3. Measurements / Labs<a href="http://localhost:8082/docs/blog/patient-similarity-engine#3-measurements--labs" class="hash-link" aria-label="Direct link to 3. Measurements / Labs" title="Direct link to 3. Measurements / Labs">​</a></h3>
<p>From <code>measurement</code>, taking the most recent value per measurement type per patient. Values are z-score normalized against source-level population statistics (stored in <code>source_measurement_stats</code>), so a hemoglobin of 14 g/dL means different things in a source with mean 13.5 vs. 15.0.</p>
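<p>Scoring uses an <strong>inverse Euclidean distance</strong> on the z-scored values. It is computed only over measurement types present in <em>both</em> patients and normalized by the number of those shared types (a root-mean-square difference), so the score does not shrink simply because two patients share more labs:</p>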
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">score = 1 / (1 + √(mean_squared_diff))</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
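<p>Here is a sketch of the normalization and the formula above. The stddev of 1.0 in the hemoglobin example is made up. Returning <code>None</code> when no measurement types overlap, so that the dimension is treated as unavailable, is an assumption of this sketch.</p>

```python
import math
from typing import Dict, Optional


def z_score(value: float, mean: float, stddev: float) -> float:
    """Normalize a raw value against source-level population statistics."""
    return (value - mean) / stddev if stddev > 0 else 0.0


def measurement_score(a: Dict[str, float], b: Dict[str, float]) -> Optional[float]:
    """1 / (1 + sqrt(mean squared z-score difference)) over shared measurement types."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # assumption: no overlapping labs means the dimension is unavailable
    msd = sum((a[k] - b[k]) ** 2 for k in shared) / len(shared)
    return 1.0 / (1.0 + math.sqrt(msd))


# Hemoglobin 14 g/dL in two sources (the stddev of 1.0 is a made-up value).
print(z_score(14.0, 13.5, 1.0))  # 0.5
print(z_score(14.0, 15.0, 1.0))  # -1.0

seed = {"hemoglobin": 0.5, "hba1c": 1.2}
candidate = {"hemoglobin": -0.5, "hba1c": 1.2, "ldl": 0.3}
print(measurement_score(seed, candidate))  # ldl is ignored: only the seed lacks it
```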
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="4-drugs">4. Drugs<a href="http://localhost:8082/docs/blog/patient-similarity-engine#4-drugs" class="hash-link" aria-label="Direct link to 4. Drugs" title="Direct link to 4. Drugs">​</a></h3>
<p>From <code>drug_exposure</code>, rolled up to the <strong>ingredient level</strong> via <code>concept_ancestor</code> joined to <code>concept</code> where <code>concept_class_id = 'Ingredient'</code>. This collapses brand names, formulations, and dosage forms into their active ingredients — "Metformin 500 mg tablet" and "Glucophage XR 1000 mg" both become <em>Metformin</em>.</p>
<p>Scoring: Jaccard on ingredient-level concept sets.</p>
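<p>The ingredient rollup is a join through <code>concept_ancestor</code> to <code>concept</code>. This sketch runs that join against trimmed-down, in-memory SQLite stand-ins for the OMOP tables. The concept IDs, the names and the exact query are illustrative, not Parthenon's SQL. Both metformin products collapse to one ingredient, so the two patients share it.</p>

```python
import sqlite3
from collections import defaultdict

# Trimmed-down stand-ins for the OMOP tables (the real tables have more columns).
# Concept IDs and names are hypothetical placeholders.
SCHEMA = """
CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT,
                      concept_class_id TEXT);
CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER,
                               descendant_concept_id INTEGER);
CREATE TABLE drug_exposure (person_id INTEGER, drug_concept_id INTEGER);
INSERT INTO concept VALUES
  (100, 'Metformin', 'Ingredient'),
  (101, 'Metformin 500 MG Oral Tablet', 'Clinical Drug'),
  (102, 'Glucophage XR 1000 MG', 'Branded Drug'),
  (200, 'Lisinopril', 'Ingredient'),
  (201, 'Lisinopril 10 MG Oral Tablet', 'Clinical Drug');
INSERT INTO concept_ancestor VALUES (100, 101), (100, 102), (200, 201);
INSERT INTO drug_exposure VALUES (1, 101), (1, 201), (2, 102);
"""

# Walk each exposure up to its ancestors and keep only Ingredient-class ones.
INGREDIENT_ROLLUP = """
SELECT DISTINCT de.person_id, c.concept_id
FROM drug_exposure de
JOIN concept_ancestor ca ON ca.descendant_concept_id = de.drug_concept_id
JOIN concept c ON c.concept_id = ca.ancestor_concept_id
WHERE c.concept_class_id = 'Ingredient'
"""


def ingredient_sets(conn):
    """person_id -> set of ingredient concept IDs."""
    sets = defaultdict(set)
    for person_id, ingredient_id in conn.execute(INGREDIENT_ROLLUP):
        sets[person_id].add(ingredient_id)
    return dict(sets)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
drugs = ingredient_sets(conn)
print(len(drugs[1] & drugs[2]) / len(drugs[1] | drugs[2]))  # 0.5
```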
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="5-procedures">5. Procedures<a href="http://localhost:8082/docs/blog/patient-similarity-engine#5-procedures" class="hash-link" aria-label="Direct link to 5. Procedures" title="Direct link to 5. Procedures">​</a></h3>
<p>From <code>procedure_occurrence</code>, using distinct procedure concept IDs. No rollup — procedure hierarchies are flatter than condition or drug hierarchies, and exact procedure matching is clinically meaningful.</p>
<p>Scoring: Jaccard on procedure concept sets.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="6-genomic-variants">6. Genomic Variants<a href="http://localhost:8082/docs/blog/patient-similarity-engine#6-genomic-variants" class="hash-link" aria-label="Direct link to 6. Genomic Variants" title="Direct link to 6. Genomic Variants">​</a></h3>
<p>From <code>genomic_variants</code> (Parthenon's app-schema table linking VCF-parsed variants to OMOP person IDs). Each variant is represented as a (gene, pathogenicity) tuple.</p>
<p>Scoring uses <strong>pathogenicity-tiered weighted overlap</strong>: pathogenic variants score 3x, likely-pathogenic 2x, VUS 1x. A shared pathogenic BRCA1 variant is therefore a stronger match than a shared VUS (variant of uncertain significance) in a less actionable gene.</p>
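<p>The text gives the tier weights but not how the overlap is normalized. This sketch assumes a weighted Jaccard: the tier weight of shared <code>(gene, tier)</code> tuples divided by the tier weight of the union. It also gives any other class, such as benign, zero weight. Both choices are assumptions.</p>

```python
from typing import Set, Tuple

Variant = Tuple[str, str]  # (gene, pathogenicity tier)

# Tier weights from the text; zero weight for any other class (e.g. benign)
# is an assumption of this sketch.
TIER_WEIGHTS = {"pathogenic": 3.0, "likely_pathogenic": 2.0, "vus": 1.0}


def variant_weight(variant: Variant) -> float:
    return TIER_WEIGHTS.get(variant[1], 0.0)


def genomic_score(a: Set[Variant], b: Set[Variant]) -> float:
    """Assumed weighted Jaccard: tier weight of shared variants / tier weight of all."""
    total = sum(variant_weight(v) for v in a | b)
    if total == 0:
        return 0.0
    return sum(variant_weight(v) for v in a & b) / total


seed = {("BRCA1", "pathogenic"), ("TTN", "vus")}
print(genomic_score(seed, {("BRCA1", "pathogenic")}))  # 0.75
print(genomic_score(seed, {("TTN", "vus")}))           # 0.25
```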
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-missing-dimension-problem-and-its-elegant-solution">The Missing-Dimension Problem (and Its Elegant Solution)<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-missing-dimension-problem-and-its-elegant-solution" class="hash-link" aria-label="Direct link to The Missing-Dimension Problem (and Its Elegant Solution)" title="Direct link to The Missing-Dimension Problem (and Its Elegant Solution)">​</a></h2>
<p>Not every CDM source has every dimension. SynPUF (CMS synthetic data) has conditions, drugs, and procedures but no lab values and no genomic data. The Pancreatic Cancer Corpus has conditions, drugs, and measurements but no procedures and no genomics (yet). Acumenus CDM has everything except genomics.</p>
<p>A naive approach would give SynPUF patients a 0 on measurements and genomics, penalizing them unfairly. Our approach: <strong>missing dimensions reduce the denominator, not the score.</strong></p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">available_dims = dimensions where BOTH patients have data</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">score = Σ(weight × dim_score) / Σ(weight)    for dims in available_dims</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Each patient's feature vector carries a <code>dimensions_available</code> array tracking which dimensions have data. When comparing two patients, the scorer only includes dimensions that are available to <em>both</em> — and the weighted average divides only by the weights of those included dimensions.</p>
<p>This means a SynPUF patient with perfect condition/drug overlap and the same demographics can score 0.95 against another SynPUF patient, even though neither has lab values or genomic data. The score honestly represents the similarity across the data that <em>exists</em>.</p>
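<p>In code, the rule above is a weighted average restricted to dimensions that both patients have. Function and dimension names are hypothetical. Equal weights are an assumption, since the seeded defaults are not listed here. The per-dimension scores are those of the interpretable-mode example patient in the next section. With equal weights they average to 0.8745. Counting the two missing dimensions as zeros would drag the same patient down to 0.583.</p>

```python
from typing import Dict, Iterable


def overall_score(dim_scores: Dict[str, float], dims_a: Iterable[str],
                  dims_b: Iterable[str], weights: Dict[str, float]) -> float:
    """Weighted average over dimensions available to BOTH patients.

    Missing dimensions shrink the denominator instead of contributing a 0.
    """
    shared = set(dims_a) & set(dims_b)
    numerator = denominator = 0.0
    for dim in shared:
        weight = weights.get(dim, 0.0)
        if weight == 0.0 or dim not in dim_scores:
            continue  # slider at zero, or no score computed for this dimension
        numerator += weight * dim_scores[dim]
        denominator += weight
    return numerator / denominator if denominator else 0.0


ALL_DIMS = ["demographics", "conditions", "measurements", "drugs", "procedures", "genomics"]
weights = {dim: 1.0 for dim in ALL_DIMS}  # equal weights: an assumption
available = ["demographics", "conditions", "measurements", "drugs"]
scores = {"demographics": 1.0, "conditions": 0.898, "measurements": 0.60, "drugs": 1.0}

print(round(overall_score(scores, available, available, weights), 4))  # 0.8745
print(round(sum(scores.values()) / len(ALL_DIMS), 4))                  # naive: 0.583
```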
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="dual-scoring-modes-interpretable-vs-embedding">Dual Scoring Modes: Interpretable vs. Embedding<a href="http://localhost:8082/docs/blog/patient-similarity-engine#dual-scoring-modes-interpretable-vs-embedding" class="hash-link" aria-label="Direct link to Dual Scoring Modes: Interpretable vs. Embedding" title="Direct link to Dual Scoring Modes: Interpretable vs. Embedding">​</a></h2>
<p>The engine supports two algorithmic modes, togglable in the UI:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="interpretable-mode-pure-sql">Interpretable Mode (Pure SQL)<a href="http://localhost:8082/docs/blog/patient-similarity-engine#interpretable-mode-pure-sql" class="hash-link" aria-label="Direct link to Interpretable Mode (Pure SQL)" title="Direct link to Interpretable Mode (Pure SQL)">​</a></h3>
<p>Every candidate in the source is scored against the seed patient using the six dimension scorers described above. This is a brute-force scan: for each candidate, compute the Jaccard/Euclidean dimension scores, combine them into the weighted average over the available dimensions, and rank. On the Pancreatic Cancer Corpus (361 patients), this takes ~200ms. On Acumenus (1M patients), it is slower but still feasible for pre-filtered queries.</p>
<p><strong>Why it matters:</strong> every score is fully decomposable. A researcher can see that patient 341 scored 0.87 because demographics were a perfect match (1.0), conditions overlapped 89.8%, labs were moderately similar (0.60), and drugs were identical (1.0). There is no black box.</p>
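<p>A brute-force ranking pass can be sketched as follows. It keeps each candidate's per-dimension breakdown next to its overall score, which is what makes the result decomposable. This is a simplified sketch. Only two Jaccard dimensions are wired up, patients are plain dicts, and all names are hypothetical. Returning <code>None</code> for an empty concept set stands in for the <code>dimensions_available</code> check.</p>

```python
import heapq
from typing import Any, Callable, Dict, List, Optional, Tuple

Patient = Dict[str, Any]
Scorer = Callable[[Patient, Patient], Optional[float]]


def jaccard_or_none(key: str) -> Scorer:
    """Jaccard scorer for one concept-set dimension; None if either side lacks data."""
    def score(a: Patient, b: Patient) -> Optional[float]:
        sa, sb = a.get(key, set()), b.get(key, set())
        if not sa or not sb:
            return None
        return len(sa & sb) / len(sa | sb)
    return score


def rank_candidates(seed: Patient, candidates: List[Patient], scorers: Dict[str, Scorer],
                    weights: Dict[str, float], top_k: int = 10):
    """Brute-force scan: score every candidate and keep its per-dimension breakdown."""
    results: List[Tuple[float, int, Dict[str, float]]] = []
    for cand in candidates:
        if cand["person_id"] == seed["person_id"]:
            continue  # never match the seed against itself
        breakdown = {}
        for dim, scorer in scorers.items():
            value = scorer(seed, cand)
            if value is not None and weights.get(dim, 0.0) > 0:
                breakdown[dim] = value
        total_weight = sum(weights[d] for d in breakdown)
        if total_weight == 0:
            continue  # no scorable dimension in common
        overall = sum(weights[d] * v for d, v in breakdown.items()) / total_weight
        results.append((overall, cand["person_id"], breakdown))
    return heapq.nlargest(top_k, results, key=lambda r: r[0])


seed = {"person_id": 0, "conditions": {1, 2, 3}, "drugs": {10}}
candidates = [
    {"person_id": 1, "conditions": {1, 2, 3}, "drugs": {10}},
    {"person_id": 2, "conditions": {1}, "drugs": {11}},
    {"person_id": 3, "conditions": {1, 2}, "drugs": set()},  # no drug data
]
scorers = {"conditions": jaccard_or_none("conditions"), "drugs": jaccard_or_none("drugs")}
ranked = rank_candidates(seed, candidates, scorers, {"conditions": 1.0, "drugs": 1.0})
for overall, pid, breakdown in ranked:
    print(pid, round(overall, 3), breakdown)  # persons 1, 3, 2 in that order
```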
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="embedding-mode-pgvector-ann--re-ranking">Embedding Mode (pgvector ANN + Re-ranking)<a href="http://localhost:8082/docs/blog/patient-similarity-engine#embedding-mode-pgvector-ann--re-ranking" class="hash-link" aria-label="Direct link to Embedding Mode (pgvector ANN + Re-ranking)" title="Direct link to Embedding Mode (pgvector ANN + Re-ranking)">​</a></h3>
<p>For larger populations, the engine offers a two-stage approach:</p>
<ol>
<li>
<p><strong>Stage 1: Approximate Nearest Neighbors.</strong> Each patient's structured features are sent to the Python AI service, which encodes them into a 512-dimensional dense vector using SapBERT concept embeddings. Demographics get 32 dimensions, conditions get 128, measurements get 64, drugs get 128, procedures get 96, and genomics get 64. The resulting vector is L2-normalized and stored in pgvector with an IVFFlat index for cosine distance search.</p>
<p>A single pgvector ANN query retrieves the top 200 candidates in milliseconds, even at 1M patients:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> person_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">embedding </span><span class="token operator" style="color:hsl(207, 82%, 66%)">&lt;=&gt;</span><span class="token plain"> ?::vector</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">AS</span><span class="token plain"> cosine_similarity</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> patient_feature_vectors</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> source_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span 
class="token plain"> ? </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> person_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">!=</span><span class="token plain"> ? </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> embedding </span><span class="token operator" style="color:hsl(207, 82%, 66%)">IS</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">NOT</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">NULL</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">ORDER</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">BY</span><span class="token plain"> embedding </span><span class="token operator" style="color:hsl(207, 82%, 66%)">&lt;=&gt;</span><span class="token plain"> ?::vector</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">LIMIT</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">200</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" 
d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
</li>
<li>
<p><strong>Stage 2: Re-ranking.</strong> The 200 ANN candidates are re-ranked with the <em>same interpretable scorers</em> used in interpretable mode, so the final results carry the same per-dimension score breakdowns regardless of which mode was used. The only difference is how candidates were selected: a brute-force scan vs. an ANN approximation.</p>
</li>
</ol>
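<p>The two stages fit in a few lines. In this sketch, Stage 1 is an exact cosine top-n scan standing in for the IVFFlat index; a real approximate index may also miss some true neighbors. Stage 2 re-scores only that pool. Names and toy data are hypothetical. The usage shows the trade-off: patient 3 would win on interpretable score but never enters the pool, so the pool size limits what re-ranking can recover.</p>

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """1 minus cosine distance, the value the SQL above selects as cosine_similarity."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0


def two_stage_search(seed_id: int, embeddings: Dict[int, Sequence[float]],
                     interpretable_score: Callable[[int, int], float],
                     n_candidates: int = 200, top_k: int = 10) -> List[Tuple[float, int]]:
    seed_vec = embeddings[seed_id]
    # Stage 1: exact top-n by cosine similarity (a stand-in for the IVFFlat ANN query).
    pool = heapq.nlargest(
        n_candidates,
        (pid for pid in embeddings if pid != seed_id),
        key=lambda pid: cosine_similarity(seed_vec, embeddings[pid]),
    )
    # Stage 2: re-rank only the pooled candidates with the interpretable scorer.
    return heapq.nlargest(top_k, ((interpretable_score(seed_id, pid), pid) for pid in pool))


embeddings = {0: [1.0, 0.0, 0.0], 1: [0.9, 0.1, 0.0], 2: [0.8, 0.2, 0.0], 3: [0.0, 1.0, 0.0]}
interpretable = {1: 0.60, 2: 0.90, 3: 0.95}  # hypothetical re-ranking scores
print(two_stage_search(0, embeddings, lambda s, c: interpretable[c], n_candidates=2))
# [(0.9, 2), (0.6, 1)]
```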
<p>The SapBERT encoding is what makes embedding mode better than fast Jaccard at semantic matching. SapBERT (a PubMedBERT-based biomedical language model) encodes concept names into 768-dimensional vectors in which semantically related concepts lie close together. "Type 2 diabetes mellitus" and "Insulin resistance" can have high cosine similarity even when the three-level condition rollup does not place them under a common ancestor. Mean-pooling SapBERT embeddings across a patient's condition set yields a vector that captures the <em>gestalt</em> of their clinical profile, not just the discrete concepts.</p>
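<p>This sketch shows how a 512-dimensional patient vector could be assembled from the segment sizes given above. The post does not say how a 768-dimensional SapBERT mean is reduced to a 128-dimensional segment, so a fixed Gaussian random projection stands in here. The zero-filled missing segments are also an assumption. Random vectors take the place of real SapBERT outputs.</p>

```python
import math
import random
from typing import Dict, List, Sequence

SAPBERT_DIM = 768
# Segment sizes from the text; together they form the 512-dim patient vector.
SEGMENT_SIZES = {"demographics": 32, "conditions": 128, "measurements": 64,
                 "drugs": 128, "procedures": 96, "genomics": 64}
assert sum(SEGMENT_SIZES.values()) == 512


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of concept embeddings (zeros if the set is empty)."""
    if not vectors:
        return [0.0] * SAPBERT_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def random_projection(in_dim: int, out_dim: int, seed: int) -> List[List[float]]:
    """Hypothetical 768 -> segment-size reducer: a fixed Gaussian random matrix."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(vector: Sequence[float], matrix: Sequence[Sequence[float]]) -> List[float]:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def l2_normalize(vector: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def build_patient_vector(segments: Dict[str, List[float]]) -> List[float]:
    """Concatenate fixed-size segments (zero-filled if missing; an assumption), then L2-normalize."""
    parts: List[float] = []
    for name, size in SEGMENT_SIZES.items():
        segment = segments.get(name, [0.0] * size)
        if len(segment) != size:
            raise ValueError(f"{name} segment must have {size} values")
        parts.extend(segment)
    return l2_normalize(parts)


# Fake "SapBERT" vectors for two conditions; a real system would encode concept names.
rng = random.Random(0)
condition_vectors = [[rng.gauss(0.0, 1.0) for _ in range(SAPBERT_DIM)] for _ in range(2)]
conditions_segment = project(mean_pool(condition_vectors), random_projection(SAPBERT_DIM, 128, seed=1))
patient = build_patient_vector({"conditions": conditions_segment})
print(len(patient), round(sum(x * x for x in patient), 6))  # 512 1.0
```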
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="bidirectional-cohort-integration">Bidirectional Cohort Integration<a href="http://localhost:8082/docs/blog/patient-similarity-engine#bidirectional-cohort-integration" class="hash-link" aria-label="Direct link to Bidirectional Cohort Integration" title="Direct link to Bidirectional Cohort Integration">​</a></h2>
<p>Patient similarity doesn't live in a vacuum — it feeds into and draws from Parthenon's cohort system.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="similarity--cohort-export">Similarity → Cohort (Export)<a href="http://localhost:8082/docs/blog/patient-similarity-engine#similarity--cohort-export" class="hash-link" aria-label="Direct link to Similarity → Cohort (Export)" title="Direct link to Similarity → Cohort (Export)">​</a></h3>
<p>After running a similarity search, researchers can click "Export as Cohort" to save the result set as a new cohort definition. They set a minimum similarity score threshold, name the cohort, and the engine writes the matching person_ids into <code>results.cohort</code>. From there, the cohort is available for characterization, estimation, prediction, pathways — every analysis tool in Parthenon.</p>
<p>This enables a workflow that wasn't possible before: <em>start with a patient, find similar ones, export them as a cohort, run a Kaplan-Meier analysis on that cohort.</em> Clinical hypothesis generation driven by concrete clinical intuition rather than abstract inclusion criteria.</p>
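<p>Here is a sketch of the export step. It assumes rows follow the standard OHDSI cohort table layout (<code>cohort_definition_id</code>, <code>subject_id</code>, <code>cohort_start_date</code>, <code>cohort_end_date</code>). The post does not say which dates Parthenon writes, so they are passed in through a hypothetical <code>periods</code> map. An in-memory SQLite table stands in for <code>results.cohort</code>.</p>

```python
import sqlite3
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class CohortRow(NamedTuple):
    # Column names follow the standard OHDSI cohort table.
    cohort_definition_id: int
    subject_id: int
    cohort_start_date: date
    cohort_end_date: date


def cohort_rows(results: List[Tuple[int, float]], min_score: float, cohort_definition_id: int,
                periods: Dict[int, Tuple[date, date]]) -> List[CohortRow]:
    """Keep results at or above the threshold; dates come from the hypothetical `periods`."""
    return sorted(
        (CohortRow(cohort_definition_id, pid, *periods[pid])
         for pid, score in results if score >= min_score and pid in periods),
        key=lambda row: row.subject_id,
    )


results = [(341, 0.87), (12, 0.74), (98, 0.41)]
periods = {pid: (date(2020, 1, 1), date(2023, 12, 31)) for pid, _ in results}  # hypothetical
rows = cohort_rows(results, min_score=0.7, cohort_definition_id=9001, periods=periods)

conn = sqlite3.connect(":memory:")  # stand-in for results.cohort in PostgreSQL
conn.execute("CREATE TABLE cohort (cohort_definition_id INTEGER, subject_id INTEGER, "
             "cohort_start_date TEXT, cohort_end_date TEXT)")
conn.executemany("INSERT INTO cohort VALUES (?, ?, ?, ?)",
                 [(r.cohort_definition_id, r.subject_id, r.cohort_start_date.isoformat(),
                   r.cohort_end_date.isoformat()) for r in rows])
print(conn.execute("SELECT subject_id FROM cohort ORDER BY subject_id").fetchall())
# [(12,), (341,)]
```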
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="cohort--similarity-seed">Cohort → Similarity (Seed)<a href="http://localhost:8082/docs/blog/patient-similarity-engine#cohort--similarity-seed" class="hash-link" aria-label="Direct link to Cohort → Similarity (Seed)" title="Direct link to Cohort → Similarity (Seed)">​</a></h3>
<p>The reverse flow is equally powerful. Instead of "find patients similar to <em>this person</em>," researchers can ask "find patients similar to <em>this cohort</em>." The engine computes a centroid — the average feature vector across all cohort members — and searches for patients near that centroid.</p>
<p>The centroid is constructed differently for each mode:</p>
<ul>
<li><strong>Interpretable:</strong> Union of member conditions/drugs/procedures, mean of lab z-scores, median age, mode gender/race. A "virtual patient" representing the cohort's composite phenotype.</li>
<li><strong>Embedding:</strong> Mean of the members' 512-dim embeddings, which is by definition the cohort's centroid in embedding space.</li>
</ul>
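<p>Here are both centroids in code. Averaging each lab only over the members who have it is an assumption, as are the example values. The function names are hypothetical. The mean of L2-normalized embeddings has a norm of at most 1, which does not affect cosine search.</p>

```python
from collections import Counter
from statistics import mean, median
from typing import Any, Dict, List


def interpretable_centroid(members: List[Dict[str, Any]]) -> Dict[str, Any]:
    """A 'virtual patient': union of concept sets, mean lab z-scores,
    median age, most common gender and race."""
    lab_keys = set().union(*(m["labs"].keys() for m in members))
    return {
        "conditions": set().union(*(m["conditions"] for m in members)),
        "drugs": set().union(*(m["drugs"] for m in members)),
        "procedures": set().union(*(m["procedures"] for m in members)),
        # Each lab is averaged over the members who actually have it (assumption).
        "labs": {k: mean(m["labs"][k] for m in members if k in m["labs"]) for k in lab_keys},
        "age": median(m["age"] for m in members),
        "gender": Counter(m["gender"] for m in members).most_common(1)[0][0],
        "race": Counter(m["race"] for m in members).most_common(1)[0][0],
    }


def embedding_centroid(embeddings: List[List[float]]) -> List[float]:
    """Coordinate-wise mean; its norm may fall below 1, which cosine distance ignores."""
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


members = [
    {"conditions": {1, 2}, "drugs": {10}, "procedures": set(), "labs": {"hba1c": 1.0},
     "age": 60, "gender": 8532, "race": 8527},
    {"conditions": {2, 3}, "drugs": {10, 11}, "procedures": {50},
     "labs": {"hba1c": 2.0, "ldl": 0.5}, "age": 70, "gender": 8532, "race": 8516},
    {"conditions": {3}, "drugs": set(), "procedures": set(), "labs": {},
     "age": 80, "gender": 8507, "race": 8527},
]
virtual = interpretable_centroid(members)
print(virtual["age"], virtual["gender"])            # 70 8532
print(embedding_centroid([[1.0, 0.0], [0.0, 1.0]]))  # [0.5, 0.5]
```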
<p>This supports cohort enrichment: start with a small, well-characterized cohort, find similar patients to expand it, validate the expanded cohort against the original inclusion criteria.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="tiered-privacy-hipaa-friendly-by-default">Tiered Privacy: HIPAA-Friendly by Default<a href="http://localhost:8082/docs/blog/patient-similarity-engine#tiered-privacy-hipaa-friendly-by-default" class="hash-link" aria-label="Direct link to Tiered Privacy: HIPAA-Friendly by Default" title="Direct link to Tiered Privacy: HIPAA-Friendly by Default">​</a></h2>
<p>Parthenon handles OMOP CDM data that may include PHI under HIPAA. The similarity engine respects this with tiered access control:</p>
<ul>
<li><strong>Default (patient-similarity.view):</strong> Results show overall and per-dimension scores, age/gender summaries, and condition/drug counts — but no person_ids, no named conditions, no named drugs. Aggregate-level similarity without patient identification.</li>
<li><strong>With profiles.view:</strong> Full person-level results including person_ids (clickable to Patient Profile), named shared conditions, named shared drugs, and the Compare view for head-to-head analysis.</li>
</ul>
<p>The tiering is enforced at the controller level — the service always computes full results, but the controller strips person-level fields before responding to users without <code>profiles.view</code> permission.</p>
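<p>The controller-level rule reduces to a filter over the full result set. The real code is a Laravel controller. This Python sketch only illustrates the logic. The result field names are hypothetical; only the permission names come from the text.</p>

```python
from typing import Any, Dict, Iterable, List

# Hypothetical result field names; only the permission names come from the post.
PERSON_LEVEL_FIELDS = {"person_id", "shared_conditions", "shared_drugs"}


def redact_results(results: List[Dict[str, Any]], permissions: Iterable[str]) -> List[Dict[str, Any]]:
    """Strip person-level fields unless the caller holds profiles.view."""
    if "profiles.view" in set(permissions):
        return results
    return [{k: v for k, v in r.items() if k not in PERSON_LEVEL_FIELDS} for r in results]


full = [{"person_id": 341, "overall_score": 0.87, "dimension_scores": {"conditions": 0.898},
         "age_group": "60-64", "gender": "F", "condition_count": 12,
         "shared_conditions": ["Type 2 diabetes mellitus"], "shared_drugs": ["Metformin"]}]
print(redact_results(full, ["patient-similarity.view"])[0].keys())
```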
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-data-model">The Data Model<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-data-model" class="hash-link" aria-label="Direct link to The Data Model" title="Direct link to The Data Model">​</a></h2>
<p>Four new tables in the <code>app</code> schema:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">patient_feature_vectors     — One </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">row</span><span class="token plain"> per patient per source</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"> Demographics</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              condition</span><span class="token operator" style="color:hsl(207, 82%, 66%)">/</span><span class="token plain">drug</span><span class="token operator" style="color:hsl(207, 82%, 66%)">/</span><span class="token keyword" style="color:hsl(286, 60%, 67%)">procedure</span><span class="token plain"> concept arrays </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">JSONB</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              z</span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token plain">scored lab vector</span><span 
class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> genomic variants</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">512</span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token plain">dim</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              pgvector embedding</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> dimensions_available</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">Unique</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">on</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">source_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> person_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">source_measurement_stats    — Population</span><span class="token operator" 
style="color:hsl(207, 82%, 66%)">-</span><span class="token keyword" style="color:hsl(286, 60%, 67%)">level</span><span class="token plain"> measurement </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">statistics</span><span class="token plain"> per source</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              Mean</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> stddev</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> n_patients</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> quartiles per measurement</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              concept</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"> Used </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">for</span><span class="token plain"> z</span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token plain">score normalization</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">similarity_dimensions       — Admin</span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token plain">configurable 
dimension definitions </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">with</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">default</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              weights</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"> Six seeded dimensions</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> extensible</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">patient_similarity_cache    — Result caching keyed </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">on</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">source</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> person</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">mode</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              weights_hash</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1</span><span class="token operator" style="color:hsl(207, 82%, 66%)">-</span><span class="token keyword" style="color:hsl(286, 60%, 67%)">hour</span><span class="token plain"> TTL</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain"> Prevents redundant</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                              computation </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">for</span><span class="token plain"> identical queries</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><br></span></code></pre></div></div>
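<p>Below is a Python sketch of a cache key for <code>patient_similarity_cache</code>. It is a minimal illustration and not Parthenon's code. It assumes <code>weights_hash</code> is a SHA-256 digest of the weights serialized as canonical JSON with sorted keys, and the in-memory <code>SimilarityCache</code> class is hypothetical. Because the weights are canonicalized first, two requests that list the same weights in a different key order share one cache entry.</p>

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

TTL_SECONDS = 3600  # 1-hour TTL, as described for patient_similarity_cache

CacheKey = Tuple[int, int, str, str]


def weights_hash(weights: Dict[str, float]) -> str:
    """Hash weights canonically so key order does not change the hash (assumed scheme)."""
    canonical = json.dumps(weights, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SimilarityCache:
    """Hypothetical in-memory stand-in for the patient_similarity_cache table."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[CacheKey, Tuple[float, Any]] = {}

    def key(self, source_id: int, person_id: int, mode: str,
            weights: Dict[str, float]) -> CacheKey:
        return (source_id, person_id, mode, weights_hash(weights))

    def get(self, key: CacheKey) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= TTL_SECONDS:
            # Expired: drop the entry so the caller recomputes the search.
            del self._entries[key]
            return None
        return value

    def put(self, key: CacheKey, value: Any) -> None:
        self._entries[key] = (self._clock(), value)
```

<p>Because the mode and the weights hash are both part of the key, switching between Interpretable and Embedding mode, or moving any slider, results in a cache miss rather than a stale result.</p>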
<p>The <code>patient_feature_vectors</code> table stores both the structured features (used for interpretable scoring) and the dense embedding (used for approximate nearest-neighbor, or ANN, search) in the same row. Because of this co-location, a single query can filter by demographics, retrieve embeddings for ANN search, and return structured features for re-ranking, with no joins required.</p>
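<p>The single-query flow is easier to see in a small example. Here is a minimal pure-Python sketch over in-memory rows. It is not the production query. Brute-force cosine similarity stands in for pgvector's ANN index. The row fields (<code>age_bucket</code>, <code>embedding</code>, <code>conditions</code>), the <code>candidate_k</code> parameter, and the use of Jaccard overlap as the re-ranking score are illustrative assumptions.</p>

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; brute-force stand-in for a pgvector ANN lookup."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Set overlap, used here as the structured re-ranking score."""
    union = a | b
    return len(a.intersection(b)) / len(union) if union else 0.0


def find_similar(rows: List[Dict], seed: Dict,
                 candidate_k: int = 50, limit: int = 10) -> List[int]:
    # 1. Filter on a structured column stored in the same row (age bucket).
    pool = [r for r in rows
            if r["person_id"] != seed["person_id"]
            and r["age_bucket"] == seed["age_bucket"]]
    # 2. Shortlist by embedding similarity (the ANN step in production).
    pool.sort(key=lambda r: cosine(r["embedding"], seed["embedding"]), reverse=True)
    candidates = pool[:candidate_k]
    # 3. Re-rank the shortlist with structured features from the same rows.
    candidates.sort(key=lambda r: jaccard(r["conditions"], seed["conditions"]),
                    reverse=True)
    return [r["person_id"] for r in candidates[:limit]]
```

<p>Every field the three steps need is in one row, so no step requires a join. In PostgreSQL, step 2 would typically be an ORDER BY on a pgvector distance expression with a LIMIT.</p>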
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="feature-extraction-at-scale">Feature Extraction at Scale<a href="http://localhost:8082/docs/blog/patient-similarity-engine#feature-extraction-at-scale" class="hash-link" aria-label="Direct link to Feature Extraction at Scale" title="Direct link to Feature Extraction at Scale">​</a></h2>
<p>The <code>ComputePatientFeatureVectors</code> Horizon job processes patients in batches of 500. For each batch:</p>
<ol>
<li><strong>Demographics</strong> from <code>person</code> table — age bucketed into 5-year intervals</li>
<li><strong>Conditions</strong> from <code>condition_occurrence</code> joined to <code>concept_ancestor</code>, rolled up to ancestor concepts within 0-3 levels of separation</li>
<li><strong>Measurements</strong> from <code>measurement</code> — latest value per concept, z-scored against <code>source_measurement_stats</code></li>
<li><strong>Drugs</strong> from <code>drug_exposure</code> joined to <code>concept_ancestor</code> and <code>concept</code> — ingredient-level rollup</li>
<li><strong>Procedures</strong> from <code>procedure_occurrence</code> — distinct procedure concepts</li>
<li><strong>Genomics</strong> from <code>genomic_variants</code> — gene/pathogenicity tuples</li>
</ol>
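<p>Several of the steps above can be sketched per patient in Python. This illustrative in-memory version is not the job itself. The input shapes and helper names are assumptions, and so is representing <code>concept_ancestor</code> as (ancestor, descendant, levels) tuples with the levels taken from <code>min_levels_of_separation</code>. The sketch covers four pieces: batching, 5-year age buckets, condition rollup within 0-3 levels, and z-scoring the latest value per measurement concept.</p>

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

BATCH_SIZE = 500  # patients per batch
MAX_LEVELS = 3    # 0-3 levels of separation for the ancestor rollup


def batches(person_ids: List[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` person IDs."""
    for start in range(0, len(person_ids), size):
        yield person_ids[start:start + size]


def age_bucket(year_of_birth: int, index_year: int, width: int = 5) -> str:
    """Bucket age into 5-year intervals, e.g. age 62 becomes '60-64'."""
    age = index_year - year_of_birth
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


def roll_up_conditions(condition_ids: Iterable[int],
                       ancestry: Iterable[Tuple[int, int, int]]) -> Set[int]:
    """Expand condition concepts to their ancestors within MAX_LEVELS.

    `ancestry` mimics concept_ancestor rows as (ancestor_concept_id,
    descendant_concept_id, min_levels_of_separation). OMOP includes a
    level-0 row for each concept itself, so the original codes are kept.
    """
    wanted = set(condition_ids)
    allowed_levels = range(MAX_LEVELS + 1)
    return {anc for anc, desc, levels in ancestry
            if desc in wanted and levels in allowed_levels}


def latest_z_scores(measurements: List[Tuple[int, str, float]],
                    stats: Dict[int, Tuple[float, float]]) -> Dict[int, float]:
    """Keep the latest value per concept and z-score it.

    `measurements` holds (concept_id, iso_date, value); `stats` maps
    concept_id to (mean, stddev), standing in for source_measurement_stats.
    Concepts without stats are skipped.
    """
    latest: Dict[int, Tuple[str, float]] = {}
    for concept_id, date, value in measurements:
        # ISO-8601 date strings compare correctly as plain strings.
        if concept_id not in latest or date > latest[concept_id][0]:
            latest[concept_id] = (date, value)
    return {cid: (value - stats[cid][0]) / stats[cid][1]
            for cid, (_, value) in latest.items() if cid in stats}
```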
<p>On the Pancreatic Cancer Corpus (361 patients), full extraction takes <strong>3 seconds</strong>. The Acumenus CDM (1 million patients) processes at approximately <strong>8,000 patients per minute</strong> — around 2 hours for the full population. The measurement statistics (top 50 measurement types by patient count, minimum 10 patients, non-zero standard deviation) are computed once upfront and take 5-10 seconds.</p>
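<p>A short sketch can check both the measurement-statistics filter and the throughput estimate. The selection rules come from the text: the top 50 concepts by patient count, at least 10 patients, and a non-zero standard deviation. Three details are our own simplifications: the eligibility filters run before the top-50 cut, the spread is the population standard deviation, and quartiles are omitted. The arithmetic confirms the estimate. At 8,000 patients per minute, 1,000,000 patients take 125 minutes, just over 2 hours.</p>

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TOP_N = 50
MIN_PATIENTS = 10


def measurement_stats(rows: Iterable[Tuple[int, int, float]]) -> Dict[int, Dict[str, float]]:
    """rows are (person_id, measurement_concept_id, value_as_number).

    Assumed order: drop concepts with too few distinct patients or zero
    spread, then keep the TOP_N concepts by patient count.
    """
    values: Dict[int, List[float]] = defaultdict(list)
    patients: Dict[int, set] = defaultdict(set)
    for person_id, concept_id, value in rows:
        values[concept_id].append(value)
        patients[concept_id].add(person_id)

    eligible = []
    for concept_id, vals in values.items():
        n_patients = len(patients[concept_id])
        if n_patients >= MIN_PATIENTS:
            sd = statistics.pstdev(vals)
            if sd > 0:  # a zero stddev would make z-scores undefined
                eligible.append((n_patients, concept_id, statistics.fmean(vals), sd))

    eligible.sort(key=lambda t: t[0], reverse=True)
    return {cid: {"mean": mean, "stddev": sd, "n_patients": n}
            for n, cid, mean, sd in eligible[:TOP_N]}


def extraction_minutes(n_patients: int, per_minute: int = 8000) -> float:
    """Back-of-envelope runtime at a fixed extraction rate."""
    return n_patients / per_minute
```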
<p>The job is idempotent — it uses <code>updateOrCreate</code> on the (source_id, person_id) unique key, so re-running it on the same source updates existing vectors and adds new patients without duplicates.</p>
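<p>Eloquent's <code>updateOrCreate</code> finds a row that matches the given attributes. It updates that row if one exists and creates a new one otherwise. The same idempotent behavior can be sketched at the SQL level as an upsert against the unique key. The example below uses Python's built-in <code>sqlite3</code> with <code>ON CONFLICT ... DO UPDATE</code>, which needs SQLite 3.24 or later, purely for illustration. The simplified table and its columns are assumptions, not the production PostgreSQL schema.</p>

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """In-memory table with the same (source_id, person_id) uniqueness rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE patient_feature_vectors (
            source_id INTEGER NOT NULL,
            person_id INTEGER NOT NULL,
            features  TEXT    NOT NULL,
            UNIQUE (source_id, person_id)
        )""")
    return conn


def upsert_vectors(conn: sqlite3.Connection, source_id: int, vectors: dict) -> None:
    """Insert new patients and overwrite existing ones; safe to re-run."""
    conn.executemany(
        """INSERT INTO patient_feature_vectors (source_id, person_id, features)
           VALUES (?, ?, ?)
           ON CONFLICT (source_id, person_id)
           DO UPDATE SET features = excluded.features""",
        [(source_id, pid, json.dumps(f)) for pid, f in vectors.items()],
    )
    conn.commit()
```

<p>Running the load twice leaves one row per patient. The second run simply carries the newer feature values.</p>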
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-we-reused-and-why-it-matters">What We Reused (and Why It Matters)<a href="http://localhost:8082/docs/blog/patient-similarity-engine#what-we-reused-and-why-it-matters" class="hash-link" aria-label="Direct link to What We Reused (and Why It Matters)" title="Direct link to What We Reused (and Why It Matters)">​</a></h2>
<p>One of the most satisfying aspects of this build was how much existing Parthenon infrastructure we could leverage:</p>
<table><thead><tr><th>Existing Component</th><th>How We Reused It</th></tr></thead><tbody><tr><td><code>PatientFeatureExtractor</code> (PopulationRisk)</td><td>Pattern for demographics/conditions/measurements extraction</td></tr><tr><td><code>FeatureBuilderInterface</code> (Analysis/Features)</td><td>Modular feature extraction pattern with 6 implementations</td></tr><tr><td><code>SapBERT service</code> (ai/services/sapbert.py)</td><td>Core of embedding generation — encode concept names to 768-dim vectors</td></tr><tr><td><code>pgvector</code> + <code>search_nearest</code> pattern</td><td>Already deployed for concept embeddings, extended for patient embeddings</td></tr><tr><td><code>SourceContext</code></td><td>Dynamic schema isolation — one codebase works across all CDM sources</td></tr><tr><td><code>ConceptResolutionService</code></td><td>Ancestor concept rollup for condition/drug hierarchies</td></tr><tr><td>Horizon queue infrastructure</td><td>Background job processing with monitoring via dashboard</td></tr><tr><td><code>PatientProfileService</code></td><td>Integrated for contextual "Find Similar" entry point</td></tr><tr><td>Spatie RBAC</td><td>Permission-based tiered access (patient-similarity.view, profiles.view)</td></tr></tbody></table>
<p>We didn't build a similarity engine from scratch. We built a new <em>composition</em> of capabilities that Parthenon had been developing for months — embeddings, vector search, feature extraction, schema isolation, RBAC — and surfaced them through a new lens.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-api-surface">The API Surface<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-api-surface" class="hash-link" aria-label="Direct link to The API Surface" title="Direct link to The API Surface">​</a></h2>
<p>Seven endpoints behind <code>auth:sanctum</code> with Spatie RBAC:</p>
<table><thead><tr><th>Method</th><th>Endpoint</th><th>Permission</th><th>Purpose</th></tr></thead><tbody><tr><td>POST</td><td><code>/patient-similarity/search</code></td><td>patient-similarity.view</td><td>Single-patient similarity search</td></tr><tr><td>POST</td><td><code>/patient-similarity/search-from-cohort</code></td><td>patient-similarity.view + cohorts.view</td><td>Cohort-seeded similarity search</td></tr><tr><td>GET</td><td><code>/patient-similarity/compare</code></td><td>patient-similarity.view + profiles.view</td><td>Head-to-head patient comparison</td></tr><tr><td>POST</td><td><code>/patient-similarity/export-cohort</code></td><td>patient-similarity.view + cohorts.create</td><td>Export results as cohort definition</td></tr><tr><td>GET</td><td><code>/patient-similarity/dimensions</code></td><td>patient-similarity.view</td><td>List configurable dimensions</td></tr><tr><td>POST</td><td><code>/patient-similarity/compute</code></td><td>patient-similarity.compute</td><td>Trigger feature extraction</td></tr><tr><td>GET</td><td><code>/patient-similarity/status/{sourceId}</code></td><td>patient-similarity.view</td><td>Extraction status + staleness</td></tr></tbody></table>
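<p>Several endpoints require two permissions at once. The table's route-to-permission mapping can be written as data and checked with a subset test. This hypothetical Python sketch of the rule is not the Laravel middleware, and it encodes only what the table states. Denying unknown routes by default is our assumption.</p>

```python
from typing import Dict, FrozenSet, Set, Tuple

# (method, route) -> permissions that must ALL be held, per the endpoint table.
REQUIRED: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("POST", "/patient-similarity/search"): frozenset({"patient-similarity.view"}),
    ("POST", "/patient-similarity/search-from-cohort"): frozenset(
        {"patient-similarity.view", "cohorts.view"}),
    ("GET", "/patient-similarity/compare"): frozenset(
        {"patient-similarity.view", "profiles.view"}),
    ("POST", "/patient-similarity/export-cohort"): frozenset(
        {"patient-similarity.view", "cohorts.create"}),
    ("GET", "/patient-similarity/dimensions"): frozenset({"patient-similarity.view"}),
    ("POST", "/patient-similarity/compute"): frozenset({"patient-similarity.compute"}),
    ("GET", "/patient-similarity/status/{sourceId}"): frozenset({"patient-similarity.view"}),
}


def is_allowed(method: str, route: str, granted: Set[str]) -> bool:
    """True only if every permission required by the route is granted."""
    required = REQUIRED.get((method.upper(), route))
    if required is None:
        return False  # unknown routes are denied by default
    return required.issubset(granted)
```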
<p>The search endpoint accepts user-adjustable weights:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"person_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"source_id"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">47</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"mode"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 
62%)">"interpretable"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"weights"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"demographics"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1.0</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"conditions"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">3.0</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"measurements"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">2.0</span><span class="token 
punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"drugs"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1.0</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"procedures"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1.0</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"genomics"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">5.0</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token 
property" style="color:hsl(355, 65%, 65%)">"limit"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">25</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"min_score"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">0.3</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre></div></div>
<p>Boosting genomics to 5.0 makes the engine prioritize shared variant profiles. Zeroing out demographics removes age/gender/race from the scoring entirely. The weights are fully user-controlled, per-search.</p>
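<p>One plausible way to combine per-dimension scores is a weighted mean over only the dimensions a patient pair actually has, so a missing (null) dimension neither helps nor hurts. A Python sketch follows. This formula is our assumption, not something the post specifies, and so is the default weight of 1.0 for unlisted dimensions. The formula is consistent with the validation results that follow. With equal weights, the Pancreatic Cancer Corpus top match (1.0, 0.898, 0.60, 1.0) averages to about 0.87. The Acumenus top match (0.80, 0.82, 0.53, 0.58, 0.86) averages to about 0.72.</p>

```python
from typing import Dict, Optional


def overall_score(scores: Dict[str, Optional[float]],
                  weights: Dict[str, float]) -> Optional[float]:
    """Weighted mean over available dimensions (assumed combination rule).

    A dimension counts only if its score is not None and its weight is
    non-zero, so zeroing a weight removes that dimension entirely.
    Unlisted dimensions default to weight 1.0 (an assumption).
    """
    num = 0.0
    den = 0.0
    for dim, score in scores.items():
        w = weights.get(dim, 1.0)
        if score is None or w == 0:
            continue
        num += w * score
        den += w
    return num / den if den else None
```

<p>Under this rule, a large genomics weight such as 5.0 is skipped for a patient with no variant data, so it cannot drag that patient's score down.</p>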
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="validation-results">Validation Results<a href="http://localhost:8082/docs/blog/patient-similarity-engine#validation-results" class="hash-link" aria-label="Direct link to Validation Results" title="Direct link to Validation Results">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="pancreatic-cancer-corpus-361-patients-4-dimensions">Pancreatic Cancer Corpus (361 patients, 4 dimensions)<a href="http://localhost:8082/docs/blog/patient-similarity-engine#pancreatic-cancer-corpus-361-patients-4-dimensions" class="hash-link" aria-label="Direct link to Pancreatic Cancer Corpus (361 patients, 4 dimensions)" title="Direct link to Pancreatic Cancer Corpus (361 patients, 4 dimensions)">​</a></h3>
<table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Extraction time</td><td>3 seconds</td></tr><tr><td>Search latency (interpretable)</td><td>~200ms</td></tr><tr><td>Dimensions available</td><td>demographics, conditions, measurements, drugs</td></tr><tr><td>Top match for person_id=1</td><td>Person 341: 0.87 overall (demo 1.0, conditions 0.90, labs 0.60, drugs 1.0)</td></tr><tr><td>Missing dimensions</td><td>procedures (null), genomics (null) — correctly excluded from scoring</td></tr></tbody></table>
<p>Custom weight validation: boosting conditions to 3.0 correctly reranked Person 141 (95.9% condition overlap) above Person 341 (89.8% condition overlap but perfect demographics).</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="acumenus-cdm-1m-patients-5-dimensions">Acumenus CDM (1M patients, 5 dimensions)<a href="http://localhost:8082/docs/blog/patient-similarity-engine#acumenus-cdm-1m-patients-5-dimensions" class="hash-link" aria-label="Direct link to Acumenus CDM (1M patients, 5 dimensions)" title="Direct link to Acumenus CDM (1M patients, 5 dimensions)">​</a></h3>
<table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Extraction rate</td><td>~8,000 patients/minute</td></tr><tr><td>Dimensions available</td><td>demographics, conditions, measurements, drugs, procedures</td></tr><tr><td>Top match for person_id=1</td><td>Person 985: 0.72 overall (demo 0.80, conditions 0.82, labs 0.53, drugs 0.58, procedures 0.86)</td></tr><tr><td>Missing dimensions</td><td>genomics (null) — correctly excluded</td></tr></tbody></table>
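<p>The lower overall scores on Acumenus are expected. The Pancreatic Cancer Corpus is a specialized, disease-focused cohort of 361 patients who share much of their clinical picture. Acumenus, by contrast, spans a broad, diverse population and scores an additional dimension (procedures). Even its best available match therefore diverges more from the seed patient.</p>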
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-frontend-experience">The Frontend Experience<a href="http://localhost:8082/docs/blog/patient-similarity-engine#the-frontend-experience" class="hash-link" aria-label="Direct link to The Frontend Experience" title="Direct link to The Frontend Experience">​</a></h2>
<p>The Patient Similarity page follows Parthenon's dark clinical theme and offers:</p>
<ul>
<li><strong>Search form</strong> with source selector, patient ID input, and dimension weight sliders (0-5, step 0.5)</li>
<li><strong>Mode toggle</strong> between Interpretable and Embedding</li>
<li><strong>Results table</strong> with ranked patients, overall score, and per-dimension score bars (teal &gt;0.7, gold &gt;0.4, grey below)</li>
<li><strong>Compare link</strong> on each result row for head-to-head patient analysis</li>
<li><strong>Staleness indicator</strong> showing when feature vectors were last computed with a "Recompute" action</li>
<li><strong>Search mode toggle</strong> between "Single Patient" and "From Cohort" for both entry workflows</li>
<li><strong>Export as Cohort</strong> button for saving result sets as cohort definitions</li>
</ul>
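<p>The score-bar colors and the slider range reduce to two small rules, sketched here in Python. The frontend code itself is not shown in this post. Two details are assumptions: a score of exactly 0.7 or 0.4 falls into the lower band, and slider input snaps to the nearest 0.5 step. The text gives only the thresholds and the 0-5 range.</p>

```python
def bar_color(score: float) -> str:
    """Teal above 0.7, gold above 0.4, grey otherwise (strict thresholds assumed)."""
    if score > 0.7:
        return "teal"
    if score > 0.4:
        return "gold"
    return "grey"


def snap_weight(value: float, step: float = 0.5,
                lo: float = 0.0, hi: float = 5.0) -> float:
    """Clamp a weight to the slider range, then round to the nearest step."""
    clamped = max(lo, min(hi, value))
    return round(clamped / step) * step
```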
<p>Contextual entry points are embedded throughout the platform:</p>
<ul>
<li><strong>Patient Profile page:</strong> a "Find Similar Patients" button pre-fills the search with the current patient and source</li>
<li><strong>Cohort Definitions:</strong> a "Find Similar to Cohort" action opens the similarity page in cohort-seed mode</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/patient-similarity-engine#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>The engine ships today with Phases 1-4 complete. Phase 5 remains as a backlog of advanced capabilities:</p>
<ol>
<li><strong>Temporal similarity</strong> — consider <em>when</em> conditions, treatments, and events occurred relative to each other, not just <em>which</em> ones</li>
<li><strong>Imaging radiomics</strong> — tumor volumetrics and radiomic features from DICOM via Orthanc</li>
<li><strong>Clinical notes NLP</strong> — embed <code>note_nlp</code> content for text-based phenotype matching</li>
<li><strong>Learned patient embeddings</strong> — train a Patient2Vec or transformer model on Parthenon's CDM data for temporal-aware embeddings</li>
<li><strong>Weighted Personalized PageRank</strong> — implement the Oracle PLM graph algorithm as an alternative to vector-based scoring</li>
<li><strong>Cross-source federated similarity</strong> — find patients in the Pancreatic Cancer Corpus who are similar to an Acumenus patient, blending data across CDM sources without co-locating patient records</li>
<li><strong>Tumor Board integration</strong> — the Molecular Tumor Board's existing genomic matching will be unified with the similarity engine, so clinicians see genomic <em>and</em> clinical similarity in one view</li>
</ol>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="http://localhost:8082/docs/blog/patient-similarity-engine#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion">​</a></h2>
<p>The Patient Similarity Engine is the kind of feature that seems obvious in retrospect — of course a research platform should let you find similar patients across all available clinical dimensions. But implementing it correctly required solving a specific, non-trivial composition of challenges: multi-modal feature extraction from OMOP CDM, missing-dimension tolerance, dual algorithmic approaches with shared interpretability, bidirectional cohort integration, HIPAA-conscious tiered access, and source-agnostic architecture that works from 361 patients to 1 million.</p>
<p>What makes this a milestone for Parthenon isn't the similarity scoring itself; Jaccard and cosine distance have been around for decades. It's the <em>integration</em>. Patient similarity is woven into the cohort builder, the patient profile, and the permission system, with the Molecular Tumor Board next in line. It's not a standalone tool bolted onto the side. It's a new lens through which every other capability in the platform becomes more powerful.</p>
<p>Thirteen commits. Four phases. Six dimensions. One button: <strong>Find Similar Patients.</strong></p>]]></content:encoded>
            <category>patient-similarity</category>
            <category>omop</category>
            <category>pgvector</category>
            <category>sapbert</category>
            <category>embeddings</category>
            <category>cohort-discovery</category>
            <category>architecture</category>
            <category>ai</category>
            <category>precision-medicine</category>
        </item>
        <item>
            <title><![CDATA[Poseidon and Vulcan: The Gods of Continuous Data Ingestion]]></title>
            <link>http://localhost:8082/docs/blog/poseidon-and-vulcan</link>
            <guid>http://localhost:8082/docs/blog/poseidon-and-vulcan</guid>
            <pubDate>Sat, 28 Mar 2026 18:00:00 GMT</pubDate>
            <description><![CDATA[Two new engines join the Parthenon pantheon — Vulcan commands the FHIR, while Poseidon rules the tides of transactional data. Together they deliver continuous, incremental, dependency-aware OMOP CDM ingestion.]]></description>
            <content:encoded><![CDATA[<div style="border-radius:12px;overflow:hidden;margin-bottom:2rem"><img src="http://localhost:8082/docs/img/poseidon-vulcan.png" alt="Poseidon and Vulcan — the gods of continuous data ingestion" style="width:100%;display:block"></div>
<p>Healthcare data does not arrive in neat packages. It streams — continuously, chaotically, from dozens of transactional systems that were never designed to talk to each other. EHR encounters appear as HL7 ADT messages. Lab results materialize through OBX segments hours after the draw. Radiology reports surface from PACS archives with inconsistent coding. Claims trickle in from clearinghouses days or weeks after the visit. Genomic panels arrive as VCF files from external laboratories with their own nomenclatures and timelines.</p>
<p>Transforming this unruly sea of clinical data into a coherent, research-ready OMOP Common Data Model is the central engineering challenge of any outcomes research platform. And until now, Parthenon handled it the same way most platforms do: as a series of one-time bulk loads. Upload a file. Map the concepts. Write the CDM. Move on.</p>
<p>That era is over.</p>
<p>Today we introduce two new engines to the Parthenon pantheon — <strong>Vulcan</strong> and <strong>Poseidon</strong> — purpose-built for the reality of continuous healthcare data integration.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="vulcan-god-of-the-fhir">Vulcan: God of the FHIR<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#vulcan-god-of-the-fhir" class="hash-link" aria-label="Direct link to Vulcan: God of the FHIR" title="Direct link to Vulcan: God of the FHIR">​</a></h2>
<p>In Roman mythology, Vulcan was the god of fire and the forge — the divine craftsman who shaped raw materials into instruments of power. His forge burned at the heart of Mount Etna, transforming crude ore into the weapons and tools that the other gods depended on.</p>
<p>In Parthenon, Vulcan occupies an analogous role. He is the <strong>FHIR integration engine</strong> — the system that connects directly to EHR servers, extracts clinical data through standardized FHIR R4 interfaces, and forges it into OMOP CDM records ready for analysis.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-vulcan-does">What Vulcan Does<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#what-vulcan-does" class="hash-link" aria-label="Direct link to What Vulcan Does" title="Direct link to What Vulcan Does">​</a></h3>
<p>Vulcan operates through a connection-backed bulk sync architecture. Data stewards attach a registered FHIR server connection to an ingestion project, then trigger incremental or full exports. The pipeline handles the rest:</p>
<p><strong>FHIR Bulk Data Access ($export)</strong> — Vulcan initiates SMART Backend Services or anonymous bulk export requests against FHIR R4 servers. It manages the asynchronous polling lifecycle — submitting the export, monitoring the status endpoint, downloading NDJSON files when ready, and handling the inevitable timeouts and retries that bulk exports entail.</p>
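<p>The asynchronous lifecycle follows the FHIR Bulk Data Access pattern. The kick-off request returns <code>202 Accepted</code> with a status URL in the <code>Content-Location</code> header. While the export runs, polling that URL returns <code>202</code>, optionally with a <code>Retry-After</code> header. When the export completes, the status URL returns <code>200</code> with a JSON manifest listing the NDJSON output files. The Python sketch below shows the polling loop. It injects a <code>get</code> function so it runs without a server. The function names, the default wait, the overall timeout, and treating <code>429</code> as "wait and retry" are illustrative assumptions, not Vulcan's code.</p>

```python
import json
from typing import Callable, Dict, List, Tuple

# get(url) returns (status_code, headers, body_text); injected so the sketch
# runs without a live server. Header names are assumed already normalized.
Getter = Callable[[str], Tuple[int, Dict[str, str], str]]


class ExportFailed(Exception):
    """Raised when the export errors out or exceeds the wait budget."""


def poll_export(status_url: str, get: Getter, sleep: Callable[[float], None],
                default_wait: float = 10.0,
                max_wait_total: float = 3600.0) -> List[Dict]:
    """Poll a bulk-export status URL until the completion manifest arrives.

    Returns the manifest's `output` entries, each with a `type` and a `url`
    pointing at one NDJSON file to download.
    """
    waited = 0.0
    while True:
        status, headers, body = get(status_url)
        if status == 200:
            return json.loads(body).get("output", [])
        if status in (202, 429):
            # In progress (202) or polled too often (429): wait and retry.
            # Retry-After is assumed to be in delta-seconds form.
            delay = float(headers.get("Retry-After", default_wait))
            if waited + delay > max_wait_total:
                raise ExportFailed(f"export not ready after {waited:.0f}s")
            sleep(delay)
            waited += delay
            continue
        raise ExportFailed(f"status endpoint returned HTTP {status}")
```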
<p><strong>Connection Management</strong> — Each FHIR connection is a named configuration: server URL, authentication mode (SMART Backend Services with JWKS, client credentials, or anonymous for public test servers), target resource types, group identifiers for filtered exports, and incremental sync tracking. Connections are registered once by administrators and reused across projects.</p>
<p><strong>Incremental Sync</strong> — After the initial full export, subsequent syncs request only resources modified since the last successful run. Vulcan tracks the <code>_since</code> parameter per connection, ensuring that each sync captures new admissions, updated lab results, and corrected diagnoses without re-processing the entire dataset.</p>
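<p>The sketch below shows how a connection's settings can produce the kick-off request and how the <code>_since</code> watermark advances. The <code>_type</code> and <code>_since</code> parameters, and the patient-level (<code>Patient/$export</code>) and group-level (<code>Group/[id]/$export</code>) endpoints, come from the Bulk Data specification. A natural value for the next <code>_since</code> is the manifest's <code>transactionTime</code>, the server time at which the previous export ran, and this sketch assumes that choice. The <code>FhirConnection</code> class and its fields are hypothetical and only loosely mirror the configuration described above. Authentication is omitted.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlencode


@dataclass
class FhirConnection:
    """Hypothetical connection record; field names are illustrative."""
    base_url: str
    resource_types: List[str] = field(default_factory=list)
    group_id: Optional[str] = None
    last_since: Optional[str] = None  # FHIR instant from the last successful export

    def kickoff_url(self, full: bool = False) -> str:
        """Group-level export if a group is set, otherwise patient-level (assumed)."""
        base = self.base_url.rstrip("/")
        path = f"Group/{self.group_id}/$export" if self.group_id else "Patient/$export"
        params = {}
        if self.resource_types:
            params["_type"] = ",".join(self.resource_types)
        if self.last_since and not full:
            params["_since"] = self.last_since  # incremental: only changed resources
        query = urlencode(params, safe=",:")
        return f"{base}/{path}" + (f"?{query}" if query else "")

    def record_success(self, manifest: dict) -> None:
        # Advance the watermark only after a successful run, so a failed
        # sync is simply retried from the same point.
        self.last_since = manifest["transactionTime"]
```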
<p><strong>Workspace Operations Console</strong> — The Vulcan workspace provides real-time visibility into sync operations: connection status, last sync time, record counts, mapping coverage percentages, and a full history of sync runs with extraction and mapping metrics. Sync controls are immediate — one button for incremental refresh, another for full re-export.</p>
<p><strong>NDJSON Bundle Sandbox</strong> — For ad-hoc validation, Vulcan includes a sandbox mode where individual FHIR bundles or NDJSON files can be uploaded directly for concept mapping spot-checks — useful for verifying that a new server's coding conventions map cleanly before committing to a full sync.</p>
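<p>A sandbox spot-check of mapping coverage might look like the following Python sketch. It parses NDJSON line by line and reads each resource's <code>code.coding</code> entries. It then counts how many resources have at least one coding that resolves to an OMOP concept. The <code>CONCEPT_MAP</code> lookup, keyed by (coding system, code), is a hypothetical stand-in for a real vocabulary lookup. Its concept IDs are placeholders rather than real OMOP concept IDs.</p>

```python
import json
from typing import Dict, Iterable, Tuple

# Hypothetical lookup: (FHIR coding system, code) -> OMOP concept_id.
# The concept IDs below are placeholders, not real vocabulary entries.
CONCEPT_MAP: Dict[Tuple[str, str], int] = {
    ("http://snomed.info/sct", "44054006"): 1001,
    ("http://loinc.org", "2345-7"): 1002,
}


def mapping_coverage(ndjson_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (mapped, total, percent) for resources that carry code.coding."""
    mapped = total = 0
    for line in ndjson_lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        resource = json.loads(line)
        codings = resource.get("code", {}).get("coding", [])
        if not codings:
            continue  # e.g. Patient resources have no code element
        total += 1
        if any((c.get("system"), c.get("code")) in CONCEPT_MAP for c in codings):
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent
```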
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-architecture-of-fhir-at-scale">The Architecture of FHIR at Scale<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#the-architecture-of-fhir-at-scale" class="hash-link" aria-label="Direct link to The Architecture of FHIR at Scale" title="Direct link to The Architecture of FHIR at Scale">​</a></h3>
<p>Vulcan's design reflects the operational reality of FHIR bulk data access. Public test servers like HAPI R4 and Firely are useful for development but unreliable for sustained bulk exports. Production Epic, Cerner, and MEDITECH deployments behave differently: they enforce rate limits, require SMART Backend Services authentication with rotating JWKS keys, and produce NDJSON files that can run to multiple gigabytes for large patient populations.</p>
<p>Vulcan handles this through a queue-driven architecture. Each sync run dispatches a <code>RunFhirSyncJob</code> onto a Redis-backed Laravel queue supervised by Horizon. The job manages the full export lifecycle asynchronously: it polls status endpoints, downloads resources, maps FHIR codes to OMOP concepts, and writes CDM records. Meanwhile the frontend auto-refreshes every 10 seconds to reflect progress. If the export fails or times out, the run is marked failed with a clear error message and the connection remains ready for retry.</p>
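<p>The status-polling step of that lifecycle follows the FHIR Bulk Data Access pattern. The server answers 202 Accepted while the export is running and 200 OK with a JSON manifest when it is done. Below is a Python sketch of such a loop. The function and exception names are hypothetical, and <code>fetch</code> stands in for a real HTTP client. Vulcan's actual job is a Laravel (PHP) class, so read this only as an outline of the control flow.</p>

```python
import time
from typing import Callable

# Hypothetical transport: fetch(url) returns (status, headers, body), where body
# is the parsed JSON response or None.
Fetch = Callable[[str], tuple]


class ExportFailed(Exception):
    """Raised when the bulk export errors out or never completes."""


def poll_export(status_url: str, fetch: Fetch, *, max_polls: int = 360,
                default_wait: float = 10.0,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll a FHIR Bulk Data status URL until the export manifest is ready."""
    for _ in range(max_polls):
        status, headers, body = fetch(status_url)
        if status == 200:
            # Complete: the body is the manifest (transactionTime, output files, ...).
            return body
        if status == 202:
            # Still in progress. Assumes Retry-After, when present, is in seconds
            # (it may also be an HTTP-date) and that header keys are exact-case.
            retry_after = headers.get("Retry-After")
            sleep(float(retry_after) if retry_after else default_wait)
            continue
        # Any other status is treated as a failure; servers typically return an
        # OperationOutcome describing the error.
        raise ExportFailed(f"export failed with HTTP {status}: {body}")
    raise ExportFailed(f"export still running after {max_polls} polls")


# Usage with a scripted fake server: two "in progress" replies, then the manifest.
replies = iter([
    (202, {"Retry-After": "5"}, None),
    (202, {}, None),
    (200, {}, {"transactionTime": "2026-03-01T00:00:00Z",
               "output": [{"type": "Patient", "url": "https://example.org/patient.ndjson"}]}),
])
waits = []
manifest = poll_export("https://example.org/status/1", lambda url: next(replies),
                       sleep=waits.append)
```

<p>Injecting <code>fetch</code> and <code>sleep</code> keeps the loop testable without a live server. The bounded <code>max_polls</code> turns a hung export into a visible failure instead of a job that never ends.</p>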
<p>The key insight: FHIR integration in healthcare is inherently asynchronous and failure-prone. Vulcan's architecture embraces this rather than fighting it. Every operation is resumable, every failure is visible, and every run produces an auditable record of what was extracted, what was mapped, and what was written.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="poseidon-ruler-of-the-data-seas">Poseidon: Ruler of the Data Seas<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#poseidon-ruler-of-the-data-seas" class="hash-link" aria-label="Direct link to Poseidon: Ruler of the Data Seas" title="Direct link to Poseidon: Ruler of the Data Seas">​</a></h2>
<p>Where Vulcan commands the fire of FHIR, Poseidon rules the seas — the vast, churning ocean of transactional data that flows from every clinical system in the enterprise.</p>
<p>In mythology, Poseidon wielded his trident to control the waves, calm storms, and shake the earth itself. In Parthenon, Poseidon is the <strong>CDM refresh orchestration engine</strong> — powered by <strong>dbt</strong> (Data Build Tool) for SQL-based transformations and <strong>Dagster</strong> for dependency-aware scheduling and observability. He takes the raw data that Aqueduct stages and transforms it into a living, breathing OMOP CDM that stays current as the underlying sources change.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="why-poseidon-exists">Why Poseidon Exists<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#why-poseidon-exists" class="hash-link" aria-label="Direct link to Why Poseidon Exists" title="Direct link to Why Poseidon Exists">​</a></h3>
<p>Aqueduct — Parthenon's existing ingestion pipeline — handles the initial ETL brilliantly: file upload, profiling, AI-assisted concept mapping, schema mapping, and CDM writing. But Aqueduct operates on a batch paradigm. You upload data, map it, write it, and the job is done.</p>
<p>Healthcare data sources are not batch systems. EHR databases accumulate new encounters hourly. Laboratory information systems (LIMS) process lab results continuously. PACS archives ingest imaging studies around the clock. Claims feeds arrive on weekly or monthly cycles. Each of these sources produces data that must flow into the CDM incrementally: without duplicating existing records, without violating foreign key constraints, and without requiring a full rebuild every time.</p>
<p>This is the problem Poseidon was designed to solve.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-dbt-transformation-layer">The dbt Transformation Layer<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#the-dbt-transformation-layer" class="hash-link" aria-label="Direct link to The dbt Transformation Layer" title="Direct link to The dbt Transformation Layer">​</a></h3>
<p>At Poseidon's core is a <strong>dbt project</strong> — a collection of SQL-based models that define how raw staged data transforms into OMOP CDM tables. Each CDM table (person, visit_occurrence, condition_occurrence, drug_exposure, measurement, observation, procedure_occurrence, and more) is a dbt model with:</p>
<p><strong>Incremental Materialization</strong> — Poseidon's models use dbt's <code>incremental</code> materialization strategy with <code>merge</code> semantics. On each run, only new or modified records are processed. The <code>WHERE modified_date &gt; last_run_date</code> filter ensures that a nightly refresh of a million-patient CDM processes only the day's new encounters — not the entire history.</p>
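<p>The sketch below makes the merge semantics concrete using Python's built-in <code>sqlite3</code> module. SQLite's <code>INSERT ... ON CONFLICT DO UPDATE</code> upsert stands in for dbt's <code>merge</code> strategy on PostgreSQL. The table layouts are simplified and hypothetical. In a real dbt model, the date filter would sit inside an <code>is_incremental()</code> block.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_condition (source_id INTEGER PRIMARY KEY, person_id INTEGER,
                            condition_concept_id INTEGER, modified_date TEXT);
CREATE TABLE condition_occurrence (condition_occurrence_id INTEGER PRIMARY KEY,
                                   person_id INTEGER, condition_concept_id INTEGER,
                                   modified_date TEXT);
""")


def incremental_merge(conn: sqlite3.Connection, last_run_date: str) -> None:
    """Upsert only staged rows modified after last_run_date (ISO date strings)."""
    conn.execute("""
        INSERT INTO condition_occurrence
            (condition_occurrence_id, person_id, condition_concept_id, modified_date)
        SELECT source_id, person_id, condition_concept_id, modified_date
        FROM stg_condition
        WHERE modified_date > ?          -- the incremental filter
        ON CONFLICT(condition_occurrence_id) DO UPDATE SET
            person_id = excluded.person_id,
            condition_concept_id = excluded.condition_concept_id,
            modified_date = excluded.modified_date
    """, (last_run_date,))
    conn.commit()


conn.executemany("INSERT INTO stg_condition VALUES (?, ?, ?, ?)", [
    (1, 100, 443392, "2026-03-01"),
    (2, 101, 201820, "2026-03-01"),
])
incremental_merge(conn, "1900-01-01")  # initial load: every row qualifies

# Next day: one corrected diagnosis and one new record arrive in staging.
conn.execute("UPDATE stg_condition SET condition_concept_id = 4180793, "
             "modified_date = '2026-03-02' WHERE source_id = 1")
conn.execute("INSERT INTO stg_condition VALUES (3, 102, 201820, '2026-03-02')")
incremental_merge(conn, "2026-03-01")  # only the two changed rows are processed
```

<p>Rows untouched since the cutoff are never read from staging. That is what keeps a nightly run proportional to the day's changes rather than to the size of the CDM.</p>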
<p><strong>Dependency-Aware Execution</strong> — dbt understands the directed acyclic graph (DAG) of table dependencies. <code>person</code> must load before <code>visit_occurrence</code>. Visits must exist before <code>condition_occurrence</code> can reference them via foreign keys. <code>observation_period</code> depends on the union of all clinical event tables. Poseidon respects this dependency graph automatically — no manual ordering, no failed runs from FK violations.</p>
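<p>dbt derives this ordering from the <code>ref()</code> calls inside each model. Python's standard-library <code>graphlib</code> can illustrate the same idea. The sketch below encodes a simplified subset of CDM dependencies, which is an assumption and not Poseidon's actual graph. It then groups tables into batches whose members could run in parallel.</p>

```python
from graphlib import TopologicalSorter

# Simplified, hypothetical subset of CDM dependencies: table -> tables it needs first.
CLINICAL_EVENTS = ["condition_occurrence", "drug_exposure", "measurement",
                   "observation", "procedure_occurrence"]
DEPS = {
    "person": set(),
    "visit_occurrence": {"person"},
    # Clinical events reference both the patient and the visit via foreign keys.
    **{table: {"person", "visit_occurrence"} for table in CLINICAL_EVENTS},
    # observation_period spans every clinical event, so it waits for all of them.
    "observation_period": {"person", *CLINICAL_EVENTS},
}


def build_batches(deps):
    """Yield groups of tables whose dependencies are all satisfied."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        sorter.done(*ready)


order = list(TopologicalSorter(DEPS).static_order())
batches = list(build_batches(DEPS))
```

<p>Here <code>batches</code> comes out as person, then visit_occurrence, then the five clinical event tables together, then observation_period. Each batch only starts once everything it references exists.</p>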
<p><strong>Schema Tests</strong> — Every CDM model carries built-in data quality assertions: not-null constraints on required fields, uniqueness checks on primary keys, foreign key relationships validated against the vocabulary and person tables, accepted-value checks on concept IDs, and temporal plausibility tests (no events before birth, no events after death). These tests run as part of every refresh, catching data quality issues before they propagate into analyses.</p>
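<p>In dbt, a data test is a query that selects offending rows, and the test passes when that query returns none. The sketch below applies that convention with <code>sqlite3</code> to four of the checks named above. The check names and simplified table shapes are illustrative, not Poseidon's test suite.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER, birth_datetime TEXT);
CREATE TABLE death (person_id INTEGER, death_date TEXT);
CREATE TABLE condition_occurrence (condition_occurrence_id INTEGER, person_id INTEGER,
                                   condition_concept_id INTEGER,
                                   condition_start_date TEXT);
""")

# Each check selects the offending rows; as in dbt, zero rows means the test passes.
CHECKS = {
    "condition_concept_id_not_null":
        "SELECT * FROM condition_occurrence WHERE condition_concept_id IS NULL",
    "condition_occurrence_id_unique":
        """SELECT condition_occurrence_id FROM condition_occurrence
           GROUP BY condition_occurrence_id HAVING COUNT(*) > 1""",
    "no_event_before_birth":
        """SELECT c.* FROM condition_occurrence c
           JOIN person p ON p.person_id = c.person_id
           WHERE c.condition_start_date < date(p.birth_datetime)""",
    "no_event_after_death":
        """SELECT c.* FROM condition_occurrence c
           JOIN death d ON d.person_id = c.person_id
           WHERE c.condition_start_date > d.death_date""",
}


def run_checks(conn: sqlite3.Connection) -> dict:
    """Return the number of failing rows for every check."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in CHECKS.items()}


conn.execute("INSERT INTO person VALUES (1, '1960-05-01 00:00:00')")
conn.execute("INSERT INTO death VALUES (1, '2025-01-10')")
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", [
    (10, 1, 443392, "2020-02-01"),  # plausible record
    (11, 1, None, "1950-01-01"),    # missing concept, dated before birth
    (11, 1, 201820, "2025-06-01"),  # duplicate id, dated after death
])
results = run_checks(conn)
```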
<p><strong>Vocabulary-Aware Transformations</strong> — Poseidon's custom macros (<code>concept_lookup</code>, <code>standard_concept</code>) perform source-to-standard concept mapping within dbt SQL. Source codes from EHR systems are resolved to standard OMOP concepts through the shared vocabulary schema — the same vocabulary that powers Parthenon's Concept Explorer and Hecate semantic search.</p>
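<p>The internals of Poseidon's macros aren't shown here. In OMOP, however, source-to-standard mapping conventionally follows the <code>'Maps to'</code> relationship in <code>concept_relationship</code>. This <code>sqlite3</code> sketch runs that lookup against a toy vocabulary in which all IDs and codes are made up. A production query would also filter on validity (for example <code>invalid_reason IS NULL</code>) and handle codes that map to more than one standard concept.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_code TEXT,
                      vocabulary_id TEXT, standard_concept TEXT);
CREATE TABLE concept_relationship (concept_id_1 INTEGER, concept_id_2 INTEGER,
                                   relationship_id TEXT);
""")
# Toy vocabulary: the IDs, codes, and vocabulary names are illustrative only.
conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
    (900001, "X1.0", "TOY_SOURCE", None),    # non-standard source code
    (900002, "S-1", "TOY_STANDARD", "S"),    # its standard target
])
conn.execute("INSERT INTO concept_relationship VALUES (900001, 900002, 'Maps to')")


def standard_concept_id(conn: sqlite3.Connection, vocabulary_id: str, code: str) -> int:
    """Resolve a source code to a standard concept via 'Maps to'; 0 if unmapped."""
    row = conn.execute("""
        SELECT tgt.concept_id
        FROM concept src
        JOIN concept_relationship cr
          ON cr.concept_id_1 = src.concept_id AND cr.relationship_id = 'Maps to'
        JOIN concept tgt
          ON tgt.concept_id = cr.concept_id_2 AND tgt.standard_concept = 'S'
        WHERE src.vocabulary_id = ? AND src.concept_code = ?
    """, (vocabulary_id, code)).fetchone()
    # OMOP convention: concept_id 0 means "No matching concept".
    return row[0] if row else 0
```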
<p><strong>Schema Routing</strong> — A custom <code>generate_schema_name</code> macro routes each model to the correct PostgreSQL schema per source. The same dbt models can produce CDM tables in <code>omop</code>, <code>synpuf</code>, <code>irsf</code>, <code>pancreas</code>, or any other source schema — controlled by a single variable at run time.</p>
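<p>dbt's built-in <code>generate_schema_name</code> returns the target schema, suffixed with <code>_&lt;custom_schema&gt;</code> when a model sets one. A routing override can short-circuit that with a run-time variable. The Python function below mirrors that logic for illustration only. The real macro is written in Jinja, and the variable name <code>cdm_schema</code> is hypothetical.</p>

```python
from typing import Optional


def route_schema(custom_schema_name: Optional[str], target_schema: str,
                 run_vars: dict) -> str:
    """Python mirror of a schema-routing macro; 'cdm_schema' is a hypothetical var."""
    # A run-time variable, if supplied, wins: every CDM model lands in that schema.
    if run_vars.get("cdm_schema"):
        return run_vars["cdm_schema"].strip()
    # Otherwise fall back to dbt's default behaviour: the target schema, suffixed
    # with the model's custom schema when one is configured.
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"
```

<p>With a macro like this, a run such as <code>dbt build --vars '{cdm_schema: synpuf}'</code> would write every model into <code>synpuf</code>.</p>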
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-dagster-orchestration-layer">The Dagster Orchestration Layer<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#the-dagster-orchestration-layer" class="hash-link" aria-label="Direct link to The Dagster Orchestration Layer" title="Direct link to The Dagster Orchestration Layer">​</a></h3>
<p>dbt handles the what — Dagster handles the when, how, and what-if:</p>
<p><strong>Software-Defined Assets</strong> — Every CDM table is a Dagster asset backed by a dbt model. Dagster tracks the materialization state of each asset — when it was last refreshed, whether the refresh succeeded, and what downstream assets depend on it. The asset graph provides a complete lineage view from staging tables through intermediate transformations to final CDM tables.</p>
<p><strong>Per-Source Scheduling</strong> — Different data sources have different cadences. EHR feeds might refresh nightly at 2 AM. LIMS data might arrive hourly. Claims feeds might land weekly. Poseidon supports per-source cron schedules, each with its own cadence, dbt selector (e.g., <code>tag:ehr</code> or <code>source:staging_acumenus</code>), and activation state.</p>
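<p>A per-source schedule boils down to a cron expression, a dbt selector, and an on/off flag. The dataclass below is a hypothetical Python model of that configuration, not Poseidon's schema. It turns each active schedule into the dbt CLI call a run would make. <code>--select</code>, <code>--vars</code>, and <code>--full-refresh</code> are standard <code>dbt build</code> flags, while the <code>cdm_schema</code> variable is an assumption.</p>

```python
import json
from dataclasses import dataclass


@dataclass
class SourceSchedule:
    source: str          # source name, reused here as the target schema
    cron: str            # standard 5-field cron expression
    dbt_select: str      # dbt selector, e.g. "tag:ehr"
    active: bool = True

    def __post_init__(self):
        # Cheap sanity check only; this does not validate field values.
        if len(self.cron.split()) != 5:
            raise ValueError(f"expected 5 cron fields, got {self.cron!r}")

    def dbt_command(self, full_refresh: bool = False) -> list:
        """Build the dbt CLI invocation a scheduled run would execute."""
        cmd = ["dbt", "build", "--select", self.dbt_select,
               "--vars", json.dumps({"cdm_schema": self.source})]  # hypothetical var
        if full_refresh:
            cmd.append("--full-refresh")
        return cmd


schedules = [
    SourceSchedule("ehr_main", "0 2 * * *", "tag:ehr"),       # nightly at 2 AM
    SourceSchedule("lims", "0 * * * *", "tag:lims"),           # hourly
    SourceSchedule("claims", "0 3 * * 0", "source:staging_acumenus", active=False),
]
due = [s.dbt_command() for s in schedules if s.active]
```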
<p><strong>Event-Driven Sensors</strong> — Beyond cron schedules, Poseidon can watch for events: new rows in a staging table, a FHIR webhook notification from Vulcan, or a file drop in a monitored directory. When the sensor fires, Poseidon automatically triggers the appropriate refresh pipeline.</p>
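<p>Sensors typically keep a cursor between evaluations so they react only to data they have not seen. Below is a minimal plain-Python sketch of the "new rows in a staging table" case. It assumes a hypothetical <code>stg_encounter</code> table with a monotonically increasing <code>load_id</code>. A real Dagster sensor would wrap similar logic and persist the cursor through its own context.</p>

```python
import sqlite3
from typing import Optional, Tuple


def check_new_staging_rows(conn: sqlite3.Connection,
                           cursor: Optional[int]) -> Tuple[bool, int]:
    """Return (should_trigger, new_cursor) for the hypothetical stg_encounter table."""
    (max_id,) = conn.execute(
        "SELECT COALESCE(MAX(load_id), 0) FROM stg_encounter").fetchone()
    last_seen = cursor or 0
    # Fire only when rows beyond the stored cursor have landed.
    return max_id > last_seen, max(max_id, last_seen)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_encounter (load_id INTEGER PRIMARY KEY, payload TEXT)")

fired_empty, cur = check_new_staging_rows(conn, None)   # empty table
conn.executemany("INSERT INTO stg_encounter (payload) VALUES (?)", [("a",), ("b",)])
fired_new, cur = check_new_staging_rows(conn, cur)      # two new rows landed
fired_again, cur = check_new_staging_rows(conn, cur)    # nothing new since
```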
<p><strong>Manual Triggers</strong> — Data stewards can trigger incremental or full refreshes on demand through the Poseidon operations console — useful for ad-hoc loads, post-mapping corrections, or testing new source integrations.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-operations-console">The Operations Console<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#the-operations-console" class="hash-link" aria-label="Direct link to The Operations Console" title="Direct link to The Operations Console">​</a></h3>
<p>Poseidon's frontend is a single-page operations console designed for the daily reality of data stewardship — not a DevOps dashboard, but a clinical data control tower:</p>
<p><strong>Overview Metrics</strong> — Active schedules, runs in progress, success/failure counts at a glance.</p>
<p><strong>Source Schedules</strong> — Each configured source shows its schedule type (cron, sensor, or manual), cron expression, last run time, next scheduled run, and run count. Activate, pause, or trigger runs directly from the schedule card.</p>
<p><strong>Recent Runs</strong> — A live table of recent pipeline executions with source, run type, status, trigger method, and duration. Click any run to expand inline details: rows inserted, rows updated, models materialized, tests passed and failed, and full error messages for failed runs.</p>
<p><strong>CDM Freshness</strong> — A grid view of every CDM asset with its last materialization timestamp. Stale assets (not refreshed in 24+ hours) are highlighted in gold — immediately visible, immediately actionable.</p>
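<p>The staleness rule is simple enough to state as code. This sketch assumes the console has a mapping from asset name to last materialization time, with <code>None</code> for assets never materialized. The function name is hypothetical.</p>

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

STALE_AFTER = timedelta(hours=24)


def stale_assets(last_materialized: Dict[str, Optional[datetime]],
                 now: datetime) -> List[str]:
    """Names of assets never materialized or not refreshed in 24+ hours."""
    return sorted(
        name for name, ts in last_materialized.items()
        if ts is None or now - ts >= STALE_AFTER
    )


now = datetime(2026, 3, 28, 12, 0, tzinfo=timezone.utc)
flagged = stale_assets({
    "person": now - timedelta(hours=3),
    "measurement": now - timedelta(hours=30),
    "drug_exposure": now - timedelta(hours=24),
    "observation_period": None,
}, now)
```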
<p><strong>Asset Lineage</strong> — A tiered dependency view showing the flow from staging through intermediate transformations to CDM tables and quality models. Not a decorative graph — a diagnostic tool for understanding impact when a source fails or a model changes.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-they-work-together">How They Work Together<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#how-they-work-together" class="hash-link" aria-label="Direct link to How They Work Together" title="Direct link to How They Work Together">​</a></h2>
<p>Vulcan and Poseidon are not competing systems. They occupy different positions in the data lifecycle and are designed to complement each other:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">EHR / FHIR Server</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">      |</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">      v</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  [ Vulcan ]  ------&gt;  FHIR Bulk Export  ------&gt;  Staged Data</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                                                      |</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">Flat Files / DB                                       |</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">      |                                               v</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">      v                                        [ Poseidon ]</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  [ Aqueduct ]  ------&gt;  Profiling + Mapping         |</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">      |                                               v</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">      v                                     Incremental CDM</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  Staged Data  --------------------------------&gt;  Refresh</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                                                      |</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                                                      v</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                                              OMOP CDM Tables</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                                                      |</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                                                      v</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">                                            Achilles / DQD / Analyses</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg
viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p><strong>Vulcan</strong> handles the FHIR-specific integration layer: connecting to servers, managing authentication, handling the bulk export lifecycle, and staging FHIR resources as relational data. Once staged, the data enters the same pipeline as any other source.</p>
<p><strong>Poseidon</strong> handles the transformation layer: taking staged data from any source (Vulcan, Aqueduct file uploads, direct database connections) and maintaining the CDM through incremental, dependency-aware, vocabulary-mapped, quality-tested refreshes.</p>
<p><strong>Aqueduct</strong> remains the one-time bulk ETL tool: file upload, profiling, AI-assisted concept mapping, schema mapping, and initial CDM writing. It is the craftsman's workshop where new data sources are onboarded. Once the mappings are confirmed, Poseidon takes over for ongoing maintenance.</p>
<p>Together, they transform Parthenon from a platform that receives data to one that continuously integrates it — a living analytical environment where the CDM reflects the current state of the clinical enterprise, not a snapshot from the last quarterly load.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-naming-convention">The Naming Convention<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#the-naming-convention" class="hash-link" aria-label="Direct link to The Naming Convention" title="Direct link to The Naming Convention">​</a></h2>
<p>Parthenon's feature naming follows the architecture of classical mythology:</p>
<table><thead><tr><th>Feature</th><th>Namesake</th><th>Domain</th></tr></thead><tbody><tr><td><strong>Parthenon</strong></td><td>The temple of Athena</td><td>The platform itself — wisdom through evidence</td></tr><tr><td><strong>Aqueduct</strong></td><td>Roman water engineering</td><td>Bulk data ingestion and ETL pipelines</td></tr><tr><td><strong>Vulcan</strong></td><td>God of fire and the forge</td><td>FHIR integration — forging interoperability standards into CDM</td></tr><tr><td><strong>Poseidon</strong></td><td>God of the sea</td><td>Continuous data orchestration — commanding the waves of transactional data</td></tr><tr><td><strong>Achilles</strong></td><td>Greatest Greek warrior of the Trojan War</td><td>Data characterization — relentless, thorough, exhaustive</td></tr><tr><td><strong>Hecate</strong></td><td>Goddess of crossroads</td><td>Semantic vocabulary search — navigating the intersections of meaning</td></tr><tr><td><strong>Abby</strong></td><td>(Athena's owl)</td><td>AI assistant — intelligence through accumulated knowledge</td></tr><tr><td><strong>Ares</strong></td><td>God of war</td><td>Data quality dashboard — aggressive defense of data integrity</td></tr></tbody></table>
<p>Each name is chosen not just for flavor but for functional resonance. Vulcan forges raw FHIR resources into structured CDM records. Poseidon governs the tidal rhythms of data flow. The names tell you what each system does if you know the mythology — and they make the platform memorable for those who don't.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-this-means-for-research">What This Means for Research<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#what-this-means-for-research" class="hash-link" aria-label="Direct link to What This Means for Research" title="Direct link to What This Means for Research">​</a></h2>
<p>Continuous ingestion changes research practice in five concrete ways:</p>
<p><strong>Near-real-time cohort surveillance</strong> — Cohort definitions that previously reflected quarterly snapshots now reflect yesterday's admissions. Researchers can monitor recruitment criteria as patients enter the system, not after the fact.</p>
<p><strong>Faster time to analysis</strong> — When a new data source is onboarded through Aqueduct and handed off to Poseidon, subsequent updates are automatic. The analyst's CDM stays current without manual intervention.</p>
<p><strong>Reduced data engineering burden</strong> — Data stewards configure a schedule once. Poseidon handles the recurring execution, monitors for failures, and surfaces freshness issues. The human role shifts from executing pipelines to overseeing them.</p>
<p><strong>Improved data quality</strong> — Every Poseidon refresh runs dbt's built-in schema tests and custom quality assertions. Data quality is validated on every load, not as an afterthought.</p>
<p><strong>Auditable provenance</strong> — Every sync run, every CDM refresh, every test outcome is recorded. When a researcher asks "when was this data last updated?" or "did any quality checks fail?", the answer is one click away.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="looking-ahead">Looking Ahead<a href="http://localhost:8082/docs/blog/poseidon-and-vulcan#looking-ahead" class="hash-link" aria-label="Direct link to Looking Ahead" title="Direct link to Looking Ahead">​</a></h2>
<p>Vulcan and Poseidon represent Phase 1 and Phase 5 of a six-phase implementation plan. The remaining phases will add:</p>
<ul>
<li><strong>Core dbt models</strong> covering all 20+ OMOP CDM clinical tables with incremental materialization</li>
<li><strong>Dagster sensors and schedules</strong> for fully automated, event-driven pipeline execution</li>
<li><strong>Aqueduct-to-Poseidon handoff</strong> — confirmed mappings automatically generate dbt models</li>
<li><strong>Production hardening</strong> — retry policies, alerting, run history management, and Dagit UI proxy</li>
</ul>
<p>The gods have taken their stations. The data flows.</p>
<hr>
<p><em>Vulcan and Poseidon are available now in Parthenon's Data Ingestion module. Navigate to the Poseidon or Vulcan tabs to begin configuring continuous ingestion for your data sources.</em></p>]]></content:encoded>
            <category>architecture</category>
            <category>ingestion</category>
            <category>fhir</category>
            <category>dbt</category>
            <category>dagster</category>
            <category>omop</category>
            <category>pipeline</category>
        </item>
        <item>
            <title><![CDATA[Building a Clinically Intelligent Risk Scoring Engine on OMOP CDM]]></title>
            <link>http://localhost:8082/docs/blog/population-risk-scoring-engine</link>
            <guid>http://localhost:8082/docs/blog/population-risk-scoring-engine</guid>
            <pubDate>Sat, 28 Mar 2026 12:00:00 GMT</pubDate>
            <description><![CDATA[In Greek mythology, Tyche was the goddess of fortune, chance, and prosperity. Depicted with a cornucopia of abundance and the wheel of fate, she governed the unpredictable forces that determined whether a city would flourish or fall. The ancient Greeks understood that outcomes are shaped by forces beyond individual control — health, circumstance, and probability. In the Parthenon pantheon, Tyche presides over population risk scoring: the quantification of clinical probability, the stratification of patients by the likelihood of outcomes they cannot fully control, and the transformation of uncertainty into actionable intelligence.]]></description>
            <content:encoded><![CDATA[<div style="border-radius:12px;overflow:hidden;margin-bottom:2rem"><img src="http://localhost:8082/docs/img/Tyche.png" alt="Tyche, Greek goddess of fortune and chance" style="width:100%;display:block"></div>
<p><em>In Greek mythology, <strong>Tyche</strong> was the goddess of fortune, chance, and prosperity. Depicted with a cornucopia of abundance and the wheel of fate, she governed the unpredictable forces that determined whether a city would flourish or fall. The ancient Greeks understood that outcomes are shaped by forces beyond individual control — health, circumstance, and probability. In the Parthenon pantheon, Tyche presides over population risk scoring: the quantification of clinical probability, the stratification of patients by the likelihood of outcomes they cannot fully control, and the transformation of uncertainty into actionable intelligence.</em></p>
<p>We built a population risk scoring engine that runs 20 validated clinical risk calculators against any OMOP CDM dataset — then immediately realized the approach was wrong. This post covers what we built, why we tore it apart, and the v2 architecture that replaced "run everything on everyone" with cohort-scoped, recommendation-driven clinical risk analysis.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-problem-with-run-all">The Problem with "Run All"<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#the-problem-with-run-all" class="hash-link" aria-label="Direct link to The Problem with &quot;Run All&quot;" title="Direct link to The Problem with &quot;Run All&quot;">​</a></h2>
<p>Clinical risk scores are precision instruments. The Framingham Risk Score was designed for adults aged 30-74 without prior cardiovascular events. CHA2DS2-VASc applies only to patients with atrial fibrillation. MELD measures liver disease severity. Running all 20 scores against a pancreatic cancer cohort produces a page full of "low" and "uncomputable": clinically meaningless results that make the platform look naive.</p>
<p>But that's exactly what v1 did. We implemented 20 risk calculators, wired them to a "Run All" button, and watched the results pour in. Framingham returned "uncomputable" for 66% of our cancer patients (no lipid panels). CHA2DS2-VASc returned 0 for everyone (no atrial fibrillation). Charlson returned a mean CCI of 0.37 for a cohort in which every single patient has cancer, because the concept IDs were wrong.</p>
<p>That last part was the wake-up call.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="where-hallucinated-concepts-go-to-die">Where Hallucinated Concepts Go to Die<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#where-hallucinated-concepts-go-to-die" class="hash-link" aria-label="Direct link to Where Hallucinated Concepts Go to Die" title="Direct link to Where Hallucinated Concepts Go to Die">​</a></h2>
<p>Our first Charlson implementation used concept ID <code>4178681</code> for "any malignancy." It seemed right. The code was clean. The SQL ran without errors. Yet the mean score came out at 0.37 for a cohort of 361 pancreatic cancer patients, every one of whom should score at least 2.</p>
<p>We queried the vocabulary:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> concept_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> concept_name </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> concept_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">4178681</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<table><thead><tr><th>concept_id</th><th>concept_name</th></tr></thead><tbody><tr><td>4178681</td><td>Dermatological complication of procedure</td></tr></tbody></table>
<p>Not malignancy. A dermatological complication. The concept ID was fabricated — confidently wrong, plausibly formatted, and catastrophically misleading. Every patient in our cancer cohort was being matched against a skin procedure concept. Of course the CCI was near zero.</p>
<p>This wasn't an edge case. Ten of our twenty score implementations had the same problem: concept IDs pulled from training data rather than queried from the actual OMOP vocabulary. Some were close enough to pass a cursory review. Others were entirely fictional.</p>
<p>The fix was straightforward but non-negotiable: every concept ID must be verified against <code>vocab.concept</code> at development time, and resolved via <code>concept_ancestor</code> at runtime. No exceptions. No "I'm pretty sure this is right." Query the vocabulary or don't write the code.</p>
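<p>A development-time guard for the first half of that rule can be very small. The sketch below loads a toy <code>concept</code> table with the two rows quoted in this post and reports every configured ID whose vocabulary name differs from the name the code assumes. The function name and the idea of running it as a pre-merge check are illustrative, not Parthenon's tooling.</p>

```python
import sqlite3
from typing import Dict, List

# Toy copy of vocab.concept holding the two rows quoted in this post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)")
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    (4178681, "Dermatological complication of procedure"),
    (443392, "Malignant neoplastic disease"),
])


def verify_concepts(conn: sqlite3.Connection, expected: Dict[int, str]) -> List[str]:
    """Describe every concept ID that is missing or whose name does not match."""
    problems = []
    for concept_id, expected_name in expected.items():
        row = conn.execute("SELECT concept_name FROM concept WHERE concept_id = ?",
                           (concept_id,)).fetchone()
        if row is None:
            problems.append(f"{concept_id}: not in vocabulary")
        elif row[0] != expected_name:
            problems.append(f"{concept_id}: expected {expected_name!r}, found {row[0]!r}")
    return problems


# The v1 mistake, plus a deliberately nonexistent ID, caught before any SQL ships.
problems = verify_concepts(conn, {
    4178681: "Malignant neoplastic disease",
    443392: "Malignant neoplastic disease",
    999999999: "Placeholder",
})
```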
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-vocabulary-is-the-source-of-truth">The Vocabulary Is the Source of Truth<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#the-vocabulary-is-the-source-of-truth" class="hash-link" aria-label="Direct link to The Vocabulary Is the Source of Truth" title="Direct link to The Vocabulary Is the Source of Truth">​</a></h2>
<p>OMOP CDM's strength is its standardized vocabulary. Concept hierarchies, ancestor relationships, and cross-vocabulary mappings are the foundation that makes population-level analytics work. Ignoring them — or approximating them from memory — defeats the purpose.</p>
<p>Here's what the correct Charlson malignancy lookup looks like:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token comment" style="color:hsl(220, 10%, 40%)">-- "Malignant neoplastic disease" (443392) is the verified ancestor</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> concept_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> concept_name </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> concept_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">443392</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- Returns: Malignant neoplastic disease</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- Verify our PDAC concept is a descendant</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> min_levels_of_separation</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> vocab</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">concept_ancestor</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> ancestor_concept_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">443392</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> descendant_concept_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">4180793</span><span class="token 
punctuation" style="color:hsl(220, 14%, 71%)">;</span><span class="token plain"> </span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- Malignant tumor of pancreas</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token comment" style="color:hsl(220, 10%, 40%)">-- Returns: 3 (three levels of separation — it IS a descendant)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>One query. Definitive answer. Our pancreatic cancer concept (4180793) sits three levels below the general malignancy ancestor (443392) in the SNOMED hierarchy. Every patient with PDAC now correctly matches the Charlson "any malignancy" condition group.</p>
<p>We verified all 17 Charlson condition groups this way:</p>
<table><thead><tr><th>Group</th><th>Ancestor Concept ID</th><th>Concept Name</th></tr></thead><tbody><tr><td>MI</td><td>4329847</td><td>Myocardial infarction</td></tr><tr><td>CHF</td><td>319835</td><td>Congestive heart failure</td></tr><tr><td>Malignancy</td><td>443392</td><td>Malignant neoplastic disease</td></tr><tr><td>Metastatic tumor</td><td>432851</td><td>Metastatic malignant neoplasm</td></tr><tr><td>Diabetes</td><td>201820</td><td>Diabetes mellitus</td></tr><tr><td>COPD</td><td>255573</td><td>Chronic obstructive pulmonary disease</td></tr><tr><td>Renal disease</td><td>46271022</td><td>Chronic kidney disease</td></tr><tr><td>HIV/AIDS</td><td>439727</td><td>Human immunodeficiency virus infection</td></tr><tr><td>...</td><td>...</td><td>...</td></tr></tbody></table>
<p>With verified ancestors and runtime descendant resolution, the Charlson index now scores our pancreatic cancer cohort correctly: <strong>226 patients at CCI=2 (cancer only), 135 patients at CCI=3 (cancer + Type 2 diabetes).</strong></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="from-run-all-to-recommendation-driven">From "Run All" to Recommendation-Driven<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#from-run-all-to-recommendation-driven" class="hash-link" aria-label="Direct link to From &quot;Run All&quot; to Recommendation-Driven" title="Direct link to From &quot;Run All&quot; to Recommendation-Driven">​</a></h2>
<p>The concept ID fix was necessary but not sufficient. The fundamental design was still wrong: presenting 20 scores to every user for every cohort. A researcher studying pancreatic cancer doesn't need CURB-65 (pneumonia severity) or STOP-BANG (sleep apnea risk). Showing them alongside Charlson creates noise and erodes trust.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="v2-architecture-cohort-scoped-risk-analysis">v2 Architecture: Cohort-Scoped Risk Analysis<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#v2-architecture-cohort-scoped-risk-analysis" class="hash-link" aria-label="Direct link to v2 Architecture: Cohort-Scoped Risk Analysis" title="Direct link to v2 Architecture: Cohort-Scoped Risk Analysis">​</a></h3>
<p>The redesigned engine is built around a simple principle: <strong>risk scores are only meaningful when applied to the right population.</strong> The system should know which scores apply and recommend them.</p>
<p><strong>The flow:</strong></p>
<ol>
<li>Researcher selects a target cohort (e.g., "All PDAC Patients" — 361 subjects)</li>
<li>The recommendation engine profiles the cohort: demographics, condition prevalence, measurement availability</li>
<li>Based on the profile, it recommends applicable scores with relevance reasons:
<ul>
<li>Charlson CCI: <strong>Recommended</strong> — "100% of cohort has malignancy conditions; 37% have diabetes"</li>
<li>FIB-4 Index: <strong>Recommended</strong> — "Liver function relevant for chemo hepatotoxicity monitoring; labs available"</li>
<li>CHA2DS2-VASc: <strong>Not applicable</strong> — "Less than 1% atrial fibrillation prevalence in cohort"</li>
</ul>
</li>
<li>Researcher confirms selection</li>
<li>Scores execute scoped to the cohort membership, storing patient-level results</li>
</ol>
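<p>The profiling step (step 2) can be sketched in Python. Parthenon's engine is PHP, and this is not its code. The <code>PatientSummary</code> record and the <code>profile_cohort</code> function are hypothetical. Reporting prevalence as the share of patients with at least one matching ancestor concept is an assumption about how such a profile might be computed.</p>

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PatientSummary:
    """Hypothetical per-patient input to profiling (not a Parthenon class)."""
    person_id: int
    age: int
    condition_ancestors: set[int] = field(default_factory=set)  # ancestor concept IDs matched
    measurements: set[int] = field(default_factory=set)         # measurement concept IDs with a value


def profile_cohort(patients: list[PatientSummary]) -> dict:
    """Summarise a cohort as size, mean age, and prevalence fractions (0.0 to 1.0)."""
    n = len(patients)
    if n == 0:
        return {"size": 0, "mean_age": None,
                "condition_prevalence": {}, "measurement_availability": {}}
    # Each patient counts at most once per concept because the inputs are sets.
    condition_counts = Counter(c for p in patients for c in p.condition_ancestors)
    measurement_counts = Counter(m for p in patients for m in p.measurements)
    return {
        "size": n,
        "mean_age": sum(p.age for p in patients) / n,
        "condition_prevalence": {c: k / n for c, k in condition_counts.items()},
        "measurement_availability": {m: k / n for m, k in measurement_counts.items()},
    }


# Usage: four patients, all with a malignancy (443392), one with diabetes (201820).
cohort = [
    PatientSummary(1, 64, {443392, 201820}),
    PatientSummary(2, 71, {443392}),
    PatientSummary(3, 58, {443392}),
    PatientSummary(4, 69, {443392}),
]
profile = profile_cohort(cohort)
print(profile["condition_prevalence"][201820])  # 0.25
```

<p>Profiling once per cohort and expressing everything as shares means the recommendation logic never needs to see individual patients.</p>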
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="score-eligibility-criteria">Score Eligibility Criteria<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#score-eligibility-criteria" class="hash-link" aria-label="Direct link to Score Eligibility Criteria" title="Direct link to Score Eligibility Criteria">​</a></h3>
<p>Each score declares its eligibility as structured criteria, not just a human-readable string:</p>
<div class="language-php codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-php codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">public</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">function</span><span class="token plain"> </span><span class="token function-definition function" style="color:hsl(207, 82%, 66%)">eligibilityCriteria</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token keyword return-type" style="color:hsl(286, 60%, 67%)">array</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">return</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token string single-quoted-string" 
style="color:hsl(95, 38%, 62%)">'population_type'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'universal'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token comment" style="color:hsl(220, 10%, 40%)">// Universal scores (Charlson, Elixhauser) apply to any cohort.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token comment" style="color:hsl(220, 10%, 40%)">// Condition-specific scores (CHA2DS2-VASc, MELD) require prerequisite conditions.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token comment" style="color:hsl(220, 10%, 40%)">// Age-restricted scores (Framingham, SCORE2) need patients in the right age range.</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre></div></div>
<p>The recommendation engine uses these criteria plus the cohort's actual clinical profile to make intelligent suggestions. A cardiovascular screening cohort gets Framingham and Pooled Cohort Equations. A liver disease cohort gets MELD and Child-Pugh. A cancer cohort gets Charlson, Elixhauser, and Multimorbidity Burden.</p>
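<p>Below is a minimal Python sketch of how declared criteria might be combined with a cohort profile that holds prevalence per ancestor concept and mean age. Only <code>population_type</code> appears in the original criteria. The other keys (<code>required_ancestors</code>, <code>min_prevalence</code>, <code>min_age</code>, <code>max_age</code>), the 1% default threshold, and the mean-age shortcut for age-restricted scores are hypothetical simplifications, not Parthenon's schema.</p>

```python
def assess_eligibility(criteria: dict, profile: dict) -> tuple[str, str]:
    """Return (status, reason) for one score against a cohort profile.

    Hypothetical criteria keys (illustrative only):
      population_type     'universal', 'condition_specific' or 'age_restricted'
      required_ancestors  ancestor concept IDs defining the target condition
      min_prevalence      share of the cohort that must have one (default 0.01)
      min_age, max_age    age window for age-restricted scores
    """
    kind = criteria["population_type"]

    if kind == "universal":
        return "recommended", "Applies to any cohort"

    if kind == "condition_specific":
        prevalence = profile["condition_prevalence"]
        # Use the most prevalent qualifying condition as the signal.
        best = max((prevalence.get(c, 0.0) for c in criteria["required_ancestors"]),
                   default=0.0)
        if best >= criteria.get("min_prevalence", 0.01):
            return "recommended", f"{best:.0%} of cohort has a qualifying condition"
        return "not_applicable", f"Only {best:.0%} of cohort has a qualifying condition"

    if kind == "age_restricted":
        # Crude simplification: compare the cohort's mean age to the window.
        age = profile["mean_age"]
        if criteria["max_age"] >= age >= criteria["min_age"]:
            return "recommended", f"Mean age {age:.0f} is within the score's range"
        return "not_applicable", f"Mean age {age:.0f} is outside the score's range"

    raise ValueError(f"unknown population_type: {kind!r}")


profile = {"condition_prevalence": {443392: 1.0, 201820: 0.37}, "mean_age": 66.0}
print(assess_eligibility({"population_type": "condition_specific",
                          "required_ancestors": [201820]}, profile))
```

<p>Returning a reason string alongside the status is what allows the UI to show "why" next to each recommendation.</p>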
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="runtime-concept-resolution">Runtime Concept Resolution<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#runtime-concept-resolution" class="hash-link" aria-label="Direct link to Runtime Concept Resolution" title="Direct link to Runtime Concept Resolution">​</a></h3>
<p>Instead of hardcoded concept IDs in SQL templates, v2 scores declare clinical condition groups with verified ancestor concepts:</p>
<div class="language-php codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-php codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">public</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">function</span><span class="token plain"> </span><span class="token function-definition function" style="color:hsl(207, 82%, 66%)">conditionGroups</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token keyword return-type" style="color:hsl(286, 60%, 67%)">array</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">return</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token punctuation" style="color:hsl(220, 
14%, 71%)">[</span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'label'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'Myocardial infarction'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'ancestor_concept_id'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">4329847</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'weight'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">1</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'label'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'Malignant neoplastic disease'</span><span class="token punctuation" 
style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'ancestor_concept_id'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">443392</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'weight'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">2</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'label'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'Metastatic malignant neoplasm'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'ancestor_concept_id'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">432851</span><span 
class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'weight'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">6</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token comment" style="color:hsl(220, 10%, 40%)">// ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre></div></div>
<p>At execution time, the <code>ConceptResolutionService</code> resolves each ancestor to its full descendant set via <code>concept_ancestor</code>. This means:</p>
<ul>
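<li>Vocabulary upgrades are picked up automatically, because descendant sets are resolved against whichever vocabulary version is loaded</li>
<li>No hardcoded concept IDs in scoring logic</li>
<li>The vocabulary, queried at run time, is always the source of truth</li>
</ul>
<p>Resolved descendant sets are cached for one hour, so repeated score executions within that window reuse one ancestor lookup per group instead of re-querying <code>concept_ancestor</code>.</p>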
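<p>The resolve-then-cache behaviour can be sketched in Python. This illustration rests on assumptions and is not the actual <code>ConceptResolutionService</code>. An in-memory list of <code>(ancestor_concept_id, descendant_concept_id)</code> pairs stands in for <code>concept_ancestor</code>, and the cache is a plain dictionary with a one-hour time-to-live. In the OMOP vocabulary, <code>concept_ancestor</code> already stores the transitive closure, including a self row for each standard concept, so a single filter returns the full descendant set with no recursive walk.</p>

```python
import time
from typing import Callable, Iterable


class ConceptResolver:
    """Sketch of ancestor-to-descendant resolution with a time-limited cache.

    ancestor_rows stands in for vocab.concept_ancestor as
    (ancestor_concept_id, descendant_concept_id) pairs.
    """

    def __init__(self, ancestor_rows: Iterable[tuple[int, int]],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._rows = list(ancestor_rows)
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._cache: dict[int, tuple[float, frozenset[int]]] = {}
        self.lookups = 0  # number of uncached resolutions, to make caching observable

    def descendants(self, ancestor_id: int) -> frozenset[int]:
        now = self._clock()
        cached = self._cache.get(ancestor_id)
        if cached is not None and self._ttl > now - cached[0]:
            return cached[1]  # still fresh: skip the "query"
        self.lookups += 1
        result = frozenset(d for a, d in self._rows if a == ancestor_id)
        self._cache[ancestor_id] = (now, result)
        return result


rows = [(443392, 443392), (443392, 4180793), (201820, 201820)]
resolver = ConceptResolver(rows)
print(4180793 in resolver.descendants(443392))  # True
resolver.descendants(443392)
print(resolver.lookups)  # 1
```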
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="pure-computation-separate-data-access">Pure Computation, Separate Data Access<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#pure-computation-separate-data-access" class="hash-link" aria-label="Direct link to Pure Computation, Separate Data Access" title="Direct link to Pure Computation, Separate Data Access">​</a></h3>
<p>v1 scores were SQL templates — the scoring logic was tangled with data access. A Charlson score was a 200-line SQL CTE chain that both fetched conditions and computed weights. Debugging meant reading SQL. Testing meant running against a database.</p>
<p>v2 separates these concerns:</p>
<ol>
<li><strong>PatientFeatureExtractor</strong> — queries the <code>condition_occurrence</code>, <code>measurement</code>, and <code>person</code> tables for the entire cohort in one batch</li>
<li><strong><code>compute()</code></strong> — a pure PHP method that receives the extracted features and returns a score. No database access. Testable with mock data.</li>
</ol>
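<p>The extraction side of this split mostly comes down to turning raw condition rows into the per-patient ancestor IDs that a pure scoring function consumes. Here is a minimal Python sketch. It assumes rows arrive as <code>(person_id, condition_concept_id)</code> pairs from a single batched query and that descendant sets have already been resolved. The function name <code>extract_condition_groups</code> is hypothetical.</p>

```python
from collections import defaultdict


def extract_condition_groups(
    condition_rows: list[tuple[int, int]],
    group_descendants: dict[int, frozenset[int]],
) -> dict[int, set[int]]:
    """Map each person to the ancestor (group) concept IDs their conditions fall under.

    condition_rows:    (person_id, condition_concept_id) for the whole cohort
    group_descendants: ancestor_concept_id -> resolved descendant concept IDs
    """
    # Invert once so each row needs a single dictionary lookup.
    concept_to_groups: dict[int, set[int]] = defaultdict(set)
    for ancestor_id, descendants in group_descendants.items():
        for concept_id in descendants:
            concept_to_groups[concept_id].add(ancestor_id)

    features: dict[int, set[int]] = defaultdict(set)
    for person_id, concept_id in condition_rows:
        # Unmatched concepts still register the person, with no groups added.
        features[person_id] |= concept_to_groups.get(concept_id, set())
    return dict(features)


groups = {443392: frozenset({443392, 4180793}), 201820: frozenset({201820})}
rows = [(1, 4180793), (1, 201820), (2, 4180793)]
print(extract_condition_groups(rows, groups))
```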
<div class="language-php codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-php codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">public</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">function</span><span class="token plain"> </span><span class="token function-definition function" style="color:hsl(207, 82%, 66%)">compute</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token keyword type-hint" style="color:hsl(286, 60%, 67%)">array</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$patientData</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token keyword return-type" style="color:hsl(286, 60%, 67%)">array</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token comment" style="color:hsl(220, 10%, 40%)">// $patientData contains: age, gender, conditions (as ancestor IDs), measurements</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token comment" style="color:hsl(220, 10%, 40%)">// Returns: score value, risk tier, confidence, completeness, missing components</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre></div></div>
<p>This makes each score independently testable, debuggable, and auditable. The Charlson <code>compute()</code> method is 50 lines of clear PHP logic with explicit supersession rules (metastatic trumps malignancy, severe liver trumps mild liver).</p>
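<p>To make the supersession rules concrete, here is a deliberately reduced Python analogue of a pure scoring function. It covers only five of the 17 Charlson groups. The weights follow the standard Charlson index: malignancy 2, metastatic tumor 6, diabetes 1, mild liver disease 1, and moderate/severe liver disease 3. The string group keys, the return shape, the assumed "high" tier, and the absence of confidence or completeness fields are simplifications for illustration, not Parthenon's implementation.</p>

```python
# Reduced Charlson weights for five of the 17 groups (standard Charlson values).
CHARLSON_WEIGHTS = {
    "mild_liver": 1,
    "severe_liver": 3,
    "diabetes": 1,
    "malignancy": 2,
    "metastatic": 6,
}

# Supersession: if the more severe group is present, the milder one is dropped.
SUPERSEDES = {"metastatic": "malignancy", "severe_liver": "mild_liver"}


def compute_charlson(patient_data: dict) -> dict:
    """Pure function: scores extracted features with no database access."""
    groups = set(patient_data["conditions"]).intersection(CHARLSON_WEIGHTS)
    for severe, mild in SUPERSEDES.items():
        if severe in groups:
            groups.discard(mild)
    score = sum(CHARLSON_WEIGHTS[g] for g in groups)
    if score >= 5:
        tier = "high"  # assumed tier; this post only reports low (0-2) and moderate (3-4)
    elif score >= 3:
        tier = "moderate"
    else:
        tier = "low"
    return {"score": score, "risk_tier": tier, "components": sorted(groups)}


print(compute_charlson({"conditions": ["malignancy", "diabetes"]}))
# {'score': 3, 'risk_tier': 'moderate', 'components': ['diabetes', 'malignancy']}
```

<p>Because the function never touches a database, each supersession rule can be checked with hand-built inputs as a plain unit test.</p>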
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="patient-level-persistence">Patient-Level Persistence<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#patient-level-persistence" class="hash-link" aria-label="Direct link to Patient-Level Persistence" title="Direct link to Patient-Level Persistence">​</a></h3>
<p>v1 stored only population summaries — mean scores and tier counts. Useful for dashboards, useless for research. v2 stores every patient's individual score:</p>
<div class="language-sql codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-sql codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">SELECT</span><span class="token plain"> person_id</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> score_value</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> risk_tier</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> confidence</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">FROM</span><span class="token plain"> app</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">risk_score_patient_results</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">WHERE</span><span class="token plain"> score_id </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'RS005'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">AND</span><span class="token plain"> risk_tier </span><span class="token operator" style="color:hsl(207, 82%, 
66%)">=</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">'moderate'</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">ORDER</span><span class="token plain"> </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">BY</span><span class="token plain"> score_value </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">DESC</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">;</span><br></span></code></pre></div></div>
<p>This enables:</p>
<ul>
<li>Patient-level drill-through from any risk tier to the Patient Profile</li>
<li>Using risk scores as cohort inclusion criteria (future: "Charlson &gt;= 3" as a cohort filter)</li>
<li>Exporting patient-level risk stratification for downstream analysis</li>
<li>Comparing risk distributions across cohorts</li>
</ul>
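<p>Once scores are stored per patient, both the v1-style tier summary and a threshold filter such as "Charlson &gt;= 3" reduce to simple operations over result rows. The Python sketch below assumes rows shaped like the columns in the query above (<code>person_id</code>, <code>score_value</code>, <code>risk_tier</code>). The helper names are hypothetical.</p>

```python
from statistics import mean


def tier_summary(rows: list[dict]) -> dict[str, dict]:
    """Rebuild a population summary (count and mean score per tier) from patient rows."""
    scores_by_tier: dict[str, list[float]] = {}
    for row in rows:
        scores_by_tier.setdefault(row["risk_tier"], []).append(row["score_value"])
    return {tier: {"patients": len(scores), "mean_score": mean(scores)}
            for tier, scores in scores_by_tier.items()}


def persons_at_or_above(rows: list[dict], threshold: float) -> set[int]:
    """Candidate cohort filter such as "Charlson >= 3"."""
    return {row["person_id"] for row in rows if row["score_value"] >= threshold}


rows = [
    {"person_id": 1, "score_value": 2, "risk_tier": "low"},
    {"person_id": 2, "score_value": 3, "risk_tier": "moderate"},
    {"person_id": 3, "score_value": 2, "risk_tier": "low"},
]
print(tier_summary(rows))
print(persons_at_or_above(rows, 3))  # {2}
```

<p>The reverse does not hold: a summary cannot be turned back into patient rows, which is why persisting the patient level is the more general choice.</p>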
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-20-scores">The 20 Scores<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#the-20-scores" class="hash-link" aria-label="Direct link to The 20 Scores" title="Direct link to The 20 Scores">​</a></h2>
<p>Parthenon ships with 20 validated clinical risk calculators spanning six clinical domains:</p>
<table><thead><tr><th>Category</th><th>Scores</th><th>Key Use Case</th></tr></thead><tbody><tr><td><strong>Cardiovascular</strong></td><td>Framingham, Pooled Cohort Equations, CHA2DS2-VASc, HAS-BLED, SCORE2, TIMI, GRACE, CHADS2, RCRI</td><td>CV event prediction, stroke risk in AF, bleeding risk, pre-operative cardiac risk</td></tr><tr><td><strong>Comorbidity</strong></td><td>Charlson CCI, Elixhauser, Multimorbidity Burden</td><td>Overall disease burden, mortality prediction, resource utilization</td></tr><tr><td><strong>Hepatic</strong></td><td>MELD, Child-Pugh, FIB-4</td><td>Liver transplant priority, cirrhosis severity, fibrosis staging</td></tr><tr><td><strong>Pulmonary</strong></td><td>CURB-65, STOP-BANG</td><td>Pneumonia severity, sleep apnea screening</td></tr><tr><td><strong>Metabolic</strong></td><td>Metabolic Syndrome Score, DCSI</td><td>Metabolic risk clustering, diabetes complications</td></tr><tr><td><strong>Musculoskeletal</strong></td><td>FRAX</td><td>Osteoporotic fracture risk</td></tr></tbody></table>
<p>Every v2 score implements the same interface. Adding a new score means implementing one PHP class of roughly 100 lines: eligibility criteria, condition/measurement groups, risk tiers, and a <code>compute()</code> method.</p>
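<p>The four parts listed above can be pictured as an abstract base class. This Python sketch only mirrors the shape described in this post. The real contract is the PHP <code>PopulationRiskScoreV2Interface</code>, whose full method list is not shown here. The names <code>measurement_groups</code>, <code>risk_tiers</code>, and <code>tier_for</code> and the toy subclass are assumptions made for illustration.</p>

```python
from abc import ABC, abstractmethod


class PopulationRiskScore(ABC):
    """Hypothetical Python mirror of a v2 score: declarations plus a pure compute()."""

    @abstractmethod
    def eligibility_criteria(self) -> dict:
        """Structured applicability rules, e.g. {'population_type': 'universal'}."""

    @abstractmethod
    def condition_groups(self) -> list[dict]:
        """Labelled ancestor concepts, each with a weight."""

    def measurement_groups(self) -> list[dict]:
        """Lab inputs; empty by default for condition-only scores."""
        return []

    @abstractmethod
    def risk_tiers(self) -> list[tuple[str, float]]:
        """(tier name, inclusive lower bound), ordered from the highest bound down."""

    @abstractmethod
    def compute(self, patient_data: dict) -> dict:
        """Pure scoring: extracted features in, score and tier out."""

    def tier_for(self, score: float) -> str:
        for name, lower_bound in self.risk_tiers():
            if score >= lower_bound:
                return name
        raise ValueError(f"score {score} falls below every tier")


class ToyConditionCount(PopulationRiskScore):
    """Illustrative toy score (not one of the 20): adds the weight of each matched group."""

    def eligibility_criteria(self) -> dict:
        return {"population_type": "universal"}

    def condition_groups(self) -> list[dict]:
        return [
            {"label": "Malignant neoplastic disease", "ancestor_concept_id": 443392, "weight": 1},
            {"label": "Diabetes mellitus", "ancestor_concept_id": 201820, "weight": 1},
        ]

    def risk_tiers(self) -> list[tuple[str, float]]:
        return [("high", 2), ("low", 0)]

    def compute(self, patient_data: dict) -> dict:
        present = set(patient_data["conditions"])
        score = sum(g["weight"] for g in self.condition_groups()
                    if g["ancestor_concept_id"] in present)
        return {"score": score, "risk_tier": self.tier_for(score)}


print(ToyConditionCount().compute({"conditions": [443392, 201820]}))
# {'score': 2, 'risk_tier': 'high'}
```

<p>Keeping tier lookup in the base class means each concrete score only declares its data and its arithmetic.</p>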
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="running-it-on-the-pancreatic-cancer-corpus">Running It on the Pancreatic Cancer Corpus<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#running-it-on-the-pancreatic-cancer-corpus" class="hash-link" aria-label="Direct link to Running It on the Pancreatic Cancer Corpus" title="Direct link to Running It on the Pancreatic Cancer Corpus">​</a></h2>
<p>Our test dataset: 361 patients with pancreatic ductal adenocarcinoma (PDAC) across three sub-cohorts — 21 PANCREAS-CT imaging patients, 168 CPTAC-PDA pathology patients, and 172 TCGA-PAAD genomics patients. Full clinical trajectories: visits, labs, drugs, conditions, procedures, specimens, 1,227 clinical notes, and genomic mutation profiles (KRAS/TP53/SMAD4/CDKN2A).</p>
<p>We ran the recommendation engine against the "All PDAC Patients" cohort:</p>
<p><strong>Recommended:</strong></p>
<ul>
<li>Charlson CCI — universal applicability, 100% have malignancy conditions</li>
<li>Elixhauser Index — universal, captures T2DM, cachexia, DVT</li>
<li>Multimorbidity Burden — broad comorbidity assessment</li>
<li>FIB-4 — liver function labs available, relevant for chemotherapy hepatotoxicity monitoring</li>
</ul>
<p><strong>Not applicable:</strong></p>
<ul>
<li>CHA2DS2-VASc, CHADS2 — less than 1% atrial fibrillation</li>
<li>MELD, Child-Pugh — no primary liver disease</li>
<li>CURB-65 — no pneumonia diagnoses</li>
<li>Framingham, PCE, SCORE2 — missing lipid panels for most patients</li>
</ul>
<p>This is exactly what a clinical researcher would expect. The engine's recommendations align with clinical judgment because they're derived from the actual data, not from assumptions about what a cancer cohort "probably" needs.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="charlson-cci-results">Charlson CCI Results<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#charlson-cci-results" class="hash-link" aria-label="Direct link to Charlson CCI Results" title="Direct link to Charlson CCI Results">​</a></h3>
<table><thead><tr><th>Tier</th><th>Patients</th><th>Mean CCI</th><th>Interpretation</th></tr></thead><tbody><tr><td>Low (0-2)</td><td>226</td><td>2.0</td><td>Cancer only — no additional comorbidities</td></tr><tr><td>Moderate (3-4)</td><td>135</td><td>3.0</td><td>Cancer + one comorbidity (typically T2DM)</td></tr></tbody></table>
<p>All 361 patients score at least 2 (any malignancy, weight 2), and no patient is "uncomputable." The 135 patients (37%) with Type 2 diabetes score 3 (malignancy 2 + diabetes 1). The vocabulary hierarchy resolution works.</p>
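<p>The tier counts can be cross-checked with a few lines of arithmetic. The two tiers should sum to the cohort size, and the moderate tier's share should match the 37% diabetes prevalence. The overall mean CCI then follows from the weights.</p>

```python
# Cross-check of the Charlson results reported above.
low_patients, moderate_patients = 226, 135   # CCI = 2 and CCI = 3
cohort_size = low_patients + moderate_patients
diabetes_share = moderate_patients / cohort_size
overall_mean_cci = (2 * low_patients + 3 * moderate_patients) / cohort_size

print(cohort_size)                  # 361
print(f"{diabetes_share:.1%}")      # 37.4%
print(f"{overall_mean_cci:.2f}")    # 2.37
```

<p>The overall mean is simply 2 plus the diabetes share, since every patient contributes 2 points and each diabetic patient contributes one more.</p>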
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>The v2 backend is complete. Remaining work:</p>
<ol>
<li><strong>Frontend analysis creator</strong> — cohort selector with recommendation cards, score selection, execution modal (replicating the Achilles UX pattern)</li>
<li><strong>Results visualization</strong> — tier distribution charts, patient drill-through tables</li>
<li><strong>Score migration</strong> — converting the remaining 19 scores from v1 SQL templates to v2 pure-compute implementations</li>
<li><strong>Cohort builder integration</strong> — using risk scores as cohort inclusion criteria ("Charlson &gt;= 3 AND KRAS mutant" as a single cohort definition)</li>
</ol>
<p>The architectural lesson: clinical analytics tools must respect clinical context. A risk score without population awareness is just a number. A risk score that knows when it's relevant — and when it's not — is a clinical decision support tool.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="technical-summary">Technical Summary<a href="http://localhost:8082/docs/blog/population-risk-scoring-engine#technical-summary" class="hash-link" aria-label="Direct link to Technical Summary" title="Direct link to Technical Summary">​</a></h2>
<table><thead><tr><th>Component</th><th>Technology</th></tr></thead><tbody><tr><td>Score Engine</td><td>Laravel 11 / PHP 8.4</td></tr><tr><td>Vocabulary Resolution</td><td>vocab.concept_ancestor (runtime, cached)</td></tr><tr><td>Feature Extraction</td><td>Bulk SQL with DISTINCT ON, PostgreSQL ANY()</td></tr><tr><td>Patient Storage</td><td>app.risk_score_patient_results (indexed by cohort + person)</td></tr><tr><td>Execution Tracking</td><td>AnalysisExecution polymorphism + RiskScoreRunStep</td></tr><tr><td>Score Interface</td><td>PopulationRiskScoreV2Interface with pure compute()</td></tr><tr><td>Database</td><td>PostgreSQL 17, OMOP CDM v5.4</td></tr></tbody></table>
<p>All 20 scores, the recommendation engine, and the execution pipeline are open source under Apache 2.0.</p>]]></content:encoded>
            <category>risk-scores</category>
            <category>omop</category>
            <category>clinical-analytics</category>
            <category>architecture</category>
            <category>cohort-analysis</category>
            <category>vocabulary</category>
        </item>
        <item>
            <title><![CDATA[The Magical Ladies of Parthenon]]></title>
            <link>http://localhost:8082/docs/blog/magical-ladies-of-parthenon</link>
            <guid>http://localhost:8082/docs/blog/magical-ladies-of-parthenon</guid>
            <pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[In Greek mythology, the great temple atop the Acropolis housed not just Athena, but an entire pantheon of divine figures — each wielding a unique gift. Parthenon, our unified OHDSI outcomes research platform, follows the same philosophy. Behind the scenes, four mythological women power the intelligence layer that transforms raw clinical data into actionable research: Hecate, Phoebe, Ariadne, and Arachne.]]></description>
            <content:encoded><![CDATA[<p>In Greek mythology, the great temple atop the Acropolis housed not just Athena, but an entire pantheon of divine figures — each wielding a unique gift. Parthenon, our unified OHDSI outcomes research platform, follows the same philosophy. Behind the scenes, four mythological women power the intelligence layer that transforms raw clinical data into actionable research: <strong>Hecate</strong>, <strong>Phoebe</strong>, <strong>Ariadne</strong>, and <strong>Arachne</strong>.</p>
<div style="text-align:center;margin:2rem 0"><img src="http://localhost:8082/docs/img/magical-ladies.png" alt="Hecate, Phoebe, Ariadne, and Arachne — the four mythological engines of Parthenon" style="border-radius:16px;max-width:100%;box-shadow:0 8px 32px rgba(0,0,0,0.4)"><p style="font-size:0.85rem;color:#8A857D;margin-top:0.75rem;font-style:italic">From left to right: Hecate (torch-bearer of hidden knowledge), Ariadne (thread-spinner of vocabulary mappings), Phoebe (oracle of concept relationships), and Arachne (weaver of the federated network).</p></div>
<p>Each of these engines appears throughout the Parthenon interface as a distinctive "Powered by" pill — teal for Hecate, gold for Phoebe, crimson for Ariadne, and violet for Arachne. They aren't cosmetic labels. They represent four fundamentally different approaches to the same grand challenge: helping researchers find the right concepts, build complete concept sets, map between vocabularies, and execute studies across a distributed network of clinical databases.</p>
<p>This post tells the story of who they are, what they do, and how they came to life.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="hecate-the-torch-bearer-of-hidden-knowledge">Hecate: The Torch-Bearer of Hidden Knowledge<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#hecate-the-torch-bearer-of-hidden-knowledge" class="hash-link" aria-label="Direct link to Hecate: The Torch-Bearer of Hidden Knowledge" title="Direct link to Hecate: The Torch-Bearer of Hidden Knowledge">​</a></h2>
<p><strong>Color:</strong> Teal (#2DD4BF) | <strong>Domain:</strong> Semantic concept search | <strong>Technology:</strong> Vector embeddings + Qdrant</p>
<p>In mythology, Hecate stood at crossroads with a torch in each hand, illuminating paths hidden from mortal sight. In Parthenon, she does the same for clinical concepts.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-problem-she-solves">The Problem She Solves<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#the-problem-she-solves" class="hash-link" aria-label="Direct link to The Problem She Solves" title="Direct link to The Problem She Solves">​</a></h3>
<p>Traditional vocabulary search is keyword-based. Search for "heart attack" and you'll find concepts named "heart attack" — but you might miss <em>myocardial infarction</em>, <em>STEMI</em>, <em>acute coronary syndrome</em>, or <em>troponin elevation</em>. Clinical researchers think in medical concepts, not in exact vocabulary strings. The gap between how a researcher thinks about a condition and how OMOP CDM encodes it can mean the difference between a complete cohort and a dangerously incomplete one.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-she-works">How She Works<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#how-she-works" class="hash-link" aria-label="Direct link to How She Works" title="Direct link to How She Works">​</a></h3>
<p>Hecate operates through a three-layer architecture:</p>
<ol>
<li>
<p><strong>Embedding Layer (Ollama + EmbeddingGemma-300M):</strong> Every standard concept in the OMOP vocabulary (1,968,694 of them) is passed through a medical-domain embedding model running locally via Ollama. Each concept name becomes a 768-dimensional vector that captures its <em>semantic meaning</em>, not just its characters.</p>
</li>
<li>
<p><strong>Vector Index (Qdrant):</strong> These ~2 million vectors are stored in a Qdrant collection called <code>meddra</code>, with cosine similarity indexing. When a researcher types a query, Hecate embeds the query text through the same model and performs approximate nearest-neighbor search against the full vocabulary.</p>
</li>
<li>
<p><strong>Concept Resolution (PostgreSQL):</strong> The nearest vectors map back to OMOP concept IDs through a pairs file (1.94 million unique concept names), and the full concept metadata (domain, vocabulary, class, standard status) is resolved from PostgreSQL.</p>
</li>
</ol>
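<p>The three layers can be approximated in a few lines of Python. This is a toy sketch, not Hecate's code. Hand-made 3-dimensional vectors stand in for the 768-dimensional EmbeddingGemma output. An exact brute-force cosine scan stands in for Qdrant's approximate nearest-neighbor index. A plain dictionary stands in for the pairs file and the PostgreSQL lookup. Apart from 201820, the concept IDs are hypothetical placeholders.</p>

```python
import math

# Layer 1 stand-in: toy embeddings (real vectors would have 768 dimensions).
EMBEDDINGS = {
    "Diabetes mellitus": [0.9, 0.1, 0.0],
    "Hip fracture": [0.0, 0.2, 0.95],
    "Fracture of neck of femur": [0.05, 0.25, 0.9],
    "Essential hypertension": [0.2, 0.9, 0.1],
}

# Layer 3 stand-in: name -> concept_id. 201820 is from the post; the rest are placeholders.
NAME_TO_CONCEPT_ID = {
    "Diabetes mellitus": 201820,
    "Hip fracture": 900001,
    "Fracture of neck of femur": 900002,
    "Essential hypertension": 900003,
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_vector, k=2):
    # Layer 2 stand-in: exact scan; Qdrant performs approximate search over ~2M vectors.
    ranked = sorted(
        ((cosine(query_vector, vec), name) for name, vec in EMBEDDINGS.items()),
        reverse=True,
    )
    # Layer 3: resolve the nearest names back to concept IDs.
    return [(NAME_TO_CONCEPT_ID[name], name, round(sim, 3)) for sim, name in ranked[:k]]


# Pretend the query "broken hip" embedded to this vector.
hits = search([0.02, 0.2, 0.93])
```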
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-makes-her-special">What Makes Her Special<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#what-makes-her-special" class="hash-link" aria-label="Direct link to What Makes Her Special" title="Direct link to What Makes Her Special">​</a></h3>
<p>Search for "sugar disease" and Hecate returns <em>Diabetes mellitus</em> (concept ID 201820) at 0.93 similarity. Search for "broken hip" and she returns <em>Fracture of neck of femur</em> alongside <em>Hip fracture</em> and <em>Intertrochanteric fracture</em>. She understands medical synonymy, abbreviations, and even casual descriptions — because the embedding model learned those relationships from medical literature.</p>
<p>She also powers the autocomplete in Parthenon's vocabulary browser, the concept search within the ETL mapping tool (Aqueduct), and the concept picker in cohort definitions.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-numbers">The Numbers<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#the-numbers" class="hash-link" aria-label="Direct link to The Numbers" title="Direct link to The Numbers">​</a></h3>
<table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Total concepts embedded</td><td>1,968,694</td></tr><tr><td>Phase 1 (Clinical)</td><td>705,294 concepts</td></tr><tr><td>Phase 2 (Drug/RxNorm)</td><td>1,263,400 concepts</td></tr><tr><td>Embedding dimension</td><td>768</td></tr><tr><td>Model</td><td>EmbeddingGemma-300M (local)</td></tr><tr><td>Index</td><td>Qdrant v1.17, cosine similarity</td></tr><tr><td>Query latency</td><td>~50ms typical</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="phoebe-the-oracle-of-concept-relationships">Phoebe: The Oracle of Concept Relationships<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#phoebe-the-oracle-of-concept-relationships" class="hash-link" aria-label="Direct link to Phoebe: The Oracle of Concept Relationships" title="Direct link to Phoebe: The Oracle of Concept Relationships">​</a></h2>
<p><strong>Color:</strong> Gold (#C9A227) | <strong>Domain:</strong> Concept set recommendations | <strong>Technology:</strong> Pre-computed co-occurrence network from 22 global data sources</p>
<p>Phoebe was the Titan of prophecy and radiant intellect — grandmother of Apollo and Artemis, keeper of the Oracle at Delphi before Apollo claimed it. In Parthenon, she whispers to researchers: <em>"You're building a concept set for diabetes — have you considered these 733 related concepts?"</em></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-problem-she-solves-1">The Problem She Solves<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#the-problem-she-solves-1" class="hash-link" aria-label="Direct link to The Problem She Solves" title="Direct link to The Problem She Solves">​</a></h3>
<p>Building a comprehensive concept set is one of the hardest tasks in observational research. A researcher creating a cohort for "Type 2 Diabetes" needs to decide: should I include <em>Diabetes mellitus type 2 without complication</em>? What about <em>Diabetic neuropathy</em>? <em>Insulin resistance</em>? <em>HbA1c measurement</em>? The OMOP vocabulary contains millions of concepts with complex hierarchical and lateral relationships. Missing a critical concept can bias an entire study.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-she-works-1">How She Works<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#how-she-works-1" class="hash-link" aria-label="Direct link to How She Works" title="Direct link to How She Works">​</a></h3>
<p>Phoebe is powered by the OHDSI <strong>concept_recommended</strong> dataset — a pre-computed network of 3,768,447 concept-to-concept recommendation pairs, derived from analyzing concept usage patterns across <strong>22 real-world healthcare databases</strong> spanning <strong>6 countries</strong> and <strong>272 billion clinical records</strong>.</p>
<p>The recommendations come in five relationship types:</p>
<table><thead><tr><th>Relationship</th><th>Count</th><th>What It Captures</th></tr></thead><tbody><tr><td><strong>Lexical via standard</strong></td><td>1,383,892</td><td>Concepts with similar names in standard vocabularies</td></tr><tr><td><strong>Ontology-descendant</strong></td><td>1,111,848</td><td>Child concepts in the vocabulary hierarchy</td></tr><tr><td><strong>Ontology-parent</strong></td><td>1,095,982</td><td>Parent concepts in the vocabulary hierarchy</td></tr><tr><td><strong>Patient context</strong></td><td>135,033</td><td>Concepts that co-occur in the same patients across databases</td></tr><tr><td><strong>Lexical via source</strong></td><td>41,692</td><td>Concepts with similar names in source vocabularies</td></tr></tbody></table>
<p>The <strong>Patient context</strong> relationships are the most valuable — they represent real-world clinical co-occurrence patterns. If patients with <em>Diabetes mellitus</em> frequently also have records for <em>Diabetic retinopathy screening</em>, that relationship is captured even though the two concepts are in different domains and different vocabulary hierarchies.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-makes-her-special-1">What Makes Her Special<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#what-makes-her-special-1" class="hash-link" aria-label="Direct link to What Makes Her Special" title="Direct link to What Makes Her Special">​</a></h3>
<p>When a researcher selects concept 201820 (Diabetes mellitus), Phoebe returns 733 recommended concepts spanning complications (neuropathy, retinopathy, nephropathy), related measurements (HbA1c, fasting glucose), medications (metformin, insulin), and associated conditions (metabolic syndrome, obesity). She surfaces concepts that a researcher <em>should consider</em> based on how the global OHDSI network actually uses them together.</p>
<p>She's integrated into Parthenon's Concept Set Editor — as you add concepts to your set, Phoebe aggregates recommendations across all included concepts, deduplicates them, and ranks by relevance. The panel is collapsible and non-intrusive, but when expanded, it's a revelation.</p>
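<p>The aggregate, deduplicate, and rank step could be sketched as follows. The rows mimic a <code>vocab.phoebe</code>-style table as <code>(concept_id, recommended_concept_id, relationship)</code> tuples. That layout, the 9000-series IDs, and the ranking rule are all assumptions for illustration. The rule ranks a candidate by how many set members recommend it, with Patient context links as a tie-breaker. None of this is Parthenon's actual schema or scoring.</p>

```python
from collections import defaultdict

# (source concept, recommended concept, relationship): illustrative rows only.
ROWS = [
    (201820, 201826, "Ontology-descendant"),
    (201820, 9001, "Patient context"),
    (201820, 9002, "Lexical via standard"),
    (201826, 9001, "Patient context"),
    (201826, 9003, "Ontology-parent"),
]


def recommend(concept_set):
    support = defaultdict(set)          # candidate -> set members recommending it
    patient_context = defaultdict(int)  # candidate -> number of patient-context links
    for source, candidate, relationship in ROWS:
        if source not in concept_set or candidate in concept_set:
            continue  # recommend only from members, and never re-suggest a member
        support[candidate].add(source)
        if relationship == "Patient context":
            patient_context[candidate] += 1
    # Dict keys deduplicate; rank by support, then patient context, then concept id.
    return sorted(support, key=lambda c: (-len(support[c]), -patient_context[c], c))
```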
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-data-pipeline">The Data Pipeline<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#the-data-pipeline" class="hash-link" aria-label="Direct link to The Data Pipeline" title="Direct link to The Data Pipeline">​</a></h3>
<p>The concept_recommended dataset is published by OHDSI through the <a href="https://github.com/OHDSI/Broadsea" target="_blank" rel="noopener noreferrer">Broadsea</a> project and is based on the <a href="https://github.com/ohdsi-studies/ConceptPrevalence" target="_blank" rel="noopener noreferrer">ConceptPrevalence study</a> led by Anna Ostropolets. We load it into a <code>vocab.phoebe</code> table and query it directly — no external service dependency, sub-millisecond response times.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ariadne-the-thread-spinner-of-vocabulary-mappings">Ariadne: The Thread-Spinner of Vocabulary Mappings<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#ariadne-the-thread-spinner-of-vocabulary-mappings" class="hash-link" aria-label="Direct link to Ariadne: The Thread-Spinner of Vocabulary Mappings" title="Direct link to Ariadne: The Thread-Spinner of Vocabulary Mappings">​</a></h2>
<p><strong>Color:</strong> Crimson (#9B1B30 / #E85A6B) | <strong>Domain:</strong> AI-assisted source-to-standard concept mapping | <strong>Technology:</strong> RAG pipeline + LLM reasoning</p>
<p>Ariadne gave Theseus a ball of thread to navigate the Labyrinth and slay the Minotaur. In Parthenon, she gives data engineers a thread through the labyrinth of source-to-standard vocabulary mapping — arguably the most labor-intensive step in any OMOP ETL pipeline.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-problem-she-solves-2">The Problem She Solves<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#the-problem-she-solves-2" class="hash-link" aria-label="Direct link to The Problem She Solves" title="Direct link to The Problem She Solves">​</a></h3>
<p>When a hospital's EHR uses the code "DM2" for Type 2 Diabetes, someone needs to map that to OMOP concept 201826 (<em>Type 2 diabetes mellitus</em>). When a lab system reports "GLU-F" for fasting glucose, someone needs to find LOINC code 2345-7 (<em>Glucose [Mass/volume] in Serum or Plasma</em>). A typical ETL project involves mapping thousands of source codes, and each mapping requires domain expertise, vocabulary knowledge, and careful judgment.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-she-works-2">How She Works<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#how-she-works-2" class="hash-link" aria-label="Direct link to How She Works" title="Direct link to How She Works">​</a></h3>
<p>Ariadne operates as an AI mapping assistant in Parthenon's Mapping Assistant page. She combines:</p>
<ol>
<li><strong>Hecate's semantic search</strong> to find candidate standard concepts for each source code</li>
<li><strong>Vocabulary context</strong> from concept hierarchies, relationships, and domain constraints</li>
<li><strong>LLM reasoning</strong> to evaluate candidates and suggest the best mapping with a confidence score and rationale</li>
</ol>
<p>The researcher sees a side-by-side interface: source codes on the left, Ariadne's suggestions on the right. Each suggestion includes the recommended standard concept, a confidence percentage, the mapping type (direct, lookup, transform), and a natural-language explanation of <em>why</em> this mapping makes sense.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-makes-her-special-2">What Makes Her Special<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#what-makes-her-special-2" class="hash-link" aria-label="Direct link to What Makes Her Special" title="Direct link to What Makes Her Special">​</a></h3>
<p>Ariadne doesn't just pattern-match strings. She understands that "BP systolic" should map to a <em>Measurement</em> domain concept, not a <em>Condition</em>. She knows that drug mappings should target RxNorm Clinical Drug concepts, not ingredient-level concepts. She respects the OMOP conventions for concept class, domain, and standard status, because the vocabulary structure itself is part of the context she reasons over.</p>
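<p>Convention checks like these can be enforced deterministically around the LLM step. The following is a minimal sketch. It assumes each candidate arrives as a dictionary carrying the OMOP <code>concept</code> columns <code>domain_id</code>, <code>vocabulary_id</code>, <code>concept_class_id</code>, and <code>standard_concept</code>. It also assumes the expected domain is already known from the source field, for example a vitals column. The function names and candidate rows are hypothetical, and Ariadne's actual rules may be broader.</p>

```python
def passes_conventions(candidate: dict, expected_domain: str) -> bool:
    """Keep standard concepts in the expected domain; drugs must be RxNorm Clinical Drugs."""
    if candidate["standard_concept"] != "S":
        return False
    if candidate["domain_id"] != expected_domain:
        return False
    if expected_domain == "Drug":
        return (candidate["vocabulary_id"] == "RxNorm"
                and candidate["concept_class_id"] == "Clinical Drug")
    return True


def concept(name, domain, vocabulary, concept_class, standard="S"):
    return {"concept_name": name, "domain_id": domain, "vocabulary_id": vocabulary,
            "concept_class_id": concept_class, "standard_concept": standard}


# Hypothetical candidates that semantic search might return for two source codes.
bp_candidates = [
    concept("Hypertensive disorder", "Condition", "SNOMED", "Clinical Finding"),
    concept("Systolic blood pressure", "Measurement", "LOINC", "Clinical Observation"),
]
drug_candidates = [
    concept("metformin", "Drug", "RxNorm", "Ingredient"),
    concept("metformin hydrochloride 500 MG Oral Tablet", "Drug", "RxNorm", "Clinical Drug"),
]

kept_bp = [c["concept_name"] for c in bp_candidates if passes_conventions(c, "Measurement")]
kept_drug = [c["concept_name"] for c in drug_candidates if passes_conventions(c, "Drug")]
```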
<p>She also learns from the mappings you accept. As you work through a mapping project, the patterns you confirm help her make better suggestions for subsequent codes. She's a tireless assistant who gets smarter as you work.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="arachne-the-weaver-of-the-federated-network">Arachne: The Weaver of the Federated Network<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#arachne-the-weaver-of-the-federated-network" class="hash-link" aria-label="Direct link to Arachne: The Weaver of the Federated Network" title="Direct link to Arachne: The Weaver of the Federated Network">​</a></h2>
<p><strong>Color:</strong> Violet (#8B5CF6 / #A78BFA) | <strong>Domain:</strong> Federated study execution | <strong>Technology:</strong> OHDSI Arachne Central integration</p>
<p>Arachne was the mortal weaver who challenged Athena herself — her tapestries so perfect that the goddess transformed her into a spider, forever weaving intricate webs that connect distant points. In Parthenon, Arachne weaves a web of federated data nodes, enabling studies to execute across multiple institutions without centralizing patient data.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-problem-she-solves-3">The Problem She Solves<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#the-problem-she-solves-3" class="hash-link" aria-label="Direct link to The Problem She Solves" title="Direct link to The Problem She Solves">​</a></h3>
<p>The fundamental tension in multi-site clinical research: you need data from many hospitals to achieve statistical power, but you can't (and shouldn't) move patient data to a central location. HIPAA, GDPR, and institutional policies all restrict it tightly. The traditional solution — months of IRB negotiations, data use agreements, and manual result aggregation — makes large-scale studies impractical.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-she-works-3">How She Works<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#how-she-works-3" class="hash-link" aria-label="Direct link to How She Works" title="Direct link to How She Works">​</a></h3>
<p>Arachne integrates with <a href="https://github.com/OHDSI/Arachne" target="_blank" rel="noopener noreferrer">OHDSI Arachne Central</a>, a federated execution platform. The workflow:</p>
<ol>
<li>
<p><strong>Study Design (Parthenon):</strong> A researcher designs their study — cohort definitions, analysis packages, outcome measures — entirely within Parthenon's study workspace.</p>
</li>
<li>
<p><strong>Node Discovery (Arachne):</strong> Parthenon queries Arachne Central for available data nodes — institutions that have registered their OMOP CDM databases and agreed to participate in federated analyses.</p>
</li>
<li>
<p><strong>Distribution (Arachne):</strong> With one click, the researcher distributes their analysis package to selected nodes. Arachne Central handles authentication, package delivery, and execution coordination.</p>
</li>
<li>
<p><strong>Execution (Remote):</strong> Each data node runs the analysis locally against its own OMOP CDM database. Patient-level data never leaves the institution. Only aggregate results (counts, statistics, effect estimates) are returned.</p>
</li>
<li>
<p><strong>Aggregation (Parthenon):</strong> Results flow back through Arachne Central into Parthenon, where they're displayed in a unified results viewer with per-node breakdowns.</p>
</li>
</ol>
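<p>Each node returns only aggregates, so the results viewer has to combine them. One standard option is fixed-effect inverse-variance pooling of per-node log hazard ratios. It is shown here as a sketch and is not necessarily what Parthenon or Arachne Central does. The node names and estimates are invented for illustration.</p>

```python
import math


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (log_estimate, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Per-node aggregate results: (log hazard ratio, standard error). Hypothetical values.
node_results = {
    "node_a": (math.log(0.80), 0.10),
    "node_b": (math.log(0.90), 0.20),
}
log_hr, se = pool_fixed_effect(list(node_results.values()))
low, high = math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se)
print(f"pooled HR {math.exp(log_hr):.3f} (95% CI {low:.3f}-{high:.3f})")
```

<p>A fixed-effect model assumes every node estimates the same underlying effect. When sites differ substantially, a random-effects model is the usual alternative.</p>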
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-makes-her-special-3">What Makes Her Special<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#what-makes-her-special-3" class="hash-link" aria-label="Direct link to What Makes Her Special" title="Direct link to What Makes Her Special">​</a></h3>
<p>Arachne makes the federated model <em>invisible</em> to the researcher. You don't need to know how each participating hospital's infrastructure is set up, what their IRB requirements are, or how to package an R script for remote execution. You design your study, click "Distribute," and watch results arrive from across the network.</p>
<p>The Federated Execution tab in Parthenon's study workspace shows real-time status for each node — queued, running, completed, or failed — with the ability to drill into per-node results. It transforms what used to be a months-long coordination effort into a same-day operation.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-pantheon-together">The Pantheon Together<a href="http://localhost:8082/docs/blog/magical-ladies-of-parthenon#the-pantheon-together" class="hash-link" aria-label="Direct link to The Pantheon Together" title="Direct link to The Pantheon Together">​</a></h2>
<p>These four engines are independent but complementary. A typical research workflow touches all of them:</p>
<ol>
<li><strong>Hecate</strong> helps you <em>find</em> the concepts you're looking for, even when you don't know the exact vocabulary terms</li>
<li><strong>Phoebe</strong> helps you <em>complete</em> your concept set by recommending related concepts you might have missed</li>
<li><strong>Ariadne</strong> helps you <em>map</em> your source data to the OMOP standard, so your local data is compatible with the global network</li>
<li><strong>Arachne</strong> helps you <em>execute</em> your study across that global network, bringing federated evidence to bear on your research question</li>
</ol>
<p>They're named after figures from Greek mythology not as a whimsical branding exercise, but because each figure's mythological role maps directly to its function in the platform. Hecate illuminates hidden paths. Phoebe prophesies connections. Ariadne provides the thread through the labyrinth. Arachne weaves the web that connects distant nodes.</p>
<p>Together, they make Parthenon more than a tool — they make it an intelligent research companion that understands clinical vocabularies, anticipates researcher needs, and bridges the gap between local data and global evidence.</p>
<hr>
<p><em>The Magical Ladies of Parthenon are all open-source, built on OHDSI standards, and running in production at Acumenus Data Sciences. If you'd like to learn more about any of them, explore the <a href="https://parthenon.acumenus.net/docs" target="_blank" rel="noopener noreferrer">Parthenon documentation</a> or reach out to the team.</em></p>]]></content:encoded>
            <category>architecture</category>
            <category>hecate</category>
            <category>phoebe</category>
            <category>ariadne</category>
            <category>arachne</category>
            <category>vocabulary</category>
            <category>ai</category>
            <category>federated</category>
            <category>concept-sets</category>
        </item>
        <item>
            <title><![CDATA[Building the Ingestion Pipeline: File Staging, Project Management, and the Path to Aqueduct]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-03-26</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-03-26</guid>
            <pubDate>Thu, 26 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A massive day on the ingestion front — 87 commits landed in Parthenon today, almost entirely focused on building out a brand-new end-to-end data ingestion pipeline. We now have a fully wired system for creating ingestion projects, uploading raw files, staging them into a schema-isolated PostgreSQL environment, and handing off to Aqueduct for ETL. This has been a long time coming.]]></description>
            <content:encoded><![CDATA[<p>A massive day on the ingestion front — 87 commits landed in Parthenon today, almost entirely focused on building out a brand-new end-to-end data ingestion pipeline. We now have a fully wired system for creating ingestion projects, uploading raw files, staging them into a schema-isolated PostgreSQL environment, and handing off to Aqueduct for ETL. This has been a long time coming.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-ingestion-pipeline-from-zero-to-staged-data">The Ingestion Pipeline: From Zero to Staged Data<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-26#the-ingestion-pipeline-from-zero-to-staged-data" class="hash-link" aria-label="Direct link to The Ingestion Pipeline: From Zero to Staged Data" title="Direct link to The Ingestion Pipeline: From Zero to Staged Data">​</a></h2>
<p>The headline work today is the ingestion subsystem — a cohesive feature that takes a researcher from "I have some CSV files" to "my data is staged and ready for CDM mapping," all within the Parthenon UI.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="project-model-and-access-control">Project Model and Access Control<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-26#project-model-and-access-control" class="hash-link" aria-label="Direct link to Project Model and Access Control" title="Direct link to Project Model and Access Control">​</a></h3>
<p>Everything starts with <code>IngestionProject</code> — a new Eloquent model and accompanying Laravel policy (<code>aacf41c93</code>). Projects act as the top-level container for a researcher's raw data, tracking lifecycle state from initial creation through file upload, staging, and ultimately a <strong>ready</strong> status that unlocks downstream actions. The policy enforces ownership and role-based access from the start, ensuring researchers only see and act on their own projects.</p>
<p>A dedicated set of form requests and a full <code>IngestionProjectController</code> (<code>f48992b5b</code>) wire up the REST surface (create, list, show, and status-transition endpoints), all of which now sit behind properly scoped middleware. Notably, a follow-up fix (<code>60bd93bf7</code>) patched a gap where the ingestion routes were missing permission middleware entirely. That gap is now closed, and it is a reminder to audit new route groups when they are created rather than afterwards.</p>
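<p>A status-transition endpoint needs a rule for which moves are legal. The sketch below models that rule as a small state machine, written in Python rather than PHP. Only the <code>ready</code> status is named in this post. The other states and the transition table are hypothetical.</p>

```python
from enum import Enum


class ProjectStatus(Enum):
    # Only "ready" comes from the post; the other states are hypothetical.
    DRAFT = "draft"
    UPLOADING = "uploading"
    STAGING = "staging"
    READY = "ready"
    FAILED = "failed"


ALLOWED = {
    ProjectStatus.DRAFT: {ProjectStatus.UPLOADING},
    ProjectStatus.UPLOADING: {ProjectStatus.STAGING},
    ProjectStatus.STAGING: {ProjectStatus.READY, ProjectStatus.FAILED},
    ProjectStatus.FAILED: {ProjectStatus.UPLOADING},  # allow a re-upload after errors
    ProjectStatus.READY: set(),                       # terminal in this sketch
}


def transition(current: ProjectStatus, target: ProjectStatus) -> ProjectStatus:
    """Validate a requested status change, as a status-transition endpoint might."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```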
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="queue-based-file-staging">Queue-Based File Staging<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-26#queue-based-file-staging" class="hash-link" aria-label="Direct link to Queue-Based File Staging" title="Direct link to Queue-Based File Staging">​</a></h3>
<p>The core of the pipeline is <code>StageFileJob</code> (<code>58ed82726</code>), a queued Laravel job that handles the heavy lifting of getting uploaded files into a usable database structure. Each file is dispatched as its own job, so multi-file uploads can be processed in parallel (given more than one queue worker) without blocking the UI. The job hands off to <code>StagingService</code> (<code>28797e458</code>), which is responsible for:</p>
<ul>
<li><strong>Schema creation</strong>: Each ingestion project gets its own isolated PostgreSQL schema, preventing cross-project data bleed during the staging phase.</li>
<li><strong>Data loading</strong>: Reads uploaded files and bulk-loads rows into the staging schema, handling type inference at the column level.</li>
</ul>
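<p>Column-level type inference can be as simple as picking the narrowest type that every non-empty value in a column parses as. The following is a minimal sketch. It assumes a fixed ladder of <code>bigint</code>, <code>double precision</code>, ISO-format <code>date</code>, then <code>text</code>. The rules actually used by <code>StagingService</code> are not described here and may differ.</p>

```python
from datetime import date


def _parses(parser, value):
    try:
        parser(value)
        return True
    except ValueError:
        return False


# Narrowest first: the first type that every value parses as wins.
LADDER = [
    ("bigint", int),
    ("double precision", float),
    ("date", date.fromisoformat),
]


def infer_column_type(values):
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "text"  # an all-empty column gives no evidence; fall back to text
    for pg_type, parser in LADDER:
        if all(_parses(parser, v) for v in present):
            return pg_type
    return "text"
```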
<p>Alongside staging, we introduced a column and table name sanitizer (<code>aacf41c93</code>) that handles the unglamorous but critical job of cleaning arbitrary user-supplied headers into valid SQL identifiers. It handles reserved word collisions, strips illegal characters, and deduplicates columns — exactly the kind of defensive logic that prevents subtle downstream failures when researchers upload files with headers like <code>"order"</code>, <code>"select"</code>, or <code>"patient id (v2) [final]"</code>.</p>
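<p>The sanitizer's three jobs can be combined into a single pass. This is an illustrative sketch, not the committed code. The reserved-word set is a small subset of PostgreSQL's, and the <code>_col</code>, <code>c_</code>, and <code>_2</code> affixes are assumptions. The sketch does respect one real PostgreSQL limit: by default, identifiers longer than 63 bytes are truncated.</p>

```python
import re

# Illustrative subset of PostgreSQL reserved words; the real list is much longer.
RESERVED = {"order", "select", "from", "where", "group", "table", "user", "limit"}
MAX_LEN = 63  # PostgreSQL's default identifier limit in bytes (NAMEDATALEN - 1)


def sanitize_identifier(raw: str) -> str:
    name = raw.strip().lower()
    name = re.sub(r"[^a-z0-9_]+", "_", name)    # runs of illegal characters -> "_"
    name = re.sub(r"_+", "_", name).strip("_")  # collapse and trim underscores
    if not name:
        name = "col"
    if name[0].isdigit():
        name = "c_" + name  # unquoted identifiers cannot start with a digit
    if name in RESERVED:
        name += "_col"
    return name[:MAX_LEN]  # ASCII-only at this point, so characters == bytes


def sanitize_headers(headers: list[str]) -> list[str]:
    used: set[str] = set()
    result = []
    for header in headers:
        base = sanitize_identifier(header)
        name, n = base, 1
        while name in used:  # deduplicate with _2, _3, ... within the length limit
            n += 1
            suffix = f"_{n}"
            name = base[: MAX_LEN - len(suffix)] + suffix
        used.add(name)
        result.append(name)
    return result
```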
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="frontend-project-list-detail-view-and-multi-file-upload">Frontend: Project List, Detail View, and Multi-File Upload<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-26#frontend-project-list-detail-view-and-multi-file-upload" class="hash-link" aria-label="Direct link to Frontend: Project List, Detail View, and Multi-File Upload" title="Direct link to Frontend: Project List, Detail View, and Multi-File Upload">​</a></h3>
<p>The UI side kept pace with the backend. New React hooks and API bindings (<code>01a657dd0</code>) wrap all the ingestion endpoints, and a project list component gives researchers a dashboard view of their active and completed ingestion projects. The <strong>Upload Files</strong> tab was restructured (<code>a7b2c59d4</code>) to support multi-file selection with per-file status indicators — upload progress, staging status, and any errors surface inline rather than in a toast that disappears.</p>
<p>The project detail view is the centrepiece here: it shows project metadata, file status, and — once the project reaches <strong>ready</strong> — an <strong>Open in Aqueduct</strong> button.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="auto-creation-and-aqueduct-handoff">Auto-Creation and Aqueduct Handoff<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-26#auto-creation-and-aqueduct-handoff" class="hash-link" aria-label="Direct link to Auto-Creation and Aqueduct Handoff" title="Direct link to Auto-Creation and Aqueduct Handoff">​</a></h3>
<p>Two commits tie the lifecycle together neatly. When a project transitions to <code>ready</code> status (all files staged without error), the system automatically creates a staging <strong>Source</strong> record (<code>e0efbb89b</code>) — the entity that Aqueduct uses to know where to pull data from. No manual configuration step required.</p>
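<p>The trigger works in two steps: switch to <code>ready</code> only when every file has staged cleanly, then create the Source exactly once. It might be sketched like this. The per-file status values, the <code>sources</code> dictionary, and the schema naming are hypothetical stand-ins for the Laravel models.</p>

```python
def derive_project_status(file_statuses):
    """Roll per-file staging statuses up to a project status (hypothetical values)."""
    if not file_statuses:
        return "draft"
    if any(s == "error" for s in file_statuses):
        return "failed"
    if all(s == "staged" for s in file_statuses):
        return "ready"
    return "staging"


def on_file_status_change(project_id, file_statuses, sources):
    """Recompute status; create the staging Source once when the project becomes ready."""
    status = derive_project_status(file_statuses)
    if status == "ready":
        # setdefault keeps this idempotent if the "ready" event fires more than once.
        sources.setdefault(project_id, {"schema": f"staging_{project_id}"})
    return status


sources = {}
first = on_file_status_change(7, ["staged", "queued"], sources)   # still staging
second = on_file_status_change(7, ["staged", "staged"], sources)  # ready: Source created
third = on_file_status_change(7, ["staged", "staged"], sources)   # duplicate event
```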
<p>The <strong>Open in Aqueduct</strong> button (<code>fbea80b04</code>) then deep-links into Aqueduct with that source pre-selected, dropping the researcher directly into the ETL mapping workflow with their data already wired up. This is the kind of cross-tool integration that makes the platform feel like a platform rather than a collection of loosely related tools.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="on-the-horizon-abby-20-phase-3">On the Horizon: Abby 2.0 Phase 3<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-26#on-the-horizon-abby-20-phase-3" class="hash-link" aria-label="Direct link to On the Horizon: Abby 2.0 Phase 3" title="Direct link to On the Horizon: Abby 2.0 Phase 3">​</a></h2>
<p>While today's work was all ingestion, the devlog notes from last week signal what's coming next on the AI side. <strong>Abby 2.0 Phase 3</strong> — the Semantic Knowledge Graph — is in active planning. The design calls for a <code>KnowledgeGraphService</code> that traverses <code>concept_ancestor</code> and <code>concept_relationship</code> tables with Redis-backed caching, paired with a <code>DataProfileService</code> that builds a living coverage profile of the institution's CDM: temporal range, domain density, vocabulary completeness, and proactive gap warnings.</p>
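<p><code>concept_ancestor</code> is a precomputed transitive closure, so "traversing" it is usually a single indexed lookup rather than a recursive walk. The sketch below uses the table's real column names, but substitutes an in-memory SQLite table for PostgreSQL. <code>functools.lru_cache</code> stands in for the planned Redis layer. The function name, concept 999001, and the separation levels are hypothetical.</p>

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER,"
    " descendant_concept_id INTEGER, min_levels_of_separation INTEGER,"
    " max_levels_of_separation INTEGER)"
)
# Illustrative rows. OMOP stores every ancestor/descendant pair, including level-0 self rows.
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)",
    [
        (201820, 201820, 0, 0),
        (201820, 201826, 1, 1),  # Type 2 diabetes mellitus under Diabetes mellitus
        (201820, 999001, 2, 2),  # hypothetical grandchild concept
        (201826, 201826, 0, 0),
        (201826, 999001, 1, 1),
    ],
)


@lru_cache(maxsize=10_000)  # stand-in for a Redis cache keyed by the arguments
def descendants(concept_id: int, max_depth: int) -> tuple:
    rows = conn.execute(
        "SELECT descendant_concept_id FROM concept_ancestor"
        " WHERE ancestor_concept_id = ? AND min_levels_of_separation BETWEEN 1 AND ?"
        " ORDER BY descendant_concept_id",
        (concept_id, max_depth),
    ).fetchall()
    return tuple(row[0] for row in rows)  # tuples are immutable, so safe to cache
```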
<p>The goal is to give Abby genuine relational understanding of clinical concepts — so when a researcher asks about a condition with thin data at this institution, she warns them <em>before</em> they build a cohort on a foundation of 12 patients. That work will touch <code>ai/app/knowledge/</code>, the live context pipeline in <code>chroma/live_context.py</code>, and the context assembler. Expect those commits to start landing soon.</p>
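<p>The thin-data warning comes down to counting distinct patients across a set of concepts and comparing the count against a floor. The following is a minimal sketch. The in-memory profile shape, the 100-patient default, and the function name are hypothetical and are not the planned <code>DataProfileService</code> API.</p>

```python
def coverage_warning(concept_ids, persons_by_concept, min_patients=100):
    """Return a warning string if too few distinct patients carry any of the concepts."""
    patients = set()
    for concept_id in concept_ids:
        # Union, not sum: a patient recorded under two concepts counts once.
        patients |= persons_by_concept.get(concept_id, set())
    if len(patients) >= min_patients:
        return None
    return (f"Only {len(patients)} patients have any of these concepts "
            f"(threshold {min_patients}); results may be unstable.")


profile = {201820: {1, 2, 3}, 201826: {2, 3, 4}}
print(coverage_warning([201820, 201826], profile))
```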
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-26#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<ul>
<li><strong>Ingestion error handling</strong>: Surface per-row staging errors back to the UI, and define retry semantics for <code>StageFileJob</code> on transient failures.</li>
<li><strong>Schema lifecycle management</strong>: Staged schemas need a cleanup path — either on project deletion or after successful CDM load in Aqueduct.</li>
<li><strong>Abby Phase 3 kickoff</strong>: <code>KnowledgeGraphService</code> and <code>DataProfileService</code> implementation, starting with the OMOP hierarchy traversal and Redis caching layer.</li>
<li><strong>Staging source permissions</strong>: Review whether auto-created Sources inherit project-level ACLs correctly or need explicit permission wiring.</li>
</ul>
<p>Solid day. The ingestion pipeline has been a missing piece for researchers who want to bring their own data into the platform without going through a manual DBA-assisted ETL setup. Today's work makes that self-service path real.</p>]]></content:encoded>
            <category>development</category>
            <category>frontend</category>
            <category>backend</category>
            <category>database</category>
            <category>ohdsi</category>
            <category>analytics</category>
        </item>
        <item>
            <title><![CDATA[Publication Workflows, Manuscript Generation, and Darkstar Gets a Name]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-03-27</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-03-27</guid>
            <pubDate>Thu, 26 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A massive day on Parthenon with 193 commits landing across the platform. The headlining work: a near-complete publication/manuscript workflow that takes study analyses all the way to a formatted, auto-numbered document preview, plus a long-overdue rename of the R Analytics Runtime to Darkstar — the name it's been running under in Docker all along.]]></description>
            <content:encoded><![CDATA[<p>A massive day on Parthenon with 193 commits landing across the platform. The headlining work: a near-complete publication/manuscript workflow that takes study analyses all the way to a formatted, auto-numbered document preview, plus a long-overdue rename of the R Analytics Runtime to <strong>Darkstar</strong> — the name it's been running under in Docker all along.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="publication-workflow-from-study-results-to-manuscript">Publication Workflow: From Study Results to Manuscript<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#publication-workflow-from-study-results-to-manuscript" class="hash-link" aria-label="Direct link to Publication Workflow: From Study Results to Manuscript" title="Direct link to Publication Workflow: From Study Results to Manuscript">​</a></h2>
<p>The most substantial feature push today was on the <code>publish</code> module, which is rapidly becoming a first-class citizen in the Parthenon platform. The goal is to let researchers go from completed study analyses directly to a publication-ready manuscript — without leaving the platform.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="manuscript-structure-overhaul">Manuscript Structure Overhaul<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#manuscript-structure-overhaul" class="hash-link" aria-label="Direct link to Manuscript Structure Overhaul" title="Direct link to Manuscript Structure Overhaul">​</a></h3>
<p>The section editor previously organized content around analysis <em>types</em> (cohort, characterization, PLP, etc.). That framing made sense from an engineering perspective but doesn't match how manuscripts are actually written. Today's refactor (<code>b7411cd78</code>) replaced that structure with a <strong>research-question-driven manuscript layout</strong> — Introduction, Methods, Results, Discussion — which is how journals and regulatory submissions expect content to be organized.</p>
<p>This is a subtle but important shift: the platform now speaks the language of the researcher, not the pipeline.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="element-toggles-and-section-configurability">Element Toggles and Section Configurability<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#element-toggles-and-section-configurability" class="hash-link" aria-label="Direct link to Element Toggles and Section Configurability" title="Direct link to Element Toggles and Section Configurability">​</a></h3>
<p>Two commits (<code>2efc99095</code>, <code>94bf9eb15</code>) wired up the full toggle system between <code>DocumentConfigurator</code> and <code>SectionEditor</code>. Each section can now independently show or hide tables, narrative text, and diagrams. The configurator acts as the source of truth, propagating toggle state down to the section editors — a clean unidirectional data flow that should make this easy to extend as more element types are added.</p>
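<p>The toggle wiring lives in React components, but the data flow itself is language-neutral. The following Python sketch is hypothetical: <code>SectionToggles</code>, <code>DocumentConfig</code>, and <code>visible_elements</code> are illustrative names, not the actual component API. The configurator owns an immutable toggle record for each section, and section editors only read it.</p>

```python
from dataclasses import dataclass, field, replace

ELEMENT_TYPES = ("tables", "narrative", "diagrams")


@dataclass(frozen=True)
class SectionToggles:
    """Immutable visibility flags for one manuscript section."""
    tables: bool = True
    narrative: bool = True
    diagrams: bool = True


@dataclass
class DocumentConfig:
    """Single source of truth, playing the role of DocumentConfigurator."""
    sections: dict = field(default_factory=dict)

    def toggle(self, section: str, element: str, on: bool) -> None:
        # State changes only happen here; a new frozen record replaces the old one.
        current = self.sections.get(section, SectionToggles())
        self.sections[section] = replace(current, **{element: on})

    def props_for(self, section: str) -> SectionToggles:
        # What a SectionEditor receives: read-only, since the record is frozen.
        return self.sections.get(section, SectionToggles())


def visible_elements(toggles: SectionToggles) -> list:
    """What a section editor would render, derived only from its props."""
    return [e for e in ELEMENT_TYPES if getattr(toggles, e)]
```

<p>Editors never write back. As a result, adding a new element type in this sketch only requires one new field and one new entry in <code>ELEMENT_TYPES</code>.</p>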
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="resultstable-component">ResultsTable Component<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#resultstable-component" class="hash-link" aria-label="Direct link to ResultsTable Component" title="Direct link to ResultsTable Component">​</a></h3>
<p>A new <code>ResultsTable</code> component (<code>c2406012b</code>) handles publication-style rendering of analysis results — think formatted cells, appropriate significant figures, and layout that maps to what you'd see in a journal table. Crucially, tables and figures in the preview are now <strong>auto-numbered</strong> (<code>8a85a80e6</code>), so Table 1, Table 2, Figure 1, etc. update dynamically as sections are toggled on or off. Anyone who's manually renumbered tables in a Word document at midnight before a submission deadline knows why this matters.</p>
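<p>The renumbering rule is easy to state: number tables and figures separately, in document order, and skip anything in a hidden section. Here is a minimal Python sketch of that rule. The <code>Element</code> shape and the <code>number_elements</code> helper are assumptions for illustration, not the <code>ResultsTable</code> implementation.</p>

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str     # "table" or "figure"
    label: str
    section: str  # manuscript section that owns the element


def number_elements(elements: list, visible_sections: set) -> list:
    """Number tables and figures separately, in order, skipping hidden sections."""
    counters = {"table": 0, "figure": 0}
    numbered = []
    for el in elements:
        if el.section not in visible_sections:
            continue  # hidden content consumes no number
        counters[el.kind] += 1
        numbered.append((f"{el.kind.capitalize()} {counters[el.kind]}", el.label))
    return numbered
```

<p>If the Methods section is hidden in this sketch, the first Results table becomes Table 1 without any manual renumbering.</p>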
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="analysis-picker-improvements">Analysis Picker Improvements<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#analysis-picker-improvements" class="hash-link" aria-label="Direct link to Analysis Picker Improvements" title="Direct link to Analysis Picker Improvements">​</a></h3>
<p>The analysis picker (<code>c2406012b</code>) gained two quality-of-life improvements: a <strong>Select All per study</strong> checkbox, and automatic pre-selection of the <code>studyId</code> when navigating to the publish page from a specific study. The latter pairs with a new <strong>Generate Manuscript</strong> button added to the Studies page (<code>f208b2e52</code>) — one click takes you to the publish workflow with your study already in context.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="narrative-generation-and-bug-fixes">Narrative Generation and Bug Fixes<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#narrative-generation-and-bug-fixes" class="hash-link" aria-label="Direct link to Narrative Generation and Bug Fixes" title="Direct link to Narrative Generation and Bug Fixes">​</a></h3>
<p>Two fix commits (<code>dc4d19e05</code>, <code>3b4f21103</code>) addressed real issues surfacing during end-to-end testing of the publish workflow:</p>
<ul>
<li>Study analyses now load with their associated executions, which is required for the publish workflow to have the data it needs to generate content.</li>
<li>Narrative generation is now properly wired end-to-end, 95% confidence intervals are included in result summaries, unlisted analysis types are handled gracefully, and several test failures introduced during the refactor were resolved.</li>
</ul>
<p>These aren't glamorous fixes, but they're the difference between a feature that demos well and one that actually works.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="darkstar-the-r-analytics-runtime-gets-its-name">Darkstar: The R Analytics Runtime Gets Its Name<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#darkstar-the-r-analytics-runtime-gets-its-name" class="hash-link" aria-label="Direct link to Darkstar: The R Analytics Runtime Gets Its Name" title="Direct link to Darkstar: The R Analytics Runtime Gets Its Name">​</a></h2>
<p>The R Analytics Runtime has been called "Darkstar" in Docker configurations for a while, but the System Health admin UI and backend were still referring to it as <code>r</code> or "R Analytics Runtime." Today's work (<code>b3a265ecb</code> and associated devlog) brought everything into alignment.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="backend-and-api">Backend and API<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#backend-and-api" class="hash-link" aria-label="Direct link to Backend and API" title="Direct link to Backend and API">​</a></h3>
<p><code>SystemHealthController.php</code> now uses the service key <code>darkstar</code> (matching the Docker service name) and the display name "Darkstar." The health card message is more informative too — instead of a generic status, it now shows something like <em>"R 4.4.2, 20 HADES packages loaded"</em> at a glance.</p>
<p>The <code>getDarkstarMetrics()</code> method replaces the old <code>getRMetrics()</code> and returns structured package version groups alongside runtime diagnostics (memory usage, JVM status, JDBC connectivity). On the R side, <code>darkstar/api/health.R</code> was bumped to version <code>0.3.0</code> and now enumerates <strong>20 OHDSI HADES packages</strong> and <strong>12 Posit/CRAN infrastructure packages</strong> using <code>utils::packageVersion()</code> with per-package error handling. A missing package therefore surfaces cleanly instead of crashing the health endpoint.</p>
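<p>The per-package error handling is the important detail here: one missing package should leave a gap in the report, not fail the whole endpoint. In R, this is typically done with a <code>tryCatch()</code> around <code>utils::packageVersion()</code>, which raises an error when a package is not installed. The sketch below shows the same pattern in Python using the standard <code>importlib.metadata</code> module. The function name and the group layout are hypothetical.</p>

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(groups: dict) -> dict:
    """Map {group: [package, ...]} to {group: {package: version or None}}.

    Each lookup is isolated, so one missing package never aborts the report.
    """
    report = {}
    for group, names in groups.items():
        report[group] = {}
        for name in names:
            try:
                report[group][name] = version(name)
            except PackageNotFoundError:
                report[group][name] = None  # surfaced as "not installed"
    return report
```

<p>A missing package comes back as <code>None</code>. A UI can render that as "not installed" while every other version still displays.</p>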
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="frontend-darkstarpackagespanel">Frontend: DarkstarPackagesPanel<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#frontend-darkstarpackagespanel" class="hash-link" aria-label="Direct link to Frontend: DarkstarPackagesPanel" title="Direct link to Frontend: DarkstarPackagesPanel">​</a></h3>
<p>The <code>ServiceDetailPage.tsx</code> component gained a new <code>DarkstarPackagesPanel</code> that renders both package groups as 4-column grids showing package name and installed version. The panel is intentionally excluded from the generic nested metrics renderer to avoid double-rendering, while flat metrics (R version, uptime, memory, JVM/JDBC) continue to display in the standard Metrics section.</p>
<p>For anyone debugging environment drift between deployments — "why is CohortMethod 5.2.1 on prod but 5.3.0 on staging?" — having this surfaced directly in the admin UI is a meaningful operational improvement.</p>
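<p>Once each deployment reports its versions, spotting drift comes down to a dictionary diff. Here is a minimal Python sketch. It assumes each environment's report has been flattened into a <code>{package: version}</code> map, and the function name is illustrative.</p>

```python
def version_drift(baseline: dict, candidate: dict) -> dict:
    """Return {package: (baseline_version, candidate_version)} for every mismatch.

    A package installed on only one side appears with None on the other.
    """
    names = sorted(set(baseline) | set(candidate))
    return {
        name: (baseline.get(name), candidate.get(name))
        for name in names
        if baseline.get(name) != candidate.get(name)
    }
```

<p>Comparing a prod map with CohortMethod 5.2.1 against a staging map with 5.3.0 yields the entry <code>"CohortMethod": ("5.2.1", "5.3.0")</code>. A package present in only one environment shows <code>None</code> for the other.</p>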
<p><strong>OHDSI HADES packages tracked:</strong> SqlRender, DatabaseConnector, Andromeda, Cyclops, FeatureExtraction, ResultModelManager, EmpiricalCalibration, ParallelLogger, CohortMethod, PatientLevelPrediction, SelfControlledCaseSeries, EvidenceSynthesis, CohortGenerator, CohortDiagnostics, DeepPatientLevelPrediction, CohortIncidence, Characterization, Strategus, and more.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-27#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>The publish workflow is close to a functional end-to-end demo — the remaining gaps are around export (PDF/DOCX rendering) and integrating narrative generation with live analysis results rather than mocked data. That's the next frontier.</p>
<p>On the Darkstar side, the package version display is a foundation for something more useful: version pinning, environment validation, and potentially automated alerts when package versions drift from a known-good baseline. The data is now there; the tooling around it can follow.</p>
<p>It was a good day to be building outcomes research infrastructure.</p>]]></content:encoded>
            <category>development</category>
            <category>ohdsi</category>
            <category>analytics</category>
            <category>frontend</category>
            <category>backend</category>
            <category>infrastructure</category>
        </item>
        <item>
            <title><![CDATA[The Arrival of Ares to Parthenon]]></title>
            <link>http://localhost:8082/docs/blog/arrival-of-ares</link>
            <guid>http://localhost:8082/docs/blog/arrival-of-ares</guid>
            <pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[If you've worked in the OHDSI ecosystem, you know the pain: Atlas for cohort definitions, Achilles Results Viewer for characterization, a DQD dashboard for data quality, spreadsheets for feasibility assessments, and a prayer that everyone's looking at the same release of the same data. Ares changes that. Today we're announcing Ares v2 — Parthenon's network-level data observatory — a single unified module that replaces the fragmented constellation of OHDSI data characterization tools with 10 purpose-built analytical panels, 60+ API endpoints, and a clinical UI designed for researchers who need answers, not workarounds.]]></description>
            <content:encoded><![CDATA[<p>If you've worked in the OHDSI ecosystem, you know the pain: Atlas for cohort definitions, Achilles Results Viewer for characterization, a DQD dashboard for data quality, spreadsheets for feasibility assessments, and a prayer that everyone's looking at the same release of the same data. Ares changes that. Today we're announcing Ares v2 — Parthenon's network-level data observatory — a single unified module that replaces the fragmented constellation of OHDSI data characterization tools with 10 purpose-built analytical panels, 60+ API endpoints, and a clinical UI designed for researchers who need answers, not workarounds.</p>
<p>This is the biggest feature release in Parthenon's history.</p>
<div style="border-radius:12px;overflow:hidden;margin-bottom:2rem"><img src="http://localhost:8082/docs/img/parthenon-hero.jpg" alt="The Parthenon" style="width:100%;display:block"></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-ares-replaces">What Ares Replaces<a href="http://localhost:8082/docs/blog/arrival-of-ares#what-ares-replaces" class="hash-link" aria-label="Direct link to What Ares Replaces" title="Direct link to What Ares Replaces">​</a></h2>
<p>To appreciate what Ares does, consider what a typical OHDSI site coordinator juggles today:</p>
<ul>
<li><strong>Atlas + WebAPI</strong> for browsing data source reports and Achilles results</li>
<li><strong>Achilles Results Viewer</strong> (an R Shiny app) for characterization dashboards</li>
<li><strong>DQD Dashboard</strong> (another Shiny app, or raw CSVs) for data quality trending</li>
<li><strong>Custom R scripts</strong> for cross-source comparison of concept prevalence</li>
<li><strong>Spreadsheets</strong> for tracking which sources have which domains, when they were last refreshed, and whether they're suitable for a given study</li>
<li><strong>Email threads</strong> for annotating data events and coordinating between data stewards and researchers</li>
<li><strong>No tooling at all</strong> for cost analytics, diversity assessments, or FDA Diversity Action Plan compliance</li>
</ul>
<p>Each tool has its own authentication, its own data model, its own release cycle, and its own way of defining "source." Ares collapses all of this into a single tab within Parthenon's Data Explorer, backed by the same PostgreSQL database, the same RBAC system, and the same API infrastructure that powers every other module.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-10-panels">The 10 Panels<a href="http://localhost:8082/docs/blog/arrival-of-ares#the-10-panels" class="hash-link" aria-label="Direct link to The 10 Panels" title="Direct link to The 10 Panels">​</a></h2>
<p>Ares is organized as a hub with 10 analytical panels, each addressing a distinct research operations question. Here's what we built and why.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="1-network-overview--situational-awareness-in-5-seconds">1. Network Overview — Situational Awareness in 5 Seconds<a href="http://localhost:8082/docs/blog/arrival-of-ares#1-network-overview--situational-awareness-in-5-seconds" class="hash-link" aria-label="Direct link to 1. Network Overview — Situational Awareness in 5 Seconds" title="Direct link to 1. Network Overview — Situational Awareness in 5 Seconds">​</a></h3>
<p>The first thing a data coordinator needs every morning is a status board. Network Overview provides exactly that: one row per data source, with DQ trend sparklines, freshness indicators (color-coded with STALE badges for sources &gt;30 days without a refresh), domain coverage rings, and person counts. An auto-generated alert banner surfaces the three most common operational emergencies — DQ score drops &gt;5%, stale data, and unmapped code spikes — before you even start looking.</p>
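<p>The three alert rules can be written as plain threshold checks. The minimal Python sketch below rests on stated assumptions. The 30-day staleness window comes from the text. Reading "&gt;5%" as a drop of more than 5 percentage points, and treating a 50% jump in unmapped records as a spike, are guesses. <code>source_alerts</code> is a hypothetical helper.</p>

```python
from datetime import date

STALE_AFTER_DAYS = 30        # from the text: STALE after more than 30 days
DQ_DROP_POINTS = 5.0         # assumption: "more than 5%" read as percentage points
UNMAPPED_SPIKE_RATIO = 1.5   # hypothetical: a 50% jump counts as a spike


def source_alerts(name: str, today: date, last_refresh: date,
                  dq_prev: float, dq_now: float,
                  unmapped_prev: int, unmapped_now: int) -> list:
    """Return alert strings for one source, mirroring the three banner rules."""
    alerts = []
    if (today - last_refresh).days > STALE_AFTER_DAYS:
        alerts.append(f"{name}: STALE")
    drop = dq_prev - dq_now
    if drop > DQ_DROP_POINTS:
        alerts.append(f"{name}: DQ score dropped {drop:.1f} points")
    if unmapped_prev and unmapped_now / unmapped_prev > UNMAPPED_SPIKE_RATIO:
        alerts.append(f"{name}: unmapped code spike")
    return alerts
```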
<p>The DQ Radar toggle overlays Kahn framework dimensions (completeness, conformance value, conformance relational, plausibility atemporal, plausibility temporal) as a radar chart per source. Comparing radar "shapes" across sources immediately reveals dimensional weaknesses that aggregate scores hide. A source with 95% overall DQ but 40% plausibility temporal has a very different problem than one with 85% across all dimensions evenly.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="2-concept-comparison--the-question-every-network-study-starts-with">2. Concept Comparison — The Question Every Network Study Starts With<a href="http://localhost:8082/docs/blog/arrival-of-ares#2-concept-comparison--the-question-every-network-study-starts-with" class="hash-link" aria-label="Direct link to 2. Concept Comparison — The Question Every Network Study Starts With" title="Direct link to 2. Concept Comparison — The Question Every Network Study Starts With">​</a></h3>
<p>"How prevalent is Type 2 Diabetes across our network?" is the single most common question in OHDSI network research. Concept Comparison answers it with four view modes:</p>
<ul>
<li><strong>Single Concept</strong>: Bar chart showing rate per 1,000 persons across all sources, with confidence interval error bars</li>
<li><strong>Multi-Concept</strong>: Grouped bar chart comparing 2-5 concepts side-by-side</li>
<li><strong>Attrition Funnel</strong>: TriNetX-style horizontal funnel showing patient attrition as criteria are layered</li>
<li><strong>Temporal</strong>: Line chart tracking prevalence across releases over time</li>
</ul>
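<p>A rate per 1,000 with error bars requires a binomial interval around the underlying proportion. The text does not say which interval method Ares uses. The Python sketch below uses the Wilson score interval as one reasonable choice, and <code>rate_per_1000</code> is a hypothetical name.</p>

```python
import math


def rate_per_1000(cases: int, persons: int, z: float = 1.96):
    """Crude rate per 1,000 persons with a Wilson score interval (95% by default)."""
    p = cases / persons
    denom = 1 + z**2 / persons
    center = (p + z**2 / (2 * persons)) / denom
    half = z * math.sqrt(p * (1 - p) / persons + z**2 / (4 * persons**2)) / denom
    return 1000 * p, 1000 * (center - half), 1000 * (center + half)
```

<p>The Wilson interval stays within the valid 0 to 1,000 range, unlike the simple normal approximation. That matters for rare conditions at small sources.</p>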
<p>The killer feature here is the <strong>Crude / Age-Sex Adjusted toggle</strong>. Comparing a pediatric hospital's diabetes rate against a Medicare claims database using crude rates is meaningless, because the two populations have completely different age and sex structures. Switching to age-sex standardized rates (using the US Census 2020 reference population) removes that difference. Any gap that remains is not explained by age or sex mix. A footnote documents the standardization method for reproducibility.</p>
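<p>Direct standardization is a weighted average: each source's stratum-specific rates are applied to one shared reference population. The minimal Python sketch below shows the arithmetic. The stratum labels, rates, and reference counts are toy values, not US Census 2020 figures, and <code>direct_standardize</code> is a hypothetical helper.</p>

```python
def direct_standardize(stratum_rates: dict, reference_pop: dict) -> float:
    """Directly standardized rate per 1,000.

    stratum_rates: {(age_band, sex): rate per 1,000 observed in one source}
    reference_pop: {(age_band, sex): person count in the reference population}
    Every reference stratum must have a rate in stratum_rates.
    """
    total = sum(reference_pop.values())
    return sum(stratum_rates[s] * n / total for s, n in reference_pop.items())


# Toy reference population and per-source stratum rates (illustrative only).
REFERENCE = {("0-17", "F"): 300, ("65+", "F"): 100}
SOURCE_A = {("0-17", "F"): 1.0, ("65+", "F"): 200.0}
SOURCE_B = {("0-17", "F"): 2.0, ("65+", "F"): 180.0}
```

<p>With this toy data, both sources are weighted by the same age-sex mix and land at 50.75 and 46.5 per 1,000. The remaining difference cannot come from differences in age or sex composition.</p>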
<p>We also added <strong>CDC Benchmark Lines</strong> — when national prevalence data is available, a dashed reference line shows where each source sits relative to the expected rate. And you can compare entire <strong>Concept Sets</strong>, not just individual concepts — "all T2DM medications" across the network in one chart.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="3-dq-history--quality-is-a-trajectory-not-a-snapshot">3. DQ History — Quality is a Trajectory, Not a Snapshot<a href="http://localhost:8082/docs/blog/arrival-of-ares#3-dq-history--quality-is-a-trajectory-not-a-snapshot" class="hash-link" aria-label="Direct link to 3. DQ History — Quality is a Trajectory, Not a Snapshot" title="Direct link to 3. DQ History — Quality is a Trajectory, Not a Snapshot">​</a></h3>
<p>A DQ score at a single point in time tells you almost nothing. Was it always this bad? Did it get worse after the last ETL? Did someone fix the completeness issues from Q3?</p>
<p>DQ History tracks quality over time with four tabs:</p>
<ul>
<li><strong>Trends</strong>: Line chart of overall DQ pass rate per release, with background zones (green &gt;90%, amber 80-90%, red &lt;80%). Click any release point to open a delta table showing every check that changed status.</li>
<li><strong>Heatmap</strong>: Category-by-release grid, color-coded by pass rate. Instantly spot which quality categories are degrading over time.</li>
<li><strong>Cross-Source</strong>: Overlay DQ trend lines from multiple sources on one chart for direct comparison.</li>
<li><strong>SLA</strong>: Admin-only view where data stewards set minimum pass rate targets per DQ category. Compliance bars show actual vs. target with error budget remaining — like an SRE error budget, but for data quality.</li>
</ul>
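<p>The zone boundaries and the error budget translate directly into code. In this minimal Python sketch, the zone assigned to values exactly at 80% and 90% is an assumption, since the text only gives ranges. Both helper names are hypothetical.</p>

```python
def dq_zone(pass_rate: float) -> str:
    """Background zone for an overall DQ pass rate given in percent."""
    if pass_rate > 90:
        return "green"
    if pass_rate >= 80:  # assumption: exactly 80 and exactly 90 count as amber
        return "amber"
    return "red"


def error_budget_remaining(actual: float, target: float) -> float:
    """Share of the allowed failure budget still unspent; negative means breached.

    With a 95% target, 5 points of failures are allowed. An actual pass rate
    of 98% has spent 2 of those 5 points, leaving 0.6 of the budget.
    """
    if target >= 100:
        raise ValueError("a 100% target leaves no error budget to track")
    budget = 100 - target
    spent = 100 - actual
    return (budget - spent) / budget
```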
<p>Each DQ check also gets its own <strong>6-point sparkline</strong> showing its individual pass/fail history. Annotations from team members appear as markers on the trend chart, providing institutional context for data events.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="4-coverage-matrix--what-data-do-you-actually-have">4. Coverage Matrix — What Data Do You Actually Have?<a href="http://localhost:8082/docs/blog/arrival-of-ares#4-coverage-matrix--what-data-do-you-actually-have" class="hash-link" aria-label="Direct link to 4. Coverage Matrix — What Data Do You Actually Have?" title="Direct link to 4. Coverage Matrix — What Data Do You Actually Have?">​</a></h3>
<p>The coverage matrix is a domain-by-source grid that answers the most fundamental question in study design: does this source have the data I need?</p>
<p>Three view modes (record counts, per-person density, and temporal date ranges) give different perspectives. The <strong>Expected vs. Actual toggle</strong> is particularly powerful — it compares what domains a source <em>type</em> (claims vs. EHR vs. registry) should have against what's actually present, flagging gaps as MISS and unexpected domains as BONUS.</p>
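<p>Expected vs. Actual reduces to two set differences. The Python sketch below is illustrative. The expected-domain lists for each source type are made up for the example and are not Ares's actual configuration, and <code>coverage_flags</code> is a hypothetical helper.</p>

```python
# Hypothetical expectations per source type; Ares's real configuration may differ.
EXPECTED_DOMAINS = {
    "claims": {"condition_occurrence", "drug_exposure", "procedure_occurrence",
               "visit_occurrence", "cost"},
    "ehr": {"condition_occurrence", "drug_exposure", "procedure_occurrence",
            "visit_occurrence", "measurement", "observation"},
}


def coverage_flags(source_type: str, present_domains) -> dict:
    """Flag expected-but-absent domains as MISS and unexpected ones as BONUS."""
    expected = EXPECTED_DOMAINS[source_type]
    present = set(present_domains)
    return {
        "MISS": sorted(expected - present),
        "BONUS": sorted(present - expected),
    }
```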
<p>The observation_period column gets a gold accent border because it's the single most important domain for study design — everything downstream depends on it.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="5-feasibility--can-your-network-support-this-study">5. Feasibility — Can Your Network Support This Study?<a href="http://localhost:8082/docs/blog/arrival-of-ares#5-feasibility--can-your-network-support-this-study" class="hash-link" aria-label="Direct link to 5. Feasibility — Can Your Network Support This Study?" title="Direct link to 5. Feasibility — Can Your Network Support This Study?">​</a></h3>
<p>Feasibility assessment is where Ares goes from descriptive to prescriptive. Define your study criteria — required domains, concepts, visit types, date ranges, minimum patient count — and Ares evaluates every source against them.</p>
<p>Results include per-criterion scores with weighted composite scoring (domain 20%, concept 30%, visit 15%, date 15%, patient 20%) and a clear ELIGIBLE/INELIGIBLE verdict. But the real value is in the <strong>Impact Analysis</strong> waterfall chart, which shows which single criterion eliminates the most sources. When you need to relax a constraint to reach your enrollment target, this tells you which constraint to relax.</p>
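<p>The weights above are enough to reproduce the composite score. The minimal Python sketch below assumes each criterion is scored from 0 to 1. It also treats Impact Analysis as a count of how many sources fail each criterion. Both are interpretations, not documented behavior, and the function names are hypothetical.</p>

```python
# Weights quoted in the post; the 0-1 per-criterion scale is an assumption.
WEIGHTS = {"domain": 0.20, "concept": 0.30, "visit": 0.15, "date": 0.15, "patient": 0.20}


def composite_score(criterion_scores: dict) -> float:
    """Weighted composite on a 0-100 scale from per-criterion scores in [0, 1]."""
    return 100 * sum(WEIGHTS[c] * criterion_scores[c] for c in WEIGHTS)


def impact_analysis(results: dict) -> list:
    """Rank criteria by how many sources fail them.

    results maps source name to {criterion: passed}. Criteria that knock out
    the most sources come first; they are the first candidates to relax.
    """
    failures = {c: 0 for c in WEIGHTS}
    for checks in results.values():
        for criterion, passed in checks.items():
            if not passed:
                failures[criterion] += 1
    return sorted(failures.items(), key=lambda item: item[1], reverse=True)
```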
<p>The <strong>CONSORT Flow</strong> diagram visualizes progressive source exclusion through each criterion gate — the same format used in clinical trial publications, now applied to site selection.</p>
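<p>A CONSORT-style flow differs from Impact Analysis in one respect. The gates are applied in order, so each source is counted only at the first gate it fails. Here is a minimal Python sketch with a hypothetical <code>consort_flow</code> helper and an assumed input shape.</p>

```python
def consort_flow(sources: dict, gates: list) -> list:
    """Apply criteria gates in order, recording who is excluded at each step.

    sources: {name: {criterion: passed}}; gates: ordered criterion names.
    Returns [(gate, remaining_count, excluded_names)].
    """
    remaining = set(sources)
    flow = []
    for gate in gates:
        excluded = sorted(s for s in remaining if not sources[s][gate])
        remaining -= set(excluded)
        flow.append((gate, len(remaining), excluded))
    return flow
```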
<p>And for sources that pass feasibility, the <strong>Patient Arrival Forecast</strong> projects monthly patient accrual with confidence intervals, showing when you'll reach your target enrollment. It's the difference between "this source is eligible" and "this source will get you 500 patients by September."</p>
<p>Criteria sets can be saved as <strong>templates</strong> and shared across the research team — define your study's requirements once, reuse them as the network evolves.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="6-diversity--fda-diversity-action-plans-built-in">6. Diversity — FDA Diversity Action Plans Built In<a href="http://localhost:8082/docs/blog/arrival-of-ares#6-diversity--fda-diversity-action-plans-built-in" class="hash-link" aria-label="Direct link to 6. Diversity — FDA Diversity Action Plans Built In" title="Direct link to 6. Diversity — FDA Diversity Action Plans Built In">​</a></h3>
<p>The FDA's 2024 draft guidance on Diversity Action Plans raised the bar for clinical trial enrollment. Sponsors are expected to set quantitative enrollment goals by demographic group. The sites and data networks that support them therefore need to show, with numbers, that their populations can meet those goals. Ares provides this out of the box.</p>
<p>The <strong>Overview</strong> tab shows Simpson's Diversity Index per source (0-1 scale, higher = more diverse), with gender/race/ethnicity breakdowns and benchmark overlay lines. The <strong>DAP Gap</strong> tab lets you set enrollment targets by demographic dimension and see which sources meet or miss them in a red/green matrix.</p>
<p>The <strong>Geographic</strong> tab goes deeper: state-level distribution bars, number of states covered, and — critically — an <strong>Area Deprivation Index (ADI) histogram</strong> showing socioeconomic representation. A network that covers 30 states but only draws from affluent ZIP codes isn't truly diverse. The ADI data quantifies this.</p>
<p><strong>Pooled</strong> view lets you select multiple sources and see combined demographics across the pooled population — essential for multi-site study planning.</p>
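<p>Simpson's index is commonly computed in its Gini-Simpson form: one minus the sum of squared group proportions. The text does not say which variant Ares uses, so treat the Python sketch below as an assumption. The sketch also shows why the Pooled view matters, because combining sources before computing the index can produce a higher score than any single source. The function names are hypothetical.</p>

```python
from collections import Counter


def simpson_diversity(counts: dict) -> float:
    """Gini-Simpson index: 1 - sum(p_i ** 2) over demographic groups.

    0 means everyone is in one group; with k groups the maximum is 1 - 1/k.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - sum((n / total) ** 2 for n in counts.values())


def pooled(*per_source_counts: dict) -> dict:
    """Combine group counts across sources, as in the Pooled view."""
    combined = Counter()
    for counts in per_source_counts:
        combined.update(counts)
    return dict(combined)
```

<p>In this sketch, two sources that each serve a single group score 0 individually and 0.5 when pooled.</p>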
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="7-releases--version-control-for-data">7. Releases — Version Control for Data<a href="http://localhost:8082/docs/blog/arrival-of-ares#7-releases--version-control-for-data" class="hash-link" aria-label="Direct link to 7. Releases — Version Control for Data" title="Direct link to 7. Releases — Version Control for Data">​</a></h3>
<p>Every ETL run produces a new release of a data source. Ares tracks these with per-source release cards showing CDM version, vocabulary version, ETL version, and notes. Each card has an expandable <strong>diff panel</strong> showing what changed: person count deltas, record count deltas, DQ score changes, vocabulary version changes, and domain-level deltas.</p>
<p>The <strong>Swimlane</strong> timeline puts all sources on one horizontal axis with release dots positioned by date — immediately revealing which sources are updated regularly and which are falling behind. The <strong>Calendar</strong> view (GitHub contributions-style heatmap) shows release density by day across the network.</p>
<p>ETL provenance metadata — who ran it, what code version, how long it took — is captured when available, providing an audit trail for regulatory and reproducibility purposes.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="8-unmapped-codes--ai-assisted-vocabulary-remediation">8. Unmapped Codes — AI-Assisted Vocabulary Remediation<a href="http://localhost:8082/docs/blog/arrival-of-ares#8-unmapped-codes--ai-assisted-vocabulary-remediation" class="hash-link" aria-label="Direct link to 8. Unmapped Codes — AI-Assisted Vocabulary Remediation" title="Direct link to 8. Unmapped Codes — AI-Assisted Vocabulary Remediation">​</a></h3>
<p>Unmapped source codes are the single biggest data quality problem in OMOP CDM implementations. Ares prioritizes them using an <strong>impact score</strong> (record count multiplied by domain weight — condition codes weighted 1.0, drug 0.9, procedure 0.8) so you focus mapping effort where it matters most.</p>
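<p>The impact score is the record count multiplied by a domain weight. In the minimal Python sketch below, the three weights come from the text. The 0.5 default for other domains, the row shape, and the example source codes are hypothetical.</p>

```python
# Weights quoted in the post; the 0.5 default for other domains is hypothetical.
DOMAIN_WEIGHTS = {"condition": 1.0, "drug": 0.9, "procedure": 0.8}


def impact_score(record_count: int, domain: str) -> float:
    return record_count * DOMAIN_WEIGHTS.get(domain, 0.5)


def prioritize(unmapped: list) -> list:
    """Sort (source_code, domain, record_count) rows by descending impact."""
    return sorted(unmapped, key=lambda row: impact_score(row[2], row[1]), reverse=True)
```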
<p>The <strong>Pareto chart</strong> makes the 80/20 pattern visible: a small fraction of unmapped codes, often around the top 20%, typically accounts for 80% or more of all unmapped records. The <strong>Treemap</strong> view groups unmapped codes by vocabulary, revealing whether the problem is concentrated in a single vocabulary or spread across many.</p>
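<p>The Pareto view comes down to a single number: how many of the top codes are needed to cover a given share of unmapped records. Here is a minimal Python sketch, with <code>pareto_cutoff</code> as a hypothetical name.</p>

```python
def pareto_cutoff(record_counts: list, share: float = 0.8) -> int:
    """Smallest number of top codes whose records reach `share` of the total."""
    counts = sorted(record_counts, reverse=True)
    total = sum(counts)
    running = 0
    for i, n in enumerate(counts, start=1):
        running += n
        if running >= share * total:
            return i
    return len(counts)
```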
<p>The standout feature is <strong>AI Mapping Suggestions</strong>: expand any unmapped code row to see the top 5 standard concept suggestions ranked by confidence (0-100%), powered by pgvector concept embedding similarity. Click Accept to stage a mapping — it doesn't write to the CDM directly; an admin must promote approved mappings. This is the same AI mapping infrastructure that powers Parthenon's Aqueduct ETL module, now integrated directly into the data quality workflow.</p>
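<p>Ranking suggestions by embedding similarity is a nearest-neighbor search. In pgvector, this is usually an <code>ORDER BY</code> on the cosine-distance operator (<code>&lt;=&gt;</code>). The Python sketch below shows the same ranking in memory. The toy vectors, the <code>suggest</code> helper, and the mapping from cosine similarity to a 0-100% confidence are assumptions for illustration.</p>

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest(source_vec: list, concept_vecs: dict, k: int = 5) -> list:
    """Top-k (concept_id, confidence %) pairs ranked by cosine similarity."""
    scored = [(cid, cosine(source_vec, vec)) for cid, vec in concept_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    # Assumption: confidence is the similarity clipped at 0 and scaled to 0-100.
    return [(cid, round(100 * max(sim, 0.0), 1)) for cid, sim in scored[:k]]
```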
<p>Exporting in <strong>Usagi-compatible CSV format</strong> lets teams that use OHDSI's standard mapping tool bring Ares's prioritized list straight into their existing workflows.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="9-annotations--institutional-memory-for-data-events">9. Annotations — Institutional Memory for Data Events<a href="http://localhost:8082/docs/blog/arrival-of-ares#9-annotations--institutional-memory-for-data-events" class="hash-link" aria-label="Direct link to 9. Annotations — Institutional Memory for Data Events" title="Direct link to 9. Annotations — Institutional Memory for Data Events">​</a></h3>
<p>Data events happen constantly: ETL runs complete, schema changes deploy, quality scores drop, researchers discover unexpected patterns. Without a structured way to capture this context, institutional knowledge lives in email threads and Slack messages that nobody can find six months later.</p>
<p>Ares Annotations provides a structured note system with four tag types:</p>
<ul>
<li><strong>Data Event</strong> (teal) — something happened in the data</li>
<li><strong>Research Note</strong> (gold) — researcher observation or insight</li>
<li><strong>Action Item</strong> (crimson) — something that needs to be done</li>
<li><strong>System</strong> (indigo) — auto-generated by the platform</li>
</ul>
<p>Annotations support <strong>threaded discussions</strong> (one level of nesting) for data steward-to-researcher conversations, and can be created directly from chart interactions — click a data point on a DQ trend chart and add context without leaving the visualization.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="10-cost-analysis--healthcare-economics-at-network-scale">10. Cost Analysis — Healthcare Economics at Network Scale<a href="http://localhost:8082/docs/blog/arrival-of-ares#10-cost-analysis--healthcare-economics-at-network-scale" class="hash-link" aria-label="Direct link to 10. Cost Analysis — Healthcare Economics at Network Scale" title="Direct link to 10. Cost Analysis — Healthcare Economics at Network Scale">​</a></h3>
<p>Cost data in OMOP CDM is notoriously tricky. The <code>cost</code> table contains multiple cost types (charged, paid, allowed) that can differ by 3-10x, and mixing them in the same analysis is the #1 cost study error. Ares addresses this head-on with a <strong>cost type filter</strong> that applies globally across all cost views, with an amber warning banner when multiple types exist.</p>
<p>Six tabs cover the full cost analytics workflow: summary cards with Per Patient Per Year (PPPY) metrics, box-and-whisker distributions per domain (revealing the skewness that averages hide), care setting breakdowns, cross-source comparisons, top cost driver concepts, and monthly trends.</p>
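<p>The cost-type rule is easy to enforce in code: aggregate each cost type separately and never add types together. Here is a minimal Python sketch of PPPY under that rule. The row shape and helper name are hypothetical, and in practice person-years would be derived from observation periods.</p>

```python
from collections import defaultdict


def pppy_by_cost_type(cost_rows, person_years: float) -> dict:
    """Per Patient Per Year cost, kept separate for each cost type.

    cost_rows yields (cost_type, amount) pairs. Charged, allowed, and paid
    amounts are never summed together.
    """
    totals = defaultdict(float)
    for cost_type, amount in cost_rows:
        totals[cost_type] += amount
    return {cost_type: total / person_years for cost_type, total in totals.items()}
```

<p>If the result has more than one key, the data mixes cost types. That is the situation the amber warning banner is meant to flag.</p>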
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="60-api-endpoints-one-authentication-layer">60+ API Endpoints, One Authentication Layer<a href="http://localhost:8082/docs/blog/arrival-of-ares#60-api-endpoints-one-authentication-layer" class="hash-link" aria-label="Direct link to 60+ API Endpoints, One Authentication Layer" title="Direct link to 60+ API Endpoints, One Authentication Layer">​</a></h2>
<p>Every panel is backed by a RESTful API under <code>/api/v1/</code>, split into network-scoped endpoints (cross-source analytics) and source-scoped endpoints (per-source detail). All endpoints require Sanctum authentication and RBAC permission checks — no public access to clinical data characterization.</p>
<p>Network-scoped endpoints include:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /network/ares/overview              — Network health KPIs</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /network/ares/alerts                — Auto-generated alerts</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /network/ares/compare               — Single concept prevalence</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /network/ares/compare/standardized  — Age-sex adjusted rates</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /network/ares/coverage              — Domain x source matrix</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /network/ares/diversity             — Demographics + Simpson's index</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /network/ares/diversity/geographic  — State distribution + ADI</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">POST /network/ares/diversity/dap-check   — FDA DAP gap analysis</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">POST /network/ares/feasibility           — Run assessment</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /network/ares/cost/compare          — Cross-source cost comparison</span><br></span></code></pre></div></div>
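<p>The <code>/compare/standardized</code> and <code>/diversity</code> endpoints above report age-sex adjusted rates and a Simpson's index. The post does not say which standardization method, reference population, or Simpson variant Ares uses, so the sketch below assumes direct standardization and the Gini-Simpson form (1 - Σp²). The type and function names are hypothetical.</p>

```typescript
// One age-sex stratum: events and persons observed in a data source, plus
// the number of persons in the same stratum of the chosen reference population.
interface AgeSexStratum {
  cases: number;
  persons: number;
  standardPersons: number;
}

// Direct standardization: weight each stratum-specific rate by that stratum's
// share of the reference population and sum. The result is the rate the source
// would show if its age-sex mix matched the reference population.
function directlyStandardizedRate(strata: AgeSexStratum[], perPersons = 1000): number {
  const standardTotal = strata.reduce((sum, s) => sum + s.standardPersons, 0);
  if (standardTotal <= 0) throw new Error("reference population is empty");
  let adjusted = 0;
  for (const s of strata) {
    if (s.persons <= 0) throw new Error("stratum has no persons; its rate is undefined");
    adjusted += (s.cases / s.persons) * (s.standardPersons / standardTotal);
  }
  return adjusted * perPersons;
}

// Gini-Simpson diversity, 1 - sum(p_i^2), over category counts
// (for example persons per race or ethnicity group). 0 means everyone is in
// one category; values approach 1 as people spread evenly over many categories.
function giniSimpsonIndex(counts: number[]): number {
  const total = counts.reduce((sum, c) => sum + c, 0);
  if (total === 0) return 0;
  return 1 - counts.reduce((acc, c) => acc + (c / total) * (c / total), 0);
}
```

<p>Standardizing every source to the same reference population removes differences in age-sex mix, so adjusted rates can be compared across sources where crude rates could mislead.</p>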
<p>Source-scoped endpoints, more than 30 per source, cover DQ history, unmapped codes with AI mapping suggestions, cost analytics, release management, annotations, and more.</p>
<p>Computationally expensive operations, such as age-sex standardization, concept set comparisons, and patient arrival forecasts, are served by rate-limited (throttled) endpoints.</p>
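<p>The post does not describe how throttling is signalled to clients. Assuming Parthenon uses Laravel's standard throttle middleware, a throttled request receives HTTP 429 with a <code>Retry-After</code> header in seconds. A hypothetical client-side helper for choosing the retry delay could look like this:</p>

```typescript
// Choose how long to wait before retrying a throttled API call.
// Assumes throttling is signalled with HTTP 429 and, optionally, a
// Retry-After header in seconds. HTTP-date values are not parsed here;
// they fall through to exponential backoff.
// Returns null when the caller should stop retrying.
function retryDelayMs(
  httpStatus: number,
  retryAfterHeader: string | null,
  attempt: number, // 1 for the first retry
  maxAttempts = 3,
  baseDelayMs = 1000,
  maxDelayMs = 30000,
): number | null {
  if (httpStatus !== 429 || attempt > maxAttempts) return null;
  const trimmed = retryAfterHeader === null ? "" : retryAfterHeader.trim();
  if (trimmed !== "") {
    const seconds = Number(trimmed);
    if (isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  // Exponential backoff: base, 2x base, 4x base, and so on, capped at maxDelayMs.
  return Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt - 1));
}
```

<p>A caller sleeps for the returned delay and resends the request, giving up when the helper returns <code>null</code>. Honoring <code>Retry-After</code> first avoids hammering an endpoint that is already busy with an expensive computation.</p>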
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="role-based-access">Role-Based Access<a href="http://localhost:8082/docs/blog/arrival-of-ares#role-based-access" class="hash-link" aria-label="Direct link to Role-Based Access" title="Direct link to Role-Based Access">​</a></h2>
<p>Ares respects Parthenon's RBAC hierarchy:</p>
<table><thead><tr><th>Capability</th><th style="text-align:center">Viewer</th><th style="text-align:center">Researcher</th><th style="text-align:center">Data Steward</th><th style="text-align:center">Admin</th></tr></thead><tbody><tr><td>View all panels</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td></tr><tr><td>Run feasibility assessments</td><td style="text-align:center">-</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td></tr><tr><td>Create annotations</td><td style="text-align:center">-</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td></tr><tr><td>Accept AI mapping suggestions</td><td style="text-align:center">-</td><td style="text-align:center">-</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td></tr><tr><td>Set DQ SLA targets</td><td style="text-align:center">-</td><td style="text-align:center">-</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td></tr><tr><td>Promote mappings to CDM</td><td style="text-align:center">-</td><td style="text-align:center">-</td><td style="text-align:center">-</td><td style="text-align:center">Yes</td></tr></tbody></table>
<p>New users receive the <code>viewer</code> role by default: they can see every panel but cannot modify anything. This follows Parthenon's principle of least privilege.</p>
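<p>Because each role in the table has every capability of the roles to its left, the matrix reduces to an ordered role list plus a minimum role per capability. The sketch below encodes the table that way. Apart from <code>viewer</code>, the role identifiers, the capability keys, and <code>canPerform</code> are hypothetical; Parthenon's actual authorization code is not shown in the post.</p>

```typescript
// Roles from least to most privileged, mirroring the table's columns.
const ROLE_ORDER = ["viewer", "researcher", "data_steward", "admin"] as const;
type AresRole = (typeof ROLE_ORDER)[number];

type AresCapability =
  | "view_panels"
  | "run_feasibility"
  | "create_annotation"
  | "accept_ai_mapping"
  | "set_dq_sla"
  | "promote_mapping";

// Minimum role required for each capability, read off the table rows.
const MINIMUM_ROLE: Record<AresCapability, AresRole> = {
  view_panels: "viewer",
  run_feasibility: "researcher",
  create_annotation: "researcher",
  accept_ai_mapping: "data_steward",
  set_dq_sla: "data_steward",
  promote_mapping: "admin",
};

// New accounts start at the least-privileged role.
const DEFAULT_ROLE: AresRole = "viewer";

// A role may use a capability if it ranks at or above the required role.
function canPerform(role: AresRole, capability: AresCapability): boolean {
  return ROLE_ORDER.indexOf(role) >= ROLE_ORDER.indexOf(MINIMUM_ROLE[capability]);
}
```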
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-ares">Why "Ares"?<a href="http://localhost:8082/docs/blog/arrival-of-ares#why-ares" class="hash-link" aria-label="Direct link to Why &quot;Ares&quot;?" title="Direct link to Why &quot;Ares&quot;?">​</a></h2>
<p>In Greek mythology, Ares is the god of war — but also of courage, strategy, and the willingness to confront hard truths. In OHDSI, data characterization is exactly that: confronting the hard truths about your data before you bet a clinical study on it. A network overview that hides quality problems isn't helping anyone. A feasibility assessment that ignores demographic bias produces misleading results. Ares doesn't sugarcoat — it shows you the DQ radar with its lopsided dimensions, the unmapped codes with their Pareto distribution, the diversity gaps with their ADI histograms.</p>
<p>The name also fits architecturally. In the Parthenon — both the building and the platform — Ares stands alongside Athena (wisdom, represented by our Abby AI assistant), Apollo (prediction, represented by the analytics engine), and Asclepius (healing, represented by the clinical data model). Each deity governs a domain. Ares governs the hard operational truths that make everything else possible.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-this-means-for-the-ohdsi-community">What This Means for the OHDSI Community<a href="http://localhost:8082/docs/blog/arrival-of-ares#what-this-means-for-the-ohdsi-community" class="hash-link" aria-label="Direct link to What This Means for the OHDSI Community" title="Direct link to What This Means for the OHDSI Community">​</a></h2>
<p>Ares v2 in Parthenon represents something that hasn't existed in the OHDSI ecosystem before: a unified, multi-source data observatory with modern web UI, AI-assisted mapping, standardized rate comparisons, feasibility assessment with arrival forecasting, FDA DAP compliance checking, cost analytics, and institutional annotation — all in one authenticated application with role-based access control.</p>
<p>The individual capabilities aren't new to the community. Achilles has characterized data for years. DQD has tracked quality. Atlas has browsed results. What's new is having all of it in one place, backed by a single API, with cross-source analytics that work at network scale rather than one-source-at-a-time.</p>
<p>For network study coordinators: you no longer need five tools and three spreadsheets to answer "which sites should participate in this study."</p>
<p>For data stewards: you can track quality trajectories, set SLA targets, and monitor unmapped code remediation in the same interface where researchers browse characterization results.</p>
<p>For researchers: feasibility assessment with patient arrival forecasting means you can make quantitative enrollment projections, not just "this source has enough patients."</p>
<p>For compliance teams: FDA Diversity Action Plan gap analysis is built in, with geographic and socioeconomic diversity metrics that go beyond simple demographic breakdowns.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/arrival-of-ares#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>Ares v2 ships with the full 10-panel suite, but there's more on the roadmap:</p>
<ol>
<li><strong>Automated quality alerting</strong> — Push notifications (email, Slack) when DQ scores drop below SLA targets or sources go stale</li>
<li><strong>Federated Ares</strong> — Cross-institution characterization without moving data, leveraging Parthenon's federated study framework</li>
<li><strong>Longitudinal concept tracking</strong> — Automated detection of concept prevalence anomalies (sudden spikes or drops that may indicate coding practice changes or ETL errors)</li>
<li><strong>Cost modeling</strong> — Predictive cost modeling for study budgeting based on historical cost distributions and enrollment projections</li>
</ol>
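<p>Longitudinal concept tracking is still on the roadmap, so the following is not Parthenon code. One possible approach compares each month's prevalence with the median of a trailing window, scaled by the median absolute deviation (MAD), which keeps a single earlier outlier from masking later ones. The function names, the 12-month window, and the 3.5 cutoff (a common choice for robust z-scores) are illustrative assumptions.</p>

```typescript
// Median of a non-empty list; the input is copied, not mutated.
function medianOf(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Flag months whose concept prevalence departs sharply from the trailing
// window, using a robust z-score: |x - median| / (1.4826 * MAD).
// Returns the indices of the flagged months.
function flagPrevalenceAnomalies(monthlyRates: number[], windowSize = 12, threshold = 3.5): number[] {
  const flagged: number[] = [];
  for (let i = windowSize; i < monthlyRates.length; i++) {
    const history = monthlyRates.slice(i - windowSize, i);
    const center = medianOf(history);
    const mad = medianOf(history.map((v) => Math.abs(v - center)));
    const deviation = Math.abs(monthlyRates[i] - center);
    // 1.4826 * MAD estimates the standard deviation for normally distributed data.
    const scale = 1.4826 * mad;
    // A perfectly flat history has MAD 0; treat any change from it as anomalous.
    const anomalous = scale === 0 ? deviation > 0 : deviation / scale > threshold;
    if (anomalous) flagged.push(i);
  }
  return flagged;
}
```

<p>A flagged month is a prompt to check for a coding practice change or an ETL error, not proof of either.</p>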
<p>Ares is live now at <a href="https://parthenon.acumenus.net/" target="_blank" rel="noopener noreferrer">parthenon.acumenus.net</a> under Data Explorer &gt; Ares. Log in, click the tab, and see your network's data like you've never seen it before.</p>
<hr>
<p><em>Ares v2 was developed as part of Parthenon's mission to replace the fragmented OHDSI tool ecosystem with a single, unified platform for outcomes research. For questions, feedback, or feature requests, reach out to the Acumenus team.</em></p>]]></content:encoded>
            <category>ares</category>
            <category>ohdsi</category>
            <category>data-quality</category>
            <category>characterization</category>
            <category>network-analytics</category>
            <category>milestone</category>
        </item>
        <item>
            <title><![CDATA[Achilles Reliability Hardening: A Big Day for OHDSI Analytics]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-03-25</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-03-25</guid>
            <pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Today was one of those satisfying days where two major workstreams converged: we pushed the Ares data quality module from skeleton to a fully featured analytics suite with four distinct intelligence phases, and we permanently fixed a cluster of compounding bugs that had been making Achilles characterization runs fragile on large real-world datasets. Both efforts move Parthenon meaningfully closer to being a production-grade OHDSI research platform.]]></description>
            <content:encoded><![CDATA[<p>Today was one of those satisfying days where two major workstreams converged: we pushed the Ares data quality module from skeleton to a fully featured analytics suite with four distinct intelligence phases, and we permanently fixed a cluster of compounding bugs that had been making Achilles characterization runs fragile on large real-world datasets. Both efforts move Parthenon meaningfully closer to being a production-grade OHDSI research platform.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ares-parity-milestone-from-stub-to-suite">Ares Parity+ Milestone: From Stub to Suite<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#ares-parity-milestone-from-stub-to-suite" class="hash-link" aria-label="Direct link to Ares Parity+ Milestone: From Stub to Suite" title="Direct link to Ares Parity+ Milestone: From Stub to Suite">​</a></h2>
<p>The headline work today was the completion of the <strong>Ares Parity+ milestone</strong> — a multi-phase build that brings Ares data quality analytics into Parthenon as a first-class citizen. The full design spec and devlog were committed alongside the code (<code>docs(ares): add devlog and design specs</code>), so future contributors have a clear paper trail for every architectural decision.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="backend-foundation">Backend Foundation<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#backend-foundation" class="hash-link" aria-label="Direct link to Backend Foundation" title="Direct link to Backend Foundation">​</a></h3>
<p>The backend work started with <code>AresController</code>, wiring up release and annotation API routes, and the <code>ares:backfill-releases</code> Artisan command for migrating legacy release data into the new schema. These two pieces together mean Parthenon can ingest historical Ares output <em>and</em> track new releases going forward without any manual data surgery.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="frontend-shell-hub-dashboard-releases--annotations">Frontend Shell: Hub Dashboard, Releases &amp; Annotations<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#frontend-shell-hub-dashboard-releases--annotations" class="hash-link" aria-label="Direct link to Frontend Shell: Hub Dashboard, Releases &amp; Annotations" title="Direct link to Frontend Shell: Hub Dashboard, Releases &amp; Annotations">​</a></h3>
<p>The first frontend phase (<code>feat: add Ares tab frontend</code>) established the hub dashboard, releases list, and annotations views. This is the scaffolding everything else hangs off — a consistent navigation frame and data-loading pattern that the subsequent phase components slot into cleanly.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="phase-2--quality-intelligence">Phase 2 — Quality Intelligence<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#phase-2--quality-intelligence" class="hash-link" aria-label="Direct link to Phase 2 — Quality Intelligence" title="Direct link to Phase 2 — Quality Intelligence">​</a></h3>
<p>Phase 2 (<code>feat(ares): implement Phase 2 Quality Intelligence</code>) delivers the analytical meat that Ares users expect: DQ history trending, unmapped source codes exploration, and domain continuity checks. These views surface the data quality signals that are otherwise buried in raw Ares JSON exports, making them actionable directly inside Parthenon rather than requiring a separate Ares UI instance.</p>
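<p>The post does not define what a domain continuity check measures. One plausible reading, assumed in the sketch below, is to find months with no records inside a domain's otherwise populated date range, a common symptom of an ETL load that skipped a period. The function name and input shape are hypothetical.</p>

```typescript
// Given monthly record counts for one CDM domain keyed "YYYY-MM", list the
// months between the first and last populated month that have no records.
function findContinuityGaps(countsByMonth: Record<string, number>): string[] {
  const populated = Object.keys(countsByMonth)
    .filter((month) => countsByMonth[month] > 0)
    .sort(); // zero-padded "YYYY-MM" strings sort chronologically
  if (populated.length === 0) return [];

  // Convert "YYYY-MM" to a running month number and back.
  const toIndex = (ym: string): number => {
    const parts = ym.split("-");
    return Number(parts[0]) * 12 + (Number(parts[1]) - 1);
  };
  const fromIndex = (idx: number): string => {
    const year = Math.floor(idx / 12);
    const month = (idx % 12) + 1;
    return year + "-" + (month < 10 ? "0" : "") + month;
  };

  const gaps: string[] = [];
  const last = toIndex(populated[populated.length - 1]);
  for (let idx = toIndex(populated[0]); idx <= last; idx++) {
    const key = fromIndex(idx);
    // Months that are absent or recorded as 0 both count as gaps.
    if (!(countsByMonth[key] > 0)) gaps.push(key);
  }
  return gaps;
}
```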
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="phase-3--network-intelligence">Phase 3 — Network Intelligence<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#phase-3--network-intelligence" class="hash-link" aria-label="Direct link to Phase 3 — Network Intelligence" title="Direct link to Phase 3 — Network Intelligence">​</a></h3>
<p>Phase 3 (<code>feat(ares): implement Phase 3 Network Intelligence</code>) adds the collaborative research layer: site comparison, population coverage metrics, demographic diversity analysis, and feasibility assessment. This is particularly valuable for multi-site OHDSI network studies where understanding <em>which</em> sites have sufficient data for a given research question is half the battle.</p>
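<p>As a minimal illustration of a site-level feasibility screen, assume each site reports a patient count per study criterion and the study sets a minimum for each. The hypothetical function below lists qualifying sites and records where the others fall short. It is not Ares's feasibility algorithm, which the post does not describe.</p>

```typescript
interface SiteCriterionCounts {
  siteId: string;
  counts: Record<string, number>; // patients meeting each criterion, by label
}

interface FeasibilityResult {
  eligible: string[];
  shortfalls: Record<string, string[]>; // siteId -> criteria below minimum
}

// A site is eligible only if every criterion meets its minimum count.
function assessFeasibility(
  sites: SiteCriterionCounts[],
  minimums: Record<string, number>,
): FeasibilityResult {
  const result: FeasibilityResult = { eligible: [], shortfalls: {} };
  for (const site of sites) {
    // A criterion the site did not report compares as false and counts as missed.
    const missed = Object.keys(minimums).filter(
      (criterion) => !(site.counts[criterion] >= minimums[criterion]),
    );
    if (missed.length === 0) result.eligible.push(site.siteId);
    else result.shortfalls[site.siteId] = missed;
  }
  return result;
}
```

<p>Checking criteria one at a time overstates feasibility, because patients who meet one criterion need not meet the others; a real assessment would count patients who satisfy all criteria together.</p>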
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="phase-4--cost-analysis">Phase 4 — Cost Analysis<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#phase-4--cost-analysis" class="hash-link" aria-label="Direct link to Phase 4 — Cost Analysis" title="Direct link to Phase 4 — Cost Analysis">​</a></h3>
<p>Phase 4 (<code>feat(ares): implement Phase 4 Cost Analysis</code>) rounds out the milestone with <code>CostService</code>, dedicated cost endpoints, and <code>CostView</code>. The hub skeletons are in place for further expansion. Healthcare cost data is notoriously messy in CDM mappings, so having a dedicated analysis surface for it — rather than treating it as just another domain — reflects how researchers actually use it.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="ci-cleanup">CI Cleanup<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#ci-cleanup" class="hash-link" aria-label="Direct link to CI Cleanup" title="Direct link to CI Cleanup">​</a></h3>
<p>The Ares build also came with a round of CI fixes: a <code>recharts</code> Tooltip formatter cast to <code>any</code> for strict TypeScript compatibility, PHPStan and TypeScript error resolution across Ares components, and a Pint auto-fix pass that also removed a stale <code>AchillesRunSummary</code> import and corrected <code>react-joyride</code> export references. These aren't glamorous commits, but a green CI pipeline is what lets us ship confidently.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="achilles-engine-reliability-hardening">Achilles Engine Reliability Hardening<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#achilles-engine-reliability-hardening" class="hash-link" aria-label="Direct link to Achilles Engine Reliability Hardening" title="Direct link to Achilles Engine Reliability Hardening">​</a></h2>
<p>Separately, the devlog for <strong>Phase 14 — Achilles Engine Reliability Hardening</strong> documents the root cause analysis and fixes for a set of compounding bugs that had made every characterization run on the SynPUF dataset (source 47, whose measurement table exceeds 100 million rows) fragile. Smaller datasets like Eunomia never surfaced these issues, which is exactly why production-scale testing matters.</p>
<p>Four bugs were identified and fixed:</p>
<p><strong>Bug 1 (the killer): Non-resumable retries.</strong> <code>RunAchillesJob</code> used <code>AchillesRun::create()</code>, which hit a unique constraint on retry after a timeout. Replaced with <code>AchillesRun::updateOrCreate()</code> — the job is now fully idempotent across retry attempts.</p>
<p><strong>Bug 2: Timeout too short.</strong> The 1-hour timeout (<code>$timeout = 3600</code>) was simply not enough — analysis 1811 alone (measurement records by concept by year-month) takes ~116 minutes on SynPUF. Bumped to 3 hours (<code>$timeout = 10800</code>), with <code>$tries</code> increased to 3 and a 30-second backoff.</p>
<p><strong>Bug 3: No analysis-level resume.</strong> A run that completed 111 of 127 analyses and then died would restart from analysis 1 on retry, throwing away up to 175 minutes of completed work. The fix adds resume capability to <code>AchillesEngineService</code> so restarts pick up where they left off.</p>
<p><strong>Bug 4: Zombie "running" status.</strong> Without a <code>failed()</code> method on <code>RunAchillesJob</code>, any failed run stayed in <code>status=running</code> indefinitely. The UI showed perpetually active jobs with no recovery path. The new <code>failed()</code> handler marks runs as failed with a timestamp, restoring operator visibility.</p>
<p>Worth noting: <code>status</code> remains excluded from <code>$fillable</code> per HIGHSEC 3.1 — all status transitions go through explicit <code>update()</code> calls, not mass assignment. The reliability improvements don't compromise that security invariant.</p>
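<p>The actual fixes live in PHP (<code>RunAchillesJob</code> and <code>AchillesEngineService</code>) and are not shown in the post. The TypeScript below is a language-neutral sketch of the same three ideas: upsert the run record so a retry reuses it, skip analyses already recorded as complete, and mark the run failed with a timestamp once retries are exhausted. The in-memory table and all names are hypothetical. Status changes happen only through these explicit functions, loosely mirroring the rule that <code>status</code> is never mass-assigned.</p>

```typescript
type RunStatus = "running" | "completed" | "failed";

interface AchillesRunRecord {
  runId: string;
  status: RunStatus;
  completedAnalyses: number[]; // analysis ids finished in earlier attempts
  failedAt: Date | null;
}

// In-memory stand-in for the runs table, keyed by run id (the unique key).
const runTable: Record<string, AchillesRunRecord> = {};

// Upsert: a retry with the same run id reuses the existing record instead of
// inserting a duplicate, the idea behind swapping create() for updateOrCreate().
function startOrResumeRun(runId: string): AchillesRunRecord {
  const existing = runTable[runId];
  if (existing !== undefined) {
    existing.status = "running";
    return existing;
  }
  const created: AchillesRunRecord = { runId, status: "running", completedAnalyses: [], failedAt: null };
  runTable[runId] = created;
  return created;
}

// Analysis-level resume: skip ids already recorded as complete, and record each
// new completion immediately so a crash only repeats the analysis in progress.
function executeAnalyses(
  run: AchillesRunRecord,
  analysisIds: number[],
  runAnalysis: (analysisId: number) => void,
): void {
  for (const id of analysisIds) {
    if (run.completedAnalyses.indexOf(id) !== -1) continue;
    runAnalysis(id); // may throw, for example on a timeout
    run.completedAnalyses.push(id);
  }
  run.status = "completed";
}

// Counterpart of a job's failed() hook: once retries are exhausted, the run is
// marked failed with a timestamp instead of staying "running" forever.
function markRunFailed(run: AchillesRunRecord, when: Date): void {
  run.status = "failed";
  run.failedAt = when;
}
```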
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-25#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>With the Ares Parity+ milestone shipped, the immediate priority is integration testing across all four phases against a real Ares output directory — particularly the network comparison views, which depend on multi-source data being present. We'll also be looking at paginating the cost endpoint responses as cost data can be voluminous.</p>
<p>On the Achilles side, the next step is validating the resume logic under controlled timeout conditions in a staging environment before we consider SynPUF source 47 fully unblocked. Once that's confirmed stable, we can look at parallelizing the slower analyses (1811 in particular) to bring total characterization time down to something more reasonable for routine use.</p>
<p>It was a dense day, but the platform is measurably more capable and more reliable for it.</p>]]></content:encoded>
            <category>development</category>
            <category>ohdsi</category>
            <category>analytics</category>
            <category>frontend</category>
            <category>backend</category>
            <category>infrastructure</category>
            <category>database</category>
            <category>testing</category>
        </item>
        <item>
            <title><![CDATA[Full HADES Parity: Parthenon Now Supports All 12 OHDSI Database Dialects]]></title>
            <link>http://localhost:8082/docs/blog/hades-12-dialect-coverage</link>
            <guid>http://localhost:8082/docs/blog/hades-12-dialect-coverage</guid>
            <pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[One of OHDSI's greatest strengths is database agnosticism. The HADES ecosystem — via SqlRender and DatabaseConnector — lets researchers write analyses once and run them against SQL Server, PostgreSQL, Oracle, Snowflake, BigQuery, and seven other platforms without modification. Today, Parthenon achieved full parity with that capability: all 12 HADES-supported database dialects are now covered across both the PHP SQL translator and the R runtime.]]></description>
            <content:encoded><![CDATA[<p>One of OHDSI's greatest strengths is database agnosticism. The HADES ecosystem — via SqlRender and DatabaseConnector — lets researchers write analyses once and run them against SQL Server, PostgreSQL, Oracle, Snowflake, BigQuery, and seven other platforms without modification. Today, Parthenon achieved full parity with that capability: all 12 HADES-supported database dialects are now covered across both the PHP SQL translator and the R runtime.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-this-matters">Why This Matters<a href="http://localhost:8082/docs/blog/hades-12-dialect-coverage#why-this-matters" class="hash-link" aria-label="Direct link to Why This Matters" title="Direct link to Why This Matters">​</a></h2>
<p>OMOP CDM databases live everywhere. Academic medical centers often run Oracle or SQL Server. Cloud-native organizations are increasingly moving to Snowflake or BigQuery. Federated networks span multiple database platforms simultaneously. If you're building a platform that replaces Atlas and WebAPI, you can't afford to be PostgreSQL-only in your SQL rendering — even if your internal database is PostgreSQL.</p>
<p>Parthenon has always used PostgreSQL as its production database, but the SQL translation layer is critical for two capabilities:</p>
<ol>
<li>
<p><strong>Query Library rendering</strong> — OHDSI's standard SQL templates are written in T-SQL (SQL Server syntax). When a researcher executes a query from the library, it gets translated to the target source's dialect at render time.</p>
</li>
<li>
<p><strong>Federated analysis</strong> — Each <code>Source</code> in Parthenon can point to a different database with its own dialect. A study might pull cohorts from a local PostgreSQL CDM, run against a collaborator's Snowflake warehouse, and compare with results from an Oracle-backed registry. The <code>HadesBridgeService</code> handles the connection abstraction; the SQL translator handles the syntax.</p>
</li>
</ol>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-12-dialects">The 12 Dialects<a href="http://localhost:8082/docs/blog/hades-12-dialect-coverage#the-12-dialects" class="hash-link" aria-label="Direct link to The 12 Dialects" title="Direct link to The 12 Dialects">​</a></h2>
<p>OHDSI's SqlRender package (the canonical R/Java SQL translation layer) supports these 12 database platforms:</p>
<table><thead><tr><th>#</th><th>Dialect</th><th>SQL Family</th><th>Typical Deployment</th></tr></thead><tbody><tr><td>1</td><td><strong>SQL Server</strong></td><td>T-SQL (canonical source)</td><td>Enterprise on-prem, Azure SQL</td></tr><tr><td>2</td><td><strong>PostgreSQL</strong></td><td>ANSI SQL</td><td>Academic, cloud, Parthenon internal</td></tr><tr><td>3</td><td><strong>Oracle</strong></td><td>PL/SQL</td><td>Large health systems, pharma</td></tr><tr><td>4</td><td><strong>Redshift</strong></td><td>PostgreSQL variant</td><td>AWS data warehouses</td></tr><tr><td>5</td><td><strong>Snowflake</strong></td><td>ANSI SQL variant</td><td>Cloud analytics</td></tr><tr><td>6</td><td><strong>BigQuery</strong></td><td>GoogleSQL</td><td>Google Cloud OMOP deployments</td></tr><tr><td>7</td><td><strong>Azure Synapse</strong></td><td>T-SQL variant</td><td>Microsoft cloud OLAP</td></tr><tr><td>8</td><td><strong>Spark / Databricks</strong></td><td>SparkSQL</td><td>Big data / lakehouse</td></tr><tr><td>9</td><td><strong>Apache Hive</strong></td><td>HiveQL</td><td>Hadoop ecosystems</td></tr><tr><td>10</td><td><strong>Apache Impala</strong></td><td>Impala SQL</td><td>Hadoop real-time queries</td></tr><tr><td>11</td><td><strong>IBM Netezza</strong></td><td>PostgreSQL variant</td><td>Enterprise data warehouses</td></tr><tr><td>12</td><td><strong>DuckDB</strong></td><td>PostgreSQL variant</td><td>Embedded analytics, local dev</td></tr></tbody></table>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="parthenons-two-translation-layers">Parthenon's Two Translation Layers<a href="http://localhost:8082/docs/blog/hades-12-dialect-coverage#parthenons-two-translation-layers" class="hash-link" aria-label="Direct link to Parthenon's Two Translation Layers" title="Direct link to Parthenon's Two Translation Layers">​</a></h2>
<p>Parthenon translates OHDSI SQL in two places, each serving a different part of the stack:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="php-ohdsisqltranslator">PHP: <code>OhdsiSqlTranslator</code><a href="http://localhost:8082/docs/blog/hades-12-dialect-coverage#php-ohdsisqltranslator" class="hash-link" aria-label="Direct link to php-ohdsisqltranslator" title="Direct link to php-ohdsisqltranslator">​</a></h3>
<p>The PHP translator (<code>backend/app/Services/SqlRenderer/OhdsiSqlTranslator.php</code>) handles server-side SQL rendering for the Query Library, Achilles analysis templates, and any custom SQL that needs to target a non-PostgreSQL source. It converts T-SQL constructs — <code>DATEADD</code>, <code>DATEDIFF</code>, <code>GETDATE()</code>, <code>CHARINDEX</code>, <code>LEN</code>, <code>ISNULL</code>, <code>COUNT_BIG</code>, <code>CONVERT</code>, <code>TOP N</code>, <code>DATEFROMPARTS</code> — into dialect-appropriate equivalents.</p>
<p>The translation groups dialects by SQL family:</p>
<ul>
<li><strong>PostgreSQL family</strong> (PostgreSQL, Redshift, Netezza, DuckDB) — <code>INTERVAL</code> arithmetic, <code>EXTRACT</code>, <code>POSITION</code>, <code>COALESCE</code>, <code>LIMIT</code></li>
<li><strong>Oracle</strong> — <code>ADD_MONTHS</code>, <code>MONTHS_BETWEEN</code>, <code>TRUNC(SYSDATE)</code>, <code>FETCH FIRST N ROWS ONLY</code></li>
<li><strong>BigQuery</strong> — <code>DATE_ADD</code> with <code>INTERVAL</code> syntax, <code>DATE_DIFF</code> with a date-part argument, <code>CURRENT_DATE()</code></li>
<li><strong>Snowflake</strong> — Native <code>DATEADD</code>/<code>DATEDIFF</code> (same names and argument order as T-SQL)</li>
<li><strong>Spark family</strong> (Spark, Hive, Impala) — <code>DATE_ADD</code> with interval syntax</li>
<li><strong>T-SQL family</strong> (SQL Server, Synapse) — pass-through (canonical format)</li>
</ul>
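<p>Grouping dialects into families suggests a dispatch-by-family design in which a new dialect can reuse an existing family's rules. The TypeScript sketch below illustrates that idea; it is not a port of <code>OhdsiSqlTranslator</code>. The dialect keys other than <code>duckdb</code> are assumptions, and the few PostgreSQL-family rewrite rules are an illustrative subset, not SqlRender's rule set.</p>

```typescript
type SqlFamily = "tsql" | "postgresql" | "oracle" | "bigquery" | "snowflake" | "spark";

// Dialect keys are assumed; only "duckdb" appears in the post.
const FAMILY_BY_DIALECT: Record<string, SqlFamily> = {
  sqlserver: "tsql",
  synapse: "tsql",
  postgresql: "postgresql",
  redshift: "postgresql",
  netezza: "postgresql",
  duckdb: "postgresql", // reuse the PostgreSQL path unchanged
  oracle: "oracle",
  bigquery: "bigquery",
  snowflake: "snowflake",
  spark: "spark",
  hive: "spark",
  impala: "spark",
};

// A handful of illustrative T-SQL to PostgreSQL-family rewrites. A real
// translator needs many more, such as DATEADD and TOP N handling.
const POSTGRES_FAMILY_RULES: Array<[RegExp, string]> = [
  [/\bGETDATE\(\)/gi, "CURRENT_DATE"],
  [/\bISNULL\(/gi, "COALESCE("],
  [/\bCOUNT_BIG\(/gi, "COUNT("],
  [/\bLEN\(/gi, "LENGTH("], // ignores LEN's trailing-space semantics
];

function translateSql(sql: string, dialect: string): string {
  const family = FAMILY_BY_DIALECT[dialect.toLowerCase()];
  if (family === undefined) throw new Error("Unsupported dialect: " + dialect);
  switch (family) {
    case "tsql":
      return sql; // canonical OHDSI SQL, passed through unchanged
    case "postgresql": {
      let out = sql;
      for (const [pattern, replacement] of POSTGRES_FAMILY_RULES) {
        out = out.replace(pattern, replacement);
      }
      return out;
    }
    default:
      throw new Error("Family '" + family + "' is not implemented in this sketch");
  }
}
```

<p>With this structure, supporting another PostgreSQL-like engine is a one-entry change to the family map, the same kind of change that adding DuckDB required.</p>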
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="r-runtime-connectionr">R Runtime: <code>connection.R</code><a href="http://localhost:8082/docs/blog/hades-12-dialect-coverage#r-runtime-connectionr" class="hash-link" aria-label="Direct link to r-runtime-connectionr" title="Direct link to r-runtime-connectionr">​</a></h3>
<p>The Darkstar R runtime (<code>r-runtime/R/connection.R</code>) wraps OHDSI's <code>DatabaseConnector</code> package, which manages connections (JDBC for most platforms) to all supported databases. When Parthenon dispatches a HADES analysis (CohortMethod, PatientLevelPrediction, SCCS), the <code>HadesBridgeService</code> translates the <code>Source</code> model into a connection spec, which the R runtime passes to <code>DatabaseConnector::createConnectionDetails()</code> to build a connection-details object. SqlRender handles the SQL translation natively within R.</p>
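<p>The post does not show the connection spec that <code>HadesBridgeService</code> produces. Assuming it ends up as named arguments to <code>DatabaseConnector::createConnectionDetails()</code>, the main dialect-specific step is mapping Parthenon's dialect key onto DatabaseConnector's <code>dbms</code> string. In the sketch below, the <code>dbms</code> values follow DatabaseConnector's naming (for example <code>"sql server"</code> with a space) and should be checked against the installed version, since its supported platforms have changed across releases; the interface, the Parthenon-side keys, and the function name are hypothetical.</p>

```typescript
// Hypothetical subset of a Parthenon Source as the bridge might see it.
interface SourceConnection {
  dialect: string; // Parthenon-side key, e.g. "duckdb"
  server: string;
  port?: number;
  user?: string;
}

// Left: assumed Parthenon dialect keys. Right: DatabaseConnector dbms strings.
const DBMS_BY_DIALECT: Record<string, string> = {
  sqlserver: "sql server",
  postgresql: "postgresql",
  oracle: "oracle",
  redshift: "redshift",
  snowflake: "snowflake",
  bigquery: "bigquery",
  synapse: "synapse",
  spark: "spark",
  hive: "hive",
  impala: "impala",
  netezza: "netezza",
  duckdb: "duckdb",
};

// Build named arguments for DatabaseConnector::createConnectionDetails().
// Credentials other than the user name are deliberately left out of this sketch.
function toConnectionDetailsArgs(source: SourceConnection): Record<string, string | number> {
  const dbms = DBMS_BY_DIALECT[source.dialect.toLowerCase()];
  if (dbms === undefined) throw new Error("No DatabaseConnector dbms for " + source.dialect);
  const args: Record<string, string | number> = { dbms: dbms, server: source.server };
  if (source.port !== undefined) args["port"] = source.port;
  if (source.user !== undefined) args["user"] = source.user;
  return args;
}
```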
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="adding-duckdb-a-three-line-change">Adding DuckDB: A Three-Line Change<a href="http://localhost:8082/docs/blog/hades-12-dialect-coverage#adding-duckdb-a-three-line-change" class="hash-link" aria-label="Direct link to Adding DuckDB: A Three-Line Change" title="Direct link to Adding DuckDB: A Three-Line Change">​</a></h2>
<p>The gap we closed today was DuckDB — supported in the R runtime's <code>DatabaseConnector</code> but missing from the PHP translator. The fix was anticlimactic:</p>
<div class="language-php codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-php codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token comment" style="color:hsl(220, 10%, 40%)">// In the match expression:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'duckdb'</span><span class="token plain"> </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=&gt;</span><span class="token plain"> </span><span class="token variable" style="color:hsl(207, 82%, 66%)">$this</span><span class="token operator" style="color:hsl(207, 82%, 66%)">-&gt;</span><span class="token function" style="color:hsl(207, 82%, 66%)">toPostgresql</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token variable" style="color:hsl(207, 82%, 66%)">$sql</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token comment" style="color:hsl(220, 10%, 40%)">// In the supported dialects list:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token string single-quoted-string" style="color:hsl(95, 38%, 62%)">'duckdb'</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><br></span></code></pre></div></div>
<p>For the constructs that matter here, DuckDB's SQL dialect closely follows PostgreSQL's. It supports <code>EXTRACT</code>, <code>CURRENT_DATE</code>, <code>INTERVAL</code> arithmetic, <code>LIMIT</code>, <code>COALESCE</code>, <code>LENGTH</code>, <code>POSITION</code>, and <code>CAST</code>, all of which our PostgreSQL translator already emits. No new translation methods or special-case handling were needed.</p>
<p>This is by design. DuckDB was built as an embeddable analytical database with a familiar SQL interface. For OHDSI use cases — particularly local development, testing, and lightweight CDM exploration — DuckDB is an excellent option: it runs in-process, requires no server, and handles analytical workloads efficiently.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="dialect-coverage-matrix">Dialect Coverage Matrix<a href="http://localhost:8082/docs/blog/hades-12-dialect-coverage#dialect-coverage-matrix" class="hash-link" aria-label="Direct link to Dialect Coverage Matrix" title="Direct link to Dialect Coverage Matrix">​</a></h2>
<p>Here's the final state of dialect coverage across Parthenon's stack:</p>
<table><thead><tr><th>Dialect</th><th style="text-align:center">PHP Translator</th><th style="text-align:center">R Runtime</th><th style="text-align:center">Source UI</th><th>Status</th></tr></thead><tbody><tr><td>PostgreSQL</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Production-tested</td></tr><tr><td>SQL Server</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>Oracle</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>Redshift</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>Snowflake</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>BigQuery</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>Synapse</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Pass-through (T-SQL)</td></tr><tr><td>Spark</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>Hive</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>Impala</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>Netezza</td><td 
style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td>Translated, untested at scale</td></tr><tr><td>DuckDB</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td style="text-align:center">Yes</td><td><strong>New today</strong></td></tr></tbody></table>
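<p>To make the PHP Translator column concrete, here is a deliberately tiny Python sketch of rule-based dialect translation in the spirit of OHDSI SqlRender. In that model, a query is written once in OHDSI SQL (a T-SQL-derived dialect) and rewritten for each target. The rule table, the <code>translate</code> function, and the two rules shown are hypothetical illustrations, not Parthenon's PHP translator. A real translator has to handle far more constructs, such as date arithmetic, temp tables, and string functions.</p>

```python
import re

# Hypothetical rules: each rewrites one OHDSI SQL (T-SQL flavoured) construct.
_TOP_N = (
    re.compile(r"^\s*SELECT\s+TOP\s+(\d+)\s+(.*?)\s*;?\s*$", re.IGNORECASE | re.DOTALL),
    r"SELECT \2 LIMIT \1",
)
_ISNULL = (re.compile(r"\bISNULL\(", re.IGNORECASE), "COALESCE(")

# Rule lists keyed by target dialect (illustrative subset only).
RULES = {
    "postgresql": [_TOP_N, _ISNULL],
    "duckdb": [_TOP_N, _ISNULL],
    "synapse": [],  # T-SQL in, T-SQL out: pass-through
}


def translate(sql: str, dialect: str) -> str:
    """Apply each rewrite rule for `dialect` in order; unknown dialects are rejected."""
    try:
        rules = RULES[dialect.lower()]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    for pattern, replacement in rules:
        sql = pattern.sub(replacement, sql)
    return sql


print(translate("SELECT TOP 10 person_id FROM cdm.person", "postgresql"))
```

<p>Keeping the rules as data rather than code is what makes a pass-through dialect like Synapse cheap: it is simply an empty rule list.</p>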
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/hades-12-dialect-coverage#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>Full dialect coverage is table stakes for OHDSI platform interoperability, but coverage and correctness are different things. The next steps are:</p>
<ol>
<li>
<p><strong>Integration testing</strong> — We need to validate the PHP translator against real CDM queries on at least SQL Server and Oracle, the two most common non-PostgreSQL OMOP deployments in clinical research networks.</p>
</li>
<li>
<p><strong>Federated study execution</strong> — With the connection plumbing in place, the goal is to demonstrate a study that federates across two different database platforms within Parthenon's study execution framework.</p>
</li>
<li>
<p><strong>DuckDB for local development</strong> — DuckDB could replace the PostgreSQL dependency for developers who want to run Parthenon locally without a full database server. A lightweight CDM loader that writes to a DuckDB file would dramatically simplify onboarding.</p>
</li>
</ol>
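<p>For the integration-testing step, "same query, same answer" has to be checked loosely. Row order is not guaranteed without <code>ORDER BY</code>, and drivers return numerics differently (for example <code>Decimal</code> from one driver and <code>float</code> from another). Below is a minimal Python sketch of an order-insensitive comparison. It assumes each database's rows have already been fetched as tuples. The function names and the six-decimal tolerance are illustrative choices.</p>

```python
from collections import Counter
from decimal import Decimal
from typing import Iterable, Sequence


def _normalize(row: Sequence, places: int) -> tuple:
    # Round Decimal and float values to a common representation; leave other values as-is.
    return tuple(
        round(float(v), places) if isinstance(v, (float, Decimal)) else v
        for v in row
    )


def same_result_set(rows_a: Iterable[Sequence], rows_b: Iterable[Sequence], places: int = 6) -> bool:
    """Order-insensitive, duplicate-aware comparison of two query results."""
    return Counter(_normalize(r, places) for r in rows_a) == Counter(
        _normalize(r, places) for r in rows_b
    )


postgres_rows = [(1, Decimal("0.5")), (2, Decimal("1.25"))]
oracle_rows = [(2, 1.25), (1, 0.5)]
print(same_result_set(postgres_rows, oracle_rows))  # True
```

<p>Dates and timestamps can also come back with different types or time-zone handling, so a real harness would likely normalize those too.</p>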
<p>The OHDSI ecosystem's commitment to database agnosticism is one of its strongest differentiators. Parthenon now inherits that capability in breadth, though not yet in production depth: 12 dialects, two translation layers, one unified research platform.</p>]]></content:encoded>
            <category>ohdsi</category>
            <category>hades</category>
            <category>database</category>
            <category>sql</category>
            <category>architecture</category>
            <category>interoperability</category>
        </item>
        <item>
            <title><![CDATA[CI Green at Last: Codebase Hardening, AtlanticHealth Synthesis, and a 147-Test Renaissance]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-03-22</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-03-22</guid>
            <pubDate>Sun, 22 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[After months of a perpetually red CI pipeline, today marks a turning point for Parthenon: 92 commits, a full-spectrum codebase review, a complete AtlanticHealth patient synthesis pipeline, and — most satisfying of all — every CI job green. Here's how we got there.]]></description>
            <content:encoded><![CDATA[<p>After months of a perpetually red CI pipeline, today marks a turning point for Parthenon: 92 commits, a full-spectrum codebase review, a complete AtlanticHealth patient synthesis pipeline, and — most satisfying of all — every CI job green. Here's how we got there.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-ci-pipeline-was-never-green-until-today">The CI Pipeline Was Never Green (Until Today)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-22#the-ci-pipeline-was-never-green-until-today" class="hash-link" aria-label="Direct link to The CI Pipeline Was Never Green (Until Today)" title="Direct link to The CI Pipeline Was Never Green (Until Today)">​</a></h2>
<p>The most impactful work today was a ~6-hour, five-phase codebase hardening sprint that touched virtually every layer of the stack. The starting state was grim: CI failing on every push, 6% test coverage, and four files well past their size limits. The ending state: all six CI jobs passing, 147 new tests written, and a documented methodology for keeping things that way.</p>
<p>The failure modes were stacking and masking each other, which made the pipeline feel intractable. Once we untangled them, the root causes were addressable one by one:</p>
<ul>
<li><strong>37 TypeScript errors</strong> in the investigation module — mostly Lucide icon casting issues, incorrect property access on <code>PaginatedResponse</code> (<code>.data</code> vs <code>.items</code>), and <code>useRef</code> strict mode violations. Fixed with proper <code>LucideProps</code> typing and a pass to remove dead code.</li>
<li><strong>80+ Pint code style violations</strong> — Pint 1.29 quietly introduced the <code>fully_qualified_strict_types</code> rule. We resolved these by running auto-format through a Docker Pint container pinned to the same version as CI, ensuring parity. The final straggler — a <code>single_quote</code> and <code>unary_operator</code> violation in <code>MorpheusPatientService</code> — was cleaned up in commit <code>7ad77af</code>.</li>
<li><strong>11 PHPStan errors</strong> outside the baseline — caused by the strict_types changes shuffling what PHPStan was tracking. Regenerated the baseline (33 → 31 known errors) and committed it cleanly.</li>
<li><strong>6 Python test failures</strong> — the FastAPI app was still using the deprecated <code>@app.on_event("startup")</code> pattern. Migrated to the modern <code>lifespan</code> context manager.</li>
<li><strong>CI database schema mismatches</strong> — the CI environment was still referencing legacy schema names (<code>vocab</code>, <code>cdm</code>, <code>achilles_results</code>) instead of the current ones (<code>omop</code>, <code>results</code>, <code>gis</code>). A PostGIS extension failure was also aborting migration transactions mid-run.</li>
</ul>
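<p>If you have not yet made the FastAPI migration, the <code>lifespan</code> pattern replaces separate startup and shutdown hooks with a single async context manager. Code before <code>yield</code> runs at startup, and code after it runs at shutdown. The sketch below is a minimal illustration, not Parthenon's actual service. The <code>model_loaded</code> and <code>events</code> state fields are hypothetical. <code>simulate_server_run</code> only mimics what an ASGI server does, so the example runs without FastAPI installed.</p>

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace

# Old, deprecated style:
#     @app.on_event("startup")
#     async def startup(): ...
#
# New style: one async context manager owns both startup and shutdown.


@asynccontextmanager
async def lifespan(app):
    # Startup: runs once before the server begins handling requests.
    app.state.model_loaded = True  # hypothetical resource, e.g. a model or DB pool
    app.state.events = ["startup"]
    try:
        yield
    finally:
        # Shutdown: runs once when the server stops, even after an error.
        app.state.model_loaded = False
        app.state.events.append("shutdown")


# Wiring in the real service (requires FastAPI):
#     from fastapi import FastAPI
#     app = FastAPI(lifespan=lifespan)


async def simulate_server_run(app):
    """Enter and exit the lifespan the way an ASGI server would."""
    async with lifespan(app):
        app.state.events.append("serving")
    return app.state.events


fake_app = SimpleNamespace(state=SimpleNamespace())
print(asyncio.run(simulate_server_run(fake_app)))  # ['startup', 'serving', 'shutdown']
```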
<p>The fix methodology is now codified as an internal ADR so future contributors have a clear playbook when CI goes red.</p>
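<p>A cheap preflight check can surface a schema-name mismatch before migrations start. Below is a hypothetical Python sketch that uses the schema names from the list above. Where the configured names come from (environment, config file, or connection metadata) is left open as an assumption.</p>

```python
# Schema names taken from the post: the current layout and the legacy one.
EXPECTED_SCHEMAS = {"omop", "results", "gis"}
LEGACY_SCHEMAS = {"vocab", "cdm", "achilles_results"}


def schema_problems(configured) -> list:
    """Return human-readable problems for the schema names a CI config references."""
    configured = set(configured)
    problems = [f"missing schema: {name}" for name in sorted(EXPECTED_SCHEMAS - configured)]
    problems += [
        f"legacy schema still referenced: {name}"
        for name in sorted(LEGACY_SCHEMAS.intersection(configured))
    ]
    return problems


for problem in schema_problems(["vocab", "cdm", "results"]):
    print(problem)
```

<p>The PostGIS failure is a separate issue. PostgreSQL aborts the whole transaction after any failed statement, so one plausible mitigation is to create the extension in a setup step that runs before, and outside, the migration transaction.</p>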
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="atlantichealth-synthesis-pipeline-3250-patients-mimic-standard">AtlanticHealth Synthesis Pipeline: 3,250 Patients, MIMIC-Standard<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-22#atlantichealth-synthesis-pipeline-3250-patients-mimic-standard" class="hash-link" aria-label="Direct link to AtlanticHealth Synthesis Pipeline: 3,250 Patients, MIMIC-Standard" title="Direct link to AtlanticHealth Synthesis Pipeline: 3,250 Patients, MIMIC-Standard">​</a></h2>
<p>On the data generation side, we shipped a complete AtlanticHealth synthesis pipeline today. The headline: <strong>3,250 synthetic patients with full MIMIC-standard data</strong>, generated end-to-end through a multi-phase pipeline.</p>
<p>Earlier phases handle the patient cohort, admissions, and diagnoses. Phases 4–7 were added to round out the clinical picture with procedure events, microbiology results, and input/output events. Producing a realistic, MIMIC-schema-compatible dataset from AtlanticHealth's structure also required adapting the <code>labevents</code>, <code>chartevents</code>, and <code>transfers</code> queries to match AtlanticHealth's actual schema (commit <code>c5f05e83</code>).</p>
<p>We also cleaned up <code>\N</code> bulk-import artifacts left over from PostgreSQL <code>COPY</code> operations on AtlanticHealth source data (commit <code>37b871063</code>). These null-sentinel strings were leaking into text fields and causing downstream parsing issues. The bug was subtle and would have been painful to debug later, in the OMOP conversion layer.</p>
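<p>In PostgreSQL's <code>COPY</code> text format, <code>\N</code> is the default marker for SQL NULL. That is why it can leak into text fields when exported data is parsed naively. Below is a minimal Python sketch of null-aware parsing for a single tab-delimited line. It is illustrative only and, as the code notes, skips the rest of COPY's escape decoding.</p>

```python
# PostgreSQL's COPY text format writes SQL NULL as the two characters backslash + N.
NULL_SENTINEL = "\\N"


def parse_copy_line(line: str, delimiter: str = "\t") -> list:
    """Split one COPY text-format line, turning whole-field null sentinels into None.

    Sketch only: real COPY text decoding also unescapes backslash sequences
    (escaped backslashes, tabs, newlines), which this does not attempt.
    """
    fields = line.rstrip("\n").split(delimiter)
    return [None if field == NULL_SENTINEL else field for field in fields]


print(parse_copy_line("42\t\\N\tICU\n"))  # ['42', None, 'ICU']
```

<p>Comparing whole fields matters here: the sentinel means NULL only when it is the entire field value.</p>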
<p>This synthetic dataset is foundational: it gives us a realistic, large-scale cohort for testing the Morpheus ETL pipeline without touching any real patient data.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="morpheus-ux-dataset-parameter-persistence">Morpheus UX: Dataset Parameter Persistence<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-22#morpheus-ux-dataset-parameter-persistence" class="hash-link" aria-label="Direct link to Morpheus UX: Dataset Parameter Persistence" title="Direct link to Morpheus UX: Dataset Parameter Persistence">​</a></h2>
<p>A smaller but user-facing fix worth calling out: the <code>dataset</code> query parameter was being dropped when users switched tabs or clicked breadcrumb navigation inside Morpheus. This meant the UI would silently lose context, forcing users to re-select their dataset. The fix ensures the parameter is persisted through tab switches and breadcrumb navigation — a subtle but frustrating UX regression that's now resolved (commit <code>36222e5</code>).</p>
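<p>The general technique behind this kind of fix is to carry "sticky" query parameters forward whenever a navigation link is built. The Morpheus frontend code is not part of this post, so the following is a language-neutral illustration in Python rather than the actual change. <code>PERSISTED_PARAMS</code>, <code>with_persisted_params</code>, and the example paths are hypothetical.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters that should survive tab and breadcrumb navigation.
PERSISTED_PARAMS = ("dataset",)


def with_persisted_params(current_url: str, target: str) -> str:
    """Build a link to `target`, copying sticky params from the current URL unless the target sets them."""
    current_query = parse_qs(urlsplit(current_url).query)
    parts = urlsplit(target)
    target_query = parse_qs(parts.query)
    for key in PERSISTED_PARAMS:
        if key in current_query and key not in target_query:
            target_query[key] = current_query[key]
    return urlunsplit(parts._replace(query=urlencode(target_query, doseq=True)))


print(with_persisted_params("/morpheus/patients?dataset=atlantichealth", "/morpheus/admissions"))
# /morpheus/admissions?dataset=atlantichealth
```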
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="codebase-architecture-adrs-docs-and-decomposition">Codebase Architecture: ADRs, Docs, and Decomposition<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-22#codebase-architecture-adrs-docs-and-decomposition" class="hash-link" aria-label="Direct link to Codebase Architecture: ADRs, Docs, and Decomposition" title="Direct link to Codebase Architecture: ADRs, Docs, and Decomposition">​</a></h2>
<p>Part of the hardening sprint involved structural improvements that won't show up in feature metrics but matter enormously for maintainability:</p>
<ul>
<li><strong>8 Architecture Decision Records (ADRs)</strong> written, covering decisions that were previously implicit or tribal knowledge.</li>
<li><strong>11 new documentation pages</strong> across five previously underdocumented modules.</li>
<li><strong>4 oversized files decomposed</strong> — each was more than 3× the project's file size guideline. Breaking these apart improves testability and makes the codebase easier to navigate.</li>
<li><strong>Docker hardening</strong> — the development and CI Docker configurations were reviewed and tightened.</li>
</ul>
<p>Going from zero ADRs to eight in a single session is a significant knowledge capture moment. These documents will pay dividends the next time someone asks "why does it work this way?"</p>
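<p>The post does not state the project's file-size guideline, but a check for the 3× rule mentioned above is easy to automate. Below is a minimal Python sketch. It assumes a hypothetical guideline of 500 lines and a chosen set of source suffixes; both are placeholders to adjust.</p>

```python
from pathlib import Path

# Hypothetical defaults: the real guideline and file types are project-specific.
SOURCE_SUFFIXES = (".php", ".py", ".ts", ".tsx")


def count_lines(root: str, suffixes=SOURCE_SUFFIXES) -> dict:
    """Map each matching source file under `root` to its line count."""
    counts = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            with path.open(encoding="utf-8", errors="replace") as fh:
                counts[str(path)] = sum(1 for _ in fh)
    return counts


def flag_oversized(counts: dict, guideline_lines: int = 500, factor: float = 3.0) -> list:
    """Files longer than `factor` times the guideline, largest first."""
    limit = factor * guideline_lines
    return sorted((p for p, n in counts.items() if n > limit), key=lambda p: -counts[p])


print(flag_oversized({"a.php": 1600, "b.ts": 400, "c.tsx": 2100}))  # ['c.tsx', 'a.php']
```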
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="dependency-updates">Dependency Updates<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-22#dependency-updates" class="hash-link" aria-label="Direct link to Dependency Updates" title="Direct link to Dependency Updates">​</a></h2>
<p>We also rolled forward several key dependencies today:</p>
<ul>
<li><strong>Vite 8</strong> and <strong>plugin-react 6</strong> — keeping the frontend build toolchain current.</li>
<li><strong>Ollama 0.6</strong> and <strong>LangChain 1</strong> — AI integration libraries bumped to latest stable.</li>
<li><strong>sentence-transformers</strong> and <strong>transformers</strong> — Python AI requirements updated.</li>
<li><strong>laravel/tinker 3.0.0</strong> — bumped from 2.11.1.</li>
</ul>
<p>None of these upgrades is risky in isolation. Landing them while CI is green (rather than red) means any regression they introduce shows up as a fresh failure instead of hiding among existing ones.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-22#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>With CI green and a solid synthetic dataset in hand, the immediate priorities are:</p>
<ol>
<li><strong>OMOP ETL validation</strong> — run the AtlanticHealth synthetic cohort through the Morpheus OMOP conversion pipeline and validate concept mapping coverage.</li>
<li><strong>Test coverage growth</strong> — 147 new tests are a solid start from a 6% coverage baseline, but we want to reach a meaningful floor (targeting 40%+ coverage) before the next major feature push.</li>
<li><strong>PHPStan baseline reduction</strong> — the 31 known errors in the baseline are technical debt. Now that CI is stable, we can chip away at these systematically.</li>
<li><strong>Investigation module hardening</strong> — the TypeScript fixes today were correctness patches; a deeper review of the investigation module's data flow is warranted.</li>
</ol>
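<p>A coverage floor is easiest to hold once CI enforces it. Tools such as PHPUnit (<code>--coverage-cobertura</code>) and coverage.py (<code>coverage xml</code>) can emit Cobertura XML, whose root element carries an overall <code>line-rate</code> between 0 and 1. Below is a minimal Python gate over one such report, with the 40% default taken from the target above. Combining reports across the PHP, Python, and TypeScript suites is out of scope here.</p>

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(cobertura_xml: str) -> float:
    """Read the root line-rate attribute (0.0 to 1.0) from a Cobertura report."""
    root = ET.fromstring(cobertura_xml)
    return float(root.get("line-rate", "0")) * 100.0


def meets_floor(cobertura_xml: str, floor_percent: float = 40.0) -> bool:
    """True when overall line coverage is at or above the floor."""
    return line_coverage_percent(cobertura_xml) >= floor_percent
```

<p>coverage.py also offers a built-in <code>--fail-under</code> option for <code>coverage report</code>. A small script like this is mainly useful for applying one rule across several tools.</p>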
<p>Today was a grind in the best sense — the kind of session where you clear out months of accumulated friction and leave the codebase meaningfully better for everyone who touches it next.</p>]]></content:encoded>
            <category>development</category>
            <category>backend</category>
            <category>frontend</category>
            <category>infrastructure</category>
            <category>testing</category>
            <category>database</category>
            <category>ai</category>
        </item>
        <item>
            <title><![CDATA[Keeping the Lights On: Documentation Sync and Daily Dev Log Infrastructure]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-03-23</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-03-23</guid>
            <pubDate>Sun, 22 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A quieter day on the Parthenon platform — today's commits centered on documentation infrastructure rather than feature work, with automated help content synchronization and the daily development blog pipeline keeping the platform's knowledge base fresh and up to date.]]></description>
            <content:encoded><![CDATA[<p>A quieter day on the Parthenon platform — today's commits centered on documentation infrastructure rather than feature work, with automated help content synchronization and the daily development blog pipeline keeping the platform's knowledge base fresh and up to date.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-infrastructure-the-unsung-hero">Documentation Infrastructure: The Unsung Hero<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-23#documentation-infrastructure-the-unsung-hero" class="hash-link" aria-label="Direct link to Documentation Infrastructure: The Unsung Hero" title="Direct link to Documentation Infrastructure: The Unsung Hero">​</a></h2>
<p>It's easy to overlook documentation commits in favor of splashier feature work, but today's two commits to the Parthenon repository represent something quietly important: the machinery that keeps developers and users informed is itself being maintained and improved.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="auto-sync-help-content-bcb10bbf4">Auto-Sync Help Content (<code>bcb10bbf4</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-23#auto-sync-help-content-bcb10bbf4" class="hash-link" aria-label="Direct link to auto-sync-help-content-bcb10bbf4" title="Direct link to auto-sync-help-content-bcb10bbf4">​</a></h3>
<p>The first commit landed just before midnight on March 21st — an auto-sync of the platform's help documentation. This kind of automated synchronization is a cornerstone of keeping a living platform like Parthenon from developing the dreaded documentation drift, where the actual behavior of the system and what the docs say about it gradually diverge until they're barely recognizable as describing the same product.</p>
<p>For a unified OHDSI outcomes research platform serving healthcare analysts, accurate help content isn't just a nice-to-have. Researchers relying on Parthenon to configure cohort definitions, execute population-level effect estimation studies, or interpret characterization outputs need to trust that the guidance they're reading reflects the system they're actually using. An auto-sync pipeline that pulls help content in lockstep with the platform itself is a meaningful safeguard against confusion downstream.</p>
<p>If you're working in this area of the codebase, the auto-sync mechanism is worth understanding well. It ensures that whenever platform behavior changes — whether through a backend update, a new analytics module, or a UI workflow revision — the corresponding help text follows automatically rather than waiting for someone to remember to update it manually.</p>
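<p>The post does not describe how the auto-sync works internally. As a purely hypothetical illustration of one common design, a sync job can keep a manifest of content hashes for what was last published. It then rewrites only articles whose hash changed and deletes articles that no longer exist in the source. The function names and the manifest shape are assumptions.</p>

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 of an article body, used as a cheap change detector."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source: dict, published_manifest: dict) -> tuple:
    """Compare source articles (path -> body) with published hashes (path -> sha256).

    Returns (paths to write, paths to delete) so only changed articles are republished.
    """
    to_write = sorted(p for p, body in source.items() if published_manifest.get(p) != digest(body))
    to_delete = sorted(p for p in published_manifest if p not in source)
    return to_write, to_delete
```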
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="daily-dev-blog-post-cd6dadff6">Daily Dev Blog Post (<code>cd6dadff6</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-23#daily-dev-blog-post-cd6dadff6" class="hash-link" aria-label="Direct link to daily-dev-blog-post-cd6dadff6" title="Direct link to daily-dev-blog-post-cd6dadff6">​</a></h3>
<p>The second commit adds today's daily development blog post to the repository — which is, in a pleasantly recursive way, the post you're reading right now. The dev blog pipeline is part of how the Acumenus team maintains transparency about ongoing development, giving both internal collaborators and external platform users a running narrative of what's changing and why.</p>
<p>Keeping this cadence going on quieter days matters just as much as on the days when major features land. A consistent record of even incremental progress — documentation updates, dependency bumps, infrastructure maintenance — tells a more honest story of how a platform like Parthenon actually evolves over time. It also creates a searchable audit trail that's proven useful more than once when tracking down <em>when</em> a particular behavior changed or <em>why</em> a certain architectural decision was made.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-documentation-days-matter">Why Documentation Days Matter<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-23#why-documentation-days-matter" class="hash-link" aria-label="Direct link to Why Documentation Days Matter" title="Direct link to Why Documentation Days Matter">​</a></h2>
<p>There's a temptation in developer culture to treat documentation work as lesser than "real" engineering. But for a platform operating in the OHDSI ecosystem — where reproducibility, transparency, and methodological rigor are foundational values — the documentation layer is part of the science, not separate from it.</p>
<p>Parthenon aims to be a unified environment where outcomes researchers can move from study design through execution to result interpretation without leaving the platform. Every time a help article is stale, a workflow is undocumented, or a developer blog goes dark, that unified experience frays a little. Today's commits, modest as they are, push in the right direction.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-23#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>With the documentation infrastructure ticking along reliably, the focus will shift back toward feature and platform work in the coming days. A few areas on the near-term radar:</p>
<ul>
<li><strong>Analytics module development</strong> — continued work on expanding Parthenon's native OHDSI study execution capabilities, with particular attention to result visualization and interpretation workflows.</li>
<li><strong>Platform integrations</strong> — ongoing coordination across the broader Acumenus suite to ensure Parthenon's analytics outputs connect cleanly with companion tools.</li>
<li><strong>Help content expansion</strong> — now that the auto-sync pipeline is running cleanly, there's an opportunity to invest in the <em>content</em> itself, filling gaps in the help documentation for newer platform features.</li>
</ul>
<p>Quiet days are good days. The foundation stays solid, and tomorrow we build on it.</p>]]></content:encoded>
            <category>development</category>
            <category>analytics</category>
        </item>
        <item>
            <title><![CDATA[Welcome to Acropolis: One Command from Clone to Production]]></title>
            <link>http://localhost:8082/docs/blog/welcome-to-acropolis</link>
            <guid>http://localhost:8082/docs/blog/welcome-to-acropolis</guid>
            <pubDate>Sat, 21 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Eighteen Docker services. Three environment files. A reverse proxy with auto-TLS. Database admin GUI. Container management dashboard. Enterprise SSO. And if you want the full stack? One command:]]></description>
            <content:encoded><![CDATA[<p>Eighteen Docker services. Three environment files. A reverse proxy with auto-TLS. Database admin GUI. Container management dashboard. Enterprise SSO. And if you want the full stack? One command:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">python3 install.py --with-infrastructure</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This is the story of how we built Acropolis — the infrastructure layer that turns Parthenon from a research application into a production platform — and what we learned when we decided to ship it inside the same repository.</p>
<div style="border-radius:12px;overflow:hidden;margin-bottom:2rem"><img src="http://localhost:8082/docs/img/aqueduct.jpg" alt="Roman aqueduct — Acropolis infrastructure" style="width:100%;display:block"></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-infrastructure-belongs-in-the-application">Why Infrastructure Belongs in the Application<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#why-infrastructure-belongs-in-the-application" class="hash-link" aria-label="Direct link to Why Infrastructure Belongs in the Application" title="Direct link to Why Infrastructure Belongs in the Application">​</a></h2>
<p>For two months, Parthenon ran in production with a manual deployment story. Apache sat in front, configured by hand. No auto-TLS — I renewed certificates manually. No container management UI — if a researcher reported a problem, I SSHed in and ran <code>docker compose ps</code>. No centralized log view — I <code>grep</code>'d through container logs one at a time.</p>
<p>This works when you're the only operator. It stops working the moment someone else needs to deploy it.</p>
<p>The OHDSI community has a deployment problem that mirrors ours. Atlas requires a WebAPI backend (Java), an R runtime, a CDM database, and a web server. Each has its own configuration. Most institutions spend weeks getting Atlas running, and many never get past the installation phase. We built Parthenon to collapse that complexity. But we'd only collapsed the <em>application</em> complexity — the infrastructure was still manual.</p>
<p>So we built Acropolis.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-acropolis-provides">What Acropolis Provides<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#what-acropolis-provides" class="hash-link" aria-label="Direct link to What Acropolis Provides" title="Direct link to What Acropolis Provides">​</a></h2>
<p>Acropolis is not a separate application. It's a production infrastructure layer that wraps Parthenon with everything an operator needs:</p>
<table><thead><tr><th>Layer</th><th>Service</th><th>What It Does</th></tr></thead><tbody><tr><td><strong>Reverse Proxy</strong></td><td>Traefik v3.3</td><td>Auto-TLS via Let's Encrypt, subdomain routing for every service, HTTP→HTTPS redirect</td></tr><tr><td><strong>Container Management</strong></td><td>Portainer CE</td><td>Web GUI for Docker — restart containers, view logs, manage volumes</td></tr><tr><td><strong>Database Admin</strong></td><td>pgAdmin 4</td><td>Pre-configured with Parthenon's PostgreSQL connection</td></tr><tr><td><strong>Workflow Automation</strong></td><td>n8n</td><td>ETL pipelines, quality check automation, alerting (Enterprise)</td></tr><tr><td><strong>BI Dashboards</strong></td><td>Apache Superset 4.1</td><td>SQL analytics and visualization over OMOP CDM data (Enterprise)</td></tr><tr><td><strong>Data Catalog</strong></td><td>DataHub v0.15</td><td>Track data lineage from raw sources through OMOP to analysis outputs (Enterprise)</td></tr><tr><td><strong>SSO</strong></td><td>Authentik 2025.2</td><td>SAML/OIDC identity provider for all services (Enterprise)</td></tr></tbody></table>
<p>Two editions: <strong>Community</strong> (Traefik + Portainer + pgAdmin, free under Apache 2.0) and <strong>Enterprise</strong> (adds n8n, Superset, DataHub, Authentik — license-gated).</p>
<p>After installation, every service gets a subdomain:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://parthenon.acumenus.net     — The research platform</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://portainer.acumenus.net     — Container management</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://pgadmin.acumenus.net       — Database administration</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://grafana.acumenus.net       — Monitoring dashboards</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://ai.acumenus.net            — AI service (MedGemma)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://jupyter.acumenus.net       — JupyterHub notebooks</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://solr.acumenus.net          — Search administration</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://darkstar.acumenus.net      — R analytics runtime</span><br></span><span class="token-line" 
style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://n8n.acumenus.net           — Workflow automation (Enterprise)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">https://superset.acumenus.net      — BI dashboards (Enterprise)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>All with automatic TLS certificates. No nginx config files. No manual cert rotation.</p>
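<p>Traefik can discover routes from Docker container labels. One common way to express the subdomain-per-service pattern above is to generate those labels from a registry, as in the Python sketch below. This is an illustration, not necessarily how Acropolis configures Traefik. The entrypoint name <code>websecure</code>, the certificate resolver name <code>letsencrypt</code>, and the container ports are assumptions that must match your own Traefik static configuration and images.</p>

```python
def traefik_labels(router: str, host: str, port: int,
                   entrypoint: str = "websecure", resolver: str = "letsencrypt") -> dict:
    """Docker labels asking Traefik to route the given host to a container port over TLS."""
    router_key = f"traefik.http.routers.{router}"
    return {
        "traefik.enable": "true",
        f"{router_key}.rule": f"Host(`{host}`)",
        f"{router_key}.entrypoints": entrypoint,
        f"{router_key}.tls.certresolver": resolver,
        f"traefik.http.services.{router}.loadbalancer.server.port": str(port),
    }


# Hypothetical registry: subdomain -> internal container port.
SERVICES = {"portainer": 9000, "pgadmin": 80}


def labels_for_domain(domain: str) -> dict:
    """One label set per service, each on its own subdomain of `domain`."""
    return {name: traefik_labels(name, f"{name}.{domain}", port) for name, port in SERVICES.items()}
```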
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-two-repo-problem">The Two-Repo Problem<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#the-two-repo-problem" class="hash-link" aria-label="Direct link to The Two-Repo Problem" title="Direct link to The Two-Repo Problem">​</a></h2>
<p>Acropolis started as a separate repository. The logic was clean: Parthenon is the application, Acropolis is the infrastructure. Separate concerns, separate repos, separate release cycles.</p>
<p>In practice, this created a coordination nightmare.</p>
<p>The Acropolis installer needed to know Parthenon's container names. Parthenon's compose file defined them. If we renamed a service — say, <code>r-runtime</code> became <code>darkstar</code> — the Acropolis service registry broke silently. Traefik routed to a container that no longer existed.</p>
<p>The Acropolis installer also needed to run Parthenon's installer. We solved this with a subprocess call:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token comment" style="color:hsl(220, 10%, 40%)"># The old way: Acropolis shelling out to Parthenon</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">subprocess</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">run</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"python3"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">parthenon_path </span><span class="token operator" style="color:hsl(207, 82%, 66%)">/</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"install.py"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"--defaults-file"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(95, 38%, 62%)">str</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">defaults_file</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This meant Acropolis had to clone or locate the Parthenon repo, resolve paths across the two repos, pass credentials through a temporary JSON file, and then work out after the fact what Parthenon's installer had done. It also had to support four topology modes (<code>fresh_install</code>, <code>local</code>, <code>remote</code>, <code>standalone</code>), each with its own code path.</p>
<p>When we tested this on a VM, three bugs surfaced in the first run:</p>
<ol>
<li>
<p><strong>Port detection used <code>bind()</code> instead of <code>connect_ex()</code></strong> — <code>bind()</code> requires elevated privileges on ports below 1024. Acropolis couldn't check if ports 80 and 443 were free on a fresh Ubuntu 24.04 install.</p>
</li>
<li>
<p><strong>Docker Compose prefixed the network name</strong> — Parthenon's compose file declared a network called <code>parthenon</code>, but Docker Compose automatically prepended the project name, creating <code>parthenon_parthenon</code>. Acropolis checked for <code>parthenon</code> and didn't find it.</p>
</li>
<li>
<p><strong>Internal services flagged as "unknown"</strong> — PHP, PostgreSQL, Redis, Horizon, and other non-routable containers showed up in Docker network inspection but weren't in the curated service registry. The installer prompted the user to configure Traefik routes for <code>parthenon-php</code> — a backend container that should never be exposed.</p>
</li>
</ol>
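<p>Here is a minimal sketch of the <code>connect_ex()</code> approach. The function name <code>port_in_use</code> and the 0.5-second timeout are illustrative choices, not necessarily what the installer's own code uses. The function only checks whether something accepts TCP connections on the given address, and that check needs no privileges even for ports 80 and 443:</p>

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something is already accepting TCP connections on host:port.

    connect_ex() only opens an outbound connection, so unlike bind() it
    needs no elevated privileges, even for ports below 1024.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex() returns 0 on success and an errno value on failure,
        # instead of raising an exception like connect() does.
        return sock.connect_ex((host, port)) == 0
```

<p>One caveat: probing <code>127.0.0.1</code> only sees listeners that are reachable on loopback. A service bound solely to another interface would look free, so a stricter preflight check might probe each relevant address.</p>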
<p>All three were fixable. But they were symptoms of a deeper issue: <strong>two repos that had to agree on implementation details but couldn't enforce that agreement at build time.</strong></p>
<p>The clearest example was a port mismatch we discovered during the consolidation. The Acropolis service registry listed nginx at port 8082:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"nginx"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-nginx"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">8082</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"always"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path 
fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>But 8082 is the <em>host-mapped</em> port. Inside the Docker network, where Traefik connects, nginx listens on port 80. The static Traefik config file (<code>traefik/dynamic/parthenon.yml</code>) had the correct port because it was written by hand, but the auto-generator in <code>routing.py</code> read from the registry and would have produced:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token comment" style="color:hsl(220, 10%, 40%)"># Wrong — 8082 is the host port, not the container port</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token key atrule" style="color:hsl(29, 54%, 61%)">services</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token key atrule" style="color:hsl(29, 54%, 61%)">parthenon-parthenon</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token key atrule" style="color:hsl(29, 54%, 61%)">loadBalancer</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">      </span><span class="token key atrule" style="color:hsl(29, 54%, 61%)">servers</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">        </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">-</span><span class="token plain"> </span><span class="token key atrule" style="color:hsl(29, 54%, 61%)">url</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"http://parthenon-nginx:8082"</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
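<p>To make the failure mode concrete, here is a minimal sketch of a registry-driven generator. The <code>CuratedService</code> field names are assumptions inferred from the positional arguments in the registry, and <code>traefik_dynamic_config</code> is a hypothetical helper, not the actual <code>routing.py</code> code. The output follows the layout of Traefik's file-provider dynamic configuration, and whatever port sits in the registry flows straight into the backend URL:</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedService:
    # Field names are assumptions inferred from the positional registry entries.
    name: str       # logical service name, e.g. "nginx"
    container: str  # container name on the shared Docker network
    port: int       # port the container listens on INSIDE the network
    subdomain: str  # subdomain Traefik routes to this service
    policy: str     # "always" or "if_running"


def traefik_dynamic_config(services, domain):
    """Build a Traefik file-provider config (as a dict) from the registry."""
    routers, backends = {}, {}
    for svc in services:
        routers[svc.name] = {
            "rule": f"Host(`{svc.subdomain}.{domain}`)",
            "service": svc.name,
        }
        backends[svc.name] = {
            "loadBalancer": {
                # Traefik reaches the container over the Docker network, so this
                # must be the container port, never the host-mapped one.
                "servers": [{"url": f"http://{svc.container}:{svc.port}"}]
            }
        }
    return {"http": {"routers": routers, "services": backends}}
```

<p>The resulting dict could be written under <code>traefik/dynamic/</code> with a YAML serializer such as PyYAML's <code>yaml.safe_dump</code>. Because a generator like this trusts the registry completely, a wrong port in the registry becomes a wrong route with no error at generation time.</p>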
<p>The same mismatch existed for three other services: <code>python-ai</code> (host port 8002, container port 8000), <code>morpheus-ingest</code> (host 8004, container 8000), and <code>jupyterhub</code> (host 8888, container 8000). In each case the registry held the host-mapped port where the container-internal port belonged.</p>
<p>This class of bug is invisible in manual testing: the hand-written static config works fine, so the problem only surfaces when the auto-generator runs during a fresh installation. With the container definitions and the service registry in the same repository, a single test can compare the two and catch the mismatch before it ships; split across two repositories, nothing enforced that agreement.</p>
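<p>A cross-check of this kind could be sketched as below, assuming the compose file has already been parsed into a dict (for example with PyYAML's <code>yaml.safe_load</code>) and that each service sets <code>container_name</code> to match the registry. The helper names are hypothetical. The check reads the container side of each published port mapping and flags registry entries that disagree:</p>

```python
def container_port(mapping) -> int:
    """Extract the container-side port from a Compose port mapping.

    Handles long syntax ({"target": 80, "published": 8082}) and short syntax
    such as "80", "8082:80", "127.0.0.1:8082:80" and "8082:80/tcp".
    Port ranges are not handled in this sketch.
    """
    if isinstance(mapping, dict):
        return int(mapping["target"])
    spec = str(mapping).split("/", 1)[0]  # drop an optional "/tcp" or "/udp"
    return int(spec.rsplit(":", 1)[-1])   # the last field is the container port


def registry_port_mismatches(registry, compose):
    """Return (container, registry_port, container_ports) for each disagreement.

    `registry` maps container name -> port taken from the curated registry;
    `compose` is a parsed docker-compose file.
    """
    mismatches = []
    for svc in compose.get("services", {}).values():
        name = svc.get("container_name")
        if name not in registry or not svc.get("ports"):
            continue  # unknown to the registry, or publishes no ports
        ports = {container_port(m) for m in svc["ports"]}
        if registry[name] not in ports:
            mismatches.append((name, registry[name], sorted(ports)))
    return mismatches
```

<p>Services that publish no ports are skipped, since their container port is not visible in a <code>ports:</code> list; a fuller check might also read <code>expose:</code>.</p>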
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-consolidation">The Consolidation<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#the-consolidation" class="hash-link" aria-label="Direct link to The Consolidation" title="Direct link to The Consolidation">​</a></h2>
<p>We moved everything into <code>Parthenon/acropolis/</code>:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">acropolis/</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── installer/              14 Python modules (~2,000 lines)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── cli.py              Phase orchestrator</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── topology.py         Parthenon detection (simplified to local-only)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── editions.py         Community / Enterprise selection</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── discovery.py        24-service curated registry</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── config.py           Domain, TLS, credentials collection</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── network.py          Docker network bridging</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 
1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── deploy.py           Docker compose orchestration + health polling</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── routing.py          Traefik dynamic config generation</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── generator.py        Day-2 CLI script generator</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── verify.py           Post-install smoke tests</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── preflight.py        System validation</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   ├── state.py            Resume-on-failure state machine</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│   └── utils.py            Docker, network, and password utilities</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── docker-compose.base.yml       Traefik + acropolis_network</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── docker-compose.community.yml  Portainer + pgAdmin</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── docker-compose.enterprise.yml n8n + Superset + DataHub + Authentik</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 
0, 0.3)"><span class="token plain">├── traefik/                      Static + dynamic route configs</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── config/                       pgAdmin servers, Superset config</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">├── k8s/                          Helm charts + Kustomize overlays</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└── tests/                        6 unit test files + smoke test</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The key architectural changes:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="direct-import-instead-of-subprocess">Direct Import Instead of Subprocess<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#direct-import-instead-of-subprocess" class="hash-link" aria-label="Direct link to Direct Import Instead of Subprocess" title="Direct link to Direct Import Instead of Subprocess">​</a></h3>
<p>The Acropolis installer now imports Parthenon's installer as a Python module:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token comment" style="color:hsl(220, 10%, 40%)"># The new way: direct import in the same repo</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token keyword" style="color:hsl(286, 60%, 67%)">from</span><span class="token plain"> installer</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">cli </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">import</span><span class="token plain"> run </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">as</span><span class="token plain"> run_parthenon_installer</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">run_parthenon_installer</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">pre_seed</span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"admin_email"</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">:</span><span class="token plain"> config</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">parthenon_admin_email</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"admin_name"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> config</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">parthenon_admin_name</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"admin_password"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> config</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">parthenon_admin_password</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"app_url"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:hsl(95, 38%, 62%)">f"https://parthenon.</span><span class="token string-interpolation interpolation punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token 
string-interpolation interpolation">config</span><span class="token string-interpolation interpolation punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token string-interpolation interpolation">domain</span><span class="token string-interpolation interpolation punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token string-interpolation string" style="color:hsl(95, 38%, 62%)">"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"timezone"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> config</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">timezone</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>No temporary credentials file. No path resolution. No subprocess exit code interpretation. If the Parthenon installer raises an exception, the Acropolis installer catches it in the same process.</p>
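<p>A minimal sketch of that in-process error handling, assuming the installer entry point accepts a <code>pre_seed</code> keyword as in the call above. The wrapper <code>run_parthenon_phase</code> and its return shape are hypothetical, and any callable can stand in for <code>installer.cli.run</code>, which keeps the example self-contained:</p>

```python
import logging

log = logging.getLogger("acropolis.installer")


def run_parthenon_phase(installer, pre_seed):
    """Run the Parthenon installer in-process and report success or failure.

    `installer` is any callable taking pre_seed=...; in the monorepo this
    would be the imported installer entry point. The result dict is illustrative.
    """
    try:
        installer(pre_seed=pre_seed)
    except Exception as exc:
        # The exception object and full traceback are available right here,
        # instead of a bare subprocess exit code.
        log.exception("Parthenon installer failed")
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": True, "error": None}
```

<p>Catching <code>Exception</code> rather than <code>BaseException</code> lets <code>KeyboardInterrupt</code> and <code>SystemExit</code> propagate, so Ctrl+C still aborts the whole run. A result like this is also the kind of record a resume-on-failure state machine could persist.</p>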
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="topology-simplified-to-local-only">Topology Simplified to Local-Only<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#topology-simplified-to-local-only" class="hash-link" aria-label="Direct link to Topology Simplified to Local-Only" title="Direct link to Topology Simplified to Local-Only">​</a></h3>
<p>The four topology modes collapsed into one. In a monorepo, Parthenon is always the parent directory:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token keyword" style="color:hsl(286, 60%, 67%)">from</span><span class="token plain"> pathlib </span><span class="token keyword" style="color:hsl(286, 60%, 67%)">import</span><span class="token plain"> Path</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">ACROPOLIS_ROOT </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> Path</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token plain">__file__</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">resolve</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">parent</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">parent  </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># acropolis/</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">PARTHENON_ROOT </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> ACROPOLIS_ROOT</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">.</span><span class="token plain">parent                    </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Parthenon/</span><br></span></code></pre><div class="buttonGroup__atx"><button 
type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The <code>fresh_install</code> mode (clone Parthenon from GitHub) is gone. The <code>remote</code> mode (connect to Parthenon on another host) is gone. The <code>standalone</code> mode (Acropolis without Parthenon) is gone. The installer detects whether Parthenon's containers are already running and installs them if not.</p>
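<p>The running-containers check could be sketched as follows. By default <code>docker ps</code> lists only running containers, and <code>--format '{{.Names}}'</code> prints one name per line. The set of required container names is an assumption for illustration, and the <code>runner</code> parameter exists only so the logic can be tested without a Docker daemon:</p>

```python
import subprocess

# Hypothetical: containers that must be up for Parthenon to count as installed.
REQUIRED_CONTAINERS = {"parthenon-nginx", "parthenon-php"}


def running_containers(runner=subprocess.run) -> set[str]:
    """Names of currently running containers, via `docker ps`."""
    result = runner(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def parthenon_needs_install(runner=subprocess.run) -> bool:
    """True if any required Parthenon container is not currently running."""
    return not REQUIRED_CONTAINERS.issubset(running_containers(runner))
```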
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="stable-network-name">Stable Network Name<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#stable-network-name" class="hash-link" aria-label="Direct link to Stable Network Name" title="Direct link to Stable Network Name">​</a></h3>
<p>We added <code>name: parthenon</code> to the Docker network definition:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token key atrule" style="color:hsl(29, 54%, 61%)">networks</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token key atrule" style="color:hsl(29, 54%, 61%)">parthenon</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token key atrule" style="color:hsl(29, 54%, 61%)">name</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> parthenon    </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Prevents Docker from prefixing as "parthenon_parthenon"</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token key atrule" style="color:hsl(29, 54%, 61%)">driver</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"> bridge</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span 
class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This one line eliminated the network detection logic that had to check three candidate names.</p>
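<p>With a fixed name, the network check reduces to a single lookup. A sketch, relying on the fact that <code>docker network inspect</code> exits with a non-zero status when the network does not exist; the function name and injectable <code>runner</code> are illustrative:</p>

```python
import subprocess

NETWORK_NAME = "parthenon"  # fixed by `name: parthenon` in the compose file


def network_exists(name=NETWORK_NAME, runner=subprocess.run) -> bool:
    """True if `docker network inspect NAME` succeeds (exit status 0)."""
    result = runner(
        ["docker", "network", "inspect", name],
        capture_output=True, text=True,
    )
    return result.returncode == 0
```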
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-service-registry-24-containers-mapped">The Service Registry: 24 Containers, Mapped<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#the-service-registry-24-containers-mapped" class="hash-link" aria-label="Direct link to The Service Registry: 24 Containers, Mapped" title="Direct link to The Service Registry: 24 Containers, Mapped">​</a></h2>
<p>Acropolis maintains a curated registry of every Parthenon container. This registry drives two things: Traefik route generation and post-install health checks.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">CURATED_SERVICES </span><span class="token operator" style="color:hsl(207, 82%, 66%)">=</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Routable — exposed through Traefik with subdomains</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"nginx"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">           </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-nginx"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">           </span><span class="token number" style="color:hsl(29, 54%, 61%)">80</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon"</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">,</span><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"always"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"darkstar"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">        </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-darkstar"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">        </span><span class="token number" style="color:hsl(29, 54%, 61%)">8787</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"darkstar"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">     </span><span class="token string" style="color:hsl(95, 38%, 62%)">"always"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"python-ai"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">       </span><span class="token string" 
style="color:hsl(95, 38%, 62%)">"parthenon-ai"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">              </span><span class="token number" style="color:hsl(29, 54%, 61%)">8000</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"ai"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">           </span><span class="token string" style="color:hsl(95, 38%, 62%)">"always"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"morpheus-ingest"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-morpheus-ingest"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">8000</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"morpheus"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">     </span><span class="token string" style="color:hsl(95, 38%, 62%)">"always"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"solr"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">            </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-solr"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">            </span><span class="token number" style="color:hsl(29, 54%, 61%)">8983</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"solr"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">         </span><span class="token string" style="color:hsl(95, 38%, 62%)">"if_running"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"jupyterhub"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">      </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-jupyterhub"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">      </span><span class="token number" style="color:hsl(29, 54%, 61%)">8000</span><span class="token 
punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"jupyter"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">      </span><span class="token string" style="color:hsl(95, 38%, 62%)">"if_running"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"grafana"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">         </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-grafana"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">         </span><span class="token number" style="color:hsl(29, 54%, 61%)">3000</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"grafana"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">      </span><span class="token string" style="color:hsl(95, 38%, 62%)">"if_running"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 
71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"study-agent"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">     </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-study-agent"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">     </span><span class="token number" style="color:hsl(29, 54%, 61%)">8765</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"study-agent"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"if_running"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"hecate"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">          </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-hecate"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">          </span><span class="token number" style="color:hsl(29, 54%, 61%)">8080</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"hecate"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">   
    </span><span class="token string" style="color:hsl(95, 38%, 62%)">"if_running"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># ... plus 15 more (reverb, prometheus, whiterabbit, fhir-to-cdm, orthanc, etc.)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Internal — recognized but never routed</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"php"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">       </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-php"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">       </span><span class="token number" style="color:hsl(29, 54%, 61%)">9000</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">""</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> 
</span><span class="token string" style="color:hsl(95, 38%, 62%)">"internal"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"postgres"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-postgres"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token number" style="color:hsl(29, 54%, 61%)">5432</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">""</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"internal"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"redis"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">     </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-redis"</span><span class="token 
punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">     </span><span class="token number" style="color:hsl(29, 54%, 61%)">6379</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">""</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"internal"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"horizon"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">   </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-horizon"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">   </span><span class="token number" style="color:hsl(29, 54%, 61%)">0</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">    </span><span class="token string" style="color:hsl(95, 38%, 62%)">""</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"internal"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px 
rgba(0, 0, 0, 0.3)"><span class="token plain">    CuratedService</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">(</span><span class="token string" style="color:hsl(95, 38%, 62%)">"chromadb"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-chromadb"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain">  </span><span class="token number" style="color:hsl(29, 54%, 61%)">8000</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">""</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"internal"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">)</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># ... 
plus 4 more monitoring containers</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Every port in this registry is the <strong>container-internal</strong> port — the one Traefik connects to over the Docker network. Not the host-mapped port. This distinction cost us four bugs before we learned it.</p>
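<p>Here is a minimal sketch of turning one registry entry into a Traefik file-provider configuration, which shows the container-internal rule in practice. It assumes the five positional values map to fields named <code>key</code>, <code>container</code>, <code>port</code>, <code>subdomain</code>, and <code>exposure</code>, and that the HTTPS entry point is called <code>websecure</code>. Both are illustrative guesses, not the installer's actual code. The line that matters is the server URL, which pairs the container name with the container port.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedService:
    # Field names are illustrative; the registry only shows positional values.
    key: str
    container: str   # container name, resolvable on the shared Docker network
    port: int        # container-internal port, never the host-mapped one
    subdomain: str
    exposure: str    # "always", "if_running", or "internal"


def traefik_dynamic_config(svc: CuratedService, domain: str) -> dict:
    """Build a Traefik file-provider config (as a dict) for one routable service."""
    if svc.exposure == "internal":
        raise ValueError(f"{svc.key} is internal and must never be routed")
    return {
        "http": {
            "routers": {
                svc.key: {
                    "rule": f"Host(`{svc.subdomain}.{domain}`)",
                    "entryPoints": ["websecure"],  # assumed entry point name
                    "service": svc.key,
                    "tls": {},
                }
            },
            "services": {
                svc.key: {
                    "loadBalancer": {
                        # Traefik dials the container directly over the Docker network.
                        "servers": [{"url": f"http://{svc.container}:{svc.port}"}]
                    }
                }
            },
        }
    }


darkstar = CuratedService("darkstar", "parthenon-darkstar", 8787, "darkstar", "always")
config = traefik_dynamic_config(darkstar, "example.org")
print(config["http"]["services"]["darkstar"]["loadBalancer"]["servers"][0]["url"])
# http://parthenon-darkstar:8787
```

<p>Once serialized to YAML (for example with PyYAML's <code>yaml.safe_dump</code>), a dict like this could be placed in a directory watched by Traefik's file provider.</p>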
<p>The registry also drives auto-discovery. When the installer scans the Docker network, it matches running containers against this list. Known containers get their predefined subdomain. Unknown containers prompt the user: "Expose <code>parthenon-custom-service</code> through Traefik?"</p>
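<p>The matching step can be sketched as a lookup keyed by container name. This is a hypothetical illustration. The <code>CuratedService</code> fields mirror the positional values in the registry, and the running names would in practice come from something like <code>docker ps --format '{{.Names}}'</code>. The sketch does not model how the real installer treats an <code>always</code> service whose container is down.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedService:
    key: str
    container: str
    port: int
    subdomain: str
    exposure: str  # "always", "if_running", or "internal"


def classify_containers(running, registry):
    """Split running container names into routable, internal, and unknown."""
    by_container = {svc.container: svc for svc in registry}
    routable, internal, unknown = [], [], []
    for name in running:
        svc = by_container.get(name)
        if svc is None:
            unknown.append(name)                     # installer asks whether to expose it
        elif svc.exposure == "internal":
            internal.append(name)                    # recognized, never routed
        else:
            routable.append((name, svc.subdomain))   # gets its predefined subdomain
    return routable, internal, unknown


REGISTRY = [
    CuratedService("darkstar", "parthenon-darkstar", 8787, "darkstar", "always"),
    CuratedService("solr", "parthenon-solr", 8983, "solr", "if_running"),
    CuratedService("redis", "parthenon-redis", 6379, "", "internal"),
]
running = ["parthenon-darkstar", "parthenon-redis", "parthenon-custom-service"]
routable, internal, unknown = classify_containers(running, REGISTRY)
print(routable, internal, unknown)
```

<p>In this example the Solr entry is not running, so it produces no route. <code>parthenon-custom-service</code> lands in the list the user would be prompted about.</p>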
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-network-bridge">The Network Bridge<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#the-network-bridge" class="hash-link" aria-label="Direct link to The Network Bridge" title="Direct link to The Network Bridge">​</a></h2>
<p>Parthenon and Acropolis services run on separate Docker networks. This is intentional — Parthenon's internal services (PHP, PostgreSQL, Redis) should not be accessible from the Acropolis network, and vice versa.</p>
<p>The bridge works through selective attachment. During Acropolis Phase 6 (Network Setup), the installer connects only the <em>routable</em> Parthenon containers to <code>acropolis_network</code>:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">┌─────────────────────────────────────────────────────────┐</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                    acropolis_network                      │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                          │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  traefik ──→ parthenon-nginx:80                          │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│          ──→ parthenon-darkstar:8787                     │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│          ──→ parthenon-ai:8000                           │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│          ──→ parthenon-grafana:3000                      │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│          ──→ parthenon-solr:8983                         │</span><br></span><span 
class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                          │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  portainer ──→ /var/run/docker.sock                      │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  pgadmin   ──→ host.docker.internal:5432                 │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                          │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└─────────────────────────────────────────────────────────┘</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">┌─────────────────────────────────────────────────────────┐</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                    parthenon (internal)                   │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                          │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  nginx ↔ php ↔ postgres ↔ redis ↔ horizon               │</span><br></span><span class="token-line" style="color:hsl(220, 
14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  python-ai ↔ chromadb ↔ study-agent                     │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  darkstar ↔ solr ↔ hecate ↔ qdrant                     │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                          │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└─────────────────────────────────────────────────────────┘</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Containers like <code>parthenon-nginx</code> exist on both networks simultaneously. They can reach internal services via the <code>parthenon</code> network and receive external traffic from Traefik via <code>acropolis_network</code>.</p>
<p>This attachment is rolled back automatically if the installation fails. The rollback disconnects each container from <code>acropolis_network</code> and removes the network if it was created during installation.</p>
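<p>One way to make the attachment reversible is to record every change as it happens and then undo exactly those changes. The sketch below is an assumption about the shape of that logic, not the installer's code. It shells out to the real <code>docker network create</code>, <code>connect</code>, <code>disconnect</code>, and <code>rm</code> subcommands. It also takes an injectable <code>run</code> callable so it can be exercised without a Docker daemon, and it expects the caller to determine <code>network_exists</code> (for example from the exit code of <code>docker network inspect</code>).</p>

```python
import subprocess


class NetworkBridge:
    """Attach containers to a shared network and undo exactly what was done."""

    def __init__(self, network, run=None):
        self.network = network
        # `run` executes one docker CLI argv list; injectable for testing.
        self.run = run or (lambda argv: subprocess.run(argv, check=True))
        self.created_network = False
        self.connected = []

    def attach(self, containers, network_exists):
        if not network_exists:
            self.run(["docker", "network", "create", self.network])
            self.created_network = True
        for name in containers:
            self.run(["docker", "network", "connect", self.network, name])
            self.connected.append(name)  # recorded only after connect succeeded

    def rollback(self):
        # Disconnect in reverse order, then remove the network only if we created it.
        for name in reversed(self.connected):
            self.run(["docker", "network", "disconnect", self.network, name])
        self.connected.clear()
        if self.created_network:
            self.run(["docker", "network", "rm", self.network])
            self.created_network = False


calls = []
bridge = NetworkBridge("acropolis_network", run=calls.append)
try:
    bridge.attach(["parthenon-nginx", "parthenon-darkstar"], network_exists=False)
    raise RuntimeError("a later phase failed")
except RuntimeError:
    bridge.rollback()
```

<p>A container is recorded only after its <code>connect</code> call succeeds. If the loop fails halfway through, the rollback touches only the containers that were actually attached.</p>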
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-unified-installer">The Unified Installer<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#the-unified-installer" class="hash-link" aria-label="Direct link to The Unified Installer" title="Direct link to The Unified Installer">​</a></h2>
<p>The final product is a single entry point with two modes:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token comment" style="color:hsl(220, 10%, 40%)"># Application only — local development, no infrastructure overhead</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">python3 install.py</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Full stack — application + infrastructure (Traefik, Portainer, pgAdmin, Enterprise)</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">python3 install.py --with-infrastructure</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path 
fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>The <code>--with-infrastructure</code> flag runs the Acropolis orchestrator, which in turn calls the Parthenon installer internally. The combined flow:</p>
<p><strong>Acropolis Phase 1</strong> — Preflight: Docker version, daemon running, ports 80/443 free, disk space.</p>
<p><strong>Acropolis Phase 2</strong> — Topology: Detect whether Parthenon is already running. If yes, skip Parthenon installation. If no, run it after configuration.</p>
<p><strong>Acropolis Phase 3</strong> — Edition: Community or Enterprise. Enterprise requires a license key (<code>ACRO-XXXX-XXXX-XXXX</code> format).</p>
<p><strong>Acropolis Phase 4</strong> — Service Discovery: Enumerate running Parthenon containers and match against the 24-service registry.</p>
<p><strong>Acropolis Phase 5</strong> — Configuration: Domain, TLS mode (Let's Encrypt, self-signed, or none), timezone, per-service credentials (pgAdmin, Portainer, and optionally n8n, Superset, DataHub, Authentik).</p>
<p><strong>Parthenon Phases 1-9</strong> — If Parthenon isn't running: preflight, configuration (pre-seeded from Acropolis), Docker pull/build/start, Laravel bootstrap (composer, migrate, seed), Eunomia demo data, frontend build, Solr indexing, admin account creation.</p>
<p><strong>Acropolis Phase 6</strong> — Network: Create <code>acropolis_network</code>, connect routable Parthenon containers.</p>
<p><strong>Acropolis Phase 7</strong> — Deploy: <code>docker compose up -d</code> for infrastructure services, health polling with live-updating terminal table.</p>
<p><strong>Acropolis Phase 8</strong> — Routing: Generate Traefik dynamic configs for every discovered service. WebSocket support for Laravel Reverb. Security headers and compression middleware.</p>
<p><strong>Acropolis Phase 9</strong> — Verification: Smoke test every service. Generate <code>acropolis.sh</code> day-2 operations script. Display URL matrix with credentials.</p>
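<p>The ordering above, with the Parthenon installer spliced between Acropolis Phases 5 and 6, can be sketched as a simple driver. The function names, the shared <code>config</code> dict, and the <code>parthenon_running</code> key are hypothetical. The sketch demonstrates only two properties of the described flow. First, the topology phase decides whether Parthenon gets installed. Second, the Parthenon installer is called in-process with the configuration Acropolis has already collected.</p>

```python
def run_full_stack(acropolis_phases, install_parthenon):
    """Run Acropolis phases 1-5, the Parthenon installer if needed, then phases 6-9.

    Each phase is a (name, fn) pair whose fn reads and updates a shared config
    dict. Returns the names of the steps that actually ran, in order.
    """
    config = {}
    executed = []
    for name, phase in acropolis_phases[:5]:          # preflight through configuration
        phase(config)
        executed.append(name)
    if not config.get("parthenon_running", False):    # set by the topology phase
        install_parthenon(config)                     # in-process call, pre-seeded config
        executed.append("parthenon-install")
    for name, phase in acropolis_phases[5:]:          # network through verification
        phase(config)
        executed.append(name)
    return executed


def noop(config):
    pass


def topology(config):
    config["parthenon_running"] = False  # pretend Parthenon is not up yet


phases = [("preflight", noop), ("topology", topology), ("edition", noop),
          ("discovery", noop), ("configuration", noop), ("network", noop),
          ("deploy", noop), ("routing", noop), ("verification", noop)]
seeded = {}
steps = run_full_stack(phases, install_parthenon=seeded.update)
print(steps)
```

<p>If the topology phase instead records that Parthenon is already running, the install step is skipped and the remaining phases run unchanged.</p>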
<p>State persistence through <code>.install-state.json</code> means any phase can fail, and the next run resumes with the first phase that did not complete. Credentials never touch the state file. They are stored separately in <code>.install-credentials</code> with <code>chmod 0600</code> permissions (readable and writable only by the owner).</p>
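<p>Here is a minimal sketch of the resume-and-secrets pattern. It assumes the state file lists completed phase names under a <code>completed</code> key and that the credentials file uses <code>KEY=value</code> lines. Both formats are guesses, because the post does not specify them. The state file is replaced atomically with <code>os.replace</code>. The credentials file is opened with mode <code>0600</code> from the start, so a newly created file is never readable by other users, even briefly.</p>

```python
import json
import os
from pathlib import Path


def completed_phases(state_file: Path) -> list:
    """Phase names recorded as done; empty if no state file exists yet."""
    if not state_file.exists():
        return []
    return json.loads(state_file.read_text())["completed"]


def mark_completed(state_file: Path, phase: str) -> None:
    done = completed_phases(state_file)
    if phase not in done:
        done.append(phase)
    # Write a temp file, then rename: a crash never leaves half-written JSON.
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps({"completed": done}))
    os.replace(tmp, state_file)


def run_resumable(phases, state_file: Path) -> list:
    """Run (name, fn) phases in order, skipping any already recorded as done."""
    done = set(completed_phases(state_file))
    ran = []
    for name, fn in phases:
        if name in done:
            continue
        fn()  # if this raises, the phase is not recorded and reruns next time
        mark_completed(state_file, name)
        ran.append(name)
    return ran


def write_credentials(path: Path, secrets: dict) -> None:
    """Write KEY=value lines to a file only the owner can read or write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        for key, value in secrets.items():
            fh.write(f"{key}={value}\n")
    os.chmod(path, 0o600)  # also tighten a file that already existed
```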
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="day-2-operations">Day-2 Operations<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#day-2-operations" class="hash-link" aria-label="Direct link to Day-2 Operations" title="Direct link to Day-2 Operations">​</a></h2>
<p>After installation, the generated <code>acropolis.sh</code> script handles ongoing operations:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">./acropolis.sh up              </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Start infrastructure services</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">./acropolis.sh down            </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Stop everything</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">./acropolis.sh status          </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Health overview of all services</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">./acropolis.sh logs </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">[</span><span class="token plain">service</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">]</span><span class="token plain">  </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Follow logs</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">./acropolis.sh urls            </span><span class="token comment" 
style="color:hsl(220, 10%, 40%)"># Print full URL matrix</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">./acropolis.sh backup          </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Backup all volumes to timestamped archives</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">./acropolis.sh smoke-test      </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Re-run health checks</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">./acropolis.sh update          </span><span class="token comment" style="color:hsl(220, 10%, 40%)"># Pull latest images and restart</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This script is standalone bash with embedded configuration — no Python dependency for day-2 ops. It knows which compose files to use (base, community, enterprise) and which domain to display in URLs.</p>
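<p>Generating such a script amounts to rendering install-time choices into bash. The sketch below is illustrative only. The compose file names (<code>docker-compose.base.yml</code> plus one overlay per edition) are assumptions, and only four of the eight subcommands are shown. Storing the <code>docker compose</code> invocation in a bash array keeps each <code>-f</code> argument intact when it is expanded.</p>

```python
import shlex


def render_day2_script(edition: str, domain: str) -> str:
    """Render a standalone bash script with install-time choices baked in."""
    if edition not in ("community", "enterprise"):
        raise ValueError(f"unknown edition: {edition}")
    # Hypothetical file names: a base file plus one overlay per edition.
    files = ["docker-compose.base.yml", f"docker-compose.{edition}.yml"]
    compose = " ".join(["docker", "compose"] + [f"-f {shlex.quote(name)}" for name in files])
    return "\n".join([
        "#!/usr/bin/env bash",
        "set -euo pipefail",
        f"DOMAIN={shlex.quote(domain)}",
        f"COMPOSE=({compose})",
        'case "${1:-}" in',
        '  up)   "${COMPOSE[@]}" up -d ;;',
        '  down) "${COMPOSE[@]}" down ;;',
        '  logs) shift; "${COMPOSE[@]}" logs -f "$@" ;;',
        '  urls) echo "https://$DOMAIN" ;;',
        '  *)    echo "usage: $0 up|down|logs|urls"; exit 1 ;;',
        "esac",
        "",
    ])


script = render_day2_script("community", "parthenon.example.org")
print(script)
```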
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-we-learned">What We Learned<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#what-we-learned" class="hash-link" aria-label="Direct link to What We Learned" title="Direct link to What We Learned">​</a></h2>
<p><strong>Host ports and container ports are different.</strong> Obvious in hindsight. Docker's <code>8082:80</code> mapping means port 8082 on the host, port 80 in the container. Traefik, running inside Docker, connects via the Docker network to port 80. We got this wrong in four services and found it only during consolidation — the static Traefik config had been hand-written with the correct ports, masking the bug in the auto-generator.</p>
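<p>The rule fits in a few lines of Python. In Compose's short port syntax, the container port is always the last colon-separated field, optionally followed by a protocol. This sketch ignores port ranges such as <code>8000-8010:8000-8010</code>, and the helper names are hypothetical.</p>

```python
def container_port(mapping: str) -> int:
    """Container-side port of a Compose short-syntax mapping such as "8082:80"."""
    spec = mapping.split("/", 1)[0]        # drop an optional "/tcp" or "/udp"
    return int(spec.rsplit(":", 1)[-1])    # the container port always comes last


def upstream_url(container: str, mapping: str) -> str:
    """URL a reverse proxy on the same Docker network should dial."""
    return f"http://{container}:{container_port(mapping)}"


for m in ("8082:80", "127.0.0.1:8082:80", "80/tcp"):
    print(m, "->", upstream_url("parthenon-nginx", m))
```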
<p><strong>Docker Compose network names are not what you declare.</strong> A network declared as <code>parthenon</code> in a compose file whose project name is also <code>parthenon</code> becomes <code>parthenon_parthenon</code>, because Compose prefixes network names with the project name. Adding <code>name: parthenon</code> to the network definition forces the exact name. One YAML line saved us a detection function that checked three candidate names.</p>
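<p>The naming rule is small enough to state as code. This hypothetical helper covers only non-external networks. It reproduces Compose's default of prefixing the declared key with the project name, and it applies the override used when the definition sets <code>name:</code>.</p>

```python
def compose_network_name(project: str, key: str, name: str = "") -> str:
    """Actual Docker network name for a non-external network under `networks:`.

    Compose prefixes the declared key with the project name unless the
    network definition sets `name:`, which is then used verbatim.
    """
    return name or f"{project}_{key}"


print(compose_network_name("parthenon", "parthenon"))                    # parthenon_parthenon
print(compose_network_name("parthenon", "parthenon", name="parthenon"))  # parthenon
```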
<p><strong>Monorepos enforce interface contracts.</strong> When the service registry and the container definitions live in the same repository, a rename shows up in <code>git diff</code>. When they're in separate repos, it shows up in production. During consolidation we caught four port mismatches and one network naming issue. We would also have caught the container rename from <code>r-runtime</code> to <code>darkstar</code> automatically had we been in a monorepo from the start.</p>
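<p>In a monorepo, that contract can be enforced by a test that runs in CI. This sketch assumes the registry has been reduced to a container-to-port mapping. It also assumes the compose files have been reduced to a container-to-port-entries mapping, for example from each service's <code>container_name</code>, <code>ports</code>, and <code>expose</code> keys. The function and the <code>parthenon-r-runtime</code> name are hypothetical. The check flags the two failure classes described above: a registry port that is really a host port, and a container that no longer exists under its old name.</p>

```python
def registry_mismatches(registry: dict, compose: dict) -> list:
    """Report registry entries that disagree with the compose definitions.

    registry: container name -> port the registry tells Traefik to use
    compose:  container name -> port entries from `ports` or `expose`
    """
    problems = []
    for container, port in sorted(registry.items()):
        if container not in compose:
            problems.append(f"{container}: not defined in compose (renamed?)")
            continue
        # Container port = last colon-separated field, minus any /tcp suffix.
        declared = {int(e.split("/", 1)[0].rsplit(":", 1)[-1]) for e in compose[container]}
        if declared and port not in declared:
            problems.append(f"{container}: registry port {port}, compose exposes {sorted(declared)}")
    return problems


registry = {"parthenon-nginx": 8082, "parthenon-darkstar": 8787, "parthenon-r-runtime": 8787}
compose = {"parthenon-nginx": ["8082:80"], "parthenon-darkstar": ["8787"]}
problems = registry_mismatches(registry, compose)
for p in problems:
    print(p)
```

<p>Wrapped in a test that asserts <code>not problems</code>, a stale port or name would fail the pull request instead of a deployment.</p>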
<p><strong>Subprocess calls across repos are fragile.</strong> Path resolution, environment variable inheritance, exit code semantics, credential passing through temporary files — every one of these is a failure mode. A direct Python import eliminates all of them.</p>
<p><strong>Infrastructure as code means infrastructure in the same repo as the code.</strong> Not in a separate repo that references the code. Not in a wiki page. In the same <code>git blame</code> history, the same CI pipeline, the same pull request review.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/welcome-to-acropolis#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>The Acropolis layer is functional but young. On the roadmap:</p>
<ul>
<li><strong>Authentik SSO integration with Parthenon's Sanctum auth</strong> — OIDC provider in Authentik, client in Laravel, so researchers sign in once for the entire platform.</li>
<li><strong>Pre-built Superset dashboards for OMOP CDM</strong> — demographic breakdowns, condition prevalence, drug utilization, and data quality metrics, all pointing at Parthenon's PostgreSQL.</li>
<li><strong>n8n workflow templates</strong> — automated Achilles runs on new data loads, DQD quality gates, Slack notifications on analysis completion.</li>
<li><strong>Kubernetes Helm chart finalization</strong> — the chart structure exists but values need pinning for production HA deployments.</li>
</ul>
<p>The <code>Acropolis-v2</code> repository is now deprecated. All development continues in the Parthenon monorepo under <code>acropolis/</code>.</p>
<hr>
<p>If you're deploying Parthenon and want the full infrastructure stack:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token function" style="color:hsl(207, 82%, 66%)">git</span><span class="token plain"> clone https://github.com/sudoshi/Parthenon.git</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token builtin class-name" style="color:hsl(29, 54%, 61%)">cd</span><span class="token plain"> Parthenon</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">python3 install.py --with-infrastructure</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>That's it. One repo, one command, one platform.</p>]]></content:encoded>
            <category>infrastructure</category>
            <category>acropolis</category>
            <category>traefik</category>
            <category>docker</category>
            <category>devops</category>
            <category>architecture</category>
            <category>installer</category>
            <category>deployment</category>
            <category>portainer</category>
            <category>enterprise</category>
        </item>
        <item>
            <title><![CDATA[The Rise of Darkstar: How We Rebuilt the OHDSI R Runtime for Production]]></title>
            <link>http://localhost:8082/docs/blog/rise-of-darkstar</link>
            <guid>http://localhost:8082/docs/blog/rise-of-darkstar</guid>
            <pubDate>Fri, 20 Mar 2026 23:59:00 GMT</pubDate>
            <description><![CDATA[Every platform has a weak link. For Parthenon, it was the R container.]]></description>
            <content:encoded><![CDATA[<p>Every platform has a weak link. For Parthenon, it was the R container.</p>
<p>PHP handled 200 concurrent API requests without breaking a sweat. Python served AI inference with async workers. PostgreSQL managed million-row queries across six schemas. Redis cached sessions at sub-millisecond latency. And then there was R — single-threaded, fragile, running bare <code>Rscript</code> as PID 1 with no supervision, no timeouts, and a health check that lied.</p>
<p>This is the story of how we tore it down and built <strong>Darkstar</strong> — a production-grade R analytics engine that runs OHDSI HADES analyses concurrently, recovers from crashes automatically, and executes 35% faster than the container it replaced.</p>
<div style="border-radius:12px;overflow:hidden;margin-bottom:2rem"><img src="http://localhost:8082/docs/img/parthenon-hero.jpg" alt="The Parthenon" style="width:100%;display:block"></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-inheritance">The Inheritance<a href="http://localhost:8082/docs/blog/rise-of-darkstar#the-inheritance" class="hash-link" aria-label="Direct link to The Inheritance" title="Direct link to The Inheritance">​</a></h2>
<p>Parthenon didn't start from scratch. We inherited the R runtime architecture from <strong>OHDSI Broadsea</strong>, the community's standard Docker deployment for the OMOP CDM analytics stack. Broadsea ships a single R container running Plumber v1 — the venerable HTTP API framework for R that's been the community standard since 2017.</p>
<p>And for what Broadsea was designed to do — run a single analysis at a time on a researcher's laptop — Plumber v1 is perfectly fine. It's simple, well-documented, and every OHDSI tutorial uses it.</p>
<p>But Parthenon isn't a single-user research tool. It's a multi-tenant clinical research platform serving 18 users across multiple institutions, running CohortMethod estimations, PatientLevelPrediction models, Self-Controlled Case Series analyses, Cohort Diagnostics, and Characterization reports against a million-patient OMOP CDM database. Simultaneously.</p>
<p>That's where things fell apart.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-breaking-point">The Breaking Point<a href="http://localhost:8082/docs/blog/rise-of-darkstar#the-breaking-point" class="hash-link" aria-label="Direct link to The Breaking Point" title="Direct link to The Breaking Point">​</a></h2>
<p>The first sign of trouble was a Slack message from a researcher: <em>"My estimation has been running for 20 minutes. Is the system down?"</em></p>
<p>It wasn't down. Another user had kicked off a CohortMethod propensity score matching job five minutes earlier. Because Plumber v1 is single-threaded, every subsequent request — health checks, status queries, the second user's estimation — queued behind that first analysis with zero feedback.</p>
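<p>A one-worker thread pool can stand in for Plumber v1's single R thread and reproduce the queueing effect. This Python simulation is illustrative only. The sleep replaces a CohortMethod job, and the durations are scaled down to fractions of a second:</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_analysis(seconds: float) -> str:
    time.sleep(seconds)  # stands in for a CohortMethod job
    return "estimation done"


def health() -> str:
    return "ok"


def health_latency(workers: int, job_seconds: float = 0.5) -> float:
    """Seconds the health probe waits when it arrives just after a long job."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pool.submit(long_analysis, job_seconds)  # first user's analysis
        start = time.monotonic()
        pool.submit(health).result()             # probe queued behind it
        return time.monotonic() - start          # pool still drains the job on exit


print(f"1 worker:  health answered after {health_latency(1):.2f}s")
print(f"3 workers: health answered after {health_latency(3):.2f}s")
```

<p>With one worker, the health probe waits for the whole job. With three workers, it is answered almost immediately.</p>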
<p>Here's what was actually happening inside the container:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">┌──────────────────────────────────────────────────────────┐</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  PID 1: Rscript plumber_api.R                            │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│         └─ plumber v1 (SINGLE THREAD)                    │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│              ├─ /health         → BLOCKED (behind job)   │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│              ├─ /estimation/run → RUNNING (20 min)       │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│              ├─ /prediction/run → QUEUED (no feedback)   │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│              └─ /status         → BLOCKED                │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                          │</span><br></span><span 
class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  Docker health check: curl localhost:8787/health         │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│    interval: 600s (TEN MINUTES between checks)           │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│    Response: {"status":"ok"} (even if JVM is dead)       │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                          │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  No JDBC timeouts. No process supervision.               │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  No GC strategy. No crash recovery.                      │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└──────────────────────────────────────────────────────────┘</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Over the following weeks, I cataloged five distinct failure modes:</p>
<p><strong>Blocked health checks.</strong> Docker's health probe got no response from the <code>/health</code> endpoint because the single thread was locked in a Cox regression. After 5 retries at 600-second intervals (50 minutes!), Docker finally marked the container unhealthy. By then the analysis had probably finished, and the restart killed the cleanup.</p>
<p><strong>Ghost containers.</strong> With 10-minute health check intervals, a crashed R process sat undetected. The Laravel backend got <code>connection refused</code> errors and returned generic 500s. Users saw "analysis failed" with no explanation.</p>
<p><strong>Hung JDBC connections.</strong> Twice I watched the R process freeze completely — not crashed, not high-CPU, just stuck. <code>strace</code> showed it blocked on a socket read to PostgreSQL with no timeout set. The database had closed the connection during a long-running covariate extraction, but R didn't know. The only fix was <code>docker compose restart r-runtime</code>, which killed any active analysis.</p>
<p><strong>Unsafe disconnects.</strong> <code>DatabaseConnector::disconnect()</code> throws if the connection is already dead. Several endpoint files had bare <code>disconnect()</code> calls in their cleanup code. A disconnect error would mask the actual analysis result and return a 500 to the user — even though the analysis had completed successfully. The results were computed, stored in R memory, and then lost because the HTTP response errored on cleanup.</p>
<p><strong>Memory creep.</strong> Long-running sessions accumulated R objects across requests with no GC strategy. The default JVM garbage collector would pause unpredictably — sometimes 2-5 seconds — during large covariate matrix operations. Eventually the heap ran out and <code>rJava</code> calls started throwing <code>OutOfMemoryError</code>.</p>
<p>I spent weeks applying band-aids: extending health check intervals to avoid false-positive restarts, adding retry logic in Laravel's <code>RService</code>, telling users to "wait for the current analysis to finish." But the core problem was architectural. Plumber v1 is single-threaded by design. No amount of application-level workarounds fixes that.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-decision">The Decision<a href="http://localhost:8082/docs/blog/rise-of-darkstar#the-decision" class="hash-link" aria-label="Direct link to The Decision" title="Direct link to The Decision">​</a></h2>
<p>On March 17, 2026, after nearly two weeks of interim fixes (Phases 1 through 4 below), I decided to stop patching and start rebuilding. The goal was simple:</p>
<blockquote>
<p>Replace the entire R runtime infrastructure with something that can handle concurrent requests, recover from crashes, and not lie about its health.</p>
</blockquote>
<p>The constraints were equally clear:</p>
<ul>
<li>Every HADES analysis that worked before must work identically after. Zero breaking changes.</li>
<li>The 12 existing API endpoint files must be portable. We're not rewriting CohortMethod integration.</li>
<li>Memory budget: 32GB container limit, shared between R and the JVM.</li>
<li>Cold start under 2 minutes (HADES package loading is unavoidably heavy).</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="phase-1-stop-the-bleeding-march-4">Phase 1: Stop the Bleeding (March 4)<a href="http://localhost:8082/docs/blog/rise-of-darkstar#phase-1-stop-the-bleeding-march-4" class="hash-link" aria-label="Direct link to Phase 1: Stop the Bleeding (March 4)" title="Direct link to Phase 1: Stop the Bleeding (March 4)">​</a></h2>
<p>Before the big rewrite, I made two immediate changes to buy time.</p>
<p>First, the health check interval dropped from 600 seconds to 30. Three consecutive failures at 30-second intervals mean the container is marked unhealthy in 90 seconds instead of 50 minutes. I also added <code>start_period: 120s</code> to account for HADES package loading. Probe failures during the start period don't count toward the retry limit, so without it the container would be flagged unhealthy before R even finished booting.</p>
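<p>The arithmetic behind those numbers can be written as a small Python sketch. It uses a simplified model, which is an assumption rather than Docker's exact scheduler: the service hangs right after a passing probe, and every later probe fails. Docker waits <code>interval</code> after each probe completes before starting the next one, so a per-probe <code>timeout</code> adds to every failed attempt:</p>

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    interval_s: int
    retries: int
    timeout_s: int = 0  # extra time each failing probe can take

    def seconds_to_unhealthy(self) -> int:
        """Time from the hang until `retries` consecutive probes have failed."""
        return self.retries * (self.interval_s + self.timeout_s)


before = HealthCheck(interval_s=600, retries=5)
after = HealthCheck(interval_s=30, retries=3)
print(before.seconds_to_unhealthy() / 60, "minutes")  # 50.0 minutes
print(after.seconds_to_unhealthy(), "seconds")        # 90 seconds
```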
<p>Second, I tuned the JVM. The default garbage collector paused unpredictably during large operations. Switching to G1GC with <code>MaxGCPauseMillis=200</code> gives the collector a 200 ms pause-time target, which kept pauses short. G1 aims for that target but does not guarantee it. Combined with <code>R_MAX_VSIZE=24Gb</code> for the R vector heap, this eliminated the OOM crashes and GC stalls:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token key atrule" style="color:hsl(29, 54%, 61%)">environment</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">-</span><span class="token plain"> _JAVA_OPTIONS=</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">-</span><span class="token plain">Xmx8g </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">-</span><span class="token plain">Xms2g </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">-</span><span class="token plain">XX</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain">+UseG1GC </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">-</span><span class="token plain">XX</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">:</span><span class="token plain">MaxGCPauseMillis=200</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">-</span><span class="token plain"> R_MAX_VSIZE=24Gb</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to 
clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
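<p>The two caps in that block can be checked against the 32GB container limit. This Python sketch reads the suffixes as binary units (GiB), which is an assumption. It counts only the JVM max heap and R's vector heap. JVM non-heap memory (metaspace, thread stacks) and R's cons-cell heap fall outside both caps:</p>

```python
import re

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}  # assumption: binary units


def parse_size(text: str) -> int:
    """Parse sizes like "8g", "24Gb", or "32g" into bytes."""
    match = re.fullmatch(r"(\d+)([kmg])b?", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def jvm_max_heap(java_options: str) -> int:
    """Bytes allowed by the -Xmx flag in a _JAVA_OPTIONS string."""
    match = re.search(r"-Xmx(\S+)", java_options)
    if match is None:
        raise ValueError("no -Xmx flag found")
    return parse_size(match.group(1))


GIB = 1024**3
jvm = jvm_max_heap("-Xmx8g -Xms2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200")
r_vsize = parse_size("24Gb")  # R_MAX_VSIZE
limit = parse_size("32g")     # container memory limit
headroom = limit - jvm - r_vsize
print(f"JVM {jvm // GIB} GiB + R {r_vsize // GIB} GiB of {limit // GIB} GiB; headroom {headroom // GIB} GiB")
```

<p>Under these assumptions the two caps add up to the entire limit. Any memory outside them has to come from space the caps leave unused at runtime.</p>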
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="phase-2-an-honest-health-check-march-7">Phase 2: An Honest Health Check (March 7)<a href="http://localhost:8082/docs/blog/rise-of-darkstar#phase-2-an-honest-health-check-march-7" class="hash-link" aria-label="Direct link to Phase 2: An Honest Health Check (March 7)" title="Direct link to Phase 2: An Honest Health Check (March 7)">​</a></h2>
<p>The old health check was four lines of R that returned <code>{"status":"ok"}</code> unconditionally. The JVM could be dead, memory at 95%, JDBC driver missing — and it would still say "ok."</p>
<p>I replaced it with a deep validation endpoint that checks five things on every 30-second probe:</p>
<ol>
<li><strong>HADES packages loadable</strong> — <code>requireNamespace()</code> for CohortMethod, PatientLevelPrediction, DatabaseConnector. Catches corrupted installs.</li>
<li><strong>JVM alive</strong> — actually creates a Java object via <code>rJava::.jnew()</code>. If the heap is exhausted, this fails.</li>
<li><strong>Memory usage</strong> — <code>gc()</code> returns current consumption. Alerts at 87% of the 32GB limit.</li>
<li><strong>JDBC driver present</strong> — verifies <code>/opt/jdbc/postgresql-42.7.3.jar</code> exists. (This was a real bug — the volume mount at <code>/app</code> was clobbering the driver.)</li>
<li><strong>Uptime tracking</strong> — detects unexpected restarts. If uptime drops to zero when nobody restarted the container, something crashed.</li>
</ol>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"status"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"ok"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"service"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"parthenon-r-runtime"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"version"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 
62%)">"0.2.0"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"r_version"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token string" style="color:hsl(95, 38%, 62%)">"4.4.3"</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"uptime_seconds"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">3847</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token property" style="color:hsl(355, 65%, 65%)">"checks"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"packages"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">true</span><span class="token punctuation" 
style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"jvm"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">true</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"memory_used_mb"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token number" style="color:hsl(29, 54%, 61%)">4821.3</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"memory_ok"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">true</span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    </span><span class="token property" style="color:hsl(355, 65%, 65%)">"jdbc_driver"</span><span class="token operator" style="color:hsl(207, 82%, 66%)">:</span><span class="token plain"> </span><span class="token boolean" style="color:hsl(29, 54%, 61%)">true</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  </span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain"></span><span class="token punctuation" style="color:hsl(220, 14%, 71%)">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>When any check fails, <code>status</code> changes to <code>"degraded"</code>. Docker still gets a 200 (so it doesn't restart mid-analysis), but the Laravel backend knows not to submit new work.</p>
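<p>The status logic above can be sketched in Python, mirroring the fields of the JSON example. The function names are hypothetical; the real endpoint is written in R. The sketch assumes that "alerts at 87%" means <code>memory_ok</code> turns false above 87% of 32GB. It also models the uptime check from item 5 as a comparison between consecutive probes:</p>

```python
from __future__ import annotations

LIMIT_MB = 32 * 1024   # 32GB container limit, in MB
ALERT_FRACTION = 0.87  # assumption: memory_ok turns false above 87%


def build_health(packages: bool, jvm: bool, memory_used_mb: float,
                 jdbc_driver: bool, uptime_seconds: int) -> tuple[int, dict]:
    """Return (HTTP status, body) in the shape of the JSON example."""
    checks = {
        "packages": packages,
        "jvm": jvm,
        "memory_used_mb": memory_used_mb,
        "memory_ok": memory_used_mb <= ALERT_FRACTION * LIMIT_MB,
        "jdbc_driver": jdbc_driver,
    }
    passed = all(value for key, value in checks.items() if key != "memory_used_mb")
    body = {"status": "ok" if passed else "degraded",
            "uptime_seconds": uptime_seconds,
            "checks": checks}
    return 200, body  # always 200, so Docker doesn't restart it mid-analysis


def can_accept_work(body: dict) -> bool:
    """Backend side: submit new jobs only when every check passed."""
    return body.get("status") == "ok"


def unexpected_restart(prev_uptime: int | None, uptime: int, planned: bool) -> bool:
    """Uptime going backwards without a planned restart means a crash."""
    return prev_uptime is not None and uptime < prev_uptime and not planned


code, body = build_health(True, True, 4821.3, True, 3847)
print(code, body["status"], can_accept_work(body))  # 200 ok True
```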
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="phase-3-jdbc-timeouts-march-8">Phase 3: JDBC Timeouts (March 8)<a href="http://localhost:8082/docs/blog/rise-of-darkstar#phase-3-jdbc-timeouts-march-8" class="hash-link" aria-label="Direct link to Phase 3: JDBC Timeouts (March 8)" title="Direct link to Phase 3: JDBC Timeouts (March 8)">​</a></h2>
<p>The hung-connection problem was insidious. R would issue a SQL query, the database would close the connection during a long covariate extraction, and R would sit on a socket read forever. No timeout. No error. Just silence.</p>
<p>I added explicit JDBC timeouts to every PostgreSQL connection string:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">socketTimeout=300        # Kill queries hung at socket level (5 min)</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">connectTimeout=30        # Fail fast if DB unreachable</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">loginTimeout=30          # Fail fast if auth hangs</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">tcpKeepAlive=true        # Detect dead connections via TCP probes</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
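<p>In a JDBC URL these settings travel as query parameters, and the Python sketch below assembles one. The host, port, and database names are placeholders. pgJDBC applies <code>socketTimeout</code> to any single blocking read. A legitimately slow query that sends nothing back for more than 300 seconds is therefore also cut off, so the setting doubles as a coarse global query timeout.</p>

```python
from __future__ import annotations

from urllib.parse import urlencode

# Standard pgJDBC connection properties; the three timeouts are in seconds.
TIMEOUTS = {
    "socketTimeout": 300,    # close the connection if a single read blocks > 300 s
    "connectTimeout": 30,    # give up opening the TCP socket after 30 s
    "loginTimeout": 30,      # give up on connection establishment after 30 s
    "tcpKeepAlive": "true",  # OS-level keepalive probes detect dead peers
}


def jdbc_url(host: str, port: int, database: str, props: dict | None = None) -> str:
    """Build a jdbc:postgresql URL with the properties as query parameters."""
    query = urlencode(TIMEOUTS if props is None else props)
    return f"jdbc:postgresql://{host}:{port}/{database}?{query}"


print(jdbc_url("postgres", 5432, "parthenon"))
```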
<p>Then I wrapped every <code>DatabaseConnector::disconnect()</code> call in a <code>tryCatch</code>. There were 10 disconnect call sites across 6 endpoint files. Each one got a <code>safe_disconnect()</code> wrapper that logs the error but doesn't throw — so a dead connection during cleanup never masks a successful analysis result.</p>
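<p>Here is the same cleanup pattern rendered in Python for illustration; the actual wrapper is R. <code>FlakyConnection</code> is a hypothetical stand-in for a connection the server has already dropped. Without the wrapper, the exception raised in the <code>finally</code> block would replace the successful return value:</p>

```python
import logging

log = logging.getLogger("cleanup")


class FlakyConnection:
    """Hypothetical connection whose close() fails because the server hung up."""

    def close(self) -> None:
        raise ConnectionError("connection already closed by server")


def safe_disconnect(conn) -> None:
    """Close `conn`; log any failure instead of raising it."""
    try:
        conn.close()
    except Exception as exc:  # cleanup errors must not mask the real outcome
        log.warning("disconnect failed during cleanup: %s", exc)


def run_analysis(conn) -> dict:
    try:
        return {"status": "completed"}  # placeholder for the computed result
    finally:
        # A bare conn.close() here would raise and discard the return value.
        safe_disconnect(conn)


print(run_analysis(FlakyConnection()))  # {'status': 'completed'}
```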
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="phase-4-async-job-registry-march-10">Phase 4: Async Job Registry (March 10)<a href="http://localhost:8082/docs/blog/rise-of-darkstar#phase-4-async-job-registry-march-10" class="hash-link" aria-label="Direct link to Phase 4: Async Job Registry (March 10)" title="Direct link to Phase 4: Async Job Registry (March 10)">​</a></h2>
<p>Even with health check and timeout improvements, the fundamental problem remained: Plumber v1 is single-threaded. While I planned the full migration to plumber2, I built an interim solution using <code>callr::r_bg()</code>.</p>
<p>The idea: instead of blocking the HTTP thread for 20 minutes, dispatch the analysis to a background R subprocess and return a job ID immediately. The Laravel backend polls for completion.</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">POST /jobs/submit   → dispatch to callr::r_bg(), return {job_id}</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">GET  /jobs/status/X → check if background process finished</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">POST /jobs/cancel/X → kill background process</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Each job runs in its own R process with full HADES environment. The main Plumber thread stays free for health checks and status queries. Job results are cached in memory with a 5-minute TTL.</p>
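<p>Below is a Python analogue of that registry, as a sketch only. Each job runs as a separate child process (<code>sys.executable -c</code> instead of <code>callr::r_bg()</code>). <code>submit</code> returns an ID immediately, and finished results are cached for five minutes. The class and method names are hypothetical:</p>

```python
from __future__ import annotations

import subprocess
import sys
import time
import uuid

RESULT_TTL_S = 300  # keep finished results for five minutes


class JobRegistry:
    """Hypothetical submit/status/cancel registry backed by child processes."""

    def __init__(self) -> None:
        self._running: dict[str, subprocess.Popen] = {}
        self._results: dict[str, tuple[float, dict]] = {}

    def submit(self, code: str) -> str:
        job_id = uuid.uuid4().hex
        self._running[job_id] = subprocess.Popen(
            [sys.executable, "-c", code], stdout=subprocess.PIPE, text=True
        )
        return job_id  # returned at once; the caller polls status()

    def status(self, job_id: str) -> dict:
        self._expire()
        if job_id in self._results:
            return self._results[job_id][1]
        proc = self._running.get(job_id)
        if proc is None:
            return {"state": "unknown"}
        if proc.poll() is None:
            return {"state": "running"}
        output, _ = proc.communicate()  # process has exited; collect its stdout
        del self._running[job_id]
        result = {"state": "completed" if proc.returncode == 0 else "failed",
                  "output": output.strip()}
        self._results[job_id] = (time.monotonic(), result)
        return result

    def cancel(self, job_id: str) -> bool:
        proc = self._running.pop(job_id, None)
        if proc is None:
            return False
        proc.terminate()
        proc.communicate()  # reap the process and close its pipe
        return True

    def _expire(self) -> None:
        now = time.monotonic()
        stale = [j for j, (t, _) in self._results.items() if now - t > RESULT_TTL_S]
        for job_id in stale:
            del self._results[job_id]


registry = JobRegistry()
job = registry.submit("print(6 * 7)")
while registry.status(job)["state"] == "running":
    time.sleep(0.05)
print(registry.status(job))  # {'state': 'completed', 'output': '42'}
```

<p>Reading the child's output only after it exits is fine for short results. A job that prints more than the OS pipe buffer would block, so a real registry would write results to a file instead.</p>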
<p>This was a stopgap, but it proved the pattern that Darkstar would later implement properly with mirai daemons.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="phase-5-the-big-migration-march-17-19">Phase 5: The Big Migration (March 17-19)<a href="http://localhost:8082/docs/blog/rise-of-darkstar#phase-5-the-big-migration-march-17-19" class="hash-link" aria-label="Direct link to Phase 5: The Big Migration (March 17-19)" title="Direct link to Phase 5: The Big Migration (March 17-19)">​</a></h2>
<p>Three days. Complete infrastructure overhaul.</p>
<p><strong>Plumber v1 → Plumber2 0.2.0.</strong> Plumber2 is the async-first successor to Plumber, designed for production workloads. It uses the httpuv2 event loop and supports native integration with mirai for concurrent execution.</p>
<p><strong>mirai 2.6.1 with 3 daemon workers.</strong> mirai ("future" in Japanese) provides persistent R worker processes. Instead of spawning a new <code>callr::r_bg()</code> process per job, mirai maintains 3 pre-warmed daemon workers that share the HADES package load. Each daemon is a separate R process with a ~3GB memory footprint (R heap + JVM heap).</p>
<p><strong>s6-overlay for process supervision.</strong> The legacy container ran bare <code>Rscript</code> as PID 1. If the process crashed, Docker's restart policy would restart the whole container — a 60-second cold start including HADES package loading. With s6-overlay, PID 1 is a proper init system. If the Plumber process crashes, s6 restarts it <em>inside the same container</em> in seconds. The JVM stays warm. The JDBC driver stays loaded.</p>
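<p>A toy supervisor makes the difference between restarting a container and restarting a process inside it concrete. This Python sketch is only an analogy for what s6 does. It is not s6-overlay configuration. The <code>supervise</code> function, its <code>max_restarts</code> cap, and the backoff value are assumptions. The sketch restarts a crashed child in place and logs each exit. On POSIX, <code>subprocess</code> reports death by a signal as a negative return code, and the sketch logs that case as a signal.</p>

```python
import logging
import subprocess
import sys
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("supervisor")


def describe_exit(returncode: int) -> str:
    """Render a Popen return code as the exit-code/signal text a supervisor would log."""
    if returncode < 0:  # POSIX: -N means the child was killed by signal N
        return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


def supervise(argv: list[str], max_restarts: int = 3, backoff_s: float = 1.0) -> list[int]:
    """Run argv and restart it in place whenever it exits; return every exit status.

    Loosely like an s6 longrun, the service is restarted whatever its exit status.
    The max_restarts cap exists only so the sketch terminates.
    """
    statuses: list[int] = []
    for attempt in range(max_restarts + 1):  # the first start plus the restarts
        if attempt:
            time.sleep(backoff_s)  # pause between launches to avoid a tight crash loop
            log.info("restarting service (restart %d of %d)", attempt, max_restarts)
        proc = subprocess.run(argv)  # blocks until the child exits
        statuses.append(proc.returncode)
        log.info("service %s", describe_exit(proc.returncode))
    return statuses


if __name__ == "__main__":
    # A child that always crashes with exit code 3 is restarted in place each time.
    print(supervise([sys.executable, "-c", "import sys; sys.exit(3)"], max_restarts=2))
```

<p>Only the crashed process is replaced. Anything else that lives in the same container, such as the loaded JDBC driver or warm JVM state of sibling processes, survives the restart.</p>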
<p>The new architecture:</p>
<div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">┌──────────────────────────────────────────────────────────────┐</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  Docker: parthenon-darkstar (s6-overlay as PID 1)            │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                              │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  ┌────────────────────────────────────────────────────────┐  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │  s6-overlay: init system, signal handling, supervision │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │    └─ plumber2 event loop (non-blocking)               │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │         ├─ /health      → instant (deep validation)    │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │         ├─ /estimation  → dispatched to mirai daemon   │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │         ├─ /prediction  → dispatched to mirai daemon   │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │         └─ /sccs        → dispatched to mirai daemon   │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │                                                        │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │  mirai daemon pool:                                    │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │    ├─ daemon 1: [IDLE]     ← ready for work            │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │    ├─ daemon 2: [RUNNING CohortMethod, 12min elapsed]  │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  │    └─ daemon 3: [IDLE]     ← ready for work            │  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  └────────────────────────────────────────────────────────┘  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│                                                              │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  Memory: 32GB limit (~3GB per daemon + 3GB event loop)       │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  JDBC: socketTimeout=300s, connectTimeout=30s, tcpKeepAlive  │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  JVM: G1GC, -Xmx8g, MaxGCPauseMillis=200ms                   │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  Health: 30s interval, deep validation, degraded state       │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">│  Crash recovery: s6 auto-restart, exit code/signal logging   │</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">└──────────────────────────────────────────────────────────────┘</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
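<p>The diagram is built around one property: a health endpoint that keeps answering while long analyses run elsewhere. Python's <code>asyncio</code> can demonstrate that property. This is an analogy rather than plumber2 or mirai code. A three-thread <code>ThreadPoolExecutor</code> stands in for the three mirai daemons (mirai uses separate R processes). <code>time.sleep</code> stands in for a CohortMethod run. The handler names are hypothetical. In the offloaded path the event loop only dispatches the heavy call and awaits it, so it stays free to serve probes.</p>

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the 3-daemon mirai pool (mirai uses separate R processes, not threads).
POOL = ThreadPoolExecutor(max_workers=3)


def heavy_analysis(seconds: float) -> str:
    """Blocking work standing in for a long HADES call."""
    time.sleep(seconds)
    return f"analysis finished after {seconds}s"


async def handle_health() -> dict[str, str]:
    return {"status": "ok"}


async def handle_estimation(seconds: float, offload: bool = True) -> str:
    if not offload:
        # Legacy-style: the work runs on the event-loop thread and blocks it.
        return heavy_analysis(seconds)
    # Darkstar-style: hand the work to the pool and await the result.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(POOL, heavy_analysis, seconds)


async def max_probe_gap(seconds: float, offload: bool, interval: float = 0.05) -> float:
    """Run one analysis while probing health; return the longest gap between probes."""
    job = asyncio.create_task(handle_estimation(seconds, offload))
    last = time.perf_counter()
    worst = 0.0
    while not job.done():
        await asyncio.sleep(interval)  # wait for the next probe tick
        await handle_health()
        now = time.perf_counter()
        worst = max(worst, now - last)
        last = now
    await job
    return worst


if __name__ == "__main__":
    print("offloaded:", round(asyncio.run(max_probe_gap(1.0, offload=True)), 2), "s")
    print("blocking: ", round(asyncio.run(max_probe_gap(1.0, offload=False)), 2), "s")
```

<p>With offloading, the longest gap between probes stays close to the probe interval. With the blocking call, the gap stretches to the full analysis time, which matches the shape of the legacy container going dark.</p>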
<p>The migration required rewriting all 12 endpoint files from Plumber v1 syntax to Plumber2's router API. The core analysis logic — the HADES function calls, the SQL generation, the result transformation — remained untouched. Only the HTTP layer changed.</p>
<p>The Dockerfile went from a simple <code>install.packages("plumber")</code> to a multi-stage, 7-layer build:</p>
<table><thead><tr><th>Layer</th><th>Contents</th><th>Purpose</th></tr></thead><tbody><tr><td>1</td><td>plumber2, mirai, rJava, duckdb</td><td>Native compilation (Rust toolchain for plumber2's waysign dependency)</td></tr><tr><td>2</td><td>DatabaseConnector, SqlRender, Andromeda</td><td>OHDSI connectivity</td></tr><tr><td>3</td><td>Cyclops, FeatureExtraction</td><td>Analytics core</td></tr><tr><td>4</td><td>CohortMethod, PLP, SCCS, EvidenceSynthesis</td><td>HADES analysis packages</td></tr><tr><td>5</td><td>DeepPatientLevelPrediction</td><td>Deep learning (optional)</td></tr><tr><td>6</td><td>CohortDiagnostics, CohortGenerator</td><td>Cohort tools</td></tr><tr><td>7</td><td>Strategus</td><td>Study orchestration</td></tr></tbody></table>
<p>Each layer is cached independently. A code change in the R API files only rebuilds the final application stage — a 30-second rebuild instead of the 45-minute full HADES compilation.</p>
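<p>An API-file change costs seconds rather than a full HADES build because of Docker's cache rule: within a stage, once one instruction misses the cache, every instruction after it is rebuilt. This Python sketch models only that rule for a single linear stage. The layer names follow the table above, plus a hypothetical final <code>app</code> layer for the R API files. Real multi-stage BuildKit graphs are more involved.</p>

```python
LAYERS = [
    "1: plumber2, mirai, rJava, duckdb",
    "2: DatabaseConnector, SqlRender, Andromeda",
    "3: Cyclops, FeatureExtraction",
    "4: CohortMethod, PLP, SCCS, EvidenceSynthesis",
    "5: DeepPatientLevelPrediction",
    "6: CohortDiagnostics, CohortGenerator",
    "7: Strategus",
    "app: R API files",  # hypothetical final application layer
]


def layers_to_rebuild(layers: list[str], changed: set[str]) -> list[str]:
    """Return the layers Docker would rebuild when the layers in `changed` have new inputs.

    The first changed layer misses the cache, and every later layer
    in the same stage is rebuilt with it.
    """
    for i, layer in enumerate(layers):
        if layer in changed:
            return layers[i:]
    return []  # nothing changed: every layer is a cache hit


if __name__ == "__main__":
    print(layers_to_rebuild(LAYERS, {"app: R API files"}))  # only the app layer
    print(layers_to_rebuild(LAYERS, {LAYERS[3]}))  # layer 4 and everything after it
```

<p>The same rule explains the layer order. The slow, rarely changing HADES packages sit early, and the frequently edited application code goes last.</p>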
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="phase-6-namespace-warmup-march-19">Phase 6: Namespace Warmup (March 19)<a href="http://localhost:8082/docs/blog/rise-of-darkstar#phase-6-namespace-warmup-march-19" class="hash-link" aria-label="Direct link to Phase 6: Namespace Warmup (March 19)" title="Direct link to Phase 6: Namespace Warmup (March 19)">​</a></h2>
<p>One last optimization. Cold start time was ~60 seconds because R lazy-loads package namespaces on first use. The first health check after boot would take 8 seconds instead of 118ms because it triggered CohortMethod compilation.</p>
<p>I added a build-time warmup step that forces all HADES packages to compile their bytecode during <code>docker build</code>:</p>
<div class="language-dockerfile codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-dockerfile codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">RUN Rscript -e " \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  suppressMessages({ \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    library(rJava); .jinit(); \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    library(DatabaseConnector); \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    library(CohortMethod); \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    library(PatientLevelPrediction); \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    library(SelfControlledCaseSeries); \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">    library(EvidenceSynthesis); \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">  }); \</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px 
rgba(0, 0, 0, 0.3)"><span class="token plain">"</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>This moved the compilation cost from runtime to build time. Cold start dropped from 60 seconds to ~40 seconds.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-benchmark">The Benchmark<a href="http://localhost:8082/docs/blog/rise-of-darkstar#the-benchmark" class="hash-link" aria-label="Direct link to The Benchmark" title="Direct link to The Benchmark">​</a></h2>
<p>On March 19, I ran the legacy container (Plumber v1, pre-hardening commit <code>c76884236</code>) and Darkstar side by side against the same OMOP CDM database, executing the same CohortMethod estimation spec.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="health-probe-responsiveness-during-analysis">Health Probe Responsiveness During Analysis<a href="http://localhost:8082/docs/blog/rise-of-darkstar#health-probe-responsiveness-during-analysis" class="hash-link" aria-label="Direct link to Health Probe Responsiveness During Analysis" title="Direct link to Health Probe Responsiveness During Analysis">​</a></h3>
<p>Both containers ran a 2-minute analysis. I probed <code>/health</code> every 5 seconds during execution.</p>
<table><thead><tr><th>Metric</th><th>Legacy</th><th>Darkstar</th><th>Change</th></tr></thead><tbody><tr><td>Health probes OK</td><td>13/24 (54%)</td><td><strong>17/24 (71%)</strong></td><td><strong>+31%</strong></td></tr><tr><td>Probes blocked</td><td>11/24 (46%)</td><td><strong>7/24 (29%)</strong></td><td><strong>36% fewer</strong></td></tr><tr><td>Max consecutive blocked</td><td>11 (55s dark)</td><td><strong>7 (35s)</strong></td><td><strong>20s faster recovery</strong></td></tr></tbody></table>
<p>The legacy container went completely dark for 55 seconds straight — nearly a minute where no request of any kind could be served. Darkstar recovered responsiveness 20 seconds sooner. The remaining 35-second blocking window happens during the synchronous JDBC connection establishment and initial SQL burst, which locks the R process handling the request. Once CohortMethod transitions to its computation phase, the plumber2 event loop regains control and health probes resume.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="execution-performance">Execution Performance<a href="http://localhost:8082/docs/blog/rise-of-darkstar#execution-performance" class="hash-link" aria-label="Direct link to Execution Performance" title="Direct link to Execution Performance">​</a></h3>
<p>Both containers ran the identical pipeline: data extraction, covariate building, propensity score fitting. Both hit the same clinical error at the same point (high covariate-treatment correlation — a study design issue, not a container issue).</p>
<table><thead><tr><th>Metric</th><th>Legacy</th><th>Darkstar</th><th>Change</th></tr></thead><tbody><tr><td><strong>R execution time</strong></td><td>102.8s</td><td><strong>66.3s</strong></td><td><strong>35% faster</strong></td></tr><tr><td>Wall time</td><td>168s</td><td>159s</td><td>5% faster</td></tr></tbody></table>
<p>The 35% speedup comes from three sources: G1GC reducing GC pause overhead, namespace warmup eliminating first-request compilation, and the larger JVM heap reducing garbage collection frequency.</p>
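<p>The figures in both benchmark tables follow directly from the raw measurements. This small Python helper assumes the probes are recorded as an ordered list of booleans (True when <code>/health</code> answered) taken every 5 seconds. It reproduces the summary: the success rate, and the longest blocked run converted into seconds of downtime. A second helper gives the relative changes. The function names are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class ProbeSummary:
    ok: int
    blocked: int
    ok_pct: float
    max_consecutive_blocked: int
    longest_dark_s: float


def summarize_probes(results: list[bool], interval_s: float = 5.0) -> ProbeSummary:
    """Summarize health-probe outcomes taken at a fixed interval (True = answered)."""
    ok = sum(results)
    longest = current = 0
    for answered in results:
        current = 0 if answered else current + 1  # length of the current blocked run
        longest = max(longest, current)
    return ProbeSummary(
        ok=ok,
        blocked=len(results) - ok,
        ok_pct=100.0 * ok / len(results),
        max_consecutive_blocked=longest,
        longest_dark_s=longest * interval_s,  # each blocked probe hides one interval
    )


def relative_change_pct(before: float, after: float) -> float:
    """Percent change from `before` to `after` (negative means a reduction)."""
    return 100.0 * (after - before) / before
```

<p>For example, <code>relative_change_pct(13, 17)</code> is about +30.8, reported as +31%. <code>relative_change_pct(102.8, 66.3)</code> is about -35.5, which is the 35% reduction in R execution time.</p>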
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="cold-start">Cold Start<a href="http://localhost:8082/docs/blog/rise-of-darkstar#cold-start" class="hash-link" aria-label="Direct link to Cold Start" title="Direct link to Cold Start">​</a></h3>
<table><thead><tr><th>Container</th><th>Cold Start</th></tr></thead><tbody><tr><td>Legacy</td><td>2s</td></tr><tr><td>Darkstar</td><td>4s</td></tr></tbody></table>
<p>Darkstar is 2 seconds slower due to s6-overlay init and mirai daemon startup. This is a one-time cost at container creation — an acceptable tradeoff for crash recovery and process supervision.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-bugs-we-found-along-the-way">The Bugs We Found Along the Way<a href="http://localhost:8082/docs/blog/rise-of-darkstar#the-bugs-we-found-along-the-way" class="hash-link" aria-label="Direct link to The Bugs We Found Along the Way" title="Direct link to The Bugs We Found Along the Way">​</a></h2>
<p>Building Darkstar wasn't just an infrastructure project. Running real HADES analyses against real clinical data surfaced bugs that would have been invisible in a test environment.</p>
<p><strong>1. Silent covariate exclusion bypass.</strong> <code>FeatureExtraction::createCovariateSettings(excludedConceptIds = c(1234))</code> was being silently ignored because we were passing the IDs in the wrong argument position. Patients were getting propensity scores contaminated by the exposure concept.</p>
<p><strong>2. CohortMethod v6 API break.</strong> Between v5 and v6, every function switched from positional arguments to <code>Args</code> objects: <code>createPs(cohortMethodData, population)</code> became <code>createPs(cohortMethodData, population, createPsArgs = createCreatePsArgs())</code>. Every endpoint needed updating.</p>
<p><strong>3. jsonlite auto-unboxing.</strong> <code>jsonlite::toJSON(auto_unbox = TRUE)</code> was converting single-element vectors into scalar values. A cohort with one patient would serialize as <code>"person_id": 42</code> instead of <code>"person_id": [42]</code>. Laravel's JSON decoder would then treat it as an integer instead of an array, breaking downstream processing.</p>
<p><strong>4. PLP non-serializable objects.</strong> PatientLevelPrediction returns S3 objects with custom print methods, environment closures, and external pointers that <code>jsonlite</code> can't serialize. We had to write custom extraction functions to pull the numeric results out of the PLP result objects.</p>
<p><strong>5. SCCS anchor normalization.</strong> The SCCS package expects <code>era_start</code> as an anchor value, but our frontend sent <code>era start</code> (no underscore). R silently accepted the invalid anchor and computed results with a different reference point.</p>
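<p>Bug 3 is a general hazard whenever a serializer unboxes length-one vectors. On the R side, serializing with <code>auto_unbox = FALSE</code> keeps arrays as arrays. So does wrapping a field in <code>I()</code>, which <code>jsonlite</code> never auto-unboxes. A consumer can also defend itself. This Python sketch shows that defensive normalization. It assumes the consumer knows which fields must always be arrays; the <code>ARRAY_FIELDS</code> set and the helper name are hypothetical.</p>

```python
import json
from typing import Any

ARRAY_FIELDS = {"person_id"}  # hypothetical: fields the API contract says are arrays


def normalize_arrays(payload: dict[str, Any], array_fields: set[str] = ARRAY_FIELDS) -> dict[str, Any]:
    """Re-box scalars that a serializer unboxed, so array fields are always lists."""
    fixed = dict(payload)  # shallow copy: leave the caller's dict untouched
    for key in array_fields:
        if key in fixed and not isinstance(fixed[key], list):
            fixed[key] = [fixed[key]]
    return fixed


if __name__ == "__main__":
    unboxed = json.loads('{"person_id": 42}')      # one patient, auto-unboxed
    boxed = json.loads('{"person_id": [42, 43]}')  # two patients, still an array
    print(normalize_arrays(unboxed), normalize_arrays(boxed))
```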
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="production-validation">Production Validation<a href="http://localhost:8082/docs/blog/rise-of-darkstar#production-validation" class="hash-link" aria-label="Direct link to Production Validation" title="Direct link to Production Validation">​</a></h2>
<p>Between March 7 and March 20, Darkstar processed <strong>5 original research studies</strong> with 37 cohort definitions and 29 analysis configurations against a million-patient OMOP CDM:</p>
<ul>
<li><strong>CKD Progression Study</strong> — ACEi vs CCB comparative effectiveness on renal outcomes. 73K propensity-score matched pairs. HR=0.989. 9-14 minute execution.</li>
<li><strong>Post-MI Secondary Prevention</strong> — Aspirin vs Clopidogrel on recurrent MACE. Stratified Cox regression with 12 negative control outcomes.</li>
<li><strong>Prediabetes Metformin Study</strong> — Metformin vs watchful waiting on T2DM progression. PS-stratified Cox with 8 outcome definitions.</li>
<li><strong>Statin Primary vs Secondary Prevention</strong> — IHD vs no-IHD composite MACE risk in statin users.</li>
<li><strong>Hypertension vs Metabolic Syndrome</strong> — Multi-cohort MACE risk comparison with PS stratification.</li>
</ul>
<p>Every analysis completed successfully. Every result was clinically plausible. Every execution was tracked through the Jobs page with live progress bars.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-name">The Name<a href="http://localhost:8082/docs/blog/rise-of-darkstar#the-name" class="hash-link" aria-label="Direct link to The Name" title="Direct link to The Name">​</a></h2>
<p>On March 20, 2026, we renamed <code>parthenon-r</code> to <code>parthenon-darkstar</code>. The old name was descriptive — "R runtime." The new name reflects what it became: a hardened, production-grade engine that runs in the background, processes the heaviest workloads in the stack, and never asks for attention.</p>
<p>Sixteen files changed. Zero breaking changes. The HADES packages don't know. The OMOP CDM doesn't know. The researchers don't know. They just see their analyses finish faster and more reliably than before.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-darkstar-is">What Darkstar Is<a href="http://localhost:8082/docs/blog/rise-of-darkstar#what-darkstar-is" class="hash-link" aria-label="Direct link to What Darkstar Is" title="Direct link to What Darkstar Is">​</a></h2>
<table><thead><tr><th>Capability</th><th>Legacy</th><th>Darkstar</th></tr></thead><tbody><tr><td>Concurrent requests</td><td>1 (everything queues)</td><td>3 mirai daemons + event loop</td></tr><tr><td>Health monitoring</td><td>10-min interval, trivial check</td><td>30s interval, deep validation</td></tr><tr><td>Process supervision</td><td>None (bare Rscript as PID 1)</td><td>s6-overlay auto-restart</td></tr><tr><td>JDBC resilience</td><td>No timeouts</td><td>300s socket, 30s connect, TCP keepalive</td></tr><tr><td>Crash recovery</td><td>Docker restart → 60s cold start</td><td>s6 in-container restart → seconds</td></tr><tr><td>GC strategy</td><td>Default JVM GC (2-5s pauses)</td><td>G1GC with 200ms pause target</td></tr><tr><td>Memory management</td><td>Default R limits</td><td>24GB vector heap, 8GB JVM</td></tr><tr><td>Cold start</td><td>60s</td><td>40s (namespace warmup)</td></tr><tr><td>R execution speed</td><td>Baseline</td><td>35% faster</td></tr></tbody></table>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/rise-of-darkstar#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<p>Darkstar currently runs HADES analyses that were originally designed for batch execution on a single workstation. The next frontier is <strong>volcano plots for CodeWAS</strong> — running per-concept logistic regressions across thousands of OMOP concepts to generate effect estimates and p-values for phenome-wide association studies.</p>
<p>The infrastructure is ready. CohortMethod already produces hazard ratios and p-values per outcome. The mirai daemon pool can handle the concurrent workload. The async job registry can track thousands of sub-analyses. The D3 visualization layer in the frontend has forest plot patterns ready to extend.</p>
<p>Darkstar was built to handle exactly this kind of workload: computationally expensive, highly parallelizable, and too important to fail silently.</p>
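<p>This Python sketch produces the two coordinates of a CodeWAS volcano plot for each concept: the effect size and the -log10 p-value. It swaps in a simpler method than the planned per-concept logistic regression. For a single binary exposure with no covariates, logistic regression's odds ratio and Wald test reduce to the 2x2-table formulas used here. Covariate-adjusted CodeWAS runs would still need a full regression. The input counts, the zero-cell correction, and the Bonferroni threshold are assumptions for illustration.</p>

```python
import math
from dataclasses import dataclass


@dataclass
class VolcanoPoint:
    concept_id: int
    log2_or: float      # x axis: effect size
    neg_log10_p: float  # y axis: significance
    significant: bool


def wald_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log odds ratio and its two-sided Wald p-value from a 2x2 table.

    a = cases with the concept, b = cases without,
    c = controls with the concept, d = controls without.
    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return log_or, p


def volcano_points(tables: dict[int, tuple[int, int, int, int]], alpha: float = 0.05) -> list[VolcanoPoint]:
    """One point per concept, flagged against a Bonferroni-corrected threshold."""
    threshold = alpha / len(tables)
    points = []
    for concept_id, (a, b, c, d) in tables.items():
        log_or, p = wald_odds_ratio(a, b, c, d)
        points.append(VolcanoPoint(
            concept_id=concept_id,
            log2_or=log_or / math.log(2),
            neg_log10_p=-math.log10(p) if p > 0 else math.inf,
            significant=p < threshold,
        ))
    return points
```

<p>Each concept is independent of every other, which is why the workload parallelizes cleanly across the daemon pool.</p>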
<hr>
<p><em>Darkstar is open source as part of Parthenon. The container definition, plumber2 API, and s6-overlay configuration are in <code>docker/r/</code> and <code>r-runtime/</code> in the <a href="https://github.com/sudoshi/Parthenon" target="_blank" rel="noopener noreferrer">Parthenon repository</a>.</em></p>]]></content:encoded>
            <category>development</category>
            <category>darkstar</category>
            <category>r-runtime</category>
            <category>infrastructure</category>
            <category>plumber2</category>
            <category>mirai</category>
            <category>hades</category>
            <category>ohdsi</category>
            <category>docker</category>
            <category>devops</category>
            <category>architecture</category>
        </item>
        <item>
            <title><![CDATA[Workbench Launcher and the Single-Database Migration Plan]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-03-20</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-03-20</guid>
            <pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A big architectural day on Parthenon: we shipped the new Workbench launcher experience and drafted the formal plan to collapse our multi-database mess into a single, schema-isolated parthenon database. A noticeably cleaner codebase on the other side.]]></description>
            <content:encoded><![CDATA[<p>A big architectural day on Parthenon: we shipped the new <strong>Workbench launcher experience</strong> and drafted the formal plan to collapse our multi-database mess into a single, schema-isolated <code>parthenon</code> database. A noticeably cleaner codebase on the other side.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="workbench-from-dropdown-to-launcher">Workbench: From Dropdown to Launcher<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-20#workbench-from-dropdown-to-launcher" class="hash-link" aria-label="Direct link to Workbench: From Dropdown to Launcher" title="Direct link to Workbench: From Dropdown to Launcher">​</a></h2>
<p>The old FinnGen entry point was awkward — a toolset dropdown buried inside the FinnGen UI that never made much sense to new users. Today we replaced it with a proper <strong>Workbench launcher</strong>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="what-changed">What changed<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-20#what-changed" class="hash-link" aria-label="Direct link to What changed" title="Direct link to What changed">​</a></h3>
<p><strong>Routing was restructured cleanly:</strong></p>
<ul>
<li><code>/workbench</code> → new <code>WorkbenchLauncherPage</code> (the hub)</li>
<li><code>/workbench/finngen</code> → FinnGen tool (previously at <code>/workbench</code>)</li>
</ul>
<p>This gives us a scalable pattern. Every future tool gets its own sub-route under <code>/workbench/*</code>, and the launcher is the natural entry point rather than an afterthought.</p>
<p><strong><code>WorkbenchLauncherPage</code></strong> renders a responsive toolset grid. Each tile is a <code>ToolsetCard</code> component that displays the tool's name, description, status badge (e.g., <em>Active</em>, <em>Beta</em>, <em>Coming Soon</em>), and an accent glow tied to the tool's color identity. The cards pull from a central <strong>toolset registry</strong> — a typed <code>ToolsetDescriptor</code> array that will be the single source of truth as we add more tools. Adding a new tool to the Workbench is now a one-liner in the registry.</p>
<p><strong>Sidebar navigation</strong> was updated to always show the Workbench link regardless of context. Previously it only appeared when you were already inside a Workbench tool, which made discoverability poor. Workbench is now a first-class citizen in the nav.</p>
<p><strong>Inside FinnGen</strong>, the toolset dropdown has been replaced with a simple back-to-Workbench link. The UI is significantly less cluttered.</p>
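<p>A descriptor-driven registry is what turns adding a tool into a one-line change. This sketch is in Python for illustration only; the real registry is the typed <code>ToolsetDescriptor</code> array. The field names, the FinnGen entry's description and accent color, and the duplicate-slug check are all assumptions, not the actual Parthenon types.</p>

```python
from dataclasses import dataclass
from typing import Literal

Status = Literal["Active", "Beta", "Coming Soon"]


@dataclass(frozen=True)
class ToolsetDescriptor:
    slug: str         # sub-route under /workbench/
    name: str
    description: str
    status: Status    # rendered as the card's status badge
    accent: str       # hypothetical CSS color used for the card glow


TOOLSETS: list[ToolsetDescriptor] = [
    ToolsetDescriptor("finngen", "FinnGen", "FinnGen tool", "Active", "#2dd4bf"),
    # Adding a tool is one more entry here; the launcher grid and routes follow.
]


def workbench_routes(toolsets: list[ToolsetDescriptor]) -> dict[str, str]:
    """Map each route to what it renders, rejecting duplicate slugs."""
    routes = {"/workbench": "WorkbenchLauncherPage"}
    for tool in toolsets:
        path = f"/workbench/{tool.slug}"
        if path in routes:
            raise ValueError(f"duplicate toolset slug: {tool.slug}")
        routes[path] = tool.name
    return routes


def launcher_cards(toolsets: list[ToolsetDescriptor]) -> list[dict[str, str]]:
    """Data each ToolsetCard needs: name, description, status badge, accent glow."""
    return [
        {"name": t.name, "description": t.description, "badge": t.status, "accent": t.accent}
        for t in toolsets
    ]
```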
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="studies-phase-b-integration-test-complete">Studies: Phase B Integration Test Complete<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-20#studies-phase-b-integration-test-complete" class="hash-link" aria-label="Direct link to Studies: Phase B Integration Test Complete" title="Direct link to Studies: Phase B Integration Test Complete">​</a></h2>
<p>The studies module passed its Phase B integration checkpoint today. Seven cohorts were generated and their counts were verified against data exploration results. This is exactly the kind of manual validation checkpoint that catches discrepancies between the cohort generation pipeline and what's actually in the CDM — worth calling out explicitly in the log. Phase B passing means we're on track for the next milestone.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-single-database-migration-plan">The Single-Database Migration Plan<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-20#the-single-database-migration-plan" class="hash-link" aria-label="Direct link to The Single-Database Migration Plan" title="Direct link to The Single-Database Migration Plan">​</a></h2>
<p>This is the most consequential architectural decision documented today, even though the implementation work starts tomorrow. The plan lives in <code>single-database-migration-plan.md</code> and the core idea is straightforward: <strong>one database, multiple schemas, full schema isolation</strong>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-problem-it-solves">The problem it solves<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-20#the-problem-it-solves" class="hash-link" aria-label="Direct link to The problem it solves" title="Direct link to The problem it solves">​</a></h3>
<p>Parthenon currently ships with two physical databases (<code>ohdsi</code> and a secondary) and seven named Laravel connections, with search paths scattered across fifteen-plus <code>.env</code> variables. This configuration has caused repeated data-loss incidents in the past — usually because an environment was misconfigured and a query landed in the wrong schema. It's also a documentation and onboarding nightmare.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-target-architecture">The target architecture<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-20#the-target-architecture" class="hash-link" aria-label="Direct link to The target architecture" title="Direct link to The target architecture">​</a></h3>
<p>Everything moves into a single <code>parthenon</code> database with schemas doing the isolation work:</p>
<table><thead><tr><th>Schema</th><th>Purpose</th></tr></thead><tbody><tr><td><code>app</code></td><td>Users, roles, cohorts, sources, studies, analyses</td></tr><tr><td><code>omop</code></td><td>CDM + vocabulary (standard OHDSI layout)</td></tr><tr><td><code>results</code></td><td>Achilles / DQD output</td></tr><tr><td><code>gis</code></td><td>Geospatial tables</td></tr><tr><td><code>eunomia</code></td><td>Demo dataset</td></tr><tr><td><code>eunomia_results</code></td><td>Demo Achilles results</td></tr><tr><td><code>public</code></td><td>Laravel internals (migrations, jobs, cache)</td></tr></tbody></table>
<p>Seven connections collapse to five, and <strong>all five point at the same database</strong>. The <code>cdm</code> and <code>vocab</code> connections merge into <code>omop</code>. The <code>docker_pg</code> connection goes away entirely.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-env-simplification">The .env simplification<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-20#the-env-simplification" class="hash-link" aria-label="Direct link to The .env simplification" title="Direct link to The .env simplification">​</a></h3>
<p>Before: 15+ database variables, some redundant, some subtly wrong across environments. After:</p>
<div class="language-env codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(220, 13%, 18%);--prism-color:hsl(220, 14%, 71%)"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-env codeBlock_bY9V thin-scrollbar" style="background-color:hsl(220, 13%, 18%);color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">DB_HOST=pgsql.acumenus.net</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">DB_PORT=5432</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">DB_DATABASE=parthenon</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">DB_USERNAME=smudoshi</span><br></span><span class="token-line" style="color:hsl(220, 14%, 71%);text-shadow:0 1px rgba(0, 0, 0, 0.3)"><span class="token plain">DB_PASSWORD=acumenus</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div>
<p>Search paths are hardcoded in <code>database.php</code> because they're structural: a given connection always hits the same schema regardless of environment. The only things that legitimately vary per environment are the host, credentials, and database name. Separating structural config from environmental config is the right call here and will prevent an entire class of misconfiguration bugs.</p>
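<p>The structural/environmental split can be sketched as a function that merges a fixed connection-to-search-path table with the five environment variables. This Python sketch only models the idea. The connection names and search paths in <code>SEARCH_PATHS</code> are hypothetical placeholders. The real configuration lives in Laravel's <code>database.php</code>.</p>

```python
import os
from collections.abc import Mapping

# Structural: fixed per connection, identical in every environment (hypothetical names).
SEARCH_PATHS = {
    "app": "app,public",
    "omop": "omop",
    "results": "results",
    "gis": "gis",
    "eunomia": "eunomia,eunomia_results",
}

# Environmental: the only settings read from .env.
ENV_KEYS = ("DB_HOST", "DB_PORT", "DB_DATABASE", "DB_USERNAME", "DB_PASSWORD")


def build_connections(env: Mapping[str, str] = os.environ) -> dict[str, dict[str, str]]:
    """Combine .env values with the hardcoded search paths; every connection shares one database."""
    missing = [key for key in ENV_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing database settings: {', '.join(missing)}")
    shared = {
        "host": env["DB_HOST"],
        "port": env["DB_PORT"],
        "database": env["DB_DATABASE"],
        "username": env["DB_USERNAME"],
        "password": env["DB_PASSWORD"],
    }
    return {name: {**shared, "search_path": path} for name, path in SEARCH_PATHS.items()}
```

<p>A misconfigured environment can now only get the host or credentials wrong. It can no longer point one connection at the wrong schema.</p>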
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-20#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<ul>
<li><strong>Single-database migration implementation</strong> — update <code>database.php</code>, migrate <code>.env</code> templates, write the schema consolidation migrations, and test against the full connection matrix.</li>
<li><strong>Workbench registry expansion</strong> — now that the <code>ToolsetDescriptor</code> pattern is in place, start populating the registry with upcoming tools currently in planning.</li>
<li><strong>Studies Phase C</strong> — Phase B is green, Phase C begins.</li>
<li><strong>FinnGen UX polish</strong> — the back-to-Workbench link is functional but the transition animation needs work.</li>
</ul>
<p>Today felt like a good cleanup-and-foundation day. The Workbench has real bones, and we have a credible plan for a database architecture that won't bite us anymore.</p>]]></content:encoded>
            <category>development</category>
            <category>ohdsi</category>
            <category>analytics</category>
            <category>frontend</category>
            <category>backend</category>
            <category>infrastructure</category>
            <category>database</category>
        </item>
        <item>
            <title><![CDATA[Evidence Investigation Goes Full-Stack: FinnGen Retirement, Multi-Dataset Morpheus, and the Road to Volcano Plots]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-03-21</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-03-21</guid>
            <pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A massive 116-commit push today centered almost entirely on maturing the Evidence Investigation workbench, from retiring the old FinnGen UI to hardening the investigation experience with proper navigation, KPI metrics, URL-synced state, and ARIA accessibility. We also landed multi-dataset support in Morpheus and set the stage for one of the most requested features on the roadmap: volcano plots powered by the newly renamed Darkstar R runtime.]]></description>
            <content:encoded><![CDATA[<p>A massive 116-commit push today centered almost entirely on maturing the <strong>Evidence Investigation</strong> workbench, from retiring the old FinnGen UI to hardening the investigation experience with proper navigation, KPI metrics, URL-synced state, and ARIA accessibility. We also landed multi-dataset support in Morpheus and set the stage for one of the most requested features on the roadmap: volcano plots powered by the newly renamed Darkstar R runtime.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="finngen-workbench-retirement">FinnGen Workbench Retirement<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#finngen-workbench-retirement" class="hash-link" aria-label="Direct link to FinnGen Workbench Retirement" title="Direct link to FinnGen Workbench Retirement">​</a></h2>
<p>The legacy FinnGen workbench has been officially decommissioned. The dedicated FinnGen card on the workbench landing page (<code>c41f7afbc</code>) now launches <strong>Evidence Investigation</strong> instead, and the old FinnGen workbench code has been removed entirely (<code>a667f94ca</code>). This isn't just a cleanup — it's a consolidation of intent. Evidence Investigation is the unified surface for exploring GWAS signals, phenotype associations, and concept-level evidence, and FinnGen data fits naturally within that framing rather than deserving its own siloed experience.</p>
<p>If you're working on workbench routing, note that the card rewiring lives in the workbench feature directory and the deprecated component has been fully pruned, so there's no dead code to worry about.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="evidence-investigation-a-day-of-hardening">Evidence Investigation: A Day of Hardening<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#evidence-investigation-a-day-of-hardening" class="hash-link" aria-label="Direct link to Evidence Investigation: A Day of Hardening" title="Direct link to Evidence Investigation: A Day of Hardening">​</a></h2>
<p>The bulk of today's commits were focused on making Evidence Investigation feel like a production-grade tool rather than a prototype. Here's what changed:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="navigation--layout-ab7530fe4">Navigation &amp; Layout (<code>ab7530fe4</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#navigation--layout-ab7530fe4" class="hash-link" aria-label="Direct link to navigation--layout-ab7530fe4" title="Direct link to navigation--layout-ab7530fe4">​</a></h3>
<p>The investigation view now has a proper <strong>top bar</strong> with a title, breadcrumb trail, and back navigation. This sounds small, but it's critical UX: users were getting lost when drilling into sub-views with no clear path back to the workbench. The breadcrumbs also show where a particular evidence thread sits within the broader investigation.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="kpi-metrics--contextcard-0b2a2185c">KPI Metrics &amp; ContextCard (<code>0b2a2185c</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#kpi-metrics--contextcard-0b2a2185c" class="hash-link" aria-label="Direct link to kpi-metrics--contextcard-0b2a2185c" title="Direct link to kpi-metrics--contextcard-0b2a2185c">​</a></h3>
<p>The <code>ContextCard</code> component was significantly enhanced to surface <strong>KPI metrics</strong> — high-level summary statistics that orient the analyst before they dive into domain-level evidence. URL-synced sub-tabs were also wired in here, meaning deep links into specific sub-views now work correctly and browser history behaves as expected.</p>
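<p>One common way to implement URL-synced sub-tabs is to treat the query string as the source of truth and validate it against the known tabs. The sketch below assumes a hypothetical <code>tab</code> query parameter and hypothetical tab names; in a browser, the resulting string would be applied with <code>history.pushState</code> (or a router equivalent) so the back button steps through tab changes.</p>

```typescript
// Hypothetical sub-tab names; the real ContextCard tabs may differ.
const SUB_TABS = ["summary", "concepts", "gwas"] as const;
type SubTab = (typeof SUB_TABS)[number];

function isSubTab(value: string | null): value is SubTab {
  return value !== null && (SUB_TABS as readonly string[]).includes(value);
}

// Read the active tab from a query string, falling back to a default when
// the parameter is missing or unknown (e.g. a stale deep link).
function readSubTab(search: string, fallback: SubTab = "summary"): SubTab {
  const value = new URLSearchParams(search).get("tab");
  return isSubTab(value) ? value : fallback;
}

// Produce the query string for a tab change, preserving other parameters
// (such as a selected domain) so deep links stay intact.
function withSubTab(search: string, tab: SubTab): string {
  const params = new URLSearchParams(search);
  params.set("tab", tab);
  return `?${params.toString()}`;
}
```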
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="url-synced-domain-sidebar-states--error-handling-fcf5c919c">URL-Synced Domain, Sidebar States &amp; Error Handling (<code>fcf5c919c</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#url-synced-domain-sidebar-states--error-handling-fcf5c919c" class="hash-link" aria-label="Direct link to url-synced-domain-sidebar-states--error-handling-fcf5c919c" title="Direct link to url-synced-domain-sidebar-states--error-handling-fcf5c919c">​</a></h3>
<p>Domain selection is now reflected in the URL, so sharing a link to "I'm looking at the Drug domain for concept X" actually works. Sidebar loading states prevent the jarring empty-panel flash during data fetches, and new error handling around query execution ensures analysts see a meaningful message rather than a silent failure when a backend query goes wrong.</p>
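<p>Loading and error states like these are often modeled as a discriminated union, so a panel can never render empty without knowing why. A minimal sketch, assuming a hypothetical <code>runQuery</code> callback standing in for the backend call and hypothetical display strings:</p>

```typescript
// Every panel is in exactly one of these states.
type PanelState<T> =
  | { kind: "idle" }
  | { kind: "loading" }
  | { kind: "error"; message: string }
  | { kind: "ready"; data: T };

// Wrap a query so failures become a visible error state instead of a
// silently rejected promise. onChange is called on each transition.
async function executeWithState<T>(
  runQuery: () => Promise<T>,
  onChange: (state: PanelState<T>) => void,
): Promise<void> {
  onChange({ kind: "loading" });
  let data: T;
  try {
    data = await runQuery();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    onChange({ kind: "error", message: message || "Query failed" });
    return;
  }
  onChange({ kind: "ready", data });
}

// What the sidebar would show for each state.
function describePanel(state: PanelState<string[]>): string {
  switch (state.kind) {
    case "idle":
      return "Select a domain";
    case "loading":
      return "Loading...";
    case "error":
      return `Error: ${state.message}`;
    case "ready":
      return state.data.length === 0 ? "No results" : `${state.data.length} results`;
  }
}
```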
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="leftrail-aria--responsive-layout-6b3b25811">LeftRail, ARIA &amp; Responsive Layout (<code>6b3b25811</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#leftrail-aria--responsive-layout-6b3b25811" class="hash-link" aria-label="Direct link to leftrail-aria--responsive-layout-6b3b25811" title="Direct link to leftrail-aria--responsive-layout-6b3b25811">​</a></h3>
<p>The <code>LeftRail</code> component received attention on three fronts: clickable counts (so analysts can click a domain count to navigate directly to it), a sidebar badge showing active evidence pins, and a full pass of ARIA roles for screen reader compatibility. Responsive layout fixes round this out — the investigation view now holds together on narrower viewports.</p>
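<p>For clickable counts that are not native <code>&lt;button&gt;</code> elements, the WAI-ARIA button pattern expects <code>role="button"</code>, a tab stop, an accessible name, and activation on both Enter and Space. A framework-agnostic sketch of those pieces follows; the prop shape and label wording are assumptions, not the actual <code>LeftRail</code> code.</p>

```typescript
// Attributes for a non-native clickable count. Label wording is hypothetical.
interface CountProps {
  role: "button";
  tabIndex: number;
  "aria-label": string;
}

function countProps(domain: string, count: number): CountProps {
  return { role: "button", tabIndex: 0, "aria-label": `Go to ${domain} (${count})` };
}

// Minimal keyboard-event shape so the handler is not tied to a framework.
interface KeyEventLike {
  key: string;
  preventDefault(): void;
}

// Per the ARIA button pattern, Enter and Space both activate the control.
// preventDefault stops Space from scrolling the page.
function onCountKeyDown(event: KeyEventLike, activate: () => void): void {
  if (event.key === "Enter" || event.key === " ") {
    event.preventDefault();
    activate();
  }
}
```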
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="gwas-catalog-endpoints--evidencepinservice-7514f14e6-d1e310592">GWAS Catalog Endpoints &amp; EvidencePinService (<code>7514f14e6</code>, <code>d1e310592</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#gwas-catalog-endpoints--evidencepinservice-7514f14e6-d1e310592" class="hash-link" aria-label="Direct link to gwas-catalog-endpoints--evidencepinservice-7514f14e6-d1e310592" title="Direct link to gwas-catalog-endpoints--evidencepinservice-7514f14e6-d1e310592">​</a></h3>
<p>Two targeted fixes corrected the GWAS Catalog API calls to use the proper <code>findByDiseaseTrait</code> and <code>findByGene</code> endpoints, and <code>EvidencePinService</code> was updated to correctly thread <code>concept_ids</code> and <code>gene_symbols</code> through to those calls. These were silent failures before — the UI looked fine but no GWAS data was actually being fetched.</p>
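<p>The failure mode here, requests that quietly go out with no identifiers, can be guarded against by building the query list explicitly and refusing to proceed when it comes out empty. A minimal sketch, assuming a hypothetical pin payload shape, hypothetical parameter names, and a hypothetical <code>traitNameFor</code> lookup from OMOP concept IDs to trait names; only the endpoint names <code>findByDiseaseTrait</code> and <code>findByGene</code> come from the fix itself.</p>

```typescript
// Hypothetical shape of the identifiers carried by an evidence pin.
interface PinIdentifiers {
  concept_ids: number[];
  gene_symbols: string[];
}

type GwasQuery =
  | { endpoint: "findByDiseaseTrait"; diseaseTrait: string }
  | { endpoint: "findByGene"; geneName: string };

// Turn pin identifiers into explicit GWAS Catalog queries. Concepts with no
// known trait name are reported back instead of being dropped silently.
function planGwasQueries(
  pin: PinIdentifiers,
  traitNameFor: (conceptId: number) => string | undefined,
): { queries: GwasQuery[]; unmapped: number[] } {
  const queries: GwasQuery[] = [];
  const unmapped: number[] = [];
  for (const id of pin.concept_ids) {
    const trait = traitNameFor(id);
    if (trait) queries.push({ endpoint: "findByDiseaseTrait", diseaseTrait: trait });
    else unmapped.push(id);
  }
  for (const symbol of pin.gene_symbols) {
    const gene = symbol.trim();
    if (gene) queries.push({ endpoint: "findByGene", geneName: gene });
  }
  if (queries.length === 0) {
    throw new Error("No GWAS queries could be built from this evidence pin");
  }
  return { queries, unmapped };
}
```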
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="morpheus-multi-dataset-support-f86ec2342">Morpheus: Multi-Dataset Support (<code>f86ec2342</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#morpheus-multi-dataset-support-f86ec2342" class="hash-link" aria-label="Direct link to morpheus-multi-dataset-support-f86ec2342" title="Direct link to morpheus-multi-dataset-support-f86ec2342">​</a></h2>
<p>Morpheus gained a <strong>dataset selector</strong>, parameterized queries, and a registry table today. Previously, Morpheus queries ran against a single implicit dataset — a significant limitation for any platform claiming to be multi-CDM. The dataset selector allows analysts to choose which CDM they're querying against, the queries are now parameterized accordingly, and the registry table tracks which datasets have been analyzed. This is foundational infrastructure for the cross-CDM comparison workflows that are coming later this quarter.</p>
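<p>PostgreSQL cannot bind schema names as query parameters, so a common pattern for a dataset selector is to resolve the chosen dataset against a registry allowlist, quote the resulting schema identifier, and bind only the values (in the <code>$1</code> placeholder style PostgreSQL uses). A sketch under those assumptions; the registry entries and dataset keys are hypothetical, and the query targets the standard OMOP <code>person</code> table only for illustration.</p>

```typescript
// Hypothetical registry rows: one entry per selectable dataset.
interface DatasetEntry {
  key: string;
  schema: string;
}

const REGISTRY: DatasetEntry[] = [
  { key: "dataset_a", schema: "cdm_a" },
  { key: "dataset_b", schema: "cdm_b" },
];

// Double-quote a PostgreSQL identifier, escaping embedded double quotes.
function quoteIdent(ident: string): string {
  return `"${ident.replace(/"/g, '""')}"`;
}

interface PreparedQuery {
  text: string;
  values: unknown[];
}

// The schema comes only from the registry, never directly from user input;
// user-supplied values are passed separately as bound parameters.
function personQuery(datasetKey: string, minBirthYear: number): PreparedQuery {
  const entry = REGISTRY.find((d) => d.key === datasetKey);
  if (!entry) throw new Error(`Unknown dataset: ${datasetKey}`);
  return {
    text: `SELECT person_id FROM ${quoteIdent(entry.schema)}.person WHERE year_of_birth >= $1`,
    values: [minBirthYear],
  };
}
```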
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="postgresql-numeric-type-fix-aa02db2be">PostgreSQL Numeric Type Fix (<code>aa02db2be</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#postgresql-numeric-type-fix-aa02db2be" class="hash-link" aria-label="Direct link to postgresql-numeric-type-fix-aa02db2be" title="Direct link to postgresql-numeric-type-fix-aa02db2be">​</a></h3>
<p>A subtle but painful bug: <code>durationHours</code> was coming back from PostgreSQL as a string-typed numeric, causing downstream arithmetic to silently produce <code>NaN</code>. Wrapping it in <code>Number()</code> is a one-line fix, but finding it required actually debugging a Morpheus duration calculation that was returning nonsense values. This is worth remembering whenever you query PostgreSQL columns that <em>look</em> like numbers: database drivers commonly serialize <code>NUMERIC</code> values as strings to avoid precision loss, so depending on the ORM/driver configuration they can arrive in JavaScript as strings.</p>
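<p>A small illustration of the failure and the fix, assuming rows arrive as JSON with <code>durationHours</code> serialized as a string. With strings, <code>+</code> concatenates rather than adds, so totals come out wrong without any error being thrown. The guard below (a hypothetical helper, not Parthenon's code) converts once at the boundary and checks for <code>null</code> and empty strings first, because <code>Number(null)</code> and <code>Number("")</code> both return <code>0</code>.</p>

```typescript
// A row as it might arrive from the API; NUMERIC values can be serialized
// as strings, so the raw type is deliberately loose.
interface RawDurationRow {
  durationHours: string | number | null;
}

// Convert once at the boundary and fail loudly on anything non-numeric.
function toFiniteNumber(value: string | number | null): number {
  if (value === null || (typeof value === "string" && value.trim() === "")) {
    throw new Error("Expected a numeric value, got an empty value");
  }
  const n = Number(value);
  if (!Number.isFinite(n)) throw new Error(`Expected a numeric value, got ${String(value)}`);
  return n;
}

// Buggy version for comparison: 0 + "1.5" + "2" concatenates to "01.52".
function totalHoursNaive(rows: RawDurationRow[]): unknown {
  return rows.reduce((acc: any, row) => acc + row.durationHours, 0);
}

function totalHours(rows: RawDurationRow[]): number {
  return rows.reduce((acc, row) => acc + toFiniteNumber(row.durationHours), 0);
}
```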
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="on-the-horizon-volcano-plots-via-darkstar">On the Horizon: Volcano Plots via Darkstar<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#on-the-horizon-volcano-plots-via-darkstar" class="hash-link" aria-label="Direct link to On the Horizon: Volcano Plots via Darkstar" title="Direct link to On the Horizon: Volcano Plots via Darkstar">​</a></h2>
<p>Today's work also laid groundwork, documented in <code>volcano-plot-darkstar-handoff.md</code>, for what comes next. The <code>CodeWASResults.tsx</code> component currently renders a placeholder where an interactive volcano plot will live. The blocker hasn't been the visualization layer; it's been the data. The current CodeWAS backend returns only <code>{label, count}</code> aggregate signals, with no per-concept statistical significance data.</p>
<p>That changes with <strong>Darkstar</strong>. The R runtime container (recently renamed from <code>parthenon-r</code> to <code>parthenon-darkstar</code>, service name <code>darkstar</code> in docker-compose) already computes per-outcome <code>{log_hr, p_value, ci_95_lower, ci_95_upper}</code> via CohortMethod in <code>r-runtime/api/estimation.R</code>. The plumbing to call it from Laravel is straightforward — <code>config('services.r_runtime.url')</code> resolves to <code>http://darkstar:8787</code>. The implementation task is connecting CodeWAS results to a new Darkstar endpoint and rendering the volcano plot with those coordinates.</p>
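<p>Given per-outcome estimates carrying <code>log_hr</code> and <code>p_value</code>, mapping them to volcano-plot coordinates is a small transform: x is the log hazard ratio and y is <code>-log10(p)</code>. The sketch below uses a hypothetical <code>outcome</code> label field and illustrative thresholds (p &lt; 0.05 and |log HR| of at least log 1.5); a real implementation would likely want a multiple-testing correction across the many outcomes CodeWAS evaluates.</p>

```typescript
// Per-outcome estimate; only log_hr and p_value are needed for coordinates.
interface OutcomeEstimate {
  outcome: string; // hypothetical label field
  log_hr: number;
  p_value: number;
}

type Direction = "up" | "down" | "ns";

interface VolcanoPoint {
  outcome: string;
  x: number; // log hazard ratio
  y: number; // -log10(p)
  direction: Direction;
}

// Illustrative thresholds, not values taken from Parthenon.
const ALPHA = 0.05;
const MIN_ABS_LOG_HR = Math.log(1.5);
// Floor so that p = 0 maps to a large finite y instead of Infinity.
const P_FLOOR = 1e-300;

function toVolcanoPoints(estimates: OutcomeEstimate[]): VolcanoPoint[] {
  return estimates.map((e) => {
    const p = Math.max(e.p_value, P_FLOOR);
    const significant = p < ALPHA && Math.abs(e.log_hr) >= MIN_ABS_LOG_HR;
    const direction: Direction = !significant ? "ns" : e.log_hr > 0 ? "up" : "down";
    return { outcome: e.outcome, x: e.log_hr, y: -Math.log10(p), direction };
  });
}
```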
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-21#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<ul>
<li><strong>Volcano plot implementation</strong> — wire <code>CodeWASResults.tsx</code> to Darkstar's estimation endpoint and render a proper interactive <code>log_HR</code> vs <code>-log10(p)</code> scatter plot with significance thresholds</li>
<li><strong>Cross-CDM comparison in Morpheus</strong> — the dataset registry table sets up the UI; the backend aggregation layer needs to follow</li>
<li><strong>Evidence Investigation polish</strong> — the pin/unpin workflow and evidence export are the two remaining rough edges before this can be considered feature-complete</li>
<li><strong>Darkstar endpoint expansion</strong> — <code>PatientLevelPrediction</code> feature importance scores are available in the container but not yet surfaced anywhere in the frontend; a feature importance panel for PLP models is a natural next step</li>
</ul>
<p>Today was a grind in the best sense — lots of small fixes that collectively make Evidence Investigation feel solid enough to hand to a real analyst. The foundation is there. Now we build upward.</p>]]></content:encoded>
            <category>development</category>
            <category>ohdsi</category>
            <category>analytics</category>
            <category>frontend</category>
            <category>backend</category>
            <category>infrastructure</category>
        </item>
        <item>
            <title><![CDATA[Fortifying Parthenon: Codebase Health Audit, E2E Regression Guards, and the StudyAgent Fork]]></title>
            <link>http://localhost:8082/docs/blog/dev-diary-2026-03-19</link>
            <guid>http://localhost:8082/docs/blog/dev-diary-2026-03-19</guid>
            <pubDate>Thu, 19 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A big day on the quality and resilience front: 34 commits landed in Parthenon focused on a comprehensive codebase health audit, a major expansion of our Playwright E2E test suite, and a fork of the StudyAgent submodule. No flashy new features today — instead, we did the unglamorous but essential work of making sure what we've already built actually works, is safe to change, and won't silently break in production.]]></description>
            <content:encoded><![CDATA[<p>A big day on the quality and resilience front: 34 commits landed in Parthenon focused on a comprehensive codebase health audit, a major expansion of our Playwright E2E test suite, and a fork of the StudyAgent submodule. No flashy new features today — instead, we did the unglamorous but essential work of making sure what we've already built actually <em>works</em>, is <em>safe to change</em>, and won't <em>silently break</em> in production.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-codebase-health-audit">Full Codebase Health Audit<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-19#full-codebase-health-audit" class="hash-link" aria-label="Direct link to Full Codebase Health Audit" title="Direct link to Full Codebase Health Audit">​</a></h2>
<p>The day started with a full audit of the Parthenon codebase, documented in <code>e2e-regression-guard-plan.md</code> and the accompanying devlog entry. The audit surfaced several deferred issues across type safety, modal consistency, error handling, and empty state guidance — the kind of paper cuts that accumulate invisibly until they become real user-facing bugs.</p>
<p>Four of those audit items were resolved today, primarily in a single focused fix commit (<code>f3359b5a5</code>):</p>
<ul>
<li><strong>Type safety gaps</strong> — tightened TypeScript types in areas where <code>any</code> or loose inference had crept in</li>
<li><strong>Modal consistency</strong> — standardized modal open/close behavior across components that had drifted from the shared pattern</li>
<li><strong>Empty state guidance</strong> — added meaningful empty states where components were previously rendering blank space or <code>[object Object]</code> to end users</li>
<li><strong>Error handling deduplication</strong> — Phase 6 of the audit cleanup (<code>5e621c8db</code>) extracted a shared <code>getErrorMessage</code> utility to replace scattered, inconsistent error-to-string coercions throughout the app</li>
</ul>
<p>The <code>getErrorMessage</code> refactor is small but high-leverage: previously, different parts of the codebase handled thrown errors differently (some assumed <code>Error</code> objects, others assumed strings, some did nothing). Centralizing that logic means we get consistent, human-readable error messages everywhere without thinking about it.</p>
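<p>The post does not show the utility itself, so the following is only a sketch of what a shared <code>getErrorMessage</code> typically looks like: accept <code>unknown</code> (which is what a <code>catch</code> clause yields under strict TypeScript) and normalize the common shapes into one human-readable string. The handled shapes and the fallback text are assumptions.</p>

```typescript
// Normalize anything thrown into a message safe to show to users.
function getErrorMessage(error: unknown, fallback = "Something went wrong"): string {
  // Native Error objects, including subclasses such as TypeError.
  if (error instanceof Error && error.message) return error.message;
  // Code that throws plain strings.
  if (typeof error === "string" && error.trim()) return error;
  // Plain objects with a string message, e.g. a parsed JSON error body.
  if (typeof error === "object" && error !== null && "message" in error) {
    const message = (error as { message: unknown }).message;
    if (typeof message === "string" && message.trim()) return message;
  }
  // Never fall through to String(error), which yields "[object Object]".
  return fallback;
}
```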
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="e2e-test-suite-expansion">E2E Test Suite Expansion<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-19#e2e-test-suite-expansion" class="hash-link" aria-label="Direct link to E2E Test Suite Expansion" title="Direct link to E2E Test Suite Expansion">​</a></h2>
<p>The audit made one thing painfully clear: several production bugs (a crashing FHIR Export page, ingestion jobs rendering as <code>[object Object]</code>, hardcoded <code>sourceId</code> values in genomics, and gene filter buttons that didn't actually filter) would have been caught immediately by E2E tests. They weren't caught because those tests didn't exist.</p>
<p>We fixed that today with three test commits:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="smoke-suite-53-tests-8-new-routes-f0c10e804">Smoke Suite: 53 Tests, 8 New Routes (<code>f0c10e804</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-19#smoke-suite-53-tests-8-new-routes-f0c10e804" class="hash-link" aria-label="Direct link to smoke-suite-53-tests-8-new-routes-f0c10e804" title="Direct link to smoke-suite-53-tests-8-new-routes-f0c10e804">​</a></h3>
<p>The smoke suite now runs 53 tests, up from the previous 29, and covers 8 routes that had been added to the app but never wired into the test harness, including <code>/admin/fhir-export</code> (tested in its "coming soon" state) and <code>/admin/solr</code>. Critically, we added <strong><code>[object Object]</code> detection</strong>: every page load now asserts that the rendered text does not contain the literal string <code>[object Object]</code>, catching serialization bugs before they reach users.</p>
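<p>The check itself is simple enough to keep as a pure helper that any spec can call. A sketch with hypothetical helper names; in a Playwright spec, the page text could come from <code>await page.locator('body').innerText()</code> before calling it.</p>

```typescript
const LEAK_MARKER = "[object Object]";

// Return short snippets around each occurrence so failures are easy to debug.
function findObjectObjectLeaks(text: string, context = 20): string[] {
  const snippets: string[] = [];
  let from = 0;
  for (;;) {
    const at = text.indexOf(LEAK_MARKER, from);
    if (at === -1) break;
    const start = Math.max(0, at - context);
    const end = Math.min(text.length, at + LEAK_MARKER.length + context);
    snippets.push(text.slice(start, end));
    from = at + LEAK_MARKER.length;
  }
  return snippets;
}

// Throw with the route and offending snippets, like a failed assertion.
function assertNoObjectObject(route: string, text: string): void {
  const leaks = findObjectObjectLeaks(text);
  if (leaks.length > 0) {
    throw new Error(`${route} rendered ${LEAK_MARKER}: ${leaks.join(" | ")}`);
  }
}
```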
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="regression-guard-specs-7-specs-for-audit-findings-7c7054683">Regression Guard Specs: 7 Specs for Audit Findings (<code>7c7054683</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-19#regression-guard-specs-7-specs-for-audit-findings-7c7054683" class="hash-link" aria-label="Direct link to regression-guard-specs-7-specs-for-audit-findings-7c7054683" title="Direct link to regression-guard-specs-7-specs-for-audit-findings-7c7054683">​</a></h3>
<p>Each bug surfaced in the audit now has a dedicated regression guard spec. The spec table maps directly to the audit findings:</p>
<table><thead><tr><th>Bug</th><th>Guard test</th></tr></thead><tbody><tr><td>Ingestion API envelope not unwrapped</td><td>Verify job list renders real text, not <code>[object Object]</code></td></tr><tr><td>Gene buttons don't filter</td><td>Click gene → assert ClinVar input contains gene name</td></tr><tr><td>History loses query metadata</td><td>Generate query → open history → assert explanation is non-empty</td></tr><tr><td>Dashboard rows not keyboard-accessible</td><td>Tab to row → Enter → assert navigation occurred</td></tr></tbody></table>
<p>These aren't happy-path tests — they're specifically designed to catch regressions of known past failures.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="cross-feature-journey-tests-12-tests-across-5-specs-eca464bd4">Cross-Feature Journey Tests: 12 Tests Across 5 Specs (<code>eca464bd4</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-19#cross-feature-journey-tests-12-tests-across-5-specs-eca464bd4" class="hash-link" aria-label="Direct link to cross-feature-journey-tests-12-tests-across-5-specs-eca464bd4" title="Direct link to cross-feature-journey-tests-12-tests-across-5-specs-eca464bd4">​</a></h3>
<p>Beyond regression guards, we added 5 cross-feature journey specs covering end-to-end user workflows that span multiple modules. These are the tests most likely to catch integration breakage when two independently working features interact unexpectedly.</p>
<p>One important housekeeping fix also landed here: <code>15e7ea23d</code> ensures that <code>admin@acumenus.net</code> is <strong>never used</strong> in auth E2E tests. Using a shared admin account in parallel test runs is a classic source of flaky, order-dependent failures. Auth tests now use isolated test credentials.</p>
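<p>One way to give each parallel worker its own identity is to derive credentials from the worker index and a per-run ID, and to refuse the shared admin account outright. The sketch below is hypothetical (the email domain and password scheme are placeholders for disposable test accounts); with Playwright, the index could come from <code>testInfo.parallelIndex</code> or the <code>TEST_PARALLEL_INDEX</code> environment variable.</p>

```typescript
const SHARED_ADMIN = "admin@acumenus.net";

interface TestCredentials {
  email: string;
  password: string;
}

// Derive a unique, disposable account per worker and run so parallel auth
// tests never share (or lock out) the same user. Domain is a placeholder.
function credentialsFor(parallelIndex: number, runId: string): TestCredentials {
  if (!Number.isInteger(parallelIndex) || parallelIndex < 0) {
    throw new Error(`Invalid worker index: ${parallelIndex}`);
  }
  const safeRun = runId.toLowerCase().replace(/[^a-z0-9]/g, "");
  if (!safeRun) throw new Error("runId must contain letters or digits");
  const email = `e2e-${safeRun}-w${parallelIndex}@example.test`;
  return { email, password: `E2e!${safeRun}w${parallelIndex}` };
}

// Guard to call before any auth test logs in.
function assertNotSharedAdmin(creds: TestCredentials): void {
  if (creds.email.toLowerCase() === SHARED_ADMIN) {
    throw new Error("Auth E2E tests must not use the shared admin account");
  }
}
```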
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="implementation-plan-documented-f0c10e804">Implementation Plan Documented (<code>f0c10e804</code>)<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-19#implementation-plan-documented-f0c10e804" class="hash-link" aria-label="Direct link to implementation-plan-documented-f0c10e804" title="Direct link to implementation-plan-documented-f0c10e804">​</a></h3>
<p>The full three-phase E2E regression guard rollout plan is now documented in <code>docs/e2e-regression-guard-plan.md</code>. Phase 1 (fix and baseline existing tests) is largely complete. Phases 2 and 3 — Page Object Model implementation and CI enforcement — are queued for the coming days.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="studyagent-submodule-fork">StudyAgent Submodule Fork<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-19#studyagent-submodule-fork" class="hash-link" aria-label="Direct link to StudyAgent Submodule Fork" title="Direct link to StudyAgent Submodule Fork">​</a></h2>
<p><code>f4cec79c5</code> repoints the <code>study-agent</code> submodule from its upstream source to a fork at <code>sudoshi/StudyAgent</code>. This gives us full control over the StudyAgent codebase: we can apply Parthenon-specific changes, pin dependencies, and iterate without waiting on upstream. The fork is now the canonical submodule reference going forward.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-next">What's Next<a href="http://localhost:8082/docs/blog/dev-diary-2026-03-19#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next">​</a></h2>
<ul>
<li><strong>Phase 2 of the E2E plan</strong>: Implement the Page Object Model architecture outlined in <code>E2E_TEST_PLAN.md</code>. The infrastructure is in place; we need to lift the raw selector strings in specs into reusable page objects.</li>
<li><strong>CI enforcement</strong>: Wire the expanded smoke and regression guard suites into the CI pipeline so no PR can merge if E2E tests are red. Today we proved the tests find real bugs — next step is making them mandatory.</li>
<li><strong>StudyAgent integration</strong>: With the fork in place, begin adapting StudyAgent to Parthenon's auth and data model conventions.</li>
<li><strong>Remaining audit items</strong>: The health audit flagged more than the four items fixed today. We'll work through the backlog systematically, using the regression guard framework to ensure fixes stay fixed.</li>
</ul>
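<p>For the Page Object Model work, the idea is that specs call intent-level methods while selectors live in one place. A minimal sketch follows; <code>PageLike</code> is a hypothetical subset of Playwright's <code>Page</code> (whose <code>goto</code>, <code>fill</code>, and <code>click</code> methods take a URL or selector), and the route and selectors are placeholders rather than Parthenon's real ones.</p>

```typescript
// Hypothetical subset of Playwright's Page, so the sketch runs without a browser.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Page object: selectors are defined once; specs use intent-level methods.
class GenomicsPage {
  // Placeholder route and selectors, not Parthenon's actual markup.
  static readonly route = "/genomics";
  private readonly selectors = {
    geneButton: (gene: string) => `[data-testid="gene-${gene}"]`,
    clinvarInput: '[data-testid="clinvar-search"]',
  };
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async open(): Promise<void> {
    await this.page.goto(GenomicsPage.route);
  }

  async filterByGene(gene: string): Promise<void> {
    await this.page.click(this.selectors.geneButton(gene));
  }

  async searchClinVar(term: string): Promise<void> {
    await this.page.fill(this.selectors.clinvarInput, term);
  }
}
```

<p>If a selector changes, only the page object needs updating; the specs that call <code>filterByGene</code> stay as they are.</p>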
<p>Today was a day of paying down technical debt with receipts — every bug we documented got a test, every inconsistency got a fix, and the codebase came out measurably more trustworthy than it started.</p>]]></content:encoded>
            <category>development</category>
            <category>testing</category>
            <category>frontend</category>
            <category>analytics</category>
            <category>ohdsi</category>
        </item>
    </channel>
</rss>