11 posts tagged with "omop"

Introducing Harmonia: Read, Write, Think for OMOP Concept Mapping

· 17 min read
Creator, Parthenon
AI Development Assistant

Concept mapping is the single largest line item in any OMOP CDM ingestion budget. Published estimates put it at 40–60% of total ETL effort per source system — measured in clinician-weeks, not engineer-hours. Today we landed the architectural piece that's been missing from Parthenon's vocabulary stack since the beginning: Harmonia, an automated decision layer that sits between Hecate (read) and Ariadne (write) and does the cognitive work that's been falling on humans.

The name is deliberate. In Greek mythology, Harmonia is the goddess of agreement, accord, and fitting together — daughter of Aphrodite and Ares, born of love and conflict. That's what concept mapping is: bringing disparate source vocabularies (an ICD-10 code from one EHR, an NDC string from another, a hospital's local lab nomenclature) into harmony with a single canonical OMOP standard. Every approved mapping is a small act of harmony. Until today, Parthenon could show candidates and record decisions but couldn't reach harmony on its own.

This post walks through what we built, why it's an improvement over the existing Hecate + Ariadne pair, and the four real bugs we hit getting a benchmark to actually run.

From 10 to 45: Building an OHDSI-Compliant eCQM Care Bundle Library

· 21 min read
Creator, Parthenon
AI Development Assistant

Parthenon's Cohort Definitions page has always had a "Create from Care Bundle" modal — a way to bootstrap a cohort definition from a pre-packaged disease framework with ICD-10 patterns, OMOP concepts, and quality measures. The idea is elegant: select "Rheumatoid Arthritis," click a button, and get a fully-formed OHDSI Circe cohort expression ready to run against any CDM source.

But when I opened the modal this weekend, I saw only ten bundles. Type 2 Diabetes, Hypertension, Heart Failure, COPD, Asthma, and a handful of others. Meanwhile, the Medgnosis project — our sister platform for population health intelligence — has a library of 45 care bundles covering everything from Systemic Lupus Erythematosus to Post-Traumatic Stress Disorder, each mapped to CMS Electronic Clinical Quality Measures (eCQMs). The data was sitting there in three SQL migration files. Parthenon just didn't know about it.

That observation kicked off what became a seven-hour deep dive into OHDSI vocabulary semantics, Circe expression compliance, and the kind of database integrity issues that only reveal themselves when you actually try to compile a cohort definition into executable SQL. By the end, we had 45 bundles, 338 quality measures, 928 verified OMOP concept IDs — and we caught eleven bugs along the way, several of which would have silently produced wrong cohorts in production.

This is the story of how we got there.

Taming the Cohort Zoo: Clinical Domain Categorization and a Quality-Tiered Browse Experience

· 13 min read
Creator, Parthenon
AI Development Assistant

A dense crowd of people — finding the right cohort in an unorganized list feels just like this.

Every research platform hits the same inflection point. You build a powerful cohort builder. Researchers love it. They create cohorts for Study 1, Study 2, the rare disease project, the pancreatic cancer corpus. Each study gets its own "All-Cause Death" outcome. Each gets its own "MACE" composite endpoint. Before long, you're staring at 89 cohort definitions in a flat, unsorted list where a meticulous seven-concept-set new-user design sits next to an auto-generated stub with one concept and no generations. A Rett syndrome genotype-stratified trial cohort is sandwiched between a SynPUF cardiometabolic triad and a never-run hypertension bundle. The list is technically complete and practically useless.

Today, Parthenon ships a cohort categorization system that solves this. We audited every cohort definition in the database, identified and consolidated 9 duplicates and orphans, assigned 80 surviving cohorts to 8 clinical domains, computed a quality tier for each one, and rebuilt the Cohort Definitions page with collapsible domain-grouped sections and quality filter pills. Researchers can now browse by clinical domain, filter to study-ready phenotypes, and find what they need in seconds instead of scrolling through a flat table.

This post describes the problem in detail, explains how we analyzed and scored the inventory, walks through the architecture, and shows what the result looks like.

From Jaccard to Network Fusion: How Parthenon's Patient Similarity Engine Became Research-Grade

· 22 min read
Creator, Parthenon
AI Development Assistant

Eight days ago, we shipped the Patient Similarity Engine — a multi-modal system that scores patients across six clinical dimensions using weighted Jaccard, z-scored lab distances, and pathogenicity-tiered genomic matching. Two days later, we generated embeddings for a million patients. The engine worked. Researchers could find patients like a seed patient, compare cohorts, and export results.

But it wasn't research-grade. The Jaccard similarity was binary: a patient with Type 1 DM and a patient with Type 2 DM got zero credit for those conditions, even though both concepts share the ancestor "Diabetes mellitus" in the SNOMED hierarchy. The cohort comparison showed a radar chart with divergence percentages, but couldn't tell you which covariates were driving the imbalance or how the distributions actually differed. There was no propensity scoring, no temporal analysis, no phenotype discovery, and no way to fuse multiple data modalities into a single principled similarity measure.
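To make the binary-Jaccard limitation concrete, here is a minimal sketch that compares exact-match Jaccard with Jaccard over ancestor-expanded concept sets. The `PARENT` map is a hypothetical toy hierarchy, not real SNOMED content, and `with_ancestors` is an illustrative helper rather than Parthenon's implementation.

```python
# Hypothetical toy hierarchy (child -> parent); NOT actual SNOMED relationships.
PARENT = {
    "Type 1 diabetes mellitus": "Diabetes mellitus",
    "Type 2 diabetes mellitus": "Diabetes mellitus",
    "Diabetes mellitus": "Disorder of endocrine system",
}


def jaccard(a, b):
    """Plain Jaccard: |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def with_ancestors(concepts):
    """Expand each concept with every ancestor reachable through PARENT."""
    expanded = set()
    for concept in concepts:
        while concept is not None:
            expanded.add(concept)
            concept = PARENT.get(concept)
    return expanded


patient_a = {"Type 1 diabetes mellitus"}
patient_b = {"Type 2 diabetes mellitus"}

exact = jaccard(patient_a, patient_b)
hierarchical = jaccard(with_ancestors(patient_a), with_ancestors(patient_b))
print(exact, hierarchical)
```

With exact matching the two patients share nothing; once ancestors are included they overlap on the shared parent concepts. A production system would presumably also down-weight very general ancestors, which this sketch does not attempt.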

Tonight, in a single session, we shipped eight interconnected upgrades that transform the Patient Similarity Engine from a useful clinical tool into a research platform that exceeds the analytical capabilities of OHDSI Atlas, Oracle Healthcare's "Patients Like Mine," and every open-source OMOP similarity system we've been able to find.

This is the story of what we built, why each piece matters, and how they work together.

100% Concept Coverage: How Parthenon Built MedDRA-Equivalent Clinical Navigation on SNOMED CT

· 20 min read
Creator, Parthenon
AI Development Assistant

Parthenon's Vocabulary Search now provides 100% navigational coverage of all 105,324 standard SNOMED CT Condition concepts through 27 curated clinical groupings — achieving functional parity with MedDRA's System Organ Class navigation while preserving SNOMED's superior clinical granularity. This is the story of diagnosing the SNOMED-OMOP domain boundary problem, engineering a cross-domain hierarchy builder, curating a clinically intelligent grouping layer, and systematically closing every coverage gap until no standard concept was left behind.

One Million Patient Embeddings: GPU-Accelerated Similarity Search Comes to Parthenon

· 20 min read
Creator, Parthenon
AI Development Assistant

Two days ago, we shipped the Patient Similarity Engine — a multi-modal system that scores patients across six clinical dimensions on OMOP CDM. The architecture was sound. The algorithms worked. But there was a problem hiding in plain sight: none of our patients had embeddings.

The embedding pipeline had been silently failing since day one. Three type mismatches between our PHP backend and Python AI service meant that every embedding request returned a validation error, which was caught by a try/catch block and logged as a warning that nobody read. The feature vectors were all there (conditions, drugs, measurements, procedures), but the 512-dimensional dense vectors that would make similarity search fast at scale? Zero. For every source. For every patient.

Tonight, we fixed all three bugs, refactored the embedding pipeline from CPU-only SapBERT to GPU-accelerated Ollama, upgraded from 512 to 768 dimensions, introduced batch deduplication that delivered a 123x throughput improvement, and generated embeddings for 1,007,007 patients across three CDM sources. This is the story of what broke, what we built, and what it unlocks.
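The excerpt does not say what the batch deduplication operates on. As a rough sketch, assuming it means embedding each distinct input text only once per batch and fanning the result back out, the idea could look like this. `embed_with_dedup` and the `embed_batch` callable are hypothetical names, not Parthenon or Ollama APIs.

```python
def embed_with_dedup(texts, embed_batch):
    """Embed each distinct text once, then return vectors in the original order.

    `embed_batch` is a hypothetical callable: list[str] -> list[vector].
    """
    unique = list(dict.fromkeys(texts))  # de-duplicate while preserving order
    vectors = embed_batch(unique)
    lookup = dict(zip(unique, vectors))
    return [lookup[text] for text in texts]


# Stand-in embedder for illustration: records how many texts it was asked to embed.
calls = []


def fake_embed_batch(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


texts = ["hypertension", "type 2 diabetes", "hypertension", "hypertension"]
result = embed_with_dedup(texts, fake_embed_batch)
```

Here four requested embeddings cost only two model evaluations. The saving grows with how often patients share identical feature descriptions, which is an assumption about the data rather than something the excerpt states.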

Patients Like Mine: Building a Multi-Modal Patient Similarity Engine on OMOP CDM

· 18 min read
Creator, Parthenon
AI Development Assistant

For twenty years, the question "which patients are most like this one?" has haunted clinical informatics. Molecular tumor boards want to know: of the 300 patients in our pancreatic cancer corpus, which ones had the same pathogenic variants, the same comorbidity profile, the same treatment history — and what happened to them? Population health researchers want to seed cohort definitions not from abstract inclusion criteria but from a concrete index patient. And every clinician who has ever stared at a complex case has wished for a button that says show me others like this.

Today, Parthenon ships that button. The Patient Similarity Engine is a multi-modal matching system that scores patients across six clinical dimensions — demographics, conditions, measurements, drugs, procedures, and genomic variants — with user-adjustable weights, dual algorithmic modes, bidirectional cohort integration, and tiered privacy controls. It works across any OMOP CDM source in the platform, from the 361-patient Pancreatic Cancer Corpus to the million-patient Acumenus CDM.
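As an illustration of how per-dimension scores and user-adjustable weights might combine into one number, here is a minimal sketch. The weighted-average formula, the choice to skip dimensions with no data, and the names `DIMENSIONS` and `combined_similarity` are all assumptions made for this example; the excerpt does not specify the engine's actual scoring rule.

```python
# The six dimensions named in the post; the identifiers themselves are illustrative.
DIMENSIONS = (
    "demographics", "conditions", "measurements",
    "drugs", "procedures", "genomics",
)


def combined_similarity(scores, weights):
    """Weighted average of per-dimension scores in [0, 1].

    Assumption: a dimension with no score (for example, no genomic data)
    is skipped and the remaining weights are renormalized.
    """
    total = 0.0
    weight_sum = 0.0
    for dim in DIMENSIONS:
        weight = weights.get(dim, 0.0)
        if dim in scores and weight > 0:
            total += weight * scores[dim]
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


scores = {"demographics": 1.0, "conditions": 0.5, "drugs": 0.0}
weights = {"demographics": 1.0, "conditions": 2.0, "drugs": 1.0, "genomics": 3.0}
overall = combined_similarity(scores, weights)
```

Renormalizing over the available dimensions keeps scores comparable between patients with and without genomic data, at the cost of hiding how much evidence each score rests on.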

This post tells the story of why it was needed, what we studied before building it, how it works under the hood, and what we learned along the way.

Poseidon and Vulcan: The Gods of Continuous Data Ingestion

· 12 min read
Creator, Parthenon

Healthcare data does not arrive in neat packages. It streams — continuously, chaotically, from dozens of transactional systems that were never designed to talk to each other. EHR encounters appear as HL7 ADT messages. Lab results materialize through OBX segments hours after the draw. Radiology reports surface from PACS archives with inconsistent coding. Claims trickle in from clearinghouses days or weeks after the visit. Genomic panels arrive as VCF files from external laboratories with their own nomenclatures and timelines.

Transforming this unruly sea of clinical data into a coherent, research-ready OMOP Common Data Model is the central engineering challenge of any outcomes research platform. And until now, Parthenon handled it the same way most platforms do: as a series of one-time bulk loads. Upload a file. Map the concepts. Write the CDM. Move on.

That era is over.

Today we introduce two new engines to the Parthenon pantheon — Vulcan and Poseidon — purpose-built for the reality of continuous healthcare data integration.

Building a Clinically Intelligent Risk Scoring Engine on OMOP CDM

· 11 min read
Creator, Parthenon
AI Development Assistant
Tyche, Greek goddess of fortune and chance

In Greek mythology, Tyche was the goddess of fortune, chance, and prosperity. Depicted with a cornucopia of abundance and the wheel of fate, she governed the unpredictable forces that determined whether a city would flourish or fall. The ancient Greeks understood that outcomes are shaped by forces beyond individual control — health, circumstance, and probability. In the Parthenon pantheon, Tyche presides over population risk scoring: the quantification of clinical probability, the stratification of patients by the likelihood of outcomes they cannot fully control, and the transformation of uncertainty into actionable intelligence.

We built a population risk scoring engine that runs 20 validated clinical risk calculators against any OMOP CDM dataset — then immediately realized the approach was wrong. This post covers what we built, why we tore it apart, and the v2 architecture that replaced "run everything on everyone" with cohort-scoped, recommendation-driven clinical risk analysis.

Abby 2.0: From Chatbot to Cognitive Research Assistant — The Complete Architecture

· 15 min read
Creator, Parthenon
AI Development Assistant

In a single development session, we shipped three phases of a cognitive architecture that transforms Abby from a stateless RAG chatbot into a persistent, intelligent, context-aware research assistant. She now remembers who you are, routes complex questions to a more powerful brain, traverses clinical concept hierarchies, and warns you when your data has gaps. This post tells the complete story — the problems we solved, the architecture we built, and the engineering decisions behind 188 passing tests across 60+ new files.