Parthenon v1.0.5 — Data Quality & Validation
v1.0.5 is the second stabilization release in the v1.0.x arc. With test infrastructure in place from v1.0.4, this release focuses on data integrity across the platform — programmatic audits that verify correctness of SQL generation, schema routing, vocabulary resolution, FHIR transformation, migration safety, and cross-schema referential integrity.
Why data quality matters
Parthenon queries OMOP CDM data across 5 sources, each in its own PostgreSQL
schema. Four of them share a single vocab schema for vocabulary; the Eunomia
demo source keeps its vocabulary tables in its own eunomia schema. Every SQL
template, every DQD check, every concept set resolution must correctly
substitute the right schema name. A single hardcoded omop. prefix in a
template breaks silently for SynPUF, IRSF, Pancreas, and Eunomia. v1.0.5 adds
programmatic guards that catch these issues automatically.
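A guard of this kind can be sketched in a few lines. The following is a minimal, language-neutral illustration in Python, not Parthenon's actual test code: the function names `audit_template` and `render` and the list of concrete schema names are assumptions made for the example, while the `{@...}` placeholder syntax follows these notes.

```python
import re

# Placeholder syntax as described in these notes: {@cdmSchema}, etc.
PLACEHOLDER = re.compile(r"\{@(\w+)\}")
KNOWN_PLACEHOLDERS = {"cdmSchema", "resultsSchema", "vocabSchema"}

# Assumed list of concrete schema names that must never appear literally.
CONCRETE_SCHEMAS = ("omop", "synpuf", "irsf", "pancreas", "eunomia", "vocab")
HARDCODED = re.compile(
    r"\b(" + "|".join(CONCRETE_SCHEMAS) + r")\.\w+", re.IGNORECASE
)


def audit_template(sql: str) -> list[str]:
    """Return human-readable violations found in one SQL template."""
    violations = []
    for match in HARDCODED.finditer(sql):
        violations.append(f"hardcoded schema reference: {match.group(0)}")
    for name in PLACEHOLDER.findall(sql):
        if name not in KNOWN_PLACEHOLDERS:
            violations.append(f"unknown placeholder: {{@{name}}}")
    return violations


def render(sql: str, schemas: dict[str, str]) -> str:
    """Substitute every {@name} placeholder; KeyError if one is missing."""
    return PLACEHOLDER.sub(lambda m: schemas[m.group(1)], sql)
```

Running `audit_template` over every template and asserting an empty result turns the audit into a regression guard: a newly introduced `omop.person` reference fails the test instead of failing silently at query time on another source.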
Achilles & DQD audit
- 128 Achilles SQL templates audited: every analysis verified for correct {@cdmSchema}, {@resultsSchema}, and {@vocabSchema} placeholder usage. No vocabulary tables using {@cdmSchema}, no hardcoded schema names, no unresolved placeholders. Zero violations found; test serves as regression guard.
- 170+ DQD checks validated across all 5 CDM sources: each check's sqlTotal() and sqlViolated() verified for correct schema substitution with Acumenus (omop/vocab), SynPUF (synpuf/vocab), IRSF (irsf/vocab), Pancreas (pancreas/vocab), and Eunomia (eunomia/eunomia). 4,770 assertions.
- Results schema routing validated: confirmed each source resolves to a distinct results schema (results, synpuf_results, irsf_results, pancreas_results, eunomia_results) with no collisions, and that SET search_path succeeds for each.
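The routing check in the last item amounts to asserting that no two sources claim the same results schema. A minimal sketch in Python, assuming a hypothetical `SourceSchemas` structure; only the schema names themselves come from these notes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSchemas:
    """Schema triple for one CDM source (hypothetical structure)."""
    cdm: str
    vocab: str
    results: str


# Schema names are taken from the release notes above.
SOURCES = {
    "Acumenus": SourceSchemas("omop", "vocab", "results"),
    "SynPUF": SourceSchemas("synpuf", "vocab", "synpuf_results"),
    "IRSF": SourceSchemas("irsf", "vocab", "irsf_results"),
    "Pancreas": SourceSchemas("pancreas", "vocab", "pancreas_results"),
    "Eunomia": SourceSchemas("eunomia", "eunomia", "eunomia_results"),
}


def results_collisions(sources: dict[str, SourceSchemas]) -> list[str]:
    """Return results schema names claimed by more than one source."""
    counts = Counter(s.results for s in sources.values())
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty list from `results_collisions(SOURCES)` is the passing condition; adding a sixth source that reuses `results` would surface immediately.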
Vocabulary validation
- Solr index completeness command (solr:validate-vocabulary): compares the Solr vocabulary core document count against vocab.concept standard concepts, with spot-check sampling. Reports coverage % and exits non-zero if below 95%.
- Concept set resolution schema audit: verified resolveToSql() generates correct vocab.concept_ancestor and vocab.concept_relationship references, uses singular OMOP table names, and correctly substitutes the eunomia schema for the Eunomia demo source.
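The pass/fail rule of the completeness command can be illustrated as follows. This is a sketch in Python under stated assumptions, not the Artisan command itself: the function names are hypothetical, and the two counts are taken as inputs rather than fetched from Solr or PostgreSQL.

```python
COVERAGE_THRESHOLD = 95.0  # percent, as stated in these notes


def coverage_percent(solr_docs: int, standard_concepts: int) -> float:
    """Share of standard concepts present in the Solr core, in percent."""
    if standard_concepts <= 0:
        raise ValueError("standard_concepts must be positive")
    return 100.0 * solr_docs / standard_concepts


def missing_from_index(sample_ids: list[int], indexed_ids: set[int]) -> list[int]:
    """Spot check: sampled concept IDs that the index does not contain."""
    return [cid for cid in sample_ids if cid not in indexed_ids]


def exit_code(solr_docs: int, standard_concepts: int,
              threshold: float = COVERAGE_THRESHOLD) -> int:
    """0 when coverage meets the threshold, 1 (non-zero) when below it."""
    return 0 if coverage_percent(solr_docs, standard_concepts) >= threshold else 1
```

Returning a non-zero exit code makes the check usable from cron jobs or CI without parsing the report text.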
Ingestion & ETL validation
- Row count verification infrastructure: validated that PostLoadValidationService, ValidationResult, and IngestionJob have the correct method signatures, column schemas, and relationship wiring for end-to-end row count tracking through the pipeline.
- FHIR-to-CDM transformation fidelity: 31 tests covering Patient (gender mapping to OMOP concept IDs, birth date parsing, US Core race/ethnicity extensions), Condition (SNOMED/ICD-10-CM mapping, onset/abatement dates), MedicationRequest (RxNorm mapping), Observation (category-based routing to measurement vs observation), and code system URI resolution.
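To make the Patient case concrete, here is a small sketch in Python of the kind of mapping those tests exercise. It is not Parthenon's transformer: the function names are hypothetical, and mapping "other" and "unknown" to concept 0 is an assumption. The gender concept IDs 8507 (MALE) and 8532 (FEMALE) are the standard OMOP ones, and FHIR allows `birthDate` to be a partial date (YYYY or YYYY-MM).

```python
from typing import Optional

# Standard OMOP gender concepts; anything else falls back to 0 (assumption).
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def gender_concept_id(fhir_gender: Optional[str]) -> int:
    """Map a FHIR administrative gender code to an OMOP concept ID."""
    return GENDER_CONCEPTS.get((fhir_gender or "").lower(), 0)


def birth_parts(birth_date: str) -> tuple[int, Optional[int], Optional[int]]:
    """Split a FHIR date (YYYY, YYYY-MM, or YYYY-MM-DD) into its parts."""
    parts = [int(p) for p in birth_date.split("-")]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else None
    day = parts[2] if len(parts) > 2 else None
    return year, month, day


def patient_to_person(patient: dict) -> dict:
    """Build the demographic columns of an OMOP person row from a Patient."""
    year, month, day = birth_parts(patient["birthDate"])
    return {
        "gender_concept_id": gender_concept_id(patient.get("gender")),
        "year_of_birth": year,
        "month_of_birth": month,
        "day_of_birth": day,
    }
```

For example, a Patient with `"gender": "female"` and `"birthDate": "1984-03"` yields `gender_concept_id` 8532, `year_of_birth` 1984, `month_of_birth` 3, and no day.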
Database integrity
- 242 migrations audited for idempotency: verified all have both up() and down() methods, no unsafe DROP TABLE without IF EXISTS in rollback, no $guarded = [] HIGHSEC violations. 3 advisory dropIfExists warnings in up() (all intentional cleanup migrations).
- Cross-schema FK integrity validated: live queries against localhost PG17 verifying person.gender_concept_id, condition_concept_id, measurement_concept_id, and visit_occurrence.person_id all resolve to valid vocab.concept or person records. Finding: orphan drug_concept_ids in the 40213xxx range (SynPUF vocabulary version mismatch), flagged as warning, investigation pending.
- OMOP CDM CHECK constraints migration — adds 24 database-level CHECK constraints across 4 CDM schemas (omop, synpuf, irsf, pancreas) enforcing required fields: person gender/year_of_birth, visit/condition/drug start dates, and observation_period date ordering. Idempotent via DO/EXCEPTION.
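The DO/EXCEPTION pattern mentioned in the last item can be sketched as a small generator, written here in Python for illustration. The constraint names and exact expressions below are assumptions, not the contents of the actual migration; the column names are standard OMOP CDM, and `duplicate_object` is the condition PostgreSQL raises when a constraint of that name already exists.

```python
CDM_SCHEMAS = ("omop", "synpuf", "irsf", "pancreas")

# (table, constraint name, expression): illustrative only. The constraint
# names and expressions are assumptions, not the real migration's contents.
EXAMPLE_CHECKS = (
    ("person", "chk_person_year_of_birth", "year_of_birth IS NOT NULL"),
    ("observation_period", "chk_observation_period_dates",
     "observation_period_start_date <= observation_period_end_date"),
)


def idempotent_check(schema: str, table: str, name: str, expression: str) -> str:
    """Wrap ADD CONSTRAINT in a DO block that ignores 'already exists'."""
    return (
        "DO $$\n"
        "BEGIN\n"
        f'    ALTER TABLE "{schema}"."{table}"\n'
        f'        ADD CONSTRAINT "{name}" CHECK ({expression});\n'
        "EXCEPTION\n"
        "    WHEN duplicate_object THEN NULL;  -- constraint already exists\n"
        "END\n"
        "$$;"
    )


def build_statements() -> list[str]:
    """One DO block per schema and example check."""
    return [
        idempotent_check(schema, table, name, expr)
        for schema in CDM_SCHEMAS
        for table, name, expr in EXAMPLE_CHECKS
    ]
```

Because a second run hits the `duplicate_object` handler instead of failing, the same statements can be applied repeatedly, which is what makes re-running the migration safe.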
OMOP Extension Bridge validation
- 1,715 imaging + 47 genomics records validated — read-only count verification of the OMOP extension bridge (image_occurrence, specimen, genomic_test, variant_occurrence, variant_annotation) and all app-layer xref tables. 10 Pest smoke tests for bridge model queryability.
By the numbers
- New test files: 11
- New tests: 68
- New assertions: 4,916
- Achilles analyses audited: 128
- DQD checks validated: 170+
- CDM sources cross-validated: 5
- Migrations audited: 242 (243 including this release's new migration)
- CHECK constraints added: 24
Data quality finding
The cross-schema FK audit discovered orphan drug_concept_id values in the
40213xxx range within omop.drug_exposure. These are SynPUF-era concept IDs
that don't exist in the current vocab.concept table — a vocabulary version
mismatch. This is flagged as a warning and will be resolved in a future
vocabulary re-index or concept remapping pass.
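An orphan check of this kind is an anti-join between the fact table and the vocabulary. The sketch below, in Python, uses SQLite with attached in-memory databases standing in for the omop and vocab schemas so that it runs anywhere; the real audit ran against PostgreSQL, and the function names and sample rows here are synthetic assumptions.

```python
import sqlite3


def find_orphan_drug_concepts(conn: sqlite3.Connection) -> list[int]:
    """Return drug_concept_id values with no matching vocab.concept row."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.drug_concept_id
        FROM omop.drug_exposure AS d
        LEFT JOIN vocab.concept AS c ON c.concept_id = d.drug_concept_id
        WHERE c.concept_id IS NULL
        ORDER BY d.drug_concept_id
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection() -> sqlite3.Connection:
    """Build a tiny synthetic dataset; attached databases mimic schemas."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS omop")
    conn.execute("ATTACH DATABASE ':memory:' AS vocab")
    conn.execute("CREATE TABLE vocab.concept (concept_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE omop.drug_exposure ("
        "drug_exposure_id INTEGER PRIMARY KEY, "
        "drug_concept_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO vocab.concept VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO omop.drug_exposure VALUES (?, ?)",
        [(1, 1), (2, 2), (3, 99), (4, 99)],  # 99 has no concept row
    )
    return conn
```

With the synthetic rows above, `find_orphan_drug_concepts(demo_connection())` returns the single unmatched ID, 99. The same query shape applies to the other concept columns listed in the FK audit.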
Upgrade notes
One new migration: 2026_04_11_000001_add_omop_cdm_check_constraints.php.
Run php artisan migrate to apply the CHECK constraints. The migration is
idempotent — safe to re-run.
New Artisan command: php artisan solr:validate-vocabulary for operational
Solr index validation.
All other changes are test files — no API changes, no frontend changes, no breaking changes.
Contributors
Claude Code + @sudoshi