Data Quality Dashboard

The Data Quality Dashboard (DQD) evaluates the conformance, completeness, and plausibility of your OMOP CDM data against the OHDSI Data Quality Dashboard specification. It runs approximately 3,000 data quality checks and summarizes results by category, CDM table, and concept domain. Data quality assessment is a critical prerequisite before using any CDM for research -- it helps identify ETL errors, source data issues, and potential biases.

DQD Check Categories

The DQD framework organizes checks into three categories, each targeting a different dimension of data quality:

| Category | Description | Approximate Count | Examples |
|---|---|---|---|
| Conformance | Data conforms to CDM structural requirements | ~1,200 | Valid concept IDs, correct data types, referential integrity, date ranges within bounds |
| Completeness | Expected columns and records are populated | ~800 | Non-null required fields, expected record counts per domain, observation period coverage |
| Plausibility | Values are clinically reasonable | ~1,000 | Age at death within lifespan, drug duration within expected range, lab values within physiological limits |

Each check returns:

  • Pass/Fail status -- based on the configured failure threshold
  • Failure count -- number of records violating the check
  • Failure rate -- percentage of applicable records that fail
  • Violation sample -- up to 10 example records for investigation

The overall DQD score is the weighted percentage of passing checks, giving a single number that summarizes database quality.
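The score computation can be sketched as follows. This is a minimal illustration, assuming per-check pass/fail results and per-category weights; the equal weights used here are placeholders, not the official DQD weighting.

```python
# Sketch: overall DQD score as a weighted percentage of passing checks.
# Category weights are illustrative, not the official DQD weighting.

def dqd_score(checks, weights=None):
    """checks: list of dicts with 'category' and 'status' ('PASS'/'FAIL')."""
    weights = weights or {"Conformance": 1.0, "Completeness": 1.0, "Plausibility": 1.0}
    total = passed = 0.0
    for check in checks:
        w = weights.get(check["category"], 1.0)
        total += w
        if check["status"] == "PASS":
            passed += w
    return 100.0 * passed / total if total else 0.0

checks = [
    {"category": "Conformance", "status": "PASS"},
    {"category": "Completeness", "status": "FAIL"},
    {"category": "Plausibility", "status": "PASS"},
]
score = dqd_score(checks)  # 2 of 3 equally weighted checks pass
```

With equal weights this reduces to the plain pass rate; raising the weight of one category makes its failures pull the score down faster.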

Viewing DQD Results

  1. Navigate to Data Explorer and select a Data Source from the dropdown.
  2. Click the Data Quality tab (4th tab).
  3. The summary panel shows:
    • Overall pass rate with a color-coded gauge (green > 90%, yellow 70-90%, red < 70%)
    • Category breakdown -- pass rate for Conformance, Completeness, and Plausibility independently
    • CDM table breakdown -- pass rates grouped by clinical table

Check Detail Table

Below the summary, the full check list displays every DQD check with:

| Column | Description |
|---|---|
| Check Name | Descriptive name of the quality check |
| Description | What the check evaluates |
| Category | Conformance / Completeness / Plausibility |
| CDM Table | The clinical table being checked (e.g., condition_occurrence) |
| CDM Column | The specific column (e.g., condition_concept_id) |
| Threshold | Configurable failure rate limit (default: 5%) |
| Failure Count | Number of records failing this check |
| Failure Rate | Percentage of applicable records failing |
| Status | Pass / Fail icon |

Click any check row to expand the violation detail panel, showing example records and suggested remediation steps.

Filtering and Searching

The filter bar provides multiple dimensions for narrowing the check list:

  • Category -- Conformance / Completeness / Plausibility (toggle buttons)
  • Status -- Passing / Failing / All (dropdown)
  • CDM Table -- select a specific table (e.g., measurement, drug_exposure)
  • Concept Domain -- filter by OMOP domain (Condition, Drug, Measurement, etc.)
  • Search -- free-text search across check names and descriptions

Prioritization

Start by filtering to Failing checks sorted by Failure Count descending. This surfaces the highest-impact data quality issues first. A single check with 100,000 failures is more urgent than 10 checks with 5 failures each.
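The triage rule above can be expressed as a short sketch: keep only failing checks, then sort by failure count descending. The field names are illustrative, matching the check detail table.

```python
# Sketch: surface the highest-impact data quality issues first.
# Field names ('status', 'failure_count') are illustrative.

def prioritize(checks):
    """Return failing checks ordered by failure count, largest first."""
    failing = [c for c in checks if c["status"] == "FAIL"]
    return sorted(failing, key=lambda c: c["failure_count"], reverse=True)

checks = [
    {"name": "plausible_value_high", "status": "FAIL", "failure_count": 5},
    {"name": "standard_concept_completeness", "status": "FAIL", "failure_count": 100_000},
    {"name": "is_required", "status": "PASS", "failure_count": 0},
]
worst_first = prioritize(checks)
```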

Achilles Heel Checks

The Achilles Heel tab (5th tab in Data Explorer) shows rule-based quality notifications generated as part of the Achilles analysis run. These are simpler, faster checks compared to the full DQD:

| Severity | Icon | Description | Examples |
|---|---|---|---|
| ERROR | Red circle | Critical data quality issues requiring immediate attention | Future dates in observation periods, negative drug exposure durations, orphaned records |
| WARNING | Yellow triangle | Potential issues that may affect analysis validity | Birth year after death year, extremely long observation periods, unusual gender distributions |
| NOTIFICATION | Blue info | Informational items about data characteristics | Low record counts in certain domains, single-value columns, vocabulary coverage gaps |

Heel checks are stored in the achilles_heel_results table and update automatically whenever Achilles is re-run. They are significantly faster than a full DQD execution (seconds vs. minutes).
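Because Heel results live in a plain results table, they can also be queried directly. The sketch below uses an in-memory SQLite stand-in; the column names (analysis_id, achilles_heel_warning, record_count) follow the standard Achilles Heel schema, but verify them against your own results schema before relying on them.

```python
import sqlite3

# Toy stand-in for the achilles_heel_results table. Column names follow
# the standard Achilles Heel schema; confirm against your results schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE achilles_heel_results (
    analysis_id INTEGER,
    achilles_heel_warning TEXT,
    record_count INTEGER)""")
conn.executemany(
    "INSERT INTO achilles_heel_results VALUES (?, ?, ?)",
    [(413, "ERROR: death event outside observation period", 42),
     (211, "WARNING: unusually long observation period", 7)])

# Pull ERROR-severity notifications, most prevalent first.
errors = conn.execute(
    "SELECT analysis_id, record_count FROM achilles_heel_results "
    "WHERE achilles_heel_warning LIKE 'ERROR%' "
    "ORDER BY record_count DESC").fetchall()
```

The severity prefix convention (messages starting with ERROR, WARNING, or NOTIFICATION) is what the severity filter in the Heel tab keys on.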

Heel Check Table

The Heel tab displays checks in a sortable, filterable table:

  • Filter by severity (Error / Warning / Notification)
  • Search by message text
  • Sort by record count to find the most prevalent issues
  • Click any check to see the underlying Achilles analysis that triggered it

Running a Full DQD Check

  1. Navigate to Admin > System > DQD Jobs (requires admin role).
  2. Select a Data Source from the dropdown.
  3. Optionally configure:
    • Failure threshold -- percentage above which a check is considered failing (default: 5%)
    • Check categories -- run all categories or select specific ones
    • CDM tables -- restrict checks to specific tables (useful for targeted re-evaluation after ETL fixes)
  4. Click Run DQD.
  5. The DQD executes as a background job via Laravel Horizon. Typical execution times:
    • Small CDM (< 10K patients): 5-15 minutes
    • Medium CDM (10K-1M patients): 15-45 minutes
    • Large CDM (> 1M patients): 30-90 minutes

Failure thresholds

The default failure threshold is 5% -- checks where more than 5% of applicable records fail are marked as "failing." This threshold should be tuned based on your data quality standards:

  • Research networks (OHDSI, PCORnet): Typically use 1-5% thresholds
  • Regulatory submissions: May require 0% tolerance for certain checks
  • Exploratory analysis: 10% may be acceptable for initial data assessment

Adjust thresholds in Admin > System Configuration > Data Quality Settings.
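The threshold rule itself is simple: a check fails when its failure rate exceeds the configured percentage. A minimal sketch, assuming raw failure and applicable-record counts per check:

```python
# Sketch: threshold-based pass/fail status for a single DQD check.
# A check fails when its failure rate exceeds the threshold (default 5%).

def check_status(failure_count, applicable_records, threshold_pct=5.0):
    if applicable_records == 0:
        return "NA"  # no applicable records, nothing to evaluate
    rate = 100.0 * failure_count / applicable_records
    return "FAIL" if rate > threshold_pct else "PASS"
```

Note that a 0% threshold (as in some regulatory contexts) means any single violating record marks the check as failing.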

DQD Results History

Parthenon stores historical DQD results for trend analysis. Navigate to Data Quality > History to view:

  • Score trend -- line chart of overall DQD score over time
  • Category trends -- individual trend lines for Conformance, Completeness, and Plausibility
  • Run comparison -- side-by-side diff of two DQD runs to see which checks improved or regressed after an ETL update

This historical view is invaluable for tracking data quality improvement over successive ETL iterations.
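The run-comparison logic amounts to diffing two runs by check name. A sketch, assuming each run is available as a mapping from check name to status:

```python
# Sketch: diff two DQD runs to find checks that regressed or improved
# after an ETL update. Run shapes (name -> 'PASS'/'FAIL') are illustrative.

def compare_runs(before, after):
    regressed = [n for n in after if before.get(n) == "PASS" and after[n] == "FAIL"]
    improved = [n for n in after if before.get(n) == "FAIL" and after[n] == "PASS"]
    return regressed, improved

before = {"plausible_gender": "PASS", "is_standard_valid_concept": "FAIL"}
after = {"plausible_gender": "FAIL", "is_standard_valid_concept": "PASS"}
regressed, improved = compare_runs(before, after)
```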

Do not ignore failing checks

Failing DQD checks can silently bias research results. For example, if 20% of condition_occurrence records have condition_concept_id = 0 (unmapped), prevalence estimates will be systematically underestimated. Always review and address failing checks before using a CDM for published research.
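The unmapped-record scenario above is easy to quantify directly. The sketch below uses an in-memory SQLite table reduced to the one relevant column; in OMOP, condition_concept_id = 0 means the source value could not be mapped to a standard concept.

```python
import sqlite3

# Sketch: measure the unmapped-record rate in condition_occurrence.
# Table reduced to the relevant column for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE condition_occurrence (condition_concept_id INTEGER)")
conn.executemany("INSERT INTO condition_occurrence VALUES (?)",
                 [(0,), (0,), (201826,), (316866,), (440383,)])

# condition_concept_id = 0 marks records that failed concept mapping.
unmapped, total = conn.execute(
    "SELECT SUM(condition_concept_id = 0), COUNT(*) "
    "FROM condition_occurrence").fetchone()
rate = 100.0 * unmapped / total
```

Any condition-prevalence estimate computed from this toy table would miss the unmapped share of records entirely, which is exactly the systematic underestimation the warning describes.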