Achilles Characterization

The Data Explorer provides pre-computed summary statistics for your OMOP CDM databases, powered by OHDSI Achilles. Each Achilles run executes its aggregate analyses against a CDM and stores the results in a results schema, which enables fast interactive exploration without running expensive real-time queries against the clinical tables.

What Achilles Contains

Achilles computes approximately 170 analysis queries covering every major clinical domain in the OMOP CDM:

Analysis Group | Analysis IDs | Examples
Person | 1-10 | Gender distribution, year of birth, race, ethnicity, gender x year of birth
Observation period | 101-117 | Observation years, duration distributions, continuous observation per year
Visit | 200-220 | Visit type frequencies, visit duration, visits per year
Condition | 400-420 | Condition prevalence, co-occurrence, by age/gender, trends by year
Drug | 700-720 | Drug prevalence, era duration, days supply distributions
Measurement | 1800-1820 | Lab value distributions, units, value ranges, measurement frequency
Procedure | 600-620 | Procedure frequency by type and clinical setting
Death | 500-510 | Cause of death distributions, time from observation start to death

Each analysis produces rows in the achilles_results and achilles_results_dist tables, identified by analysis_id, with breakdowns by stratum_1 through stratum_5 for dimensional slicing.
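
As a rough illustration of that layout, the sketch below reads one analysis back out of an achilles_results table. It uses an in-memory SQLite table as a stand-in for the real results schema, and the rows are invented for the example. The helper name counts_by_stratum is hypothetical and not part of Parthenon or Achilles. The gender concept IDs 8507 (male) and 8532 (female) are the standard OMOP concepts.

```python
import sqlite3

# Stand-in for the results schema: an in-memory SQLite table with the
# key columns of achilles_results. The rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE achilles_results (
        analysis_id INTEGER,
        stratum_1 TEXT, stratum_2 TEXT, stratum_3 TEXT,
        stratum_4 TEXT, stratum_5 TEXT,
        count_value INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO achilles_results VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, None, None, None, None, None, 1000),   # analysis 1: total persons
        (2, "8507", None, None, None, None, 480),  # analysis 2: male
        (2, "8532", None, None, None, None, 520),  # analysis 2: female
    ],
)


def counts_by_stratum(conn, analysis_id):
    """Return {stratum_1: count_value} for one analysis (hypothetical helper)."""
    rows = conn.execute(
        "SELECT stratum_1, count_value FROM achilles_results "
        "WHERE analysis_id = ? ORDER BY stratum_1",
        (analysis_id,),
    )
    return {stratum: count for stratum, count in rows}


gender_counts = counts_by_stratum(conn, 2)
print(gender_counts)
```

Because every analysis shares the same table shape, a single lookup keyed on analysis_id can serve most count-based charts; analyses with distributions would read achilles_results_dist instead.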

CDM Characterization Dashboard

The Data Explorer Overview tab presents a 6-section CDM characterization dashboard that provides an at-a-glance summary of your database:

Source Selector

A dropdown at the top of the dashboard lets you switch between configured data sources. Only sources with a populated results daimon appear in the selector. Parthenon issues SET search_path dynamically to route queries to the correct results schema for each source, so you can explore multiple sources without reconfiguring anything.

Metric Cards

Four summary cards display key population statistics:

Card | Metric | Source
Total Patients | Unique person count | Analysis 1
Observation Period | Earliest to latest date range | Analysis 101, 109
Gender Split | Male / Female / Other counts | Analysis 2
Median Follow-up | Median observation period duration | Analysis 105

Gender Distribution Bar

A horizontal bar chart shows the gender distribution across the entire patient population, color-coded by gender concept (Male = blue, Female = pink, Other = gray).

Age-Gender Pyramid

A population pyramid chart displays the age distribution in 5-year bands, split by gender. This visualization immediately reveals the demographic profile of the database -- whether it skews young (Medicaid), elderly (Medicare), or balanced (commercial claims).
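
The 5-year banding behind such a pyramid can be sketched as follows. This is only an illustration of the grouping, not Parthenon's actual implementation: the function names, the band label format, and the sample rows are all assumptions made for the example.

```python
from collections import Counter


def age_band(age, width=5):
    """Return a label such as '30-34' for the band containing `age`."""
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pyramid_counts(people, width=5):
    """Count (band, gender) pairs from an iterable of (age, gender) tuples."""
    return Counter((age_band(age, width), gender) for age, gender in people)


# Made-up rows: (age in years, gender label)
sample = [(31, "F"), (34, "F"), (35, "M"), (72, "M")]
counts = pyramid_counts(sample)
print(counts)
```

Each (band, gender) count would become one bar of the pyramid, with one gender plotted to the left of the axis and the other to the right.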

Domain Record Counts

A table showing total record counts per clinical domain (conditions, drugs, procedures, measurements, observations, visits, death) with per-patient averages.

Temporal Coverage

A line chart showing the number of patients with active observation by calendar year, revealing enrollment trends and the effective study window.
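
To make "patients with active observation by calendar year" concrete, here is a minimal sketch that derives the per-year counts from observation periods. In practice Achilles pre-computes these figures; the function name, the tuple layout, and the sample periods below are assumptions for illustration only.

```python
from collections import defaultdict


def persons_observed_by_year(periods):
    """Map calendar year -> number of distinct persons observed in that year.

    `periods` is an iterable of (person_id, start_year, end_year) tuples,
    with both years inclusive.
    """
    persons = defaultdict(set)
    for person_id, start_year, end_year in periods:
        for year in range(start_year, end_year + 1):
            persons[year].add(person_id)
    return {year: len(ids) for year, ids in sorted(persons.items())}


# Made-up rows; person 1 has two periods that overlap in 2020.
periods = [(1, 2018, 2020), (2, 2019, 2019), (1, 2020, 2021)]
coverage = persons_observed_by_year(periods)
print(coverage)
```

Collecting person IDs in a set per year ensures that a person with several observation periods in the same year is counted once.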

The Data Explorer has six tabs:

  1. Overview -- CDM characterization dashboard (described above)
  2. Conditions -- condition prevalence treemap and drill-down
  3. Drugs -- drug exposure prevalence and era statistics
  4. Measurements -- lab value distributions with box plots
  5. Data Quality -- Achilles Heel results (see Chapter 19)
  6. Ares -- network-level data observatory for cross-source characterization, quality tracking, and feasibility analysis (see Chapter 21)

Conditions Tab

The Conditions tab displays conditions sorted by prevalence. The default view is a treemap where tile size corresponds to patient count. Select any condition to see:

  • Prevalence -- percentage of patients with at least one occurrence
  • Age and gender distribution -- stratified bar chart
  • Prevalence by year -- temporal trend line
  • Top co-occurring conditions -- conditions most frequently observed in the same patients
  • Source codes -- original ICD/SNOMED codes mapped to this standard concept
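
The prevalence figure and the default sort order can be sketched as follows, assuming prevalence is the number of patients with at least one occurrence divided by the total patient count. The function names and the condition labels are placeholders, not real concepts or Parthenon APIs.

```python
def prevalence_pct(patients_with_condition, total_patients):
    """Percentage of patients with at least one occurrence."""
    if total_patients <= 0:
        raise ValueError("total_patients must be positive")
    return 100.0 * patients_with_condition / total_patients


def rank_by_prevalence(patient_counts, total_patients):
    """Return (name, prevalence %) pairs, most prevalent first."""
    ranked = sorted(patient_counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, prevalence_pct(count, total_patients)) for name, count in ranked]


# Placeholder condition labels with made-up patient counts.
counts = {"condition_a": 120, "condition_b": 450, "condition_c": 30}
ranking = rank_by_prevalence(counts, total_patients=1000)
print(ranking)
```

In a treemap, the same patient counts would determine tile size, so the ranking and the visual area convey the same ordering.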

Drugs Tab

The Drugs tab shows drug exposures aggregated by RxNorm ingredient. Select any drug to see:

  • Prevalence by ingredient -- percentage of patients exposed
  • Days supply distribution -- histogram of prescription durations
  • Era duration distribution -- how long patients remain on continuous therapy
  • First exposure year trend -- when patients first start the drug over time
  • Dose distribution -- shown only when quantity and days supply are populated

Measurements Tab

The Measurements tab shows LOINC-coded measurement statistics. Select any measurement to see:

  • Value distribution -- histogram with configurable bin width
  • Summary statistics -- median, interquartile range, 5th/95th percentiles, min/max
  • Unit of measure breakdown -- distribution of recording units
  • Gender stratification -- separate distributions for male and female patients
  • Temporal trend -- measurement frequency over calendar years
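
For readers unfamiliar with the summary statistics listed above, the sketch below computes them from raw values using Python's standard library. Achilles pre-computes its distributions, so this only illustrates the definitions; the function name, the choice of the "inclusive" quantile method, and the sample values are assumptions.

```python
import statistics


def summarize(values):
    """Return min/max, median, quartiles, IQR, and 5th/95th percentiles."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    # The "inclusive" method interpolates between observed values only,
    # so no cut point falls outside the observed range.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    twentieths = statistics.quantiles(ordered, n=20, method="inclusive")
    return {
        "min": ordered[0],
        "p5": twentieths[0],    # first of 19 cut points: 5th percentile
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": twentieths[-1],  # last of 19 cut points: 95th percentile
        "max": ordered[-1],
        "iqr": q3 - q1,
    }


summary = summarize(list(range(1, 101)))  # made-up values 1..100
print(summary)
```

A box plot draws the box from q1 to q3 with a line at the median; whether the whiskers extend to the 5th/95th percentiles or to min/max is a presentation choice.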

Multi-Source Achilles

Parthenon supports multiple data sources, each with its own Achilles results. The system handles this via the Source/Daimon pattern:

  1. Each data source has a results daimon pointing to a specific schema (e.g., achilles_results, eunomia_results).
  2. When you select a source in the Data Explorer, the AchillesResultReaderService calls SET search_path on the results connection to the appropriate schema.
  3. This is stateless per-request -- each API call sets the search path independently, so concurrent users exploring different sources do not interfere with each other.
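
The per-request routing in the steps above can be sketched as follows. This is not the AchillesResultReaderService itself, only an illustration of the pattern: the function names, the lowercase-only schema rule, and the execute callable are assumptions. Schema names are identifiers and cannot be passed as bind parameters, so the sketch validates them before building the statement.

```python
import re

# Assumed rule for this sketch: schema names are lowercase identifiers.
_SCHEMA_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def search_path_statement(schema):
    """Build a SET search_path statement for a validated schema name."""
    if not _SCHEMA_NAME.fullmatch(schema):
        raise ValueError(f"unsafe schema name: {schema!r}")
    return f'SET search_path TO "{schema}"'


def run_for_source(execute, schema, sql, params=()):
    """Set the search path, then run the query, within one request.

    `execute` is any callable taking (sql, params), such as a thin wrapper
    around a database cursor. It is a hypothetical interface for this sketch.
    """
    execute(search_path_statement(schema))
    return execute(sql, params)


# Usage with a recording stub in place of a real connection.
issued = []


def record(sql, params=()):
    issued.append(sql)
    return "ok"


result = run_for_source(record, "eunomia_results", "SELECT COUNT(*) FROM achilles_results")
print(issued)
```

Because the search path is set at the start of every request, no request depends on state left behind by an earlier one.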

Tip: The freshness of the Achilles results is shown in the dashboard footer as "Results as of [date]" so users can judge how current the data is. Stale results (older than 30 days) display a warning badge.
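
The 30-day staleness rule can be expressed as a small check. This is a sketch under the assumption that "older than 30 days" means strictly more than 30 days between the results date and today; the function names and the ISO date format in the label are illustrative, not taken from Parthenon.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=30)


def is_stale(results_date, today):
    """True when the results are more than 30 days old."""
    return today - results_date > STALE_AFTER


def footer_label(results_date, today):
    """Build the footer text, flagging stale results (format is assumed)."""
    label = f"Results as of {results_date.isoformat()}"
    return f"{label} (stale)" if is_stale(results_date, today) else label


print(footer_label(date(2024, 1, 1), today=date(2024, 2, 15)))
```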

Refreshing Achilles Results

Achilles results are static snapshots -- they reflect the state of the CDM when Achilles was last executed. To refresh:

  1. Navigate to Admin > System > Achilles Jobs (requires admin role).
  2. Select a data source and click Run Achilles.
  3. Achilles runs as a background job via Laravel Horizon; monitor progress in the job queue.

Performance considerations

Running Achilles on a large CDM (millions of patients) can take 30-120 minutes and generates significant database load. Schedule Achilles runs during off-peak hours and avoid running multiple sources concurrently. For the bundled Eunomia demo dataset (~2,700 patients), Achilles completes in under 30 seconds using the SQL-based mini-Achilles built into Parthenon.