Achilles Characterization
The Data Explorer provides pre-computed summary statistics for your OMOP CDM databases, generated by OHDSI Achilles. Achilles runs once against a CDM and stores the results of its aggregate analyses in a results schema, enabling fast interactive exploration without running expensive real-time queries against the clinical tables.
What Achilles Contains
Achilles computes approximately 170 analysis queries covering every major clinical domain in the OMOP CDM:
| Analysis Group | Analysis IDs | Examples |
|---|---|---|
| Person | 1-10 | Gender distribution, year of birth, race, ethnicity, gender x year of birth |
| Observation period | 101-117 | Observation years, duration distributions, continuous observation per year |
| Visit | 200-220 | Visit type frequencies, visit duration, visits per year |
| Condition | 400-420 | Condition prevalence, co-occurrence, by age/gender, trends by year |
| Drug | 700-720 | Drug prevalence, era duration, days supply distributions |
| Measurement | 1800-1820 | Lab value distributions, units, value ranges, measurement frequency |
| Procedure | 600-620 | Procedure frequency by type and clinical setting |
| Death | 500-510 | Cause of death distributions, time from observation start to death |
Each analysis produces rows in the achilles_results and achilles_results_dist tables, identified by analysis_id, with breakdowns by stratum_1 through stratum_5 for dimensional slicing.
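As a minimal sketch of how these rows can be read, the example below loads a few rows into an in-memory SQLite table shaped like achilles_results and pulls the gender breakdown from analysis 2. The counts are invented for illustration, SQLite stands in for whatever database hosts your results schema, and the helper name counts_by_stratum is hypothetical. It assumes the standard Achilles layout in which analysis 2 stores the gender concept ID in stratum_1.

```python
import sqlite3

# In-memory stand-in for a results schema; the counts below are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE achilles_results (
        analysis_id INTEGER,
        stratum_1 TEXT, stratum_2 TEXT, stratum_3 TEXT,
        stratum_4 TEXT, stratum_5 TEXT,
        count_value INTEGER
    )
    """
)
sample_rows = [
    (1, None, None, None, None, None, 2694),    # analysis 1: total persons
    (2, "8507", None, None, None, None, 1321),  # analysis 2: 8507 = Male
    (2, "8532", None, None, None, None, 1373),  # analysis 2: 8532 = Female
]
conn.executemany(
    "INSERT INTO achilles_results VALUES (?, ?, ?, ?, ?, ?, ?)", sample_rows
)


def counts_by_stratum(connection, analysis_id):
    """Return {stratum_1: count_value} for one analysis (hypothetical helper)."""
    cursor = connection.execute(
        "SELECT stratum_1, count_value FROM achilles_results "
        "WHERE analysis_id = ? ORDER BY stratum_1",
        (analysis_id,),
    )
    return dict(cursor.fetchall())


gender_counts = counts_by_stratum(conn, 2)
print(gender_counts)
```

Analyses with more dimensions follow the same pattern, with the extra breakdowns read from stratum_2 onward.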
CDM Characterization Dashboard
The Data Explorer Overview tab presents a six-section CDM characterization dashboard that provides an at-a-glance summary of your database:
Source Selector
A dropdown at the top of the dashboard lets you switch between configured data sources. Only sources with a populated results daimon appear in the selector. Parthenon uses dynamic SET search_path to route queries to the correct results schema for each source, enabling seamless multi-source exploration.
Metric Cards
Four summary cards display key population statistics:
| Card | Metric | Source |
|---|---|---|
| Total Patients | Unique person count | Analysis 1 |
| Observation Period | Earliest to latest date range | Analysis 101, 109 |
| Gender Split | Male / Female / Other counts | Analysis 2 |
| Median Follow-up | Median observation period duration | Analysis 105 |
Gender Distribution Bar
A horizontal bar chart shows the gender distribution across the entire patient population, color-coded by gender concept (Male = blue, Female = pink, Other = gray).
Age-Gender Pyramid
A population pyramid chart displays the age distribution in 5-year bands, split by gender. This visualization immediately reveals the demographic profile of the database -- whether it skews young (Medicaid), elderly (Medicare), or balanced (commercial claims).
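The banding behind a pyramid like this can be sketched as follows. This is an illustration only, not Parthenon's implementation: it assumes input rows of (year of birth, gender, person count), as a gender x year of birth analysis might supply, and a caller-chosen reference year for computing age. The function name pyramid_bands and the sample rows are hypothetical.

```python
from collections import defaultdict


def pyramid_bands(rows, reference_year, band_width=5):
    """Aggregate (year_of_birth, gender, count) rows into age bands.

    Returns {(band_start, gender): total_count}, where band_start is the
    lower bound of the age band (0, 5, 10, ...).
    """
    totals = defaultdict(int)
    for year_of_birth, gender, count in rows:
        age = reference_year - year_of_birth
        if age < 0:
            continue  # skip implausible birth years rather than guess
        band_start = (age // band_width) * band_width
        totals[(band_start, gender)] += count
    return dict(totals)


# Invented example rows: (year_of_birth, gender label, person count).
rows = [
    (1950, "Male", 10),
    (1952, "Male", 5),
    (1950, "Female", 12),
    (1990, "Female", 7),
]
bands = pyramid_bands(rows, reference_year=2024)
for (band_start, gender), count in sorted(bands.items()):
    print(f"{band_start}-{band_start + 4} {gender}: {count}")
```

Each (band, gender) total becomes one bar of the pyramid, with male and female bars drawn on opposite sides of the axis.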
Domain Record Counts
A table shows total record counts per clinical domain (conditions, drugs, procedures, measurements, observations, visits, death) with per-patient averages.
Temporal Coverage
A line chart shows the number of patients with active observation by calendar year, revealing enrollment trends and the effective study window.
Navigating Data Explorer Tabs
The Data Explorer has six tabs:
- Overview -- CDM characterization dashboard (described above)
- Conditions -- condition prevalence treemap and drill-down
- Drugs -- drug exposure prevalence and era statistics
- Measurements -- lab value distributions with box plots
- Data Quality -- Achilles Heel results (see Chapter 19)
- Ares -- network-level data observatory for cross-source characterization, quality tracking, and feasibility analysis (see Chapter 21)
Conditions Tab
The Conditions tab displays conditions sorted by prevalence. The default view is a treemap where tile size corresponds to patient count. Select any condition to see:
- Prevalence -- percentage of patients with at least one occurrence
- Age and gender distribution -- stratified bar chart
- Prevalence by year -- temporal trend line
- Top co-occurring conditions -- conditions most frequently observed in the same patients
- Source codes -- original ICD/SNOMED codes mapped to this standard concept
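The prevalence figure in the list above is a simple ratio. A minimal sketch, assuming the denominator is the total patient count of the selected source (the function name and the counts are invented for illustration):

```python
def prevalence_percent(patients_with_concept, total_patients):
    """Percentage of patients with at least one occurrence of a concept."""
    if total_patients <= 0:
        raise ValueError("total_patients must be positive")
    return 100.0 * patients_with_concept / total_patients


# Invented counts: 412 of 2,694 patients have the condition.
example = round(prevalence_percent(412, 2694), 1)
print(example)  # 15.3
```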
Drugs Tab
The Drugs tab shows drug exposures aggregated by RxNorm ingredient. Select any drug to see:
- Prevalence by ingredient -- percentage of patients exposed
- Days supply distribution -- histogram of prescription durations
- Era duration distribution -- how long patients remain on continuous therapy
- First exposure year trend -- when patients first start the drug over time
- Dose distribution -- if quantity and days supply are populated
Measurements Tab
The Measurements tab shows LOINC-coded measurement statistics. Select any measurement to see:
- Value distribution -- histogram with configurable bin width
- Summary statistics -- median, interquartile range, 5th/95th percentiles, min/max
- Unit of measure breakdown -- distribution of recording units
- Gender stratification -- separate distributions for male and female patients
- Temporal trend -- measurement frequency over calendar years
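To make the summary statistics concrete, the sketch below computes them from a small list of raw values using Python's standard library. It only illustrates the definitions: the Data Explorer reads pre-computed distribution rows rather than raw measurements, the values are invented, and the choice of the "inclusive" quantile method is an assumption that may not match how Achilles derives its percentiles.

```python
import statistics


def summarize(values):
    """Return the summary statistics listed above for a list of numbers."""
    ordered = sorted(values)
    # n=4 yields the three quartile cut points; n=20 yields cut points at
    # 5% steps, so the first and last are the 5th and 95th percentiles.
    quartiles = statistics.quantiles(ordered, n=4, method="inclusive")
    twentieths = statistics.quantiles(ordered, n=20, method="inclusive")
    return {
        "min": ordered[0],
        "p5": twentieths[0],
        "q1": quartiles[0],
        "median": statistics.median(ordered),
        "q3": quartiles[2],
        "p95": twentieths[-1],
        "max": ordered[-1],
        "iqr": quartiles[2] - quartiles[0],
    }


# Invented values: 21 evenly spaced readings from 70 to 110.
values = [70 + 2 * k for k in range(21)]
stats = summarize(values)
print(stats)
```

A box plot draws the box from q1 to q3 with a line at the median; whether the whiskers end at the 5th/95th percentiles or at min/max is a display choice.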
Multi-Source Achilles
Parthenon supports multiple data sources, each with its own Achilles results. The system handles this via the Source/Daimon pattern:
- Each data source has a results daimon pointing to a specific schema (e.g., achilles_results, eunomia_results).
- When you select a source in the Data Explorer, the AchillesResultReaderService calls SET search_path on the results connection to the appropriate schema.
- This is stateless per-request -- each API call sets the search path independently, so concurrent users exploring different sources do not interfere with each other.
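The per-request pattern can be sketched as below. This is not the AchillesResultReaderService code; it is a language-neutral illustration in Python. The names search_path_statement, read_results, and RecordingConnection are hypothetical, and the stand-in connection only records statements instead of talking to PostgreSQL. Because a schema name cannot be passed as a bound parameter, the sketch validates it against a conservative identifier pattern before interpolating it.

```python
import re

# Conservative pattern for simple lowercase PostgreSQL identifiers.
_SCHEMA_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def search_path_statement(results_schema):
    """Build a SET search_path statement for one request."""
    if not _SCHEMA_NAME.fullmatch(results_schema):
        raise ValueError(f"unsafe schema name: {results_schema!r}")
    return f'SET search_path TO "{results_schema}"'


class RecordingConnection:
    """Hypothetical stand-in for a database connection; records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append((sql, tuple(params)))


def read_results(connection, results_schema, analysis_id):
    """Issue the queries for one API call against one source."""
    # Set the path on every call so nothing carries over between requests.
    connection.execute(search_path_statement(results_schema))
    connection.execute(
        "SELECT * FROM achilles_results WHERE analysis_id = %s",
        (analysis_id,),
    )


conn = RecordingConnection()
read_results(conn, "eunomia_results", 2)
print(conn.statements[0][0])
```

Setting the path at the start of every call is what keeps the pattern safe when connections are reused: a plain SET lasts for the rest of the database session, so a request must never rely on the path left behind by a previous one.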
The Achilles result freshness is shown in the dashboard footer as "Results as of [date]" to help users understand data currency. Stale results (older than 30 days) display a warning badge.
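The staleness rule amounts to a date comparison. A minimal sketch, assuming "older than 30 days" means strictly more than 30 days between the results date and today (the function name is hypothetical, and the exact boundary handling in Parthenon may differ):

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=30)


def is_stale(results_date, today):
    """True when the results are more than 30 days old."""
    return (today - results_date) > STALE_AFTER


print(is_stale(date(2024, 1, 1), date(2024, 1, 31)))  # exactly 30 days: False
print(is_stale(date(2024, 1, 1), date(2024, 2, 1)))   # 31 days: True
```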
Refreshing Achilles Results
Achilles results are static snapshots -- they reflect the state of the CDM when Achilles was last executed. To refresh:
- Navigate to Admin > System > Achilles Jobs (requires admin role).
- Select a data source and click Run Achilles.
- Achilles runs as a background job via Laravel Horizon; monitor progress in the job queue.
Running Achilles on a large CDM (millions of patients) can take 30-120 minutes and generates significant database load. Schedule Achilles runs during off-peak hours and avoid running multiple sources concurrently. For the bundled Eunomia demo dataset (~2,700 patients), Achilles completes in under 30 seconds using the SQL-based mini-Achilles built into Parthenon.