Population Statistics

The Population Statistics section in Data Explorer provides high-level demographic and temporal summaries of the patient population in a data source. These summaries are essential for quickly assessing database coverage, demographic composition, and suitability for a specific research question before investing time in full cohort definition and generation.

Overview Panel

The Overview panel consolidates key database metrics into a single summary view:

Statistic	Description	Source
Total Patients	Unique `person_id` count in the CDM	`person` table
Date Range	Earliest observation period start date to latest observation period end date	`observation_period` table
Observation Years	Total person-years of follow-up across all patients	Sum of observation period durations
Median Follow-up	Median observation period duration in days	Achilles analysis 105
Vocabulary Version	Name and date of the loaded OMOP vocabulary	`vocabulary` table metadata
CDM Version	Detected OMOP CDM schema version	`cdm_source` table

Metric interpretation

Total person-years is a better measure of database "size" than patient count alone. A database with 100,000 patients averaging 10 years of follow-up each (1M person-years) is more useful for longitudinal studies than one with 500,000 patients averaging 6 months of follow-up each (250K person-years).
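To make the person-years arithmetic concrete, the following is a minimal sketch of how these two figures could be derived from raw observation periods. It is not how Data Explorer or Achilles computes them: the tuple layout, the `summarize_follow_up` name, and the 365.25-day year are assumptions made for illustration, and the median here is taken over all periods supplied.

```python
from datetime import date
from statistics import median

DAYS_PER_YEAR = 365.25  # assumption: average year length for day-to-year conversion


def period_days(start: date, end: date) -> int:
    """Duration of one observation period as a plain date difference in days."""
    return (end - start).days


def summarize_follow_up(periods):
    """periods: list of (person_id, start_date, end_date) tuples (hypothetical layout)."""
    durations = [period_days(start, end) for _, start, end in periods]
    return {
        "patients": len({pid for pid, _, _ in periods}),
        "person_years": sum(durations) / DAYS_PER_YEAR,
        "median_days": median(durations),
    }


periods = [
    (1, date(2015, 1, 1), date(2019, 12, 31)),
    (2, date(2018, 1, 1), date(2018, 12, 31)),
    (3, date(2010, 1, 1), date(2019, 12, 31)),
]
summary = summarize_follow_up(periods)
```

In this toy input, three patients contribute roughly 16 person-years in total, which illustrates why two sources with the same patient count can differ widely in longitudinal value.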

Age and Gender Distribution

A population pyramid chart displays the age-gender distribution in 5-year age bands (0-4, 5-9, ..., 85+). This visualization provides immediate insight into the demographic profile:

Medicare databases -- pyramid heavily weighted toward 65+ age bands
Medicaid databases -- bimodal distribution (children and young adults)
Commercial claims -- concentrated in working-age adults (18-64)
EHR databases -- distribution reflects the patient population of the health system
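The 5-year banding behind the pyramid can be sketched as follows. This is an illustrative assumption about the grouping rule (ages 85 and above collapse into a single `85+` band); the function names and the `(age, gender)` input shape are hypothetical and not part of Data Explorer.

```python
from collections import Counter


def age_band(age: int, width: int = 5, top: int = 85) -> str:
    """Map an age in whole years to a band label such as '0-4' or '85+'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pyramid_counts(people):
    """people: iterable of (age, gender) pairs -> Counter keyed by (band, gender)."""
    return Counter((age_band(age), gender) for age, gender in people)


counts = pyramid_counts([(3, "F"), (4, "F"), (67, "M"), (90, "F")])
```

Each key of `counts` corresponds to one age-gender cell of the pyramid.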

Controls

Toggle counts/percentages -- switch between absolute counts and relative percentages
Hover tooltips -- exact count for each age-gender cell
Export -- download the chart as PNG or the underlying data as CSV

Observation Period Timeline

A time-series line chart shows the number of patients with active observation in each calendar month or quarter. This chart reveals critical information about your database:

Enrollment patterns -- gradual growth (expanding health system), seasonal drops (academic medical centers), or sharp changes (insurance plan changes)
Coverage gaps -- months with anomalously low patient counts may indicate data capture failures or ETL issues
Effective study window -- the years with stable, consistent coverage suitable for epidemiological analysis
Data currency -- how recently the data extends, and whether the latest months show complete capture
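As a rough sketch of what the timeline plots, the code below counts patients with any active observation in each calendar month and flags months that fall well below the typical level. The input layout, the function names, and the 50%-of-median threshold are assumptions chosen for illustration, not the rule the chart uses.

```python
from datetime import date
from statistics import median


def month_index(d: date) -> int:
    """Number of months since year 0, used to step through calendar months."""
    return d.year * 12 + (d.month - 1)


def active_patients_by_month(periods):
    """periods: iterable of (person_id, start, end) -> {(year, month): patient count}."""
    active = {}  # month index -> set of person_ids observed in that month
    for pid, start, end in periods:
        for m in range(month_index(start), month_index(end) + 1):
            active.setdefault(m, set()).add(pid)
    return {(m // 12, m % 12 + 1): len(pids) for m, pids in sorted(active.items())}


def low_coverage_months(counts, fraction=0.5):
    """Months whose count is below `fraction` of the median (hypothetical rule)."""
    typical = median(counts.values())
    return [month for month, n in counts.items() if n < fraction * typical]
```

A patient with several observation periods in the same month is counted once for that month, because each month holds a set of person identifiers.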

Study period selection

When designing a study, use the timeline chart to identify the period with the most stable coverage. Avoid using the first and last 6 months of data -- enrollment ramp-up and data lag often create incomplete capture at the boundaries.
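The boundary rule above can be expressed as a small helper that trims a fixed number of months from each end of the data range. This is a sketch only: `trimmed_window` and `add_months` are hypothetical names, and the 6-month default simply mirrors the guidance in this section.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month = idx // 12, idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def trimmed_window(data_start: date, data_end: date, buffer_months: int = 6):
    """Drop `buffer_months` from both ends of the data range."""
    start = add_months(data_start, buffer_months)
    end = add_months(data_end, -buffer_months)
    if start >= end:
        raise ValueError("data range too short for the requested buffer")
    return start, end


window = trimmed_window(date(2012, 3, 15), date(2023, 8, 31))
```

The trimmed range is a starting point; the timeline chart should still be checked for gaps inside it.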

Domain Coverage

The domain coverage table shows record counts and patient coverage across all major OMOP CDM clinical domains:

Domain	Record Count	Patient Count	% of Population	Avg Records per Patient
`condition_occurrence`	--	--	--	--
`drug_exposure`	--	--	--	--
`measurement`	--	--	--	--
`procedure_occurrence`	--	--	--	--
`visit_occurrence`	--	--	--	--
`observation`	--	--	--	--
`death`	--	--	--	--
`device_exposure`	--	--	--	--

Interpreting Domain Coverage

Low patient coverage in a domain relative to the overall population size may indicate incomplete data capture. For example, if only 10% of patients have measurement records in an EHR database, lab data may not be routinely captured.
Zero records in a domain means the ETL did not populate that table -- it does not necessarily mean the source system lacks that data.
High records-per-patient in measurement (often 50-200+) is normal for EHR databases with lab data; claims databases typically have much lower measurement density.
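The two derived columns of the coverage table follow from simple ratios, sketched below. One assumption to note: the average is computed over patients who have at least one record in the domain, not over the whole population; the table may define it differently. The function name is hypothetical.

```python
def domain_coverage(record_count: int, patient_count: int, total_patients: int) -> dict:
    """Derive the percentage and per-patient columns for one domain row."""
    pct = 100.0 * patient_count / total_patients if total_patients else 0.0
    # Assumption: denominator is patients with at least one record in the domain.
    avg = record_count / patient_count if patient_count else 0.0
    return {"pct_of_population": pct, "avg_records_per_patient": avg}


measurement = domain_coverage(record_count=1200, patient_count=100, total_patients=1000)
```

Here 10% of patients have measurement records, matching the low-coverage scenario described above, while those patients still average 12 records each.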

Multi-Source Comparison

When multiple data sources are configured, the Compare Sources view provides side-by-side population statistics for cross-database assessment:

Comparison Table

A multi-column table aligning key metrics across selected sources:

Metric	Source A	Source B	Source C
Total patients	--	--	--
Date range	--	--	--
Person-years	--	--	--
% Female	--	--	--
Median age	--	--	--
Condition records	--	--	--
Drug records	--	--	--

Use Cases for Comparison

Source selection -- identify which database has the longest follow-up for your target population
Multi-site studies -- compare demographic distributions across sites to assess representativeness
Data completeness -- compare domain coverage to identify which sources have the richest data for specific domains
Feasibility assessment -- quickly determine whether sufficient patients exist across sources for a multi-database study

Overlay Charts

Toggle the Overlay view to superimpose age-gender pyramids and timeline charts from multiple sources on a single plot, with each source shown in a different color.

Research feasibility

Use Population Statistics as your first stop when evaluating feasibility for a new study. Before building cohort definitions, verify that:

The database covers your intended study period
Sufficient patients exist in the age/gender strata of interest
The relevant clinical domains (conditions, drugs, procedures, labs) have adequate coverage
Observation period durations are long enough for your required follow-up time

This 5-minute feasibility check can save hours of cohort development work on an unsuitable database.
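For teams that want to record this check alongside a study protocol, the four criteria can be written down as a small function. Everything here is a hypothetical sketch: the `SourceSummary` fields are values a user would read off the Population Statistics views by hand, and the thresholds are study-specific choices, not defaults from the application.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SourceSummary:
    """Hypothetical container for figures read from the Population Statistics views."""
    data_start: date
    data_end: date
    stratum_patients: int        # patients in the age/gender strata of interest
    median_follow_up_days: int
    domain_pct: dict = field(default_factory=dict)  # domain -> % of population with records


def feasibility_issues(src, study_start, study_end, min_patients,
                       required_domains, min_domain_pct, min_follow_up_days):
    """Return a list of problems; an empty list means all four checks pass."""
    issues = []
    if src.data_start > study_start or src.data_end < study_end:
        issues.append("study period not fully covered")
    if src.stratum_patients < min_patients:
        issues.append("too few patients in strata of interest")
    for domain in required_domains:
        if src.domain_pct.get(domain, 0.0) < min_domain_pct:
            issues.append(f"low coverage in {domain}")
    if src.median_follow_up_days < min_follow_up_days:
        issues.append("median follow-up shorter than required")
    return issues


src = SourceSummary(
    data_start=date(2010, 1, 1),
    data_end=date(2022, 12, 31),
    stratum_patients=4200,
    median_follow_up_days=900,
    domain_pct={"condition_occurrence": 92.0, "drug_exposure": 85.0, "measurement": 8.0},
)
issues = feasibility_issues(
    src, date(2015, 1, 1), date(2020, 12, 31),
    min_patients=1000,
    required_domains=["condition_occurrence", "measurement"],
    min_domain_pct=50.0,
    min_follow_up_days=365,
)
```

In this example the only issue reported is low coverage in `measurement`, which would steer a lab-dependent study toward a different source.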

Population Analytics Tab

The Population Analytics tab extends basic statistics with advanced demographic analysis:

Age distribution histogram -- continuous distribution with kernel density estimate
Enrollment duration distribution -- how long patients typically remain in the database
Year-over-year growth -- patient acquisition and attrition rates
Geographic distribution -- if location data is available (state, region)
Payer mix -- distribution of insurance types (if captured in the CDM)

These analytics draw from both Achilles pre-computed results and real-time queries against the CDM tables, providing a comprehensive demographic profile of each data source.
