Schema Mapping
Schema mapping is the process of aligning columns in your uploaded source files to the corresponding columns in OMOP CDM v5.4 tables. After mapping, Parthenon generates and executes SQL that transforms your staged data into properly structured CDM tables. This is the structural transformation step -- vocabulary-level concept mapping is handled separately in Chapter 17.
Overview
OMOP CDM v5.4 defines approximately 40 clinical and standardized tables with precise column specifications, data types, and foreign key constraints. Schema mapping creates a translation layer between your source schema (which may be claims, EHR extract, registry data, or any tabular format) and the target CDM schema. This step handles four categories of transformation:
| Transformation | Example |
|---|---|
| Column name alignment | PATIENT_ID maps to person_id |
| Data type coercion | String dates ("2025-01-15") cast to PostgreSQL DATE |
| Static value assignment | Hardcode condition_type_concept_id = 32817 for "EHR" origin |
| Expression transforms | CONCAT(year, '-', month, '-01') computes drug_exposure_start_date |
Opening the Mapping Editor
- Navigate to Data Ingestion > Upload History.
- Click on an upload batch with status Staged (upload completed successfully).
- Click Create Schema Mapping to open the visual mapping editor.
If a mapping already exists for this batch, click Edit Mapping to modify it.
Mapping Interface
The mapping editor is divided into three panels:
Source Panel (Left)
Displays your source table columns with:
- Column name -- as detected from the file header
- Inferred type -- the PostgreSQL type detected during parsing (text, integer, numeric, date, etc.)
- Sample values -- first 5 non-null values from the staging table for reference
- Null rate -- percentage of rows where this column is null
Target Panel (Right)
Displays the selected OMOP CDM table and its columns:
- Column name -- official CDM column name
- Data type -- CDM-specified PostgreSQL type
- Required -- columns marked with a red asterisk are mandatory
- Description -- brief description from the CDM specification
Mapping Rows (Center)
Drag a source column onto a target column to create a mapping. Connected pairs are shown as lines between the panels. Each mapping row supports:
- Direct mapping -- source column value is used as-is (with automatic type casting)
- Expression -- custom SQL expression using one or more source columns
- Static value -- a constant value applied to all rows
- Lookup -- reference a lookup table for value translation
Target Table Selection
Click Select Target Table to choose which OMOP CDM table this source file maps to. Tables are organized by domain for easy navigation:
| Source Content | Target CDM Table |
|---|---|
| Diagnoses / ICD codes | condition_occurrence |
| Drug dispensings / prescriptions | drug_exposure |
| Lab results | measurement |
| Patient demographics | person |
| Visit / encounter records | visit_occurrence |
| Procedures / CPT codes | procedure_occurrence |
| Clinical observations | observation |
| Death records | death |
| Cost / charge data | cost |
| Care site information | care_site |
| Provider information | provider |
If your source file contains data that maps to multiple CDM tables (e.g., a flat file with both diagnosis and procedure columns), create separate mappings for each target table. Each mapping can select different columns from the same source file.
Required Columns
Required columns (marked with a red asterisk) must be mapped before the transformation can execute. Common required columns include:
*_id-- primary key (auto-generated if not mapped)person_id-- patient identifier (must match existingpersontable records)*_concept_id-- standard concept ID (populated during concept mapping)*_start_date-- event start date*_type_concept_id-- provenance concept (e.g., 32817 = "EHR", 32810 = "Claim")
Optional columns left unmapped are populated with NULL or a configurable default value.
Expression Transforms
For complex column transformations, click the Expression icon (fx) on any mapping row to enter a SQL expression. Expressions have access to all source columns prefixed with source.:
-- Date parsing from non-standard format
CAST(TO_DATE(source.visit_date, 'MM/DD/YYYY') AS DATE)
-- Concatenating year/month into a date
MAKE_DATE(source.year::int, source.month::int, 1)
-- Conditional logic
CASE
WHEN source.gender = 'M' THEN 8507
WHEN source.gender = 'F' THEN 8532
ELSE 0
END
-- String cleanup
TRIM(UPPER(source.drug_name))
Click Test Expression to evaluate your expression against the first 10 rows of staging data and preview the output before saving. This catches syntax errors and type mismatches early.
Saving and Executing the Mapping
Save
Click Save Mapping to persist your column mapping configuration. Saved mappings can be edited, duplicated, or exported at any time. Saving does not transform any data.
Execute
Click Execute Transform to run the mapping SQL against the staging data and write the results into the target CDM schema. The execution page shows:
- Row counts written to each CDM table
- Constraint violations (e.g., invalid foreign key references, type casting failures)
- Skipped rows with reasons (null required fields, duplicate keys)
- Execution time and throughput (rows/second)
Execution is transactional -- if a critical error occurs, the entire batch is rolled back and no partial data is written.
Mapping Templates
Once you create a mapping for a data source, save it as a Mapping Template for reuse:
- After saving a mapping, click Save as Template.
- Enter a template name (e.g., "Epic Clarity Claims Extract v3.2").
- Future uploads from the same source system can apply the template, requiring only minor adjustments for schema changes.
Templates are shared across your organization -- any user with the researcher role or higher can apply an existing template.
If your source system changes column names or adds new columns between exports, the template will flag mismatches. Review the Template Diff panel before executing to ensure all required columns still map correctly.
Validation Rules
Parthenon applies automatic validation during execution:
| Rule | Description | Action on Failure |
|---|---|---|
| Type coercion | Source values must cast to target type | Row skipped, logged |
| NOT NULL | Required CDM columns cannot be null | Row skipped, logged |
| Foreign key | person_id must exist in person table | Row skipped, logged |
| Date ordering | Start date must precede or equal end date | Warning logged |
| Concept ID range | Concept IDs must be non-negative integers | Row skipped, logged |
Review the validation report after execution to identify data quality issues that need correction in the source data.