Skip to main content

Schema Mapping

Schema mapping is the process of aligning columns in your uploaded source files to the corresponding columns in OMOP CDM v5.4 tables. After mapping, Parthenon generates and executes SQL that transforms your staged data into properly structured CDM tables. This is the structural transformation step -- vocabulary-level concept mapping is handled separately in Chapter 17.

Overview

OMOP CDM v5.4 defines approximately 40 clinical and standardized tables with precise column specifications, data types, and foreign key constraints. Schema mapping creates a translation layer between your source schema (which may be claims, EHR extract, registry data, or any tabular format) and the target CDM schema. This step handles four categories of transformation:

TransformationExample
Column name alignmentPATIENT_ID maps to person_id
Data type coercionString dates ("2025-01-15") cast to PostgreSQL DATE
Static value assignmentHardcode condition_type_concept_id = 32817 for "EHR" origin
Expression transformsCONCAT(year, '-', month, '-01') computes drug_exposure_start_date

Opening the Mapping Editor

  1. Navigate to Data Ingestion > Upload History.
  2. Click on an upload batch with status Staged (upload completed successfully).
  3. Click Create Schema Mapping to open the visual mapping editor.

If a mapping already exists for this batch, click Edit Mapping to modify it.

Mapping Interface

The mapping editor is divided into three panels:

Source Panel (Left)

Displays your source table columns with:

  • Column name -- as detected from the file header
  • Inferred type -- the PostgreSQL type detected during parsing (text, integer, numeric, date, etc.)
  • Sample values -- first 5 non-null values from the staging table for reference
  • Null rate -- percentage of rows where this column is null

Target Panel (Right)

Displays the selected OMOP CDM table and its columns:

  • Column name -- official CDM column name
  • Data type -- CDM-specified PostgreSQL type
  • Required -- columns marked with a red asterisk are mandatory
  • Description -- brief description from the CDM specification

Mapping Rows (Center)

Drag a source column onto a target column to create a mapping. Connected pairs are shown as lines between the panels. Each mapping row supports:

  • Direct mapping -- source column value is used as-is (with automatic type casting)
  • Expression -- custom SQL expression using one or more source columns
  • Static value -- a constant value applied to all rows
  • Lookup -- reference a lookup table for value translation

Target Table Selection

Click Select Target Table to choose which OMOP CDM table this source file maps to. Tables are organized by domain for easy navigation:

Source ContentTarget CDM Table
Diagnoses / ICD codescondition_occurrence
Drug dispensings / prescriptionsdrug_exposure
Lab resultsmeasurement
Patient demographicsperson
Visit / encounter recordsvisit_occurrence
Procedures / CPT codesprocedure_occurrence
Clinical observationsobservation
Death recordsdeath
Cost / charge datacost
Care site informationcare_site
Provider informationprovider
Multiple tables from one file

If your source file contains data that maps to multiple CDM tables (e.g., a flat file with both diagnosis and procedure columns), create separate mappings for each target table. Each mapping can select different columns from the same source file.

Required Columns

Required columns (marked with a red asterisk) must be mapped before the transformation can execute. Common required columns include:

  • *_id -- primary key (auto-generated if not mapped)
  • person_id -- patient identifier (must match existing person table records)
  • *_concept_id -- standard concept ID (populated during concept mapping)
  • *_start_date -- event start date
  • *_type_concept_id -- provenance concept (e.g., 32817 = "EHR", 32810 = "Claim")

Optional columns left unmapped are populated with NULL or a configurable default value.

Expression Transforms

For complex column transformations, click the Expression icon (fx) on any mapping row to enter a SQL expression. Expressions have access to all source columns prefixed with source.:

-- Date parsing from non-standard format
CAST(TO_DATE(source.visit_date, 'MM/DD/YYYY') AS DATE)

-- Concatenating year/month into a date
MAKE_DATE(source.year::int, source.month::int, 1)

-- Conditional logic
CASE
WHEN source.gender = 'M' THEN 8507
WHEN source.gender = 'F' THEN 8532
ELSE 0
END

-- String cleanup
TRIM(UPPER(source.drug_name))
Expression testing

Click Test Expression to evaluate your expression against the first 10 rows of staging data and preview the output before saving. This catches syntax errors and type mismatches early.

Saving and Executing the Mapping

Save

Click Save Mapping to persist your column mapping configuration. Saved mappings can be edited, duplicated, or exported at any time. Saving does not transform any data.

Execute

Click Execute Transform to run the mapping SQL against the staging data and write the results into the target CDM schema. The execution page shows:

  • Row counts written to each CDM table
  • Constraint violations (e.g., invalid foreign key references, type casting failures)
  • Skipped rows with reasons (null required fields, duplicate keys)
  • Execution time and throughput (rows/second)

Execution is transactional -- if a critical error occurs, the entire batch is rolled back and no partial data is written.

Mapping Templates

Once you create a mapping for a data source, save it as a Mapping Template for reuse:

  1. After saving a mapping, click Save as Template.
  2. Enter a template name (e.g., "Epic Clarity Claims Extract v3.2").
  3. Future uploads from the same source system can apply the template, requiring only minor adjustments for schema changes.

Templates are shared across your organization -- any user with the researcher role or higher can apply an existing template.

Schema drift

If your source system changes column names or adds new columns between exports, the template will flag mismatches. Review the Template Diff panel before executing to ensure all required columns still map correctly.

Validation Rules

Parthenon applies automatic validation during execution:

RuleDescriptionAction on Failure
Type coercionSource values must cast to target typeRow skipped, logged
NOT NULLRequired CDM columns cannot be nullRow skipped, logged
Foreign keyperson_id must exist in person tableRow skipped, logged
Date orderingStart date must precede or equal end dateWarning logged
Concept ID rangeConcept IDs must be non-negative integersRow skipped, logged

Review the validation report after execution to identify data quality issues that need correction in the source data.