Concept Mapping

Concept mapping (also called code mapping or vocabulary mapping) translates the source codes in your data -- ICD-10-CM diagnosis codes, NDC drug codes, LOINC lab codes, CPT procedure codes, and other terminologies -- into OMOP standard concept IDs. This step makes your CDM data queryable through the standardized OMOP vocabulary and enables cross-database analyses across institutions that use different source coding systems.

Why Concept Mapping Matters

Source data typically uses local or domain-specific terminologies that vary by institution, vendor, and country:

Source Terminology | Domain | Example Code | Example Meaning
ICD-10-CM | Conditions | E11.9 | Type 2 diabetes without complications
NDC | Drugs | 00002-7510-01 | Metformin 500mg tablet
CPT-4 | Procedures | 99213 | Office visit, established patient
LOINC | Measurements | 2339-0 | Glucose [Mass/volume] in Blood
SNOMED CT | Observations | 73211009 | Diabetes mellitus

OMOP CDM stores the original source codes in *_source_value and *_source_concept_id columns for full traceability, but all analytical queries use standardized concept IDs in the *_concept_id columns:

  • SNOMED CT for conditions
  • RxNorm for drugs
  • LOINC for measurements
  • SNOMED CT for procedures (CPT4 procedure codes are themselves standard concepts in the OMOP vocabulary)

The concept mapping step populates these standardized concept ID columns correctly, enabling your data to participate in network studies and standardized analytics.
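To make the three column families concrete, here is a minimal sketch of a single condition record. The column names follow the OMOP condition_occurrence table; the numeric concept IDs are made-up placeholders rather than real vocabulary IDs, and the helper is_analysis_ready is a hypothetical name used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConditionRecord:
    condition_source_value: str       # raw code exactly as it appeared in the source data
    condition_source_concept_id: int  # concept ID of the source code itself (0 if not in the vocabulary)
    condition_concept_id: int         # standard concept ID used by analyses (0 if unmapped)


def is_analysis_ready(rec: ConditionRecord) -> bool:
    """A record is visible to standard analyses only if it has a non-zero standard concept."""
    return rec.condition_concept_id != 0


# Placeholder IDs (NOT real OMOP concept IDs), chosen only to show the shape of the data.
mapped = ConditionRecord("E11.9", 1000001, 2000001)
local_code = ConditionRecord("LOCAL-42", 0, 0)

print(is_analysis_ready(mapped), is_analysis_ready(local_code))
```

The source value is kept in both records, so the original code remains traceable even when no standard concept is available.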

Automatic Mapping

Parthenon leverages the OMOP vocabulary's concept_relationship table to automatically map source codes to standard concepts:

  1. Navigate to the mapping results for your upload batch (after schema mapping is complete).
  2. Click the Concept Mapping tab.
  3. Click Auto-Map to start the automatic mapping process.
  4. Parthenon queries the vocabulary for Maps to relationships from each source code.
  5. The system categorizes results:
    • Mapped -- single unambiguous standard concept found; auto-applied
    • Review -- multiple possible standard concepts found; requires manual selection
    • Unmapped -- no Maps to relationship exists in the vocabulary

Auto-mapping coverage

For well-coded claims data using standard terminologies (ICD-10-CM, NDC, CPT-4), auto-mapping typically achieves 85-95% coverage. EHR data with local codes or free-text descriptions will have lower auto-mapping rates and require more manual review.
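The lookup and categorization steps above can be sketched as follows. This is an illustration under stated assumptions, not Parthenon's actual implementation: it uses an in-memory SQLite database holding only a subset of the OMOP concept and concept_relationship columns, the concept IDs and the DEMO vocabulary are made up, and the function name auto_map is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_code     TEXT,
    vocabulary_id    TEXT,
    standard_concept TEXT          -- 'S' marks a standard concept
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT,
    invalid_reason  TEXT           -- NULL means the relationship is valid
);
""")

# Toy vocabulary rows; all IDs are placeholders, not real OMOP concept IDs.
conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
    (101, "E11.9", "ICD10CM", None),
    (102, "CODE-B", "DEMO", None),
    (103, "CODE-C", "DEMO", None),
    (201, "S1", "SNOMED", "S"),
    (202, "S2", "SNOMED", "S"),
    (203, "S3", "SNOMED", "S"),
])
conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?, ?)", [
    (101, 201, "Maps to", None),   # one target: unambiguous
    (102, 202, "Maps to", None),   # two targets: needs review
    (102, 203, "Maps to", None),
])

MAPS_TO_QUERY = """
    SELECT tgt.concept_id
    FROM concept AS src
    JOIN concept_relationship AS rel ON rel.concept_id_1 = src.concept_id
    JOIN concept AS tgt ON tgt.concept_id = rel.concept_id_2
    WHERE src.vocabulary_id = ? AND src.concept_code = ?
      AND rel.relationship_id = 'Maps to'
      AND rel.invalid_reason IS NULL
      AND tgt.standard_concept = 'S'
    ORDER BY tgt.concept_id
"""


def auto_map(conn, source_codes):
    """Return {(vocabulary_id, code): (status, [standard concept IDs])}."""
    results = {}
    for vocab, code in source_codes:
        targets = [row[0] for row in conn.execute(MAPS_TO_QUERY, (vocab, code))]
        if len(targets) == 1:
            status = "Mapped"
        elif targets:
            status = "Review"
        else:
            status = "Unmapped"
        results[(vocab, code)] = (status, targets)
    return results


results = auto_map(conn, [("ICD10CM", "E11.9"), ("DEMO", "CODE-B"),
                          ("DEMO", "CODE-C"), ("DEMO", "NOT-IN-VOCAB")])
for key, value in results.items():
    print(key, value)
```

A code that is absent from the vocabulary and a code that is present without a Maps to relationship both end up as Unmapped in this sketch, although they usually call for different follow-up.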

Manual Mapping Review

The mapping review table provides a comprehensive view of all source codes and their mapping status:

Column | Description
Source Value | The raw code from your data (e.g., E11.9)
Source Vocabulary | Detected vocabulary (ICD10CM, NDC, CPT4, LOINC, HCPCS, etc.)
Frequency | Number of rows in your data containing this source code
Auto-Mapped To | Standard concept ID and name from auto-mapping (if found)
Confidence | Mapping confidence score (High / Medium / Low)
Status | Mapped / Unmapped / Review Needed

Resolving Unmapped Codes

For codes with no automatic mapping or those flagged for review:

  1. Click the search icon next to the unmapped code to open the concept search dialog.
  2. Search by code, name, or keyword in the OMOP vocabulary.
  3. Review candidate standard concepts with their domain, vocabulary, and validity dates.
  4. Click Accept to apply the mapping.
  5. Optionally check Apply to all to map all occurrences of this source code across all batches.

Bulk Review Workflow

For large mapping efforts, use the bulk review tools:

  • Sort by frequency -- map high-frequency codes first for maximum impact
  • Filter by status -- show only unmapped or review-needed codes
  • Filter by domain -- focus on one clinical domain at a time
  • Batch accept -- accept all high-confidence auto-mappings in one click
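The reasoning behind this ordering can be sketched in a few lines. The MappingRow type and both function names below are hypothetical; they mirror the review table columns (frequency, status, confidence) rather than any documented Parthenon API.

```python
from dataclasses import dataclass


@dataclass
class MappingRow:
    source_value: str
    frequency: int     # number of data rows that contain this code
    status: str        # "Mapped", "Unmapped", or "Review Needed"
    confidence: str    # "High", "Medium", "Low", or "" when nothing was suggested


def review_queue(rows):
    """Codes that still need attention, most frequent first."""
    pending = [r for r in rows if r.status != "Mapped"]
    return sorted(pending, key=lambda r: r.frequency, reverse=True)


def batch_accept(rows):
    """Mark every high-confidence suggestion awaiting review as Mapped; return the count."""
    accepted = 0
    for r in rows:
        if r.status == "Review Needed" and r.confidence == "High":
            r.status = "Mapped"
            accepted += 1
    return accepted


rows = [
    MappingRow("A", 5, "Mapped", "High"),
    MappingRow("B", 500, "Unmapped", ""),
    MappingRow("C", 50, "Review Needed", "High"),
    MappingRow("D", 70, "Review Needed", "Low"),
]

before = [r.source_value for r in review_queue(rows)]
accepted = batch_accept(rows)
after = [r.source_value for r in review_queue(rows)]
print(before, accepted, after)
```

Working the queue from the top means each manual decision covers the largest possible share of data rows.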

Unmappable Codes

Some source codes cannot be mapped to a standard OMOP concept. Common reasons:

  • Custom local codes -- institution-specific codes not in any standard vocabulary
  • Vocabulary gaps -- the OMOP vocabulary does not yet include this terminology
  • Malformed codes -- truncated, invalid, or data entry errors
  • Deprecated codes -- codes that have been retired from their source vocabulary

For unmappable codes, set the concept ID to 0 -- the OMOP convention meaning "no standard concept available." These records will still appear in *_source_value columns and can be queried directly when needed. The source concept ID (*_source_concept_id) should still be populated if the source code exists in the vocabulary, even without a standard mapping.

Concept ID 0 impact

Records mapped to concept ID 0 are excluded from most standard OHDSI analyses (incidence rates, characterization, cohort definitions) because these analyses filter on standard concepts. Track your concept 0 rate as a key data quality metric -- high rates indicate vocabulary gaps that may bias your analyses.

Exporting and Importing Mapping Tables

Concept mapping tables can be exported and shared across Parthenon instances:

Export

  1. Navigate to the completed concept mapping for a batch.
  2. Click Export Mappings to download a CSV file containing:
    • Source vocabulary, source code, source name
    • Target standard concept ID, concept name, domain
    • Mapping status and reviewer notes

Import

  1. Navigate to Data Ingestion > Concept Mappings.
  2. Click Import Mapping File.
  3. Upload a previously exported CSV mapping file.
  4. Parthenon validates the target concept IDs against the current vocabulary and flags any concepts that have been deprecated or invalidated since the export.
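The validation in step 4 might look roughly like the sketch below. The CSV column names (source_vocabulary, source_code, target_concept_id) and the function validate_mappings are assumptions made for illustration, since the exact export layout is not specified here. The vocabulary snapshot is a plain dictionary from concept ID to the OMOP invalid_reason value, where None means the concept is still valid.

```python
import csv
import io


def validate_mappings(csv_text, vocabulary):
    """Return (source_code, target_concept_id, problem) for every target that fails validation.

    vocabulary maps concept_id -> invalid_reason (None when the concept is valid).
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        concept_id = int(row["target_concept_id"])
        if concept_id == 0:
            continue  # explicit "no standard concept" entries need no vocabulary lookup
        if concept_id not in vocabulary:
            flagged.append((row["source_code"], concept_id, "not in vocabulary"))
        elif vocabulary[concept_id] is not None:
            reason = vocabulary[concept_id]
            flagged.append((row["source_code"], concept_id, f"invalid ({reason})"))
    return flagged


# Hypothetical export file and vocabulary snapshot; all IDs are placeholders.
SAMPLE_EXPORT = """source_vocabulary,source_code,target_concept_id
ICD10CM,E11.9,201
DEMO,CODE-B,202
DEMO,CODE-C,999
DEMO,CODE-D,0
"""
current_vocabulary = {201: None, 202: "D"}  # 'D' = deleted in OMOP's invalid_reason convention

flagged = validate_mappings(SAMPLE_EXPORT, current_vocabulary)
for item in flagged:
    print(item)
```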

This enables a "build once, reuse everywhere" approach for multi-site studies using the same source system or data vendor.

Integration with Usagi

For large-scale manual concept mapping projects, Parthenon integrates with OHDSI Usagi -- a standalone Java tool that provides NLP-assisted concept suggestions based on term similarity:

  1. Export your unmapped codes from Parthenon as a Usagi-compatible CSV.
  2. Open the export in Usagi and use its similarity-based suggestions to find appropriate standard concepts.
  3. Review and approve mappings in Usagi's interface.
  4. Export the completed Usagi mapping file.
  5. Import the Usagi output back into Parthenon to apply all mappings at once.

Vocabulary updates

When you update your OMOP vocabulary (see Chapter 25 -- System Configuration), re-run auto-mapping on existing unmapped codes. New vocabulary releases often add mappings for previously unmappable codes, especially for newer ICD-10-CM codes and drug products.

Mapping Quality Metrics

After concept mapping is complete, the summary dashboard shows:

Metric | Description | Target
Overall mapping rate | % of source codes with a standard concept | > 90%
Frequency-weighted rate | % of data rows with a mapped concept | > 95%
Concept 0 rate | % of rows mapped to concept ID 0 | < 5%
Review pending | Codes still awaiting manual review | 0
Domain coverage | Mapping rates per clinical domain | Varies

These metrics help assess whether your ETL is producing analysis-ready data or requires additional vocabulary work.
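The first three metrics follow directly from the per-code frequencies. The sketch below assumes each distinct source code is summarized as a (frequency, concept_id) pair, with concept ID 0 meaning unmapped. The function name and the exact formulas are assumptions based on the metric descriptions above, not the dashboard's confirmed implementation.

```python
def mapping_metrics(codes):
    """codes: list of (frequency, concept_id) pairs, one per distinct source code."""
    total_codes = len(codes)
    total_rows = sum(freq for freq, _ in codes)
    if total_codes == 0 or total_rows == 0:
        raise ValueError("no codes or rows to summarize")

    mapped_codes = sum(1 for _, concept_id in codes if concept_id != 0)
    mapped_rows = sum(freq for freq, concept_id in codes if concept_id != 0)

    return {
        "overall_mapping_rate": mapped_codes / total_codes,      # share of distinct codes mapped
        "frequency_weighted_rate": mapped_rows / total_rows,     # share of data rows mapped
        "concept_0_rate": (total_rows - mapped_rows) / total_rows,
    }


# Two common codes are mapped; two rare codes are not (concept IDs are placeholders).
metrics = mapping_metrics([(900, 11), (90, 12), (8, 0), (2, 0)])
print(metrics)
```

In this toy input only half of the distinct codes are mapped, yet 99% of the rows are, which is why the two rates are tracked against separate targets.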