Concept Mapping
Concept mapping (also called code mapping or vocabulary mapping) translates the source codes in your data -- ICD-10-CM diagnosis codes, NDC drug codes, LOINC lab codes, CPT procedure codes, and codes from other terminologies -- into OMOP standard concept IDs. This step makes your CDM data queryable through the standardized OMOP vocabulary and enables cross-database analyses across institutions that use different source coding systems.
Why Concept Mapping Matters
Source data typically uses local or domain-specific terminologies that vary by institution, vendor, and country:
| Source Terminology | Domain | Example Code | Example Meaning |
|---|---|---|---|
| ICD-10-CM | Conditions | E11.9 | Type 2 diabetes without complications |
| NDC | Drugs | 00002-7510-01 | Insulin lispro 100 units/mL injection |
| CPT-4 | Procedures | 99213 | Office visit, established patient |
| LOINC | Measurements | 2339-0 | Glucose [Mass/volume] in Blood |
| SNOMED CT | Conditions | 73211009 | Diabetes mellitus |
OMOP CDM stores the original source codes in *_source_value and *_source_concept_id columns for full traceability, but all analytical queries use standardized concept IDs in the *_concept_id columns:
- SNOMED CT for conditions
- RxNorm for drugs
- LOINC for measurements
- SNOMED CT for procedures (CPT4, HCPCS, and ICD10PCS concepts are also standard in the Procedure domain)
The concept mapping step populates these standardized concept ID columns correctly, enabling your data to participate in network studies and standardized analytics.
Automatic Mapping
Parthenon leverages the OMOP vocabulary's concept_relationship table to automatically map source codes to standard concepts:
- Navigate to the mapping results for your upload batch (after schema mapping is complete).
- Click the Concept Mapping tab.
- Click Auto-Map to start the automatic mapping process.
- Parthenon queries the vocabulary for "Maps to" relationships from each source code.
- The system categorizes results:
  - Mapped -- single unambiguous standard concept found; auto-applied
  - Review -- multiple possible standard concepts found; requires manual selection
  - Unmapped -- no "Maps to" relationship exists in the vocabulary
For well-coded claims data using standard terminologies (ICD-10-CM, NDC, CPT-4), auto-mapping typically achieves 85-95% coverage. EHR data with local codes or free-text descriptions will have lower auto-mapping rates and require more manual review.
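The categorization logic described above can be illustrated with a minimal Python sketch. It is not Parthenon's implementation: the `auto_map` function, the in-memory `RELATIONSHIPS` list (standing in for `concept_relationship` joined to `concept`), and all concept IDs are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Toy stand-in for concept_relationship joined to concept:
# (source vocabulary, source code, relationship_id, target concept_id, target standard flag).
# The concept IDs are made-up placeholders, not real OMOP concept IDs.
RELATIONSHIPS = [
    ("ICD10CM", "E11.9", "Maps to", 1001, "S"),
    ("ICD10CM", "Z00.00", "Maps to", 2001, "S"),
    ("ICD10CM", "Z00.00", "Maps to", 2002, "S"),
    ("ICD10CM", "E11.9", "Is a", 3001, None),  # not a "Maps to" link, so ignored
]


def auto_map(source_codes, relationships):
    """Categorize each (vocabulary, code) pair as Mapped, Review, or Unmapped."""
    targets = defaultdict(list)
    for vocab, code, rel_id, target_id, standard_flag in relationships:
        # Only "Maps to" links that point at a standard concept count as candidates.
        if rel_id == "Maps to" and standard_flag == "S":
            targets[(vocab, code)].append(target_id)

    results = {}
    for key in source_codes:
        candidates = targets.get(key, [])
        if len(candidates) == 1:
            status = "Mapped"      # single unambiguous target: auto-applied
        elif len(candidates) > 1:
            status = "Review"      # several targets: needs manual selection
        else:
            status = "Unmapped"    # no "Maps to" relationship found
        results[key] = {"status": status, "candidates": candidates}
    return results


results = auto_map(
    [("ICD10CM", "E11.9"), ("ICD10CM", "Z00.00"), ("LOCAL", "LAB-42")],
    RELATIONSHIPS,
)
for key, outcome in results.items():
    print(key, outcome["status"], outcome["candidates"])
```

In this toy data, the local code `LAB-42` ends up Unmapped because no vocabulary row exists for it, which mirrors why EHR data with local codes tends to need more manual review.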
Manual Mapping Review
The mapping review table provides a comprehensive view of all source codes and their mapping status:
| Column | Description |
|---|---|
| Source Value | The raw code from your data (e.g., E11.9) |
| Source Vocabulary | Detected vocabulary (ICD10CM, NDC, CPT4, LOINC, HCPCS, etc.) |
| Frequency | Number of rows in your data containing this source code |
| Auto-Mapped To | Standard concept ID and name from auto-mapping (if found) |
| Confidence | Mapping confidence score (High / Medium / Low) |
| Status | Mapped / Unmapped / Review Needed |
Resolving Unmapped Codes
For codes with no automatic mapping or those flagged for review:
- Click the search icon next to the unmapped code to open the concept search dialog.
- Search by code, name, or keyword in the OMOP vocabulary.
- Review candidate standard concepts with their domain, vocabulary, and validity dates.
- Click Accept to apply the mapping.
- Optionally check Apply to all to map all occurrences of this source code across all batches.
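As a rough illustration of what the concept search step narrows down to, the sketch below filters a small in-memory concept list by code or keyword and keeps only standard concepts that are still valid. The `Concept` class and `search_standard_concepts` function are hypothetical; the field names follow the OMOP `concept` table, and the concept IDs are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: int                  # placeholder IDs, not real OMOP concept IDs
    concept_code: str
    concept_name: str
    domain_id: str
    vocabulary_id: str
    standard_concept: Optional[str]  # "S" for standard concepts
    valid_end_date: date


def search_standard_concepts(query, concepts, today):
    """Return valid standard concepts whose code equals, or name contains, the query."""
    q = query.strip().lower()
    return [
        c for c in concepts
        if c.standard_concept == "S"
        and c.valid_end_date >= today
        and (q == c.concept_code.lower() or q in c.concept_name.lower())
    ]


CONCEPTS = [
    Concept(1, "73211009", "Diabetes mellitus", "Condition", "SNOMED", "S", date(2099, 12, 31)),
    Concept(2, "E11.9", "Type 2 diabetes mellitus without complications",
            "Condition", "ICD10CM", None, date(2099, 12, 31)),   # source concept, not standard
    Concept(3, "OLD-1", "Diabetes (retired term)", "Condition", "SNOMED", "S", date(2015, 1, 1)),
]

hits = search_standard_concepts("diabetes", CONCEPTS, today=date(2024, 1, 1))
for c in hits:
    print(c.concept_id, c.concept_name, c.domain_id, c.vocabulary_id, c.valid_end_date)
```

Only the first concept survives the filter: the ICD10CM entry is not standard, and the third entry has passed its validity end date.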
Bulk Review Workflow
For large mapping efforts, use the bulk review tools:
- Sort by frequency -- map high-frequency codes first for maximum impact
- Filter by status -- show only unmapped or review-needed codes
- Filter by domain -- focus on one clinical domain at a time
- Batch accept -- accept all high-confidence auto-mappings in one click
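The same prioritization can be expressed in a few lines of Python, which may help when reviewing an exported mapping table outside the UI. This is a sketch under assumptions: `MappingRow`, `review_queue`, and `batch_accept` are hypothetical names, and the status and confidence labels are taken from the review table columns described earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappingRow:
    source_value: str
    domain: str
    frequency: int
    status: str                        # "Mapped", "Unmapped", or "Review Needed"
    confidence: Optional[str] = None   # "High", "Medium", "Low", or None
    accepted: bool = False


def review_queue(rows, domain=None):
    """Rows that still need manual attention, most frequent first."""
    pending = [r for r in rows if r.status in ("Unmapped", "Review Needed")]
    if domain is not None:
        pending = [r for r in pending if r.domain == domain]
    return sorted(pending, key=lambda r: r.frequency, reverse=True)


def batch_accept(rows, confidence="High"):
    """Accept every auto-mapped row at the given confidence; return how many changed."""
    count = 0
    for r in rows:
        if r.status == "Mapped" and r.confidence == confidence and not r.accepted:
            r.accepted = True
            count += 1
    return count


rows = [
    MappingRow("E11.9", "Condition", 1200, "Mapped", "High"),
    MappingRow("LAB-42", "Measurement", 900, "Unmapped"),
    MappingRow("Z00.00", "Condition", 300, "Review Needed", "Medium"),
    MappingRow("XX1", "Condition", 15, "Unmapped"),
    MappingRow("99213", "Procedure", 5000, "Mapped", "Medium"),
]

queue = review_queue(rows)
condition_queue = review_queue(rows, domain="Condition")
accepted_count = batch_accept(rows)

print([r.source_value for r in queue])
print([r.source_value for r in condition_queue])
print(accepted_count)
```

Sorting by frequency puts `LAB-42` at the top of the queue, so a single manual mapping there resolves far more data rows than mapping the rarely used `XX1`.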
Unmappable Codes
Some source codes cannot be mapped to a standard OMOP concept. Common reasons:
- Custom local codes -- institution-specific codes not in any standard vocabulary
- Vocabulary gaps -- the OMOP vocabulary does not yet include this terminology
- Malformed codes -- truncated, invalid, or data entry errors
- Deprecated codes -- codes that have been retired from their source vocabulary
For unmappable codes, set the concept ID to 0 -- the OMOP convention meaning "no standard concept available." These records will still appear in *_source_value columns and can be queried directly when needed. The source concept ID (*_source_concept_id) should still be populated if the source code exists in the vocabulary, even without a standard mapping.
Records mapped to concept ID 0 are excluded from most standard OHDSI analyses (incidence rates, characterization, cohort definitions) because these analyses filter on standard concepts. Track your concept 0 rate as a key data quality metric -- high rates indicate vocabulary gaps that may bias your analyses.
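A minimal sketch of the concept 0 fallback follows. The `fill_concept_columns` helper and all concept IDs are hypothetical, and the sketch assumes the source concept ID also falls back to 0 when the source code is not in the vocabulary at all.

```python
from typing import NamedTuple, Optional


class ConceptColumns(NamedTuple):
    source_value: str        # the raw code, always preserved
    source_concept_id: int   # concept for the source code, or 0 (assumed fallback)
    concept_id: int          # standard concept, or 0 when none is available


def fill_concept_columns(
    source_value: str,
    source_concept_id: Optional[int],
    standard_concept_id: Optional[int],
) -> ConceptColumns:
    """Fill the three concept columns, falling back to 0 where no concept exists."""
    return ConceptColumns(
        source_value=source_value,
        source_concept_id=source_concept_id if source_concept_id is not None else 0,
        concept_id=standard_concept_id if standard_concept_id is not None else 0,
    )


# Placeholder IDs for illustration only.
mapped = fill_concept_columns("E11.9", 5001, 1001)          # fully mapped
no_standard = fill_concept_columns("OLD-CODE", 5002, None)  # in vocabulary, no standard mapping
local_only = fill_concept_columns("LAB-42", None, None)     # custom local code

rows = [mapped, no_standard, local_only]

# Concept 0 records remain reachable through the source value column.
concept_0_values = [r.source_value for r in rows if r.concept_id == 0]
print(concept_0_values)
```

Note that `no_standard` keeps its source concept ID even though its standard concept ID is 0, matching the guidance above.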
Exporting and Importing Mapping Tables
Concept mapping tables can be exported and shared across Parthenon instances:
Export
- Navigate to the completed concept mapping for a batch.
- Click Export Mappings to download a CSV file containing:
- Source vocabulary, source code, source name
- Target standard concept ID, concept name, domain
- Mapping status and reviewer notes
Import
- Navigate to Data Ingestion > Concept Mappings.
- Click Import Mapping File.
- Upload a previously exported CSV mapping file.
- Parthenon validates the target concept IDs against the current vocabulary and flags any concepts that have been deprecated or invalidated since the export.
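The validation step can be pictured with the following sketch, which checks each target concept ID in an exported file against a snapshot of the current vocabulary. It is an illustration only: the CSV column names, the `validate_import` function, and the concept IDs are assumptions, not Parthenon's actual export format. The `invalid_reason` and validity end date fields mirror the OMOP `concept` table.

```python
import csv
import io
from datetime import date

# Hypothetical slice of the current vocabulary:
# concept_id -> (invalid_reason, valid_end_date), as in the OMOP concept table.
CURRENT_VOCAB = {
    1001: (None, date(2099, 12, 31)),
    1002: ("D", date(2023, 6, 30)),   # deprecated since the export
}

# Assumed column names; a real export may differ.
EXPORTED_CSV = """source_vocabulary,source_code,target_concept_id
ICD10CM,E11.9,1001
ICD10CM,E10.9,1002
NDC,00000-0000-00,9999
"""


def validate_import(csv_text, vocab, today):
    """Split imported mappings into still-valid rows and rows flagged for review."""
    valid, flagged = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        concept_id = int(row["target_concept_id"])
        entry = vocab.get(concept_id)
        if entry is None:
            flagged.append((row["source_code"], concept_id, "not in current vocabulary"))
            continue
        invalid_reason, valid_end = entry
        if invalid_reason is not None or valid_end < today:
            flagged.append((row["source_code"], concept_id, "deprecated or invalidated"))
        else:
            valid.append((row["source_code"], concept_id))
    return valid, flagged


valid, flagged = validate_import(EXPORTED_CSV, CURRENT_VOCAB, today=date(2024, 1, 1))
print("valid:", valid)
print("flagged:", flagged)
```

Flagged rows are kept rather than silently dropped so that a reviewer can pick a replacement concept.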
This enables a "build once, reuse everywhere" approach for multi-site studies using the same source system or data vendor.
Integration with Usagi
For large-scale manual concept mapping projects, Parthenon integrates with OHDSI Usagi -- a standalone Java tool that provides NLP-assisted concept suggestions based on term similarity:
- Export your unmapped codes from Parthenon as a Usagi-compatible CSV.
- Open the export in Usagi and use its similarity-based suggestions to find appropriate standard concepts.
- Review and approve mappings in Usagi's interface.
- Export the completed Usagi mapping file.
- Import the Usagi output back into Parthenon to apply all mappings at once.
When you update your OMOP vocabulary (see Chapter 25 -- System Configuration), re-run auto-mapping on existing unmapped codes. New vocabulary releases often add mappings for previously unmappable codes, especially for newer ICD-10-CM codes and drug products.
Mapping Quality Metrics
After concept mapping is complete, the summary dashboard shows:
| Metric | Description | Target |
|---|---|---|
| Overall mapping rate | % of source codes with a standard concept | > 90% |
| Frequency-weighted rate | % of data rows with a mapped concept | > 95% |
| Concept 0 rate | % of rows mapped to concept ID 0 | < 5% |
| Review pending | Codes still awaiting manual review | 0 |
| Domain coverage | Mapping rates per clinical domain | Varies |
These metrics help assess whether your ETL is producing analysis-ready data or requires additional vocabulary work.
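To make the difference between the code-level and row-level rates concrete, here is a small worked example in Python. The `mapping_metrics` function and its input layout are hypothetical; the formulas follow the metric descriptions in the table above.

```python
def mapping_metrics(codes):
    """Compute summary metrics from per-code records.

    Each record is a dict with 'frequency' (rows using the code),
    'concept_id' (0 when no standard concept), and 'status'.
    """
    if not codes:
        raise ValueError("no source codes to summarize")
    total_rows = sum(c["frequency"] for c in codes)
    mapped = [c for c in codes if c["concept_id"] != 0]
    mapped_rows = sum(c["frequency"] for c in mapped)
    return {
        "overall_mapping_rate": 100 * len(mapped) / len(codes),
        "frequency_weighted_rate": 100 * mapped_rows / total_rows,
        "concept_0_rate": 100 * (total_rows - mapped_rows) / total_rows,
        "review_pending": sum(1 for c in codes if c["status"] == "Review Needed"),
    }


# Placeholder concept IDs; frequencies chosen so the arithmetic is easy to follow.
codes = [
    {"source_value": "E11.9", "frequency": 900, "concept_id": 1001, "status": "Mapped"},
    {"source_value": "I10", "frequency": 80, "concept_id": 1002, "status": "Mapped"},
    {"source_value": "LAB-42", "frequency": 15, "concept_id": 0, "status": "Unmapped"},
    {"source_value": "Z00.00", "frequency": 5, "concept_id": 0, "status": "Review Needed"},
]

metrics = mapping_metrics(codes)
for name, value in metrics.items():
    print(name, value)
```

Here only 2 of 4 codes are mapped (50%), yet 980 of 1,000 rows carry a standard concept (98%), leaving a concept 0 rate of 2%. This is why the frequency-weighted rate is usually the more informative of the two when a few common codes dominate the data.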