
Mapping Assistant (Ariadne)

The Mapping Assistant is an AI-powered tool that maps non-standard source terms to OMOP standard concepts. Powered by the Ariadne service, it combines three complementary matching strategies — verbatim lookup, vector similarity search, and LLM-based reasoning — to find the best candidate concept for each source term you provide. This is particularly useful during ETL development, when you need to map thousands of local codes or free-text descriptions to the OMOP Vocabulary.

Matching Strategies

Ariadne applies three matching strategies to each source term, then selects the candidate with the highest confidence across all strategies:

Verbatim Matching

An exact or near-exact string match against concept names in the OMOP Vocabulary. Verbatim matches are fastest and produce the highest confidence scores. When a source term is spelled identically (or nearly so) to a standard concept name, verbatim matching returns a result instantly.

Vector Similarity Matching

If no verbatim match is found, Ariadne queries a pgvector embedding index of OMOP concept names. This strategy captures semantic similarity: for example, it maps "heart attack" to "Acute myocardial infarction" even though the strings share no words. Vector matches typically produce moderate-to-high confidence scores.

LLM-Based Matching

For ambiguous or abbreviated terms (e.g., "CABG x2", "HTN w/ CKD"), Ariadne invokes a large language model to reason about the clinical intent and suggest the best OMOP concept. LLM matches are the most flexible but may produce lower confidence scores that warrant manual review.
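
The selection step described at the top of this section can be sketched as follows. This is a minimal illustration, not Ariadne's actual implementation: the `Candidate` class and `pick_best` function are hypothetical names, the concept IDs and names are placeholders, and how Ariadne breaks ties between equal scores is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical container mirroring the candidate fields in the API response."""
    concept_id: int
    concept_name: str
    match_type: str    # "verbatim", "vector", or "llm"
    confidence: float  # 0.0 to 1.0


def pick_best(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the highest-confidence candidate, or None when there are none."""
    if not candidates:
        return None
    # max() keeps the first candidate on ties; Ariadne's tie-breaking is unknown.
    return max(candidates, key=lambda c: c.confidence)


# Placeholder candidates pooled from different strategies.
candidates = [
    Candidate(1, "placeholder concept A", "vector", 0.74),
    Candidate(2, "placeholder concept B", "llm", 0.61),
]
best = pick_best(candidates)
print(best.match_type if best else "no match")
```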

Mapping Workflow

Step 1: Enter Source Terms

Navigate to Vocabulary > Mapping Assistant. In the Source Terms text area, enter your terms one per line. You can type terms manually or upload a CSV file using the Upload CSV button. When uploading a CSV, the first column of each row is extracted as a source term.

Step 2: Configure Target Filters

Optionally narrow the search space using the filter controls below the text area:

  • Target Vocabulary — Restrict candidates to specific vocabularies such as SNOMED CT, ICD-10-CM, RxNorm, LOINC, ICD-9-CM, CPT-4, HCPCS, or MedDRA. Select one or more, or leave empty for all vocabularies.
  • Target Domain — Restrict candidates to specific OMOP domains: Condition, Drug, Procedure, Measurement, Observation, or Device.

Step 3: Run Mapping

Click Map Terms to submit the batch. Ariadne processes all terms and returns results in a table with the following columns:

Column | Description
Source Term | The original term you entered
Best Match | The highest-confidence OMOP concept candidate
Confidence | A 0-100% score with a color-coded bar (green >= 80%, gold >= 50%, red < 50%)
Match Type | Badge indicating which strategy produced the best match (verbatim, vector, or llm)
Vocabulary | The vocabulary of the best match (e.g., SNOMED, RxNorm)
Actions | Accept or reject buttons for each mapping

Step 4: Review Candidates

Click any row to expand it and see all candidates returned by Ariadne, not just the best match. Each candidate shows its concept ID, concept name, vocabulary, domain, match type, and confidence score. This lets you select an alternative mapping if the best match is incorrect.

Step 5: Accept or Reject

Use the checkmark and X buttons on each row to mark mappings as accepted or rejected. These decisions are tracked in the results summary and included in the CSV export.

Summary Statistics

After mapping completes, four summary cards appear above the results table:

  • Terms Mapped — Number of terms that received at least one candidate
  • High Confidence — Terms with a best-match confidence >= 80%
  • Need Review — Terms with a best-match confidence < 80%
  • No Match — Terms where no candidate was found

An accepted-mappings counter also appears when you begin accepting results.
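
If you consume results through the API, the same four counts can be reproduced roughly as shown below. This is a sketch under assumptions: `summarize` is a hypothetical helper, a result with a null `best_match` is treated as "No Match", and "Need Review" is taken to exclude terms with no match at all.

```python
def summarize(results: list[dict]) -> dict[str, int]:
    """Count results per summary card, using the thresholds listed above."""
    summary = {"terms_mapped": 0, "high_confidence": 0, "need_review": 0, "no_match": 0}
    for result in results:
        best = result.get("best_match")
        if best is None:
            # Assumption: a null best_match means no candidate was found.
            summary["no_match"] += 1
            continue
        summary["terms_mapped"] += 1
        # API confidence is on a 0-1 scale, so 80% corresponds to 0.80.
        if best["confidence"] >= 0.80:
            summary["high_confidence"] += 1
        else:
            summary["need_review"] += 1
    return summary
```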

Term Cleanup

The collapsible Term Cleanup section (below the mapping controls) helps you normalize messy source terms before mapping. Enter abbreviated or misspelled terms one per line (e.g., "t2dm", "HTN w/ CKD stage 3", "AF/aflutter"), then click Clean Terms. Ariadne returns a cleaned version of each term that is more likely to produce a high-confidence match.

Use Term Cleanup Before Mapping

Running cleanup on abbreviated clinical terms before mapping can noticeably improve match quality. For example, once "t2dm" is cleaned to "type 2 diabetes mellitus", the term can be resolved by a verbatim match instead of requiring LLM inference.

CSV Import and Export

Import

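Click Upload CSV to load terms from a file. The importer reads the first column of each row as a source term. Header rows are not skipped: if your file starts with a header, its first cell is imported as a term, so remove the header row before uploading or delete that term from the text area afterwards.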
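
If you prepare term lists in a script before uploading, the first-column behavior can be mimicked as shown below. This is a sketch, not the importer's code: `first_column_terms` and its `skip_header` flag are hypothetical, and dropping blank rows is an assumption about what you would want.

```python
import csv
import io


def first_column_terms(csv_text: str, skip_header: bool = False) -> list[str]:
    """Return the first cell of every non-empty row in the CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    # Assumption: rows that are empty or have a blank first cell are ignored.
    terms = [row[0].strip() for row in reader if row and row[0].strip()]
    return terms[1:] if skip_header else terms


sample = "source_term,local_code\nt2dm,A1\nHTN w/ CKD stage 3,B2\n"
print(first_column_terms(sample))                    # header cell is kept
print(first_column_terms(sample, skip_header=True))  # header cell is dropped
```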

Export

Click Export CSV (available in the header and at the bottom of results) to download the mapping results. The exported file includes columns for source term, concept ID, concept name, vocabulary, domain, confidence, match type, and your accept/reject decision.
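
For API-driven workflows, a file with the same kinds of columns could be assembled as sketched below. The header names in `EXPORT_COLUMNS`, the `source_term` result key, and the `export_rows` helper are all assumptions made for illustration; the exact headers of the file produced by Export CSV may differ.

```python
import csv
import io

# Hypothetical header names; the real export may label its columns differently.
EXPORT_COLUMNS = [
    "source_term", "concept_id", "concept_name", "vocabulary_id",
    "domain_id", "confidence", "match_type", "decision",
]


def export_rows(results: list[dict], decisions: dict[str, str]) -> str:
    """Serialize best matches plus accept/reject decisions as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=EXPORT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for result in results:
        best = result.get("best_match") or {}  # unmatched terms get blank cells
        row = {
            "source_term": result["source_term"],
            "decision": decisions.get(result["source_term"], ""),
        }
        for column in EXPORT_COLUMNS[1:7]:  # concept_id through match_type
            row[column] = best.get(column, "")
        writer.writerow(row)
    return buffer.getvalue()
```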

Review Low-Confidence Mappings

Mappings with confidence below 50% (shown in red) should always be reviewed manually. Automated matching may produce incorrect results for highly abbreviated terms, misspellings, or terms that span multiple clinical concepts.

API Reference

The Mapping Assistant uses the Ariadne API endpoints. These can also be called programmatically for batch integration.

Endpoint | Method | Description
/api/v1/ariadne/map | POST | Map an array of source terms to OMOP concepts
/api/v1/ariadne/clean-terms | POST | Normalize messy source terms
/api/v1/ariadne/vector-search | POST | Perform vector similarity search against concept embeddings

Map Terms Request

{
  "terms": ["type 2 diabetes mellitus", "HTN", "CABG"],
  "target_vocabularies": ["SNOMED", "RxNorm"],
  "target_domains": ["Condition", "Procedure"]
}

Map Terms Response

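Each result includes the source term, a best_match object (or null when no candidate was found), and a candidates array of all matches. Each candidate contains concept_id, concept_name, vocabulary_id, domain_id, match_type, and a confidence score between 0 and 1. Note that the API reports confidence on a 0-1 scale, whereas the results table displays it as a percentage.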
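
A request to the map endpoint could be built from Python roughly as follows. Treat this as a sketch under stated assumptions: `BASE_URL` is a placeholder host, authentication is omitted because it is not described here, leaving out the filter keys when no filter is wanted is assumed to be accepted by the API, and `build_map_request` and `confidence_percent` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://example.org"  # placeholder; substitute your deployment's host


def build_map_request(terms: list[str],
                      vocabularies: Optional[list[str]] = None,
                      domains: Optional[list[str]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/ariadne/map."""
    payload: dict = {"terms": terms}
    # Assumption: optional filters may simply be omitted from the body.
    if vocabularies:
        payload["target_vocabularies"] = vocabularies
    if domains:
        payload["target_domains"] = domains
    return urllib.request.Request(
        BASE_URL + "/api/v1/ariadne/map",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def confidence_percent(result: dict) -> Optional[int]:
    """Convert a result's 0-1 best-match confidence to a whole percentage."""
    best = result.get("best_match")
    return None if best is None else round(best["confidence"] * 100)


request = build_map_request(["HTN", "CABG"], vocabularies=["SNOMED"])
# Sending is left commented out because the host above is only a placeholder:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```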

Batch Size

For large mapping jobs (thousands of terms), consider splitting your input into batches of 100-200 terms. This keeps response times manageable and allows you to review results incrementally.
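
Splitting a long term list into batches can be sketched as below. The `batched` helper and the default size of 150 are illustrative choices within the suggested 100-200 range, not values required by Ariadne.

```python
from typing import Iterator


def batched(terms: list[str], size: int = 150) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` terms, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(terms), size):
        yield terms[start:start + size]


all_terms = [f"term {i}" for i in range(320)]
for batch in batched(all_terms):
    # Each batch would be submitted as one map request and reviewed separately.
    print(len(batch))
```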

Supported Vocabularies

The target vocabulary filter supports the following vocabularies:

Vocabulary | Common Use
SNOMED CT | Conditions, procedures, observations
ICD-10-CM | Diagnosis codes
RxNorm | Drug ingredients and clinical drugs
LOINC | Laboratory measurements
ICD-9-CM | Legacy diagnosis codes
CPT-4 | Procedure billing codes
HCPCS | Healthcare common procedure codes
MedDRA | Adverse event reporting

Confidence Score Interpretation

The confidence score (0-100%) reflects how closely a candidate matches the source term across all three strategies. The color-coded bar provides a quick visual assessment:

Range | Color | Interpretation
80-100% | Green | High confidence. Likely correct; safe to auto-accept in most cases.
50-79% | Gold | Moderate confidence. Manual review recommended. The match may be semantically close but not exact.
0-49% | Red | Low confidence. Likely requires manual mapping or term cleanup before re-mapping.

When the best match has low confidence, expand the row to review all candidates. A better match may exist further down the candidate list.
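
When post-processing API results, the same color bands can be reproduced with a small helper. This is a sketch: `confidence_band` is a hypothetical function, and it assumes the 0-1 confidence scale returned by the API, so the 80% and 50% cutoffs become 0.80 and 0.50.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0-1 confidence score to the color band used in the results table."""
    if confidence >= 0.80:
        return "green"  # high confidence
    if confidence >= 0.50:
        return "gold"   # moderate confidence, review recommended
    return "red"        # low confidence, manual mapping likely needed


for score in (0.93, 0.64, 0.31):
    print(score, confidence_band(score))
```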

Permissions and Service Requirements

The Mapping Assistant requires the Ariadne backend service to be running. If the service is unavailable, mapping and cleanup operations will fail with an error message: "Mapping failed. Verify the Ariadne service is running and reachable."

Access to the Mapping Assistant requires the vocabulary:manage permission. Standard researcher accounts have read-only access to the vocabulary browser but may not have mapping permissions. Contact your administrator to request access.