ETL Tools

The ETL Tools page is a unified workspace that brings together three ETL capabilities under a single tabbed interface: profiling source databases, loading Synthea synthetic patient data into an OMOP CDM source, and ingesting FHIR R4 resources into the CDM. Each tab operates independently, so you can switch between workflows without losing state.

Accessing ETL Tools

Navigate to Ingestion > ETL Tools from the main navigation. The page displays three tabs:

Source Profiler — WhiteRabbit-based database profiling
Synthea Generator — Synthetic patient data loading from Synthea CSV output
FHIR Ingestion — FHIR R4 resource ingestion

Tab 1: Source Profiler

The Source Profiler tab provides a streamlined version of the full Source Profiler for quick database scanning. It connects to the WhiteRabbit service to scan your data source and produce a table-by-table quality report.

Usage

Verify the WhiteRabbit service health indicator shows "available" at the top of the tab.
Select a Data Source from the dropdown (all registered Parthenon sources are listed).
Optionally enter a Table Filter — a comma-separated list of table names to restrict the scan scope.
Click Scan Database and wait for the scan to complete.
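This page does not describe exactly how the Table Filter string is parsed, so the sketch below shows one plausible interpretation. It assumes that names are trimmed, empty entries are ignored, an empty filter means "scan every table", and matching is case-insensitive. The function names are hypothetical and are not part of Parthenon.

```python
def parse_table_filter(raw: str) -> list[str]:
    """Split a comma-separated Table Filter into individual table names.

    Whitespace around each name is trimmed and empty entries are dropped,
    so "person, visit_occurrence,," yields two names.
    """
    return [name.strip() for name in raw.split(",") if name.strip()]


def tables_to_scan(all_tables: list[str], raw_filter: str) -> list[str]:
    """Return the tables a scan would cover under the given filter.

    Assumption: an empty filter means "scan everything", and names are
    matched case-insensitively. The real service may behave differently.
    """
    wanted = parse_table_filter(raw_filter)
    if not wanted:
        return list(all_tables)
    wanted_lower = {name.lower() for name in wanted}
    return [t for t in all_tables if t.lower() in wanted_lower]


tables = ["person", "visit_occurrence", "condition_occurrence"]
print(tables_to_scan(tables, "person, visit_occurrence"))
```

Limiting the scan to the tables you care about keeps scan time down on large sources.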

Results

After scanning, the tab displays:

Summary cards — Tables scanned, total columns, total rows, and scan time
Data Quality Flags — A warning list of tables that have columns with > 50% null values, identifying specific column names
Table accordions — Collapsible rows for each table showing column name, data type (color-coded badge), null percentage bar, distinct value count, and top-5 sample values
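The summary counts and the Data Quality Flags rule (columns with more than 50% nulls) can be sketched as follows. The `ColumnProfile` shape is a hypothetical stand-in for whatever per-column statistics the scan returns. It does not describe the WhiteRabbit report format.

```python
from dataclasses import dataclass

NULL_FLAG_THRESHOLD = 50.0  # percent; a column is flagged when strictly above this


@dataclass
class ColumnProfile:
    """Hypothetical per-column statistics, loosely mirroring the accordion fields."""
    name: str
    data_type: str
    null_pct: float  # 0-100
    distinct_count: int


def summary(tables: dict[str, list[ColumnProfile]]) -> dict[str, int]:
    """Compute the 'tables scanned' and 'total columns' summary cards."""
    return {
        "tables_scanned": len(tables),
        "total_columns": sum(len(cols) for cols in tables.values()),
    }


def quality_flags(tables: dict[str, list[ColumnProfile]]) -> dict[str, list[str]]:
    """Map each table to the names of its columns with more than 50% nulls.

    Tables without any such column are omitted, like a warning list.
    """
    flags: dict[str, list[str]] = {}
    for table, columns in tables.items():
        high_null = [c.name for c in columns if c.null_pct > NULL_FLAG_THRESHOLD]
        if high_null:
            flags[table] = high_null
    return flags


# Illustrative values only, not taken from a real scan.
scan = {
    "person": [
        ColumnProfile("person_id", "integer", 0.0, 1000),
        ColumnProfile("death_datetime", "timestamp", 92.5, 80),
    ],
    "visit_occurrence": [ColumnProfile("visit_occurrence_id", "bigint", 0.0, 5000)],
}
print(summary(scan))
print(quality_flags(scan))
```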

Click Export Report to download the scan results as a JSON file.

Quick Scan vs Full Profiler

The Source Profiler tab here provides the essential scanning workflow. For advanced features like the completeness heatmap, data quality scorecard with letter grades, table size distribution chart, scan history, and sorting/filtering controls, use the dedicated Source Profiler page.

Tab 2: Synthea Generator

The Synthea Generator loads pre-generated Synthea CSV files into an OMOP CDM source, converting synthetic patient records into properly structured CDM tables.

Configuration

Field	Description
Target Source	The registered data source where CDM records will be inserted
Patient Count	Number of synthetic patients to load (1 to 100,000)
Synthea CSV Output Folder	Absolute filesystem path to the directory containing Synthea CSV output files (e.g., `patients.csv`, `encounters.csv`). This path must be accessible from the R runtime container.
CDM Version	Target OMOP CDM version: 5.4 (default) or 5.3
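The field constraints in the table above can be checked before you click Generate. This is a minimal sketch and the names `SyntheaLoadConfig` and `validate` are hypothetical. The path check uses POSIX rules because the folder must resolve inside the R runtime container. It only checks that the path is absolute and does not confirm that the folder exists there.

```python
import posixpath
from dataclasses import dataclass

SUPPORTED_CDM_VERSIONS = ("5.4", "5.3")


@dataclass
class SyntheaLoadConfig:
    """Hypothetical container for the Synthea Generator form fields."""
    target_source: str
    patient_count: int
    csv_folder: str
    cdm_version: str = "5.4"  # documented default


def validate(config: SyntheaLoadConfig) -> list[str]:
    """Return human-readable problems; an empty list means the form looks valid."""
    errors = []
    if not config.target_source:
        errors.append("Target Source is required")
    if not 1 <= config.patient_count <= 100_000:
        errors.append("Patient Count must be between 1 and 100,000")
    # The folder is resolved inside the (Linux) R runtime container.
    if not posixpath.isabs(config.csv_folder):
        errors.append("Synthea CSV Output Folder must be an absolute path")
    if config.cdm_version not in SUPPORTED_CDM_VERSIONS:
        errors.append("CDM Version must be 5.4 or 5.3")
    return errors


print(validate(SyntheaLoadConfig("my_cdm", 1000, "/data/synthea/output/csv")))
```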

Running the Generator

Verify the Synthea ETL service health indicator shows "available." The status also displays the service version and supported capabilities.
Select a target source, set the patient count, and provide the CSV folder path.
Click Generate to start the ETL process.

CSV Files Must Exist First

The Synthea Generator does not generate the Synthea CSV files itself. You must run Synthea separately to produce the CSV output, then point this tool at the output directory. The generator reads the CSVs and loads them into the CDM.
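Because the generator only reads existing files, a quick pre-flight check can catch an empty or wrong folder. The sketch below looks only for the two files this page names (`patients.csv` and `encounters.csv`). The loader very likely reads more Synthea tables, so treat a passing check as necessary, not sufficient. Run it somewhere that sees the same path as the R runtime container. The function name is hypothetical.

```python
from pathlib import Path

# Only the example files named on this page; the loader may require more.
EXPECTED_FILES = ("patients.csv", "encounters.csv")


def missing_synthea_files(folder: str) -> list[str]:
    """List expected Synthea CSV files that are absent from `folder`.

    If the folder itself does not exist, every expected file is reported.
    """
    base = Path(folder)
    if not base.is_dir():
        return list(EXPECTED_FILES)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


missing = missing_synthea_files("/data/synthea/output/csv")
if missing:
    print("Run Synthea first; missing:", ", ".join(missing))
```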

Results

After generation completes, the tab shows:

Persons Generated — Number of person records created
Total Rows Inserted — Sum of all CDM records across tables
Elapsed Time — Duration of the ETL process
Per-Table Row Counts — A bar chart showing the number of rows inserted into each CDM table, sorted by count
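Total Rows Inserted is the sum of the per-table counts, and the chart orders tables by count. The sketch below reproduces both from a plain mapping of table name to row count. Descending order is an assumption, and the counts are illustrative, not output from a real run.

```python
def summarize_load(per_table_rows: dict[str, int]) -> tuple[int, list[tuple[str, int]]]:
    """Return (total rows inserted, tables ordered from most to fewest rows)."""
    total = sum(per_table_rows.values())
    ordered = sorted(per_table_rows.items(), key=lambda item: item[1], reverse=True)
    return total, ordered


def text_bar_chart(ordered: list[tuple[str, int]], width: int = 40) -> str:
    """Render ordered counts as a plain-text bar chart scaled to the largest table."""
    if not ordered:
        return ""
    largest = max(count for _, count in ordered) or 1  # avoid dividing by zero
    label_width = max(len(name) for name, _ in ordered)
    lines = []
    for name, count in ordered:
        bar = "#" * round(width * count / largest)
        lines.append(f"{name.ljust(label_width)} {bar} {count}")
    return "\n".join(lines)


rows = {"person": 100, "visit_occurrence": 4200, "measurement": 15800}
total, ordered = summarize_load(rows)
print(total)
print(text_bar_chart(ordered, width=20))
```

A useful sanity check is that Persons Generated matches the `person` row count.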

Tab 3: FHIR Ingestion

The FHIR Ingestion tab provides an embedded version of the FHIR Ingestion tool for converting FHIR R4 resources into OMOP CDM records. It supports both JSON Bundle paste and NDJSON file upload, with resource preview, mapping coverage metrics, and error logging.
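The two input formats differ in shape. A FHIR R4 JSON Bundle wraps resources in `entry[].resource`. NDJSON (as used by FHIR Bulk Data) places one resource per line. The sketch below tallies resource types from either format so you can see what you are about to submit. It does not describe how Parthenon parses or previews input.

```python
import json
from collections import Counter


def count_bundle_resources(bundle_json: str) -> Counter:
    """Count resourceType values inside a FHIR Bundle's entry[].resource."""
    bundle = json.loads(bundle_json)
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


def count_ndjson_resources(ndjson_text: str) -> Counter:
    """Count resourceType values in NDJSON text (one resource per non-blank line)."""
    counts: Counter = Counter()
    for line_no, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            resource = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
        counts[resource["resourceType"]] += 1
    return counts
```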

For complete documentation of the FHIR ingestion workflow, resource type mappings, and API reference, see the dedicated FHIR Ingestion page.

Common Workflow: Profile, Generate, Verify

A typical workflow using all three tabs:

Profile your target source using the Source Profiler tab to establish a baseline. Note any existing tables and row counts.
Generate synthetic data using the Synthea Generator tab (or ingest real data via FHIR) to populate the CDM.
Profile again after loading to verify that expected tables are populated, null rates are acceptable, and row counts match expectations.
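Steps 1 and 3 come down to comparing two profiles. The sketch below assumes you have copied per-table row counts from each scan into a mapping. It reports the change per table and any expected table that is still empty. The function names and counts are hypothetical.

```python
def compare_profiles(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Row-count change per table; tables missing from a profile count as 0."""
    tables = sorted(set(before) | set(after))
    return {t: after.get(t, 0) - before.get(t, 0) for t in tables}


def empty_tables(after: dict[str, int], expected: list[str]) -> list[str]:
    """Expected tables that still have no rows after loading."""
    return [t for t in expected if after.get(t, 0) == 0]


baseline = {"person": 0, "visit_occurrence": 0}
post_load = {"person": 100, "visit_occurrence": 4200, "measurement": 15800}
print(compare_profiles(baseline, post_load))
print(empty_tables(post_load, ["person", "visit_occurrence", "drug_exposure"]))
```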

Use for Development and Testing

The ETL Tools are particularly useful during development. Generate a Synthea dataset for your test environment, then use the Source Profiler to verify the data loaded correctly before running Achilles characterization or cohort generation.

Service Dependencies

The ETL Tools page depends on several backend services:

Service	Required By	Health Check
WhiteRabbit	Source Profiler tab	Shown as status badge
Synthea ETL (R Runtime)	Synthea Generator tab	Shown as status badge
FHIR Ingestion Service	FHIR Ingestion tab	Shown as status badge

If a service is unavailable, its corresponding tab will display a warning and operations will fail. Check the System Health Dashboard for service status details.
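The dependency table maps one service to each tab. Given the status strings shown on the badges, you can tell in advance which tabs will fail. The sketch assumes the healthy status reads exactly "available", as described for the WhiteRabbit and Synthea ETL indicators. It also treats any missing or other status as unhealthy.

```python
# Tab -> backend service, taken from the Service Dependencies table.
TAB_DEPENDENCIES = {
    "Source Profiler": "WhiteRabbit",
    "Synthea Generator": "Synthea ETL (R Runtime)",
    "FHIR Ingestion": "FHIR Ingestion Service",
}


def unavailable_tabs(service_status: dict[str, str]) -> list[str]:
    """Tabs whose backing service is not reported as 'available'."""
    return [
        tab
        for tab, service in TAB_DEPENDENCIES.items()
        if service_status.get(service) != "available"
    ]


print(unavailable_tabs({"WhiteRabbit": "available", "Synthea ETL (R Runtime)": "unavailable"}))
```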

Data Type Color Coding

Both the Source Profiler and Synthea results use color-coded type badges to help you quickly identify column categories:

Data Type	Color
varchar, text	Blue
integer, int, bigint	Teal
numeric, float, double	Gold
date, datetime, timestamp	Purple
boolean, bool	Orange
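The color table can be expressed as a simple lookup. This sketch adds an assumption that the table does not state. It normalizes the type name by lowercasing it and keeping only the leading word, so `VARCHAR(50)`, `double precision`, and `timestamp without time zone` still match. Types not listed return `None` because the page does not document a fallback color.

```python
import re
from typing import Optional

# Colors from the Data Type Color Coding table.
_COLOR_BY_TYPE = {
    "varchar": "Blue", "text": "Blue",
    "integer": "Teal", "int": "Teal", "bigint": "Teal",
    "numeric": "Gold", "float": "Gold", "double": "Gold",
    "date": "Purple", "datetime": "Purple", "timestamp": "Purple",
    "boolean": "Orange", "bool": "Orange",
}


def badge_color(data_type: str) -> Optional[str]:
    """Return the badge color for a column type, or None if it is not in the table.

    Assumption: only the leading alphabetic word of the lowercased type is
    used, so length arguments and qualifiers are ignored.
    """
    match = re.match(r"[a-z]+", data_type.strip().lower())
    if match is None:
        return None
    return _COLOR_BY_TYPE.get(match.group(0))


print(badge_color("VARCHAR(50)"), badge_color("double precision"))
```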

Error Handling

Each tab handles errors independently. When a scan, generation, or ingestion fails:

An error banner appears with a red background showing the failure reason
The operation can be retried without losing your configuration
Previous successful results remain visible until you clear them

Check Service Logs for Detailed Errors

The error messages shown in the UI are summaries. For detailed stack traces and debugging information, check the Docker service logs:

WhiteRabbit: `docker compose logs -f php`
Synthea ETL: `docker compose logs -f r-runtime`
FHIR Ingestion: `docker compose logs -f php`

Permissions

Access to the ETL Tools page requires the etl:manage permission. By default, this permission is granted to users with the data-engineer or super-admin role. Standard researcher accounts do not have access to ETL operations. Contact your system administrator to request access if needed.
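The access rule is one permission checked against a user's roles. The sketch below encodes only the default grants described above. The `researcher` role name is a hypothetical placeholder for a standard researcher account, and administrators may change these grants in a real deployment.

```python
REQUIRED_PERMISSION = "etl:manage"

# Default grants described on this page; "researcher" is a placeholder name.
DEFAULT_ROLE_PERMISSIONS: dict[str, set[str]] = {
    "data-engineer": {"etl:manage"},
    "super-admin": {"etl:manage"},
    "researcher": set(),
}


def can_access_etl_tools(roles: list[str]) -> bool:
    """True if any of the user's roles grants etl:manage under the default mapping."""
    return any(
        REQUIRED_PERMISSION in DEFAULT_ROLE_PERMISSIONS.get(role, set())
        for role in roles
    )


print(can_access_etl_tools(["researcher"]), can_access_etl_tools(["data-engineer"]))
```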

Related Documentation

Source Profiler (full) — Advanced profiling with heatmap, scorecard, and history
FHIR Ingestion (full) — Complete FHIR workflow with API reference
Schema Mapping — Mapping source schemas to OMOP CDM
Concept Mapping — Mapping source codes to OMOP concepts
Mapping Assistant (Ariadne) — AI-powered concept mapping