
Uploading Data

Parthenon includes a production-grade data ingestion pipeline for uploading source data and transforming it into OMOP CDM v5.4 format. This chapter covers the upload step -- the first stage of the ETL (Extract, Transform, Load) process. See Chapter 16 -- Schema Mapping and Chapter 17 -- Concept Mapping for the subsequent transformation and vocabulary mapping steps.

Supported Upload Formats

Parthenon accepts the most common data interchange formats used in healthcare and clinical research:

| Format | Extension | Description | Best For |
| --- | --- | --- | --- |
| CSV | .csv | Comma-separated values (UTF-8 encoding required) | Claims data, registry exports |
| TSV | .tsv | Tab-separated values | Lab extracts, delimiter-sensitive data |
| Parquet | .parquet | Apache Parquet columnar format | Large datasets, high performance |
| Excel | .xlsx | Single-sheet Excel workbooks (converted to CSV internally) | Ad-hoc uploads, small datasets |

File size limits

The web upload interface supports files up to 500 MB each, with a maximum of 20 files per batch. For larger datasets, use the CLI bulk loader described below. Parquet files are strongly recommended for datasets exceeding 100 MB due to their columnar compression and faster parsing.

Uploading Files via the Web Interface

  1. Navigate to Data Ingestion > Upload Data from the main navigation.
  2. Click New Upload Batch and enter a descriptive Batch Name (e.g., "Claims Q4 2025" or "EHR Extract - Site Alpha").
  3. Select a Target Data Source -- the configured data source that will receive the ingested data. Only sources where you have write access are shown.
  4. Drag and drop files into the upload area, or click Browse Files to select from your local filesystem.
  5. Click Start Upload. Files are uploaded to the server and staged in the ingestion schema.

Each file displays a real-time progress bar during upload. After upload completes, the system parses the file and shows:

  • Row count -- total data rows detected (excluding headers)
  • Column headers -- detected column names with inferred data types
  • File size -- original and on-disk size
  • Parse status -- success, warnings (e.g., encoding issues), or failure
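The parse summary above can be approximated with a few lines of standard-library Python. This is a simplified sketch for intuition only: the function names are hypothetical and Parthenon's actual type-inference rules are not documented here.

```python
import csv
import io

def inspect_csv(text):
    """Parse CSV text the way an upload preview might: count data rows
    (header excluded) and infer a naive type per column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)

    def infer(values):
        # Naive inference: all-integer -> integer, all-float -> numeric,
        # anything else -> text. Empty cells are ignored.
        if all(v.lstrip("-").isdigit() for v in values if v):
            return "integer"
        try:
            for v in values:
                if v:
                    float(v)
            return "numeric"
        except ValueError:
            return "text"

    columns = list(zip(*rows)) if rows else [()] * len(header)
    types = {name: infer(col) for name, col in zip(header, columns)}
    return {"row_count": len(rows), "columns": types}
```

For example, a two-row claims extract with columns `person_id,weight,site` would report a row count of 2 with `person_id` inferred as integer, `weight` as numeric, and `site` as text.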

Naming conventions

Use descriptive filenames that indicate the source system and content type (e.g., claims_pharmacy_2025q4.csv rather than data_export.csv). Parthenon preserves the original filename for traceability throughout the ETL pipeline.

Staging Schema

Uploaded files are stored in a staging schema (configurable per source, defaults to staging) as raw tables named after the uploaded filenames (sanitized to valid PostgreSQL identifiers). Key properties of the staging layer:

  • No transformation -- raw data is preserved exactly as uploaded. Column names, data types, and values remain unchanged.
  • Isolation -- each upload batch creates its own set of staging tables, preventing collisions between concurrent uploads.
  • Retention -- staging data is retained until explicitly purged by an administrator or replaced by a subsequent upload to the same batch.
  • Inspection -- you can preview any staging table directly from the upload detail page to verify data before mapping.
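The filename-to-table sanitization mentioned above can be sketched as follows. The exact rules Parthenon applies are not specified here, so treat this as an assumption-laden illustration; it only reflects two real PostgreSQL constraints (identifiers cannot start with a digit and are truncated to 63 bytes).

```python
import re

def to_staging_table(filename):
    """Approximate how an uploaded filename might become a valid
    PostgreSQL identifier for its staging table (hypothetical rules)."""
    name = filename.rsplit(".", 1)[0].lower()      # drop extension
    name = re.sub(r"[^a-z0-9_]", "_", name)        # replace invalid chars
    if not re.match(r"[a-z_]", name):              # identifiers can't start with a digit
        name = "t_" + name
    return name[:63]                               # PostgreSQL's identifier length limit
```

Under these assumptions, `Claims-Pharmacy 2025Q4.csv` would stage as `claims_pharmacy_2025q4`.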

Bulk Loader CLI

For large datasets that exceed the web upload limit, or for automated ETL pipelines, use the Artisan CLI bulk loader:

php artisan ingestion:load \
--source=my_source \
--batch="Claims Q4 2025" \
--path=/data/claims/q4-2025/

This command reads all CSV and Parquet files from the specified directory and loads them directly into the staging schema using PostgreSQL COPY for maximum throughput, bypassing the web upload size limit entirely.
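To make the COPY-based approach concrete, the statement a loader might issue per CSV file looks like the sketch below. This is illustrative PostgreSQL syntax, not Parthenon's actual SQL; the schema and table names are placeholders.

```python
def copy_statement(schema, table, delimiter=","):
    """Build the kind of per-file COPY ... FROM STDIN statement a bulk
    loader might stream file contents through (illustrative only)."""
    return (
        f'COPY "{schema}"."{table}" FROM STDIN '
        f"WITH (FORMAT csv, HEADER true, DELIMITER '{delimiter}')"
    )
```

COPY streams rows server-side rather than issuing one INSERT per row, which is why it outperforms the web upload path on large files.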

CLI options:

| Flag | Description | Default |
| --- | --- | --- |
| --source | Source key (from Data Sources configuration) | Required |
| --batch | Batch name for grouping | Required |
| --path | Directory containing data files | Required |
| --format | Force file format (csv, tsv, parquet) | Auto-detect |
| --delimiter | CSV delimiter character | , |
| --encoding | Input file encoding | UTF-8 |
| --truncate | Truncate existing staging tables before load | false |
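For automated pipelines, it can help to assemble the CLI invocation programmatically. This hypothetical helper only builds the argument list from the flags in the table above; pass the result to your process runner of choice.

```python
def build_load_command(source, batch, path, **options):
    """Assemble an artisan bulk-loader invocation as an argv list.
    Keyword arguments map to the optional flags (format, delimiter,
    encoding, truncate)."""
    cmd = [
        "php", "artisan", "ingestion:load",
        f"--source={source}",
        f"--batch={batch}",
        f"--path={path}",
    ]
    for flag, value in options.items():
        if value is True:                 # boolean flags like --truncate
            cmd.append(f"--{flag}")
        elif value is not None:
            cmd.append(f"--{flag}={value}")
    return cmd
```

A scheduler could then run it with, e.g., `subprocess.run(build_load_command("my_source", "Claims Q4 2025", "/data/claims/q4-2025/", truncate=True), check=True)`.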

Data privacy

Uploaded data is stored unencrypted in the staging schema within your PostgreSQL database. Ensure your database security, network access controls, and audit logging are properly configured before uploading patient data. Parthenon does not anonymize or de-identify uploaded data automatically. All upload actions are recorded in the Audit Log.

Upload History

Navigate to Data Ingestion > Upload History to view all past upload batches. The history view shows:

| Column | Description |
| --- | --- |
| Batch Name | User-provided name for the upload batch |
| Target Source | Data source receiving the data |
| Files | Number of files in the batch |
| Total Rows | Sum of rows across all files |
| Status | Staged / Mapped / Imported / Failed |
| Uploaded By | User who performed the upload |
| Date | Upload timestamp |

Click any batch to view per-file statistics, preview staging data, and proceed to the Schema Mapping step.

Troubleshooting Uploads

| Symptom | Likely Cause | Resolution |
| --- | --- | --- |
| Upload fails at 100% | Server timeout for large files | Use the CLI bulk loader instead |
| Garbled characters in preview | Non-UTF-8 encoding | Re-export source data as UTF-8, or specify encoding in CLI |
| Zero rows detected | Header-only file or wrong delimiter | Check file contents and delimiter settings |
| Duplicate batch error | Batch name already exists for this source | Use a unique batch name or append a timestamp |

Do not upload PHI to development environments

If you are running Parthenon in a development or demo environment without proper HIPAA safeguards, do not upload real patient data. Use synthetic datasets such as SynPUF or Synthea for testing.