
Uploading Data

Parthenon includes a production-grade data ingestion pipeline for uploading source data and transforming it into OMOP CDM v5.4 format. This chapter covers the upload step -- the first stage of the ETL (Extract, Transform, Load) process. See Chapter 16 -- Schema Mapping and Chapter 17 -- Concept Mapping for the subsequent transformation and vocabulary mapping steps.

Supported Upload Formats

Parthenon accepts the most common data interchange formats used in healthcare and clinical research:

| Format | Extension | Description | Best For |
| --- | --- | --- | --- |
| CSV | .csv | Comma-separated values (UTF-8 encoding required) | Claims data, registry exports |
| TSV | .tsv | Tab-separated values | Lab extracts, delimiter-sensitive data |
| Parquet | .parquet | Apache Parquet columnar format | Large datasets, high performance |
| Excel | .xlsx | Single-sheet Excel workbooks (converted to CSV internally) | Ad-hoc uploads, small datasets |

File size limits

The web upload interface supports files up to 500 MB each, with a maximum of 20 files per batch. For larger datasets, use the CLI bulk loader described below. Parquet files are strongly recommended for datasets exceeding 100 MB due to their columnar compression and faster parsing.

Uploading Files via the Web Interface

  1. Navigate to Data Ingestion > Upload Data from the main navigation.
  2. Click New Upload Batch and enter a descriptive Batch Name (e.g., "Claims Q4 2025" or "EHR Extract - Site Alpha").
  3. Select a Target Data Source -- the configured data source that will receive the ingested data. Only sources where you have write access are shown.
  4. Drag and drop files into the upload area, or click Browse Files to select from your local filesystem.
  5. Click Start Upload. Files are uploaded to the server and staged in the ingestion schema.

Each file displays a real-time progress bar during upload. After upload completes, the system parses the file and shows:

  • Row count -- total data rows detected (excluding headers)
  • Column headers -- detected column names with inferred data types
  • File size -- original and on-disk size
  • Parse status -- success, warnings (e.g., encoding issues), or failure
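The parse summary above can be approximated with a few lines of standard-library Python. This is a simplified sketch for intuition only: the function names are hypothetical and Parthenon's actual type-inference rules are not documented here.

```python
import csv
import io

def inspect_csv(text):
    """Parse CSV text the way an upload preview might: count data rows
    (header excluded) and infer a naive type per column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)

    def infer(values):
        # Naive inference: all-integer -> integer, all-float -> numeric,
        # anything else -> text. Empty cells are ignored.
        if all(v.lstrip("-").isdigit() for v in values if v):
            return "integer"
        try:
            for v in values:
                if v:
                    float(v)
            return "numeric"
        except ValueError:
            return "text"

    columns = list(zip(*rows)) if rows else [()] * len(header)
    types = {name: infer(col) for name, col in zip(header, columns)}
    return {"row_count": len(rows), "columns": types}
```

For example, a two-row claims extract with columns `person_id,weight,site` would report a row count of 2 with `person_id` inferred as integer, `weight` as numeric, and `site` as text.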

Naming conventions

Use descriptive filenames that indicate the source system and content type (e.g., claims_pharmacy_2025q4.csv rather than data_export.csv). Parthenon preserves the original filename for traceability throughout the ETL pipeline.

Staging Schema

Uploaded files are stored in a staging schema (configurable per source, defaults to staging) as raw tables named after the uploaded filenames (sanitized to valid PostgreSQL identifiers). Key properties of the staging layer:

  • No transformation -- raw data is preserved exactly as uploaded. Column names, data types, and values remain unchanged.
  • Isolation -- each upload batch creates its own set of staging tables, preventing collisions between concurrent uploads.
  • Retention -- staging data is retained until explicitly purged by an administrator or replaced by a subsequent upload to the same batch.
  • Inspection -- you can preview any staging table directly from the upload detail page to verify data before mapping.
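The filename-to-table sanitization mentioned above can be sketched as follows. The exact rules Parthenon applies are not specified here, so treat this as an assumption-laden illustration; it only reflects two real PostgreSQL constraints (identifiers cannot start with a digit and are truncated to 63 bytes).

```python
import re

def to_staging_table(filename):
    """Approximate how an uploaded filename might become a valid
    PostgreSQL identifier for its staging table (hypothetical rules)."""
    name = filename.rsplit(".", 1)[0].lower()      # drop extension
    name = re.sub(r"[^a-z0-9_]", "_", name)        # replace invalid chars
    if not re.match(r"[a-z_]", name):              # identifiers can't start with a digit
        name = "t_" + name
    return name[:63]                               # PostgreSQL's identifier length limit
```

Under these assumptions, `Claims-Pharmacy 2025Q4.csv` would stage as `claims_pharmacy_2025q4`.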

Bulk Loader CLI

For large datasets that exceed the web upload limit, or for automated ETL pipelines, use the Artisan CLI bulk loader:

php artisan ingestion:load \
--source=my_source \
--batch="Claims Q4 2025" \
--path=/data/claims/q4-2025/

This command reads all CSV and Parquet files from the specified directory and loads them directly into the staging schema using PostgreSQL COPY for maximum throughput, bypassing the web upload size limit entirely.
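To make the COPY-based approach concrete, the statement a loader might issue per CSV file looks like the sketch below. This is illustrative PostgreSQL syntax, not Parthenon's actual SQL; the schema and table names are placeholders.

```python
def copy_statement(schema, table, delimiter=","):
    """Build the kind of per-file COPY ... FROM STDIN statement a bulk
    loader might stream file contents through (illustrative only)."""
    return (
        f'COPY "{schema}"."{table}" FROM STDIN '
        f"WITH (FORMAT csv, HEADER true, DELIMITER '{delimiter}')"
    )
```

COPY streams rows server-side rather than issuing one INSERT per row, which is why it outperforms the web upload path on large files.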

CLI options:

| Flag | Description | Default |
| --- | --- | --- |
| --source | Source key (from Data Sources configuration) | Required |
| --batch | Batch name for grouping | Required |
| --path | Directory containing data files | Required |
| --format | Force file format (csv, tsv, parquet) | Auto-detect |
| --delimiter | CSV delimiter character | , |
| --encoding | Input file encoding | UTF-8 |
| --truncate | Truncate existing staging tables before load | false |
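For automated pipelines, it can help to assemble the CLI invocation programmatically. This hypothetical helper only builds the argument list from the flags in the table above; pass the result to your process runner of choice.

```python
def build_load_command(source, batch, path, **options):
    """Assemble an artisan bulk-loader invocation as an argv list.
    Keyword arguments map to the optional flags (format, delimiter,
    encoding, truncate)."""
    cmd = [
        "php", "artisan", "ingestion:load",
        f"--source={source}",
        f"--batch={batch}",
        f"--path={path}",
    ]
    for flag, value in options.items():
        if value is True:                 # boolean flags like --truncate
            cmd.append(f"--{flag}")
        elif value is not None:
            cmd.append(f"--{flag}={value}")
    return cmd
```

A scheduler could then run it with, e.g., `subprocess.run(build_load_command("my_source", "Claims Q4 2025", "/data/claims/q4-2025/", truncate=True), check=True)`.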

Data privacy

Uploaded data is stored unencrypted in the staging schema within your PostgreSQL database. Ensure your database security, network access controls, and audit logging are properly configured before uploading patient data. Parthenon does not anonymize or de-identify uploaded data automatically. All upload actions are recorded in the Audit Log.

Upload History

Navigate to Data Ingestion > Upload History to view all past upload batches. The history view shows:

| Column | Description |
| --- | --- |
| Batch Name | User-provided name for the upload batch |
| Target Source | Data source receiving the data |
| Files | Number of files in the batch |
| Total Rows | Sum of rows across all files |
| Status | Staged / Mapped / Imported / Failed |
| Uploaded By | User who performed the upload |
| Date | Upload timestamp |

Click any batch to view per-file statistics, preview staging data, and proceed to the Schema Mapping step.

Troubleshooting Uploads

| Symptom | Likely Cause | Resolution |
| --- | --- | --- |
| Upload fails at 100% | Server timeout for large files | Use the CLI bulk loader instead |
| Garbled characters in preview | Non-UTF-8 encoding | Re-export source data as UTF-8, or specify encoding in CLI |
| Zero rows detected | Header-only file or wrong delimiter | Check file contents and delimiter settings |
| Duplicate batch error | Batch name already exists for this source | Use a unique batch name or append a timestamp |

Do not upload PHI to development environments

If you are running Parthenon in a development or demo environment without proper HIPAA safeguards, do not upload real patient data. Use synthetic datasets such as SynPUF or Synthea for testing.