
Patient-Level Prediction

Patient-Level Prediction (PLP) builds machine learning models that predict the probability of a clinical outcome for individual patients. Given a target population and an outcome, PLP extracts features from the OMOP CDM, trains classifiers, evaluates performance on held-out data, and produces calibrated risk scores. PLP implements the OHDSI HADES PatientLevelPrediction R package interface, enabling standardized, reproducible predictive modeling.

Configuration available; execution connecting to R runtime

The PLP design UI is fully functional --- you can configure target/outcome cohorts, select model types, tune hyperparameters, and set population and covariate settings. Execution is being connected to the R runtime container where the HADES PatientLevelPrediction package performs the actual model training and evaluation. In the interim, designs can be exported as R-ready configuration objects.


What PLP Does

The PLP pipeline performs the following steps:

  1. Population definition: Identify patients in the target cohort who meet the study criteria (observation requirements, no prior outcome, etc.)
  2. Feature extraction: Extract a large feature matrix from the CDM --- demographics, conditions, drugs, procedures, measurements --- typically producing 10,000+ binary and continuous features per patient
  3. Data splitting: Divide patients into training and test sets (or use cross-validation)
  4. Model training: Train one or more classifiers on the training set
  5. Model evaluation: Assess performance on the held-out test set using discrimination, calibration, and clinical utility metrics
  6. Risk scoring: Produce a calibrated predicted probability for each patient

Model Types

Parthenon supports five model architectures, ranging from interpretable linear models to complex ensemble and deep learning approaches:

| Model | Type | Key Characteristics | Best For |
|---|---|---|---|
| LASSO Logistic Regression | Linear | L1-regularized; automatic feature selection; most interpretable; coefficient-based explanation | Default choice; regulatory submissions; situations requiring model explainability |
| Random Forest | Ensemble | Collection of decision trees; robust to non-linearity and interactions; variable importance ranking | Moderate-sized datasets; exploration of non-linear relationships |
| Gradient Boosting (XGBoost) | Ensemble | Sequential boosting of weak learners; often highest accuracy; tunable complexity | Large datasets; maximizing predictive performance |
| Deep Learning (ResNet) | Neural Network | Deep residual network; can capture complex patterns; requires more data and compute | Very large datasets (100K+ patients); GPU-accelerated environments |
| AdaBoost | Ensemble | Adaptive boosting; focuses training on misclassified examples; simpler than XGBoost | Moderate datasets; alternative ensemble approach |

Start with LASSO

LASSO logistic regression should be your default first model. It is fast, interpretable (you can inspect which features drive predictions), and often performs competitively with more complex models. Only move to ensemble or deep learning models if LASSO performance is insufficient and you have enough data to support the additional complexity.
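To make the interpretability claim concrete, here is a minimal pure-Python sketch of L1-regularized logistic regression fit by proximal gradient descent. This is an illustration of why LASSO performs automatic feature selection, not the PatientLevelPrediction implementation (the HADES stack uses an optimized solver); all function names here are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(v, t):
    # L1 proximal operator: shrink toward zero, clipping at zero
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_lasso_logistic(X, y, lam=0.1, lr=0.5, epochs=500):
    """L1-regularized logistic regression via proximal gradient descent.
    The intercept b is left unpenalized, as is conventional."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        # residual = predicted probability minus observed label, per patient
        r = [sigmoid(b + sum(wj * xij for wj, xij in zip(w, xi))) - yi
             for xi, yi in zip(X, y)]
        b -= lr * sum(r) / n
        for j in range(p):
            g = sum(ri * xi[j] for ri, xi in zip(r, X)) / n
            w[j] = soft_threshold(w[j] - lr * g, lr * lam)
    return b, w

# Toy data: feature 0 drives the outcome, feature 1 is pure noise.
X = [[x0, x1] for x0 in (-2, -1, 1, 2) for x1 in (-1, 0, 1)]
y = [1 if xi[0] > 0 else 0 for xi in X]
b, w = fit_lasso_logistic(X, y)
# w[0] ends up clearly positive; the noise coefficient w[1] is shrunk to zero
```

The soft-threshold step is the part that zeroes out uninformative coefficients, which is exactly what makes the fitted model easy to inspect.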


Creating a Prediction Analysis

  1. Navigate to Analyses and select the Predictions tab.
  2. Click New Analysis and select Prediction.
  3. Configure the design on the detail page.

Cohort Configuration

| Setting | Description |
|---|---|
| Target cohort | The population in which predictions will be made (e.g., patients hospitalized for heart failure) |
| Outcome cohort | The event to predict (e.g., 30-day readmission, 1-year mortality) |

Time at Risk

The time at risk defines the prediction window --- the period during which the outcome must occur to be counted as a positive case:

| Setting | Description | Default |
|---|---|---|
| Start | Days after cohort start to begin observation | 1 |
| End | Days after the anchor to end observation | 365 |
| End anchor | cohort start or cohort end | cohort start |

Example configurations:

| Prediction question | Start | End | End anchor |
|---|---|---|---|
| 30-day readmission after discharge | 1 | 30 | cohort start |
| 1-year mortality after T2DM diagnosis | 1 | 365 | cohort start |
| 90-day adverse event after drug start | 1 | 90 | cohort start |
| Event during active treatment | 1 | 0 | cohort end |
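The window logic above can be sketched as a small labeling function. This is an illustrative pure-Python sketch using integer day offsets in place of dates, not the R package's code:

```python
def time_at_risk_label(cohort_start, cohort_end, outcome_days,
                       risk_start=1, risk_end=365, end_anchor="cohort start"):
    """Label a patient 1 if any outcome falls inside the time-at-risk window.
    All arguments are integer day offsets. The window start is always anchored
    to cohort start; the window end anchors to cohort start or cohort end."""
    window_start = cohort_start + risk_start
    anchor = cohort_start if end_anchor == "cohort start" else cohort_end
    window_end = anchor + risk_end
    return int(any(window_start <= d <= window_end for d in outcome_days))

# 30-day readmission after discharge (index = day 0):
time_at_risk_label(0, 0, [12], risk_start=1, risk_end=30)  # 1: day 12 counts
time_at_risk_label(0, 0, [45], risk_start=1, risk_end=30)  # 0: outside the window

# Event during active treatment (start=1, end=0, anchored to cohort end):
time_at_risk_label(0, 90, [60], risk_start=1, risk_end=0,
                   end_anchor="cohort end")  # 1: day 60 is during treatment
```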

Model Configuration

| Setting | Description | Default |
|---|---|---|
| Model type | One of the five model types | lasso_logistic_regression |
| Hyperparameters | Model-specific tuning parameters (optional) | Auto-tuned |

Model-specific hyperparameters:

| Model | Key Hyperparameters |
|---|---|
| LASSO LR | Regularization strength (lambda), convergence tolerance |
| Random Forest | Number of trees, max depth, min samples per leaf |
| XGBoost | Number of rounds, learning rate, max depth, subsample ratio |
| Deep Learning | Layer sizes, dropout rate, learning rate, epochs, batch size |
| AdaBoost | Number of estimators, learning rate, base estimator type |

Automatic hyperparameter tuning

When hyperparameters are left at their defaults (empty {}), the R runtime uses cross-validation on the training set to automatically select optimal values. This is recommended for most use cases. Manual hyperparameter specification is available for advanced users who want to replicate specific model configurations.
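The tuning loop itself is conceptually simple. Here is a minimal pure-Python sketch of k-fold cross-validated grid search over a single hyperparameter; the "model" is a trivial threshold rule standing in for a real learner, and none of this is the actual runtime code:

```python
def cross_validated_grid_search(xs, ys, candidates, k=4):
    """Return the candidate hyperparameter value with the best mean
    validation accuracy across k deterministic folds."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin folds

    def accuracy(threshold, idx):
        # toy "model": predict positive when the feature exceeds the threshold
        return sum((xs[i] >= threshold) == bool(ys[i]) for i in idx) / len(idx)

    return max(candidates,
               key=lambda t: sum(accuracy(t, fold) for fold in folds) / k)

xs = list(range(12))
ys = [1 if x >= 5 else 0 for x in xs]
best = cross_validated_grid_search(xs, ys, candidates=[2, 5, 8])  # picks 5
```

A real run evaluates each hyperparameter combination the same way, just with an actual classifier in place of the threshold rule.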

Covariate Settings

Configure which CDM features are included in the feature matrix:

| Setting | Description | Default |
|---|---|---|
| Demographics | Age, gender, race, ethnicity, year of birth | Enabled |
| Condition occurrence | Binary indicators for each condition concept | Enabled |
| Drug exposure | Binary indicators for each drug concept | Enabled |
| Procedure occurrence | Binary indicators for each procedure concept | Disabled |
| Measurement | Measurement values and binary indicators | Disabled |
| Time windows | Lookback periods for feature extraction | [-365, 0] |
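The indicator-feature construction can be sketched in a few lines. This is an illustrative pure-Python sketch with a hypothetical (day, concept_id) record format; the real extraction is performed by the HADES FeatureExtraction package:

```python
def extract_binary_covariates(records, index_day, window=(-365, 0)):
    """Turn (day, concept_id) clinical records into binary indicator
    features, keeping only records whose day offset relative to the
    index date falls inside the lookback window."""
    lo, hi = window
    return {concept_id: 1
            for day, concept_id in records
            if lo <= day - index_day <= hi}

# Illustrative OMOP concept IDs:
records = [(-30, 201826),   # condition 30 days before index: kept
           (-400, 316866),  # outside the 365-day lookback: dropped
           (10, 4329847)]   # after index: dropped to prevent leakage
features = extract_binary_covariates(records, index_day=0)
```

Note that the `[-365, 0]` window excludes anything after day 0, which is what keeps post-index information out of the feature matrix.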

Population Settings

Population settings control which patients from the target cohort are included in the modeling dataset:

| Setting | Description | Default |
|---|---|---|
| Washout period | Days of required observation before index | 365 |
| Remove prior outcome | Exclude patients with the outcome before index | Enabled |
| Require time at risk | Require minimum observation after index | Enabled |
| Minimum time at risk | Minimum days of post-index observation required | 365 |

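The settings combine as a simple sieve over the target cohort. A minimal sketch, with hypothetical field names standing in for the real patient-level data:

```python
def build_study_population(patients, washout=365, remove_prior_outcome=True,
                           min_time_at_risk=365):
    """Apply the population settings to target-cohort patients. Each patient
    dict records observed days before/after index and whether the outcome
    occurred before index."""
    return [p for p in patients
            if p["obs_before_index"] >= washout
            and not (remove_prior_outcome and p["prior_outcome"])
            and p["obs_after_index"] >= min_time_at_risk]

patients = [
    {"id": 1, "obs_before_index": 400, "prior_outcome": False, "obs_after_index": 500},
    {"id": 2, "obs_before_index": 100, "prior_outcome": False, "obs_after_index": 500},  # fails washout
    {"id": 3, "obs_before_index": 400, "prior_outcome": True,  "obs_after_index": 500},  # prior outcome
    {"id": 4, "obs_before_index": 400, "prior_outcome": False, "obs_after_index": 90},   # too little follow-up
]
kept = build_study_population(patients)  # only patient 1 survives
```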
Outcome prevalence matters

If the outcome is very rare (< 1% of the target population), prediction models may struggle to learn meaningful patterns. Consider:

  • Extending the time at risk window
  • Broadening the outcome definition
  • Using AUPRC (which handles class imbalance better) rather than AUROC as the primary metric

Split Settings

| Setting | Description | Default |
|---|---|---|
| Test fraction | Proportion of patients held out for evaluation | 0.25 |
| Split seed | Random seed for reproducible train/test splitting | 42 |
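A sketch of what the two settings control (illustrative Python; the actual split happens in the R runtime):

```python
import random

def split_patients(ids, test_fraction=0.25, seed=42):
    """Shuffle patient IDs with a fixed seed and hold out a test set.
    The same seed always reproduces the same partition."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_patients(range(100))
# len(train) == 75, len(test) == 25, and rerunning with the same
# seed yields exactly the same partition
```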

Performance Metrics

All PLP models are evaluated on a comprehensive set of metrics:

Discrimination

| Metric | What it measures | Ideal value |
|---|---|---|
| AUROC | Probability that a random positive case is ranked higher than a random negative case | 1.0 (random = 0.5) |
| AUPRC | Area under the precision-recall curve; preferred for rare outcomes | 1.0 (random = outcome prevalence) |
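AUROC's probabilistic reading translates directly into code. A tiny pure-Python sketch of the pairwise definition (real implementations use the rank-based equivalent for speed):

```python
def auroc(y_true, y_score):
    """Probability that a randomly chosen positive case is scored above
    a randomly chosen negative case; ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75: one positive is
# out-ranked by one negative; the other three comparisons are wins
```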

Calibration

| Metric | What it measures | Ideal value |
|---|---|---|
| Calibration plot | Expected vs. observed event rates across deciles of predicted risk | Points on the 45-degree line |
| Brier score | Mean squared error of probabilistic predictions | 0.0 (lower is better) |
| E-statistic | Average absolute calibration error across deciles | 0.0 (lower is better) |
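Both summaries are easy to compute from predictions. A pure-Python sketch; the binning here is a simplified decile-style stand-in for the package's calibration-error estimate:

```python
def brier_score(y_true, y_prob):
    """Mean squared error of the predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def mean_calibration_error(y_true, y_prob, n_bins=10):
    """Average |mean predicted - observed event rate| over risk-sorted bins."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    size = max(1, len(order) // n_bins)
    bins = [order[i:i + size] for i in range(0, len(order), size)]
    gaps = [abs(sum(y_prob[i] for i in b) / len(b) -
                sum(y_true[i] for i in b) / len(b)) for b in bins]
    return sum(gaps) / len(gaps)

# A perfectly calibrated (if useless) model: everyone gets 0.5 and
# half the patients actually have the outcome.
y = [0, 1] * 10
p = [0.5] * 20
brier_score(y, p)             # 0.25
mean_calibration_error(y, p)  # 0.0
```

This example also shows why calibration and discrimination are separate questions: the model above is perfectly calibrated yet discriminates no better than chance.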

Clinical Utility

| Metric | What it measures |
|---|---|
| Net benefit | Decision-curve analysis across probability thresholds; compares model to "treat all" and "treat none" strategies |
| Sensitivity/Specificity at threshold | Performance at clinically meaningful probability cutoffs |
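Net benefit at a threshold pt weighs true positives against false positives at the exchange rate pt / (1 - pt). A minimal sketch of the standard decision-curve formula:

```python
def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt/(1-pt).
    'Treat none' scores 0; 'treat all' is the same formula with
    every patient classified positive."""
    n = len(y_true)
    flagged = [(y, p >= threshold) for y, p in zip(y_true, y_prob)]
    tp = sum(1 for y, f in flagged if f and y == 1)
    fp = sum(1 for y, f in flagged if f and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

y = [1, 1, 0, 0]
p = [0.9, 0.8, 0.2, 0.1]
net_benefit(y, p, 0.5)          # 0.5: the model's net benefit
net_benefit(y, [1.0] * 4, 0.5)  # 0.0: 'treat all' at this threshold
```

A model is clinically useful at a threshold only if its net benefit exceeds both reference strategies there.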

Interpretation Guidelines

| AUROC Range | Interpretation |
|---|---|
| 0.90--1.00 | Excellent discrimination |
| 0.80--0.90 | Good discrimination |
| 0.70--0.80 | Acceptable discrimination |
| 0.60--0.70 | Poor discrimination |
| 0.50--0.60 | Near random; model is not useful |

AUROC alone is insufficient

A high AUROC does not guarantee a useful model. A model with AUROC = 0.85 but poor calibration (predicted 10% risk when actual risk is 2%) is clinically dangerous. Always evaluate both discrimination AND calibration before deploying a model.


Privacy and Model Sharing

PLP models are trained on patient-level data within a data access boundary. Parthenon enforces the following privacy constraints:

  • Model objects can be exported (coefficients, tree structures, neural network weights) because they do not contain individual patient data.
  • Patient-level predictions cannot be exported from the platform without explicit authorization.
  • Minimum cell count settings suppress any aggregate statistics (feature prevalences, outcome counts) with small counts.
  • External validation (applying a model trained at one site to data at another site) requires the model object to be transferred, not the data.
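The minimum cell count rule is a simple gate applied before any aggregate leaves the data access boundary. An illustrative sketch (the field names and suppression value are hypothetical):

```python
def suppress_small_counts(aggregates, min_cell_count=5):
    """Replace any aggregate count below the minimum cell size with None
    so that small groups cannot be re-identified from exported statistics."""
    return {name: count if count >= min_cell_count else None
            for name, count in aggregates.items()}

suppress_small_counts({"outcome_events": 812, "rare_subgroup": 3})
# {'outcome_events': 812, 'rare_subgroup': None}
```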

Use Cases

| Use Case | Target Cohort | Outcome | Time at Risk |
|---|---|---|---|
| Hospital readmission | Patients discharged from inpatient visit | 30-day all-cause readmission | 1--30 days |
| Cardiovascular risk | T2DM patients initiating therapy | Major adverse cardiovascular event | 1--365 days |
| Treatment response | Cancer patients starting immunotherapy | Treatment response at 90 days | 1--90 days |
| Surgical complication | Patients undergoing joint replacement | Post-operative infection | 1--90 days |
| Disease progression | Early CKD patients | Progression to Stage 4+ | 1--730 days |
| Propensity scoring | (Any two treatment groups) | Treatment assignment | At index |

Best Practices

  1. Define the question precisely: The prediction question must specify WHO (target), WHAT (outcome), and WHEN (time at risk) before any modeling begins.

  2. Ensure adequate sample size: As a rough guide, you need at least 100 outcome events in the training set for LASSO, and 500+ for ensemble/deep learning methods.

  3. Avoid data leakage: Features must be derived only from data available BEFORE the prediction time point. The time window setting ensures this --- never include post-index features.

  4. Evaluate on held-out data: Never assess model performance on the data used for training. The test fraction setting ensures proper evaluation.

  5. Validate externally: A model trained on one database should be validated on an independent database before clinical deployment. Performance typically drops in external validation.

  6. Report all metrics: Discrimination (AUROC, AUPRC), calibration (Brier score, calibration plot), and clinical utility (net benefit) together --- not just AUROC alone.

  7. Consider clinical actionability: A prediction model is only useful if the predicted risk can lead to a different clinical action (intervention, monitoring, referral). Models predicting non-actionable outcomes have limited value.