Cohort Expressions
A cohort expression is the formal, machine-readable definition of who belongs to a cohort and when their membership begins and ends. Parthenon uses the CIRCE-BE cohort expression format --- the same standard used by OHDSI Atlas --- which describes cohort membership as a structured combination of clinical criteria evaluated against OMOP CDM v5.4 data.
Understanding cohort expressions is foundational. Every cohort you build in Parthenon --- whether through the visual builder or by importing JSON --- resolves to a CIRCE expression that gets compiled into SQL and executed against your CDM.
Anatomy of a Cohort Expression
Every CIRCE cohort expression is composed of seven logical sections. Not all sections are required, but together they give you full control over cohort membership logic.
1. Primary Criteria (Index Event)
The primary criteria define the anchoring clinical event --- the index event --- that qualifies a patient for cohort entry. The index event establishes T = 0 (the index date) from which all other temporal logic is measured.
Primary criteria specify:
- Domain: Which CDM table to search. Supported domains are:
  - ConditionOccurrence: diagnoses and conditions
  - DrugExposure: prescriptions, dispensings, administrations
  - ProcedureOccurrence: surgical and diagnostic procedures
  - Measurement: lab tests and vital signs
  - Observation: clinical observations
  - VisitOccurrence: encounters (inpatient, outpatient, ED)
  - Death: mortality events
- Concept Set: Which concepts must match (e.g., all concepts in the "Type 2 Diabetes" concept set)
- Observation Window: Required continuous observation before and/or after the index date. For example, requiring 365 days of prior observation ensures sufficient lookback for baseline characterization.
Examples of index events:
| Clinical scenario | Domain | Description |
|---|---|---|
| New T2DM diagnosis | ConditionOccurrence | First condition record matching T2DM concept set |
| Metformin initiation | DrugExposure | First drug exposure for metformin concepts |
| GI hemorrhage hospitalization | VisitOccurrence | Inpatient visit with a GI bleed diagnosis |
| Elevated HbA1c | Measurement | HbA1c measurement with value >= 7.0% |
The choice of index event fundamentally shapes the study. A "first diagnosis" index event produces an incident cohort (new cases), while allowing any occurrence produces a prevalent cohort (existing cases). For causal inference studies, incident/new-user designs are strongly preferred.
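As a minimal sketch of how domain, concept set, and observation window fit together, the following Python builds a primary criteria block for a "first metformin exposure" index event. The key names follow the JSON example later on this page; the helper `primary_criteria` is hypothetical and not part of Parthenon.

```python
import json

def primary_criteria(domain, codeset_id, prior_days=0, post_days=0, first=True):
    """Build a PrimaryCriteria block (hypothetical helper, not a Parthenon API)."""
    criterion = {"CodesetId": codeset_id}
    if first:
        # "First": true restricts the criterion to the earliest matching record.
        criterion["First"] = True
    return {
        "CriteriaList": [{domain: criterion}],
        "ObservationWindow": {"PriorDays": prior_days, "PostDays": post_days},
    }

# Incident (new-user) index event: first metformin exposure, 365 days lookback.
metformin_index = primary_criteria("DrugExposure", codeset_id=0, prior_days=365)
print(json.dumps(metformin_index, indent=2))
```

Passing `first=False` would leave every matching exposure eligible as an index event, which corresponds to the prevalent design described above.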
2. Inclusion Rules (Additional Criteria)
Inclusion rules are named logical expressions that must be satisfied relative to the index event for a patient to remain in the cohort. They are evaluated sequentially, and the attrition report shows how many patients survive each rule.
Each inclusion rule contains one or more criteria groups that define:
- Criteria: Clinical events from any CDM domain with concept set filters
- Temporal windows: Days relative to the index date. A window of [-365, -1] means "in the 365 days before index (exclusive of index day)." A window of [0, 30] means "at index or within 30 days after."
- Occurrence counts: How many times the criterion must be met:
  - At least N (type 2): most common, e.g., "at least 1 prior diagnosis"
  - Exactly N (type 0): e.g., "exactly 0 prior drug exposures" for new-user designs
  - At most N (type 1): upper bound
- Group logic: Criteria within a group combine with ALL (every criterion met), ANY (at least one met), or AT_MOST_0 (none met, i.e., exclusion)
- Nested groups: Groups can contain sub-groups for complex boolean logic
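The semantics of windows, occurrence counts, and group logic can be illustrated with a small Python sketch. This is not the compiled SQL Parthenon runs; the function names are hypothetical, and days are plain integers to keep the arithmetic visible.

```python
def in_window(event_day, index_day, window):
    """True if the event falls inside [start, end] days relative to index (inclusive)."""
    start, end = window
    return start <= event_day - index_day <= end

def occurrence_met(occ_type, count, n):
    """Occurrence types as listed above: 0 = exactly, 1 = at most, 2 = at least."""
    if occ_type == 0:
        return n == count
    if occ_type == 1:
        return n <= count
    if occ_type == 2:
        return n >= count
    raise ValueError(f"unknown occurrence type: {occ_type}")

def group_met(group_type, results):
    """Combine per-criterion booleans with ALL, ANY, or AT_MOST_0."""
    if group_type == "ALL":
        return all(results)
    if group_type == "ANY":
        return any(results)
    if group_type == "AT_MOST_0":
        return not any(results)
    raise ValueError(f"unknown group type: {group_type}")

index_day = 1000                    # arbitrary day number for the index date
diagnosis_days = [700, 990, 1000]   # the last one falls on the index day itself

# Count diagnoses in [-365, -1]; the index-day record is excluded.
n_prior = sum(in_window(d, index_day, (-365, -1)) for d in diagnosis_days)

has_prior_dx = occurrence_met(2, 1, n_prior)   # at least 1 prior diagnosis
no_prior_dx = occurrence_met(0, 0, n_prior)    # exactly 0 prior diagnoses
print(n_prior, has_prior_dx, no_prior_dx, group_met("ALL", [has_prior_dx, no_prior_dx]))
```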
| Pattern | Implementation |
|---|---|
| Prior diagnosis required | At least 1 ConditionOccurrence in concept set X, window [-365, -1] |
| New user (no prior exposure) | Exactly 0 DrugExposure in concept set X, window [-365, -1] |
| Minimum follow-up | At least 365 days of continuous observation after index |
| Age restriction | Demographic filter: age >= 18 at index |
| Exclude prior outcome | Group type AT_MOST_0 with the outcome concept set |
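To show how one of these patterns might be encoded, the sketch below builds the "new user" criterion as a Python dictionary. The `StartWindow`, `Occurrence`, `Days`, and `Coeff` keys are taken from the JSON example later on this page; the helper functions are hypothetical.

```python
def window_bound(days):
    """Encode a signed day offset as a Days/Coeff pair (Coeff -1 = before index)."""
    return {"Days": abs(days), "Coeff": -1 if days < 0 else 1}

def windowed_criterion(domain, codeset_id, start, end, occ_type, count):
    """Build one criterion with a temporal window and an occurrence count."""
    return {
        "Criteria": {domain: {"CodesetId": codeset_id}},
        "StartWindow": {"Start": window_bound(start), "End": window_bound(end)},
        "Occurrence": {"Type": occ_type, "Count": count},
    }

# New user: exactly 0 (type 0) drug exposures in the window [-365, -1].
new_user = windowed_criterion("DrugExposure", 0, -365, -1, occ_type=0, count=0)
print(new_user)
```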
3. Cohort Exit (End Strategy)
The end strategy defines when a patient's cohort membership ends. If no end strategy is specified, the default is end of continuous observation.
Available strategies:
| Strategy | Configuration | Use case |
|---|---|---|
| End of continuous observation | (default) | Pharmacoepi studies where you observe until data runs out |
| Fixed duration after index | DateField + Offset in days | Time-limited exposure windows (e.g., 90 days post-surgery) |
| Drug era end | DrugCodesetId + GapDays + Offset | Treatment cohorts where membership = active drug era |
For drug era-based exit, the GapDays parameter defines how many days of no drug supply are allowed before the era is considered ended. An Offset of 0 means exit on the exact era end date; a positive offset extends membership beyond era end.
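The interaction of GapDays and Offset can be sketched in Python as follows. This is an illustration under stated assumptions, not Parthenon's implementation: the index event is assumed to be the start of an exposure, and the gap is measured as the plain date difference between one exposure's end and the next exposure's start.

```python
from datetime import date, timedelta

def drug_era_end(exposures, index_date, gap_days, offset=0):
    """Return the cohort exit date for the drug era that starts at index_date.

    exposures: list of (start, end) date pairs for the drug concept set.
    """
    relevant = sorted(e for e in exposures if e[0] >= index_date)
    if not relevant:
        raise ValueError("no exposure starts on or after the index date")
    era_end = relevant[0][1]
    for start, end in relevant[1:]:
        if (start - era_end).days > gap_days:
            break  # gap too long: the era has ended
        era_end = max(era_end, end)
    return era_end + timedelta(days=offset)

exposures = [
    (date(2023, 1, 1), date(2023, 1, 30)),
    (date(2023, 2, 10), date(2023, 3, 11)),   # 11-day gap after the first
    (date(2023, 6, 1), date(2023, 6, 30)),    # 82-day gap after the second
]

print(drug_era_end(exposures, date(2023, 1, 1), gap_days=30))            # 2023-03-11
print(drug_era_end(exposures, date(2023, 1, 1), gap_days=30, offset=7))  # 2023-03-18
print(drug_era_end(exposures, date(2023, 1, 1), gap_days=0))             # 2023-01-30
```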
4. Qualified Limit
The QualifiedLimit controls how many qualifying events per patient are considered before inclusion rules are applied:
- "First": only the earliest qualifying event per patient (most common for incident cohorts)
- "All": all qualifying events per patient (allows multiple cohort entries)
5. Expression Limit
The ExpressionLimit controls how many events per patient survive after all inclusion rules are applied:
- "First": keep only the first surviving event per patient
- "All": keep all surviving events
The distinction matters: QualifiedLimit = "All" with ExpressionLimit = "First" means "evaluate all events against inclusion rules, but ultimately keep only the first one that passes." This is useful when the first qualifying event might not meet inclusion rules but a later one does.
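A small Python sketch makes the difference concrete. The function below is a hypothetical illustration for a single patient, with events represented as day numbers and the inclusion rules reduced to one predicate.

```python
def apply_limits(events, passes_inclusion, qualified_limit, expression_limit):
    """Apply QualifiedLimit, then inclusion rules, then ExpressionLimit."""
    qualified = sorted(events)
    if qualified_limit == "First":
        qualified = qualified[:1]          # only the earliest event is evaluated
    surviving = [e for e in qualified if passes_inclusion(e)]
    if expression_limit == "First":
        surviving = surviving[:1]          # keep only the first event that passed
    return surviving

events = [10, 50, 90]                      # three qualifying events for one patient

def passes(day):
    # Stand-in for the inclusion rules: the event on day 10 fails them.
    return day >= 50

print(apply_limits(events, passes, "First", "First"))  # []
print(apply_limits(events, passes, "All", "First"))    # [50]
print(apply_limits(events, passes, "All", "All"))      # [50, 90]
```

With QualifiedLimit = "First" the patient drops out entirely, because the only event evaluated fails the rules; with "All" the later event on day 50 rescues the patient.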
6. Censoring Criteria
Censoring criteria define events that terminate a patient's observation for the purpose of this cohort, even if their continuous observation period has not ended. Common censoring events:
- Death
- Occurrence of a competing risk (e.g., censor at liver transplant in a hepatic outcome study)
- Switch to a different treatment
Censoring criteria use the same domain/concept-set structure as primary criteria but do not have temporal windows --- they are evaluated across the entire cohort membership period.
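In effect, censoring truncates the membership period at the earliest censoring event that falls inside it. A minimal sketch, assuming censoring events are considered from the cohort start date onward (the function name is hypothetical):

```python
from datetime import date

def apply_censoring(cohort_start, cohort_end, censor_dates):
    """Return the cohort end date after truncating at the earliest censoring event."""
    hits = [d for d in censor_dates if cohort_start <= d <= cohort_end]
    return min(hits) if hits else cohort_end

start, end = date(2022, 1, 1), date(2022, 12, 31)
censor_events = [date(2021, 6, 1), date(2022, 8, 15), date(2022, 10, 1)]

# The 2021 event precedes cohort entry and is ignored; 2022-08-15 is the earliest hit.
print(apply_censoring(start, end, censor_events))  # 2022-08-15
print(apply_censoring(start, end, []))             # 2022-12-31
```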
7. Collapse Settings
CollapseSettings control whether multiple cohort entry periods for the same patient that are close together should be merged into a single continuous period:
- CollapseType: "ERA": merge adjacent periods
- EraPad: maximum gap in days between periods that still triggers collapse (e.g., EraPad: 30 merges entries separated by 30 days or fewer)
This is analogous to the drug era construction logic in OMOP --- it smooths over brief gaps to create continuous exposure eras.
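The collapse step is essentially interval merging with a tolerance. The sketch below illustrates it for one patient's periods; the function name is hypothetical and the gap is measured as the plain date difference between one period's end and the next period's start.

```python
from datetime import date

def collapse_eras(periods, era_pad):
    """Merge (start, end) periods whose gap is era_pad days or fewer."""
    merged = []
    for start, end in sorted(periods):
        if merged and (start - merged[-1][1]).days <= era_pad:
            # Close enough to the previous period: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

periods = [
    (date(2023, 1, 1), date(2023, 1, 31)),
    (date(2023, 2, 20), date(2023, 3, 20)),   # 20 days after the first ends
    (date(2023, 6, 1), date(2023, 6, 30)),    # 73 days after the second ends
]

print(len(collapse_eras(periods, era_pad=30)))  # 2
print(len(collapse_eras(periods, era_pad=0)))   # 3
```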
Demographic Criteria
In addition to clinical domain criteria, cohort expressions support demographic filters that constrain based on patient attributes:
- Age: Numeric range evaluated at the index date
- Gender: Concept IDs for Male (8507) and Female (8532)
- Race: OMOP race concept IDs
- Ethnicity: OMOP ethnicity concept IDs
Demographic filters can appear in both primary criteria (restricting who qualifies for the index event) and in inclusion rules (further narrowing after index).
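As an illustration of how such filters evaluate, the sketch below checks age and gender at the index date. It assumes age is approximated as index year minus year of birth (the OMOP person table stores year_of_birth); the function and its parameters are hypothetical, not Parthenon's API.

```python
from datetime import date

MALE, FEMALE = 8507, 8532  # gender concept IDs listed above

def meets_demographics(year_of_birth, gender_concept_id, index_date,
                       min_age=None, max_age=None, genders=None):
    """Return True if the patient satisfies the age range and gender filter."""
    # Assumption: age at index is index year minus year of birth.
    age = index_date.year - year_of_birth
    if min_age is not None and age < min_age:
        return False
    if max_age is not None and age > max_age:
        return False
    if genders is not None and gender_concept_id not in genders:
        return False
    return True

index = date(2020, 5, 1)
print(meets_demographics(1990, FEMALE, index, min_age=18))        # True (age 30)
print(meets_demographics(2005, MALE, index, min_age=18))          # False (age 15)
print(meets_demographics(1990, MALE, index, genders={FEMALE}))    # False
```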
Extended Criteria: Genomics and Imaging
Parthenon extends the standard CIRCE format with two additional criteria types for precision medicine research.
Genomic Criteria (Phase 15)
The GenomicCriteria array allows cohort definitions to incorporate molecular biomarker filters:
| Type | Description | Parameters |
|---|---|---|
| gene_mutation | Specific gene mutation | gene, hgvs (e.g., EGFR L858R) |
| tmb | Tumor mutational burden | tmbOperator, tmbValue, tmbUnit |
| msi | Microsatellite instability | msiStatus (MSS, MSI-L, MSI-H) |
| fusion | Gene fusion event | gene1, gene2 |
| pathogenicity | ClinVar pathogenicity class | clinvarClasses array |
| treatment_episode | Treatment episode linkage | (context-dependent) |
Each genomic criterion can be negated with exclude: true to create exclusion filters (e.g., "patients WITHOUT KRAS mutation").
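The effect of the exclude flag can be sketched as a simple negation. In the example below the `gene`, `hgvs`, and `exclude` names come from the table and text above, while the `type` key, the variant values, and the matching function are assumptions made for illustration only.

```python
def matches_gene_mutation(criterion, patient_variants):
    """Evaluate a gene_mutation criterion against a set of (gene, hgvs) pairs."""
    found = (criterion["gene"], criterion["hgvs"]) in patient_variants
    # exclude: true inverts the match, turning the criterion into an exclusion.
    return not found if criterion.get("exclude", False) else found

patient_variants = {("EGFR", "L858R"), ("TP53", "R175H")}  # illustrative values

with_egfr = {"type": "gene_mutation", "gene": "EGFR", "hgvs": "L858R"}
without_kras = {"type": "gene_mutation", "gene": "KRAS", "hgvs": "G12C", "exclude": True}

print(matches_gene_mutation(with_egfr, patient_variants))     # True
print(matches_gene_mutation(without_kras, patient_variants))  # True (no KRAS variant found)
```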
Imaging Criteria (Phase 16)
The ImagingCriteria array adds DICOM imaging-based filters:
| Type | Description | Parameters |
|---|---|---|
| modality | Imaging modality | modality (CT, MRI, PET, etc.) |
| anatomy | Body part examined | bodyPart |
| quantitative | Radiomics feature threshold | featureName, operator, value, unit |
| ai_classification | AI model classification | classificationLabel, minConfidence |
| dose | Radiation dose limit | maxDoseGy |
Genomic and imaging criteria are Parthenon extensions to the CIRCE format. Cohort definitions using these criteria will not be directly compatible with standard OHDSI Atlas. The standard CIRCE portions of the expression remain fully interoperable.
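Before sharing a definition with Atlas users, it can help to check whether it relies on these extensions. The sketch below assumes GenomicCriteria and ImagingCriteria are top-level keys of the expression; the helper names are hypothetical.

```python
import copy

EXTENSION_KEYS = ("GenomicCriteria", "ImagingCriteria")

def uses_extensions(expression):
    """True if the expression contains a non-empty Parthenon extension array."""
    return any(expression.get(key) for key in EXTENSION_KEYS)

def strip_extensions(expression):
    """Return a copy of the expression without the Parthenon-only arrays."""
    portable = copy.deepcopy(expression)
    for key in EXTENSION_KEYS:
        portable.pop(key, None)
    return portable

expression = {
    "ConceptSets": [],
    "QualifiedLimit": {"Type": "First"},
    "GenomicCriteria": [{"type": "msi", "msiStatus": "MSI-H"}],
}

print(uses_extensions(expression))                    # True
print(uses_extensions(strip_extensions(expression)))  # False
```

Stripping the arrays changes who qualifies for the cohort, so the portable copy defines a broader population than the original.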
JSON Representation
CIRCE cohort expressions are stored as JSON in the expression_json column of the cohort_definitions table in the application database. The JSON structure maps directly to the TypeScript CohortExpression interface.
Here is a simplified example of a "New users of metformin with T2DM" cohort expression:
```json
{
  "ConceptSets": [
    {
      "id": 0,
      "name": "Metformin",
      "expression": {
        "items": [
          {
            "concept": {
              "CONCEPT_ID": 1503297,
              "CONCEPT_NAME": "metformin",
              "DOMAIN_ID": "Drug",
              "VOCABULARY_ID": "RxNorm",
              "STANDARD_CONCEPT": "S"
            },
            "isExcluded": false,
            "includeDescendants": true,
            "includeMapped": false
          }
        ]
      }
    },
    {
      "id": 1,
      "name": "Type 2 Diabetes Mellitus",
      "expression": {
        "items": [
          {
            "concept": {
              "CONCEPT_ID": 201826,
              "CONCEPT_NAME": "Type 2 diabetes mellitus",
              "DOMAIN_ID": "Condition",
              "VOCABULARY_ID": "SNOMED",
              "STANDARD_CONCEPT": "S"
            },
            "isExcluded": false,
            "includeDescendants": true,
            "includeMapped": false
          }
        ]
      }
    }
  ],
  "PrimaryCriteria": {
    "CriteriaList": [
      { "DrugExposure": { "CodesetId": 0, "First": true } }
    ],
    "ObservationWindow": { "PriorDays": 365, "PostDays": 0 }
  },
  "AdditionalCriteria": {
    "Type": "ALL",
    "CriteriaList": [
      {
        "Criteria": {
          "ConditionOccurrence": { "CodesetId": 1 }
        },
        "StartWindow": {
          "Start": { "Days": 365, "Coeff": -1 },
          "End": { "Days": 0, "Coeff": -1 }
        },
        "Occurrence": { "Type": 2, "Count": 1 }
      }
    ],
    "Groups": []
  },
  "QualifiedLimit": { "Type": "First" },
  "ExpressionLimit": { "Type": "First" },
  "CollapseSettings": { "CollapseType": "ERA", "EraPad": 0 }
}
```
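This expression reads: "Find patients whose first-ever metformin exposure (with at least 365 days of prior observation) has at least one T2DM diagnosis in the 365 days up to and including the index date. Keep only the first qualifying event per patient." Note that the window here ends at day 0, so a diagnosis recorded on the index date itself counts; ending the window at day -1 would exclude it.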
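One practical check on an expression like this is that every concept set it references is actually defined. The sketch below walks a parsed expression and reports undefined IDs; it treats any key ending in `CodesetId` as a reference, which is an assumption rather than a documented rule, and the function names are hypothetical.

```python
def referenced_codesets(node):
    """Recursively collect integer values of keys ending in 'CodesetId'."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key.endswith("CodesetId") and isinstance(value, int):
                found.add(value)
            else:
                found |= referenced_codesets(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_codesets(item)
    return found

def missing_codesets(expression):
    """Return referenced concept set IDs that are not defined in ConceptSets."""
    defined = {cs["id"] for cs in expression.get("ConceptSets", [])}
    return referenced_codesets(expression) - defined

# Trimmed version of the expression above, plus a deliberately broken one.
valid = {
    "ConceptSets": [{"id": 0, "name": "Metformin"}, {"id": 1, "name": "Type 2 Diabetes Mellitus"}],
    "PrimaryCriteria": {"CriteriaList": [{"DrugExposure": {"CodesetId": 0, "First": True}}]},
    "AdditionalCriteria": {
        "Type": "ALL",
        "CriteriaList": [{"Criteria": {"ConditionOccurrence": {"CodesetId": 1}}}],
        "Groups": [],
    },
}
broken = {
    "ConceptSets": [{"id": 0, "name": "Metformin"}],
    "PrimaryCriteria": {"CriteriaList": [{"DrugExposure": {"CodesetId": 5}}]},
}

print(missing_codesets(valid))   # set()
print(missing_codesets(broken))  # {5}
```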
Cohort expression JSON exported from Parthenon (without genomic/imaging extensions) can be imported directly into OHDSI Atlas, and vice versa. This makes Parthenon fully interoperable with the broader OHDSI ecosystem.
SQL Generation
When you generate a cohort, Parthenon compiles the CIRCE JSON into executable SQL parameterized for your target CDM schema. The generated SQL:
- Creates a temporary table of all qualifying index events from the primary criteria domain
- Applies the observation window filter to ensure sufficient lookback/follow-up
- Applies each inclusion rule sequentially, recording the count surviving each rule (for the attrition report)
- Applies the end strategy to compute cohort_end_date
- Applies the qualified and expression limits
- Applies collapse settings to merge adjacent eras
- Inserts final rows into the {results_schema}.cohort table
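The sequential inclusion-rule step, and the attrition counts it produces, can be sketched in Python. This models the logic only, not the generated SQL; the record fields and rule names are invented for illustration.

```python
def attrition(index_events, inclusion_rules):
    """Apply rules in order and record how many events survive each one."""
    remaining = list(index_events)
    report = [("Index events", len(remaining))]
    for name, rule in inclusion_rules:
        remaining = [event for event in remaining if rule(event)]
        report.append((name, len(remaining)))
    return remaining, report

# Hypothetical index events with precomputed attributes.
events = [
    {"person_id": 1, "age": 54, "prior_t2dm": True, "followup_days": 400},
    {"person_id": 2, "age": 16, "prior_t2dm": True, "followup_days": 500},
    {"person_id": 3, "age": 61, "prior_t2dm": False, "followup_days": 700},
    {"person_id": 4, "age": 47, "prior_t2dm": True, "followup_days": 100},
]
rules = [
    ("Age >= 18", lambda e: e["age"] >= 18),
    ("Prior T2DM diagnosis", lambda e: e["prior_t2dm"]),
    ("Minimum follow-up", lambda e: e["followup_days"] >= 365),
]

cohort, report = attrition(events, rules)
for name, count in report:
    print(f"{name}: {count}")
```

Because rules are applied in order, each count is conditional on the rules before it, so reordering the rules changes the intermediate counts but not the final cohort.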
The SQL generation is deterministic: the same expression JSON always produces the same SQL, so the cohort logic is reproducible across executions and data refreshes, even though cohort membership itself can change when the underlying data changes.
For the complete JSON schema reference, see Appendix C --- CIRCE JSON Schema.