Jobs Page Overhaul, Drug Era Performance Breakthrough, and Cohort Pipeline Hardening
A landmark day for platform observability and data pipeline reliability. We shipped a fully wired Jobs monitoring page that surfaces all 13+ tracked job types, broke through a major ETL performance ceiling on the SynPUF dataset (17 hours → 14 minutes for drug_era builds), and closed out a cohort generation audit that uncovered eight discrete bugs across the SQL builders, API layer, and frontend.
Jobs Page: From Partial View to Full Platform Visibility
The single biggest user-facing win today was landing commit 5e29c3a4e — a ground-up rework of the Jobs monitoring experience.
What Was Broken
The Jobs page was only surfacing 8 of the system's 13+ tracked job types. Achilles runs, FHIR Sync, Care Gap evaluations, GIS Boundary loads, and Poseidon ETL runs were all dispatching correctly through Horizon and writing to their tracking models — they were just completely invisible in the UI. The JobController::index() method simply never queried them.
The detail drawer had its own problem: the show endpoint was hardcoded to AnalysisExecution route model binding, meaning clicking any non-analysis job (cohort generation, ingestion, DQD, etc.) returned a 404.
Several secondary bugs made the visibility problem worse:
- Stale cohort generation jobs appeared under the wrong status filter due to a DB-filter/display-status mismatch
- FHIR Export was leaking the raw "processing" status string instead of normalizing to "running"
- N+1 Source::find() calls inside DQD, Heel, and Achilles map loops
- SCCS and Evidence Synthesis type filters were returning all analysis types instead of scoping to their specific morph class
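The N+1 pattern in that list can be illustrated with a small, language-neutral sketch. The real fix lives in the PHP controller; the TypeScript below is only an analogy, and every name in it (SourceStore, findMany, attachSourceNames, the row shapes) is hypothetical. The idea is to collect the distinct source IDs first, fetch them in one batched lookup, and resolve names from an in-memory map inside the loop.

```typescript
// Hypothetical shapes; the real tracking models live in the PHP backend.
interface SourceRecord {
  id: number;
  name: string;
}

interface TrackedJobRow {
  id: number;
  sourceId: number;
}

// Stand-in data-access layer that counts how many lookups it serves.
class SourceStore {
  queries = 0;
  private rows: SourceRecord[];

  constructor(rows: SourceRecord[]) {
    this.rows = rows;
  }

  // One round trip for any number of ids (analogous to WHERE id IN (...)).
  findMany(ids: number[]): SourceRecord[] {
    this.queries += 1;
    return this.rows.filter((row) => ids.indexOf(row.id) !== -1);
  }
}

// Resolve source names for every job with a single batched lookup
// instead of one lookup per job inside the map loop.
function attachSourceNames(jobs: TrackedJobRow[], store: SourceStore) {
  const uniqueIds = Array.from(new Set(jobs.map((job) => job.sourceId)));
  const byId = new Map<number, SourceRecord>();
  for (const source of store.findMany(uniqueIds)) {
    byId.set(source.id, source);
  }
  return jobs.map((job) => {
    const source = byId.get(job.sourceId);
    return { ...job, sourceName: source ? source.name : null };
  });
}
```

With this shape the number of lookups stays at one regardless of how many jobs are on the page, and a job whose source is missing gets a null name rather than an exception.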
What We Shipped
JobController.php gained nearly 1,000 lines across two key changes:
Five new job collectors in index():
| Type | Model | Scope |
|---|---|---|
| fhir_sync | FhirSyncRun | System |
| care_gap | CareGapEvaluation | User |
| gis_boundary | GisDataset | User |
| poseidon | PoseidonRun | System |
(The finngen type was removed — it lives in the workbench app, not core job tracking.)
Polymorphic show endpoint: Route model binding is gone. The new show(Request $request, int $jobId) dispatches through 14 type-specific detail builders via a ?type= query param. Each builder returns the standard job envelope plus a details object with type-specific metadata and a timeline array of execution events — giving the rich detail drawer real data to render for every job type in the system.
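The dispatch described above can be sketched as a registry of builders keyed by the type parameter. This is a TypeScript analogy of the PHP endpoint, not the shipped code: the envelope fields, the two stub builders with made-up data, and the choice of 400 for an unknown type versus 404 for a missing job are all assumptions for illustration.

```typescript
// Hypothetical envelope shape; field names are illustrative, not the real API contract.
interface JobEnvelope {
  id: number;
  type: string;
  status: string;
}

interface TimelineEntry {
  at: string;
  label: string;
}

interface JobDetail extends JobEnvelope {
  details: Record<string, unknown>;
  timeline: TimelineEntry[];
}

// A builder returns null when no job of its type has the given id.
type DetailBuilder = (jobId: number) => JobDetail | null;

// Two stub builders with dummy data; the post describes 14 in total.
const detailBuilders = new Map<string, DetailBuilder>();

detailBuilders.set("poseidon", (jobId) =>
  jobId === 1
    ? {
        id: 1,
        type: "poseidon",
        status: "completed",
        details: { note: "dummy metadata" },
        timeline: [{ at: "2024-01-01T00:00:00Z", label: "started" }],
      }
    : null
);

detailBuilders.set("fhir_sync", (jobId) =>
  jobId === 2
    ? { id: 2, type: "fhir_sync", status: "running", details: {}, timeline: [] }
    : null
);

type ShowResult =
  | { status: 200; body: JobDetail }
  | { status: 400 | 404; error: string };

// Look up the builder for the requested type, then let it resolve the job.
function showJob(jobId: number, type: string | undefined): ShowResult {
  const builder = type === undefined ? undefined : detailBuilders.get(type);
  if (builder === undefined) {
    return { status: 400, error: `Unknown or missing job type: ${String(type)}` };
  }
  const detail = builder(jobId);
  if (detail === null) {
    return { status: 404, error: `No ${String(type)} job with id ${jobId}` };
  }
  return { status: 200, body: detail };
}
```

Keeping the builders in a registry means adding a new job type is one new entry, and the detail drawer never has to know which tracking model sits behind an ID.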
ETL Performance: drug_era Goes From Overnight to Minutes
Commit a084b84f6 delivered one of the most dramatic performance wins we've had in the ETL layer. The drug_era build step on the 2.3M-patient SynPUF dataset was taking 17 hours. It now runs in 14 minutes, roughly a 73x speedup.
The fix was a two-phase build strategy. Previously the pipeline attempted to compute drug eras in a single monolithic pass, which collapsed under the weight of the dataset's join complexity and row volume. The rewrite splits the work: phase one materializes an intermediate exposure table with appropriate indexes, and phase two performs the era consolidation logic against that pre-built structure. The materialization cost is recovered quickly because the consolidation step works against an indexed, pre-built table that the query planner can plan around, rather than one large join.
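The production logic is SQL, but the shape of the two phases can be shown with an in-memory analogy. In the sketch below, phase one groups and sorts exposures by person and ingredient (standing in for the indexed intermediate table), and phase two merges exposures separated by a gap of at most 30 days. The 30-day window follows the usual OMOP drug era convention and is an assumption about this pipeline; all type and function names are hypothetical.

```typescript
interface Exposure {
  personId: number;
  ingredientId: number;
  start: string; // ISO date, YYYY-MM-DD
  end: string;   // ISO date, YYYY-MM-DD
}

interface Era extends Exposure {
  exposureCount: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function daysBetween(fromIso: string, toIso: string): number {
  return (Date.parse(toIso) - Date.parse(fromIso)) / DAY_MS;
}

// Phase one: group by (person, ingredient) and sort each group by start date.
// This plays the role of the materialized, indexed exposure table.
function materializeExposures(rows: Exposure[]): Map<string, Exposure[]> {
  const groups = new Map<string, Exposure[]>();
  for (const row of rows) {
    const key = `${row.personId}:${row.ingredientId}`;
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(row);
    } else {
      groups.set(key, [row]);
    }
  }
  groups.forEach((bucket) => bucket.sort((a, b) => a.start.localeCompare(b.start)));
  return groups;
}

// Phase two: one linear pass per group, extending the current era while the
// gap to the next exposure is within gapDays (assumed 30 here).
function consolidateEras(groups: Map<string, Exposure[]>, gapDays = 30): Era[] {
  const all: Era[] = [];
  groups.forEach((bucket) => {
    const eras: Era[] = [];
    for (const exp of bucket) {
      const last = eras[eras.length - 1];
      if (last !== undefined && daysBetween(last.end, exp.start) <= gapDays) {
        if (exp.end > last.end) last.end = exp.end; // ISO dates compare as strings
        last.exposureCount += 1;
      } else {
        eras.push({ ...exp, exposureCount: 1 });
      }
    }
    for (const era of eras) all.push(era);
  });
  return all;
}
```

Because phase one leaves each group sorted, phase two never has to look backwards, which is the same property the intermediate table and its indexes provide to the SQL version.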
This was preceded by 1eb297148, which rewrote the SynPUF enrichment parallelism to eliminate the OOM crashes and deadlocks that were causing enrichment runs to fail intermittently on large datasets. Both fixes together mean the SynPUF 2.3M enrichment pipeline is now stable and fast end-to-end.
Cohort Generation: Eight Bugs Closed
Commit 6b4012262 documents a focused audit of the cohort generation pipeline that surfaced and fixed eight bugs spanning three layers:
- SQL builders: edge cases in inclusion criteria handling that produced incorrect cohort membership under specific date range configurations
- API layer: response shape inconsistencies that caused the frontend to silently drop data
- Frontend: patient list navigation was routing to malformed profile URLs (9d79ffe37), and breadcrumbs weren't context-aware when entering from the cohort view
The risk scores feature also got two targeted fixes: 1297db01b corrected the recommend endpoint to return a structured response with the full patient profile attached (it was previously returning a bare score), and 6b4012262 fixed a useParams / route definition mismatch where the component was reading scoreId but the route defined :id.
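The parameter-name mismatch is easy to reproduce without any router library. The sketch below uses a tiny hand-written matcher (matchRoute is hypothetical and is not React Router's implementation, and the /risk-scores path is made up) to show why reading scoreId from a route declared with :id silently yields undefined.

```typescript
// Minimal pattern matcher: ":name" segments capture, other segments must match exactly.
function matchRoute(pattern: string, path: string): Record<string, string> | null {
  const patternParts = pattern.split("/").filter((part) => part !== "");
  const pathParts = path.split("/").filter((part) => part !== "");
  if (patternParts.length !== pathParts.length) return null;

  const params: Record<string, string> = {};
  for (let i = 0; i < patternParts.length; i += 1) {
    const expected = patternParts[i];
    const actual = pathParts[i];
    if (expected.charAt(0) === ":") {
      params[expected.slice(1)] = decodeURIComponent(actual);
    } else if (expected !== actual) {
      return null;
    }
  }
  return params;
}

// Hypothetical route and URL for illustration.
const riskScoreParams = matchRoute("/risk-scores/:id", "/risk-scores/42") ?? {};

const viaWrongKey = riskScoreParams["scoreId"]; // undefined at runtime
const viaRouteKey = riskScoreParams["id"];      // "42"
```

Nothing throws in the broken case, which is why this class of bug tends to surface as an empty page or a failed fetch rather than an obvious error.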
AI Memory & Infrastructure Housekeeping
On the AI side, 6035d6d65 streamlined the Chroma memory path resolution — a small but meaningful cleanup that removes ambiguity in how the vector store locates its persistence directory across different deployment environments.
Infrastructure received a round of tuning in 9f24a2ca2: Docker Compose configuration was tightened, Horizon queue configuration was cleaned up, monitoring alerts were added for key pipeline stages, and the CI workflow was updated to reflect the current test surface.
What's Next
With the Jobs page now surfacing the full job graph, the natural follow-on is real-time status streaming — replacing the current polling approach with server-sent events so operators get live feedback on long-running ETL and analysis jobs without hammering the API.
The drug_era two-phase build is a pattern worth generalizing. The condition_era and observation_period builders have similar structural characteristics and are candidates for the same treatment.
Cohort generation is in a much cleaner state after today's audit, which unblocks work on cohort comparison views — a feature that's been waiting on a stable generation pipeline before we could build on top of it confidently.