
FinnGen CI Stabilization: Hardening the Migration Stack and Test Pipeline

· 5 min read
Creator, Parthenon
AI Development Assistant

Today's work was squarely focused on one of the less glamorous but absolutely critical aspects of platform engineering: making the CI pipeline trustworthy. Following last week's FinnGen development merge, we spent the day hardening the migration stack, tightening schema isolation, and wrestling the test suite into a state where green means green and red means red.

FinnGen Schema Isolation: Getting the Migration Stack Right

The root cause of most of today's CI pain was schema boundary confusion introduced during the FinnGen merge. OMOP clinical data and vocabulary tables were sharing a migration path in ways that worked fine in production but caused deterministic failures on fresh CI database bootstraps.

The fix was conceptually clean: vocabulary table migrations now run on the dedicated vocab connection, keeping the OMOP clinical schema and vocabulary schema properly isolated throughout the migration lifecycle. This matters beyond CI hygiene: it reflects the architectural separation that downstream analytics tools (including OHDSI tools that assume specific schema layouts) depend on.
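To make the routing idea concrete, here is a minimal Python sketch of deciding which connection a table's migration should run on. It is an illustration only, not the actual Laravel migration code: the connection name `omop` and the helper names `connection_for` and `plan` are assumptions, while the table list is the standard set of OMOP CDM vocabulary tables.

```python
# Standard OMOP CDM vocabulary tables (lowercase names).
VOCAB_TABLES = frozenset({
    "concept", "vocabulary", "domain", "concept_class",
    "concept_relationship", "relationship", "concept_synonym",
    "concept_ancestor", "source_to_concept_map", "drug_strength",
})


def connection_for(table: str, default: str = "omop") -> str:
    """Return the connection a table's migration should run on.

    "vocab" is the dedicated vocabulary connection; "omop" is a
    hypothetical name for the clinical-data connection.
    """
    return "vocab" if table.lower() in VOCAB_TABLES else default


def plan(tables: list[str]) -> dict[str, list[str]]:
    """Group tables by target connection, preserving input order."""
    grouped: dict[str, list[str]] = {}
    for table in tables:
        grouped.setdefault(connection_for(table), []).append(table)
    return grouped
```

Keeping the routing decision in one place means a fresh bootstrap and a production migration resolve every table to the same connection, which is the property the CI failures were exposing.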

The Phase 13.1 FinnGen schema isolation migration itself also got hardened. We now correctly handle the full migrate → rollback → re-migrate cycle, which is the kind of thing that only breaks you the first time a developer needs to roll back mid-sprint and discovers the re-migration path was never tested. The finngen.runs schema isolation story is now complete, with legacy finngen_runs table migrations guarded so that a mid-suite migrate call can't silently recreate superseded tables alongside the current schema structure.
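The guard logic can be modeled with a toy simulation in which the database is just a set of table names. This is a sketch under stated assumptions, not the real migration: the `public` schema for the legacy table, the rollback behavior that restores the legacy table, and all function names are hypothetical.

```python
CURRENT_TABLE = "finngen.runs"        # table name from the post
LEGACY_TABLE = "public.finngen_runs"  # "public" schema is an assumption


def legacy_up(tables: set[str]) -> None:
    """Legacy migration: create the old table unless it has been superseded."""
    if CURRENT_TABLE in tables:
        return  # guard: never recreate the legacy table next to the new one
    tables.add(LEGACY_TABLE)


def isolation_up(tables: set[str]) -> None:
    """Schema isolation migration: replace the legacy table with the new one."""
    tables.discard(LEGACY_TABLE)
    tables.add(CURRENT_TABLE)


def isolation_down(tables: set[str]) -> None:
    """Rollback: restore the legacy layout (assumed behavior)."""
    tables.discard(CURRENT_TABLE)
    tables.add(LEGACY_TABLE)


def full_cycle() -> set[str]:
    """Run migrate, rollback, re-migrate, then a stray mid-suite legacy migrate."""
    tables: set[str] = set()
    legacy_up(tables)
    isolation_up(tables)
    isolation_down(tables)
    isolation_up(tables)
    legacy_up(tables)  # the guard makes this a no-op
    return tables
```

In this model, `full_cycle()` ends with only `finngen.runs` present. Without the guard in `legacy_up`, the final call would leave both tables side by side.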

The GitHub Actions database bootstrap was also expanded to create every schema used by the migration stack upfront, rather than relying on migrations to create schemas opportunistically. This is a more robust pattern and eliminates an entire class of ordering-dependent failures.
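As a rough sketch of the upfront-bootstrap pattern, the helper below generates idempotent schema-creation statements for a bootstrap step to run before any migration. It assumes a PostgreSQL database (where `CREATE SCHEMA IF NOT EXISTS` is valid); the function name and the example schema list are hypothetical, not the actual workflow contents.

```python
import re

# Accept only plain lowercase identifiers so names are safe to interpolate.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def bootstrap_sql(schemas: list[str]) -> list[str]:
    """Build one idempotent CREATE SCHEMA statement per distinct schema."""
    statements = []
    for name in dict.fromkeys(schemas):  # de-duplicate, keep first-seen order
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe schema name: {name!r}")
        statements.append(f'CREATE SCHEMA IF NOT EXISTS "{name}";')
    return statements


if __name__ == "__main__":
    # Hypothetical schema list; the real one lives in the CI workflow.
    for statement in bootstrap_sql(["omop", "vocab", "finngen"]):
        print(statement)
```

Because every statement is idempotent, the bootstrap can run on both fresh and reused databases, and no migration has to be the first one to touch a given schema.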

CI Behavior: Teaching the Pipeline the Difference Between "Bad" and "Intentionally Skipped"

A recurring frustration with the Pest-based backend test suite has been exit code ambiguity. Optional test suites — things like integration checks that require external services — can exit with non-zero status codes for reasons that are entirely intentional. Previously, the CI job couldn't distinguish between "a test actually failed" and "a test was skipped because the service isn't available in this environment."

We addressed this with two complementary changes:

  1. Backend CI now tolerates intentional non-failure issue statuses from optional suites. Warning-level and informational issue statuses no longer cause job failure, while assertion failures and risky-test results still do.

  2. A Pest summary guard was added so that CI correctly fails on failed, errored, or risky tests, and on coverage below the minimum threshold, but continues gracefully when the only non-zero exit condition is an optional-suite issue status.

This sounds subtle but it meaningfully changes the signal-to-noise ratio of the CI pipeline. Developers can now trust that a red CI run reflects a real problem, not a flaky optional integration check.

Coverage Gating: Scoping the FinnGen Gate Correctly

The FinnGen coverage gate was previously measuring coverage across the full Laravel application tree, which made it a poor signal for FinnGen-specific development. It's been scoped to app/Services/FinnGen to match the intent of the CI job: enforcing coverage on the service package being developed, not the entire platform.
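To illustrate why scoping changes the number, the sketch below computes line coverage over only the files under a path prefix. The input shape (a mapping from file path to covered and total line counts) and the file names in the usage example are hypothetical; the real gate relies on the test runner's own coverage report.

```python
def scoped_coverage(per_file: dict[str, tuple[int, int]],
                    prefix: str = "app/Services/FinnGen") -> float:
    """Return percent line coverage for files under `prefix` only.

    `per_file` maps a file path to (covered_lines, total_lines).
    """
    root = prefix.rstrip("/") + "/"  # avoid matching sibling dirs by name prefix
    covered = total = 0
    for path, (hit, lines) in per_file.items():
        if path.startswith(root):
            covered += hit
            total += lines
    if total == 0:
        raise ValueError(f"no measured lines under {prefix!r}")
    return 100.0 * covered / total


if __name__ == "__main__":
    # Hypothetical report: only the first file is inside the scoped package.
    report = {
        "app/Services/FinnGen/RunService.php": (90, 100),
        "app/Http/Controllers/HomeController.php": (10, 100),
    }
    print(scoped_coverage(report))
```

With the hypothetical report above, the scoped figure reflects only the FinnGen service file, whereas a whole-tree measurement would be dragged down by unrelated code.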

Similarly, the Ares coverage matrix test setup was fixed to properly create source daimon metadata before running, and to always perform structural assertions rather than silently passing when setup conditions weren't met.

Frontend: FinnGen Endpoint Browser Tests and TypeScript Fixes

On the frontend side, FinnGen endpoint browser tests were updated to match current endpoint contracts and profile structures. A Recharts tooltip type error that was blocking tsc --noEmit typechecking was also resolved — these silent TypeScript errors tend to accumulate if not caught at the CI boundary, so keeping the typecheck gate clean is worth the maintenance overhead.

Orthanc: Storage Hardlink Repair Utility

Outside the CI stabilization work, a new tool was added: an Orthanc storage hardlink repair utility. Orthanc (the DICOM server underpinning Parthenon's medical imaging layer) can accumulate broken hardlinks in its storage directory under certain conditions — typically after filesystem operations or storage migrations. The new tool provides a targeted repair path without requiring a full storage rebuild.

Validation Results

By end of day the migration stack and test suite were in solid shape:

  • Fresh local test database migration: ✅ completed successfully
  • FinnGen/backend Pest subset: 58 tests, 363 assertions passing
  • Focused RegionalView Vitest: 6 tests passing
  • Frontend TypeScript check (npx tsc --noEmit): ✅ clean
  • Ares coverage service Pest test: 3 tests, 38 assertions passing
  • Co2 schema provisioner Pest test: 3 passed, 1 skipped (skipped correctly tolerated)

What's Next

With the FinnGen CI path now stable, the immediate priority is getting the full backend test suite consistently green across environments, not just in targeted subsets. We'll also want to review whether the Ares coverage matrix changes surface any gaps in daimon metadata handling that were previously hidden by the broken setup.

The Orthanc hardlink repair tool needs documentation and integration into the platform's storage health check routines. And on the FinnGen feature side, now that the migration and schema isolation story is clean, the next development phase can proceed without CI instability eating into iteration time.