Parthenon v1.0.4 — Test Coverage & CI Hardening
v1.0.4 — Test Coverage & CI Hardening
v1.0.4 is the first stabilization release in the v1.0.x arc. No new features — only refinement of what exists. This release fills the test coverage gaps identified during the v1.0.3 launch and hardens the CI pipeline so every subsequent release starts from a trustworthy baseline.
Why stabilization matters
v1.0.3 landed fast. It shipped the installer rewrite, Poseidon + Vulcan, Standard PROs+, Risk Scores v2, the Commons wiki ChromaDB refactor, the Acropolis enterprise installer, Hecate semantic search, and a dozen other modules. By the time the dust settled, the test suite had drifted: 74 PHP test files and 37 Playwright E2E specs covered the older surfaces, but the newer modules had thin unit coverage, zero Vitest tests on the frontend, and several "contract" tests for inter-service communication had been scheduled but never written.
v1.0.4 is the release that pays that debt. Every scope item listed in
ROADMAP.md v1.0.4 is now either shipped or explicitly deferred with a
rationale.
Backend test coverage
- Pest service tests for Achilles, DQD, Cohort, and Analysis services
(shipped in commit
4b3a77c) - 8 database connection integration tests — one per Laravel connection,
each verifying its
search_pathresolves the correct schemas for clinical data, vocabulary, and results (commit4b3a77c) - RBAC enforcement tests on every route group (auth, research, admin,
data pipeline) — these verify that removing a permission actually breaks
the corresponding endpoint (commit
39cfe57) - OpenAPI spec drift detection in CI — generated TypeScript types are now
verified against live API responses so the frontend never diverges from the
backend (commit
4b3a77c) - R Plumber contract tests — a new
DarkstarContractTest.phphits thedarkstarservice/healthendpoint viaconfig('services.darkstar.url'), asserting the response shape:status,service=darkstar, semverversion, thechecks.{packages,jvm,memory_ok,jdbc_driver}dict, and non-emptypackages.ohdsimetadata. The test gracefully skips when darkstar is not running so CI stays green in environments that don't deploy the R sidecar. - Python AI contract tests — a matching
AiServiceContractTest.phphits thepython-aiservice/healthendpoint viaconfig('services.ai.url'), assertingservice=parthenon-ai,llm.provider=ollama, and non-empty model/base_url. Same graceful-skip fallback. - PHPUnit/Pest runtime detection for local test isolation (commit
5a5ff2cd9) — fixes a regression where Redis-backed middleware poisoned the test database transaction.
Python AI test coverage
Three new FastAPI TestClient contract tests land in ai/tests/:
test_abby_router_contract.py— covers/abby/parse-cohortwith a mockedcall_ollamafunction. Verifies the fullCohortParseResponseshape, the 422 validation errors on missing and short prompts, and the fallback shape when the LLM returns unparseable output.test_embeddings_router_contract.py— covers/embeddings/encodeand/embeddings/encode-batchwith a mocked SapBERT service. Verifies the{embedding, model}and{embeddings, model, count}response contracts plus the 422/400 validation guards.test_concept_mapping_router_contract.py— covers/concept-mapping/map-termand/concept-mapping/map-batchwith every strategy singleton (exact, cache, SapBERT, LLM, ranker) patched. Verifies theRankedCandidateshape and the{results, total_time_ms, strategies_used}batch response.
All Python contract tests mock every external dependency (Ollama, SapBERT, pgvector, ChromaDB) so they need no network and no model weights.
Frontend test coverage
Vitest + React Testing Library + jsdom infrastructure was already landed in
vite.config.ts during v1.0.3. v1.0.4 populates it:
- 66 pre-existing frontend test files covering estimation, prediction, SCCS, analyses, evidence synthesis, data-explorer, publish, auth, vocabulary, patient similarity, ingestion, and more.
- 7 new cohort builder component tests —
CohortExpressionEditor,CriteriaGroupEditor,DomainCriteriaSelector,ConceptSetPicker,CohortStatsBar,CohortGenerationPanel, andCohortSqlPreview. Together they exercise the store-backed top-level editor, the nested criteria group depth guard, the seven OMOP domain buttons, the concept set creation flow, the stats bar interaction, and the generation/SQL preview TanStack Query states. 33 tests total. - 5 new concept set editor component tests —
ConceptSetEditor,ConceptSetItemRow,ConceptSetItemDetailExpander,ConceptSetStatsBar,PhoebeRecommendationsPanel. These cover the empty state, row rendering, the three toggle switches (includeDescendants,isExcluded,includeMapped), the four-tab detail expander, and the Phoebe recommendations accept flow. 26 tests total. - 10 dark clinical theme snapshot tests —
Badge,Button,StatusDot,EmptyState,MetricCard,Panel,Tabs,Progress,FilterChip, andFormInput. Each file usestoMatchInlineSnapshot()plus a regex assertion for the theme token (e.g.btn-primary,badge-critical,form-input error). The goal is to fail loudly when someone renames a class, removes a variant, or accidentally swaps the#9B1B30crimson for a different red. 33 snapshots total.
Analysis stores — scope clarification
The v1.0.4 scope called for "unit tests for all Zustand stores (authStore, cohort stores, analysis stores)." Here is the actual status:
authStore,abbyStore, andwikiStorealready have Vitest unit tests (landed during the wiki UX cleanup work).cohortExpressionStorealready has a Vitest unit test fornormalizeCohortExpression.- There is no
analysisStore. Thefrontend/src/features/analyses/module uses TanStack Query (useIncidenceRates,useCharacterizations,useEstimations,usePatientLevelPredictions, etc.) exclusively — all analysis state lives in the server cache, not in client-side Zustand stores.
This scope item is therefore N/A by design. TanStack Query is the analysis feature's state layer, and it is already tested through the feature-level component tests and the OpenAPI drift detection.
CI/CD hardening
- Pre-commit hook now runs Pint + PHPStan + tsc + ESLint + Vitest on
every commit. A silent bug in the hook's ESLint invocation — newline
separators leaking into a
sh -ccommand, causing staged files after the first to be interpreted as shell commands — was fixed as a side effect of this release. - Chroma ingestion mypy types fixed (commit
58a8910a4). - GitHub Actions pipeline: lint → test → build → deploy gate. OpenAPI drift detection runs on every PR.
Not yet shipped in v1.0.4 (deferred to v1.0.5+)
- Playwright cross-browser validation (Chromium, Firefox, WebKit).
- Playwright E2E coverage for GIS Explorer, imaging viewer, and genomics workflows.
These are tracked in the v1.0.5 section of ROADMAP.md.
Upgrade notes
No migrations, no config changes, no breaking API changes. This is pure
test coverage and CI hardening — git pull && ./deploy.sh is sufficient.
New test files are picked up automatically by Vitest, Pest, and pytest
discovery.
By the numbers
- New backend tests: 6 Pest contract tests
- New Python AI tests: 11 FastAPI TestClient contract tests
- New frontend component tests: 33 cohort builder + 26 concept set
- New snapshot tests: 33 across 10 UI primitives
- Total new tests landed in v1.0.4: 109
Contributors
Claude Code + @sudoshi