Abby Study Design Compiler Ships: Accessibility, Refactors, and Production Hardening

· 5 min read
Creator, Parthenon
AI Development Assistant

A landmark day for the Parthenon platform: the Abby Study Design Compiler crossed the finish line and landed in production. Alongside that headline feature, we completed a deep structural refactor of the Study Workbench, patched a collection of critical runtime bugs, and hardened the frontend with accessibility improvements and unsaved-changes guards. Phase 19 smoke tests passed against the live DEV environment.

Abby Study Design Compiler: The Big Ship

The centerpiece of today's work is the fully deployed Abby-mediated Study Design Compiler — a shift in how Parthenon handles AI-assisted study authoring. Rather than a free-form AI writing surface, the Study Designer now operates as a compiler-grade, user-reviewed workflow. Abby is the product-facing harness and guide; the heavy lifting runs through local MedGemma 27B on Ollama for bounded compiler guidance, tool planning, and safe review language. Claude/Anthropic is available strictly through a scoped protocol-evaluation path behind a dedicated cloud-evaluation feature flag.

The new backend service layer is substantial:

  • StudyDesignAbbyOrchestrator — top-level workflow coordinator
  • StudyDesignOllamaClient / StudyDesignClaudeClient — model-specific adapters
  • StudyDesignContextBuilder — assembles structured context from study state
  • StudyDesignToolRunner — executes compiler tool calls
  • StudyDesignGuidanceService — surfaces human-readable guidance at each compiler stage
  • StudyDesignStructuredOutputSchemas — a named schema catalog covering protocol extraction, compiler guidance, phenotype recommendation, concept-set drafts, cohort drafts, analysis-plan drafts, asset repair suggestions, and package-manifest review
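
To make the routing rule concrete, here is a minimal TypeScript sketch of how an orchestrator could choose between the two adapters. It is illustrative only: this post does not show the service interfaces, so the `ModelClient` interface, the task names, the `cloudEvaluationEnabled` flag name, and `selectClient` are all hypothetical, and falling back to the local model when the flag is off is an assumption.

```typescript
// Hypothetical sketch: none of these names or signatures come from the Parthenon codebase.
type CompilerTask =
  | "compiler_guidance"
  | "tool_planning"
  | "review_language"
  | "protocol_evaluation";

interface ModelClient {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

interface FeatureFlags {
  cloudEvaluationEnabled: boolean; // assumed name for the cloud-evaluation feature flag
}

function selectClient(
  task: CompilerTask,
  flags: FeatureFlags,
  local: ModelClient,
  cloud: ModelClient,
): ModelClient {
  // Only the scoped protocol-evaluation path may reach the cloud model,
  // and only when the flag is switched on.
  if (task === "protocol_evaluation" && flags.cloudEvaluationEnabled) {
    return cloud;
  }
  // Assumption: everything else, including evaluation with the flag off, stays local.
  return local;
}

// Stub adapters standing in for the Ollama and Claude clients.
const localStub: ModelClient = { name: "local", complete: async (p) => `local:${p}` };
const cloudStub: ModelClient = { name: "cloud", complete: async (p) => `cloud:${p}` };

const chosen = selectClient(
  "protocol_evaluation",
  { cloudEvaluationEnabled: false },
  localStub,
  cloudStub,
);
console.log(chosen.name);
```

Keeping the decision in one pure function means the "cloud only behind the flag" rule can be unit tested without calling either model.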

Protocol uploads inside an existing design session now create a new version, populate the workbench, and keep the user in the Intent Review panel and downstream compiler stages — no jarring redirects. Standalone protocol intake still creates a new study container and routes to that study's design tab, which is the correct behavior for that distinct flow.

The extraction layer surfaces evidence spans, field-level confidence scores, overall confidence, uncertainty notes, design assumptions, and initial-gate issue reporting for protocols that don't clear the adequacy threshold. Users see exactly how confident the compiler is in each extracted field and why — a meaningful improvement in interpretability over opaque AI suggestions.
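
As a rough picture of what that output could look like, the sketch below models an extraction result and an initial-gate check in TypeScript. The field names, the 0 to 1 confidence scale, and the threshold value are assumptions made for illustration; the real schemas live in StudyDesignStructuredOutputSchemas and are not reproduced here.

```typescript
// Hypothetical shapes: field names, the [0, 1] scale, and the threshold are assumptions.
interface EvidenceSpan {
  start: number; // character offset into the protocol text
  end: number;
  text: string;
}

interface ExtractedField {
  name: string;
  value: string | null;
  confidence: number; // assumed to lie in [0, 1]
  evidence: EvidenceSpan[];
}

interface ExtractionResult {
  fields: ExtractedField[];
  overallConfidence: number;
  uncertaintyNotes: string[];
  designAssumptions: string[];
}

interface GateIssue {
  field: string;
  reason: string;
}

// Collect one issue per field that is missing or falls below the adequacy threshold.
function initialGateIssues(result: ExtractionResult, threshold: number): GateIssue[] {
  const issues: GateIssue[] = [];
  for (const field of result.fields) {
    if (field.value === null) {
      issues.push({ field: field.name, reason: "not found in protocol" });
    } else if (field.confidence < threshold) {
      issues.push({
        field: field.name,
        reason: `confidence ${field.confidence.toFixed(2)} is below ${threshold.toFixed(2)}`,
      });
    }
  }
  return issues;
}

const sampleResult: ExtractionResult = {
  fields: [
    { name: "population", value: "Adults with T2DM", confidence: 0.92, evidence: [{ start: 120, end: 136, text: "Adults with T2DM" }] },
    { name: "comparator", value: null, confidence: 0, evidence: [] },
    { name: "outcome", value: "MACE", confidence: 0.41, evidence: [{ start: 410, end: 414, text: "MACE" }] },
  ],
  overallConfidence: 0.58,
  uncertaintyNotes: ["Comparator arm is not described."],
  designAssumptions: [],
};

const sampleIssues = initialGateIssues(sampleResult, 0.7);
console.log(sampleIssues.map((issue) => issue.field));
```

Returning the issues as data rather than a pass/fail boolean lets the UI show users which fields blocked the gate and why.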

Study Workbench: Structural Refactor Complete

With the compiler in place, we also completed a multi-commit refactor of the Study Workbench that had been accumulating scope. The old monolithic component has been decomposed into focused, independently maintainable panels:

  • Top-level panels: IntentReview, BottomUpCompatibility, Feasibility, AnalysisPlan
  • Leaf panels: Phenotype, ConceptSet, Cohort, StudyCompilerGuidance
  • Shared infrastructure: atoms and helper utilities extracted to workbench/

This decomposition dramatically improves readability and sets up clean boundaries for future feature work on individual compiler stages. Each panel owns its own state slice and renders independently, which will matter as we add per-panel loading and error states.
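
One way a per-panel state slice with loading and error states could be modeled is a discriminated union plus a small reducer. This is a sketch, not the workbench's actual code: `PanelState`, `PanelAction`, and `panelReducer` are hypothetical names, and ignoring responses that arrive when the panel is not loading is an assumption.

```typescript
// Hypothetical sketch of a per-panel state slice; not taken from the workbench source.
type PanelState<T> =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "error"; message: string }
  | { status: "ready"; data: T };

type PanelAction<T> =
  | { type: "fetch" }
  | { type: "resolve"; data: T }
  | { type: "reject"; message: string };

function panelReducer<T>(state: PanelState<T>, action: PanelAction<T>): PanelState<T> {
  switch (action.type) {
    case "fetch":
      return { status: "loading" };
    case "resolve":
      // Ignore results that arrive when the panel is not waiting for one.
      return state.status === "loading" ? { status: "ready", data: action.data } : state;
    case "reject":
      return state.status === "loading" ? { status: "error", message: action.message } : state;
  }
}

// Walk one panel through a fetch that succeeds.
let cohortPanel: PanelState<string[]> = { status: "idle" };
cohortPanel = panelReducer<string[]>(cohortPanel, { type: "fetch" });
cohortPanel = panelReducer<string[]>(cohortPanel, { type: "resolve", data: ["Target cohort"] });
console.log(cohortPanel.status);
```

Because each panel holds its own `PanelState`, a failure in one panel can render an error and a retry button without disturbing its siblings.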

A long-overdue cleanup also landed today: StudyDesigner.tsx (1,380 lines of dead code) was deleted (a5bc69925). It had been orphaned by the new architecture and was causing confusion about which file was authoritative. Gone.

Bug Fixes: Runtime Reliability

Several production-quality fixes landed today:

  • Lock-race guard + dirty-form unsaved-changes warning (fb8535738): The workbench now prevents concurrent lock acquisition races and warns users before navigating away from unsaved form state. Both issues were silent data-loss vectors.
  • NaN concept_id, ensureSession dedupe, mutation error banner, search error catch (d10799c5d): Four distinct runtime issues cleaned up — invalid numeric coercion on concept IDs, duplicate session initialization calls, missing error feedback on mutation failures, and unhandled rejections in the concept search path.
  • Error panels, score NaN, Recommend tab dead UI (7d2958f0c): The Recommend tab was rendering a non-functional UI state; score values were coercing to NaN in display; error states weren't surfacing to the user. All addressed.
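
The NaN issues above belong to a common class of bug: a value that may be missing or non-numeric is coerced with `Number(...)` and then used or displayed as is. The sketch below shows one defensive pattern in TypeScript. The helper names are hypothetical, the actual changes in d10799c5d and 7d2958f0c are not shown in this post, and treating concept IDs as non-negative integers is an assumption.

```typescript
// Hypothetical helpers; the real fixes are not reproduced here.

// Parse a concept_id from untrusted input, returning null instead of NaN.
function parseConceptId(raw: unknown): number | null {
  let candidate = NaN;
  if (typeof raw === "number") {
    candidate = raw;
  } else if (typeof raw === "string" && raw.trim() !== "") {
    // Number("") is 0, so empty strings are rejected before coercion.
    candidate = Number(raw);
  }
  // Assumption: valid concept IDs are non-negative safe integers.
  return Number.isSafeInteger(candidate) && candidate >= 0 ? candidate : null;
}

// Format a score for display, with a placeholder for missing or non-finite values.
function formatScore(score: number | null | undefined): string {
  return typeof score === "number" && Number.isFinite(score) ? score.toFixed(2) : "n/a";
}

console.log(parseConceptId("201826"), parseConceptId("abc"), formatScore(NaN));
```

Returning `null` forces callers to handle the missing case explicitly, rather than letting NaN flow into a request or a rendered label.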

Accessibility: WAI-ARIA Tablist + Async Live Regions

The Study Designer tab navigation now follows the WAI-ARIA tablist spec (4b6600d20): full keyboard support via arrow keys, correct role="tablist" / role="tab" / role="tabpanel" markup, and aria-live regions for async result updates. Screen reader users now hear an announcement when compiler results load, which matters for a workflow that involves multiple asynchronous AI calls. This brings the Study Designer into compliance with our accessibility baseline.
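
For readers unfamiliar with the pattern, the keyboard part of a tablist reduces to a small piece of index arithmetic plus a "roving tabindex". The TypeScript sketch below is framework-free and illustrative only: the function names and ID scheme are hypothetical, and wrap-around plus Home/End handling follow the common WAI-ARIA authoring guidance rather than anything confirmed about commit 4b6600d20.

```typescript
// Hypothetical sketch of tablist keyboard handling; not the code from the commit.

// Compute which tab should receive focus after a key press.
function nextTabIndex(current: number, count: number, key: string): number {
  if (count <= 0) return -1;
  switch (key) {
    case "ArrowRight":
      return (current + 1) % count; // wrap from last to first
    case "ArrowLeft":
      return (current - 1 + count) % count; // wrap from first to last
    case "Home":
      return 0;
    case "End":
      return count - 1;
    default:
      return current; // other keys leave focus where it is
  }
}

// Attributes for one tab element; only the selected tab sits in the page tab order.
function tabAttributes(index: number, selected: number, idPrefix: string) {
  return {
    role: "tab",
    id: `${idPrefix}-tab-${index}`,
    "aria-selected": index === selected,
    "aria-controls": `${idPrefix}-panel-${index}`,
    tabIndex: index === selected ? 0 : -1,
  };
}

console.log(nextTabIndex(0, 4, "ArrowLeft"), tabAttributes(1, 1, "designer"));
```

Each matching panel would carry role="tabpanel" and an aria-labelledby pointing back at its tab's id, and an aria-live region would sit alongside the panels to announce asynchronous results.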

Infrastructure: Loopback Binding for Study Agent

A small but security-relevant infra change: the study-agent host port is now bound to 127.0.0.1 (loopback only) rather than 0.0.0.0 (0cc627b1a). This prevents the agent port from being exposed on non-loopback interfaces in development and staging environments, closing an inadvertent network exposure.
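
A regression check for this kind of change can be small. The TypeScript sketch below assumes the port is published with a Docker-style `HOST_IP:HOST_PORT:CONTAINER_PORT` string. That format is an assumption about our setup, the helper names are hypothetical, and port 8765 is a placeholder rather than the agent's real port.

```typescript
// Hypothetical lint-style check; assumes Docker-style port mapping strings.

// Extract the host IP from "HOST_IP:HOST_PORT:CONTAINER_PORT".
// With no host IP given, Docker publishes on all interfaces by default.
function publishedHostIp(mapping: string): string {
  const parts = mapping.split(":");
  return parts.length >= 3 ? parts.slice(0, parts.length - 2).join(":") : "0.0.0.0";
}

// True when the mapping is reachable only from the local machine.
function isLoopbackOnly(mapping: string): boolean {
  const ip = publishedHostIp(mapping).replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  return ip === "::1" || ip.startsWith("127.");
}

console.log(isLoopbackOnly("127.0.0.1:8765:8765"), isLoopbackOnly("8765:8765"));
```

Running a check like this over the compose file in CI would flag any mapping that quietly reverts to all interfaces.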

Phase 19 Smoke Testing

Automated smoke tests for Phase 19, Task 2 (afa17827c) ran against the live DEV Parthenon instance and passed. With the compiler, refactor, and bug fixes all landing in the same window, having the smoke suite green gives us confidence the integrated system is stable before we move into wider QA.

What's Next

With the compiler architecture in place and the workbench decomposed into clean panels, the immediate priorities are:

  1. Per-panel progressive loading and granular error recovery — now that panels are independent components, we can give each one its own loading skeleton and retry path.
  2. Confidence threshold configuration — the initial-gate issue reporting is working, but teams need a way to tune adequacy thresholds for their protocol types.
  3. Package manifest review UX — the structured schema is wired; the review UI in StudyCompilerGuidance needs a first-class display surface.
  4. Expanded Phase 19 test coverage — smoke tests are green, but we want integration tests covering the full compiler pipeline end-to-end.

A dense, high-quality day. The compiler is live, the workbench is clean, and the platform is more accessible and reliable than it was this morning.