Abby 2.0 Phase 5: Advanced Agency — Parallel Workflows and Safety Rails
Abby can now orchestrate complex multi-step research workflows with independent steps running in parallel. High-risk tools (modify concept sets, update cohort criteria, execute SQL) join the toolkit with safety validation. Dry run mode simulates actions before execution. Workflow templates encode OHDSI best practices into one-click study designs.
What's New
DAG-Based Parallel Execution
Phase 4's Plan-Confirm-Execute ran steps sequentially. Phase 5 introduces a DAG (directed acyclic graph) executor that identifies independent steps and runs them concurrently.
"Build diabetes + metformin cohort with characterization"
↓
Wave 1 (parallel): Create diabetes concept set ║ Create metformin concept set
Wave 2 (sequential): Create cohort definition (depends on both sets)
Wave 3 (sequential): Generate cohort
Wave 4 (sequential): Run characterization
The two concept sets have no dependency on each other, so they run simultaneously; the cohort definition waits for both. For this five-step plan, that cuts wall-clock time from ~15 seconds to ~10 seconds.
The executor uses Kahn's algorithm (BFS topological sort) to decompose plans into waves. Each wave's steps run via asyncio.gather. If any step fails, all its dependents are automatically skipped with a clear reason.
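A minimal sketch of wave-based execution follows. It assumes a hypothetical `Step` record with an `id`, an async `run` callable, and a `depends_on` list; it is an illustration of the technique, not the service's actual executor. `build_waves` is Kahn's algorithm processed level by level (each BFS level becomes one wave). `execute` runs each wave with `asyncio.gather` and marks any step whose dependency did not succeed as skipped, which propagates transitively through later waves.

```python
import asyncio
from collections import deque
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class Step:
    # Hypothetical plan step: an id, an async action, and ids it depends on.
    id: str
    run: Callable[[], Awaitable[object]]
    depends_on: list[str] = field(default_factory=list)


def build_waves(steps: list[Step]) -> list[list[str]]:
    """Kahn's algorithm, one BFS level per wave."""
    indegree = {s.id: len(s.depends_on) for s in steps}
    children: dict[str, list[str]] = {s.id: [] for s in steps}
    for s in steps:
        for dep in s.depends_on:
            children[dep].append(s.id)
    ready = deque(sid for sid, deg in indegree.items() if deg == 0)
    waves: list[list[str]] = []
    seen = 0
    while ready:
        wave = list(ready)  # everything ready now forms one wave
        ready.clear()
        waves.append(wave)
        seen += len(wave)
        for sid in wave:
            for child in children[sid]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    if seen != len(steps):
        raise ValueError("plan contains a cycle")
    return waves


async def execute(steps: list[Step]) -> dict[str, str]:
    """Run waves in order; steps within a wave run concurrently."""
    by_id = {s.id: s for s in steps}
    status: dict[str, str] = {}
    for wave in build_waves(steps):
        runnable = []
        for sid in wave:
            # Dependencies always live in earlier waves, so status[d] exists.
            bad = [d for d in by_id[sid].depends_on if status[d] != "ok"]
            if bad:
                status[sid] = f"skipped: dependency {bad[0]} did not succeed"
            else:
                runnable.append(sid)
        results = await asyncio.gather(
            *(by_id[sid].run() for sid in runnable), return_exceptions=True
        )
        for sid, result in zip(runnable, results):
            status[sid] = "failed" if isinstance(result, BaseException) else "ok"
    return status


async def ok() -> None:
    await asyncio.sleep(0.01)  # stand-in for a real tool call


async def boom() -> None:
    raise RuntimeError("simulated API error")


plan = [
    Step("diabetes_cs", ok),
    Step("metformin_cs", ok),
    Step("cohort_def", ok, ["diabetes_cs", "metformin_cs"]),
    Step("generate", ok, ["cohort_def"]),
    Step("characterize", ok, ["generate"]),
]

if __name__ == "__main__":
    print(build_waves(plan))
    # [['diabetes_cs', 'metformin_cs'], ['cohort_def'], ['generate'], ['characterize']]
    print(asyncio.run(execute(plan)))
```

Swapping `metformin_cs` to `boom` leaves `diabetes_cs` as `ok`, marks `metformin_cs` as `failed`, and skips every downstream step with a reason naming the failed dependency.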
High-Risk Tools with Safety Validation
Six new tools join the registry, bringing the total to 12:
| Tool | Risk | Safety Mechanism |
|---|---|---|
| modify_concept_set | High | Adds/removes concepts via validated API |
| modify_cohort_criteria | High | Updates expression JSON via validated API |
| execute_sql | High | Regex blocks INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, TRUNCATE, GRANT, REVOKE, pg_*, COPY |
| run_characterization | Medium | Queues via job system |
| run_incidence_analysis | Medium | Queues via job system |
| schedule_recurring_analysis | High | Creates periodic schedule |
The SQL safety validator blocks 12 categories of dangerous patterns before any query reaches the database. Only read-only SELECT queries pass validation.
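A regex-based read-only check can be sketched as below. The pattern list mirrors the keywords named in the `execute_sql` row of the table. The exact regexes, the single-statement rule, and the `validate_sql` name are assumptions for illustration; the post does not show the service's actual patterns.

```python
import re

# Hypothetical blocklist mirroring the keywords listed in the tools table.
# Word boundaries keep column names such as updated_at from matching UPDATE.
BLOCKED_PATTERNS = [
    r"\bINSERT\b", r"\bUPDATE\b", r"\bDELETE\b", r"\bDROP\b", r"\bALTER\b",
    r"\bCREATE\b", r"\bTRUNCATE\b", r"\bGRANT\b", r"\bREVOKE\b",
    r"\bpg_\w*", r"\bCOPY\b",
]
_BLOCKED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_PATTERNS]


def validate_sql(sql: str) -> tuple[bool, str]:
    """Return (is_allowed, reason) for a candidate query."""
    query = sql.strip().rstrip(";").strip()
    # Assumption: reject stacked statements outright.
    if ";" in query:
        return False, "multiple statements are not allowed"
    if not re.match(r"SELECT\b", query, re.IGNORECASE):
        return False, "only SELECT queries are allowed"
    for pattern in _BLOCKED:
        match = pattern.search(query)
        if match:
            return False, f"blocked pattern: {match.group(0)}"
    return True, "ok"


if __name__ == "__main__":
    print(validate_sql("SELECT person_id FROM person"))
    print(validate_sql("SELECT pg_sleep(10)"))
```

The check is deliberately conservative. A blocked keyword inside a string literal (for example `WHERE note = 'delete'`) is rejected too, trading false positives for safety.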
Dry Run Mode
For high-risk actions, dry run simulates what WOULD happen without executing:
```json
{
  "simulated": true,
  "would_create": "concept_set",
  "name": "Diabetes Conditions",
  "concept_count": 3,
  "description": "Would create concept set 'Diabetes Conditions' with 3 concepts"
}
```
Every tool has a simulation handler that returns tool-specific fields describing the expected outcome.
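One way to wire per-tool simulation handlers is a registry keyed by tool name, with a shared `dry_run` entry point that stamps `"simulated": true` onto whatever the handler returns. In the sketch below, the decorator, the `create_concept_set` tool name, and the argument keys (`name`, `concept_ids`, `add`, `remove`) are hypothetical. The output shape follows the JSON example above.

```python
from typing import Any, Callable

Simulator = Callable[[dict[str, Any]], dict[str, Any]]
SIMULATORS: dict[str, Simulator] = {}  # hypothetical registry: tool name -> handler


def simulator(tool_name: str) -> Callable[[Simulator], Simulator]:
    """Register a simulation handler for a tool."""
    def register(fn: Simulator) -> Simulator:
        SIMULATORS[tool_name] = fn
        return fn
    return register


@simulator("create_concept_set")  # hypothetical tool name
def simulate_create_concept_set(args: dict[str, Any]) -> dict[str, Any]:
    count = len(args.get("concept_ids", []))
    return {
        "would_create": "concept_set",
        "name": args["name"],
        "concept_count": count,
        "description": f"Would create concept set '{args['name']}' with {count} concepts",
    }


@simulator("modify_concept_set")
def simulate_modify_concept_set(args: dict[str, Any]) -> dict[str, Any]:
    adding, removing = len(args.get("add", [])), len(args.get("remove", []))
    return {
        "would_modify": "concept_set",
        "concept_set_id": args["concept_set_id"],
        "adding": adding,
        "removing": removing,
        "description": f"Would add {adding} and remove {removing} concepts",
    }


def dry_run(tool_name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Describe what a tool call would do, without side effects."""
    handler = SIMULATORS.get(tool_name)
    if handler is None:
        raise KeyError(f"no simulation handler for {tool_name}")
    return {"simulated": True, **handler(args)}


if __name__ == "__main__":
    # Placeholder concept IDs, not real OMOP concepts.
    print(dry_run("create_concept_set",
                  {"name": "Diabetes Conditions", "concept_ids": [101, 102, 103]}))
```

Raising on a missing handler, rather than falling through to real execution, keeps a tool without a simulator from running during a dry run.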
Workflow Templates
Pre-built study designs encode OHDSI best practices:
- Incident Cohort — condition concept set + optional drug concept set + cohort definition with washout period + generation
- Characterization Study — concept set + cohort definition + generation + characterization analysis queue
Templates generate step lists compatible with the plan engine, so researchers can say "run an incident cohort study for diabetes on metformin" and get a complete, reviewable plan.
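The Incident Cohort template could expand into a dependency-annotated step list along these lines. The tool names, the argument keys, and the illustrative 365-day `washout_days` default are all hypothetical. The sketch only shows the template's shape: an optional drug concept set that, when present, becomes a second dependency of the cohort definition.

```python
from typing import Any, Optional


def incident_cohort_template(
    condition: str,
    drug: Optional[str] = None,
    washout_days: int = 365,  # illustrative default, not a recommendation
) -> list[dict[str, Any]]:
    """Expand the Incident Cohort template into plan steps (hypothetical schema)."""
    steps: list[dict[str, Any]] = [{
        "id": "condition_cs",
        "tool": "create_concept_set",
        "args": {"name": f"{condition} conditions", "search": condition},
        "depends_on": [],
    }]
    cohort_deps = ["condition_cs"]
    if drug:
        # The optional drug concept set is independent of the condition set,
        # so a DAG executor can run the two in the same wave.
        steps.append({
            "id": "drug_cs",
            "tool": "create_concept_set",
            "args": {"name": f"{drug} exposures", "search": drug},
            "depends_on": [],
        })
        cohort_deps.append("drug_cs")
    steps.append({
        "id": "cohort_def",
        "tool": "create_cohort_definition",
        "args": {"name": f"Incident {condition}", "washout_days": washout_days},
        "depends_on": cohort_deps,
    })
    steps.append({
        "id": "generate",
        "tool": "generate_cohort",
        "args": {},
        "depends_on": ["cohort_def"],
    })
    return steps


if __name__ == "__main__":
    for step in incident_cohort_template("diabetes", drug="metformin"):
        print(step["id"], "<-", step["depends_on"])
```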
What Shipped
| Component | Tests | Purpose |
|---|---|---|
| DAG Executor | 7 | Parallel wave execution with dependency tracking |
| Dry Run Simulator | 5 | Action simulation without side effects |
| Modify tools | 2 | Concept set and cohort criteria modification |
| Analysis tools | 2 | Characterization and incidence analysis queuing |
| SQL tools | 8 | Read-only SQL execution with safety validation |
| Workflow templates | 5 | Pre-built OHDSI study design plans |
| Integration tests | 4 | End-to-end DAG, dry run, safety, template verification |
247 tests passing across the Python AI service.
