Abby 2.0 Phase 4: The Agency Framework — She Gets Hands
Abby can now take actions, not just answer questions. "Build me a diabetes cohort" generates a reviewable multi-step plan — create concept sets, define the cohort, generate the patient count — that executes with one click after user approval. Every action is logged with checkpoint data for rollback. Phase 4 adds supervised autonomy with safety rails.
The Problem: Knowledge Without Agency
Through Phases 1-3, Abby became remarkably intelligent. She remembered researchers across sessions, routed complex questions to Claude, traversed OMOP concept hierarchies, and warned about data quality gaps. But she could only talk about things — she couldn't do them.
Ask "Build me a cohort of Type 2 diabetes patients on metformin" and she'd explain how to build one. She'd describe the concept sets you'd need, the inclusion criteria, the observation window. But the researcher still had to manually navigate to the cohort builder, create each concept set, add concepts, define criteria, and generate the cohort.
Phase 4 changes this. Now Abby proposes a plan, the user approves it, and she executes it.
Plan-Confirm-Execute: Supervised Autonomy
The core pattern is simple: propose, confirm, execute.
User: "Build me a cohort of T2DM patients on metformin"
↓
PLAN: Abby decomposes into steps:
1. Create concept set "Diabetes Conditions" (OMOP concept ID 201826, SNOMED "Type 2 diabetes mellitus", + descendants)
2. Create concept set "Metformin Exposures" (RxNorm + descendants)
3. Create cohort definition with entry + inclusion criteria
4. Generate cohort against data source
↓
CONFIRM: User sees the plan card with Approve/Cancel buttons
↓
EXECUTE: Steps run sequentially, each reporting status in real-time
↓
REPORT: "Created cohort 'T2DM on Metformin' — 2,847 patients matched"
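No write action executes without explicit user approval: nothing is written to the database until the user clicks "Approve & Execute." Only read-only tools, such as compare_cohorts and export_results, run without a confirmation step.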
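A minimal sketch of the approval gate and sequential execution, in Python to match the AI service. `Plan`, `PlanStep`, `StepStatus`, `execute_plan`, and `ApprovalRequired` are hypothetical names, not the shipped PlanEngine API. The five statuses mirror the plan card's states, and skipping the remaining steps after a failure is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class StepStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class PlanStep:
    tool: str
    params: dict[str, Any]
    status: StepStatus = StepStatus.PENDING
    result: Any = None
    error: str | None = None


@dataclass
class Plan:
    description: str
    steps: list[PlanStep]
    approved: bool = False  # set only when the user clicks "Approve & Execute"


class ApprovalRequired(Exception):
    """Raised when execution is attempted on a plan the user has not approved."""


def execute_plan(
    plan: Plan,
    tools: dict[str, Callable[..., Any]],
    on_update: Callable[[int, PlanStep], None] = lambda i, s: None,
) -> Plan:
    # Hard gate: no step runs before explicit approval.
    if not plan.approved:
        raise ApprovalRequired(plan.description)
    failed = False
    for i, step in enumerate(plan.steps):
        if failed:
            # Assumption: later steps depend on earlier ones, so skip them.
            step.status = StepStatus.SKIPPED
            on_update(i, step)
            continue
        step.status = StepStatus.RUNNING
        on_update(i, step)
        try:
            step.result = tools[step.tool](**step.params)
            step.status = StepStatus.COMPLETED
        except Exception as exc:
            step.error = str(exc)
            step.status = StepStatus.FAILED
            failed = True
        on_update(i, step)  # report the final status of this step
    return plan
```

In this sketch, `on_update` is where each status change would be pushed to the chat so steps report in real time; how the real service streams those updates is not covered here.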
Tool Registry: Declarative Risk Classification
Every action tool is registered with a risk level that determines the confirmation UX:
| Tool | Risk | Confirmation | Rollback |
|---|---|---|---|
| create_concept_set | Medium | Plan preview | Delete if unreferenced |
| create_cohort_definition | Medium | Plan preview | Delete if ungenerated |
| generate_cohort | Medium | Plan preview | Mark as invalidated |
| clone_cohort | Low | Plan preview | Delete the clone |
| compare_cohorts | Low | Auto-execute | N/A (read-only) |
| export_results | Low | Auto-execute | N/A (read-only) |
The registry is extensible — Phase 5 will add high-risk tools (modify, delete, execute SQL) with per-step confirmation and advisory locks.
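One way to keep the classification declarative is to register each tool once and derive the confirmation mode from its metadata. This sketch assumes the rule "read-only runs automatically, high risk confirms per step, any other write goes through the plan preview", which reproduces the Confirmation column of the table above. `ToolSpec`, `Risk`, `Confirmation`, and `confirmation_for` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Confirmation(Enum):
    AUTO_EXECUTE = "auto_execute"
    PLAN_PREVIEW = "plan_preview"
    PER_STEP = "per_step"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    risk: Risk
    read_only: bool
    rollback: str  # human-readable rollback strategy


def confirmation_for(spec: ToolSpec) -> Confirmation:
    # Read-only tools cannot change state, so they skip the preview.
    if spec.read_only:
        return Confirmation.AUTO_EXECUTE
    # High-risk tools (planned for Phase 5) need approval at every step.
    if spec.risk is Risk.HIGH:
        return Confirmation.PER_STEP
    # Any other tool that writes is shown in the plan preview first.
    return Confirmation.PLAN_PREVIEW


REGISTRY: dict[str, ToolSpec] = {
    spec.name: spec
    for spec in [
        ToolSpec("create_concept_set", Risk.MEDIUM, False, "Delete if unreferenced"),
        ToolSpec("create_cohort_definition", Risk.MEDIUM, False, "Delete if ungenerated"),
        ToolSpec("generate_cohort", Risk.MEDIUM, False, "Mark as invalidated"),
        ToolSpec("clone_cohort", Risk.LOW, False, "Delete the clone"),
        ToolSpec("compare_cohorts", Risk.LOW, True, "N/A (read-only)"),
        ToolSpec("export_results", Risk.LOW, True, "N/A (read-only)"),
    ]
}
```

Note that clone_cohort is low risk but still writes, so under this rule it lands in the plan preview, as in the table. Registering a new tool as `Risk.HIGH` would give it per-step confirmation without changing the UI logic.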
Architecture: Python Orchestrates, Laravel Executes
A critical design decision: Abby's agency tools call the existing Laravel API rather than writing to the database directly. This means:
- All Laravel validation rules apply (form requests, model validation)
- All authorization checks apply (Sanctum auth, role-based access)
- All business logic applies (version incrementing, event dispatching, notification triggering)
- The same API that the frontend uses — no second path to the data
The AgencyApiClient makes authenticated HTTP calls to http://nginx:80/api/v1/* with the user's Bearer token, ensuring Abby can only do what the user is authorized to do.
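A sketch of the token-forwarding idea using only the standard library (`urllib.request`). The class name and base URL come from the post; the constructor, `build_request`, and `call` are assumptions, and the real client may well use a different HTTP library.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

BASE_URL = "http://nginx:80/api/v1"


class AgencyApiClient:
    """Forwards the user's own Bearer token so Laravel enforces their permissions."""

    def __init__(self, user_token: str, base_url: str = BASE_URL, timeout: float = 30.0):
        self._token = user_token
        self._base_url = base_url.rstrip("/")
        self._timeout = timeout

    def build_request(
        self, method: str, path: str, body: dict[str, Any] | None = None
    ) -> urllib.request.Request:
        # Encode the JSON body only when there is one (e.g. not for GET).
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(
            f"{self._base_url}/{path.lstrip('/')}",
            data=data,
            method=method,
            headers={
                "Authorization": f"Bearer {self._token}",  # the user's token, not a service key
                "Accept": "application/json",
                "Content-Type": "application/json",
            },
        )

    def call(self, method: str, path: str, body: dict[str, Any] | None = None) -> Any:
        req = self.build_request(method, path, body)
        with urllib.request.urlopen(req, timeout=self._timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

`urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so a Laravel validation failure (typically 422) or an authorization failure (403) surfaces as an exception in the tool step rather than as a silent success.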
Action Audit Trail
Every action is logged to abby_action_log with:
- User, tool, risk level — who did what, how risky was it
- Plan — the full multi-step plan that was approved
- Parameters — exact inputs to the tool
- Result — what happened (created entity IDs, error messages)
- Checkpoint data — previous entity state for rollback capability
- Rolled back flag — whether this action was later undone
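A minimal sketch of the logging and rollback bookkeeping. An in-memory SQLite database stands in for the real one, and the column names are assumptions derived from the field list above; this `ActionLogger` is illustrative, not the shipped interface. Actually restoring the entity (deleting an unreferenced concept set, invalidating a generation) is left to the caller.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Any

SCHEMA = """
CREATE TABLE IF NOT EXISTS abby_action_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    tool TEXT NOT NULL,
    risk_level TEXT NOT NULL,
    plan TEXT NOT NULL,          -- full approved plan, as JSON
    parameters TEXT NOT NULL,    -- exact tool inputs, as JSON
    result TEXT,                 -- created IDs or error message, as JSON
    checkpoint TEXT,             -- prior entity state for rollback, as JSON
    rolled_back INTEGER NOT NULL DEFAULT 0
)
"""


class ActionLogger:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(SCHEMA)

    def log(self, user_id: int, tool: str, risk_level: str, plan: dict[str, Any],
            parameters: dict[str, Any], result: Any,
            checkpoint: dict[str, Any] | None = None) -> int:
        cur = self.conn.execute(
            "INSERT INTO abby_action_log"
            " (user_id, tool, risk_level, plan, parameters, result, checkpoint)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (user_id, tool, risk_level, json.dumps(plan), json.dumps(parameters),
             json.dumps(result), json.dumps(checkpoint)),
        )
        self.conn.commit()
        return cur.lastrowid

    def mark_rolled_back(self, action_id: int) -> dict[str, Any] | None:
        """Flag the action as undone and return its checkpoint for restoring state."""
        row = self.conn.execute(
            "SELECT checkpoint, rolled_back FROM abby_action_log WHERE id = ?",
            (action_id,),
        ).fetchone()
        if row is None or row[1]:
            raise ValueError(f"action {action_id} not found or already rolled back")
        self.conn.execute(
            "UPDATE abby_action_log SET rolled_back = 1 WHERE id = ?", (action_id,)
        )
        self.conn.commit()
        return json.loads(row[0])
```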
Frontend: The Plan Card
The AbbyPlanCard component renders action plans inline in the chat:
- Each step shows a status icon with color coding (pending, running with pulse animation, completed in teal, failed in red, skipped in gray)
- Tool names are human-readable (underscores replaced with spaces)
- Step results and errors are shown inline
- "Approve & Execute" button in teal, "Cancel" in muted — prominent approval, easy escape
What Shipped
| Component | Tests | Purpose |
|---|---|---|
| ToolRegistry | 6 | Declarative tool definitions with risk levels |
| PlanEngine | 7 | Plan-Confirm-Execute orchestration |
| ActionLogger | 4 | Audit trail with checkpoint/rollback |
| AgencyApiClient | — | Authenticated HTTP calls to Laravel API |
| Concept set tools | 2 | Create sets + add items |
| Cohort tools | 3 | Create definitions, generate, clone |
| Query tools | 1 | Compare cohorts, export results |
| AbbyPlanCard | — | Frontend plan approval UI |
| Integration tests | 3 | End-to-end agency verification |
213 tests passing across the Python AI service.
What's Next: Phase 5 — Advanced Agency
Phase 4 gave Abby basic hands. Phase 5 gives her coordination.
DAG-based workflow orchestration will let independent steps run in parallel. High-risk tools (modify, delete, execute SQL) will join the registry with per-step confirmation and PostgreSQL advisory locks. Dry run mode will simulate actions before execution. And workflow templates will let common study designs execute with a single request.
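Phase 5 is not built yet, but the core of DAG scheduling can be sketched with Python's standard `graphlib` module: each step lists the steps it depends on, and every batch of ready steps could run in parallel. The step labels below are hypothetical names for the four-step T2DM plan from earlier in the post, and `parallel_batches` is an illustrative helper.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical labels).
deps = {
    "create_concept_set:diabetes": set(),
    "create_concept_set:metformin": set(),
    "create_cohort_definition": {"create_concept_set:diabetes", "create_concept_set:metformin"},
    "generate_cohort": {"create_cohort_definition"},
}


def parallel_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches whose members have no dependencies on each other."""
    ts = TopologicalSorter(graph)
    ts.prepare()  # also raises CycleError if the plan is not a DAG
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every step whose dependencies are done
        batches.append(ready)
        ts.done(*ready)  # a real executor would mark each step done as it finishes
    return batches
```

For that plan, the two concept sets form the first batch since neither depends on the other, while the cohort definition and its generation still run one after another.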
