
Abby 2.0 Phase 4: The Agency Framework — She Gets Hands

5 min read
Creator, Parthenon
AI Development Assistant

Abby can now take actions, not just answer questions. "Build me a diabetes cohort" generates a reviewable multi-step plan — create concept sets, define the cohort, generate the patient count — that executes with one click after user approval. Every action is logged with checkpoint data for rollback. Phase 4 adds supervised autonomy with safety rails.


The Problem: Knowledge Without Agency

Through Phases 1-3, Abby became remarkably intelligent. She remembered researchers across sessions, routed complex questions to Claude, traversed OMOP concept hierarchies, and warned about data quality gaps. But she could only talk about things — she couldn't do them.

Ask "Build me a cohort of Type 2 diabetes patients on metformin" and she'd explain how to build one. She'd describe the concept sets you'd need, the inclusion criteria, the observation window. But the researcher still had to manually navigate to the cohort builder, create each concept set, add concepts, define criteria, and generate the cohort.

Phase 4 changes this. Now Abby proposes a plan, the user approves it, and she executes it.


Plan-Confirm-Execute: Supervised Autonomy

The core pattern is simple: propose, confirm, execute.

User: "Build me a cohort of T2DM patients on metformin"

PLAN: Abby decomposes into steps:
1. Create concept set "Diabetes Conditions" (SNOMED 201826 + descendants)
2. Create concept set "Metformin Exposures" (RxNorm + descendants)
3. Create cohort definition with entry + inclusion criteria
4. Generate cohort against data source

CONFIRM: User sees the plan card with Approve/Cancel buttons

EXECUTE: Steps run sequentially, each reporting status in real time

REPORT: "Created cohort 'T2DM on Metformin' — 2,847 patients matched"

No action executes without explicit user approval. No writes to the database without the user clicking "Approve & Execute."
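The pattern can be sketched in a few lines. This is an illustrative reduction, not the actual Phase 4 code; names like `PlanStep` and `execute_plan` are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable

class StepStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class PlanStep:
    tool: str            # e.g. "create_concept_set"
    params: dict
    status: StepStatus = StepStatus.PENDING
    result: Any = None

def execute_plan(steps: list[PlanStep],
                 tools: dict[str, Callable],
                 approved: bool) -> list[PlanStep]:
    """Run steps sequentially, but only after explicit user approval."""
    if not approved:
        return steps  # nothing executes without "Approve & Execute"
    for step in steps:
        step.status = StepStatus.RUNNING
        try:
            step.result = tools[step.tool](**step.params)
            step.status = StepStatus.COMPLETED
        except Exception as exc:
            step.status = StepStatus.FAILED
            step.result = str(exc)
            break  # stop the plan on the first failure
    return steps
```

The key property is that the `approved` gate sits between plan construction and execution: the same plan object is shown to the user and then run, so what you approve is exactly what executes.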


Tool Registry: Declarative Risk Classification

Every action tool is registered with a risk level that determines the confirmation UX:

| Tool | Risk | Confirmation | Rollback |
| --- | --- | --- | --- |
| create_concept_set | Medium | Plan preview | Delete if unreferenced |
| create_cohort_definition | Medium | Plan preview | Delete if ungenerated |
| generate_cohort | Medium | Plan preview | Mark as invalidated |
| clone_cohort | Low | Plan preview | Delete the clone |
| compare_cohorts | Low | Auto-execute | N/A (read-only) |
| export_results | Low | Auto-execute | N/A (read-only) |

The registry is extensible — Phase 5 will add high-risk tools (modify, delete, execute SQL) with per-step confirmation and advisory locks.
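A declarative registry of this shape might look like the following sketch (the `ToolSpec` and `needs_confirmation` names are illustrative assumptions, not the shipped API):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"  # reserved for Phase 5 (modify, delete, execute SQL)

@dataclass(frozen=True)
class ToolSpec:
    name: str
    risk: Risk
    auto_execute: bool             # read-only tools can skip the plan preview
    rollback: Optional[str] = None # how to undo, if the tool writes anything

REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec

register(ToolSpec("create_concept_set", Risk.MEDIUM, False,
                  rollback="delete if unreferenced"))
register(ToolSpec("compare_cohorts", Risk.LOW, True))  # read-only

def needs_confirmation(tool: str) -> bool:
    """Medium-risk write tools require the plan preview; low-risk
    read-only tools auto-execute."""
    return not REGISTRY[tool].auto_execute
```

Because the risk level lives in the registration rather than in the UI code, adding a Phase 5 high-risk tool is a data change, not a new confirmation flow.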


Architecture: Python Orchestrates, Laravel Executes

A critical design decision: Abby's agency tools call the existing Laravel API rather than writing to the database directly. This means:

  • All Laravel validation rules apply (form requests, model validation)
  • All authorization checks apply (Sanctum auth, role-based access)
  • All business logic applies (version incrementing, event dispatching, notification triggering)
  • The same API that the frontend uses — no second path to the data

The AgencyApiClient makes authenticated HTTP calls to http://nginx:80/api/v1/* with the user's Bearer token, ensuring Abby can only do what the user is authorized to do.
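The essential part of such a client is that it forwards the user's own token rather than a service credential. A minimal sketch, assuming a method named `build_request` (the real AgencyApiClient's interface is not shown in this post):

```python
import urllib.request
from typing import Optional

class AgencyApiClient:
    """Sketch: Abby calls the same Laravel API the frontend uses,
    forwarding the user's Bearer token so authorization is identical."""

    def __init__(self, user_token: str,
                 base_url: str = "http://nginx:80/api/v1"):
        self.base_url = base_url.rstrip("/")
        self.token = user_token

    def build_request(self, method: str, path: str,
                      body: Optional[bytes] = None) -> urllib.request.Request:
        return urllib.request.Request(
            url=f"{self.base_url}/{path.lstrip('/')}",
            method=method,
            data=body,
            headers={
                # The user's token, not a service account: Abby can only
                # do what this user is authorized to do.
                "Authorization": f"Bearer {self.token}",
                "Accept": "application/json",
                "Content-Type": "application/json",
            },
        )
```

Since every write goes through this request path, Sanctum auth, form-request validation, and role checks all run exactly as they would for a browser click.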


Action Audit Trail

Every action is logged to abby_action_log with:

  • User, tool, risk level — who did what, how risky was it
  • Plan — the full multi-step plan that was approved
  • Parameters — exact inputs to the tool
  • Result — what happened (created entity IDs, error messages)
  • Checkpoint data — previous entity state for rollback capability
  • Rolled back flag — whether this action was later undone
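One plausible shape for such a log entry, as a sketch (field names here mirror the bullets above but are assumptions about the actual `abby_action_log` schema):

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ActionLogEntry:
    user_id: int
    tool: str
    risk: str
    plan: list[dict]                  # the full approved multi-step plan
    parameters: dict                  # exact inputs to the tool
    result: Optional[dict] = None     # created entity IDs or error message
    checkpoint: Optional[dict] = None # prior entity state, for rollback
    rolled_back: bool = False         # set if this action is later undone
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_row(self) -> str:
        """Serialize for insertion into a table like abby_action_log."""
        return json.dumps(asdict(self))
```

Storing the checkpoint alongside the action is what makes the rollback column in the tool registry cheap to implement: undoing an action is restoring its recorded prior state.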

Frontend: The Plan Card

The AbbyPlanCard component renders action plans inline in the chat:

  • Each step shows a status icon with color coding (pending, running with pulse animation, completed in teal, failed in red, skipped in gray)
  • Tool names are human-readable (underscores replaced with spaces)
  • Step results and errors are shown inline
  • "Approve & Execute" button in teal, "Cancel" in muted — prominent approval, easy escape

What Shipped

| Component | Tests | Purpose |
| --- | --- | --- |
| ToolRegistry | 6 | Declarative tool definitions with risk levels |
| PlanEngine | 7 | Plan-Confirm-Execute orchestration |
| ActionLogger | 4 | Audit trail with checkpoint/rollback |
| AgencyApiClient | — | Authenticated HTTP calls to Laravel API |
| Concept set tools | 2 | Create sets + add items |
| Cohort tools | 3 | Create definitions, generate, clone |
| Query tools | 1 | Compare cohorts, export results |
| AbbyPlanCard | — | Frontend plan approval UI |
| Integration tests | 3 | End-to-end agency verification |

213 tests passing across the Python AI service.


What's Next: Phase 5 — Advanced Agency

Phase 4 gave Abby basic hands. Phase 5 gives her coordination.

DAG-based workflow orchestration will let independent steps run in parallel. High-risk tools (modify, delete, execute SQL) will join the registry with per-step confirmation and PostgreSQL advisory locks. Dry run mode will simulate actions before execution. And workflow templates will let common study designs execute with a single request.
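To make the parallelism idea concrete: once steps declare dependencies, grouping them into parallel batches is a standard topological layering. A sketch of how Phase 5 might schedule the Phase 4 cohort plan (step names and the `parallel_batches` helper are illustrative):

```python
def parallel_batches(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group steps into batches; every step in a batch has all its
    dependencies satisfied by earlier batches, so a batch can run
    in parallel (Kahn-style topological layering)."""
    remaining = {step: set(d) for step, d in deps.items()}
    done: set[str] = set()
    batches: list[set[str]] = []
    while remaining:
        ready = {s for s, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("cycle detected in workflow DAG")
        batches.append(ready)
        done |= ready
        for s in ready:
            del remaining[s]
    return batches
```

For the T2DM example, the two concept sets have no dependencies on each other, so they would land in the same batch and run concurrently, with the cohort definition and generation steps following sequentially.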