
Abby 2.0 Phase 4: The Agency Framework — She Gets Hands

5 min read
Creator, Parthenon
AI Development Assistant

Abby can now take actions, not just answer questions. "Build me a diabetes cohort" generates a reviewable multi-step plan — create concept sets, define the cohort, generate the patient count — that executes with one click after user approval. Every action is logged with checkpoint data for rollback. Phase 4 adds supervised autonomy with safety rails.


The Problem: Knowledge Without Agency

Through Phases 1-3, Abby became remarkably intelligent. She remembered researchers across sessions, routed complex questions to Claude, traversed OMOP concept hierarchies, and warned about data quality gaps. But she could only talk about things — she couldn't do them.

Ask "Build me a cohort of Type 2 diabetes patients on metformin" and she'd explain how to build one. She'd describe the concept sets you'd need, the inclusion criteria, the observation window. But the researcher still had to manually navigate to the cohort builder, create each concept set, add concepts, define criteria, and generate the cohort.

Phase 4 changes this. Now Abby proposes a plan, the user approves it, and she executes it.


Plan-Confirm-Execute: Supervised Autonomy

The core pattern is simple: propose, confirm, execute.

User: "Build me a cohort of T2DM patients on metformin"

PLAN: Abby decomposes into steps:
1. Create concept set "Diabetes Conditions" (SNOMED 201826 + descendants)
2. Create concept set "Metformin Exposures" (RxNorm + descendants)
3. Create cohort definition with entry + inclusion criteria
4. Generate cohort against data source

CONFIRM: User sees the plan card with Approve/Cancel buttons

EXECUTE: Steps run sequentially, each reporting status in real time

REPORT: "Created cohort 'T2DM on Metformin' — 2,847 patients matched"

No action executes without explicit user approval. No writes to the database without the user clicking "Approve & Execute."
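The pattern can be sketched in a few lines. This is an illustrative reduction, not the actual Phase 4 code; names like `PlanStep` and `execute_plan` are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable

class StepStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class PlanStep:
    tool: str            # e.g. "create_concept_set"
    params: dict
    status: StepStatus = StepStatus.PENDING
    result: Any = None

def execute_plan(steps: list[PlanStep],
                 tools: dict[str, Callable],
                 approved: bool) -> list[PlanStep]:
    """Run steps sequentially, but only after explicit user approval."""
    if not approved:
        return steps  # nothing executes without "Approve & Execute"
    for step in steps:
        step.status = StepStatus.RUNNING
        try:
            step.result = tools[step.tool](**step.params)
            step.status = StepStatus.COMPLETED
        except Exception as exc:
            step.status = StepStatus.FAILED
            step.result = str(exc)
            break  # stop the plan on the first failure
    return steps
```

The key property is that the `approved` gate sits between plan construction and execution: the same plan object is shown to the user and then run, so what you approve is exactly what executes.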


Tool Registry: Declarative Risk Classification

Every action tool is registered with a risk level that determines the confirmation UX:

| Tool | Risk | Confirmation | Rollback |
| --- | --- | --- | --- |
| create_concept_set | Medium | Plan preview | Delete if unreferenced |
| create_cohort_definition | Medium | Plan preview | Delete if ungenerated |
| generate_cohort | Medium | Plan preview | Mark as invalidated |
| clone_cohort | Low | Plan preview | Delete the clone |
| compare_cohorts | Low | Auto-execute | N/A (read-only) |
| export_results | Low | Auto-execute | N/A (read-only) |

The registry is extensible — Phase 5 will add high-risk tools (modify, delete, execute SQL) with per-step confirmation and advisory locks.
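A declarative registry of this shape might look like the following sketch (the `ToolSpec` and `needs_confirmation` names are illustrative assumptions, not the shipped API):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"  # reserved for Phase 5 (modify, delete, execute SQL)

@dataclass(frozen=True)
class ToolSpec:
    name: str
    risk: Risk
    auto_execute: bool             # read-only tools can skip the plan preview
    rollback: Optional[str] = None # how to undo, if the tool writes anything

REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec

register(ToolSpec("create_concept_set", Risk.MEDIUM, False,
                  rollback="delete if unreferenced"))
register(ToolSpec("compare_cohorts", Risk.LOW, True))  # read-only

def needs_confirmation(tool: str) -> bool:
    """Medium-risk write tools require the plan preview; low-risk
    read-only tools auto-execute."""
    return not REGISTRY[tool].auto_execute
```

Because the risk level lives in the registration rather than in the UI code, adding a Phase 5 high-risk tool is a data change, not a new confirmation flow.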


Architecture: Python Orchestrates, Laravel Executes

A critical design decision: Abby's agency tools call the existing Laravel API rather than writing to the database directly. This means:

  • All Laravel validation rules apply (form requests, model validation)
  • All authorization checks apply (Sanctum auth, role-based access)
  • All business logic applies (version incrementing, event dispatching, notification triggering)
  • The same API that the frontend uses — no second path to the data

The AgencyApiClient makes authenticated HTTP calls to http://nginx:80/api/v1/* with the user's Bearer token, ensuring Abby can only do what the user is authorized to do.
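The essential part of such a client is that it forwards the user's own token rather than a service credential. A minimal sketch, assuming a method named `build_request` (the real AgencyApiClient's interface is not shown in this post):

```python
import urllib.request
from typing import Optional

class AgencyApiClient:
    """Sketch: Abby calls the same Laravel API the frontend uses,
    forwarding the user's Bearer token so authorization is identical."""

    def __init__(self, user_token: str,
                 base_url: str = "http://nginx:80/api/v1"):
        self.base_url = base_url.rstrip("/")
        self.token = user_token

    def build_request(self, method: str, path: str,
                      body: Optional[bytes] = None) -> urllib.request.Request:
        return urllib.request.Request(
            url=f"{self.base_url}/{path.lstrip('/')}",
            method=method,
            data=body,
            headers={
                # The user's token, not a service account: Abby can only
                # do what this user is authorized to do.
                "Authorization": f"Bearer {self.token}",
                "Accept": "application/json",
                "Content-Type": "application/json",
            },
        )
```

Since every write goes through this request path, Sanctum auth, form-request validation, and role checks all run exactly as they would for a browser click.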


Action Audit Trail

Every action is logged to abby_action_log with:

  • User, tool, risk level — who did what, how risky was it
  • Plan — the full multi-step plan that was approved
  • Parameters — exact inputs to the tool
  • Result — what happened (created entity IDs, error messages)
  • Checkpoint data — previous entity state for rollback capability
  • Rolled back flag — whether this action was later undone
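One plausible shape for such a log entry, as a sketch (field names here mirror the bullets above but are assumptions about the actual `abby_action_log` schema):

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ActionLogEntry:
    user_id: int
    tool: str
    risk: str
    plan: list[dict]                  # the full approved multi-step plan
    parameters: dict                  # exact inputs to the tool
    result: Optional[dict] = None     # created entity IDs or error message
    checkpoint: Optional[dict] = None # prior entity state, for rollback
    rolled_back: bool = False         # set if this action is later undone
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_row(self) -> str:
        """Serialize for insertion into a table like abby_action_log."""
        return json.dumps(asdict(self))
```

Storing the checkpoint alongside the action is what makes the rollback column in the tool registry cheap to implement: undoing an action is restoring its recorded prior state.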

Frontend: The Plan Card

The AbbyPlanCard component renders action plans inline in the chat:

  • Each step shows a status icon with color coding (pending, running with pulse animation, completed in teal, failed in red, skipped in gray)
  • Tool names are human-readable (underscores replaced with spaces)
  • Step results and errors are shown inline
  • "Approve & Execute" button in teal, "Cancel" in muted — prominent approval, easy escape

What Shipped

| Component | Tests | Purpose |
| --- | --- | --- |
| ToolRegistry | 6 | Declarative tool definitions with risk levels |
| PlanEngine | 7 | Plan-Confirm-Execute orchestration |
| ActionLogger | 4 | Audit trail with checkpoint/rollback |
| AgencyApiClient | — | Authenticated HTTP calls to Laravel API |
| Concept set tools | 2 | Create sets + add items |
| Cohort tools | 3 | Create definitions, generate, clone |
| Query tools | 1 | Compare cohorts, export results |
| AbbyPlanCard | — | Frontend plan approval UI |
| Integration tests | 3 | End-to-end agency verification |

213 tests passing across the Python AI service.


What's Next: Phase 5 — Advanced Agency

Phase 4 gave Abby basic hands. Phase 5 gives her coordination.

DAG-based workflow orchestration will let independent steps run in parallel. High-risk tools (modify, delete, execute SQL) will join the registry with per-step confirmation and PostgreSQL advisory locks. Dry run mode will simulate actions before execution. And workflow templates will let common study designs execute with a single request.
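To make the parallelism idea concrete: once steps declare dependencies, grouping them into parallel batches is a standard topological layering. A sketch of how Phase 5 might schedule the Phase 4 cohort plan (step names and the `parallel_batches` helper are illustrative):

```python
def parallel_batches(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group steps into batches; every step in a batch has all its
    dependencies satisfied by earlier batches, so a batch can run
    in parallel (Kahn-style topological layering)."""
    remaining = {step: set(d) for step, d in deps.items()}
    done: set[str] = set()
    batches: list[set[str]] = []
    while remaining:
        ready = {s for s, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("cycle detected in workflow DAG")
        batches.append(ready)
        done |= ready
        for s in ready:
            del remaining[s]
    return batches
```

For the T2DM example, the two concept sets have no dependencies on each other, so they would land in the same batch and run concurrently, with the cohort definition and generation steps following sequentially.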