ARC-ADR-006: Human-in-the-loop confirmation for destructive operations

Metadata

Field          Value
ID             ARC-ADR-006
Status         Proposed
Date           2026-05-25
Deciders       Architecture Review
Supersedes
Superseded by
Tags           hitl, copilotkit, destructive-ops, rbac, safety, generative-ui

Spoke-authored draft. Epic #12 / issue #17 reference this ADR as proposed at the hub path docs/decisions/ARC-ADR-006-hitl-destructive-ops.md, but it has not been published there yet. This file mirrors that ID and filename so the issue links resolve once it is upstreamed.


Context and Problem Statement

Phase 3 (issue #17) lets users manage sources through the copilot, including deleting a source. Delete is destructive and, from the user's point of view, irreversible. It is also triggered through a natural-language interface in which the agent infers intent, a setting prone to misinterpretation: a request such as "clean up these" can delete more than the user intended. We need a pattern that prevents an LLM-mediated request from destroying data without an explicit, informed human action, while backend authorization remains the authoritative check.

Two distinct concerns must not be conflated:

  • Authorization (may this user delete at all?) — owned by backend-core RBAC (admin role), reached via the forwarded JWT (ARC-ADR-002). This is the security boundary.
  • Confirmation (does the user really want this specific deletion now?) — a UX safety gate at the moment of action. This ADR is primarily about the second, layered on the first.

Note that this is a different kind of HITL from hub ARC-ADR-001 (HITL decision-point pattern), which covers board-level governance: agents escalating decisions to humans across sessions. ARC-ADR-006 is in-app, end-user HITL: an interactive confirmation card in the chat stream.


Decision Drivers

#   Driver
D1  A destructive op must never execute on inferred intent alone; it requires an explicit, deliberate human confirmation.
D2  Authorization stays authoritative in backend-core (admin role); the UX gate must not become the security boundary.
D3  The confirmation must clearly state what will be deleted before the user consents (informed consent).
D4  Non-admins must be stopped before being offered a confirmation they cannot fulfill (clear permission-denied, not a dead-end button).
D5  On rejection, the data is preserved and the user is told; on approval, the delete proceeds.
D6  Use a supported CopilotKit primitive so the pause/confirm/resume integrates with agent state.

Considered Options

  1. renderAndWaitForResponse HITL confirmation card, layered on backend RBAC (chosen) — the delete tool's useCopilotAction renders a confirmation card and pauses the agent until the user explicitly confirms or rejects; backend-core still enforces admin.
  2. Backend RBAC only, no UX gate — rely solely on backend-core's admin check; the agent deletes immediately if authorized.
  3. Client-side guard only — a frontend confirmation/window.confirm, with no agent-state pause and trusting the client for gating.
  4. Type-to-confirm modal outside the agent flow — a standard React modal (type the source name) decoupled from CopilotKit's action/state lifecycle.

Decision Outcome

Option 1 (renderAndWaitForResponse confirmation card, layered on backend RBAC) is adopted.

The delete source flow:

  1. Pre-check role (D4). Before offering deletion, the UI checks the user's role (from the session/JWT). Non-admins get a clear permission-denied message instead of a confirmation card.
  2. Render + wait (D1, D3). The delete_source action is registered with useCopilotAction({ renderAndWaitForResponse }). When the agent proposes a delete, the chat stream renders a confirmation card naming exactly what will be deleted, with explicit Confirm and Cancel buttons. The agent pauses: respond() is not called until the user acts.
  3. Resolve (D5). Confirm → respond() sends approval back to the agent, which proceeds; the delete tool then runs in middle-core, which calls backend-core. Cancel → respond() sends a rejection, the source is preserved, and the user is notified.
  4. Authoritative enforcement (D2). Independent of the card, backend-core's require_principal admin gate is the real authorization check (reached via the forwarded JWT, ARC-ADR-002). The card is a safety gate, not the security boundary: even if the card were bypassed, backend-core still denies non-admins.
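
The four steps above can be read as a small state machine. The following is a minimal, framework-free sketch of steps 1 to 3, not the project's implementation: the type and function names (UserRole, DeleteFlowState, proposeDelete, resolveDelete, mayInvokeDeleteTool) and the "member" role are hypothetical, and CopilotKit's own pause/resume is only modeled by the awaiting_confirmation state.

```typescript
// Sketch only: every name below is hypothetical and models the flow,
// not CopilotKit internals.
type UserRole = "admin" | "member";

type DeleteFlowState =
  | { kind: "permission_denied"; message: string }
  | { kind: "awaiting_confirmation"; sourceId: string; sourceName: string }
  | { kind: "approved"; sourceId: string }
  | { kind: "rejected"; sourceId: string; message: string };

// Step 1: role pre-check. A UX hint only; backend-core stays the authorizer.
function proposeDelete(role: UserRole, sourceId: string, sourceName: string): DeleteFlowState {
  if (role !== "admin") {
    return { kind: "permission_denied", message: "Only admins can delete sources." };
  }
  // Step 2: the confirmation card is shown and the flow waits in this state.
  return { kind: "awaiting_confirmation", sourceId, sourceName };
}

// Step 3: only an explicit user decision moves the flow out of the waiting state.
function resolveDelete(state: DeleteFlowState, decision: "confirm" | "cancel"): DeleteFlowState {
  if (state.kind !== "awaiting_confirmation") {
    return state; // nothing is pending, so there is nothing to resolve
  }
  if (decision === "confirm") {
    return { kind: "approved", sourceId: state.sourceId };
  }
  return {
    kind: "rejected",
    sourceId: state.sourceId,
    message: `"${state.sourceName}" was not deleted.`,
  };
}

// The delete tool may be invoked only from the approved state.
function mayInvokeDeleteTool(state: DeleteFlowState): boolean {
  return state.kind === "approved";
}
```

Modeling the wait as an explicit state makes the "no delete before confirmation" property easy to assert in tests. Step 4 is deliberately absent: authorization is not part of this client-side model and remains in backend-core.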

Confirmation Criteria

  • Selecting a source and asking to delete it renders the HITL confirmation card via renderAndWaitForResponse; the card states what will be deleted and requires an explicit Confirm press.
  • The agent does not proceed until the user responds; no delete request reaches middle-core before confirmation.
  • A non-admin sees a permission-denied message and is not shown a functional confirmation card.
  • On approval the delete proceeds; on rejection the source still exists and the user is notified.
  • A forged/bypassed client request to delete as a non-admin is still rejected by backend-core (the UX gate is not the security boundary) — verifiable once backend-core is reachable.

Pros and Cons

Option 1 — renderAndWaitForResponse + backend RBAC (chosen)

Pros:

  • Defense in depth: explicit human consent (D1/D3) and authoritative backend authorization (D2) — neither alone, both together.
  • renderAndWaitForResponse is CopilotKit's first-party HITL primitive; the pause/resume is wired into agent state, so the agent genuinely waits (D6).
  • Role pre-check gives non-admins a clean message rather than a button that errors (D4).

Cons:

  • More implementation than a plain modal: action wiring, agent-state pause, and the role pre-check must all be correct.
  • The role used for the pre-check is a UX hint; it must never be treated as the authorization decision (that stays in backend-core) — a discipline the team must hold.

Option 2 — Backend RBAC only, no UX gate

Pros:

  • Simplest; backend already enforces admin.

Cons:

  • Violates D1: an admin's misinterpreted natural-language request deletes data with no confirmation — exactly the irreversible-mistake risk of an LLM interface.
  • No informed-consent step (D3); poor, dangerous UX for destructive actions.

Option 3 — Client-side guard only

Pros:

  • Easy; a window.confirm or local modal blocks accidental clicks.

Cons:

  • If treated as the gate, violates D2 (client-trusted authorization is no authorization).
  • Not integrated with agent state (D6): the agent may continue or the tool may fire regardless of the dialog; fragile in a streaming agent flow.

Option 4 — Type-to-confirm modal outside the agent flow

Pros:

  • Strong friction (typing the name) reduces accidental deletes.

Cons:

  • Decoupled from CopilotKit's action/respond() lifecycle (D6): the agent isn't actually paused/resumed by the modal, so the conversational flow and the modal can desync.
  • Reinvents a primitive CopilotKit already provides via renderAndWaitForResponse.

Positive Consequences

  • Destructive operations get explicit, informed, human approval at the point of action while backend-core remains the single authoritative authorizer — satisfying both safety and security goals.
  • Establishes a reusable HITL pattern for any future destructive tool (not just delete source).

Negative Consequences

  • Adds interaction friction to deletes (intended), and adds implementation complexity to the Phase 3 action wiring.
  • Tempting failure mode: treating the client-side role pre-check as the authorization decision. Must be documented and reviewed to keep backend-core authoritative.

Implementation Notes

  • Register delete_source with useCopilotAction({ renderAndWaitForResponse }); the render function receives args (what will be deleted), status, and respond; do not call respond() until the user clicks Confirm/Cancel.
  • Gate the offer of deletion on role (from session/JWT) for UX (D4); gate the execution on backend-core admin RBAC (D2). These are two separate checks with two purposes.
  • Card copy must name the specific source(s) and be unambiguous about irreversibility (D3).
  • Relates to but is distinct from hub ARC-ADR-001: 001 = agent→human board governance HITL; 006 = end-user in-app destructive-op confirmation. Cross-link them to avoid confusion.
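
As an illustration of the first and third notes, here is a minimal sketch of the logic a confirmation card could sit on. It is written without React or CopilotKit imports so it compiles on its own; in the app, a component built from this model would be returned from the renderAndWaitForResponse function passed to useCopilotAction. The props shape (args, status, respond) follows the note above, while the status string values, the { approved } payload, the card copy, and the names DeleteCardProps, DeleteCardModel and buildDeleteCard are assumptions to check against the installed CopilotKit version.

```typescript
// Props shape as described in the notes above, declared locally so the
// sketch has no dependencies. Status values and payload are assumptions.
interface DeleteCardProps {
  args: { sourceName?: string };
  status: "inProgress" | "executing" | "complete";
  respond?: (result: { approved: boolean }) => void;
}

interface DeleteCardModel {
  title: string;
  body: string;
  buttonsEnabled: boolean;
  onConfirm: () => void;
  onCancel: () => void;
}

function buildDeleteCard(props: DeleteCardProps): DeleteCardModel {
  const name = props.args.sourceName;
  const respond = props.respond;
  // Buttons are live only when the source is named (D3) and the action
  // is waiting on the user.
  const ready = name !== undefined && props.status === "executing" && respond !== undefined;
  let answered = false;
  const answer = (approved: boolean) => (): void => {
    if (!ready || answered || respond === undefined) {
      return;
    }
    answered = true; // guard against a double click resolving twice
    respond({ approved });
  };
  return {
    title: name !== undefined ? `Delete "${name}"?` : "Preparing delete confirmation",
    body: "This permanently removes the source and cannot be undone.",
    buttonsEnabled: ready,
    onConfirm: answer(true),
    onCancel: answer(false),
  };
}
```

Building the model never calls respond(); only the onConfirm and onCancel handlers do, which keeps the agent paused until the user acts. Whether a non-admin ever reaches this card is decided by the separate role pre-check, and whether the delete is allowed is decided by backend-core.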

  • Depends on: ARC-ADR-007 (React/CopilotKit app required for renderAndWaitForResponse); ARC-ADR-002 (forwarded JWT carries the role backend-core enforces).
  • Distinct from: hub ARC-ADR-001 (HITL decision-point pattern — agent/board governance).
  • Relates to: epic #12, issue #17 (Phase 3 source management + HITL delete); hub plan docs/plans/copilotkit-generative-ui.md ("deletes need admin + HITL confirmation").

Caveats

  • The authoritative backend denial path can only be fully tested with backend-core + middle-core running (private repos). frontend-core verifies the card/pause/resume behavior and the non-admin pre-check against mocks.
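
For the mock-based checks mentioned above, a stand-in for the backend denial path might look like the sketch below. It is an assumption-laden test double, not backend-core's require_principal gate: the names BackendPrincipal, MockDeleteResult and mockBackendDelete, the "member" role, and the 204/403 status codes are all hypothetical.

```typescript
// Hypothetical stand-in for backend-core's admin gate, for frontend tests only.
interface BackendPrincipal {
  userId: string;
  roles: string[];
}

interface MockDeleteResult {
  status: number; // 204 and 403 are assumed status codes
  remaining: string[]; // source ids still stored after the call
}

function mockBackendDelete(
  principal: BackendPrincipal,
  sourceId: string,
  sources: string[],
): MockDeleteResult {
  // The decision uses only the principal; whether a confirmation card
  // was shown is irrelevant at this layer.
  if (principal.roles.indexOf("admin") === -1) {
    return { status: 403, remaining: sources.slice() };
  }
  return { status: 204, remaining: sources.filter((id) => id !== sourceId) };
}

// A forged request that skipped the card is still denied for a non-admin.
const forged = mockBackendDelete(
  { userId: "u-1", roles: ["member"] },
  "src-1",
  ["src-1", "src-2"],
);
```

A double like this only checks that the frontend handles a denial correctly; it says nothing about the real gate, which still has to be verified against a running backend-core.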

Revision History

Version  Date        Author               Change
0.1      2026-05-25  Architecture Review  Initial proposal (spoke draft)