
ARC-ADR-052 — Agent Tool-Authorization & Capability Gating

One line: How the platform restricts agent access to dangerous tools and codebase paths dynamically, ensuring that autonomous loops only invoke capabilities explicitly granted by their active goals.


Context and Problem Statement

As untool.ai scales towards continuous self-building loops, multiple autonomous agent swarms will run concurrently to compile code, modify schemas, run system commands, and call external APIs. This high degree of autonomy introduces critical security and stability risks:

  1. Unauthorized Tool Use: An agent assigned to write styling code could mistakenly or maliciously invoke dangerous system tools (e.g. running destructive shell scripts or deleting files).
  2. Scope Bleed: Agents can modify directories or repositories outside their designated territory, leading to concurrent write conflicts and unauthorized changes.
  3. Privilege Escalation: A compromised or misbehaving agent could query connection credentials or database keys that it does not need for its task.

We need a model-driven mechanism to authorize and restrict tool execution at runtime on a per-turn basis, matching the agent's active Goal and its associated CapabilityGrant in the system ontology.


Decision Drivers

  • Least-Privilege by Construction: Agents start with zero tool clearance. Access must be explicitly granted and time-boxed.
  • Dynamic Gating: Tool clearances must adjust dynamically as the agent progresses through the execution loop (e.g. acquiring compiling tools only during the verification phase).
  • Zero-Overhead Enforcement: Authorization checks must run inline at the MCP gateway, adding less than 5ms latency to tool calls.
  • HITL Escalation Seams: Gaps in capability clearances must escalate to Nicky Clarke for review and manual override rather than failing silently or crashing the loop.

Proposed Decision: Least-Privilege Capability Gating

We adopt a runtime Capability Gating architecture enforced directly by the untool MCP gateway.

    [ Builder Agent ] (Requests tool: `runner_execute`)
          │
          ▼
    [ untool MCP Gateway ] (Intercepts call, extracts Caller token)
          │
          ├─────► [ Query Active Session ] (Get active Goal ID & Session ID)
          │
          ├─────► [ Check Clearance in Graph ] ── (Is tool authorized by Goal's CapabilityGrant?)
          │
          ▼
   ┌──────┴──────┐
   │             │
   ▼ [Yes]       ▼ [No]
[Execute Tool]  [Block & Escalate to hitl-steering / Request Grant]

1. The Ontology Model (model.yaml additions)

We model capabilities and access grants as first-class ontology entities:

  • Capability (Kind): The abstract function definition (e.g. write_file_territory, compile_spoke).
  • ToolOffering (Kind): The specific MCP tool exposing the capability (e.g. runner_build_trigger).
  • CapabilityGrant (Relator): Reifies the temporary link between a running session, a capability, and the active Goal justifying it. It carries a validTo timestamp (TTL) and optional boundary parameters (e.g. file paths or allowed branches).
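As a rough illustration of how these three entities relate, the sketch below models them as Python dataclasses. It is not the model.yaml schema itself: the field names (session_id, goal_id, valid_to, allowed_paths), the covers method, and the use of glob patterns for path boundaries are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatch


@dataclass(frozen=True)
class Capability:
    """Abstract function definition, e.g. write_file_territory."""
    name: str


@dataclass(frozen=True)
class ToolOffering:
    """Concrete MCP tool that exposes one capability."""
    tool_name: str
    capability: Capability


@dataclass(frozen=True)
class CapabilityGrant:
    """Relator linking a session, a capability, and the justifying Goal."""
    session_id: str
    goal_id: str
    capability: Capability
    valid_to: datetime                    # TTL: the grant is dead from this instant on
    allowed_paths: tuple[str, ...] = ()   # assumed boundary format: glob patterns

    def covers(self, tool: ToolOffering, path: str | None, now: datetime) -> bool:
        """True if this grant authorizes `tool` on `path` at time `now`."""
        if tool.capability != self.capability or now >= self.valid_to:
            return False
        if not self.allowed_paths:
            return True                   # no boundary recorded on this grant
        # A bounded grant denies calls that carry no path at all.
        # Note: a real check should normalise the path first ("..", symlinks).
        return path is not None and any(
            fnmatch(path, pattern) for pattern in self.allowed_paths
        )


# Usage: a five-minute grant limited to the styling territory.
write_cap = Capability("write_file_territory")
fs_write = ToolOffering("fs_write", write_cap)          # hypothetical tool name
issued_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
grant = CapabilityGrant(
    session_id="sess-1",
    goal_id="goal-7",
    capability=write_cap,
    valid_to=issued_at + timedelta(minutes=5),
    allowed_paths=("src/styles/*",),
)
```

Keeping the boundary on the grant rather than on the tool means the same ToolOffering can be handed to different sessions with different territories.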

2. Inline Gateway Enforcement

  • The MCP gateway intercepts every tool execution request.
  • It extracts the agent session token from the caller metadata.
  • It runs a fast, cached query against the local SQLite board shard (.vfs_board.db / .vfs_staging.db) to verify that the session holds an active CapabilityGrant covering the requested tool and parameters.
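  • If authorized, the tool executes. If no active grant covers the call, the gateway blocks execution and returns a 403 Forbidden response; the agent can then request a grant, or the gap escalates to hitl-steering, as shown in the flow diagram above.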
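The enforcement steps above can be sketched as a single indexed SQLite lookup. This is a minimal illustration, not the gateway's actual code: the capability_grant table, its columns, the ToolCallForbidden exception, and the authorize function are hypothetical, and an in-memory database stands in for the board shard. Parameter-level boundary checks and caching are left out.

```python
import sqlite3
import time

# Hypothetical table layout; the real board shard schema is not shown in this ADR.
SCHEMA = """
CREATE TABLE capability_grant (
    session_id TEXT NOT NULL,
    goal_id    TEXT NOT NULL,
    tool_name  TEXT NOT NULL,
    valid_to   REAL NOT NULL   -- Unix epoch seconds
);
CREATE INDEX idx_grant_lookup ON capability_grant (session_id, tool_name);
"""


class ToolCallForbidden(Exception):
    """Raised when no live grant covers the requested tool."""


def authorize(db, session_id, tool_name, now=None):
    """Return the goal_id that justifies the call, or raise ToolCallForbidden."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT goal_id FROM capability_grant "
        "WHERE session_id = ? AND tool_name = ? AND valid_to > ? LIMIT 1",
        (session_id, tool_name, now),
    ).fetchone()
    if row is None:
        # Default deny: a missing or expired grant blocks the call.
        raise ToolCallForbidden(
            f"403 Forbidden: session {session_id} may not call {tool_name}"
        )
    return row[0]


def open_demo_board():
    """Build an in-memory stand-in for the board shard with one grant."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute(
        "INSERT INTO capability_grant VALUES (?, ?, ?, ?)",
        ("sess-1", "goal-7", "runner_execute", 2000.0),
    )
    db.commit()
    return db
```

Returning the goal_id on success gives the gateway the value it needs to attach to the audit record for the call.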

3. Dynamic Acquisition & HITL Escalation

If an agent realizes it needs a tool it does not have (e.g. a python-pro agent needs to install a new dependency via pip):

  1. Request Grant: The agent invokes request_temporary_capability(capability, justification).
  2. RTE Evaluation: The Release Train Engineer evaluates the request. If it conforms to safe bounds, it mints a temporary CapabilityGrant.
  3. HITL Gating: If the requested tool is classified as high-risk (e.g. network egress or credential reading), the request escalates to Nicky Clarke via a hitl-steering decision artifact. Nicky can click to grant the clearance or redirect the swarm.
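One way to picture the routing in this flow is the sketch below. Only the request_temporary_capability(capability, justification) signature comes from the text; the capability names in the two sets, the Decision record, the default TTL, and the choice to escalate anything outside the known safe set are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical classification tables; the real risk policy is not specified here.
HIGH_RISK = frozenset({"network_egress", "read_credentials"})
SAFE_BOUNDS = frozenset({"install_dependency", "compile_spoke"})
DEFAULT_TTL_SECONDS = 300   # assumed time box for auto-minted grants


class Outcome(Enum):
    GRANTED = "granted"       # RTE mints a temporary CapabilityGrant
    ESCALATED = "escalated"   # routed to a hitl-steering decision artifact


@dataclass(frozen=True)
class Decision:
    outcome: Outcome
    capability: str
    justification: str
    ttl_seconds: int = 0


def request_temporary_capability(capability: str, justification: str) -> Decision:
    """Route a grant request: auto-grant within safe bounds, otherwise escalate."""
    if capability in HIGH_RISK:
        return Decision(Outcome.ESCALATED, capability, justification)
    if capability in SAFE_BOUNDS:
        return Decision(
            Outcome.GRANTED, capability, justification, DEFAULT_TTL_SECONDS
        )
    # Unknown capabilities go to a human instead of failing silently.
    return Decision(Outcome.ESCALATED, capability, justification)
```

Escalating unknown capabilities by default follows the HITL Escalation Seams driver: a gap in clearance reaches a human reviewer instead of crashing the loop.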


Consequences

  • + High Security: Compromised agent loops are isolated; they cannot damage the host system or cross repository boundaries.
  • + Direct Traceability: Every tool execution is cryptographically linked to a Goal and a signed CapabilityGrant, providing a complete audit trail.
  • − Latency: Adds a local lookup before every tool execution (mitigated: the SQLite lookup takes < 1 ms, well within the 5 ms budget).
  • − Agent Friction: Agents may frequently halt to request permissions for unexpected sub-tasks (mitigated: core capabilities are pre-granted at goal assignment).