ARC-ADR-049 — Harness-Isolated Agent Event Bridge
One line: A platform harness wrapping sandboxed microVMs and containers that translates system-level webhooks and NATS events into file-based, prompt-injected, or virtual socket (VSOCK) context updates for both non-SDK and SDK-based coding agents.
Context and Problem Statement
In our localized hyperautomation architecture, coding agents (such as Codex, Claude Code, and Antigravity) run inside isolated, short-lived sandboxes—specifically KVM-backed Firecracker microVMs or Docker-in-Docker containers.
When a system-level event occurs (such as a "new platform feature announced", "peer agent acquired VFS path lock", or "background build verification finished"), we need to reactively notify the active agent swarms. However, doing so presents major security and architectural constraints:
1. Blocked Inbound Ports (Zero-Trust): Sandboxed microVMs and guest containers must run with fully locked-down inbound firewalls. Opening public ports to listen for webhooks inside guest virtual machines breaches security boundaries.
2. Ephemeral Lifespans: MicroVMs are frequently spawned, execute a single command or lint, and tear down immediately. They are not persistent listeners.
3. Third-Party Agent Compatibility: Consumers bring proprietary or pre-built agents that do not implement our NATS client or custom SDK. We must notify these agents without requiring them to compile against our library.
We need a secure, non-intrusive event bridge that routes host-level webhook and NATS events into microVM guest environments.
Decision Drivers
- Zero Inbound Guest Ports: No inbound HTTP listeners or TCP connections allowed inside the microVM/container guest boundary.
- Support for Non-SDK Agents: Works out-of-the-box for raw, un-instrumented agents by translating events into workspace mutations.
- Sub-Millisecond Delivery: Low-overhead routing for active, long-lived agent sessions.
- Resource Efficiency: Minimizes background sockets and TCP handshakes on the host.
Considered Options
Option A — Guest Network Webhook Receivers
Assign each guest microVM a dynamic TAP IP and route external/host webhooks directly to a port listening inside the guest.
* Pros:
* Standard HTTP webhooks; no translation layer needed inside the host.
* Cons:
* High Risk: Exposes guest ports to potential host-network attacks.
* Complexity: Requires heavy host NAT routing rules and dynamic IP management for short-lived sessions.
Option B — Filesystem-Based Event Mirroring (Mounted VFS)
The host runner daemon runs as the central NATS and webhook listener. When an event arrives, the host writes it to a VFS-mounted directory shared with the guest (e.g., work/.untool/inbox.json or appending to work/task.md).
* Pros:
* Zero Guest Port requirement: Fully relies on filesystem read access.
* Non-SDK Ready: Non-SDK agents naturally read these workspace files during compile-verify loops.
* Cons:
* I/O Latency: Polling filesystems introduces latency unless filesystem events (inotify) are wired.
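The host-side write in this option can be sketched as follows. This is only a minimal illustration, assuming the shared mount preserves atomic rename semantics and that the inbox holds a JSON object with an `events` array; the function names `stage_inbox` and `read_inbox` and that file shape are hypothetical, not part of the platform.

```python
import json
import os
import tempfile
from pathlib import Path


def stage_inbox(workspace: Path, events: list[dict]) -> Path:
    """Atomically publish `events` to <workspace>/.untool/inbox.json (host side)."""
    inbox_dir = workspace / ".untool"
    inbox_dir.mkdir(parents=True, exist_ok=True)
    inbox = inbox_dir / "inbox.json"

    # Stage into a temp file in the same directory so os.replace() is a
    # same-filesystem rename: a reader sees the old file or the new one,
    # never a half-written one.
    fd, tmp_name = tempfile.mkstemp(dir=inbox_dir, prefix=".inbox-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump({"events": events}, tmp)  # assumed file shape
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, inbox)
    except BaseException:
        os.unlink(tmp_name)  # do not leave staging debris in the workspace
        raise
    return inbox


def read_inbox(workspace: Path) -> list[dict]:
    """Guest-side read; returns [] if the host has not written anything yet."""
    inbox = workspace / ".untool" / "inbox.json"
    try:
        return json.loads(inbox.read_text(encoding="utf-8"))["events"]
    except FileNotFoundError:
        return []
```

Staging through a rename keeps a polling guest from ever parsing a partial file. It does not remove the polling latency listed above, which still depends on how often the agent re-reads the workspace.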
Option C — VSOCK / Unix Domain Socket Bridge (Virtual Socket Proxy)
MicroVMs communicate with the host using Firecracker's virtual sockets (vsock). The host runs an event proxy reachable at vsock CID 2 (the well-known host CID), and the guest connects out to it to stream MCP messages.
* Pros:
* Fastest Transport: Sub-millisecond latency, since traffic bypasses the guest network stack.
* Highly Secure: Vsock requires no network routing or TCP/IP stack configuration.
* Cons:
* Requires vsock drivers enabled in the guest kernel (Alpine image).
* Only works for virtualized environments (Firecracker); does not map directly to the native Windows/Docker fallback.
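A guest-initiated connection of this kind might look roughly like the sketch below. It assumes a Linux guest with Python's `socket.AF_VSOCK` support. The port number, the newline-delimited JSON framing, and the helper names `connect_to_host` and `iter_events` are illustrative assumptions rather than the platform's actual protocol. The TCP fallback address is the gateway named for the Docker path.

```python
import json
import socket
from typing import Iterator

HOST_CID = 2                                # well-known vsock CID of the host
EVENT_PORT = 5005                           # hypothetical event-proxy port
TCP_FALLBACK = ("172.16.0.1", EVENT_PORT)   # gateway used when vsock is unavailable


def connect_to_host(timeout: float = 5.0) -> socket.socket:
    """Open an outbound connection to the host proxy; the guest never listens."""
    family = getattr(socket, "AF_VSOCK", None)
    if family is not None:
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
            try:
                sock.settimeout(timeout)
                sock.connect((HOST_CID, EVENT_PORT))
                sock.settimeout(None)  # back to blocking reads for streaming
                return sock
            except OSError:
                sock.close()
                raise
        except OSError:
            pass  # no vsock device or proxy not reachable: fall back to TCP
    sock = socket.create_connection(TCP_FALLBACK, timeout=timeout)
    sock.settimeout(None)
    return sock


def iter_events(sock: socket.socket) -> Iterator[dict]:
    """Yield one decoded JSON message per newline-terminated line (assumed framing)."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if line:  # skip blank keep-alive lines
                yield json.loads(line)
```

A guest agent would then loop with `for event in iter_events(connect_to_host()): ...`. With Firecracker, the host end of such a connection is typically a Unix domain socket served by the proxy rather than an `AF_VSOCK` listener, which is why this option pairs the two transports.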
Decision
Adopt a Hybrid Harness Event Bridge (Option B and Option C combined). Expose a unified event bridge that leverages filesystem injection for boot configuration and non-SDK fallbacks, and vsock/TCP MCP stream forwarding for active, SDK-enabled agents:
graph TD
NATS["NATS JetStream / Webhook Gateway"] -- "1. Egress Event" --> HostDaemon["Host Platform Harness (Daemon)"]
subgraph Host Hypervisor Plane
HostDaemon -- "A. Staged Write" --> VFS["Mounted Workspace VFS (.untool/inbox.json)"]
HostDaemon -- "B. Event Stream" --> VsockProxy["Host VSOCK / TCP Proxy"]
end
subgraph "Guest Sandbox (MicroVM / Container)"
VFS -- "Read at start / inotify" --> NonSDKAgent["Non-SDK Agent Loop"]
VsockProxy -- "Outbound Stream (vsock / TCP)" --> SDKAgent["SDK Agent / MCP Client"]
end
1. Boot-Time Prompt and Environment Injection
When the host daemon spawns a microVM/container, it injects the current platform capability snapshot before the guest processes start:
1. Boot Args / Env: Passes active feature flags to the guest as environment variables (e.g., UNTOOL_CAP_LIST=hvfs,messaging).
2. Workspace Seed: Writes the latest 10 events from platform-capability-log.feed.json into .untool/inbox.json on the mounted workspace before starting the guest process.
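The two injection steps above could be sketched as follows. The feed is assumed to be a JSON array ordered oldest to newest, and the inbox is assumed to hold an object with an `events` array; both shapes, and the helper names `build_guest_env` and `seed_inbox`, are hypothetical.

```python
import json
from pathlib import Path


def build_guest_env(capabilities: list[str]) -> dict[str, str]:
    """Render active feature flags as the UNTOOL_CAP_LIST environment variable."""
    return {"UNTOOL_CAP_LIST": ",".join(capabilities)}


def seed_inbox(feed_path: Path, workspace: Path, limit: int = 10) -> list[dict]:
    """Copy the newest `limit` feed events into <workspace>/.untool/inbox.json."""
    feed = json.loads(feed_path.read_text(encoding="utf-8"))
    latest = feed[-limit:]  # assumes the feed is ordered oldest -> newest

    inbox = workspace / ".untool" / "inbox.json"
    inbox.parent.mkdir(parents=True, exist_ok=True)
    # A plain write is enough here because the guest has not started yet,
    # so nothing can observe a partially written file.
    inbox.write_text(json.dumps({"events": latest}), encoding="utf-8")
    return latest
```

For example, `build_guest_env(["hvfs", "messaging"])` yields `{"UNTOOL_CAP_LIST": "hvfs,messaging"}`, which the daemon would merge into the guest's environment before launch.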
2. Runtime Event Proxying (vsock/TCP for SDK Agents)
For long-running guest agents:
* The guest SDK connects outbound to the host at vsock CID 2 (or to a port on the 172.16.0.1 gateway when falling back to Docker TCP).
* Over this guest-initiated connection, the SDK opens an MCP session, with the guest acting as MCP client and the host as MCP server.
* The host server publishes events (such as the NATS subject platform.capability.changed) down this stream, triggering the agent's reactive listeners.
3. Filesystem Watcher (Fallback for Non-SDK Agents)
For raw agents (e.g., un-instrumented CLI tools):
* The host daemon writes events dynamically to the VFS-mounted directory.
* Because the agent repeatedly reads files such as task.md or source files during its execution loop, the harness updates the file contents (e.g., by appending a comment such as <!-- UNTOOL_EVENT: new capability ... -->), so the agent can parse the event on its next file read.
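The append step could be sketched as below, under the assumption that every writer in the swarm cooperates through an advisory `flock` on the target file (POSIX only). The helper names `append_event` and `pending_events` are hypothetical, and the marker format simply mirrors the comment shown above.

```python
import fcntl
import re
from pathlib import Path

EVENT_RE = re.compile(r"<!-- UNTOOL_EVENT: (.*?) -->")


def append_event(task_file: Path, message: str) -> bool:
    """Append one event comment under an exclusive lock; return False if already present."""
    if "-->" in message or "\n" in message:
        raise ValueError("message would break the comment marker")
    marker = f"<!-- UNTOOL_EVENT: {message} -->"

    with open(task_file, "a+", encoding="utf-8") as fh:
        # Advisory lock: it only protects against writers that also call flock().
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            fh.seek(0)
            content = fh.read()
            if marker in content:
                return False  # idempotent: never inject the same event twice
            # Start on a fresh line if the file does not end with a newline.
            prefix = "" if not content or content.endswith("\n") else "\n"
            fh.write(prefix + marker + "\n")  # append mode always writes at EOF
            fh.flush()
            return True
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def pending_events(task_file: Path) -> list[str]:
    """What a non-SDK agent (or a test) would extract on its next file read."""
    return EVENT_RE.findall(task_file.read_text(encoding="utf-8"))
```

The duplicate check keeps redelivered NATS messages from piling up in the prompt file. An agent that rewrites task.md without taking the lock can still clobber the appended comment, so this only narrows the conflict window.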
Consequences
- + Strong guest isolation: No listening ports are exposed inside the guest; events arrive only over guest-initiated outbound connections or the shared filesystem.
- + Dynamic onboarding: Non-SDK agents pick up platform changes without code changes, via boot-time prompt injection and workspace updates that they see on their next file read.
- + MicroVM-Docker Parity: Vsock handles high-performance microVM routing, while a TCP proxy on the gateway address provides the fallback under Docker container groups.
- − File polling overhead: Non-SDK agents only see events on their next file read, and appending to workspace files requires robust write staging and lock coordination to prevent conflicting writes across the swarm.
- − Multi-harness footprint: The host daemon must continuously run and maintain the NATS/Webhook listener on the developer's localhost.