ARC-ADR-048 — Local Loopback Runner & Hyperautomation Engine¶
One line: A localized, zero-trust execution sandbox plane utilizing Firecracker microVMs with automated fallback to Docker-in-Docker (DinD) loops for continuous agent-triggered build and smoke verification.
Context and Problem Statement¶
As our autonomous agent armies (Codex, Antigravity, Claude Code) take on broader development tasks, including modifying code, generating DB schema migrations, and rewriting APIs, they must test their changes in a live runtime. Running these tests presents three critical architectural challenges:
1. Security Risk (CWE-269 / Arbitrary Code Execution): Spawning agent-written tests and scripts directly on the host development machine is unsafe. An agent hallucination or malicious payload could corrupt host files, leak API keys, or compromise the system.
2. Architecture Drift & Environment Pollution: Building spokes locally installs dependencies (npm, pip, cargo) on the host machine. If dependencies collide with host setups, builds fail. Additionally, running Linux-native build pipelines on a Windows host fails without virtualization.
3. CI/CD Queue Bottlenecks: Offloading every validation step to remote GitHub-hosted runners introduces high latency (~2-5 minutes per run), incurs cloud cost, and causes context-switching delays in the agent's reasoning loop.
We need a lightweight, secure, and fast local execution environment where the agent can run its compile-verify-correct loops hermetically, with sub-second feedback wherever the host allows it.
Decision Drivers¶
- Zero-Trust Isolation: Full hardware-level or container-level isolation between the agent runtime environment and the host developer machine.
- Startup Latency: Sandbox creation, boot, execution, and teardown must complete in under 2-3 seconds to prevent stalling agent loops.
- Developer Platform Compatibility: Must run seamlessly on Linux, macOS, and Windows (via Docker Desktop / WSL2).
- Resource Footprint: Must support running multiple concurrent test loops without draining host memory/CPU.
Considered Options¶
Option A — Native Host Process Execution (Status Quo)¶
Run test commands (npm run build, dotnet build, pytest) directly in shell subprocesses on the host machine.
* Pros:
  * No virtualization overhead; runs at maximum native speed.
  * Simple to invoke; requires no setup.
* Cons:
  * High Risk: Zero security boundaries. Agent execution can compromise the host machine.
  * OS Incompatibility: Windows hosts cannot run bash-based scripts or GCC/make commands reliably without WSL2.
  * Host Pollution: Installs libraries, writes tmp files, and leaves background processes dangling.
Option B — Docker-in-Docker (DinD) Container Loops¶
Spawn local test runs inside isolated Docker containers, leveraging a localized DinD daemon to manage build stages and run compose services in isolation.
* Pros:
  * Cross-Platform: Works on any host running Docker Desktop (Windows, macOS, Linux).
  * Spoke Equivalence: Matches production container layouts exactly, making docker-compose smoke tests highly realistic.
  * Volume Cache Sharing: Allows sharing pip/npm cache folders across runs via Docker volume mounts.
* Cons:
  * Moderate Startup Latency: Container creation and compose boot add ~2-5 seconds.
  * Heavier Footprint: Docker Desktop VM memory overhead can be substantial when running multiple replicas.
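
The volume cache sharing and clean-container behaviour described for Option B can be illustrated with a small argv builder. This is a minimal sketch, not the project's runner: the function name `docker_run_argv`, the image tag `local-runner:latest`, and the volume names are hypothetical, and it builds a plain `docker run` command rather than starting a DinD daemon.

```python
import shlex
from typing import Dict, List, Optional


def docker_run_argv(
    image: str,
    command: str,
    cache_volumes: Optional[Dict[str, str]] = None,
    network: str = "none",
) -> List[str]:
    """Build the argv for one ephemeral container run.

    --rm removes the container when the command exits, so every job
    starts from a clean filesystem. Named volumes persist across runs
    and are used here only for package caches.
    """
    argv = ["docker", "run", "--rm", "--network", network]
    for volume, mount_point in (cache_volumes or {}).items():
        argv += ["--volume", f"{volume}:{mount_point}"]
    argv += [image, "sh", "-c", command]
    return argv


if __name__ == "__main__":
    argv = docker_run_argv(
        image="local-runner:latest",  # hypothetical image tag
        command="pip install -r requirements.txt && pytest -q",
        cache_volumes={"pip-cache": "/root/.cache/pip"},  # hypothetical volume name
        network="bridge",  # installing packages needs network access
    )
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(argv))
```

Defaulting to `--network none` keeps a job offline unless the caller opts in, which fits the zero-trust driver; whether the real runner does this is an assumption.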
Option C — Firecracker microVM Sandboxes¶
Spin up minimal guest Linux kernels inside KVM-backed microVMs (Firecracker), mounting lightweight read-only Alpine rootfs images.
* Pros:
  * Near-Zero Cold Start: MicroVM boot time is typically 100-150 ms.
  * Hardware Isolation: KVM provides true hardware-assisted virtualization isolation, far safer than shared-kernel containers.
  * Ultra-lightweight: Extremely low memory footprint (~5-10 MB per microVM).
* Cons:
  * WSL2 Nested Virtualization Requirement: Running under Windows requires WSL2 with nested virtualization enabled, which involves manual command-line setup and root privileges.
  * TAP Networking Setup: Setting up tap0 interfaces and NAT route rules requires root/sudo access on the host.
  * No Native macOS Support: Firecracker depends on KVM (Linux-native).
Decision¶
Adopt a Hybrid Architecture (Option C with fallback to Option B). Expose a unified loopback runner plane to the Agent Army via local MCP tools:
graph TD
Agent["Agent / Swarm Client"] -- "1. Runner Execute Command" --> MCP["Local Runner MCP Tool"]
MCP -- "2. Check KVM / Linux Support" --> Detect{"KVM Available?"}
Detect -- "Yes (Linux/WSL2)" --> Firecracker["Option C: Firecracker microVM sandbox"]
Detect -- "No (macOS/Windows Native)" --> Docker["Option B: Docker-in-Docker ephemeral stack"]
Firecracker -- "Boot and Run" --> Guest["Guest kernel / rootfs.ext4"]
Docker -- "Boot and Run" --> Container["DinD Docker Sandbox"]
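
The detection step in the diagram can be sketched as a check on the KVM device node. This is an illustrative sketch under assumptions: the names `Backend`, `kvm_available`, and `select_backend` are hypothetical, and the real MCP tool may probe the host differently (for example, by also checking that the Firecracker binary is installed).

```python
import os
import platform
from enum import Enum


class Backend(Enum):
    FIRECRACKER = "firecracker"  # Option C, fast path
    DOCKER = "docker"            # Option B, fallback path


def kvm_available(kvm_path: str = "/dev/kvm") -> bool:
    """Return True if this is a Linux host (including WSL2) whose KVM
    device node exists and is readable and writable by the current user."""
    return (
        platform.system() == "Linux"
        and os.path.exists(kvm_path)
        and os.access(kvm_path, os.R_OK | os.W_OK)
    )


def select_backend(kvm_path: str = "/dev/kvm") -> Backend:
    """Pick the Firecracker path when KVM is usable, otherwise Docker."""
    return Backend.FIRECRACKER if kvm_available(kvm_path) else Backend.DOCKER


if __name__ == "__main__":
    print(f"Selected backend: {select_backend().value}")
```

Checking access rights as well as existence matters because `/dev/kvm` is often present but restricted to members of a specific group.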
1. Hybrid Virtualization Tiering¶
- Fast Path (Firecracker microVMs): Used for lightweight, single-spoke validations, prompt template routing tests, and code lints.
  - Boot is handled via commons-core/scripts/firecracker_harness.py.
  - Guest filesystem uses Alpine minirootfs mounted as an overlay drive, preventing guest changes from dirtying the master disk.
- Fallback Path (Docker-in-Docker): Used for full-spoke integration tests (Next.js server + Postgres + ArcadeDB) or when running on macOS/Windows native hosts.
  - Containers are spawned as ephemeral replicas under templates/local-docker-runner.
  - Each job runs in a clean container, then exits to let Docker spawn a fresh, clean instance.
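
For the fast path, one way to keep the master disk clean is to attach the root filesystem read-only and give each run its own writable scratch drive. The sketch below builds a Firecracker JSON configuration along those lines. It is not the contents of firecracker_harness.py: the function name, the file paths, the `scratch` drive, and the resource sizes are assumptions, and how the guest assembles its overlay from the two drives is not shown.

```python
import json
from typing import Any, Dict


def firecracker_config(
    kernel_path: str,
    rootfs_path: str,
    scratch_path: str,
    vcpus: int = 1,
    mem_mib: int = 128,
) -> Dict[str, Any]:
    """Build a config dict for `firecracker --config-file`.

    The shared rootfs is attached read-only; guest writes are meant to
    land on a per-run scratch drive that is discarded after the job.
    """
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": rootfs_path,
                "is_root_device": True,
                "is_read_only": True,
            },
            {
                "drive_id": "scratch",  # hypothetical per-run writable layer
                "path_on_host": scratch_path,
                "is_root_device": False,
                "is_read_only": False,
            },
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


if __name__ == "__main__":
    config = firecracker_config(
        kernel_path="vmlinux",          # hypothetical paths
        rootfs_path="rootfs.ext4",
        scratch_path="run-0001-scratch.ext4",
    )
    print(json.dumps(config, indent=2))
```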
2. Zero-Trust Local Guest Network Topology¶
All sandboxes run inside an isolated host TAP subnet:
* Host Gateway IP: 172.16.0.1 (TAP interface tap0 or Docker network bridge).
* Guest Sandbox IP: 172.16.0.2 (dynamic DHCP allocation for multiple runs).
* Safety Net Interceptor: Guest VM outbound connections are routed through a NAT masquerade. Sibling databases (Postgres, ArcadeDB, Fuseki) are reachable only if the local coordinator explicitly whitelists the connection path in the configuration, preventing agents from calling out to unapproved servers.
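
The whitelist rule above amounts to a default-deny check on each outbound destination. The following sketch shows the idea using Python's standard `ipaddress` module. The names `AllowRule` and `is_allowed` and the example rules are hypothetical, and the actual enforcement would presumably live in host firewall/NAT rules rather than in Python.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AllowRule:
    network: str  # CIDR block, e.g. "172.16.0.1/32"
    port: int     # destination TCP port


def is_allowed(dest_ip: str, dest_port: int, rules: Iterable[AllowRule]) -> bool:
    """Default-deny: a connection passes only if some rule matches
    both the destination address and the destination port."""
    address = ipaddress.ip_address(dest_ip)
    return any(
        address in ipaddress.ip_network(rule.network) and dest_port == rule.port
        for rule in rules
    )


if __name__ == "__main__":
    # Hypothetical configuration: only Postgres on the host gateway is approved.
    rules = [AllowRule(network="172.16.0.1/32", port=5432)]
    for ip, port in [("172.16.0.1", 5432), ("172.16.0.1", 22), ("8.8.8.8", 443)]:
        verdict = "allow" if is_allowed(ip, port, rules) else "deny"
        print(f"{ip}:{port} -> {verdict}")
```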
3. Model Context Protocol Tool Interface¶
The local-runner plane is managed through these three MCP tools:
* runner_execute(branch_name, command): Executes a build/lint/test suite on the specified VFS branch inside a sandbox.
* runner_smoke_test(compose_file): Spins up the compose stack and runs the doctor check, returning structured logs.
* runner_status(task_id): Checks job status and parses the logs for exit codes.
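
To make the shape of this interface concrete, here is a minimal in-memory sketch of `runner_execute` and `runner_status`. Everything beyond the two tool names and their parameters is an assumption: the `Job` record, the `task-N` id format, the state names, and the injectable `executor` callback standing in for the sandbox. It runs jobs synchronously and omits `runner_smoke_test` and the MCP transport entirely.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (branch_name, command) -> (exit_code, logs); stands in for the sandbox.
Executor = Callable[[str, str], Tuple[int, str]]


@dataclass
class Job:
    task_id: str
    branch_name: str
    command: str
    exit_code: Optional[int] = None
    logs: str = ""

    @property
    def state(self) -> str:
        if self.exit_code is None:
            return "running"
        return "succeeded" if self.exit_code == 0 else "failed"


class LocalRunner:
    def __init__(self, executor: Executor) -> None:
        self._executor = executor
        self._jobs: Dict[str, Job] = {}
        self._ids = itertools.count(1)

    def runner_execute(self, branch_name: str, command: str) -> str:
        """Run the command for a branch and return a task id for later lookup."""
        job = Job(f"task-{next(self._ids)}", branch_name, command)
        self._jobs[job.task_id] = job
        job.exit_code, job.logs = self._executor(branch_name, command)
        return job.task_id

    def runner_status(self, task_id: str) -> dict:
        """Return the job's state, exit code, and captured logs."""
        job = self._jobs[task_id]
        return {
            "task_id": job.task_id,
            "state": job.state,
            "exit_code": job.exit_code,
            "logs": job.logs,
        }


if __name__ == "__main__":
    def fake_executor(branch: str, command: str) -> Tuple[int, str]:
        # Pretend that only lint commands pass.
        return (0, "ok") if "lint" in command else (1, "build error")

    runner = LocalRunner(fake_executor)
    print(runner.runner_status(runner.runner_execute("feature/x", "npm run lint")))
    print(runner.runner_status(runner.runner_execute("feature/x", "npm run build")))
```

Returning a task id from `runner_execute` rather than the result itself leaves room for an asynchronous implementation, where `runner_status` would report `running` until the sandbox exits.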
Consequences¶
- + Secure Execution: Safe to execute arbitrary agent-generated code; KVM or container namespaces isolate host files and credentials.
- + Sub-Second Feedback: Firecracker's ~100ms startup means lints and compiler checks return almost instantly.
- + Portable Fallback: Keeps the codebase accessible to Windows/macOS native developers via Docker Desktop, while optimization-focused runners benefit from KVM.
- − Hypervisor Dependency: Firecracker requires KVM. WSL2 nested virtualization must be enabled in C:\Users\<user>\.wslconfig on Windows.
- − Multi-Toolchain Maintenance: Requires maintaining both the Alpine rootfs (make_rootfs.sh) and the Docker runner base (Dockerfile.runner).