Proposed Endpoints Tracing Specification: Data Platform & Coder Executor
This document defines the OpenTelemetry tracing contracts, span structures, semantic conventions, and context propagation maps for newly proposed endpoints in the Data Platform and Coder Executor components.
1. Overview & Architecture
To achieve end-to-end observability across the untool.ai platform swarm, trace context must propagate from the orchestration services down to the ephemeral compute boundaries, and each component exports its spans to the downstream OTel Collector.

    graph TD
        MC[middle-core orchestrator] -->|HTTP/gRPC with traceparent| DP[mcr-f4 Data Platform]
        MC -->|HTTP/gRPC with traceparent| EX[agent-sdk-executor Coder VM]
        DP -->|OTLP/gRPC| OC[downstream OTel Collector]
        EX -->|OTLP/gRPC| OC

2. Data Platform (mcr-f4) Span Specifications
The Data Platform exposes interfaces for querying and projecting ontology-driven Object Models. The telemetry contracts enforce tracing of schema versions and logical clock synchronization.
2.1 Span: dataplatform.query
Triggered during search and retrieval operations over materialized semantic databases (e.g. ArcadeDB, Fuseki).
- Semantic Conventions & Attributes:
| Attribute | Type | Description | Example |
| :--- | :--- | :--- | :--- |
| `dataplatform.query.filters` | String | JSON-serialized query filter terms | `{"type": "Feature", "status": "active"}` |
| `dataplatform.query.engine` | String | Database backend executing the query | `ArcadeDB` |
| `schema.version` | String | Semantic ontology version queried | `1.4.2-alpha` |
| `logical_clock.time` | Integer | Lamport / Vector logical clock integer value | `42` |
| `logical_clock.agent_id` | String | Callsign of the requesting agent | `Signal` |
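The attributes above could be assembled by a small helper before they are set on the span. The following is a minimal sketch using only the standard library; the function name `query_span_attributes` and the decision to sort JSON keys (so that identical filters always serialize to the same string) are assumptions, not part of this specification.

```python
import json


def query_span_attributes(filters: dict, engine: str, schema_version: str,
                          clock_time: int, agent_id: str) -> dict:
    """Build the attribute map for a dataplatform.query span (hypothetical helper)."""
    return {
        # sort_keys yields a stable string for equal filter dicts (assumption)
        "dataplatform.query.filters": json.dumps(filters, sort_keys=True),
        "dataplatform.query.engine": engine,
        "schema.version": schema_version,
        "logical_clock.time": int(clock_time),
        "logical_clock.agent_id": agent_id,
    }


attrs = query_span_attributes(
    {"type": "Feature", "status": "active"}, "ArcadeDB", "1.4.2-alpha", 42, "Signal"
)
print(attrs["dataplatform.query.filters"])  # {"status": "active", "type": "Feature"}
```

The resulting dictionary can be passed to a span one key at a time with `span.set_attribute(key, value)`.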
2.2 Span: dataplatform.project
Triggered when projecting an ontology corpus to a specific layer's target Object Model.
- Semantic Conventions & Attributes:
| Attribute | Type | Description | Example |
| :--- | :--- | :--- | :--- |
| `dataplatform.projection.source` | String | Source ontology graph namespace | `ontology/platform-self-model` |
| `dataplatform.projection.target_lang` | String | Target codebase language generated | `typescript` |
| `schema.version` | String | Version of the schema being projected | `1.4.2-alpha` |
| `logical_clock.time` | Integer | Logical clock at time of projection | `43` |
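This specification does not state how `logical_clock.time` advances between spans. Assuming a Lamport-style clock, the example values 42 (query) and 43 (projection) could come from consecutive local ticks, as in the sketch below. The class name `LamportClock` and its methods are hypothetical.

```python
class LamportClock:
    """Minimal Lamport clock: a single monotonically increasing integer."""

    def __init__(self, start: int = 0) -> None:
        self.time = start

    def tick(self) -> int:
        """Advance for a local event (for example, starting a span)."""
        self.time += 1
        return self.time

    def observe(self, remote_time: int) -> int:
        """Merge a clock value received from another agent, then advance."""
        self.time = max(self.time, remote_time) + 1
        return self.time


clock = LamportClock(start=41)
query_time = clock.tick()      # value for the dataplatform.query span
project_time = clock.tick()    # value for the dataplatform.project span
merged_time = clock.observe(100)  # a larger remote value pulls the clock forward
print(query_time, project_time, merged_time)  # 42 43 101
```

A vector clock would need one counter per agent and therefore does not fit a single Integer attribute; that case is outside this sketch.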
3. Coder Executor (agent-sdk-executor) Span Specifications
The Coder Executor manages ephemeral microVMs executing coding subagent tasks. Traces must capture resource utilization, model usage billing, and git side-effects.
3.1 Span: executor.bootstrap
Tracks ephemeral VM sandbox allocation, network provisioning, and tool mounting.
- Semantic Conventions & Attributes:
| Attribute | Type | Description | Example |
| :--- | :--- | :--- | :--- |
| `executor.vm_id` | String | Unique identifier of the created VM container | `vm-9a8b7c6d-5e4f` |
| `executor.sandbox.type` | String | Type of isolation layer | `microvm-firecracker` |
| `executor.bootstrap.duration_ms` | Integer | Total VM boot and provision latency | `1250` |
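One way to obtain `executor.bootstrap.duration_ms` is to time the bootstrap phase with a monotonic clock, which is unaffected by wall-clock adjustments. This is a sketch only: the context manager `measure_ms` is a hypothetical helper, and the `time.sleep` call stands in for real VM allocation and provisioning.

```python
import time
from contextlib import contextmanager


@contextmanager
def measure_ms(attributes: dict, key: str):
    """Store the elapsed time of the with-block, in whole milliseconds, under `key`."""
    start = time.monotonic_ns()
    try:
        yield attributes
    finally:
        # Recorded even if the block raises, so failed bootstraps still carry a latency
        attributes[key] = (time.monotonic_ns() - start) // 1_000_000


bootstrap_attrs = {
    "executor.vm_id": "vm-9a8b7c6d-5e4f",
    "executor.sandbox.type": "microvm-firecracker",
}
with measure_ms(bootstrap_attrs, "executor.bootstrap.duration_ms"):
    time.sleep(0.01)  # placeholder for VM allocation, networking, and tool mounting

print(bootstrap_attrs["executor.bootstrap.duration_ms"])
```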
3.2 Span: executor.run
Emitted while user goals or tasks execute in the isolated environment.
- Semantic Conventions & Attributes:
| Attribute | Type | Description | Example |
| :--- | :--- | :--- | :--- |
| `executor.vm_id` | String | Target VM container identifier | `vm-9a8b7c6d-5e4f` |
| `executor.budget.limit_usd` | Double | Maximum allocated execution budget in USD | `0.50` |
| `executor.budget.used_usd` | Double | Actual dollar cost consumed | `0.12` |
| `executor.tokens.prompt` | Integer | LLM prompt tokens consumed during execution | `4096` |
| `executor.tokens.completion` | Integer | LLM completion tokens generated | `512` |
| `executor.tokens.total` | Integer | Total prompt + completion tokens | `4608` |
| `executor.model.name` | String | The model backing the executor | `gemini-1.5-pro` |
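Because `executor.tokens.total` is defined as prompt plus completion tokens, it can be derived rather than tracked separately, which keeps the three token attributes consistent. The sketch below illustrates this with a hypothetical `RunUsage` dataclass; the `over_budget` check is an added convenience and not something this specification requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunUsage:
    """Usage figures for one executor.run span (hypothetical container)."""
    prompt_tokens: int
    completion_tokens: int
    used_usd: float
    limit_usd: float

    @property
    def total_tokens(self) -> int:
        # Derived, so it always equals prompt + completion
        return self.prompt_tokens + self.completion_tokens

    @property
    def over_budget(self) -> bool:
        return self.used_usd > self.limit_usd

    def as_attributes(self, vm_id: str, model_name: str) -> dict:
        return {
            "executor.vm_id": vm_id,
            "executor.budget.limit_usd": self.limit_usd,
            "executor.budget.used_usd": self.used_usd,
            "executor.tokens.prompt": self.prompt_tokens,
            "executor.tokens.completion": self.completion_tokens,
            "executor.tokens.total": self.total_tokens,
            "executor.model.name": model_name,
        }


usage = RunUsage(prompt_tokens=4096, completion_tokens=512, used_usd=0.12, limit_usd=0.50)
run_attrs = usage.as_attributes("vm-9a8b7c6d-5e4f", "gemini-1.5-pro")
print(run_attrs["executor.tokens.total"], usage.over_budget)  # 4608 False
```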
3.3 Span: executor.teardown
Tracks VM cleanup, file extraction, and workspace Git outcomes.
- Semantic Conventions & Attributes:
| Attribute | Type | Description | Example |
| :--- | :--- | :--- | :--- |
| `executor.vm_id` | String | Target VM container identifier | `vm-9a8b7c6d-5e4f` |
| `executor.git.push_success` | Boolean | Whether changes were successfully pushed to VCS | `true` |
| `executor.git.pr_created` | Boolean | Whether a PR was submitted | `true` |
| `executor.git.pr_url` | String | URL of the generated Pull Request | `https://github.com/nickpclarke/backend-core/pull/42` |
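The table does not say what to record for `executor.git.pr_url` when no Pull Request exists. One possible reading, sketched below, is to omit the attribute in that case rather than set an empty or null value. The helper name `teardown_span_attributes` is hypothetical, and deriving `executor.git.pr_created` from the presence of a URL is an assumption.

```python
from typing import Optional


def teardown_span_attributes(vm_id: str, push_success: bool,
                             pr_url: Optional[str] = None) -> dict:
    """Build the attribute map for an executor.teardown span (hypothetical helper)."""
    attrs = {
        "executor.vm_id": vm_id,
        "executor.git.push_success": push_success,
        # Assumption: a PR counts as created exactly when a URL is available
        "executor.git.pr_created": pr_url is not None,
    }
    if pr_url is not None:
        attrs["executor.git.pr_url"] = pr_url  # omitted entirely when there is no PR
    return attrs


with_pr = teardown_span_attributes(
    "vm-9a8b7c6d-5e4f", True, "https://github.com/nickpclarke/backend-core/pull/42"
)
without_pr = teardown_span_attributes("vm-9a8b7c6d-5e4f", False)
print(sorted(without_pr))
```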
4. Context Propagation Map
To preserve trace continuity across network and process boundaries (Orchestrator $\rightarrow$ Executor microVM), the W3C traceparent HTTP header MUST be propagated.
Header Name: traceparent
Format: {version}-{trace_id}-{parent_id}-{trace_flags}
- version: 2 hex chars (currently "00")
- trace_id: 32 hex chars (unique trace ID)
- parent_id: 16 hex chars (span ID of caller)
- trace_flags: 2 hex chars (e.g., "01" indicating sampled)
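The field layout above can be checked with a short parser. This is a minimal sketch for the version "00" layout only, using the standard library; the names `TraceParent` and `parse_traceparent` are hypothetical. It also rejects all-zero `trace_id` and `parent_id` values and the version "ff", which the W3C Trace Context specification treats as invalid.

```python
import re
from typing import NamedTuple, Optional

# Four lowercase-hex fields separated by dashes, matching the lengths listed above
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<trace_flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    trace_flags: str

    @property
    def sampled(self) -> bool:
        # Bit 0 of trace_flags is the "sampled" flag
        return bool(int(self.trace_flags, 16) & 0x01)


def parse_traceparent(header: Optional[str]) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is missing or malformed."""
    if not header:
        return None
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    parsed = TraceParent(**match.groupdict())
    if parsed.version == "ff":
        return None
    if set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        return None
    return parsed


parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(parsed.trace_id, parsed.parent_id, parsed.sampled)
```

Returning `None` instead of raising lets a caller fall back to starting a new trace when the header is absent or invalid.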
Context Injection/Extraction Flow
- Inject: The `middle-core` orchestrator generates or inherits a `traceparent` context. When calling the Coder Executor API (e.g. `/run`), the HTTP client injects it:

        POST /run HTTP/1.1
        Host: executor-api.local
        traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

- Extract: The Coder Executor API extracts `traceparent` from headers and sets it as the parent context of `executor.run`:

        from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

        # `request` is the incoming HTTP request object of the web framework in use
        carrier = {"traceparent": request.headers.get("traceparent")}
        extracted_context = TraceContextTextMapPropagator().extract(carrier=carrier)

5. Mock Instrumentation Reference (Python)

    import os

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.trace import Status, StatusCode
    from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

    # Initialize the tracer (no exporter is attached in this mock)
    trace.set_tracer_provider(TracerProvider())
    tracer = trace.get_tracer("agent-sdk-executor")


    def execute_subagent_task(http_headers: dict, task_budget: float):
        # 1. Extract context from the incoming traceparent header.
        #    A plain dict carrier must use the lowercase key "traceparent".
        context = TraceContextTextMapPropagator().extract(carrier=http_headers)

        # 2. Run under the extracted parent context. Automatic exception
        #    recording is disabled because the except block below records
        #    the exception explicitly; leaving both on would record it twice.
        with tracer.start_as_current_span(
            "executor.run",
            context=context,
            record_exception=False,
            set_status_on_exception=False,
        ) as span:
            vm_id = "vm-" + os.urandom(6).hex()
            span.set_attribute("executor.vm_id", vm_id)
            span.set_attribute("executor.budget.limit_usd", task_budget)

            try:
                # Execute logic... (placeholder values stand in for real results)
                used_cost = 0.08
                tokens_total = 1250

                # Record runtime metrics
                span.set_attribute("executor.budget.used_usd", used_cost)
                span.set_attribute("executor.tokens.total", tokens_total)
                span.set_status(Status(StatusCode.OK))
            except Exception as e:
                span.record_exception(e)
                span.set_status(Status(StatusCode.ERROR, str(e)))
                raise  # re-raise with the original traceback