Skip to content

LiteLLM OSS Feature Map

Generated working note for issue #83 and ADR 0005.

Mapper Input

  • Untool surface: contracts/llm-gateway.openapi.yaml
  • LiteLLM surface: tmp/litellm/openapi.json
  • Mapper: app/ontology/schema_abstract.py

The mapper found confirmed overlap on the core OpenAI-shaped primitives: chat completion, embeddings, and validation/error envelopes. The LiteLLM surface is much broader, so adoption should happen as feature families behind Untool contracts instead of copying the proxy wholesale.

Adopt Now

LiteLLM OSS feature Untool adoption shape Status
Provider normalization ProviderTarget.litellm_model via LiteLLM SDK present
Model aliases / groups ProviderTarget.aliases wired
Fallbacks ProviderTarget.fallback_models executed by gateway wired
Timeout / retries timeout_seconds, max_retries passed to LiteLLM wired
Cache policy cache / cache_ttl_seconds passed as LiteLLM cache controls wired
Cache backend materialization env-configured LiteLLM enable_cache(...) for local/disk/Redis/S3/GCS-style backends wired
Cost metadata OTel span, metric cost fields, and non-secret audit fields wired
Spend budgets LLM_GATEWAY_DAILY_SPEND_USD / LLM_GATEWAY_MONTHLY_SPEND_USD per JWT subject wired
Durable FinOps usage events llm.finops.usage.v1 CloudEvent projected after spend recording wired
Evals privacy-preserving llm.eval.trace.v1 CloudEvent projected after model calls wired
OpenAI-compatible chat /v1/chat/completions present
OpenAI-compatible embeddings /v1/embeddings present
OpenAI-compatible Responses API /v1/responses via litellm.aresponses wired
Streaming SSE preflight + audited stream present
BYO LiteLLM proxy secret-free config projection + admin export route wired
Pass-through providers customer/partner LiteLLM proxy targets via api_base + secret_ref registry entries wired
Internal OpenRouter policy /v1/admin/llm/route-policy projects aliases, fallbacks, budget/cost metadata, and strategies wired
Model health / management admin-only secret-free configuration health projection wired
Live provider health admin-triggered /v1/admin/litellm/health/probe with strict timeout and redacted diagnostics wired
Virtual keys caller-specific secret-free virtual-key policy projection wired
Guardrail integrations hook-style policy projection for federated LiteLLM proxies wired

Adopt Next

LiteLLM OSS feature Untool shape
Broker-backed FinOps/eval stream Optional EVENTS_ENABLED publisher for llm.finops.usage.v1 and llm.eval.trace.v1; full outbox/JetStream contract remains issue #94.
Team/model access Extend the virtual-key projection with persisted team/org policy once the IDP/team model is canonical.
Cache invalidation Admin-only dry-run invalidation plan endpoint; deletion remains blocked until key ownership contract is finalized.
MCP/A2A features Keep Agent Gateway canonical; federate LiteLLM MCP only as a target family.

FinOps Projection

The gateway now emits LiteLLM-style spend metadata without adopting LiteLLM's virtual-key database as the source of truth. Successful chat, Responses, Responses streaming, chat streaming, and embeddings calls include:

  • budget_group / cost_center when configured on the model target.
  • response_model, prompt/completion token counts, and cost_usd.
  • daily_spend_usd and monthly_spend_usd after the call is recorded.

These fields are intentionally non-secret and ride the existing audit/OTel path. Successful model calls also project a validated llm.finops.usage.v1 CloudEvent using contracts/llm-finops-event.schema.json. Today the event is included in the structured audit stream; the same payload can be published to the fleet broker when the NATS/JetStream contract is active.

Successful model calls also project a privacy-preserving llm.eval.trace.v1 CloudEvent using contracts/llm-eval-trace-event.schema.json. This gives the eval/QA pipeline a stable sampling and join key without sending prompts, outputs, provider secrets, API bases, or raw exception details into the eval event. The event carries endpoint, caller metadata, model/provider ids, token counts, cost, streaming flag, and explicit privacy booleans.

When EVENTS_ENABLED=true and NATS_URL is configured, the same validated FinOps/eval CloudEvents are also published to their broker subjects through the optional LLM event stream publisher. Publish failures are logged and do not fail successful model calls; the audit log remains the replay source until issue #94 lands the durable outbox/JetStream contract.

Virtual-Key Projection

/v1/litellm/virtual-key-policy maps the caller's Untool JWT into a LiteLLM-compatible virtual-key policy envelope without minting or returning a secret. The projection includes:

  • user_id, team_id from tenant context, and caller roles.
  • Allowed model names and aliases visible to that principal.
  • Per-model budget group, cost center, RPM/TPM, capabilities, and fallback metadata.
  • Daily/monthly Untool spend caps projected into LiteLLM-style max_budget / budget_duration fields when present.

Operators can use this as the federated materialization input for a downstream LiteLLM proxy while Untool remains authoritative for authN/Z, tenant policy, budget, and provenance.

Federated LiteLLM Proxy Targets

When a customer, partner, or internal team already runs a LiteLLM proxy, Untool can route to it as a normal ProviderTarget:

  • provider="litellm-proxy" or another local provider slug;
  • litellm_model set to the downstream OpenAI-compatible model id;
  • api_base set to the partner proxy's /v1 base URL;
  • secret_ref pointing at the brokered proxy key;
  • optional aliases, budget group, cost center, fallback, timeout/retry, and cache metadata.

This gives us the internal OpenRouter pattern without forking LiteLLM: direct providers and federated LiteLLM proxies sit behind the same Untool auth, guardrail, FinOps, audit, and contract surfaces.

Route Policy Projection

/v1/admin/llm/route-policy projects the registry as Untool's internal OpenRouter-style policy document. It is secret-free and includes model capabilities, aliases, fallback chains, budget groups, cost centers, RPM/TPM limits, timeout/retry settings, cache hints, and supported selection strategies.

The first strategy is conservative: capability match, then registry order, then explicit fallback chain. Future dynamic routing can add measured latency, quality scores, tenant policy, and near-budget downgrade rules without changing the client-facing OpenAI-compatible /v1 surface.

Cache Projection

LiteLLM's cache controls are now represented in ProviderTarget rather than as an opaque boolean only. Targets can set:

  • cache=true|false
  • cache_ttl_seconds=<seconds>

When TTL is present, gateway calls forward cache={"ttl": seconds} to LiteLLM. The same policy appears in the secret-free config, health, and virtual-key projections so a federated proxy can materialize the matching cache behavior. Backend materialization is now wired through LLM_GATEWAY_CACHE_* settings:

  • LLM_GATEWAY_CACHE_ENABLED
  • LLM_GATEWAY_CACHE_TYPE (local, disk, redis, s3, gcs, etc. as supported by LiteLLM)
  • LLM_GATEWAY_CACHE_TTL_SECONDS
  • LLM_GATEWAY_CACHE_NAMESPACE
  • LLM_GATEWAY_CACHE_CALL_TYPES
  • optional host/port/disk/S3 settings and LLM_GATEWAY_CACHE_PASSWORD_SECRET

/v1/admin/litellm/cache exposes a secret-free projection of the cache backend policy. The gateway resolves cache passwords only through secret_ref at LiteLLM cache initialization time.

/v1/admin/litellm/cache/invalidation-plan exposes the cache-invalidation contract shape without deleting entries. It is admin-only, dry-run only, and returns model/tenant scopes with delete_enabled=false. This lets operators and the future cache service agree on key ownership before any destructive invalidation endpoint exists.

Live Health Projection

/v1/admin/litellm/health remains the default safe report: it is configuration only and never calls upstream providers. When an operator explicitly needs a LiteLLM-style live health check, /v1/admin/litellm/health/probe runs a minimal chat or embedding call for the requested logical models.

The probe path is admin-only, caps the request to 25 model ids, applies a per-probe timeout, and returns only:

  • model, provider, and probe capability;
  • healthy / unhealthy;
  • latency in milliseconds;
  • a coarse error_type such as timeout, provider_error, or credential_resolution_failed.

It intentionally does not return provider output, resolved keys, secret refs, API bases, or raw exception details.

Guardrail Projection

/v1/litellm/guardrail-policy projects Untool's gateway controls into a LiteLLM hook-style policy envelope. It intentionally exposes hook names, phases, blocking mode, and high-level actions rather than scanner internals:

  • pre_call: prompt-injection heuristics for chat, Responses, embeddings, and search.
  • pre_call: optional llm-guard deep scan when enabled.
  • post_call: output email redaction when enabled.
  • post_tool_call: search/extract result scrubbing for indirect prompt injection.
  • pre_tool_call: URL scheme/private-network/domain policy for extraction.

The same guardrail policy is embedded in proxy config, health, and virtual-key policy projections so a federated LiteLLM proxy can mirror the boundary without becoming the policy source of truth.

Sift Rule

A LiteLLM feature is snapped into Untool only when it satisfies all gates:

  1. It keeps external clients OpenAI/Anthropic-compatible.
  2. It does not expose provider secrets.
  3. Untool remains source of truth for tenant policy, provenance, and FinOps.
  4. It can be tested offline with LiteLLM calls monkeypatched.
  5. It does not collapse Hypergraph/Object Model semantics into model routing.