LiteLLM OSS Feature Map
Generated working note for issue #83 and ADR 0005.
Mapper Input
- Untool surface: contracts/llm-gateway.openapi.yaml
- LiteLLM surface: tmp/litellm/openapi.json
- Mapper: app/ontology/schema_abstract.py
The mapper found confirmed overlap on the core OpenAI-shaped primitives: chat completion, embeddings, and validation/error envelopes. The LiteLLM surface is much broader, so adoption should happen as feature families behind Untool contracts instead of copying the proxy wholesale.
Adopt Now
| LiteLLM OSS feature | Untool adoption shape | Status |
|---|---|---|
| Provider normalization | ProviderTarget.litellm_model via LiteLLM SDK | present |
| Model aliases / groups | ProviderTarget.aliases | wired |
| Fallbacks | ProviderTarget.fallback_models executed by gateway | wired |
| Timeout / retries | timeout_seconds, max_retries passed to LiteLLM | wired |
| Cache policy | cache / cache_ttl_seconds passed as LiteLLM cache controls | wired |
| Cache backend materialization | env-configured LiteLLM enable_cache(...) for local/disk/Redis/S3/GCS-style backends | wired |
| Cost metadata | OTel span, metric cost fields, and non-secret audit fields | wired |
| Spend budgets | LLM_GATEWAY_DAILY_SPEND_USD / LLM_GATEWAY_MONTHLY_SPEND_USD per JWT subject | wired |
| Durable FinOps usage events | llm.finops.usage.v1 CloudEvent projected after spend recording | wired |
| Evals | privacy-preserving llm.eval.trace.v1 CloudEvent projected after model calls | wired |
| OpenAI-compatible chat | /v1/chat/completions | present |
| OpenAI-compatible embeddings | /v1/embeddings | present |
| OpenAI-compatible Responses API | /v1/responses via litellm.aresponses | wired |
| Streaming | SSE preflight + audited stream | present |
| BYO LiteLLM proxy | secret-free config projection + admin export route | wired |
| Pass-through providers | customer/partner LiteLLM proxy targets via api_base + secret_ref registry entries | wired |
| Internal OpenRouter policy | /v1/admin/llm/route-policy projects aliases, fallbacks, budget/cost metadata, and strategies | wired |
| Model health / management | admin-only secret-free configuration health projection | wired |
| Live provider health | admin-triggered /v1/admin/litellm/health/probe with strict timeout and redacted diagnostics | wired |
| Virtual keys | caller-specific secret-free virtual-key policy projection | wired |
| Guardrail integrations | hook-style policy projection for federated LiteLLM proxies | wired |
Adopt Next
| LiteLLM OSS feature | Untool shape |
|---|---|
| Broker-backed FinOps/eval stream | Optional EVENTS_ENABLED publisher for llm.finops.usage.v1 and llm.eval.trace.v1; full outbox/JetStream contract remains issue #94. |
| Team/model access | Extend the virtual-key projection with persisted team/org policy once the IDP/team model is canonical. |
| Cache invalidation | Admin-only dry-run invalidation plan endpoint; deletion remains blocked until key ownership contract is finalized. |
| MCP/A2A features | Keep Agent Gateway canonical; federate LiteLLM MCP only as a target family. |
FinOps Projection
The gateway now emits LiteLLM-style spend metadata without adopting LiteLLM's virtual-key database as the source of truth. Successful chat, Responses, Responses streaming, chat streaming, and embeddings calls include:
- budget_group / cost_center when configured on the model target.
- response_model, prompt/completion token counts, and cost_usd.
- daily_spend_usd and monthly_spend_usd after the call is recorded.
These fields are intentionally non-secret and ride the existing audit/OTel path.
Successful model calls also project a validated llm.finops.usage.v1
CloudEvent using contracts/llm-finops-event.schema.json. Today the event is
included in the structured audit stream; the same payload can be published to
the fleet broker when the NATS/JetStream contract is active.
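As a rough illustration of the envelope described above, the sketch below wraps non-secret usage fields in a CloudEvents 1.0 structure. The `source` URN, the `caller_subject` key, and the exact layout of `data` are assumptions made for the example; the authoritative shape is whatever contracts/llm-finops-event.schema.json defines.

```python
import json
import uuid
from datetime import datetime, timezone


def build_finops_event(caller_subject: str, response_model: str,
                       prompt_tokens: int, completion_tokens: int,
                       cost_usd: float) -> dict:
    """Wrap non-secret usage data in a CloudEvents 1.0 envelope."""
    return {
        # The four required CloudEvents 1.0 context attributes.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "type": "llm.finops.usage.v1",
        "source": "urn:example:llm-gateway",  # hypothetical source identifier
        # Optional context attributes.
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # Hypothetical payload layout: only non-secret accounting fields.
        "data": {
            "caller_subject": caller_subject,
            "response_model": response_model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "cost_usd": cost_usd,
        },
    }


event = build_finops_event("user-123", "demo-model", 120, 40, 0.0021)
print(json.dumps(event, indent=2))
```

Because the event is a plain JSON-serializable dict, the same object could in principle be written to the audit stream and handed to a broker publisher unchanged.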
Successful model calls also project a privacy-preserving llm.eval.trace.v1
CloudEvent using contracts/llm-eval-trace-event.schema.json. This gives the
eval/QA pipeline a stable sampling and join key without sending prompts,
outputs, provider secrets, API bases, or raw exception details into the eval
event. The event carries endpoint, caller metadata, model/provider ids, token
counts, cost, streaming flag, and explicit privacy booleans.
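One way to keep prompts and secrets out of the eval event is an allow-list projection: only named fields are copied, so anything unexpected in the call record is dropped by default. The field names and the privacy flag names below are hypothetical; the real ones live in contracts/llm-eval-trace-event.schema.json.

```python
# Hypothetical allow-list of fields that may enter the eval event.
ALLOWED_FIELDS = (
    "endpoint", "caller_subject", "model", "provider",
    "prompt_tokens", "completion_tokens", "cost_usd", "stream",
)


def project_eval_trace(call_record: dict) -> dict:
    """Copy only allow-listed fields and attach explicit privacy booleans."""
    data = {k: call_record[k] for k in ALLOWED_FIELDS if k in call_record}
    # Hypothetical flag names: they state what the event does NOT contain.
    data["privacy"] = {
        "contains_prompt": False,
        "contains_output": False,
        "contains_secrets": False,
    }
    return {"type": "llm.eval.trace.v1", "data": data}


record = {
    "endpoint": "/v1/chat/completions",
    "model": "demo-model",
    "prompt": "this must never leave the gateway",
    "api_base": "https://proxy.example.invalid/v1",
    "cost_usd": 0.01,
    "stream": False,
}
trace = project_eval_trace(record)
print(sorted(trace["data"]))
```

An allow-list fails closed: a new sensitive field added to the call record later stays out of the event until someone explicitly lists it.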
When EVENTS_ENABLED=true and NATS_URL is configured, the same validated
FinOps/eval CloudEvents are also published to their broker subjects through the
optional LLM event stream publisher. Publish failures are logged and do not fail
successful model calls; the audit log remains the replay source until issue
#94 lands the durable outbox/JetStream contract.
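The fail-open behavior described above can be sketched as a thin wrapper around whatever publish function the broker client provides. The `publish` callable, the subject string, and the exact truthy value for EVENTS_ENABLED are assumptions for illustration; no NATS client API is shown here.

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger("llm_event_stream")


def events_enabled(env: Mapping[str, str]) -> bool:
    """Publishing needs both the feature flag and a broker URL."""
    flag = env.get("EVENTS_ENABLED", "").strip().lower() == "true"
    return flag and bool(env.get("NATS_URL"))


def publish_fail_open(publish: Callable[[str, bytes], None],
                      subject: str, payload: bytes) -> bool:
    """Try to publish; log and swallow any failure.

    Returns True on success and False on failure, so the caller can count
    drops without ever failing the model call that produced the event.
    """
    try:
        publish(subject, payload)
        return True
    except Exception:
        logger.exception("event publish failed for subject %s", subject)
        return False


def broken_publish(subject: str, payload: bytes) -> None:
    raise ConnectionError("broker unreachable")


if events_enabled({"EVENTS_ENABLED": "true", "NATS_URL": "nats://localhost:4222"}):
    delivered = publish_fail_open(broken_publish, "llm.finops.usage.v1", b"{}")
    print("delivered:", delivered)
```

Swallowing the exception is only reasonable because the audit log remains the replay source; without that, dropped events would be unrecoverable.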
Virtual-Key Projection
/v1/litellm/virtual-key-policy maps the caller's Untool JWT into a
LiteLLM-compatible virtual-key policy envelope without minting or returning a
secret. The projection includes:
- user_id, team_id from tenant context, and caller roles.
- Allowed model names and aliases visible to that principal.
- Per-model budget group, cost center, RPM/TPM, capabilities, and fallback metadata.
- Daily/monthly Untool spend caps projected into LiteLLM-style max_budget/budget_duration fields when present.
Operators can use this as the federated materialization input for a downstream LiteLLM proxy while Untool remains authoritative for authN/Z, tenant policy, budget, and provenance.
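The following sketch shows how such an envelope might be assembled from JWT claims. The claim names (`sub`, `team_id`, `roles`), the envelope keys, and especially the rule for choosing between the daily and monthly cap are assumptions; the source only states that caps are projected into max_budget/budget_duration when present.

```python
from typing import List, Optional


def project_budget(daily_cap_usd: Optional[float],
                   monthly_cap_usd: Optional[float]) -> dict:
    """Map Untool spend caps onto a single LiteLLM-style budget pair.

    Assumption: when both caps exist, the shorter daily window is projected.
    """
    if daily_cap_usd is not None:
        return {"max_budget": daily_cap_usd, "budget_duration": "1d"}
    if monthly_cap_usd is not None:
        return {"max_budget": monthly_cap_usd, "budget_duration": "30d"}
    return {}  # no caps configured: omit the budget fields entirely


def virtual_key_policy(claims: dict, allowed_models: List[str],
                       daily_cap_usd: Optional[float] = None,
                       monthly_cap_usd: Optional[float] = None) -> dict:
    """Build a secret-free policy envelope; no key is minted or returned."""
    policy = {
        "user_id": claims["sub"],
        "team_id": claims.get("team_id"),
        "roles": list(claims.get("roles", [])),
        "models": list(allowed_models),
    }
    policy.update(project_budget(daily_cap_usd, monthly_cap_usd))
    return policy


print(virtual_key_policy({"sub": "user-123", "team_id": "team-a"},
                         ["demo-chat"], monthly_cap_usd=50.0))
```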
Federated LiteLLM Proxy Targets
When a customer, partner, or internal team already runs a LiteLLM proxy, Untool
can route to it as a normal ProviderTarget:
- provider="litellm-proxy" or another local provider slug;
- litellm_model set to the downstream OpenAI-compatible model id;
- api_base set to the partner proxy's /v1 base URL;
- secret_ref pointing at the brokered proxy key;
- optional aliases, budget group, cost center, fallback, timeout/retry, and cache metadata.
This gives us the internal OpenRouter pattern without forking LiteLLM: direct providers and federated LiteLLM proxies sit behind the same Untool auth, guardrail, FinOps, audit, and contract surfaces.
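A registry entry for a federated proxy could look roughly like the dataclass below. The field names follow the ones mentioned in this note, but the class itself, its defaults, and the example values (URL, secret reference, model id) are illustrative assumptions rather than the actual ProviderTarget definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProviderTarget:
    """Illustrative stand-in for a registry entry; not the real class."""
    name: str
    provider: str
    litellm_model: str
    api_base: Optional[str] = None
    secret_ref: Optional[str] = None  # a reference to a secret, never the secret
    aliases: List[str] = field(default_factory=list)
    fallback_models: List[str] = field(default_factory=list)
    timeout_seconds: Optional[float] = None
    max_retries: Optional[int] = None
    cache: Optional[bool] = None
    cache_ttl_seconds: Optional[int] = None


# Hypothetical partner proxy target; all values are placeholders.
partner_target = ProviderTarget(
    name="partner-chat",
    provider="litellm-proxy",
    litellm_model="partner-model-id",
    api_base="https://litellm.partner.example.invalid/v1",
    secret_ref="secrets/partner-proxy-key",
    aliases=["partner-default"],
    fallback_models=["internal-chat"],
    timeout_seconds=30.0,
    max_retries=2,
)
print(partner_target.provider, partner_target.litellm_model)
```

Because the proxy is just another target, it inherits the same alias, fallback, timeout, and cache handling as a direct provider entry.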
Route Policy Projection
/v1/admin/llm/route-policy projects the registry as Untool's internal
OpenRouter-style policy document. It is secret-free and includes model
capabilities, aliases, fallback chains, budget groups, cost centers, RPM/TPM
limits, timeout/retry settings, cache hints, and supported selection strategies.
The first strategy is conservative: capability match, then registry order, then
explicit fallback chain. Future dynamic routing can add measured latency,
quality scores, tenant policy, and near-budget downgrade rules without changing
the client-facing OpenAI-compatible /v1 surface.
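One possible reading of the conservative strategy is sketched below: filter by capability, take the first match in registry order as the primary, then append its explicit fallback chain. The `Target` type and the decision to skip fallbacks that lack the capability are assumptions; the gateway's actual selection logic may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Target:
    """Minimal hypothetical registry entry for routing purposes."""
    name: str
    capabilities: FrozenSet[str]
    fallback_models: Tuple[str, ...] = ()


def candidate_order(registry: List[Target], capability: str) -> List[str]:
    """Capability match, then registry order, then explicit fallback chain."""
    matching = [t for t in registry if capability in t.capabilities]
    if not matching:
        return []
    primary = matching[0]  # registry order decides among capable targets
    by_name = {t.name: t for t in registry}
    order = [primary.name]
    for name in primary.fallback_models:
        target = by_name.get(name)
        # Assumption: unknown or incapable fallbacks are skipped, not errors.
        if target and capability in target.capabilities and name not in order:
            order.append(name)
    return order


registry = [
    Target("embed-a", frozenset({"embeddings"})),
    Target("chat-a", frozenset({"chat"}), ("chat-b", "embed-a")),
    Target("chat-b", frozenset({"chat"})),
]
print(candidate_order(registry, "chat"))
```

Keeping the strategy a pure function over the registry means later signals such as latency or budget pressure could reorder candidates without touching the client-facing surface.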
Cache Projection
LiteLLM's cache controls are now represented in ProviderTarget rather than only as
an opaque boolean. Targets can set:
- cache=true|false
- cache_ttl_seconds=<seconds>
When TTL is present, gateway calls forward cache={"ttl": seconds} to LiteLLM.
The same policy appears in the secret-free config, health, and virtual-key
projections so a federated proxy can materialize the matching cache behavior.
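The TTL forwarding rule can be sketched as a small helper that builds the extra keyword arguments for a LiteLLM call. Only the `cache={"ttl": seconds}` shape comes from this note; the helper name and the choice to forward nothing when cache is disabled or no TTL is set are assumptions.

```python
from typing import Optional


def cache_kwargs(cache: Optional[bool],
                 cache_ttl_seconds: Optional[int]) -> dict:
    """Build the per-call cache kwargs for a target's cache policy."""
    if cache is False:
        return {}  # assumption: a disabled target forwards no cache controls
    if cache_ttl_seconds is not None:
        return {"cache": {"ttl": cache_ttl_seconds}}
    return {}


# The result would be merged into the call's keyword arguments,
# for example: completion(model=..., messages=..., **cache_kwargs(True, 300))
print(cache_kwargs(True, 300))
```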
Backend materialization is now wired through LLM_GATEWAY_CACHE_* settings:
- LLM_GATEWAY_CACHE_ENABLED
- LLM_GATEWAY_CACHE_TYPE (local, disk, redis, s3, gcs, etc. as supported by LiteLLM)
- LLM_GATEWAY_CACHE_TTL_SECONDS
- LLM_GATEWAY_CACHE_NAMESPACE
- LLM_GATEWAY_CACHE_CALL_TYPES
- optional host/port/disk/S3 settings and LLM_GATEWAY_CACHE_PASSWORD_SECRET
/v1/admin/litellm/cache exposes a secret-free projection of the cache backend
policy. The gateway resolves cache passwords only through secret_ref at
LiteLLM cache initialization time.
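A secret-free projection of these settings might be built as follows. The environment variable names come from the list above, but the value formats (a "true" flag, a comma-separated call-type list), the default backend, and the output keys are assumptions. The projection reports only whether a password secret is configured, never the reference or its value.

```python
from typing import Mapping


def cache_projection(env: Mapping[str, str]) -> dict:
    """Project LLM_GATEWAY_CACHE_* settings without exposing secrets."""
    ttl = env.get("LLM_GATEWAY_CACHE_TTL_SECONDS")
    raw_types = env.get("LLM_GATEWAY_CACHE_CALL_TYPES", "")
    return {
        "enabled": env.get("LLM_GATEWAY_CACHE_ENABLED", "").lower() == "true",
        "type": env.get("LLM_GATEWAY_CACHE_TYPE", "local"),  # assumed default
        "ttl_seconds": int(ttl) if ttl else None,
        "namespace": env.get("LLM_GATEWAY_CACHE_NAMESPACE"),
        # Assumption: call types are given as a comma-separated list.
        "call_types": [c.strip() for c in raw_types.split(",") if c.strip()],
        # Report presence only; the reference itself stays out of the output.
        "password_configured": bool(env.get("LLM_GATEWAY_CACHE_PASSWORD_SECRET")),
    }


example_env = {
    "LLM_GATEWAY_CACHE_ENABLED": "true",
    "LLM_GATEWAY_CACHE_TYPE": "redis",
    "LLM_GATEWAY_CACHE_TTL_SECONDS": "600",
    "LLM_GATEWAY_CACHE_CALL_TYPES": "completion, embedding",
    "LLM_GATEWAY_CACHE_PASSWORD_SECRET": "secrets/cache-password",
}
print(cache_projection(example_env))
```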
/v1/admin/litellm/cache/invalidation-plan exposes the cache-invalidation
contract shape without deleting entries. It is admin-only, dry-run only, and
returns model/tenant scopes with delete_enabled=false. This lets operators and
the future cache service agree on key ownership before any destructive
invalidation endpoint exists.
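A dry-run plan of this kind might be shaped like the sketch below. Apart from `delete_enabled=false` and the model/tenant scoping, every key name here is hypothetical.

```python
from typing import List


def invalidation_plan(models: List[str], tenants: List[str]) -> dict:
    """Describe which cache scopes WOULD be invalidated; delete nothing."""
    return {
        "dry_run": True,
        "delete_enabled": False,  # deletion stays blocked by contract
        "scopes": [
            {"model": model, "tenant": tenant}
            for model in models
            for tenant in tenants
        ],
    }


plan = invalidation_plan(["demo-chat"], ["tenant-a", "tenant-b"])
print(len(plan["scopes"]), plan["delete_enabled"])
```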
Live Health Projection
/v1/admin/litellm/health remains the default safe report: it is configuration
only and never calls upstream providers. When an operator explicitly needs a
LiteLLM-style live health check, /v1/admin/litellm/health/probe runs a minimal
chat or embedding call for the requested logical models.
The probe path is admin-only, caps the request to 25 model ids, applies a per-probe timeout, and returns only:
- model, provider, and probe capability;
- healthy/unhealthy;
- latency in milliseconds;
- a coarse error_type such as timeout, provider_error, or credential_resolution_failed.
It intentionally does not return provider output, resolved keys, secret refs, API bases, or raw exception details.
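The probe's timeout and redaction behavior can be approximated with `asyncio.wait_for` and a coarse error mapping. The `call` parameter stands in for whatever minimal chat or embedding request the gateway issues; the function names, result keys, and default timeout are assumptions, and credential resolution is left out of the sketch.

```python
import asyncio
import time
from typing import Awaitable, Callable, List

MAX_PROBE_MODELS = 25  # request cap stated in this note


async def probe_model(model: str,
                      call: Callable[[str], Awaitable[object]],
                      timeout_seconds: float) -> dict:
    """Run one probe and return only redacted, coarse diagnostics."""
    started = time.perf_counter()
    error_type = None
    try:
        # The provider's output is awaited but deliberately discarded.
        await asyncio.wait_for(call(model), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        error_type = "timeout"
    except Exception:
        # The exception text is dropped so raw provider details never leak.
        error_type = "provider_error"
    result = {
        "model": model,
        "status": "healthy" if error_type is None else "unhealthy",
        "latency_ms": round((time.perf_counter() - started) * 1000, 1),
    }
    if error_type is not None:
        result["error_type"] = error_type
    return result


async def probe_models(models: List[str],
                       call: Callable[[str], Awaitable[object]],
                       timeout_seconds: float = 5.0) -> List[dict]:
    """Probe the requested models concurrently, enforcing the request cap."""
    if len(models) > MAX_PROBE_MODELS:
        raise ValueError(f"at most {MAX_PROBE_MODELS} models per probe request")
    probes = (probe_model(m, call, timeout_seconds) for m in models)
    return list(await asyncio.gather(*probes))


async def fake_call(model: str) -> str:
    await asyncio.sleep(0)  # stand-in for a minimal upstream request
    return "provider output that is never returned"


print(asyncio.run(probe_models(["demo-chat"], fake_call)))
```

Applying the timeout per probe means one slow provider cannot stall the report for the other models in the same request.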
Guardrail Projection
/v1/litellm/guardrail-policy projects Untool's gateway controls into a
LiteLLM hook-style policy envelope. It intentionally exposes hook names, phases,
blocking mode, and high-level actions rather than scanner internals:
- pre_call: prompt-injection heuristics for chat, Responses, embeddings, and search.
- pre_call: optional llm-guard deep scan when enabled.
- post_call: output email redaction when enabled.
- post_tool_call: search/extract result scrubbing for indirect prompt injection.
- pre_tool_call: URL scheme/private-network/domain policy for extraction.
The same guardrail policy is embedded in proxy config, health, and virtual-key policy projections so a federated LiteLLM proxy can mirror the boundary without becoming the policy source of truth.
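The hook list above could be projected into an envelope along these lines. The phases match the list, but the hook names, the `blocking` values, the action labels, and the two feature flags are hypothetical placeholders for whatever the gateway actually reports.

```python
from typing import List


def guardrail_policy(llm_guard_enabled: bool,
                     email_redaction_enabled: bool) -> List[dict]:
    """Project gateway guardrails as hook descriptors, without scanner internals."""
    hooks = [
        {"name": "prompt_injection_heuristics", "phase": "pre_call",
         "blocking": True, "action": "block"},
    ]
    if llm_guard_enabled:
        hooks.append({"name": "llm_guard_deep_scan", "phase": "pre_call",
                      "blocking": True, "action": "block"})
    if email_redaction_enabled:
        hooks.append({"name": "output_email_redaction", "phase": "post_call",
                      "blocking": False, "action": "redact"})
    hooks.extend([
        {"name": "tool_result_scrubbing", "phase": "post_tool_call",
         "blocking": False, "action": "scrub"},
        {"name": "extraction_url_policy", "phase": "pre_tool_call",
         "blocking": True, "action": "block"},
    ])
    return hooks


for hook in guardrail_policy(llm_guard_enabled=False, email_redaction_enabled=True):
    print(hook["phase"], hook["name"])
```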
Sift Rule
A LiteLLM feature is snapped into Untool only when it satisfies all gates:
- It keeps external clients OpenAI/Anthropic-compatible.
- It does not expose provider secrets.
- Untool remains source of truth for tenant policy, provenance, and FinOps.
- It can be tested offline with LiteLLM calls monkeypatched.
- It does not collapse Hypergraph/Object Model semantics into model routing.