LiteLLM OSS Feature Map¶

Generated working note for issue #83 and ADR 0005.

Mapper Input¶

Untool surface: contracts/llm-gateway.openapi.yaml
LiteLLM surface: tmp/litellm/openapi.json
Mapper: app/ontology/schema_abstract.py

The mapper found confirmed overlap on the core OpenAI-shaped primitives: chat completion, embeddings, and validation/error envelopes. The LiteLLM surface is much broader, so adoption should happen as feature families behind Untool contracts instead of copying the proxy wholesale.

Adopt Now¶

LiteLLM OSS feature	Untool adoption shape	Status
Provider normalization	`ProviderTarget.litellm_model` via LiteLLM SDK	present
Model aliases / groups	`ProviderTarget.aliases`	wired
Fallbacks	`ProviderTarget.fallback_models` executed by gateway	wired
Timeout / retries	`timeout_seconds`, `max_retries` passed to LiteLLM	wired
Cache policy	`cache` / `cache_ttl_seconds` passed as LiteLLM cache controls	wired
Cache backend materialization	env-configured LiteLLM `enable_cache(...)` for local/disk/Redis/S3/GCS-style backends	wired
Cost metadata	OTel span, metric cost fields, and non-secret audit fields	wired
Spend budgets	`LLM_GATEWAY_DAILY_SPEND_USD` / `LLM_GATEWAY_MONTHLY_SPEND_USD` per JWT subject	wired
Durable FinOps usage events	`llm.finops.usage.v1` CloudEvent projected after spend recording	wired
Evals	privacy-preserving `llm.eval.trace.v1` CloudEvent projected after model calls	wired
OpenAI-compatible chat	`/v1/chat/completions`	present
OpenAI-compatible embeddings	`/v1/embeddings`	present
OpenAI-compatible Responses API	`/v1/responses` via `litellm.aresponses`	wired
Streaming	SSE preflight + audited stream	present
BYO LiteLLM proxy	secret-free config projection + admin export route	wired
Pass-through providers	customer/partner LiteLLM proxy targets via `api_base` + `secret_ref` registry entries	wired
Internal OpenRouter policy	`/v1/admin/llm/route-policy` projects aliases, fallbacks, budget/cost metadata, and strategies	wired
Model health / management	admin-only secret-free configuration health projection	wired
Live provider health	admin-triggered `/v1/admin/litellm/health/probe` with strict timeout and redacted diagnostics	wired
Virtual keys	caller-specific secret-free virtual-key policy projection	wired
Guardrail integrations	hook-style policy projection for federated LiteLLM proxies	wired

Adopt Next¶

LiteLLM OSS feature	Untool shape
Broker-backed FinOps/eval stream	Optional `EVENTS_ENABLED` publisher for `llm.finops.usage.v1` and `llm.eval.trace.v1`; full outbox/JetStream contract remains issue #94.
Team/model access	Extend the virtual-key projection with persisted team/org policy once the IDP/team model is canonical.
Cache invalidation	Admin-only dry-run invalidation plan endpoint; deletion remains blocked until key ownership contract is finalized.
MCP/A2A features	Keep Agent Gateway canonical; federate LiteLLM MCP only as a target family.

FinOps Projection¶

The gateway now emits LiteLLM-style spend metadata without adopting LiteLLM's virtual-key database as the source of truth. Successful chat, Responses, Responses streaming, chat streaming, and embeddings calls include:

budget_group / cost_center when configured on the model target.
response_model, prompt/completion token counts, and cost_usd.
daily_spend_usd and monthly_spend_usd after the call is recorded.

These fields are intentionally non-secret and ride the existing audit/OTel path. Successful model calls also project a validated llm.finops.usage.v1 CloudEvent using contracts/llm-finops-event.schema.json. Today the event is included in the structured audit stream; the same payload can be published to the fleet broker when the NATS/JetStream contract is active.

Successful model calls also project a privacy-preserving llm.eval.trace.v1 CloudEvent using contracts/llm-eval-trace-event.schema.json. This gives the eval/QA pipeline a stable sampling and join key without sending prompts, outputs, provider secrets, API bases, or raw exception details into the eval event. The event carries endpoint, caller metadata, model/provider ids, token counts, cost, streaming flag, and explicit privacy booleans.

When EVENTS_ENABLED=true and NATS_URL is configured, the same validated FinOps/eval CloudEvents are also published to their broker subjects through the optional LLM event stream publisher. Publish failures are logged and do not fail successful model calls; the audit log remains the replay source until issue #94 lands the durable outbox/JetStream contract.

Virtual-Key Projection¶

/v1/litellm/virtual-key-policy maps the caller's Untool JWT into a LiteLLM-compatible virtual-key policy envelope without minting or returning a secret. The projection includes:

user_id, team_id from tenant context, and caller roles.
Allowed model names and aliases visible to that principal.
Per-model budget group, cost center, RPM/TPM, capabilities, and fallback metadata.
Daily/monthly Untool spend caps projected into LiteLLM-style max_budget / budget_duration fields when present.

Operators can use this as the federated materialization input for a downstream LiteLLM proxy while Untool remains authoritative for authN/Z, tenant policy, budget, and provenance.

Federated LiteLLM Proxy Targets¶

When a customer, partner, or internal team already runs a LiteLLM proxy, Untool can route to it as a normal ProviderTarget:

provider="litellm-proxy" or another local provider slug;
litellm_model set to the downstream OpenAI-compatible model id;
api_base set to the partner proxy's /v1 base URL;
secret_ref pointing at the brokered proxy key;
optional aliases, budget group, cost center, fallback, timeout/retry, and cache metadata.

This gives us the internal OpenRouter pattern without forking LiteLLM: direct providers and federated LiteLLM proxies sit behind the same Untool auth, guardrail, FinOps, audit, and contract surfaces.

Route Policy Projection¶

/v1/admin/llm/route-policy projects the registry as Untool's internal OpenRouter-style policy document. It is secret-free and includes model capabilities, aliases, fallback chains, budget groups, cost centers, RPM/TPM limits, timeout/retry settings, cache hints, and supported selection strategies.

The first strategy is conservative: capability match, then registry order, then explicit fallback chain. Future dynamic routing can add measured latency, quality scores, tenant policy, and near-budget downgrade rules without changing the client-facing OpenAI-compatible /v1 surface.

Cache Projection¶

LiteLLM's cache controls are now represented in ProviderTarget rather than as an opaque boolean only. Targets can set:

cache=true|false
cache_ttl_seconds=<seconds>

When TTL is present, gateway calls forward cache={"ttl": seconds} to LiteLLM. The same policy appears in the secret-free config, health, and virtual-key projections so a federated proxy can materialize the matching cache behavior. Backend materialization is now wired through LLM_GATEWAY_CACHE_* settings:

LLM_GATEWAY_CACHE_ENABLED
LLM_GATEWAY_CACHE_TYPE (local, disk, redis, s3, gcs, etc. as supported by LiteLLM)
LLM_GATEWAY_CACHE_TTL_SECONDS
LLM_GATEWAY_CACHE_NAMESPACE
LLM_GATEWAY_CACHE_CALL_TYPES
optional host/port/disk/S3 settings and LLM_GATEWAY_CACHE_PASSWORD_SECRET

/v1/admin/litellm/cache exposes a secret-free projection of the cache backend policy. The gateway resolves cache passwords only through secret_ref at LiteLLM cache initialization time.

/v1/admin/litellm/cache/invalidation-plan exposes the cache-invalidation contract shape without deleting entries. It is admin-only, dry-run only, and returns model/tenant scopes with delete_enabled=false. This lets operators and the future cache service agree on key ownership before any destructive invalidation endpoint exists.

Live Health Projection¶

/v1/admin/litellm/health remains the default safe report: it is configuration only and never calls upstream providers. When an operator explicitly needs a LiteLLM-style live health check, /v1/admin/litellm/health/probe runs a minimal chat or embedding call for the requested logical models.

The probe path is admin-only, caps the request to 25 model ids, applies a per-probe timeout, and returns only:

model, provider, and probe capability;
healthy / unhealthy;
latency in milliseconds;
a coarse error_type such as timeout, provider_error, or credential_resolution_failed.

It intentionally does not return provider output, resolved keys, secret refs, API bases, or raw exception details.

Guardrail Projection¶

/v1/litellm/guardrail-policy projects Untool's gateway controls into a LiteLLM hook-style policy envelope. It intentionally exposes hook names, phases, blocking mode, and high-level actions rather than scanner internals:

pre_call: prompt-injection heuristics for chat, Responses, embeddings, and search.
pre_call: optional llm-guard deep scan when enabled.
post_call: output email redaction when enabled.
post_tool_call: search/extract result scrubbing for indirect prompt injection.
pre_tool_call: URL scheme/private-network/domain policy for extraction.

The same guardrail policy is embedded in proxy config, health, and virtual-key policy projections so a federated LiteLLM proxy can mirror the boundary without becoming the policy source of truth.

Sift Rule¶

A LiteLLM feature is snapped into Untool only when it satisfies all gates:

It keeps external clients OpenAI/Anthropic-compatible.
It does not expose provider secrets.
Untool remains source of truth for tenant policy, provenance, and FinOps.
It can be tested offline with LiteLLM calls monkeypatched.
It does not collapse Hypergraph/Object Model semantics into model routing.