Cloud Serving Landscape¶
When you fork AgentArmy and build a spoke (UI layer, API layer, worker, infra layer), the first question is: which cloud, which database, which LLM? This guide makes that choice fast with opinionated defaults, hard limits, and a routing table for the right agent.
TL;DR — Opinionated Starter Stacks¶
Pick one and go. You can always migrate later; the LLM abstraction principle (below) keeps your options open.
| Stack | Components | Best for |
|---|---|---|
| Zero-ops | GitHub Pages + Vercel Functions + Neon Postgres + Claude API | Solo dev, zero infra management, pay-as-you-go |
| GCP | Cloud Run + Cloud SQL + Vertex AI (Gemini) | Google-ecosystem teams, per-request billing, strong IAM |
| Azure | Container Apps + Azure SQL + Azure OpenAI | Enterprise M365/Azure shops, existing EA agreements |
| AWS | App Runner/Fargate + RDS Aurora + AWS Bedrock | AWS-native teams, compliance-sensitive workloads |
| GitHub-native | GitHub Pages + GitHub Actions compute | Fully GitHub, no external accounts, documentation-first |
Default recommendation for new spokes: Start with the Zero-ops stack. Vercel + Neon costs nothing at small scale, deploys in minutes, and you can move compute to Cloud Run or ECS later with a container swap.
GitHub Pages: Capabilities and Limits¶
GitHub Pages is already wired up in this template (MkDocs auto-deployed via deploy-docs.yml). Here is what it can and cannot do:
What it does well¶
- Serves static files: HTML, CSS, JS, images, JSON, PDFs
- Supports Jekyll natively or any SSG (Hugo, Astro, MkDocs, Eleventy) via GitHub Actions
- Custom domains with automatic HTTPS via Let's Encrypt
- Free on public repos; included in GitHub plans for private repos
- Ideal for: documentation sites, project portals, OpenAPI spec browsers, marketing pages
Hard limits¶
| Limit | Value |
|---|---|
| Repository size | 1 GB |
| Site size | 1 GB |
| Bandwidth | 100 GB/month (soft limit — GitHub may throttle, not block) |
| Build timeout | 10 minutes |
| Deploys per hour | 10 |
What it cannot do¶
- No server-side execution — no Node.js, Python, PHP, or any runtime at request time
- No API endpoints — any dynamic behavior must come from an external service called via client-side JS
- No server-side secrets — anything in a Pages site is public; never embed API keys
- No auth at the edge — anyone can access any URL; use a separate identity provider + SPA auth
- No databases — all data access must go through a client-side API call to an external backend
Rule of thumb: If a request needs to read a database or call an LLM, it does not belong on GitHub Pages — it belongs in a Vercel Function, Cloud Run service, or similar compute layer.
Decision Matrix: Static / Frontend Hosting¶
| Provider | Free tier | Custom domain | Build included | Server-side | Notes |
|---|---|---|---|---|---|
| GitHub Pages | Yes (public repos) | Yes (CNAME) | Via Actions | No | Best for docs and project portals |
| Vercel | Yes (Hobby) | Yes | Yes (Vercel CI) | Yes (Functions) | Best for full-stack Next.js/SvelteKit |
| Netlify | Yes (Starter) | Yes | Yes | Yes (Functions) | JAMstack, form handling, identity |
| Azure SWA | Yes (Free tier) | Yes | Yes | Yes (Azure Functions) | Best within existing Azure EA |
| Cloudflare Pages | Yes | Yes | Yes | Yes (Workers) | Global edge, Workers KV, R2 |
When to choose Vercel over GitHub Pages for a spoke frontend: any time the spoke needs server-side rendering, API routes, or LLM streaming — even if it's "mostly static."
Decision Matrix: Compute / API Hosting¶
| Provider | Model | Cold start | Max duration | Best for |
|---|---|---|---|---|
| Vercel Functions | Serverless + Edge | ~50ms (Edge), ~300ms (Node) | 300s (Pro), 30s (Edge) | Full-stack apps co-located with frontend |
| Google Cloud Run | Container, autoscale-to-zero | ~500ms | 3600s | Any container, long-running tasks, GCP IAM |
| Azure Container Apps | Container, KEDA autoscale | ~1s | Unlimited | Azure ecosystem, Dapr sidecar, KEDA scaling |
| AWS App Runner | Container, managed | ~1s | Unlimited | AWS-native, simpler than Fargate |
| AWS Fargate | Container, task-based | ~30s | Unlimited | AWS-native, fine-grained task control |
| Railway | Container, always-on option | Minimal | Unlimited | Solo dev, zero-config Dockerfile deploys |
| Fly.io | Container, anycast | Minimal | Unlimited | Multi-region, persistent volumes, ops-aware teams |
Recommendation: Cloud Run is the most flexible managed option — any container, true scale-to-zero, per-request billing. Vercel Functions wins when the spoke is a Next.js / SvelteKit app. Railway and Fly.io are great for teams that want Git-push deploys without a full cloud account setup.
Decision Matrix: Managed Databases¶
| Service | Engine | Free tier | Serverless | Best for |
|---|---|---|---|---|
| Supabase | Postgres | Yes (500 MB) | Yes (pause on inactivity) | Full-stack: Auth + Storage + Realtime + Postgres in one |
| Neon | Postgres | Yes (0.5 CU) | Yes | Vercel + Neon canonical pair; branch-per-PR databases |
| PlanetScale | MySQL (Vitess) | No (free tier paused) | Yes | High-scale MySQL, schema branching, no foreign keys |
| Firestore | NoSQL document | Yes (Spark plan) | Yes | GCP-native, mobile/serverless, event-driven |
| DynamoDB | NoSQL KV + doc | Free tier | On-demand billing | AWS-native, massive scale, single-digit ms latency |
| Upstash Redis | Redis | Yes (10k cmds/day) | Yes | Cache, sessions, rate limiting alongside a primary DB |
| Turso (libSQL) | SQLite (distributed) | Yes | Yes | Edge databases, extremely low-latency reads globally |
Recommendation for most spokes: Neon (with Vercel) or Supabase (standalone). Both are Postgres, serverless, and have generous free tiers. Add Upstash Redis for session storage or rate limiting if needed.
Decision Matrix: LLM Providers¶
AgentArmy uses Claude for orchestration. Spoke applications may use any provider — abstract the client so you can swap without code changes.
| Provider | Models | Pricing style | Strengths | Best for |
|---|---|---|---|---|
| Anthropic (Claude API) | Haiku, Sonnet, Opus | Per token (input/output) | Reasoning, code, long context | Default for AgentArmy-built agents |
| OpenAI | GPT-4o, GPT-4o-mini | Per token | Ecosystem breadth, vision, function calling | Existing OpenAI integrations |
| Vertex AI (GCP) | Gemini, Claude via Vertex | Per token | Audit logging, VPC Service Controls, GCP IAM | GCP-native compliance workloads |
| AWS Bedrock | Claude, Llama, Titan | Per token | PrivateLink, SCPs, AWS compliance | AWS-native, HIPAA/FedRAMP workloads |
| Ollama (self-hosted) | Llama, Mistral, Qwen, Phi | Compute cost only | Air-gapped, cost ceiling, no data egress | Privacy-sensitive, on-prem, cost-capped |
| Groq / Together.ai | Llama, Mixtral, Gemma | Per token (low cost) | Very high throughput, low latency | Budget inference, high-volume PoCs |
Cost guidance: Haiku-class models (Claude Haiku, GPT-4o-mini) are 10–20× cheaper than flagship models and handle most classification, extraction, and light reasoning tasks. Reserve Sonnet/Opus/GPT-4o for tasks that measurably need them.
The LLM Abstraction Principle¶
The AgentArmy roadmap (Play 5) says: consume, abstract provider, avoid lock-in, stay multi-vendor.
In practice this means:
# BAD — hardcoded provider
from anthropic import Anthropic
client = Anthropic()
# GOOD — parameterized via env var + abstraction layer
import litellm # or Vercel AI SDK, LangChain, custom wrapper
response = litellm.completion(
model=os.environ["LLM_MODEL"], # e.g. "claude-3-5-sonnet-latest" or "gpt-4o"
messages=[...]
)
Options for the abstraction layer:
- Vercel AI SDK (ai npm) — provider-agnostic, built for streaming, ideal for Vercel spokes
- LiteLLM (Python) — unified OpenAI-compatible API across 100+ providers
- LangChain — broader orchestration, more abstraction overhead
- Custom env-var wrapper — for simple spokes that call one model at a time
Set LLM_PROVIDER and LLM_MODEL as environment variables per spoke environment. The model changes in config, not code.
Which Agent to Use¶
| Scenario | Agent |
|---|---|
| GCP infrastructure (Cloud Run, Cloud SQL, GKE, IAM, Vertex AI, Cloud Build) | gcp-infra-engineer |
| AWS infrastructure (Fargate, RDS, Bedrock, EKS, CDK/CloudFormation, IAM/SCP) | aws-infra-engineer |
| Azure infrastructure (Container Apps, Bicep, Entra ID, Azure OpenAI) | azure-infra-engineer |
| Vercel platform (Functions, Postgres/KV/Blob, edge middleware, monorepo, AI SDK) | vercel-engineer |
| Multi-cloud strategy, cloud provider selection, landing zone design | cloud-architect |
| LLM system design, RAG, multi-model orchestration, inference serving | llm-architect |
| Cloud cost governance, unit economics, RI/Savings Plan commitments | finops-engineer |
| IaC module design, Terraform state management, Terragrunt orchestration | terraform-engineer |
Coming Soon (Backlog)¶
These features are tracked as GitHub Issues on the project board:
- IaC starter templates — Terraform modules for Cloud Run + Cloud SQL, Fargate + RDS, Container Apps
- Spoke stack builder — a CLI/Actions workflow that scaffolds a new spoke with provider-specific IaC and pipelines
- CI/CD workflow templates — reusable
deploy-gcp.yml,deploy-aws.yml,deploy-vercel.ymlfor spoke repos - LLM provider abstraction layer — shared wrapper library for RT2 Play 5
- Cloud-native CI/CD research spike — GCP Cloud Build/Deploy vs AWS CodePipeline vs GitHub Actions evaluation
Azure Dev Container Lane¶
For Azure-first spoke development, use Azure Container Apps Dev Deploy. It pairs a trusted local PC or self-hosted runner with Azure Container Registry and Azure Container Apps Dev, while keeping production promotion as a separate environment-gated workflow.
For multi-cloud lifecycle routing, use Lifecycle Promotion Management. It defines the local Docker gate and target-adapter pattern that can route a spoke to Azure Container Apps, GCP Cloud Run, Vertex AI Agent Engine, or future runtime targets.