Cheap GPUs.
Managed reliability.
Can't blow your budget.

One OpenAI-compatible API across cloud LLMs, your own GPUs, and spot GPUs. Health-gated machines, automatic failover, and reapers that stop idle spend: the reliability of a managed provider at neocloud prices.

OpenAI
Anthropic
Gemini
vLLM
RunPod
Vast.ai

The whole platform in 70 seconds.

The gap we fill

Two halves of the stack, finally fused.

LLM gateways route to models but never touch GPUs. GPU clouds run models but never route to OpenAI or protect your bill. ai-ctrl is the seam — one API that does both.

LLM gateways

  • route + fallback
  • token budgets only
  • ✗ no GPUs
  • ✗ no compute spend control

ai·ctrl

routing + GPUs + money-drain protection, one endpoint

GPU / inference clouds

  • run your own models
  • scale-to-zero
  • ✗ no cross-provider routing
  • ✗ weak / no $ budget caps
The manager, live

Over-provision. Benchmark. Keep the best. Kill the rest.

A pool manager sits in the middle. Ask for 5 machines and it boots a few extra, benchmarks every one (GPU, SSD, network), keeps the fastest 5 and kills the rest. If one dies — or a benchmark flags slow disk or network — it's replaced automatically. You only ever pay for healthy machines.

[Animation: pool manager, target size 5; machine states: booting, benchmark, ready, killed]

A pool manager keeps your fleet healthy — automatically.

Strategy shown: Vast.ai — machines are cheap, so over-provision and cull. On RunPod the manager keeps fewer spares — boot time is billed and machines are more reliable.
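
The over-provision-and-cull step can be sketched in a few lines of Python. This is a minimal illustration, not ai-ctrl's implementation: the `Machine` type, the single combined `score`, and the choice of two spares are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    score: float   # hypothetical combined GPU/SSD/network benchmark, higher is better
    healthy: bool  # False if the machine died or a benchmark flagged it


def cull(candidates: list[Machine], target: int) -> tuple[list[Machine], list[Machine]]:
    """Keep the `target` fastest healthy machines; everything else is killed."""
    ranked = sorted((m for m in candidates if m.healthy),
                    key=lambda m: m.score, reverse=True)
    keep = ranked[:target]
    kill = [m for m in candidates if m not in keep]
    return keep, kill


# Ask for 5, boot 7 (two spares is an assumed number for a cheap provider).
booted = [
    Machine("m1", 0.91, True), Machine("m2", 0.62, True),
    Machine("m3", 0.88, True), Machine("m4", 0.40, False),  # flagged: slow disk
    Machine("m5", 0.95, True), Machine("m6", 0.79, True),
    Machine("m7", 0.85, True),
]
keep, kill = cull(booted, target=5)
print([m.name for m in keep])  # ['m5', 'm1', 'm3', 'm7', 'm6']
print([m.name for m in kill])  # ['m2', 'm4']
```

On a provider that bills boot time, the same function would simply be fed fewer spares.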

Why teams pick ai-ctrl
01

Cheap GPUs, managed reliability

Run spot & neocloud GPUs (60–90% cheaper) without the spot-roulette. Only health-checked machines take traffic; if a node dies, requests fail over automatically.

[Diagram: spot GPU (−60–90% cost) → health gate ✓ → serving; node dies → cloud LLM fallback]
02

Idle GPUs can't drain your budget

Built-in reapers stop idle machines, kill stragglers, and reclaim provider-side zombies. Per-pool budgets pause spend before it runs away. The #1 FinOps problem, solved in the gateway.

[Diagram: running → (idle 60s) → stop → (+30m) → kill; pool budget: cap → pause]
03

One endpoint for everything

Point the OpenAI SDK at ai-ctrl. Reach every cloud provider and your own vLLM fleet behind a single API — automatic failover, BYOK, no token markup, no lock-in.

[Diagram: your app → ai·ctrl → OpenAI / Claude, your vLLM, spot GPUs]
Workloads · live

Not just chat — run your whole AI stack.

One control plane for inference, batch, training and image/video generation — all on the same pooled GPUs.

Chat & inference

OpenAI-compatible streaming completions across cloud models and your own vLLM.

Batch −50%

OpenAI-compatible batch jobs at roughly half price, ~24 h turnaround.

Training & LoRA

Fine-tune and LoRA-train open models on pooled GPUs — submit a run, get your adapter back.

ComfyUI

Image & video generation pipelines dispatched to GPU workers.

Image models

Run FLUX, SDXL & friends — text-to-image on your own GPUs, not a rented API.

Bring your model

Any vLLM / Hugging-Face model, swapped live onto a pool with prestaged weights.

Money-drain protection · live

Five reapers + per-pool budgets, watching your bill.

Idle, stalled, crashed, never-booted, and provider-side zombie machines are detected and destroyed automatically. Budgets and account-halt stop spend before it runs away.

Auto-stop / kill

Idle machine → stop (keep disk) → kill after grace. No more paying for GPUs doing nothing.
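
As a rough sketch, the stop-then-kill rule is a two-step decision made on every reaper tick. The state names, the `reap` function, and both thresholds are assumptions for illustration (the 60 s and 30 min values echo the diagram on this page, but treat them as placeholders).

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    STOP = "stop"  # release the GPU, keep the disk
    KILL = "kill"  # destroy the machine and its disk


IDLE_BEFORE_STOP_S = 60        # assumed idle threshold
GRACE_BEFORE_KILL_S = 30 * 60  # assumed grace period after the stop


def reap(state: str, idle_s: float = 0.0, stopped_s: float = 0.0) -> Action:
    """Decide what to do with one machine on this reaper tick."""
    if state == "running" and idle_s >= IDLE_BEFORE_STOP_S:
        return Action.STOP
    if state == "stopped" and stopped_s >= GRACE_BEFORE_KILL_S:
        return Action.KILL
    return Action.NONE


print(reap("running", idle_s=75))       # Action.STOP
print(reap("stopped", stopped_s=120))   # Action.NONE
print(reap("stopped", stopped_s=1800))  # Action.KILL
```

Keeping the disk during the grace period means a machine that is needed again can restart without re-downloading weights.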

Zombie & lost reapers

Provider-side pods with no DB record, or machines gone silent, are reclaimed so they stop billing.
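
Both reapers reduce to a comparison between what the provider reports and what the database knows. The sketch below assumes machine IDs are plain strings and that heartbeats are stored as timestamps; `find_zombies`, `find_lost`, and the silence threshold are hypothetical names and values.

```python
def find_zombies(provider_ids: set[str], db_ids: set[str]) -> set[str]:
    """Pods the provider is billing for that the database has no record of."""
    return provider_ids - db_ids


def find_lost(last_seen: dict[str, float], now: float, silence_s: float) -> set[str]:
    """Machines with a DB record whose last heartbeat is older than `silence_s`."""
    return {mid for mid, seen in last_seen.items() if now - seen > silence_s}


provider = {"pod-a", "pod-b", "pod-c"}
db = {"pod-a", "pod-b"}
print(find_zombies(provider, db))  # {'pod-c'}

heartbeats = {"pod-a": 1000.0, "pod-b": 400.0}
print(find_lost(heartbeats, now=1030.0, silence_s=300.0))  # {'pod-b'}
```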

Budgets & halt

Per-pool daily budgets pause provisioning; account-halt stops a runaway scope cold.
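
A budget gate of this kind can be approximated as a check that runs before every provisioning call. The `PoolBudget` type, the `may_provision` function, and the rule of reserving one billed hour up front are assumptions for the sketch, not the product's actual accounting.

```python
from dataclasses import dataclass


@dataclass
class PoolBudget:
    daily_cap_usd: float
    spent_today_usd: float = 0.0


def may_provision(budget: PoolBudget, hourly_usd: float,
                  account_halted: bool = False) -> bool:
    """Allow a new machine only if its first billed hour fits under the cap."""
    if account_halted:
        return False  # account-halt overrides every pool budget
    return budget.spent_today_usd + hourly_usd <= budget.daily_cap_usd


pool = PoolBudget(daily_cap_usd=20.0, spent_today_usd=19.5)
print(may_provision(pool, hourly_usd=0.25))                       # True
print(may_provision(pool, hourly_usd=0.75))                       # False
print(may_provision(pool, hourly_usd=0.25, account_halted=True))  # False
```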

Reliability on cheap servers

Only healthy machines serve traffic.

Every provisioned box is audited — SSH, GPU, disk & network — before it joins a pool. Bad hardware is rejected; a dead spot node fails over to cloud so a cheap GPU never becomes a failed request.

[Diagram: provision (spot / neocloud) → health gate (GPU · disk · net · engine) → pass: serving / fail: reject + deny-list; node dies → failover → cloud LLM]
Mandatory health-gate and deny-list are on the roadmap; the provisioning-time audit ships today.
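
The two ideas in this section, an all-checks-must-pass audit and failover to a cloud LLM, can be sketched as follows. The check names mirror the audit described above; the result format, the function names, and the use of `ConnectionError` to signal a dead node are assumptions for illustration.

```python
from typing import Callable

# Check names mirror the audit described above; the result format is assumed.
REQUIRED_CHECKS = ("ssh", "gpu", "disk", "network")


def passes_gate(results: dict[str, bool]) -> bool:
    """A machine joins the pool only if every required check passed."""
    return all(results.get(check, False) for check in REQUIRED_CHECKS)


def complete(prompt: str,
             spot: Callable[[str], str],
             cloud: Callable[[str], str]) -> str:
    """Try the spot node first; fall back to a cloud LLM if the node is gone."""
    try:
        return spot(prompt)
    except ConnectionError:
        return cloud(prompt)


def dead_node(prompt: str) -> str:
    """Stand-in for a spot node that disappeared mid-request."""
    raise ConnectionError("spot node died")


print(passes_gate({"ssh": True, "gpu": True, "disk": False, "network": True}))  # False
print(complete("hi", dead_node, lambda p: f"cloud answered: {p}"))  # cloud answered: hi
```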
Multi-provider & pools · live

Every model, every GPU provider, declarative pools.

Route across 8+ LLM providers and your own vLLM. Provision GPUs on RunPod, Vast & Clore into autoscaling, budgeted pools — or add your own boxes to the same fleet.

Multi-provider routing

OpenAI, Anthropic, Gemini, Mistral, DeepSeek, Qwen, Perplexity, OpenRouter + self-hosted vLLM. Priority + cost ranking, opt-in fallback. BYOK.
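
One plausible reading of "priority + cost ranking, opt-in fallback" is a sort on two keys, with the tail of the list used only when fallback is enabled. The `Route` type, the target names, and the prices below are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    target: str
    priority: int        # lower number is tried first
    usd_per_mtok: float  # made-up prices, for illustration only


def plan(routes: list[Route], fallback: bool = False) -> list[str]:
    """Order routes by priority, then cost; without fallback only the first is used."""
    ordered = sorted(routes, key=lambda r: (r.priority, r.usd_per_mtok))
    chosen = ordered if fallback else ordered[:1]
    return [r.target for r in chosen]


routes = [
    Route("cloud-a", priority=2, usd_per_mtok=0.60),
    Route("own-vllm", priority=1, usd_per_mtok=0.10),
    Route("cloud-b", priority=2, usd_per_mtok=0.25),
]
print(plan(routes))                 # ['own-vllm']
print(plan(routes, fallback=True))  # ['own-vllm', 'cloud-b', 'cloud-a']
```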

GPU pools

Declarative, multi-member, autoscaling pools with per-pool budgets. RunPod / Vast / Clore — plus your own gx10 / on-prem boxes over Tailscale.
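
To make "declarative" concrete: a pool is described by a spec, and the autoscaler derives a target size from it. The field names in `PoolSpec` and the queue-based sizing rule are assumptions for the sketch; the real pool schema may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSpec:
    name: str
    providers: tuple[str, ...]
    min_size: int
    max_size: int
    daily_budget_usd: float


def desired_size(spec: PoolSpec, queued: int, per_machine: int) -> int:
    """Machines needed for the queue, clamped to the pool's declared bounds."""
    needed = -(-queued // per_machine)  # ceiling division
    return max(spec.min_size, min(spec.max_size, needed))


spec = PoolSpec("chat", ("runpod", "vast", "clore"), min_size=0, max_size=4,
                daily_budget_usd=50.0)
print(desired_size(spec, queued=0, per_machine=4))    # 0
print(desired_size(spec, queued=9, per_machine=4))    # 3
print(desired_size(spec, queued=100, per_machine=4))  # 4
```

With `min_size=0` the pool scales to zero when the queue is empty.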

Built to be observable · live

Every resource reports the same four fields.

A live SSE event stream, ~130 machine states and an append-only audit log. Automation routes on stable enums; humans read the detail. No guessing.

status

Small stable enum. Automation fans out on this.

state_detail

Human-readable current activity.

state_progress

0–100% through the current step.

error_code

Stable category for retry / alert switching.
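
A consumer of these four fields might branch on `status` and `error_code` and leave the other two for display. The field names come from the list above; the concrete status values and error codes are hypothetical, since the real enums are not listed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceState:
    status: str          # small stable enum (values below are assumed)
    state_detail: str    # human-readable, never parsed by automation
    state_progress: int  # 0-100
    error_code: Optional[str] = None


RETRYABLE = {"provider_capacity", "boot_timeout"}  # hypothetical error codes


def next_step(s: ResourceState) -> str:
    """Automation switches on status and error_code only."""
    if s.status == "failed":
        return "retry" if s.error_code in RETRYABLE else "alert"
    if s.status == "ready":
        return "route_traffic"
    return "wait"


booting = ResourceState("provisioning", "pulling model weights", 40)
failed = ResourceState("failed", "no capacity in region", 0, "provider_capacity")
print(next_step(booting))  # wait
print(next_step(failed))   # retry
```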

Honest comparison

Where ai-ctrl sits.

Capability                             | LLM gateways  | GPU / inference clouds | TrueFoundry    | ai-ctrl
Cross-provider LLM routing + fallback  | Yes           | No                     | Yes            | Yes
Provisions cheap / spot cloud GPUs     | No            | Yes                    | Own GPUs only  | Yes
Active money-drain reapers + $ budgets | Token $ only  | Weak / partial         | Partial        | Yes
Health-gated machines                  |               | Rare                   |                | Roadmap
Your own / local boxes in the fleet    | No            | Some (BYOC)            | Yes (your K8s) | Yes
Fleet / GPU-lifecycle observability    | Request-level | Partial                | Partial        | Yes

Honest take: routing and observability are table stakes among gateways; provisioning and reapers are the GPU clouds' turf. ai-ctrl is the only one combining both behind one OpenAI-compatible API. TrueFoundry is closest but orchestrates only GPUs you already own. Health-gating is on our roadmap (see badges above).

Quickstart
# Point any OpenAI client at ai-ctrl
curl https://api.ai-ctrl.net/v1/chat/completions \
  -H "Authorization: Bearer $AI_CTRL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'

# Or with the OpenAI Python SDK (reads the key from the environment)
import os
from openai import OpenAI
client = OpenAI(base_url="https://api.ai-ctrl.net/v1", api_key=os.environ["AI_CTRL_API_KEY"])
No surprises
OpenAI-compatible API
Bring your own keys
No token markup
Self-host friendly
No lock-in
Scale to zero

Run cheap GPUs without the bill shock.

Request early access →