ai-ctrl — Cheap GPUs. Managed reliability. Can't blow your budget.

See it · 70 sec

The whole platform in 70 seconds.

The gap we fill

Two halves of the stack, finally fused.

LLM gateways route to models but never touch GPUs. GPU clouds run models but never route to OpenAI or protect your bill. ai-ctrl is the seam — one API that does both.

LLM gateways

route + fallback
token budgets only
✗ no GPUs
✗ no compute spend control

ai-ctrl

routing + GPUs + money-drain protection, one endpoint

GPU / inference clouds

run your own models
scale-to-zero
✗ no cross-provider routing
✗ weak / no $ budget caps

The manager, live

Over-provision. Benchmark. Keep the best. Kill the rest.

A pool manager sits in the middle. Ask for 5 machines and it boots a few extra, benchmarks every one (GPU, SSD, network), keeps the fastest 5 and kills the rest. If one dies — or a benchmark flags slow disk or network — it's replaced automatically. You only ever pay for healthy machines.

Pool manager · target size: 5

Machine states shown: booting, benchmark, ready, killed

A pool manager keeps your fleet healthy — automatically.

Strategy shown: Vast.ai — machines are cheap, so over-provision and cull. On RunPod the manager keeps fewer spares — boot time is billed and machines are more reliable.
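
The over-provision-and-cull step described above can be sketched roughly as follows. This is a minimal illustration, not ai-ctrl's actual implementation: `BenchResult`, `cull`, `spares_needed`, the score fields and the disk/network floors are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """Hypothetical benchmark summary for one booted machine."""
    machine_id: str
    gpu_score: float   # higher is better
    disk_mbps: float   # higher is better
    net_mbps: float    # higher is better


def cull(results, target, min_disk_mbps=500.0, min_net_mbps=200.0):
    """Split benchmarked machines into (keep, kill).

    Machines below either floor are rejected outright; the rest are
    ranked by GPU score and only the top `target` are kept.
    The floor values are assumptions made for this sketch.
    """
    healthy = [r for r in results
               if r.disk_mbps >= min_disk_mbps and r.net_mbps >= min_net_mbps]
    ranked = sorted(healthy, key=lambda r: r.gpu_score, reverse=True)
    keep = ranked[:target]
    keep_ids = {r.machine_id for r in keep}
    kill = [r for r in results if r.machine_id not in keep_ids]
    return keep, kill


def spares_needed(keep, target):
    """How many replacements to boot if the cull left the pool short."""
    return max(0, target - len(keep))
```

A real manager would presumably loop: boot `target` plus some spares, call something like `cull`, then boot `spares_needed` more machines until the pool is full. How many spares to boot per provider is a policy choice (more on cheap machines, fewer where boot time is billed).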

Why teams pick ai-ctrl

01

Cheap GPUs,
managed reliability

Run spot & neocloud GPUs (60–90% cheaper) without the spot-roulette. Only health-checked machines take traffic; if a node dies, requests fail over automatically.

02

Idle GPUs can't
drain your budget

Built-in reapers stop idle machines, kill stragglers, and reclaim provider-side zombies. Per-pool budgets pause spend before it runs away. The #1 FinOps problem, solved in the gateway.

03

One endpoint
for everything

Point the OpenAI SDK at ai-ctrl. Reach every cloud provider and your own vLLM fleet behind a single API — automatic failover, BYOK, no token markup, no lock-in.

Workloads · live

Not just chat — run your whole AI stack.

One control plane for inference, batch, training and image/video generation — all on the same pooled GPUs.

Chat & inference

OpenAI-compatible streaming completions across cloud models and your own vLLM.

Batch −50%

OpenAI-compatible batch jobs at roughly half price, ~24 h turnaround.

Training & LoRA

Fine-tune and LoRA-train open models on pooled GPUs — submit a run, get your adapter back.

ComfyUI

Image & video generation pipelines dispatched to GPU workers.

Image models

Run FLUX, SDXL & friends — text-to-image on your own GPUs, not a rented API.

Bring your model

Any vLLM / Hugging-Face model, swapped live onto a pool with prestaged weights.

Money-drain protection · live

Five reapers + per-pool budgets, watching your bill.

Idle, stalled, crashed, never-booted, and provider-side zombie machines are detected and destroyed automatically. Budgets and account-halt stop spend before it runs away.

Auto-stop / kill

Idle machine → stop (keep disk) → kill after grace. No more paying for GPUs doing nothing.
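
One way to picture the idle → stop → kill sequence is as a small decision function. This is a sketch under assumptions: the `Action` enum, the function name, and the timeout and grace defaults are hypothetical and not taken from ai-ctrl.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    STOP = "stop"   # stop the machine but keep its disk
    KILL = "kill"   # destroy the machine entirely


def idle_action(now, last_request_at, stopped_at=None,
                idle_timeout_s=600.0, grace_s=3600.0):
    """Decide what an idle reaper should do with one machine.

    All times are seconds on the same clock. The timeout and grace
    defaults are illustrative assumptions.
    """
    if stopped_at is not None:
        # Already stopped: destroy it once the grace period has elapsed.
        return Action.KILL if now - stopped_at >= grace_s else Action.NONE
    # Still running: stop it once it has been idle long enough.
    return Action.STOP if now - last_request_at >= idle_timeout_s else Action.NONE
```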

Zombie & lost reapers

Provider-side pods with no DB record, or machines gone silent, are reclaimed so they stop billing.
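
Conceptually, zombie detection is a set difference between what the provider reports and what the database knows, and lost detection is a heartbeat timeout. The helper names and the silence threshold below are assumptions made for illustration only.

```python
def find_zombies(provider_ids, db_ids):
    """Provider-side machines that have no database record."""
    return set(provider_ids) - set(db_ids)


def find_lost(last_heartbeat, now, silence_s=300.0):
    """Machines whose last heartbeat is older than `silence_s` seconds.

    `last_heartbeat` maps machine id -> timestamp of the last report.
    The 300 s default is an illustrative assumption.
    """
    return {mid for mid, ts in last_heartbeat.items() if now - ts >= silence_s}
```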

Budgets & halt

Per-pool daily budgets pause provisioning; account-halt stops a runaway scope cold.
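
A budget gate of this kind could look roughly like the check below, run before each provisioning attempt. `PoolBudget`, `may_provision` and the one-hour minimum commitment are hypothetical; the real accounting is likely more involved.

```python
from dataclasses import dataclass


@dataclass
class PoolBudget:
    """Hypothetical per-pool daily budget record."""
    daily_limit_usd: float
    spent_today_usd: float = 0.0


def may_provision(budget, hourly_price_usd, account_halted=False, min_hours=1.0):
    """Return True only if booting one more machine stays within budget.

    An account-level halt overrides everything. The projection assumes
    the machine runs for at least `min_hours` (an assumption of this sketch).
    """
    if account_halted:
        return False
    projected = budget.spent_today_usd + hourly_price_usd * min_hours
    return projected <= budget.daily_limit_usd
```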

Reliability on cheap servers

Only healthy machines serve traffic.

Every provisioned box is audited — SSH, GPU, disk & network — before it joins a pool. Bad hardware is rejected; a dead spot node fails over to cloud so a cheap GPU never becomes a failed request.

A mandatory health gate and a deny-list are on the roadmap; the provisioning-time audit ships today.

Multi-provider & pools · live

Every model, every GPU provider, declarative pools.

Route across 8+ LLM providers and your own vLLM. Provision GPUs on RunPod, Vast & Clore into autoscaling, budgeted pools — or add your own boxes to the same fleet.

Multi-provider routing

OpenAI, Anthropic, Gemini, Mistral, DeepSeek, Qwen, Perplexity, OpenRouter + self-hosted vLLM. Priority + cost ranking, opt-in fallback. BYOK.
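
Priority + cost ranking with opt-in fallback might be sketched like this. The `Route` fields, the tie-breaking order (priority first, then cost) and the `complete` helper are assumptions for illustration, not ai-ctrl's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical routing candidate for one model."""
    name: str
    priority: int             # lower value is tried first
    cost_per_mtok_usd: float  # used to break ties within a priority


def rank_routes(routes):
    """Order candidates by priority, then by cost."""
    return sorted(routes, key=lambda r: (r.priority, r.cost_per_mtok_usd))


def complete(routes, call, allow_fallback=False):
    """Try the best route; try the others only if fallback is opted in.

    `call` is any function that takes a Route and returns a response
    or raises on failure.
    """
    ranked = rank_routes(routes)
    if not ranked:
        raise ValueError("no routes configured")
    candidates = ranked if allow_fallback else ranked[:1]
    last_error = None
    for route in candidates:
        try:
            return route.name, call(route)
        except Exception as exc:  # a real gateway would catch narrower errors
            last_error = exc
    raise RuntimeError("all candidate routes failed") from last_error
```

Keeping fallback opt-in means a request never silently lands on a more expensive provider unless the caller asked for that.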

GPU pools

Declarative, multi-member, autoscaling pools with per-pool budgets. RunPod / Vast / Clore — plus your own gx10 / on-prem boxes over Tailscale.

Built to be observable · live

Every resource reports the same four fields.

A live SSE event stream, ~130 machine states and an append-only audit log. Automation routes on stable enums; humans read the detail. No guessing.

status

Small stable enum. Automation fans out on this.

state_detail

Human-readable current activity.

state_progress

0–100% through the current step.

error_code

Stable category for retry / alert switching.
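
A consumer of these four fields might branch on `status` and `error_code` and leave the other two for display. The field names come from the list above; the concrete status values, the error codes and the `next_step` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceState:
    status: str                       # small stable enum
    state_detail: str                 # human-readable current activity
    state_progress: int               # 0-100 through the current step
    error_code: Optional[str] = None  # stable category, None when healthy


# Hypothetical set of error categories considered safe to retry.
RETRYABLE = {"PROVIDER_TIMEOUT", "BOOT_FAILED"}


def next_step(state):
    """Pick an automation action from the stable fields only."""
    if state.error_code is not None:
        return "retry" if state.error_code in RETRYABLE else "alert"
    if state.status == "ready":
        return "route_traffic"
    return "wait"
```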

Honest comparison

Where ai-ctrl sits.

Capability	LLM gateways	GPU / inference clouds	TrueFoundry	ai-ctrl
Cross-provider LLM routing + fallback	Yes	No	Yes	Yes
Provisions cheap / spot cloud GPUs	No	Yes	Own GPUs only	Yes
Active money-drain reapers + $ budgets	Token $ only	Weak / partial	Partial	Yes
Health-gated machines	—	Rare	—	Roadmap
Your own / local boxes in the fleet	No	Some (BYOC)	Yes (your K8s)	Yes
Fleet / GPU-lifecycle observability	Request-level	Partial	Partial	Yes

Honest take: routing and observability are table-stakes among gateways; provisioning + reapers are the GPU clouds' turf. ai-ctrl is the only one combining both behind one OpenAI-compatible API — TrueFoundry is closest but orchestrates only GPUs you already own. Health-gating is on our roadmap (see badges above).

Quickstart

# Point any OpenAI client at ai-ctrl
curl https://api.ai-ctrl.net/v1/chat/completions \
  -H "Authorization: Bearer $AI_CTRL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'

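# Or with the OpenAI Python SDK
import os
from openai import OpenAI

# Read the key from the environment; Python does not expand "$AI_CTRL_API_KEY" inside a string.
client = OpenAI(base_url="https://api.ai-ctrl.net/v1", api_key=os.environ["AI_CTRL_API_KEY"])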

No surprises

OpenAI-compatible API

Bring your own keys

No token markup

Self-host friendly

No lock-in

Scale to zero

Cheap GPUs. Managed reliability. Can't blow your budget.