Fast inference.
Frontier-speed inference on idle enterprise GPUs — same output quality, lower cost.
from openai import OpenAI

# Point the stock OpenAI SDK at Lilac instead of OpenAI.
# The base URL below is a placeholder; swap in Lilac's actual endpoint.
client = OpenAI(
    base_url="https://api.lilac.example/v1",
    api_key="sk_...",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Same code. Same SDK. Fraction of the price.
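Streaming works through the same SDK surface, assuming Lilac passes it through the way OpenAI-compatible endpoints typically do. A minimal sketch, reusing the client above (so the placeholder base URL assumption carries over):

# Same client as above; only the stream flag changes.
stream = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # tokens arrive as they're generated, not all at once
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)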
OpenAI-Compatible
Drop-in SDK replacement
Frontier Speed
Low latency, high throughput
Always Warm
No cold starts, no spin-up
$0 Minimums
Pay per token, no contracts
Live status
Real-time model performance.
A performance snapshot across every endpoint we serve, updated every 30 seconds from recent API traffic.
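The same snapshot could be watched programmatically. A minimal sketch, assuming a hypothetical JSON status endpoint at /v1/status; the URL, path, and response shape below are illustrative, not a documented API:

import time
import requests  # third-party: pip install requests

STATUS_URL = "https://api.lilac.example/v1/status"  # hypothetical endpoint

def watch_status(poll_seconds: int = 30) -> None:
    """Poll the status endpoint at the same cadence the dashboard refreshes."""
    while True:
        snapshot = requests.get(STATUS_URL, timeout=10).json()
        for endpoint in snapshot.get("endpoints", []):  # assumed response shape
            print(f'{endpoint["model"]}: '
                  f'{endpoint["tokens_per_second"]} tok/s, '
                  f'p50 latency {endpoint["p50_latency_ms"]} ms')
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch_status()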
Pricing
Pay per token. No commitments.
We route inference to idle enterprise GPUs — hardware that's already powered on and paid for. No reserved capacity markup. You only pay for what you use.
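Because billing is per token, the cost of any call falls straight out of the usage block the API already returns. A minimal sketch; the per-million-token rates are placeholders, not Lilac's published prices:

# Rates below are illustrative placeholders, not actual prices.
PRICE_PER_M_INPUT = 0.50   # USD per 1M input tokens (assumed)
PRICE_PER_M_OUTPUT = 1.50  # USD per 1M output tokens (assumed)

def call_cost(response) -> float:
    """Compute the dollar cost of a completion from its usage block."""
    usage = response.usage  # populated on every non-streaming completion
    return (usage.prompt_tokens * PRICE_PER_M_INPUT
            + usage.completion_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

print(f"${call_cost(response):.6f}")  # `response` from the snippet above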
More models coming soon. Get in touch to request specific models.
The economics
Why we're cheaper.
Most GPU clusters run at 30–50% utilization. Lilac routes your inference to that idle capacity — hardware that's already powered on and paid for.
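A rough back-of-envelope makes the arbitrage concrete. A minimal sketch with assumed numbers; the $2/hr GPU rate is illustrative, and only the 30–50% utilization range comes from above:

# All figures here are illustrative assumptions, not Lilac's actual economics.
GPU_COST_PER_HOUR = 2.00   # assumed all-in cost of an enterprise GPU-hour
UTILIZATION = 0.40         # midpoint of the 30-50% range above

# The owner pays for every hour whether it's busy or idle, so the
# effective cost per *useful* hour is inflated by the idle fraction.
effective_cost = GPU_COST_PER_HOUR / UTILIZATION
print(f"Effective cost per busy GPU-hour: ${effective_cost:.2f}")  # $5.00

# Selling the idle hours at any price above marginal cost is found
# money for the owner and a discount for the buyer.
idle_hours_per_day = 24 * (1 - UTILIZATION)
print(f"Idle GPU-hours per day, per GPU: {idle_hours_per_day:.1f}")  # 14.4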
Start running inference in minutes.
No contracts, no commitments. Swap your base URL and pay less for the same output quality.