GPU inference, cheaper.
Affordable inference, same output quality. Powered by idle enterprise GPUs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.getlilac.com/v1",  # the only line that changes
    api_key="sk_...",
)

response = client.chat.completions.create(
    model="kimi-k2-5",  # Kimi K2.5 on Lilac's shared endpoint
    messages=[{"role": "user", "content": "Hello!"}],
)
# Same code. Same SDK. Fraction of the price.
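If you prefer configuration over code, recent versions of the OpenAI Python SDK also read the base URL and key from standard environment variables; a minimal sketch, assuming openai >= 1.0 (the version that introduced OPENAI_BASE_URL):

import os
from openai import OpenAI

# Set here to keep the example self-contained; in practice, export
# these in your shell or deployment config instead.
os.environ["OPENAI_BASE_URL"] = "https://api.getlilac.com/v1"
os.environ["OPENAI_API_KEY"] = "sk_..."  # your Lilac key

client = OpenAI()  # picks both values up from the environment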
OpenAI-Compatible
Drop-in SDK replacement
1 Line to Switch
Change your base URL
Always Warm
No cold starts, no spin-up
$0 Minimums
Pay per token, no contracts
Pricing
Pay per token. No commitments.
We route inference to idle enterprise GPUs — hardware that's already powered on and paid for. No reserved capacity markup. You only pay for what you use.
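As a concrete illustration of per-token billing, here is a sketch that prices a single response from the usage field the API returns with every completion. The per-million-token rates below are hypothetical placeholders, not Lilac's actual prices; substitute the real rates from the pricing table.

from openai import OpenAI

client = OpenAI(base_url="https://api.getlilac.com/v1", api_key="sk_...")
response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Illustrative rates only; substitute the real per-million-token prices.
INPUT_PER_M, OUTPUT_PER_M = 0.50, 1.50  # hypothetical $/1M tokens

u = response.usage  # token counts reported by the API
cost = u.prompt_tokens * INPUT_PER_M / 1e6 + u.completion_tokens * OUTPUT_PER_M / 1e6
print(f"{u.prompt_tokens} in, {u.completion_tokens} out -> ${cost:.6f}")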
More models coming soon. Get in touch to request specific models.
Popular searches
Start with the exact page you were searching for.
Cheap inference API
Low-cost shared endpoints for Kimi K2.5 with benchmark context and pricing.
Serverless inference API
Managed API usage without the usual cold-start tradeoff.
Kimi K2.5 API
Direct Kimi K2.5 endpoint, pricing snapshot, and OpenAI code sample.
OpenRouter alternative
Direct endpoint comparison for teams focused on Kimi K2.5 cost and simplicity.
How it works
Three steps. No infrastructure.
Swap your base URL
Point any OpenAI SDK to api.getlilac.com. One line of config — your existing code just works.
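Concretely, the switch is one keyword argument on the client constructor:

from openai import OpenAI

# Before: OpenAI(api_key="sk_...") defaults to api.openai.com.
# After: add one line and everything else stays the same.
client = OpenAI(
    api_key="sk_...",
    base_url="https://api.getlilac.com/v1",
)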
Pick a model
Kimi K2.5 available now on shared endpoints. More models added regularly.
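Rather than hard-coding model names, you can discover what's currently live by listing models; a sketch, assuming Lilac serves the standard OpenAI-compatible /v1/models route:

from openai import OpenAI

client = OpenAI(base_url="https://api.getlilac.com/v1", api_key="sk_...")

# Each entry's .id is the string you pass as `model=` when creating
# a chat completion.
for m in client.models.list():
    print(m.id)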
Run inference
Requests route to the nearest idle GPU. Same output quality, significantly lower price.
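Because endpoints stay warm, the first token should arrive without a spin-up wait. A streaming sketch, assuming the endpoint supports the OpenAI SDK's standard stream=True flag:

from openai import OpenAI

client = OpenAI(base_url="https://api.getlilac.com/v1", api_key="sk_...")

# Print tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)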
The economics
Why we're cheaper.
Most GPU clusters run at 30–50% utilization. Lilac routes your inference to that idle capacity — hardware that's already powered on and paid for.
Start running inference in minutes.
No contracts, no commitments. Swap your base URL and pay less for the same output quality.