Cheap inference API
Visible token pricing, no contracts, no minimums. OpenAI-compatible.
Kimi K2.5 is live at $0.40/M input and $2.00/M output. More models coming soon.
Model pricing
Pay per token. No commitments.
~28 tok/s per user at 100 concurrent requests on Kimi K2.5 (March 2026).
25% off all tokens above 1B/month for 3 months. That is $0.30/M input and $1.50/M output above the threshold.
More models are coming soon and will be added as they go live.
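As a rough sketch of how the prices above add up, the snippet below estimates a monthly bill from token counts. The rates and the 25% discount come from the pricing above. The helper name and the way the 1B threshold is applied are assumptions: it counts input and output tokens together toward the threshold and assumes the discounted overage has the same input/output mix as the whole month, which may not match how Lilac actually bills.

```python
INPUT_USD_PER_M = 0.40             # Kimi K2.5 input price per million tokens
OUTPUT_USD_PER_M = 2.00            # Kimi K2.5 output price per million tokens
DISCOUNT = 0.25                    # 25% off tokens above the threshold
THRESHOLD_TOKENS = 1_000_000_000   # 1B tokens per month


def estimate_monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a monthly bill in USD (hypothetical helper, not a Lilac API).

    Assumption: the 1B threshold counts input + output tokens together, and
    the discounted overage has the same input/output mix as the whole month.
    """
    full_price = (input_tokens * INPUT_USD_PER_M
                  + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    total = input_tokens + output_tokens
    if total <= THRESHOLD_TOKENS:
        return full_price
    overage_share = (total - THRESHOLD_TOKENS) / total
    return full_price * (1 - DISCOUNT * overage_share)


# 10M input + 2M output tokens, below the threshold
print(f"${estimate_monthly_cost(10_000_000, 2_000_000):.2f}")       # $8.00
# 1.6B input + 0.4B output tokens, half of which is above the threshold
print(f"${estimate_monthly_cost(1_600_000_000, 400_000_000):.2f}")  # $1260.00
```

In the second case the full-price total is $1,440, and half of the tokens get 25% off, so the estimate drops by 12.5%.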
Integration
One base URL change.
Keep the OpenAI SDK and point it at Lilac with your Lilac API key and a Lilac model name. The rest of your existing code stays the same.
from openai import OpenAI

client = OpenAI(
    # Placeholder: use Lilac's base URL here, not api.openai.com.
    base_url="https://<lilac-api-host>/v1",
    api_key="sk_...",
)

response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
# Same code. Same SDK. Fraction of the price.
OpenAI-compatible — switching is a base URL change.
Shared endpoints stay warm. No cold starts.
No contracts or minimums. Start immediately.
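One way to keep the switch to a single configuration change is to read the base URL and key from the environment instead of hard-coding them. The sketch below assumes two hypothetical environment variable names, LILAC_BASE_URL and LILAC_API_KEY, and two hypothetical helpers, client_settings and make_client. None of these are defined by Lilac or the OpenAI SDK. Only base_url and api_key are real parameters of the OpenAI client constructor.

```python
import os
from typing import Mapping


def client_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for OpenAI(...) from the environment.

    LILAC_BASE_URL and LILAC_API_KEY are hypothetical variable names
    chosen for this sketch.
    """
    missing = [name for name in ("LILAC_BASE_URL", "LILAC_API_KEY")
               if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"base_url": env["LILAC_BASE_URL"], "api_key": env["LILAC_API_KEY"]}


def make_client(env: Mapping[str, str] = os.environ):
    # Imported here so client_settings can be used without the SDK installed.
    from openai import OpenAI
    return OpenAI(**client_settings(env))
```

With both variables set, make_client() returns a client that can be used like any other OpenAI SDK client, for example make_client().chat.completions.create(...). Switching providers then means changing two environment values, with no code edits.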
Frequently asked questions
What makes Lilac cheap?
We route inference to idle enterprise GPUs — hardware already powered on and paid for.
Does cheap mean slower?
No. Our speed benchmarks are competitive with OpenRouter-listed providers at the same price point or lower. On Kimi K2.5, that is ~28 tok/s per user at 100 concurrent requests (March 2026).
Start running inference in minutes.
No contracts, no commitments. Swap your base URL and pay less for the same output quality.