Serverless inference API
Serverless inference, no cold starts.
Shared endpoints backed by idle enterprise GPUs. Warm capacity and token pricing — no dedicated GPUs to manage.
OpenAI-compatible serverless API. No contracts, token-based pricing.
Kimi K2.5 pricing
Pay per token. No commitments.
~28 tok/s per user, 0.38s TTFT on Kimi K2.5 shared endpoints (March 2026).
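Back-of-the-envelope response time from these figures is roughly TTFT plus output tokens divided by per-user throughput. A quick sketch (the 28 tok/s and 0.38s values are the shared-endpoint numbers above; real latency varies with load and prompt length):

```python
def latency_estimate(output_tokens: int, ttft: float = 0.38, tok_per_s: float = 28.0) -> float:
    """Approximate seconds to stream a full response on a shared endpoint:
    time-to-first-token plus generation time at the per-user rate."""
    return ttft + output_tokens / tok_per_s

# A 500-token response: 0.38 + 500/28, about 18.2s end-to-end,
# with the first token visible after ~0.38s.
print(round(latency_estimate(500), 1))
```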
Input
$0.40
per million tokens
Output
$2.00
per million tokens
25% off all tokens above 1B/month for 3 months. That is $0.30/M input and $1.50/M output above the threshold.
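To make the tier math concrete, here is a sketch of the bill for a month, using the rates above. One assumption to flag: this treats the 1B/month threshold as applying to each token type separately; if billing counts input and output together, the crossover point differs.

```python
def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly cost in USD under the tiered pricing above.

    Base rates: $0.40/M input, $2.00/M output. Tokens above 1B/month
    are billed at 25% off ($0.30/M input, $1.50/M output) during the promo.
    Assumes the 1B threshold applies per token type, which may not match
    actual billing.
    """
    TIER = 1_000_000_000  # 1B tokens/month

    def tiered(tokens: int, base_rate: float, discounted_rate: float) -> float:
        below = min(tokens, TIER)
        above = max(tokens - TIER, 0)
        return (below * base_rate + above * discounted_rate) / 1_000_000

    return tiered(input_tokens, 0.40, 0.30) + tiered(output_tokens, 2.00, 1.50)

# 2B input + 500M output in a month:
# first 1B input at $0.40/M = $400, next 1B at $0.30/M = $300,
# 500M output at $2.00/M = $1,000  ->  $1,700 total
print(monthly_cost(2_000_000_000, 500_000_000))
```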
Integration
One base URL change.
Keep the OpenAI SDK and point it at Lilac. Your existing code just works.
from openai import OpenAI

client = OpenAI(
    # Placeholder base URL — replace with the one from your Lilac
    # dashboard. Pointing at api.openai.com would keep routing
    # requests to OpenAI instead of Lilac.
    base_url="https://api.lilac.example/v1",
    api_key="sk_...",  # your Lilac API key
)
response = client.chat.completions.create(
    model="kimi-k2-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Same code. Same SDK. Fraction of the price.
Nothing to provision, autoscale, or keep warm.
Endpoints are already running — no container spin-up.
Switch from OpenAI by updating one base URL.
Frequently asked questions
Is this really serverless?
Yes. No infrastructure to provision or maintain — just an API call.
How do you avoid cold starts?
Traffic routes to already-running shared capacity, not freshly spun containers.
Start running inference in minutes.
No contracts, no commitments. Swap your base URL and pay less for the same output quality.