Serverless inference API

    Serverless inference, no cold starts.

    Shared endpoints backed by idle enterprise GPUs. Warm capacity and token pricing — no dedicated GPUs to manage.

    OpenAI-compatible serverless API. No contracts, token-based pricing.

    Kimi K2.5 pricing

    Pay per token. No commitments.

    ~28 tok/s per user, 0.38s TTFT on Kimi K2.5 shared endpoints (March 2026).
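Those two numbers imply a rough latency model: total response time is TTFT plus output tokens divided by per-user throughput. A minimal sketch (the 0.38s TTFT and 28 tok/s figures are the March 2026 measurements above; real latency varies with load and prompt length):

```python
def response_latency(output_tokens: int, ttft: float = 0.38, tok_per_s: float = 28.0) -> float:
    """Rough end-to-end latency estimate for a shared Kimi K2.5 endpoint."""
    return ttft + output_tokens / tok_per_s

# A 500-token reply takes roughly 18 seconds at these rates.
print(f"{response_latency(500):.1f}s")
```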

    Input

    $0.40

    per million tokens

    Output

    $2.00

    per million tokens

    OpenAI-compatible · Shared warm endpoints · No contracts · No minimums

    25% off all tokens above 1B/month for 3 months. That is $0.30/M input and $1.50/M output above the threshold.
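The discount math above can be sketched as a small cost estimator. This assumes the 1B/month threshold applies to combined input and output tokens and that the discount is pro-rated across both, which the copy above does not spell out — check the actual billing terms:

```python
BASE_INPUT, BASE_OUTPUT = 0.40, 2.00   # $/M tokens, standard rate
DISC_INPUT, DISC_OUTPUT = 0.30, 1.50   # $/M tokens, 25% off above threshold
THRESHOLD = 1_000_000_000              # 1B tokens/month

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly bill in USD. The 25% discount is applied to the
    fraction of traffic above the 1B-token threshold, pro-rata across
    input and output (an assumption, not confirmed billing behavior)."""
    total = input_tokens + output_tokens
    over = max(0, total - THRESHOLD)
    disc_frac = over / total if total else 0.0
    cost = (input_tokens * ((1 - disc_frac) * BASE_INPUT + disc_frac * DISC_INPUT)
            + output_tokens * ((1 - disc_frac) * BASE_OUTPUT + disc_frac * DISC_OUTPUT))
    return cost / 1_000_000

# 1M input + 500k output, well under the threshold: $0.40 + $1.00 = $1.40
print(f"${monthly_cost(1_000_000, 500_000):.2f}")
```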

    Integration

    One base URL change.

    Keep the OpenAI SDK and point it at Lilac. Your existing code just works.

    inference.py

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.getlilac.com/v1",  # Lilac base URL (illustrative), not api.openai.com
        api_key="sk_...",
    )

    response = client.chat.completions.create(
        model="kimi-k2-5",
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Same code. Same SDK. Fraction of the price.

    01  Nothing to provision, autoscale, or keep warm.

    02  Endpoints are already running — no container spin-up.

    03  Switch from OpenAI by updating one base URL.

    Frequently asked questions

    Is this really serverless?

    Yes. No infrastructure to provision or maintain — just an API call.

    How do you avoid cold starts?

    Traffic routes to already-running shared capacity, not freshly spun containers.

    Start running inference in minutes.

    No contracts, no commitments. Swap your base URL and pay less for the same output quality.

    contact@getlilac.com

    No commitment required.