GPU inference

    Affordable inference, same output quality. Powered by idle enterprise GPUs.

    Backed by Y Combinator (S25)
    inference.py

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.getlilac.com/v1",  # the swap: point the SDK at Lilac
        api_key="sk_...",
    )

    response = client.chat.completions.create(
        model="kimi-k2-5",
        messages=[{"role": "user", "content": "Hello!"}],
    )

    print(response.choices[0].message.content)

    # Same code. Same SDK. Fraction of the price.

    Pricing

    Pay per token. No commitments.

    We route inference to idle enterprise GPUs — hardware that's already powered on and paid for. No reserved capacity markup. You only pay for what you use.

    Model        Input (per 1M tokens)   Output (per 1M tokens)   Time to first token
    Kimi K2.5    $0.40                   $2.00                    0.38 s

    More models coming soon. Get in touch to request specific models.
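    To put the per-token rates in concrete terms, here is a quick back-of-the-envelope calculation at the Kimi K2.5 prices above. The token counts are made-up illustrative numbers, not a benchmark or a typical workload.

    # Illustrative cost math at Kimi K2.5 rates ($0.40/M input, $2.00/M output).
    INPUT_PRICE_PER_TOKEN = 0.40 / 1_000_000
    OUTPUT_PRICE_PER_TOKEN = 2.00 / 1_000_000

    # Hypothetical month: 10M input tokens, 2M output tokens.
    input_tokens = 10_000_000
    output_tokens = 2_000_000

    cost = input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN
    print(f"${cost:.2f}")  # $8.00 = $4.00 input + $4.00 output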

    How it works

    Three steps. No infrastructure.

    01

    Swap your base URL

    Point any OpenAI SDK at api.getlilac.com. One line of config and your existing code just works; a runnable sketch follows these steps.

    02

    Pick a model

    Kimi K2.5 available now on shared endpoints. More models added regularly.

    03

    Run inference

    Requests route to the nearest idle GPU. Same output quality, significantly lower price.
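
    As a concrete version of these three steps, here is a minimal sketch using the standard OpenAI Python SDK: swap the base URL, pick the model, stream a reply, and time the first token. The API key is a placeholder, the /v1 path is assumed to mirror the OpenAI convention, and the TTFT printed here is a rough client-side measurement that includes network latency.

    import time

    from openai import OpenAI

    # Step 01: swap the base URL (key below is a placeholder).
    client = OpenAI(
        base_url="https://api.getlilac.com/v1",
        api_key="sk_...",
    )

    # Steps 02 and 03: pick a model and run inference, streaming the reply.
    start = time.monotonic()
    stream = client.chat.completions.create(
        model="kimi-k2-5",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )

    first_token_at = None
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.monotonic()  # rough client-side TTFT
            print(chunk.choices[0].delta.content, end="", flush=True)

    if first_token_at is not None:
        print(f"\nTTFT: {first_token_at - start:.2f}s")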

    The economics

    Why we're cheaper.

    Most GPU clusters run at only 30–50% utilization. Lilac routes your inference to that idle capacity, hardware that's already powered on and paid for.

    Idle GPUs are already powered on, so there's no cold-start overhead.
    Providers set competitive rates to monetize spare capacity.
    Lilac routes to the best available GPU automatically.
    You only pay per token. No reserved instances, no commitments.

    Start running inference in minutes.

    No contracts, no commitments. Swap your base URL and pay less for the same output quality.

    contact@getlilac.com

    No commitment required.