GPU inference

    Affordable inference, same output quality. Powered by idle enterprise GPUs.

    Backed by Y Combinator (S25)
    inference.py

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.getlilac.com/v1",  # the swap: point the SDK at Lilac
        api_key="sk_...",
    )

    response = client.chat.completions.create(
        model="kimi-k2-5",
        messages=[{"role": "user", "content": "Hello!"}],
    )

    print(response.choices[0].message.content)

    # Same code. Same SDK. Fraction of the price.

    Pricing

    Pay per token. No commitments.

    We route inference to idle enterprise GPUs — hardware that's already powered on and paid for. No reserved capacity markup. You only pay for what you use.

    Model        Input (per 1M tokens)   Output (per 1M tokens)   Time to first token
    Kimi K2.5    $0.40                   $2.00                    0.38 s

    More models coming soon. Get in touch to request specific models.
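    To put the per-token rates in concrete terms, here is a quick back-of-the-envelope calculation at the Kimi K2.5 prices above. The token counts are made-up illustrative numbers, not a benchmark or a typical workload.

    # Illustrative cost math at Kimi K2.5 rates ($0.40/M input, $2.00/M output).
    INPUT_PRICE_PER_TOKEN = 0.40 / 1_000_000
    OUTPUT_PRICE_PER_TOKEN = 2.00 / 1_000_000

    # Hypothetical month: 10M input tokens, 2M output tokens.
    input_tokens = 10_000_000
    output_tokens = 2_000_000

    cost = input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN
    print(f"${cost:.2f}")  # $8.00 = $4.00 input + $4.00 output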

    How it works

    Three steps. No infrastructure.

    01

    Swap your base URL

    Point any OpenAI SDK at api.getlilac.com. One line of config and your existing code just works; a runnable sketch follows these steps.

    02

    Pick a model

    Kimi K2.5 available now on shared endpoints. More models added regularly.

    03

    Run inference

    Requests route to the nearest idle GPU. Same output quality, significantly lower price.
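
    As a concrete version of these three steps, here is a minimal sketch using the standard OpenAI Python SDK: swap the base URL, pick the model, stream a reply, and time the first token. The API key is a placeholder, the /v1 path is assumed to mirror the OpenAI convention, and the TTFT printed here is a rough client-side measurement that includes network latency.

    import time

    from openai import OpenAI

    # Step 01: swap the base URL (key below is a placeholder).
    client = OpenAI(
        base_url="https://api.getlilac.com/v1",
        api_key="sk_...",
    )

    # Steps 02 and 03: pick a model and run inference, streaming the reply.
    start = time.monotonic()
    stream = client.chat.completions.create(
        model="kimi-k2-5",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )

    first_token_at = None
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.monotonic()  # rough client-side TTFT
            print(chunk.choices[0].delta.content, end="", flush=True)

    if first_token_at is not None:
        print(f"\nTTFT: {first_token_at - start:.2f}s")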

    The economics

    Why we're cheaper.

    Most GPU clusters run at only 30–50% utilization. Lilac routes your inference to that idle capacity, hardware that's already powered on and paid for.

    Idle GPUs are already powered on, so there's no cold-start overhead.
    Providers set competitive rates to monetize spare capacity.
    Lilac routes to the best available GPU automatically.
    You only pay per token. No reserved instances, no commitments.

    Start running inference in minutes.

    No contracts, no commitments. Swap your base URL and pay less for the same output quality.

    contact@getlilac.com

    No commitment required.