Cheap inference API

Visible token pricing, no contracts, no minimums. OpenAI-compatible.

Get Started

Open-weight models on shared warm endpoints, priced per token. See live pricing below.

Model pricing

Pay per token. No commitments.

Competitive speed vs. OpenRouter-listed providers, at the lowest price in the benchmark snapshot.

Model          Status    Precision  Context  Input    Cache    Output   Latency (TTFT)
Kimi K2.6      Live now  INT4       262K     $0.70/M  $0.20/M  $3.50/M  0.45s
GLM 5.1        Live now  FP8        203K     $0.90/M  $0.27/M  $3.00/M  0.58s
Gemma 4 (31B)  Live now  BF16       262K     $0.11/M  -        $0.35/M  0.72s

OpenAI-compatible · Shared warm endpoints · No contracts · No minimums

25% off all tokens above 1B/month for 3 months. Volume pricing applies automatically above the threshold.
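The per-token arithmetic can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the "Cache" price is assumed to be the rate for cached input tokens, and the volume discount is assumed to mean that only tokens beyond 1B in a month are billed at 75% of list price. The page does not spell out either rule, so check the actual billing terms before relying on these numbers.

```python
# USD per million tokens, copied from the pricing table on this page.
# "cache" is assumed to be the rate for cached input tokens (None = not listed).
PRICES = {
    "Kimi K2.6": {"input": 0.70, "cache": 0.20, "output": 3.50},
    "GLM 5.1": {"input": 0.90, "cache": 0.27, "output": 3.00},
    "Gemma 4 (31B)": {"input": 0.11, "cache": None, "output": 0.35},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """List-price cost in USD; cached_tokens is the cached share of the input."""
    price = PRICES[model]
    # Fall back to the normal input rate when no cache rate is listed.
    cache_rate = price["cache"] if price["cache"] is not None else price["input"]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * cache_rate
        + output_tokens * price["output"]
    )
    return total / 1_000_000


def volume_discounted_cost(tokens, price_per_million,
                           threshold=1_000_000_000, discount=0.25):
    """Monthly cost in USD, assuming only tokens above the threshold are discounted."""
    base_tokens = min(tokens, threshold)
    extra_tokens = max(tokens - threshold, 0)
    billable = base_tokens + extra_tokens * (1 - discount)
    return billable * price_per_million / 1_000_000
```

For example, one million input tokens plus one million output tokens on Kimi K2.6 comes to $0.70 + $3.50 = $4.20 at list price, which is what `request_cost("Kimi K2.6", 1_000_000, 1_000_000)` computes.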

More models are coming soon and will be added as they go live.

Integration

One base URL change.

Keep the OpenAI SDK and point it at Lilac. Your existing code just works.

inference.py

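from openai import OpenAI

# Starting point: the stock OpenAI endpoint and model.
# To switch to Lilac, change base_url to the Lilac endpoint and
# model to a Lilac-hosted model ID.
client = OpenAI(
    base_url="https://api.openai.com/v1",
    api_key="sk_...",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Same code. Same SDK. Fraction of the price.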

OpenAI-compatible — switching is a base URL change.

Shared endpoints stay warm. No cold starts.

No contracts or minimums. Start immediately.
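One way to make the base URL swap a configuration change rather than a code change is to read the endpoint from the environment. The sketch below assumes three hypothetical environment variable names (`LILAC_BASE_URL`, `LILAC_API_KEY`, `LILAC_MODEL`). This page does not give the actual Lilac endpoint or model IDs, so those values have to come from your account setup.

```python
import os


def lilac_client_kwargs(env=os.environ):
    """Build OpenAI SDK constructor arguments from environment variables.

    LILAC_BASE_URL and LILAC_API_KEY are hypothetical names chosen for
    this sketch; they are not defined by Lilac or by the OpenAI SDK.
    """
    base_url = env.get("LILAC_BASE_URL")
    api_key = env.get("LILAC_API_KEY")
    if not base_url or not api_key:
        raise RuntimeError("Set LILAC_BASE_URL and LILAC_API_KEY first")
    return {"base_url": base_url, "api_key": api_key}


def say_hello():
    """Send one chat completion through the configured endpoint."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(**lilac_client_kwargs())
    response = client.chat.completions.create(
        model=os.environ["LILAC_MODEL"],  # hypothetical variable name
        messages=[{"role": "user", "content": "Hello!"}],
    )
    return response.choices[0].message.content
```

With the three variables exported, calling `say_hello()` sends the same request as inference.py, and switching providers only means changing the environment.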

Read the benchmark snapshot

Frequently asked questions

What makes Lilac cheap?

We route inference to idle enterprise GPUs — hardware already powered on and paid for.

Does cheap mean slower?

No. In our benchmarks, speed is competitive with OpenRouter-listed providers at the same or a lower price.
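To check latency against your own workload, time to first token (TTFT) can be measured from the client side. The helper below is a rough sketch, not the method behind the benchmark snapshot: `measure_ttft` is a hypothetical name, and it times the arrival of the first streamed chunk, which for some servers is a role-only chunk rather than the first content token.

```python
import time


def measure_ttft(open_stream, clock=time.perf_counter):
    """Return (seconds until the first chunk, the first chunk).

    open_stream is a zero-argument callable that issues the request and
    returns an iterable of streamed chunks. The clock starts before the
    request is sent so network and queueing time are included.
    """
    start = clock()
    for chunk in open_stream():
        return clock() - start, chunk
    raise ValueError("stream ended without producing a chunk")
```

With the OpenAI SDK, `open_stream` could be something like `lambda: client.chat.completions.create(model=..., messages=..., stream=True)`. Repeating the measurement several times and looking at the median gives a steadier number than a single request.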

Start running inference in minutes.

No contracts, no commitments. Swap your base URL and pay less for the same output quality.
