Blog

Updates, thinking, and technical deep-dives from the Lilac team.

May 1, 2026

Kimi K2.6 is live on Lilac

Kimi K2.6 is now available on Lilac with OpenAI-compatible chat completions, 262K context, cache-read pricing, and no commitments.

April 27, 2026

Cache read pricing is now live on Lilac

Supported Lilac models now show lower cache read rates for repeated context, making long-context and agent workloads cheaper to run.

April 8, 2026

Lilac is now self-serve — plus GLM 5.1 and Gemma 4 are live

No more waitlist. Sign up, grab an API key, and start running inference. GLM 5.1 is live at $0.90/M input, and Gemma 4 is live at $0.11/M input.

April 8, 2026

GLM 5.1 Inference Benchmark

We benchmarked our GLM 5.1 endpoint against every GLM 5.1 provider listed on OpenRouter. The result: competitive throughput at the lowest per-token price in the comparison.

March 25, 2026

How Idle GPUs Make Cheap Inference Possible

Lilac serves Kimi K2.6 inference on idle enterprise GPUs with OpenAI-compatible, pay-per-token shared endpoints.

March 23, 2026

GPU Inference API Pricing Compared

A direct comparison of GPU inference API pricing across major providers, and a look at how idle GPU economics enable Lilac to offer lower per-token rates.

March 16, 2026

The GPU Scarcity Paradox

The GPU shortage isn't what you think. The industry doesn't have a supply problem — it has a utilization problem masquerading as one.

March 1, 2025

Introducing Lilac: Turn Idle GPU Capacity into Revenue

Most Kubernetes clusters run GPUs at 30-50% utilization. We built a single operator to change that.