Blog

    Notes from the team

    Updates, thinking, and technical deep-dives from the Lilac team.

    May 7, 2026 · 3 min read

    We're partnering with MiniMax to bring M2.7 to Lilac

    We are partnering with MiniMax to bring commercially licensed MiniMax M2.7 access to Lilac.

    May 7, 2026 · 2 min read

    How to keep frontier open weights viable

    Why Lilac supports open-weight licensing, and why commercial rights can help more frontier models stay open.

    May 1, 2026 · 1 min read

    Kimi K2.6 is live on Lilac

    Kimi K2.6 is now available on Lilac with OpenAI-compatible chat completions, 262K context, cache-read pricing, and no commitments.

    Apr 27, 2026 · 2 min read

    Cache read pricing is now live on Lilac

    Supported Lilac models now show lower cache read rates for repeated context, making long-context and agent workloads cheaper to run.

    Apr 8, 2026 · 2 min read

    Lilac is now self-serve — plus GLM 5.1 and Gemma 4 are live

    No more waitlist. Sign up, grab an API key, and start running inference. GLM 5.1 is live at $0.90/M input, and Gemma 4 is live at $0.11/M input.

    Apr 8, 2026 · 2 min read

    GLM 5.1 Inference Benchmark

    We benchmarked our GLM 5.1 endpoint against every GLM 5.1 provider listed on OpenRouter. The result: competitive throughput at the lowest per-token price in the comparison.

    Mar 25, 2026 · 2 min read

    How Idle GPUs Make Cheap Inference Possible

    Lilac serves Kimi K2.6 inference on idle enterprise GPUs with OpenAI-compatible, pay-per-token shared endpoints.

    Mar 23, 2026 · 3 min read

    GPU Inference API Pricing Compared

    A direct comparison of GPU inference API pricing across major providers, and how idle GPU economics enable Lilac to offer lower per-token rates.

    Mar 16, 2026 · 6 min read

    The GPU Scarcity Paradox

    The GPU shortage isn't what you think. The industry doesn't have a supply problem — it has a utilization problem masquerading as one.

    Mar 1, 2025 · 2 min read

    Introducing Lilac: Turn Idle GPU Capacity into Revenue

    Most Kubernetes clusters run GPUs at 30-50% utilization. We built a single operator to change that.
