Backed by Y Combinator

QHexRT launched: inference engine for Qualcomm NPUs

Weekly briefing · Inference Radar

The state of open-source inference, every week.

Automated, citation-backed briefings on the repositories that actually move the AI inference stack — vLLM, llama.cpp, MLX, TensorRT-LLM, and 130+ more. Produced by Inference Radar, our research arm for tracking the open-source inference ecosystem.

Cadence: Weekly
Coverage: 130+ repos
Issues: 11
Cost: Free

Latest issue

Latest · 2026-W23 · Jun 4 – Jun 10, 2026 · 16 min read

Gemma 4 Exposes Inference’s Memory Wall

“This week made the inference stack feel like one continuous memory hierarchy: cloud servers, local runtimes, browser engines, mobile frameworks, and edge NPUs are all fighting the same bottlenecks. The center of gravity moved from ‘can this model run?’ to ‘can this model stream, cache, quantize, route, and recover under real workloads?’”

80 active repos

Archive

Every issue we've published.

2026-W22 · 18 min read

vLLM Ignites the KV-Cache War

3,448 commits · 3,283 PRs · 139 releases

2026-W21 · 18 min read

Qwen3.7-Max Forces Runtimes Into Session Mode

3,164 commits · 2,779 PRs · 94 releases

2026-W20 · 16 min read

llama.cpp Shoves MTP Into the Mainstream

3,702 commits · 3,109 PRs · 95 releases

2026-W19 · 19 min read

DeepSeek V4 Drags Every Runtime

3,961 commits · 3,190 PRs · 99 releases

2026-W18 · 27 min read

Google Bets LiteRT-LM Owns Edge LLMs

5,247 commits · 4,147 PRs · 150 releases

2026-W17 · 19 min read

DeepSeek V4 Sets Off a Stackwide Sprint

4,069 commits · 3,134 PRs · 118 releases

2026-W16 · 20 min read

Inference Layers Collapse Into One

3,741 commits · 2,899 PRs · 107 releases

2026-W15 · 20 min read

Local Runtimes Turn Into Serving Platforms

3,731 commits · 2,941 PRs · 114 releases

2026-W14 · 17 min read

Gemma 4 Ignites the KV-Cache Wars

1,816 commits · 1,330 PRs · 101 releases

2026-W13 · 18 min read

KV Cache Wars Go Local

1,956 commits · 1,714 PRs · 92 releases

Powered by RunAnywhere

The signal, not the noise.

A weekly briefing for engineers who ship inference infrastructure for a living. Every link is cited. Every claim is grounded in code.

RunAnywhere Labs

A research-first inference lab. We hand-write the kernels that make consumer silicon fast — and open-source the SDKs and infrastructure that run them on every platform.

© 2026 RunAnywhere, Inc.