Inference Radar · 2026-W20 · May 14–20, 2026 · 16 min read

llama.cpp Shoves MTP Into the Mainstream

The inference stack is no longer waiting for a single blockbuster model drop to move forward. This week, cloud servers, local runtimes, Apple Silicon stacks, and edge frameworks all converged on the same practical bottlenecks: speculative decoding, KV-cache pressure, low-bit kernels, and production security.

3,702 commits
3,109 PRs
1,340 issues
95 releases
76 active repos
[Chart: Weekly activity by organization]

© 2026 RunAnywhere, Inc.