About RunAnywhere
We left big tech to write custom kernels by hand.
We build inference engines and ship production infrastructure for on-device AI. Custom kernels, cross-platform SDKs, and fleet observability - from research to production. Backed by Y Combinator.
[Live metrics: LLM decode (tok/s) · STT latency (ms) · time to first audio (ms)]
Our approach
Research that ships.
Research is product
Every benchmark we publish runs on the same engine that powers production applications. MetalRT isn't a prototype - it's a C++ inference runtime that developers use today.
Hard systems problems
Memory-efficient model execution, co-scheduled multi-modal pipelines, and hardware-specific optimization for Apple Silicon's Metal GPU architecture.
Open by default
When we claim a speed record, we show the numbers, the methodology, and the hardware configuration so others can reproduce and build on our work.
// qmv.metal - 4-bit quantized matvec (decode path, simplified)
kernel void qmv_decode(
    device const uint8_t* W [[buffer(0)]],  // packed weights, two 4-bit values per byte
    device const half*    x [[buffer(1)]],  // input activations
    device half*          y [[buffer(2)]],  // output row
    constant uint&        K [[buffer(3)]],  // columns per row
    uint tid [[thread_position_in_grid]]    // one thread per output row
) {
    half acc = 0.0h;
    for (uint k = 0; k < K; k += 2) {
        const uint8_t w = W[(tid * K + k) / 2];
        // nibble unpack: two 4-bit weights per byte, centered at 8
        acc += half(int(w & 0xF) - 8) * x[k];
        acc += half(int(w >> 4) - 8) * x[k + 1];
    }
    y[tid] = acc;
}
Founders
Built by engineers who ship at scale.
Sanchit built mobile SDKs used by 50M+ users at Intuit - and watched every team struggle with the same problem: cloud AI was too slow, too expensive, and too fragile for real products. He left to fix it. At RunAnywhere, he leads the SDK and product layer - making on-device AI deployment feel as simple as a cloud API call, across Swift, Kotlin, React Native, and Flutter.
Shubham spent years building infrastructure at Microsoft Azure and AWS - systems that needed to be fast, reliable, and globally distributed. He applies that same discipline to MetalRT, writing every GPU kernel by hand in Metal Shading Language to squeeze maximum performance from Apple Silicon. He leads the inference engine research, the fleet-scale control plane, and the systems architecture behind RunAnywhere.

