About RunAnywhere
We left big tech to write custom kernels by hand.
We build inference engines and ship production infrastructure for on-device AI. Custom kernels, cross-platform SDKs, and fleet observability - from research to production. Backed by Y Combinator.
[Live metrics: LLM decode (tok/s) · STT latency (ms) · time to first audio (ms)]
Our approach
Research that ships.
Research is product
Every benchmark we publish runs on the same engine that powers production applications. MetalRT isn't a prototype - it's a C++ inference runtime that developers use today.
Hard systems problems
Memory-efficient model execution, co-scheduled multi-modal pipelines, and hardware-specific optimization for Apple Silicon's Metal GPU architecture.
Open by default
When we claim a speed record, we show the numbers, the methodology, and the hardware configuration so others can reproduce and build on our work.
// qmv.metal - 4-bit quantized matvec (decode path, simplified)
kernel void qmv_decode(
    device const uint8_t* W [[buffer(0)]],  // packed weights, two 4-bit values per byte
    device const half*    x [[buffer(1)]],  // input activations
    device half*          y [[buffer(2)]],  // output row
    constant uint&        K [[buffer(3)]],  // columns per row
    uint tid [[thread_position_in_grid]]    // one thread per output row
) {
    half acc = 0.0h;
    for (uint k = 0; k < K; k += 2) {
        const uint8_t w = W[(tid * K + k) / 2];
        // nibble unpack: two 4-bit weights per byte, centered at 8
        acc += half(int(w & 0xF) - 8) * x[k];
        acc += half(int(w >> 4) - 8) * x[k + 1];
    }
    y[tid] = acc;
}
Founders
Built by engineers who ship at scale.
Sanchit built mobile SDKs used by 50M+ users at Intuit - and watched every team struggle with the same problem: cloud AI was too slow, too expensive, and too fragile for real products. He left to fix it. At RunAnywhere, he leads the SDK and product layer - making on-device AI deployment feel as simple as a cloud API call, across Swift, Kotlin, React Native, and Flutter.
Shubham spent years building infrastructure at Microsoft Azure and AWS - systems that needed to be fast, reliable, and globally distributed. He applies that same discipline to MetalRT, writing every GPU kernel by hand in Metal Shading Language to squeeze maximum performance from Apple Silicon. He leads the inference engine research, the fleet-scale control plane, and the systems architecture behind RunAnywhere.

