Backed by Y Combinator

About RunAnywhere

We left big tech to write custom kernels by hand.

We build inference engines and ship production infrastructure for on-device AI. Custom kernels, cross-platform SDKs, and fleet observability - from research to production. Backed by Y Combinator.

[Live stats: LLM decode (tok/s) · STT latency (ms) · first-audio (ms)]

Our approach

Research that ships.

01

Research is product

Every benchmark we publish runs on the same engine that powers production applications. MetalRT isn't a prototype - it's a C++ inference runtime that developers use today.

02

Hard systems problems

Memory-efficient model execution, co-scheduled multi-modal pipelines, and hardware-specific optimization for Apple Silicon's Metal GPU architecture.

03

Open by default

When we claim a speed record, we show the numbers, the methodology, and the hardware configuration so others can reproduce and build on our work.

qmv.metal

// qmv.metal - 4-bit quantized matmul (decode path)
kernel void qmv_decode(
    device const uint8_t* W [[buffer(0)]],
    device const half*    x [[buffer(1)]],
    device half*          y [[buffer(2)]],
    uint tid [[thread_position_in_grid]]
) {
    half4 acc = half4(0.0h);
    // nibble pre-scaling trick (unpack_nibbles helper elided)
    acc += unpack_nibbles(W, tid) * ((device const half4*)x)[tid];
    half4 s = simd_sum(acc);        // SIMD-group reduction, per component
    y[tid] = s.x + s.y + s.z + s.w;
}

Founders

Built by engineers who ship at scale.

Sanchit Monga

Co-Founder & CEO

Sanchit built mobile SDKs used by 50M+ users at Intuit - and watched every team struggle with the same problem: cloud AI was too slow, too expensive, and too fragile for real products. He left to fix it. At RunAnywhere, he leads the SDK and product layer - making on-device AI deployment feel as simple as a cloud API call, across Swift, Kotlin, React Native, and Flutter.

Ex-Intuit · YC W26 · 50M+ SDK Users

Shubham Malhotra

Co-Founder & CTO

Shubham spent years building infrastructure at Microsoft Azure and AWS - systems that needed to be fast, reliable, and globally distributed. He applies that same discipline to MetalRT, writing every GPU kernel by hand in Metal Shading Language to squeeze maximum performance from Apple Silicon. He leads the inference engine research, the fleet-scale control plane, and the systems architecture behind RunAnywhere.

Ex-Microsoft · Ex-AWS · YC W26
RunAnywhere

On-device AI inference research and infrastructure. Building the fastest engines for the hardware you already own.

© 2026 RunAnywhere, Inc.