Backed by Y Combinator

New Research

Every claim comes with numbers.

We publish our benchmark results and engineering deep-dives openly. On-device inference: fast, private, and hardware-native.

Benchmarks · Apple M4 Max

LLM Decode (higher is better)
RunAnywhere: 658 tok/s
Apple MLX: 553 tok/s
llama.cpp: 394 tok/s

Time to First Token (lower is better)
RunAnywhere: 6.6ms
Apple MLX: 8ms
llama.cpp: 11ms

Speech-to-Text (lower is better)
RunAnywhere: 101ms
Apple MLX: 465ms

Speech-to-Speech (higher is better)
RunAnywhere: 123 tok/s
mlx-audio: 81 tok/s

MetalRT

Custom kernel inference engine for Apple Silicon. Record-setting LLM, speech, vision, and speech-to-speech performance.
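
As a rough illustration of what the decode numbers above measure, here is a minimal Swift sketch of steady-state throughput timing. `DecodeEngine` and its `decode` method are hypothetical stand-ins, not MetalRT's published API, and whether RunAnywhere's harness excludes time-to-first-token exactly this way is an assumption.

```swift
import Foundation

// Hypothetical stand-in for an on-device decode engine; MetalRT's real
// API is not shown on this page, so these names are illustrative only.
protocol DecodeEngine {
    // Streams generated tokens one at a time.
    func decode(prompt: String) -> AsyncStream<String>
}

// One common way to compute decode tok/s: count tokens emitted after the
// first one and divide by the wall-clock time since that first token, so
// time-to-first-token (reported separately above) is excluded.
func decodeTokensPerSecond(_ engine: DecodeEngine, prompt: String) async -> Double {
    let clock = ContinuousClock()
    var firstTokenAt: ContinuousClock.Instant?
    var tokens = 0
    for await _ in engine.decode(prompt: prompt) {
        if firstTokenAt == nil { firstTokenAt = clock.now }
        tokens += 1
    }
    guard let start = firstTokenAt, tokens > 1 else { return 0 }
    let elapsed = clock.now - start
    let seconds = Double(elapsed.components.seconds)
        + Double(elapsed.components.attoseconds) / 1e18
    return Double(tokens - 1) / seconds
}
```

The sketch also shows why decode throughput and time-to-first-token are reported as separate metrics: one captures steady-state token emission, the other captures prefill.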

MetalRT · Speech-to-Speech · Mar 15, 2026

MetalRT Now Does Speech-to-Speech. 1.52x Faster Than mlx-audio.

Read the benchmarks

123 tok/s

S2S throughput

MetalRT · Vision · Mar 13, 2026

MetalRT Now Runs Vision Language Models. Fastest on Apple Silicon.

Read the benchmarks

287 tok/s

vision decode

MetalRT · Speech · Mar 9, 2026

The First Complete AI Inference Engine for Apple Silicon. Now with Speech.

Read the benchmarks

101ms

STT latency

MetalRT · LLM · Mar 3, 2026

We Built the Fastest LLM Decode Engine for Apple Silicon.

Read the benchmarks

658 tok/s

LLM decode

FastVoice

End-to-end on-device voice AI. Co-scheduled inference for sub-100ms first-audio latency.
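
A minimal sketch of what "co-scheduled" means in practice, assuming a streaming STT → LLM → TTS pipeline in which each stage consumes its upstream's output chunk by chunk. Every type and function below is an illustrative stand-in, not FastVoice's actual interface.

```swift
import Foundation

// Stage 2: LLM decode, streaming tokens as they are produced (stubbed).
func generate(_ transcript: String) -> AsyncStream<String> {
    AsyncStream { continuation in
        for token in ["Hi", " there", "!"] { continuation.yield(token) }
        continuation.finish()
    }
}

// Stage 3: TTS that synthesizes per token chunk, so the first audio frame
// can play before the LLM has finished decoding (stubbed).
func synthesize(_ tokens: AsyncStream<String>) -> AsyncStream<[Float]> {
    AsyncStream { continuation in
        Task {
            for await _ in tokens {
                continuation.yield([Float](repeating: 0, count: 240)) // 10ms @ 24kHz
            }
            continuation.finish()
        }
    }
}

// Co-scheduling in miniature: TTS starts consuming LLM tokens immediately,
// so first-audio latency is roughly time-to-first-token plus one synthesis
// chunk, rather than full decode time plus full synthesis time.
func firstAudioLatency(transcript: String) async -> Duration {
    let clock = ContinuousClock()
    let start = clock.now
    for await _ in synthesize(generate(transcript)) {
        return clock.now - start // time to the first synthesized frame
    }
    return .zero
}
```

The same streaming pattern would apply to the STT stage feeding the LLM; how FastVoice actually schedules the three models on shared hardware is not described on this page.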

FastVoice · RAG · Feb 24, 2026

FastVoice RAG: Sub-200ms Voice AI with Retrieval-Augmented Generation, Entirely On-Device.

Read the benchmarks

<200ms

first-audio latency

FastVoice · Pipeline · Feb 22, 2026

63ms First-Audio Latency for On-Device Voice AI on Apple Silicon.

Read the benchmarks

63ms

first-audio latency

RunAnywhere

On-device AI inference research and infrastructure. Building the fastest engines for the hardware you already own.

© 2026 RunAnywhere, Inc.
