New Research
Every claim comes with numbers.
We publish our benchmark results and engineering deep-dives openly. On-device inference: fast, private, hardware-native.
Benchmarks · Apple M4 Max
LLM Decode (higher is better)
RunAnywhere: 658 tok/s
Apple MLX: 553 tok/s
llama.cpp: 394 tok/s
Time to First Token (lower is better)
RunAnywhere: 6.6 ms
Apple MLX: 8 ms
llama.cpp: 11 ms
Speech-to-Text (lower is better)
RunAnywhere: 101 ms
Apple MLX: 465 ms
Speech-to-Speech (higher is better)
RunAnywhere: 123 tok/s
mlx-audio: 81 tok/s
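The headline multiples quoted elsewhere on this page follow directly from these tables. A quick sketch of the arithmetic, using only the figures above (rounding to two decimals assumed):

```python
# Benchmark figures from the tables above (Apple M4 Max).
s2s = {"RunAnywhere": 123, "mlx-audio": 81}   # tok/s, higher is better
stt = {"RunAnywhere": 101, "Apple MLX": 465}  # ms, lower is better

# Throughput comparison: divide tok/s figures directly.
s2s_speedup = s2s["RunAnywhere"] / s2s["mlx-audio"]
print(f"S2S: {s2s_speedup:.2f}x faster than mlx-audio")  # 1.52x

# Latency comparison: the ratio is inverted, since lower is better.
stt_speedup = stt["Apple MLX"] / stt["RunAnywhere"]
print(f"STT: {stt_speedup:.2f}x lower latency than Apple MLX")
```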
MetalRT
Custom kernel inference engine for Apple Silicon. Record-setting LLM, speech, vision, and speech-to-speech performance.
MetalRT · Speech-to-Speech · Mar 15, 2026
MetalRT Now Does Speech-to-Speech. 1.52x Faster Than mlx-audio.
123 tok/s S2S throughput · Read the benchmarks
MetalRT · Vision · Mar 13, 2026
MetalRT Now Runs Vision Language Models. Fastest on Apple Silicon.
287 tok/s vision decode · Read the benchmarks
MetalRT · Speech · Mar 9, 2026
The First Complete AI Inference Engine for Apple Silicon. Now with Speech.
101 ms STT latency · Read the benchmarks
MetalRT · LLM · Mar 3, 2026
We Built the Fastest LLM Decode Engine for Apple Silicon.
658 tok/s LLM decode · Read the benchmarks
FastVoice
End-to-end on-device voice AI. Co-scheduled inference for sub-100ms first-audio latency.
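Co-scheduling matters because a batch voice pipeline pays the full duration of every stage before any audio plays, while a co-scheduled pipeline lets each stage start on the previous stage's first output. A minimal sketch of that latency model, with entirely hypothetical timings (these are not FastVoice's measured numbers or its actual scheduler):

```python
# Hypothetical per-stage timings in ms, for illustration only.
stt_full, llm_full, tts_full = 300, 800, 600    # time to finish each stage
stt_first, llm_first, tts_first = 120, 10, 40   # time to first usable output

# Batch pipeline: each stage waits for the previous one to finish,
# so first audio arrives only after every stage completes.
batch_first_audio = stt_full + llm_full + tts_full

# Co-scheduled pipeline: stages stream into each other, so first audio
# depends only on the sum of first-output latencies.
streamed_first_audio = stt_first + llm_first + tts_first

print(batch_first_audio, "ms vs", streamed_first_audio, "ms")
```

Under this model, shrinking first-output latency per stage (rather than total stage time) is what pushes first-audio latency toward the sub-100ms range.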