New Research
Every claim comes with numbers.
We publish our benchmark results and engineering deep-dives openly. On-device inference: fast, private, hardware-native.
Benchmarks · Apple M4 Max
LLM Decode (higher is better)
RunAnywhere: 658 tok/s
Apple MLX: 553 tok/s
llama.cpp: 394 tok/s
Time to First Token (lower is better)
RunAnywhere: 6.6 ms
Apple MLX: 8 ms
llama.cpp: 11 ms
Speech-to-Text (lower is better)
RunAnywhere: 101 ms
Apple MLX: 465 ms
Speech-to-Speech (higher is better)
RunAnywhere: 123 tok/s
mlx-audio: 81 tok/s
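The headline multiples quoted elsewhere on this page follow directly from these tables. A quick sketch of the arithmetic, using only the figures above (rounding to two decimals assumed):

```python
# Benchmark figures from the tables above (Apple M4 Max).
s2s = {"RunAnywhere": 123, "mlx-audio": 81}   # tok/s, higher is better
stt = {"RunAnywhere": 101, "Apple MLX": 465}  # ms, lower is better

# Throughput comparison: divide tok/s figures directly.
s2s_speedup = s2s["RunAnywhere"] / s2s["mlx-audio"]
print(f"S2S: {s2s_speedup:.2f}x faster than mlx-audio")  # 1.52x

# Latency comparison: the ratio is inverted, since lower is better.
stt_speedup = stt["Apple MLX"] / stt["RunAnywhere"]
print(f"STT: {stt_speedup:.2f}x lower latency than Apple MLX")
```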
MetalRT
Custom kernel inference engine for Apple Silicon. Record-setting LLM, speech, vision, and speech-to-speech performance.
MetalRT · Speech-to-Speech · Mar 15, 2026
MetalRT Now Does Speech-to-Speech. 1.52x Faster Than mlx-audio.
123 tok/s S2S throughput · Read the benchmarks
MetalRT · Vision · Mar 13, 2026
MetalRT Now Runs Vision Language Models. Fastest on Apple Silicon.
287 tok/s vision decode · Read the benchmarks
MetalRT · Speech · Mar 9, 2026
The First Complete AI Inference Engine for Apple Silicon. Now with Speech.
101 ms STT latency · Read the benchmarks
MetalRT · LLM · Mar 3, 2026
We Built the Fastest LLM Decode Engine for Apple Silicon.
658 tok/s LLM decode · Read the benchmarks
FastVoice
End-to-end on-device voice AI. Co-scheduled inference for sub-100ms first-audio latency.
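Co-scheduling matters because a batch voice pipeline pays the full duration of every stage before any audio plays, while a co-scheduled pipeline lets each stage start on the previous stage's first output. A minimal sketch of that latency model, with entirely hypothetical timings (these are not FastVoice's measured numbers or its actual scheduler):

```python
# Hypothetical per-stage timings in ms, for illustration only.
stt_full, llm_full, tts_full = 300, 800, 600    # time to finish each stage
stt_first, llm_first, tts_first = 120, 10, 40   # time to first usable output

# Batch pipeline: each stage waits for the previous one to finish,
# so first audio arrives only after every stage completes.
batch_first_audio = stt_full + llm_full + tts_full

# Co-scheduled pipeline: stages stream into each other, so first audio
# depends only on the sum of first-output latencies.
streamed_first_audio = stt_first + llm_first + tts_first

print(batch_first_audio, "ms vs", streamed_first_audio, "ms")
```

Under this model, shrinking first-output latency per stage (rather than total stage time) is what pushes first-audio latency toward the sub-100ms range.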