Blog

Engineering notes from every layer of the on-device stack

February 24, 2026

FastVoice RAG: Sub-200ms Voice AI with Retrieval-Augmented Generation, Entirely On-Device

We added hybrid retrieval (BM25 + vector search) to our on-device voice pipeline. Retrieval adds less than 4ms. The real cost is LLM prefill, but word-level flushing absorbs it. Sub-200ms first-audio latency across 5,016 chunks, with zero cloud dependencies.

Voice

February 22, 2026

FastVoice: 63ms First-Audio Latency for On-Device Voice AI on Apple Silicon

FastVoice achieves 63ms first-audio latency — well under the 200ms perceptual threshold — by composing STT, LLM, and TTS into a single C++ pipeline on Apple Silicon. No cloud. No network. Just speed.