Voice · February 24, 2026
FastVoice RAG: Sub-200ms Voice AI with Retrieval-Augmented Generation, Entirely On-Device
We added hybrid retrieval (BM25 + vector search) to our on-device voice pipeline. Retrieval adds less than 4ms. The real cost is LLM prefill, but word-level flushing absorbs it. First audio arrives in under 200ms across 5,016 chunks, with zero cloud dependencies.
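
To make "word-level flushing" concrete, here is a minimal sketch of the idea as we read it: instead of waiting for a full sentence, each completed word from the LLM token stream is handed to speech synthesis as soon as a space boundary arrives. The function name `flush_words` and the space-only boundary rule are assumptions for illustration, not the pipeline's actual code.

```python
from typing import Iterable, Iterator


def flush_words(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each word as soon as the space that ends it has been seen.

    Hypothetical helper: `tokens` stands in for a streaming LLM decoder.
    Only the space character is treated as a word boundary here.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        # Everything before the last space is finished; the tail may still grow.
        *complete, buffer = buffer.split(" ")
        for word in complete:
            if word:  # skip empty strings produced by consecutive spaces
                yield word
    if buffer:
        yield buffer  # end of stream: flush the trailing word


if __name__ == "__main__":
    stream = ["Hel", "lo", " wor", "ld", "!", " How", " are you"]
    for word in flush_words(stream):
        print(word)  # in a real pipeline this would go to the TTS engine
```

The design point is that the synthesizer can start speaking after the first word rather than after the first sentence, which is how slower prefill can be hidden behind audio that is already playing.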
