January 30, 2026

RunAnywhere Kotlin SDK Part 4: Building a Voice Assistant with VAD

A Complete Voice Assistant Running Entirely On-Device


This is Part 4 of our RunAnywhere Kotlin SDK tutorial series:

  1. Chat with LLMs — Project setup and streaming text generation
  2. Speech-to-Text — Real-time transcription with Whisper
  3. Text-to-Speech — Natural voice synthesis with Piper
  4. Voice Pipeline (this post) — Full voice assistant with VAD

This is the culmination of the series: a voice assistant that automatically detects when you stop speaking, processes your request with an LLM, and responds with synthesized speech—all running on-device on Android.

Prerequisites

  • Complete Parts 1-3 to have all three model types (LLM, STT, TTS) working in your project
  • Physical device required — the pipeline uses microphone input
  • All three models downloaded (~495MB total: 400MB LLM + 75MB STT + 20MB TTS)

The Voice Pipeline Flow

```text
┌─────────────────────────────────────────────────────────────────┐
│                    Voice Assistant Pipeline                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐   │
│   │ Record  │ --> │   STT   │ --> │   LLM   │ --> │   TTS   │   │
│   │  + VAD  │     │ Whisper │     │ SmolLM2 │     │  Piper  │   │
│   └─────────┘     └─────────┘     └─────────┘     └─────────┘   │
│        │                                               │        │
│        │        Auto-stop when silence detected        │        │
│        └───────────────────────────────────────────────┘        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

The streamVoiceSession API

Instead of manually wiring up VAD, STT, LLM, and TTS, the RunAnywhere SDK provides streamVoiceSession—a single API that handles the entire voice pipeline. You provide a continuous audio stream, and the SDK emits events as it listens, transcribes, generates responses, and synthesizes speech.

Audio Capture as a Flow

First, create an audio capture utility that provides a Flow<ByteArray> for streamVoiceSession. Create AudioCapture.kt:

```kotlin
package com.example.localaiplayground.domain.services

import android.Manifest
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import androidx.annotation.RequiresPermission
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.awaitClose
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.callbackFlow
import kotlinx.coroutines.flow.flowOn

object AudioCapture {
    private const val SAMPLE_RATE = 16000
    private const val CHUNK_SIZE_MS = 100

    @RequiresPermission(Manifest.permission.RECORD_AUDIO)
    fun startCapture(): Flow<ByteArray> = callbackFlow {
        val bufferSize = AudioRecord.getMinBufferSize(
            SAMPLE_RATE,
            AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT
        )

        val recorder = AudioRecord(
            MediaRecorder.AudioSource.MIC,
            SAMPLE_RATE,
            AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT,
            bufferSize * 2
        )

        recorder.startRecording()

        val chunkSize = SAMPLE_RATE * 2 * CHUNK_SIZE_MS / 1000 // bytes per chunk
        val buffer = ByteArray(chunkSize)

        try {
            while (!isClosedForSend) {
                val bytesRead = recorder.read(buffer, 0, chunkSize)
                if (bytesRead > 0) {
                    trySend(buffer.copyOf(bytesRead))
                }
            }
        } finally {
            recorder.stop()
            recorder.release()
        }

        awaitClose { }
    }.flowOn(Dispatchers.IO) // recorder.read() blocks, so keep it off the main thread
}
```

This produces a continuous stream of PCM audio chunks at 16kHz mono—exactly what streamVoiceSession expects.
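As a sanity check on the chunk arithmetic: 16,000 samples/s × 2 bytes/sample × 0.1 s works out to 3,200 bytes per 100ms chunk. A standalone sketch of the formula (not SDK code):

```kotlin
// Bytes per chunk of raw PCM: sampleRate * bytesPerSample * chunkMs / 1000.
fun pcmChunkBytes(sampleRate: Int, bytesPerSample: Int, chunkMs: Int): Int =
    sampleRate * bytesPerSample * chunkMs / 1000

// 16 kHz, 16-bit (2-byte) mono, 100 ms chunks:
// pcmChunkBytes(16_000, 2, 100) == 3_200 bytes per chunk
```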

Pipeline State

Create VoiceAssistantViewModel.kt:

```kotlin
package com.example.localaiplayground.presentation.voice

import android.app.Application
import androidx.lifecycle.AndroidViewModel
import androidx.lifecycle.viewModelScope
import com.example.localaiplayground.domain.services.AudioCapture
import com.runanywhere.sdk.public.RunAnywhere
import com.runanywhere.sdk.public.extensions.VoiceAgent.VoiceSessionConfig
import com.runanywhere.sdk.public.extensions.VoiceAgent.VoiceSessionEvent
import com.runanywhere.sdk.public.extensions.isVoiceAgentReady
import com.runanywhere.sdk.public.extensions.streamVoiceSession
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

enum class PipelineState {
    IDLE,
    LISTENING,
    PROCESSING,
    SPEAKING
}

data class VoiceMessage(val text: String, val role: String) // "user" or "ai"

data class VoiceAssistantUiState(
    val pipelineState: PipelineState = PipelineState.IDLE,
    val messages: List<VoiceMessage> = emptyList(),
    val audioLevel: Float = 0f,
    val error: String? = null,
    val isReady: Boolean = false
)

class VoiceAssistantViewModel(application: Application) : AndroidViewModel(application) {
    private val _uiState = MutableStateFlow(VoiceAssistantUiState())
    val uiState: StateFlow<VoiceAssistantUiState> = _uiState.asStateFlow()

    private var sessionJob: Job? = null

    init {
        checkReadiness()
    }

    private fun checkReadiness() {
        viewModelScope.launch {
            val isReady = RunAnywhere.isVoiceAgentReady()
            _uiState.update { it.copy(isReady = isReady) }
        }
    }

    fun start() {
        if (_uiState.value.pipelineState != PipelineState.IDLE) return
        if (!_uiState.value.isReady) {
            _uiState.update { it.copy(error = "Models not loaded") }
            return
        }

        sessionJob = viewModelScope.launch {
            _uiState.update {
                it.copy(
                    pipelineState = PipelineState.LISTENING,
                    error = null
                )
            }

            try {
                val audioFlow = AudioCapture.startCapture()

                val config = VoiceSessionConfig(
                    silenceDuration = 1.5,  // seconds of silence to trigger processing
                    speechThreshold = 0.1f, // audio level threshold (0.0-1.0)
                    autoPlayTTS = false,    // we'll handle playback manually
                    continuousMode = true   // auto-resume after each turn
                )

                RunAnywhere.streamVoiceSession(audioFlow, config).collect { event ->
                    when (event) {
                        is VoiceSessionEvent.Listening -> {
                            _uiState.update {
                                it.copy(
                                    pipelineState = PipelineState.LISTENING,
                                    audioLevel = event.audioLevel
                                )
                            }
                        }
                        is VoiceSessionEvent.SpeechStarted -> {
                            // User started talking
                        }
                        is VoiceSessionEvent.Processing -> {
                            _uiState.update {
                                it.copy(pipelineState = PipelineState.PROCESSING)
                            }
                        }
                        is VoiceSessionEvent.Transcribed -> {
                            _uiState.update {
                                it.copy(
                                    messages = it.messages + VoiceMessage(event.text, "user")
                                )
                            }
                        }
                        is VoiceSessionEvent.Responded -> {
                            _uiState.update {
                                it.copy(
                                    messages = it.messages + VoiceMessage(event.text, "ai")
                                )
                            }
                        }
                        is VoiceSessionEvent.TurnCompleted -> {
                            _uiState.update {
                                it.copy(pipelineState = PipelineState.SPEAKING)
                            }
                            // Play the synthesized audio
                            event.audio?.let { audio ->
                                playWavAudio(audio)
                            }
                            // continuousMode resumes listening automatically
                        }
                        is VoiceSessionEvent.Error -> {
                            _uiState.update {
                                it.copy(error = event.message)
                            }
                        }
                        is VoiceSessionEvent.Stopped -> {
                            _uiState.update {
                                it.copy(pipelineState = PipelineState.IDLE)
                            }
                        }
                        else -> { /* Handle other events as needed */ }
                    }
                }

            } catch (e: Exception) {
                _uiState.update {
                    it.copy(
                        pipelineState = PipelineState.IDLE,
                        error = e.message
                    )
                }
            }
        }
    }

    fun stop() {
        sessionJob?.cancel()
        sessionJob = null
        _uiState.update { it.copy(pipelineState = PipelineState.IDLE) }
    }

    private fun playWavAudio(audioData: ByteArray) {
        // Parse WAV header and play via AudioTrack
        // (See Part 3 for the WAV playback implementation)
    }

    override fun onCleared() {
        super.onCleared()
        sessionJob?.cancel()
    }
}
```

Key difference from Parts 1-3: Instead of manually calling transcribe(), chat(), and synthesize() in sequence, streamVoiceSession handles the entire pipeline internally. You just provide audio input and react to events.
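One piece the SDK leaves to you here is playback: the playWavAudio stub must read the sample rate from the WAV header before configuring AudioTrack (Part 3 covers the full playback code). As a minimal sketch, assuming the canonical 44-byte PCM WAV header that Piper produces, a hypothetical helper (not an SDK function) could read it like this:

```kotlin
// Reads the sample rate from a canonical 44-byte PCM WAV header.
// wavSampleRate is a hypothetical helper, not part of the RunAnywhere SDK.
fun wavSampleRate(wav: ByteArray): Int {
    require(wav.size >= 28 && String(wav, 0, 4) == "RIFF") { "Not a RIFF/WAV buffer" }
    // The sample rate is stored as a little-endian 32-bit value at byte offset 24.
    return (wav[24].toInt() and 0xFF) or
        ((wav[25].toInt() and 0xFF) shl 8) or
        ((wav[26].toInt() and 0xFF) shl 16) or
        ((wav[27].toInt() and 0xFF) shl 24)
}
```

For Piper's output this would return 22050, which is the rate to pass to AudioTrack.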

Voice Assistant Screen

Create VoiceAssistantScreen.kt:

```kotlin
package com.example.localaiplayground.presentation.voice

import androidx.compose.foundation.background
import androidx.compose.foundation.clickable
import androidx.compose.foundation.layout.*
import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.items
import androidx.compose.foundation.shape.CircleShape
import androidx.compose.foundation.shape.RoundedCornerShape
import androidx.compose.material3.*
import androidx.compose.runtime.*
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.draw.clip
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.unit.dp
import androidx.lifecycle.viewmodel.compose.viewModel

@Composable
fun VoiceAssistantScreen(
    viewModel: VoiceAssistantViewModel = viewModel()
) {
    val uiState by viewModel.uiState.collectAsState()

    Column(
        modifier = Modifier
            .fillMaxSize()
            .background(Color.Black)
            .padding(24.dp),
        horizontalAlignment = Alignment.CenterHorizontally
    ) {
        // State indicator
        StateIndicator(state = uiState.pipelineState)

        Spacer(modifier = Modifier.height(24.dp))

        // Error message
        uiState.error?.let { error ->
            Surface(
                shape = RoundedCornerShape(8.dp),
                color = Color.Red.copy(alpha = 0.1f),
                modifier = Modifier.fillMaxWidth()
            ) {
                Text(
                    text = error,
                    color = Color.Red,
                    modifier = Modifier.padding(12.dp)
                )
            }
            Spacer(modifier = Modifier.height(16.dp))
        }

        // Conversation messages
        LazyColumn(
            modifier = Modifier.weight(1f),
            verticalArrangement = Arrangement.spacedBy(8.dp)
        ) {
            items(uiState.messages) { message ->
                ConversationBubble(
                    label = if (message.role == "user") "You:" else "Assistant:",
                    text = message.text,
                    color = if (message.role == "user") Color(0xFF007AFF) else Color(0xFF44FF44)
                )
            }
        }

        // Audio level indicator
        if (uiState.pipelineState == PipelineState.LISTENING) {
            LinearProgressIndicator(
                progress = { uiState.audioLevel },
                modifier = Modifier
                    .fillMaxWidth()
                    .height(4.dp),
                color = Color.Red,
                trackColor = Color.DarkGray
            )
            Spacer(modifier = Modifier.height(16.dp))
        }

        // Main button
        MainButton(
            state = uiState.pipelineState,
            isReady = uiState.isReady,
            onClick = {
                when (uiState.pipelineState) {
                    PipelineState.IDLE -> viewModel.start()
                    else -> viewModel.stop()
                }
            }
        )

        Spacer(modifier = Modifier.height(16.dp))

        Text(
            text = getStateHint(uiState.pipelineState),
            color = Color.Gray,
            style = MaterialTheme.typography.bodySmall
        )

        if (!uiState.isReady) {
            Spacer(modifier = Modifier.height(8.dp))
            Text(
                text = "Please load LLM, STT, and TTS models first",
                color = Color(0xFFFFAA00),
                style = MaterialTheme.typography.bodySmall
            )
        }
    }
}

@Composable
private fun StateIndicator(state: PipelineState) {
    Row(
        verticalAlignment = Alignment.CenterVertically,
        horizontalArrangement = Arrangement.Center
    ) {
        Box(
            modifier = Modifier
                .size(12.dp)
                .clip(CircleShape)
                .background(getStateColor(state))
        )
        Spacer(modifier = Modifier.width(8.dp))
        Text(
            text = getStateText(state),
            color = Color.White,
            style = MaterialTheme.typography.titleMedium
        )
    }
}

@Composable
private fun ConversationBubble(label: String, text: String, color: Color) {
    Surface(
        modifier = Modifier.fillMaxWidth(),
        shape = RoundedCornerShape(12.dp),
        color = color.copy(alpha = 0.1f)
    ) {
        Column(modifier = Modifier.padding(16.dp)) {
            Text(text = label, color = Color.Gray, style = MaterialTheme.typography.labelSmall)
            Spacer(modifier = Modifier.height(4.dp))
            Text(text = text, color = Color.White, style = MaterialTheme.typography.bodyLarge)
        }
    }
}

@Composable
private fun MainButton(state: PipelineState, isReady: Boolean, onClick: () -> Unit) {
    Box(
        modifier = Modifier
            .size(100.dp)
            .clip(CircleShape)
            .background(
                when {
                    !isReady -> Color.Gray
                    state == PipelineState.IDLE -> Color(0xFF007AFF)
                    else -> Color.Red
                }
            )
            .clickable(enabled = isReady, onClick = onClick),
        contentAlignment = Alignment.Center
    ) {
        Text(
            text = if (state == PipelineState.IDLE) "🎤" else "⬛",
            style = MaterialTheme.typography.headlineLarge
        )
    }
}

private fun getStateColor(state: PipelineState): Color = when (state) {
    PipelineState.IDLE -> Color.Gray
    PipelineState.LISTENING -> Color.Red
    PipelineState.PROCESSING -> Color(0xFFFFAA00)
    PipelineState.SPEAKING -> Color(0xFF44FF44)
}

private fun getStateText(state: PipelineState): String = when (state) {
    PipelineState.IDLE -> "Ready"
    PipelineState.LISTENING -> "Listening..."
    PipelineState.PROCESSING -> "Processing..."
    PipelineState.SPEAKING -> "Speaking..."
}

private fun getStateHint(state: PipelineState): String = when (state) {
    PipelineState.IDLE -> "Tap to start"
    PipelineState.LISTENING -> "Stops automatically when you pause"
    PipelineState.PROCESSING -> "Transcribing and generating response..."
    PipelineState.SPEAKING -> "Playing audio response..."
}
```
[Screenshot: Voice pipeline on Android]

Best Practices

1. Preload Models on App Start

```kotlin
// In RunAnywhereApp.kt or during onboarding
suspend fun preloadAllModels() {
    downloadAndLoadLLM("smollm2-360m-instruct-q8_0")
    downloadAndLoadSTT("sherpa-onnx-whisper-tiny.en")
    downloadAndLoadTTS("vits-piper-en_US-lessac-medium")
}
```

2. Audio Format Summary

| Component        | Sample Rate                  | Format    | Channels |
|------------------|------------------------------|-----------|----------|
| AudioRecord      | 16,000 Hz                    | Int16     | 1        |
| Whisper STT      | 16,000 Hz                    | Int16     | 1        |
| Piper TTS Output | 22,050 Hz                    | WAV/Int16 | 1        |
| AudioTrack       | Match TTS output (22,050 Hz) | Int16     | 1        |

3. Check Model State

```kotlin
// The SDK provides a single check for voice agent readiness
val isReady = RunAnywhere.isVoiceAgentReady() // checks LLM + STT + TTS
```

4. Session Configuration Tuning

```kotlin
// For noisy environments, adjust the session config
val config = VoiceSessionConfig(
    silenceDuration = 2.0,  // Longer pause tolerance (seconds)
    speechThreshold = 0.2f, // Higher for noisy environments
    autoPlayTTS = false,
    continuousMode = true
)
```
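To build intuition for what a 0.0-1.0 speechThreshold means, it helps to see how such a level can be derived from raw PCM. This RMS meter is purely illustrative (the SDK computes its own level internally; rmsLevel is not an SDK function):

```kotlin
import kotlin.math.sqrt

// Normalized RMS level (0.0 to 1.0) of little-endian 16-bit PCM audio.
// Illustrative only -- the SDK computes its own level for speechThreshold.
fun rmsLevel(pcm: ByteArray): Float {
    val samples = pcm.size / 2
    if (samples == 0) return 0f
    var sum = 0.0
    for (i in 0 until samples) {
        val lo = pcm[2 * i].toInt() and 0xFF
        val hi = pcm[2 * i + 1].toInt()      // sign-extends the high byte
        val s = ((hi shl 8) or lo) / 32768.0 // scale to -1.0..1.0
        sum += s * s
    }
    return sqrt(sum / samples).toFloat()
}
```

Silence yields 0.0 and a full-scale signal approaches 1.0, so a threshold of 0.2 simply demands roughly twice the signal energy of 0.1 before speech is assumed.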

5. Prevent Concurrent Operations

```kotlin
fun start() {
    if (_uiState.value.pipelineState != PipelineState.IDLE) return // Prevent double-starts
    // ...
}
```

Models Reference

| Type | Model ID                       | Size   | Notes                 |
|------|--------------------------------|--------|-----------------------|
| LLM  | smollm2-360m-instruct-q8_0     | ~400MB | SmolLM2, recommended  |
| STT  | sherpa-onnx-whisper-tiny.en    | ~75MB  | English               |
| TTS  | vits-piper-en_US-lessac-medium | ~20MB  | US English            |

Conclusion

You've built a complete voice assistant that:

  • Listens with automatic speech detection
  • Transcribes using on-device Whisper
  • Thinks with a local LLM
  • Responds with natural TTS

All processing happens on-device. No data ever leaves the phone. No API keys. No cloud costs. Pure native Android performance with Kotlin and Jetpack Compose.

This is the future of private, native Android AI applications.


Complete Source Code

The full source code is available on GitHub:

Kotlin Starter App

Includes:

  • Starter Kotlin app matching this tutorial series
  • MVVM architecture with ViewModel + StateFlow
  • Jetpack Compose UI
  • LLM chat, STT, TTS, and voice pipeline


Questions? Open an issue on GitHub or reach out on Twitter/X.


Copyright © 2025 RunAnywhere, Inc.