January 28, 2026

RunAnywhere Kotlin SDK Part 2: Speech-to-Text with Whisper


Real-Time Transcription with On-Device Whisper


This is Part 2 of our RunAnywhere Kotlin SDK tutorial series:

  1. Chat with LLMs — Project setup and streaming text generation
  2. Speech-to-Text (this post) — Real-time transcription with Whisper
  3. Text-to-Speech — Natural voice synthesis with Piper
  4. Voice Pipeline — Full voice assistant with VAD

Speech recognition unlocks natural interaction with your app. With RunAnywhere, you can run Whisper entirely on-device: no network requests, no audio leaving the device, and no API costs.

The key challenge on Android is configuring the AudioRecord API to output audio in the format Whisper expects.

Prerequisites

  • Complete Part 1 first to set up your project with the RunAnywhere SDK
  • Physical device required — emulator microphone support is limited
  • ~75MB additional storage for the Whisper model

Register the STT Model

Add Whisper to your model registration in RunAnywhereApp.kt:

kotlin
import com.runanywhere.sdk.core.types.InferenceFramework
import com.runanywhere.sdk.public.RunAnywhere
import com.runanywhere.sdk.public.extensions.Models.ModelCategory
import com.runanywhere.sdk.public.extensions.registerModel

// Register STT model (Whisper)
RunAnywhere.registerModel(
    id = "sherpa-onnx-whisper-tiny.en",
    name = "Whisper Tiny English",
    url = "https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz",
    framework = InferenceFramework.ONNX,
    modality = ModelCategory.SPEECH_RECOGNITION,
    memoryRequirement = 75_000_000 // ~75MB
)

Critical: Audio Format Requirements

Whisper requires a very specific audio format:

| Parameter   | Required Value                   |
|-------------|----------------------------------|
| Sample Rate | 16,000 Hz                        |
| Channels    | 1 (mono)                         |
| Format      | 16-bit signed integer (Int16) PCM |

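Android's AudioRecord will not pick these settings for you, and many devices prefer 44.1 kHz or 48 kHz capture. You must pass the sample rate, channel configuration, and encoding explicitly.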
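These three values also fix the size of the raw audio: 16,000 samples per second × 2 bytes per sample × 1 channel is 32,000 bytes per second. The sketch below works through that arithmetic, which is handy for sanity-checking a recording's length from its byte count. The constant and function names are illustrative and are not part of the RunAnywhere SDK.

```kotlin
// Illustrative helpers (hypothetical names, not SDK APIs).
const val SAMPLE_RATE_HZ = 16_000
const val CHANNELS = 1
const val BYTES_PER_SAMPLE = 2 // 16-bit PCM

// Bytes of raw PCM produced by recording for the given duration.
fun pcmBytesForDuration(seconds: Double): Int =
    (seconds * SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE).toInt()

// Duration in seconds represented by a raw PCM byte count.
fun pcmDurationSeconds(byteCount: Int): Double =
    byteCount.toDouble() / (SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE)

fun main() {
    println(pcmBytesForDuration(10.0))  // 320000
    println(pcmDurationSeconds(64_000)) // 2.0
}
```

If a recording's byte count implies a duration that is far from what you actually recorded, the capture format is probably not 16 kHz mono Int16.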

Audio Recording Service

Create AudioCaptureService.kt:

kotlin
package com.example.localaiplayground.domain.services

import android.Manifest
import android.annotation.SuppressLint
import android.content.Context
import android.content.pm.PackageManager
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.util.Log
import androidx.core.app.ActivityCompat
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.withContext
import java.io.ByteArrayOutputStream
import kotlin.math.sqrt

class AudioCaptureService(private val context: Context) {
    companion object {
        private const val TAG = "AudioCaptureService"
        private const val SAMPLE_RATE = 16000 // Whisper requires 16kHz
        private const val CHANNEL_CONFIG = AudioFormat.CHANNEL_IN_MONO
        private const val AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT
    }

    private var audioRecord: AudioRecord? = null

    // Set from the caller's thread, read from the IO read loop
    @Volatile
    private var isRecording = false
    private val audioBuffer = ByteArrayOutputStream()

    private val _audioLevel = MutableStateFlow(0f)
    val audioLevel: StateFlow<Float> = _audioLevel

    fun hasPermission(): Boolean {
        return ActivityCompat.checkSelfPermission(
            context,
            Manifest.permission.RECORD_AUDIO
        ) == PackageManager.PERMISSION_GRANTED
    }

    // Suspends until stopRecording() is called or capture fails
    @SuppressLint("MissingPermission") // guarded by hasPermission()
    suspend fun startRecording(): Result<Unit> = withContext(Dispatchers.IO) {
        if (!hasPermission()) {
            return@withContext Result.failure(SecurityException("Microphone permission not granted"))
        }

        try {
            val bufferSize = AudioRecord.getMinBufferSize(
                SAMPLE_RATE,
                CHANNEL_CONFIG,
                AUDIO_FORMAT
            )
            // A non-positive value is an error code: the format is unsupported
            check(bufferSize > 0) { "Unsupported recording parameters (code $bufferSize)" }

            val recorder = AudioRecord(
                MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE,
                CHANNEL_CONFIG,
                AUDIO_FORMAT,
                bufferSize * 2 // Double buffer for safety
            )
            audioRecord = recorder
            check(recorder.state == AudioRecord.STATE_INITIALIZED) { "AudioRecord failed to initialize" }

            audioBuffer.reset()
            isRecording = true

            recorder.startRecording()
            Log.d(TAG, "Recording started at $SAMPLE_RATE Hz")

            // Read audio data in a loop
            val buffer = ShortArray(bufferSize)
            while (isRecording) {
                val read = recorder.read(buffer, 0, buffer.size)
                if (read > 0) {
                    // Convert shorts to little-endian bytes
                    val byteBuffer = ByteArray(read * 2)
                    for (i in 0 until read) {
                        byteBuffer[i * 2] = (buffer[i].toInt() and 0xFF).toByte()
                        byteBuffer[i * 2 + 1] = (buffer[i].toInt() shr 8).toByte()
                    }
                    audioBuffer.write(byteBuffer)

                    // Calculate audio level for visualization
                    _audioLevel.value = calculateRMS(buffer, read)
                } else if (read < 0 && isRecording) {
                    // Negative values are AudioRecord error codes
                    error("AudioRecord.read failed with code $read")
                }
            }

            Result.success(Unit)
        } catch (e: Exception) {
            Log.e(TAG, "Recording error", e)
            isRecording = false
            releaseRecorder()
            Result.failure(e)
        }
    }

    fun stopRecording(): ByteArray {
        isRecording = false
        releaseRecorder()
        _audioLevel.value = 0f

        val audioData = audioBuffer.toByteArray()
        Log.d(TAG, "Recording stopped: ${audioData.size} bytes")
        return audioData
    }

    private fun releaseRecorder() {
        audioRecord?.apply {
            // stop() throws if the recorder never initialized
            if (state == AudioRecord.STATE_INITIALIZED) stop()
            release()
        }
        audioRecord = null
    }

    private fun calculateRMS(buffer: ShortArray, length: Int): Float {
        var sum = 0.0
        for (i in 0 until length) {
            val sample = buffer[i].toDouble()
            sum += sample * sample
        }
        val rms = sqrt(sum / length)
        // Normalize to 0-1 range (max short value is 32767)
        return (rms / 32767.0).toFloat().coerceIn(0f, 1f)
    }
}

Important: The 16kHz sample rate and mono configuration are non-negotiable. Sending audio in a different format will produce garbage output.
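The read loop in AudioCaptureService packs each 16-bit sample low byte first, which is little-endian order. The same packing can be sketched with java.nio instead of manual shifts and masks. The helper names below are hypothetical, and the sketch only assumes the little-endian layout that the loop above already produces.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: packs the first `count` samples as little-endian bytes.
fun shortsToLittleEndianBytes(samples: ShortArray, count: Int = samples.size): ByteArray {
    val out = ByteBuffer.allocate(count * 2).order(ByteOrder.LITTLE_ENDIAN)
    // The short view writes into the backing byte array using the buffer's byte order.
    out.asShortBuffer().put(samples, 0, count)
    return out.array()
}

// Hypothetical inverse: unpacks little-endian bytes back into 16-bit samples.
fun littleEndianBytesToShorts(bytes: ByteArray): ShortArray {
    val shorts = ShortArray(bytes.size / 2)
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)
    return shorts
}

fun main() {
    val samples = shortArrayOf(1, -2, 0x1234)
    val bytes = shortsToLittleEndianBytes(samples)
    // Print each byte as two unsigned hex digits.
    println(bytes.joinToString(" ") { "%02x".format(it.toInt() and 0xFF) }) // 01 00 fe ff 34 12
    println(littleEndianBytesToShorts(bytes).contentEquals(samples))        // true
}
```

The round trip is also a quick way to unit-test the conversion off-device, since it does not depend on any Android class.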

STT ViewModel

Create SpeechToTextViewModel.kt:

kotlin
package com.example.localaiplayground.presentation.stt

import android.app.Application
import androidx.lifecycle.AndroidViewModel
import androidx.lifecycle.viewModelScope
import com.example.localaiplayground.domain.services.AudioCaptureService
import com.runanywhere.sdk.public.RunAnywhere
import com.runanywhere.sdk.public.extensions.availableModels
import com.runanywhere.sdk.public.extensions.downloadModel
import com.runanywhere.sdk.public.extensions.loadSTTModel
import com.runanywhere.sdk.public.extensions.transcribe
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.flow.update
import kotlinx.coroutines.launch

data class STTUiState(
    val isLoading: Boolean = true,
    val isModelLoaded: Boolean = false,
    val downloadProgress: Float = 0f,
    val isRecording: Boolean = false,
    val isTranscribing: Boolean = false,
    val transcription: String = "",
    val audioLevel: Float = 0f,
    val error: String? = null
)

class SpeechToTextViewModel(application: Application) : AndroidViewModel(application) {
    private val _uiState = MutableStateFlow(STTUiState())
    val uiState: StateFlow<STTUiState> = _uiState.asStateFlow()

    private val audioService = AudioCaptureService(application)
    private val modelId = "sherpa-onnx-whisper-tiny.en"

    init {
        loadModel()
        observeAudioLevel()
    }

    private fun loadModel() {
        viewModelScope.launch {
            try {
                val models = RunAnywhere.availableModels()
                val isDownloaded = models.any { it.id == modelId && it.localPath != null }

                if (!isDownloaded) {
                    RunAnywhere.downloadModel(modelId).collect { progress ->
                        _uiState.update {
                            it.copy(downloadProgress = progress.progress)
                        }
                    }
                }

                // Load STT model
                RunAnywhere.loadSTTModel(modelId)

                _uiState.update {
                    it.copy(isLoading = false, isModelLoaded = true)
                }
            } catch (e: Exception) {
                _uiState.update {
                    it.copy(isLoading = false, error = e.message)
                }
            }
        }
    }

    private fun observeAudioLevel() {
        viewModelScope.launch {
            audioService.audioLevel.collect { level ->
                _uiState.update { it.copy(audioLevel = level) }
            }
        }
    }

    fun toggleRecording() {
        if (_uiState.value.isRecording) {
            stopAndTranscribe()
        } else {
            startRecording()
        }
    }

    private fun startRecording() {
        if (!audioService.hasPermission()) {
            _uiState.update { it.copy(error = "Microphone permission required") }
            return
        }

        viewModelScope.launch {
            _uiState.update {
                it.copy(isRecording = true, transcription = "", error = null)
            }

            // Suspends until recording stops; surface capture failures in the UI
            audioService.startRecording().onFailure { e ->
                _uiState.update { it.copy(isRecording = false, error = e.message) }
            }
        }
    }

    private fun stopAndTranscribe() {
        viewModelScope.launch {
            _uiState.update {
                it.copy(isRecording = false, isTranscribing = true)
            }

            try {
                val audioData = audioService.stopRecording()

                if (audioData.isNotEmpty()) {
                    val text = RunAnywhere.transcribe(audioData)
                    _uiState.update {
                        it.copy(transcription = text, isTranscribing = false)
                    }
                } else {
                    _uiState.update {
                        it.copy(transcription = "No audio recorded", isTranscribing = false)
                    }
                }
            } catch (e: Exception) {
                _uiState.update {
                    it.copy(
                        transcription = "Error: ${e.message}",
                        isTranscribing = false
                    )
                }
            }
        }
    }

    override fun onCleared() {
        // Release the microphone if the screen goes away mid-recording
        audioService.stopRecording()
        super.onCleared()
    }
}
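toggleRecording relies on the record button being disabled while the model loads or a transcription is running. One way to make those rules explicit is a pure function over a single phase value, which can be unit-tested without Android. The sketch below uses a hypothetical SttPhase enum and onToggle function. Neither comes from the SDK or from the ViewModel above, which tracks the same information with boolean flags.

```kotlin
// Hypothetical alternative to the isLoading / isRecording / isTranscribing flags.
enum class SttPhase { LOADING, READY, RECORDING, TRANSCRIBING }

// Returns the phase that a tap on the record button should lead to.
fun onToggle(phase: SttPhase): SttPhase = when (phase) {
    SttPhase.READY -> SttPhase.RECORDING
    SttPhase.RECORDING -> SttPhase.TRANSCRIBING
    // Taps are ignored while the model loads or a transcription is in flight.
    SttPhase.LOADING, SttPhase.TRANSCRIBING -> phase
}

fun main() {
    var phase = SttPhase.READY
    phase = onToggle(phase)
    println(phase) // RECORDING
    phase = onToggle(phase)
    println(phase) // TRANSCRIBING
    phase = onToggle(phase)
    println(phase) // TRANSCRIBING
}
```

A single phase value rules out contradictory combinations such as recording and transcribing at the same time, at the cost of a slightly larger refactor of STTUiState.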

STT Screen

Create SpeechToTextScreen.kt:

kotlin
package com.example.localaiplayground.presentation.stt

import androidx.compose.foundation.background
import androidx.compose.foundation.clickable
import androidx.compose.foundation.layout.*
import androidx.compose.foundation.shape.CircleShape
import androidx.compose.foundation.shape.RoundedCornerShape
import androidx.compose.material3.*
import androidx.compose.runtime.*
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.draw.clip
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.unit.dp
import androidx.lifecycle.viewmodel.compose.viewModel

@Composable
fun SpeechToTextScreen(
    viewModel: SpeechToTextViewModel = viewModel()
) {
    val uiState by viewModel.uiState.collectAsState()

    Column(
        modifier = Modifier
            .fillMaxSize()
            .background(Color.Black)
            .padding(24.dp),
        horizontalAlignment = Alignment.CenterHorizontally,
        verticalArrangement = Arrangement.Center
    ) {
        // Loading state
        if (uiState.isLoading) {
            CircularProgressIndicator()
            Spacer(modifier = Modifier.height(16.dp))
            Text(
                "Downloading model... ${(uiState.downloadProgress * 100).toInt()}%",
                color = Color.White
            )
            LinearProgressIndicator(
                progress = { uiState.downloadProgress },
                modifier = Modifier
                    .fillMaxWidth()
                    .padding(top = 8.dp)
            )
            // Skip the rest of the column while the model is loading
            return@Column
        }

        // Error display (model load, permission, or capture failures)
        uiState.error?.let { message ->
            Text(message, color = Color.Red)
            Spacer(modifier = Modifier.height(16.dp))
        }

        // Transcription display
        Surface(
            modifier = Modifier
                .fillMaxWidth()
                .heightIn(min = 100.dp),
            shape = RoundedCornerShape(12.dp),
            color = Color(0xFF111111)
        ) {
            Text(
                text = uiState.transcription.ifEmpty { "Tap the microphone to record..." },
                color = Color.White,
                modifier = Modifier.padding(16.dp)
            )
        }

        Spacer(modifier = Modifier.height(48.dp))

        // Audio level indicator
        if (uiState.isRecording) {
            LinearProgressIndicator(
                progress = { uiState.audioLevel },
                modifier = Modifier
                    .fillMaxWidth()
                    .height(4.dp),
                color = Color.Red,
                trackColor = Color.DarkGray
            )
            Spacer(modifier = Modifier.height(16.dp))
        }

        // Record button
        Box(
            modifier = Modifier
                .size(100.dp)
                .clip(CircleShape)
                .background(
                    if (uiState.isRecording) Color.Red else Color(0xFF007AFF)
                )
                .clickable(
                    enabled = uiState.isModelLoaded && !uiState.isTranscribing,
                    onClick = { viewModel.toggleRecording() }
                ),
            contentAlignment = Alignment.Center
        ) {
            Text(
                text = if (uiState.isRecording) "⬛" else "🎤",
                style = MaterialTheme.typography.headlineLarge
            )
        }

        Spacer(modifier = Modifier.height(16.dp))

        if (uiState.isTranscribing) {
            Row(verticalAlignment = Alignment.CenterVertically) {
                CircularProgressIndicator(
                    modifier = Modifier.size(16.dp),
                    strokeWidth = 2.dp
                )
                Spacer(modifier = Modifier.width(8.dp))
                Text("Transcribing...", color = Color.White)
            }
        }
    }
}
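Speech typically produces RMS values well below 1.0, so a linear bar driven directly by audioLevel may barely move. One option is to map the level onto a decibel scale before handing it to the indicator. The sketch below assumes a display floor of -60 dB, which is an arbitrary choice, and levelToMeter is a hypothetical helper rather than an SDK or Compose API.

```kotlin
import kotlin.math.log10

// Hypothetical helper: maps a linear RMS level (0..1) to a 0..1 meter position
// on a decibel scale, where floorDb and below show as an empty bar.
fun levelToMeter(rms: Float, floorDb: Float = -60f): Float {
    if (rms <= 0f) return 0f
    val db = 20f * log10(rms) // 0 dB at full scale, negative below it
    return ((db - floorDb) / (-floorDb)).coerceIn(0f, 1f)
}

fun main() {
    // Full scale fills the bar; a level of 0.1 (-20 dB) lands about two thirds of the way.
    println(levelToMeter(1f))
    println(levelToMeter(0.1f))
    println(levelToMeter(0f))
}
```

In the screen above this would be used as `progress = { levelToMeter(uiState.audioLevel) }`.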

Requesting Permissions

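Declare the permission in AndroidManifest.xml with <uses-permission android:name="android.permission.RECORD_AUDIO" />, then request it at runtime in your Activity: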

kotlin
import android.Manifest
import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.activity.result.contract.ActivityResultContracts

class MainActivity : ComponentActivity() {
    // Must be registered before the Activity reaches the STARTED state
    private val requestPermissionLauncher = registerForActivityResult(
        ActivityResultContracts.RequestPermission()
    ) { isGranted ->
        if (isGranted) {
            // Permission granted, can start recording
        }
    }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        // Request microphone permission
        requestPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)

        setContent {
            // Your app content
        }
    }
}

Memory Management

When you're done with STT, unload the model:

kotlin
import com.runanywhere.sdk.public.RunAnywhere
import com.runanywhere.sdk.public.extensions.unloadSTTModel

// Unload STT model
RunAnywhere.unloadSTTModel()

The STT model loads independently of the LLM, so both can be loaded at the same time without conflict.

Models Reference

| Model ID                    | Size  | Notes                      |
|-----------------------------|-------|----------------------------|
| sherpa-onnx-whisper-tiny.en | ~75MB | English, real-time capable |

What's Next

In Part 3, we'll add text-to-speech with Piper using Android's AudioTrack for playback.



Questions? Open an issue on GitHub or reach out on Twitter/X.


Copyright © 2025 RunAnywhere, Inc.