RunAnywhere React Native SDK Part 3: Text-to-Speech with Piper
Natural Voice Synthesis Entirely On-Device
This is Part 3 of our RunAnywhere React Native SDK tutorial series:
- Chat with LLMs — Project setup and streaming text generation
- Speech-to-Text — Real-time transcription with Whisper
- Text-to-Speech (this post) — Natural voice synthesis with Piper
- Voice Pipeline — Full voice assistant with VAD
Text-to-speech brings your app to life. With RunAnywhere, you can synthesize natural-sounding speech using Piper—completely on-device, with no network latency, working on both iOS and Android.
Like STT, TTS comes with one audio-format wrinkle: Piper outputs base64-encoded Float32 PCM that must be converted before playback.
Prerequisites
- Complete Part 1 first to set up your project with the RunAnywhere SDK
- ~65MB additional storage for the Piper voice model
Android Note: A physical ARM64 device is required. Emulators will NOT work. See Part 1's Android Setup for complete configuration instructions.
Register the TTS Voice
Add Piper to your model registration in App.tsx:
```typescript
import { RunAnywhere, ModelCategory } from '@runanywhere/core'
import { ModelArtifactType } from '@runanywhere/onnx'

// Register TTS voice (Piper)
RunAnywhere.registerModel({
  id: 'vits-piper-en_US-lessac-medium',
  name: 'Piper US English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.gz',
  framework: 'onnx',
  modality: ModelCategory.SpeechSynthesis,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 65_000_000,
})
```
Important: Piper Output Format
Piper outputs audio in a specific format:
| Parameter | Value |
|---|---|
| Sample Rate | 22,050 Hz |
| Channels | 1 (mono) |
| Format | 32-bit float (Float32) PCM, base64 encoded |
The SDK returns base64-encoded Float32 samples that need to be decoded and converted for playback.
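As a quick sanity check, the reported duration should match `numSamples / sampleRate`. A small sketch, assuming `result` is the `TTSResult` returned by the `synthesize()` call in the hook below:

```typescript
// Sanity check: Piper emits 22,050 samples per second of audio,
// so duration ≈ numSamples / sampleRate
const expectedDuration = result.numSamples / result.sampleRate
console.log(`~${expectedDuration.toFixed(2)}s of audio at ${result.sampleRate} Hz`)
```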
Loading and Using TTS
Create src/hooks/useTTS.ts:
```typescript
import { useState, useCallback } from 'react'
import { RunAnywhere } from '@runanywhere/core'

interface TTSResult {
  audio: string // base64 encoded Float32 PCM
  sampleRate: number
  numSamples: number
  duration: number
}

export function useTTS() {
  const [isLoaded, setIsLoaded] = useState(false)
  const [isLoading, setIsLoading] = useState(false)
  const [downloadProgress, setDownloadProgress] = useState(0)

  const loadModel = useCallback(async () => {
    setIsLoading(true)
    const modelId = 'vits-piper-en_US-lessac-medium'

    try {
      // Check if already downloaded
      const isDownloaded = await RunAnywhere.isModelDownloaded(modelId)

      if (!isDownloaded) {
        await RunAnywhere.downloadModel(modelId, (progress) => {
          setDownloadProgress(progress.progress)
        })
      }

      // Load TTS voice into memory
      await RunAnywhere.loadTTSVoice(modelId)
      setIsLoaded(true)
      console.log('TTS voice loaded successfully')
    } catch (e) {
      console.error('TTS load error:', e)
      throw e
    } finally {
      setIsLoading(false)
    }
  }, [])

  const synthesize = useCallback(
    async (
      text: string,
      options?: { rate?: number; pitch?: number; volume?: number }
    ): Promise<TTSResult> => {
      if (!isLoaded) throw new Error('TTS model not loaded')

      const result = await RunAnywhere.synthesize(text, {
        voice: 'default',
        rate: options?.rate ?? 1.0,
        pitch: options?.pitch ?? 1.0,
        volume: options?.volume ?? 1.0,
      })

      return result
    },
    [isLoaded]
  )

  return {
    isLoaded,
    isLoading,
    downloadProgress,
    loadModel,
    synthesize,
  }
}
```
Why `loadTTSVoice()` instead of `loadModel()`? The SDK uses separate methods for each modality: `loadModel()` for LLMs, `loadSTTModel()` for speech-to-text, and `loadTTSVoice()` for text-to-speech. This reflects that each uses a different runtime (LlamaCPP vs ONNX), and they can be loaded simultaneously without conflicts.
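Here's a minimal sketch of what that looks like in practice. The TTS voice ID is the one registered above; the LLM and STT model IDs are placeholders, so substitute whatever you registered in Parts 1 and 2:

```typescript
// All three modalities can be resident at once, each on its own runtime.
// The first two IDs are illustrative placeholders.
await RunAnywhere.loadModel('your-llm-model-id')                 // LLM (LlamaCPP)
await RunAnywhere.loadSTTModel('your-whisper-model-id')          // STT (ONNX)
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium') // TTS (ONNX)
```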
Audio Playback
For audio playback, you'll need to convert the base64-encoded Float32 data into a playable format. Install the required dependencies:
```bash
npm install react-native-sound react-native-fs @react-native-community/slider
cd ios && pod install && cd ..
```
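In the code that follows, `react-native-sound` handles playback, `react-native-fs` writes the temporary WAV file, and `@react-native-community/slider` powers the rate and pitch controls on the screen.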
Why base64? React Native's JS bridge can't transfer raw binary data directly between native code and JavaScript. The SDK returns Float32 PCM audio as a base64-encoded string, which you decode on the JS side. This is a React Native-specific consideration—Swift and Kotlin SDKs return raw bytes.
Create src/services/TTSAudioPlayer.ts:
```typescript
import Sound from 'react-native-sound'
import RNFS from 'react-native-fs'

// Enable playback in silence mode
Sound.setCategory('Playback')

export class TTSAudioPlayer {
  private currentSound: Sound | null = null

  async playTTSAudio(base64Audio: string, sampleRate: number): Promise<void> {
    // Decode base64 to Float32 array
    const float32Data = this.base64ToFloat32(base64Audio)

    // Convert Float32 to Int16
    const int16Data = this.float32ToInt16(float32Data)

    // Create WAV file
    const wavData = this.createWavFile(int16Data, sampleRate)

    // Save to temp file
    const tempPath = `${RNFS.TemporaryDirectoryPath}/tts_output_${Date.now()}.wav`
    await RNFS.writeFile(tempPath, wavData, 'base64')

    // Play the file
    return new Promise((resolve, reject) => {
      this.currentSound = new Sound(tempPath, '', (error) => {
        if (error) {
          reject(error)
          return
        }

        this.currentSound?.play((success) => {
          if (success) {
            resolve()
          } else {
            reject(new Error('Playback failed'))
          }

          // Cleanup
          RNFS.unlink(tempPath).catch(() => {})
        })
      })
    })
  }

  stop(): void {
    if (this.currentSound) {
      this.currentSound.stop()
      this.currentSound.release()
      this.currentSound = null
    }
  }

  private base64ToFloat32(base64: string): Float32Array {
    const binaryString = atob(base64)
    const bytes = new Uint8Array(binaryString.length)
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i)
    }
    return new Float32Array(bytes.buffer)
  }

  private float32ToInt16(float32: Float32Array): Int16Array {
    const int16 = new Int16Array(float32.length)
    for (let i = 0; i < float32.length; i++) {
      // Clamp to [-1, 1] and scale to Int16
      const clamped = Math.max(-1, Math.min(1, float32[i]))
      int16[i] = Math.round(clamped * 32767)
    }
    return int16
  }

  private createWavFile(audioData: Int16Array, sampleRate: number): string {
    const channels = 1
    const bitsPerSample = 16
    const byteRate = sampleRate * channels * (bitsPerSample / 8)
    const blockAlign = channels * (bitsPerSample / 8)
    const dataSize = audioData.length * 2 // Int16 = 2 bytes
    const fileSize = 36 + dataSize

    // Create header (44 bytes)
    const header = new ArrayBuffer(44)
    const view = new DataView(header)

    // RIFF header
    this.writeString(view, 0, 'RIFF')
    view.setUint32(4, fileSize, true)
    this.writeString(view, 8, 'WAVE')

    // fmt subchunk
    this.writeString(view, 12, 'fmt ')
    view.setUint32(16, 16, true) // Subchunk size
    view.setUint16(20, 1, true) // PCM format
    view.setUint16(22, channels, true)
    view.setUint32(24, sampleRate, true)
    view.setUint32(28, byteRate, true)
    view.setUint16(32, blockAlign, true)
    view.setUint16(34, bitsPerSample, true)

    // data subchunk
    this.writeString(view, 36, 'data')
    view.setUint32(40, dataSize, true)

    // Combine header and audio data
    const wavBuffer = new ArrayBuffer(44 + dataSize)
    const wavView = new Uint8Array(wavBuffer)
    wavView.set(new Uint8Array(header), 0)

    // Write audio data
    const audioBytes = new Uint8Array(audioData.buffer)
    wavView.set(audioBytes, 44)

    // Convert to base64
    let binary = ''
    for (let i = 0; i < wavView.length; i++) {
      binary += String.fromCharCode(wavView[i])
    }
    return btoa(binary)
  }

  private writeString(view: DataView, offset: number, str: string): void {
    for (let i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i))
    }
  }
}
```
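One caveat: depending on your React Native version and JS engine, `atob`/`btoa` may not exist as globals; if they don't, add a polyfill (the `base-64` package is one option) before using the player. Standalone usage is straightforward, assuming `result` came from `synthesize()`:

```typescript
// Play a synthesis result outside the screen component
const player = new TTSAudioPlayer()
await player.playTTSAudio(result.audio, result.sampleRate)
// player.stop() interrupts playback early and releases the sound
```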
Complete TTS Screen
Create src/screens/TTSScreen.tsx:
```tsx
import React, { useState, useEffect } from 'react';
import {
  View,
  Text,
  TextInput,
  TouchableOpacity,
  StyleSheet,
} from 'react-native';
import Slider from '@react-native-community/slider';
import { useTTS } from '../hooks/useTTS';
import { TTSAudioPlayer } from '../services/TTSAudioPlayer';

const audioPlayer = new TTSAudioPlayer();

export function TTSScreen() {
  const [inputText, setInputText] = useState(
    'Hello! This is text-to-speech running entirely on your device.'
  );
  const [isSynthesizing, setIsSynthesizing] = useState(false);
  const [speechRate, setSpeechRate] = useState(1.0);
  const [pitch, setPitch] = useState(1.0);

  const { isLoaded, isLoading, downloadProgress, loadModel, synthesize } = useTTS();

  useEffect(() => {
    loadModel();
  }, [loadModel]);

  async function synthesizeAndPlay() {
    if (!inputText.trim() || isSynthesizing) return;

    setIsSynthesizing(true);

    try {
      const result = await synthesize(inputText, {
        rate: speechRate,
        pitch: pitch,
        volume: 1.0,
      });

      console.log(`Synthesized: ${result.duration.toFixed(2)}s`);

      await audioPlayer.playTTSAudio(result.audio, result.sampleRate);

    } catch (e) {
      console.error('TTS error:', e);
    } finally {
      setIsSynthesizing(false);
    }
  }

  if (isLoading) {
    return (
      <View style={styles.container}>
        <Text style={styles.statusText}>
          Downloading voice model... {(downloadProgress * 100).toFixed(0)}%
        </Text>
        <View style={styles.progressBar}>
          <View style={[styles.progressFill, { width: `${downloadProgress * 100}%` }]} />
        </View>
      </View>
    );
  }

  return (
    <View style={styles.container}>
      {/* Text input */}
      <TextInput
        style={styles.textInput}
        value={inputText}
        onChangeText={setInputText}
        placeholder="Enter text to speak..."
        placeholderTextColor="#666"
        multiline
        numberOfLines={4}
      />

      {/* Speed slider */}
      <View style={styles.sliderContainer}>
        <Text style={styles.sliderLabel}>Speed: {speechRate.toFixed(1)}x</Text>
        <Slider
          style={styles.slider}
          minimumValue={0.5}
          maximumValue={2.0}
          step={0.1}
          value={speechRate}
          onValueChange={setSpeechRate}
          minimumTrackTintColor="#007AFF"
          maximumTrackTintColor="#333"
          thumbTintColor="#007AFF"
        />
      </View>

      {/* Pitch slider */}
      <View style={styles.sliderContainer}>
        <Text style={styles.sliderLabel}>Pitch: {pitch.toFixed(1)}</Text>
        <Slider
          style={styles.slider}
          minimumValue={0.5}
          maximumValue={1.5}
          step={0.1}
          value={pitch}
          onValueChange={setPitch}
          minimumTrackTintColor="#007AFF"
          maximumTrackTintColor="#333"
          thumbTintColor="#007AFF"
        />
      </View>

      {/* Speak button */}
      <TouchableOpacity
        style={[styles.speakButton, (!isLoaded || isSynthesizing) && styles.disabled]}
        onPress={synthesizeAndPlay}
        disabled={!isLoaded || isSynthesizing}
      >
        <Text style={styles.speakButtonText}>
          {isSynthesizing ? 'Synthesizing...' : '🔊 Speak'}
        </Text>
      </TouchableOpacity>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#000',
    padding: 24,
  },
  statusText: {
    color: '#fff',
    fontSize: 16,
    marginBottom: 16,
    textAlign: 'center',
  },
  progressBar: {
    width: '100%',
    height: 8,
    backgroundColor: '#333',
    borderRadius: 4,
    overflow: 'hidden',
  },
  progressFill: {
    height: '100%',
    backgroundColor: '#007AFF',
  },
  textInput: {
    backgroundColor: '#111',
    borderRadius: 12,
    padding: 16,
    color: '#fff',
    fontSize: 16,
    minHeight: 120,
    textAlignVertical: 'top',
  },
  sliderContainer: {
    marginTop: 24,
  },
  sliderLabel: {
    color: '#fff',
    fontSize: 14,
    marginBottom: 8,
  },
  slider: {
    width: '100%',
    height: 40,
  },
  speakButton: {
    backgroundColor: '#007AFF',
    borderRadius: 12,
    padding: 16,
    marginTop: 32,
    alignItems: 'center',
  },
  speakButtonText: {
    color: '#fff',
    fontSize: 18,
    fontWeight: '600',
  },
  disabled: {
    opacity: 0.5,
  },
});
```
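One thing the screen doesn't wire up is the player's `stop()` method. If you want users to interrupt long utterances, a minimal sketch (reusing the screen's existing styles, placed after the Speak button) looks like:

```tsx
{/* Optional stop button, calls stop() on the shared TTSAudioPlayer */}
<TouchableOpacity style={styles.speakButton} onPress={() => audioPlayer.stop()}>
  <Text style={styles.speakButtonText}>⏹ Stop</Text>
</TouchableOpacity>
```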

Memory Management
When you're done with TTS, unload the voice to free memory:
```typescript
// Unload TTS voice to free memory
await RunAnywhere.unloadTTSVoice()
```
TTS voices can be loaded independently alongside the LLM and STT models—they don't conflict.
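For example, a screen that owns the voice could release it on unmount. A sketch, inside `TTSScreen`:

```typescript
// Free the TTS voice when the screen unmounts
useEffect(() => {
  return () => {
    RunAnywhere.unloadTTSVoice().catch(() => {})
  }
}, [])
```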
Models Reference
| Model ID | Size | Notes |
|---|---|---|
| vits-piper-en_US-lessac-medium | ~65MB | Natural US English |
What's Next
In Part 4, we'll combine everything into a complete voice assistant with automatic Voice Activity Detection.
Resources
Questions? Open an issue on GitHub or reach out on Twitter/X.