RunAnywhere React Native SDK Part 3: Text-to-Speech with Piper
Natural Voice Synthesis Entirely On-Device
This is Part 3 of our RunAnywhere React Native SDK tutorial series:
- Chat with LLMs — Project setup and streaming text generation
- Speech-to-Text — Real-time transcription with Whisper
- Text-to-Speech (this post) — Natural voice synthesis with Piper
- Voice Pipeline — Full voice assistant with VAD
Text-to-speech brings your app to life. With RunAnywhere, you can synthesize natural-sounding speech using Piper—completely on-device, with no network latency, working on both iOS and Android.
Like STT, TTS comes with one audio-format wrinkle: Piper outputs base64-encoded Float32 PCM that must be converted before playback.
Prerequisites
- Complete Part 1 first to set up your project with the RunAnywhere SDK
- ~65MB additional storage for the Piper voice model
Android Note: A physical ARM64 device is required. Emulators will NOT work. See Part 1's Android Setup for complete configuration instructions.
Register the TTS Voice
Add Piper to your model registration in App.tsx:
```typescript
import { RunAnywhere, ModelCategory } from '@runanywhere/core'
import { ModelArtifactType } from '@runanywhere/onnx'

// Register TTS voice (Piper)
RunAnywhere.registerModel({
  id: 'vits-piper-en_US-lessac-medium',
  name: 'Piper US English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.gz',
  framework: 'onnx',
  modality: ModelCategory.SpeechSynthesis,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 65_000_000,
})
```
Important: Piper Output Format
Piper outputs audio in a specific format:
| Parameter | Value |
|---|---|
| Sample Rate | 22,050 Hz |
| Channels | 1 (mono) |
| Format | 32-bit float (Float32) PCM, base64 encoded |
The SDK returns base64-encoded Float32 samples that need to be decoded and converted for playback.
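As a quick sanity check, the reported duration should match `numSamples / sampleRate`. A small sketch, assuming `result` is the `TTSResult` returned by the `synthesize()` call in the hook below:

```typescript
// Sanity check: Piper emits 22,050 samples per second of audio,
// so duration ≈ numSamples / sampleRate
const expectedDuration = result.numSamples / result.sampleRate
console.log(`~${expectedDuration.toFixed(2)}s of audio at ${result.sampleRate} Hz`)
```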
Loading and Using TTS
Create src/hooks/useTTS.ts:
```typescript
import { useState, useCallback } from 'react'
import { RunAnywhere } from '@runanywhere/core'

interface TTSResult {
  audio: string // base64 encoded Float32 PCM
  sampleRate: number
  numSamples: number
  duration: number
}

export function useTTS() {
  const [isLoaded, setIsLoaded] = useState(false)
  const [isLoading, setIsLoading] = useState(false)
  const [downloadProgress, setDownloadProgress] = useState(0)

  const loadModel = useCallback(async () => {
    setIsLoading(true)
    const modelId = 'vits-piper-en_US-lessac-medium'

    try {
      // Check if already downloaded
      const isDownloaded = await RunAnywhere.isModelDownloaded(modelId)

      if (!isDownloaded) {
        await RunAnywhere.downloadModel(modelId, (progress) => {
          setDownloadProgress(progress.progress)
        })
      }

      // Load TTS voice into memory
      await RunAnywhere.loadTTSVoice(modelId)
      setIsLoaded(true)
      console.log('TTS voice loaded successfully')
    } catch (e) {
      console.error('TTS load error:', e)
      throw e
    } finally {
      setIsLoading(false)
    }
  }, [])

  const synthesize = useCallback(
    async (
      text: string,
      options?: { rate?: number; pitch?: number; volume?: number }
    ): Promise<TTSResult> => {
      if (!isLoaded) throw new Error('TTS model not loaded')

      const result = await RunAnywhere.synthesize(text, {
        voice: 'default',
        rate: options?.rate ?? 1.0,
        pitch: options?.pitch ?? 1.0,
        volume: options?.volume ?? 1.0,
      })

      return result
    },
    [isLoaded]
  )

  return {
    isLoaded,
    isLoading,
    downloadProgress,
    loadModel,
    synthesize,
  }
}
```
Why `loadTTSVoice()` instead of `loadModel()`? The SDK uses separate methods for each modality: `loadModel()` for LLMs, `loadSTTModel()` for speech-to-text, and `loadTTSVoice()` for text-to-speech. This reflects that each uses a different runtime (LlamaCPP vs ONNX), and they can be loaded simultaneously without conflicts.
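Here's a minimal sketch of what that looks like in practice. The TTS voice ID is the one registered above; the LLM and STT model IDs are placeholders, so substitute whatever you registered in Parts 1 and 2:

```typescript
// All three modalities can be resident at once, each on its own runtime.
// The first two IDs are illustrative placeholders.
await RunAnywhere.loadModel('your-llm-model-id')                 // LLM (LlamaCPP)
await RunAnywhere.loadSTTModel('your-whisper-model-id')          // STT (ONNX)
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium') // TTS (ONNX)
```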
Audio Playback
For audio playback, you'll need to convert the base64-encoded Float32 data into a playable format. Install the required dependencies:
```bash
npm install react-native-sound react-native-fs @react-native-community/slider
cd ios && pod install && cd ..
```
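In the code that follows, `react-native-sound` handles playback, `react-native-fs` writes the temporary WAV file, and `@react-native-community/slider` powers the rate and pitch controls on the screen.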
Why base64? React Native's JS bridge can't transfer raw binary data directly between native code and JavaScript. The SDK returns Float32 PCM audio as a base64-encoded string, which you decode on the JS side. This is a React Native-specific consideration—Swift and Kotlin SDKs return raw bytes.
Create src/services/TTSAudioPlayer.ts:
```typescript
import Sound from 'react-native-sound'
import RNFS from 'react-native-fs'

// Enable playback in silence mode
Sound.setCategory('Playback')

export class TTSAudioPlayer {
  private currentSound: Sound | null = null

  async playTTSAudio(base64Audio: string, sampleRate: number): Promise<void> {
    // Decode base64 to Float32 array
    const float32Data = this.base64ToFloat32(base64Audio)

    // Convert Float32 to Int16
    const int16Data = this.float32ToInt16(float32Data)

    // Create WAV file
    const wavData = this.createWavFile(int16Data, sampleRate)

    // Save to temp file
    const tempPath = `${RNFS.TemporaryDirectoryPath}/tts_output_${Date.now()}.wav`
    await RNFS.writeFile(tempPath, wavData, 'base64')

    // Play the file
    return new Promise((resolve, reject) => {
      this.currentSound = new Sound(tempPath, '', (error) => {
        if (error) {
          reject(error)
          return
        }

        this.currentSound?.play((success) => {
          if (success) {
            resolve()
          } else {
            reject(new Error('Playback failed'))
          }

          // Cleanup
          RNFS.unlink(tempPath).catch(() => {})
        })
      })
    })
  }

  stop(): void {
    if (this.currentSound) {
      this.currentSound.stop()
      this.currentSound.release()
      this.currentSound = null
    }
  }

  private base64ToFloat32(base64: string): Float32Array {
    const binaryString = atob(base64)
    const bytes = new Uint8Array(binaryString.length)
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i)
    }
    return new Float32Array(bytes.buffer)
  }

  private float32ToInt16(float32: Float32Array): Int16Array {
    const int16 = new Int16Array(float32.length)
    for (let i = 0; i < float32.length; i++) {
      // Clamp to [-1, 1] and scale to Int16
      const clamped = Math.max(-1, Math.min(1, float32[i]))
      int16[i] = Math.round(clamped * 32767)
    }
    return int16
  }

  private createWavFile(audioData: Int16Array, sampleRate: number): string {
    const channels = 1
    const bitsPerSample = 16
    const byteRate = sampleRate * channels * (bitsPerSample / 8)
    const blockAlign = channels * (bitsPerSample / 8)
    const dataSize = audioData.length * 2 // Int16 = 2 bytes
    const fileSize = 36 + dataSize

    // Create header (44 bytes)
    const header = new ArrayBuffer(44)
    const view = new DataView(header)

    // RIFF header
    this.writeString(view, 0, 'RIFF')
    view.setUint32(4, fileSize, true)
    this.writeString(view, 8, 'WAVE')

    // fmt subchunk
    this.writeString(view, 12, 'fmt ')
    view.setUint32(16, 16, true) // Subchunk size
    view.setUint16(20, 1, true) // PCM format
    view.setUint16(22, channels, true)
    view.setUint32(24, sampleRate, true)
    view.setUint32(28, byteRate, true)
    view.setUint16(32, blockAlign, true)
    view.setUint16(34, bitsPerSample, true)

    // data subchunk
    this.writeString(view, 36, 'data')
    view.setUint32(40, dataSize, true)

    // Combine header and audio data
    const wavBuffer = new ArrayBuffer(44 + dataSize)
    const wavView = new Uint8Array(wavBuffer)
    wavView.set(new Uint8Array(header), 0)

    // Write audio data
    const audioBytes = new Uint8Array(audioData.buffer)
    wavView.set(audioBytes, 44)

    // Convert to base64
    let binary = ''
    for (let i = 0; i < wavView.length; i++) {
      binary += String.fromCharCode(wavView[i])
    }
    return btoa(binary)
  }

  private writeString(view: DataView, offset: number, str: string): void {
    for (let i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i))
    }
  }
}
```
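One caveat: depending on your React Native version and JS engine, `atob`/`btoa` may not exist as globals; if they don't, add a polyfill (the `base-64` package is one option) before using the player. Standalone usage is straightforward, assuming `result` came from `synthesize()`:

```typescript
// Play a synthesis result outside the screen component
const player = new TTSAudioPlayer()
await player.playTTSAudio(result.audio, result.sampleRate)
// player.stop() interrupts playback early and releases the sound
```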
Complete TTS Screen
Create src/screens/TTSScreen.tsx:
```tsx
import React, { useState, useEffect } from 'react';
import {
  View,
  Text,
  TextInput,
  TouchableOpacity,
  StyleSheet,
} from 'react-native';
import Slider from '@react-native-community/slider';
import { useTTS } from '../hooks/useTTS';
import { TTSAudioPlayer } from '../services/TTSAudioPlayer';

const audioPlayer = new TTSAudioPlayer();

export function TTSScreen() {
  const [inputText, setInputText] = useState(
    'Hello! This is text-to-speech running entirely on your device.'
  );
  const [isSynthesizing, setIsSynthesizing] = useState(false);
  const [speechRate, setSpeechRate] = useState(1.0);
  const [pitch, setPitch] = useState(1.0);

  const { isLoaded, isLoading, downloadProgress, loadModel, synthesize } = useTTS();

  useEffect(() => {
    loadModel();
  }, [loadModel]);

  async function synthesizeAndPlay() {
    if (!inputText.trim() || isSynthesizing) return;

    setIsSynthesizing(true);

    try {
      const result = await synthesize(inputText, {
        rate: speechRate,
        pitch: pitch,
        volume: 1.0,
      });

      console.log(`Synthesized: ${result.duration.toFixed(2)}s`);

      await audioPlayer.playTTSAudio(result.audio, result.sampleRate);

    } catch (e) {
      console.error('TTS error:', e);
    } finally {
      setIsSynthesizing(false);
    }
  }

  if (isLoading) {
    return (
      <View style={styles.container}>
        <Text style={styles.statusText}>
          Downloading voice model... {(downloadProgress * 100).toFixed(0)}%
        </Text>
        <View style={styles.progressBar}>
          <View style={[styles.progressFill, { width: `${downloadProgress * 100}%` }]} />
        </View>
      </View>
    );
  }

  return (
    <View style={styles.container}>
      {/* Text input */}
      <TextInput
        style={styles.textInput}
        value={inputText}
        onChangeText={setInputText}
        placeholder="Enter text to speak..."
        placeholderTextColor="#666"
        multiline
        numberOfLines={4}
      />

      {/* Speed slider */}
      <View style={styles.sliderContainer}>
        <Text style={styles.sliderLabel}>Speed: {speechRate.toFixed(1)}x</Text>
        <Slider
          style={styles.slider}
          minimumValue={0.5}
          maximumValue={2.0}
          step={0.1}
          value={speechRate}
          onValueChange={setSpeechRate}
          minimumTrackTintColor="#007AFF"
          maximumTrackTintColor="#333"
          thumbTintColor="#007AFF"
        />
      </View>

      {/* Pitch slider */}
      <View style={styles.sliderContainer}>
        <Text style={styles.sliderLabel}>Pitch: {pitch.toFixed(1)}</Text>
        <Slider
          style={styles.slider}
          minimumValue={0.5}
          maximumValue={1.5}
          step={0.1}
          value={pitch}
          onValueChange={setPitch}
          minimumTrackTintColor="#007AFF"
          maximumTrackTintColor="#333"
          thumbTintColor="#007AFF"
        />
      </View>

      {/* Speak button */}
      <TouchableOpacity
        style={[styles.speakButton, (!isLoaded || isSynthesizing) && styles.disabled]}
        onPress={synthesizeAndPlay}
        disabled={!isLoaded || isSynthesizing}
      >
        <Text style={styles.speakButtonText}>
          {isSynthesizing ? 'Synthesizing...' : '🔊 Speak'}
        </Text>
      </TouchableOpacity>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#000',
    padding: 24,
  },
  statusText: {
    color: '#fff',
    fontSize: 16,
    marginBottom: 16,
    textAlign: 'center',
  },
  progressBar: {
    width: '100%',
    height: 8,
    backgroundColor: '#333',
    borderRadius: 4,
    overflow: 'hidden',
  },
  progressFill: {
    height: '100%',
    backgroundColor: '#007AFF',
  },
  textInput: {
    backgroundColor: '#111',
    borderRadius: 12,
    padding: 16,
    color: '#fff',
    fontSize: 16,
    minHeight: 120,
    textAlignVertical: 'top',
  },
  sliderContainer: {
    marginTop: 24,
  },
  sliderLabel: {
    color: '#fff',
    fontSize: 14,
    marginBottom: 8,
  },
  slider: {
    width: '100%',
    height: 40,
  },
  speakButton: {
    backgroundColor: '#007AFF',
    borderRadius: 12,
    padding: 16,
    marginTop: 32,
    alignItems: 'center',
  },
  speakButtonText: {
    color: '#fff',
    fontSize: 18,
    fontWeight: '600',
  },
  disabled: {
    opacity: 0.5,
  },
});
```
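One thing the screen doesn't wire up is the player's `stop()` method. If you want users to interrupt long utterances, a minimal sketch (reusing the screen's existing styles, placed after the Speak button) looks like:

```tsx
{/* Optional stop button, calls stop() on the shared TTSAudioPlayer */}
<TouchableOpacity style={styles.speakButton} onPress={() => audioPlayer.stop()}>
  <Text style={styles.speakButtonText}>⏹ Stop</Text>
</TouchableOpacity>
```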

Memory Management
When you're done with TTS, unload the voice to free memory:
```typescript
// Unload TTS voice to free memory
await RunAnywhere.unloadTTSVoice()
```
TTS voices can be loaded independently alongside the LLM and STT models—they don't conflict.
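For example, a screen that owns the voice could release it on unmount. A sketch, inside `TTSScreen`:

```typescript
// Free the TTS voice when the screen unmounts
useEffect(() => {
  return () => {
    RunAnywhere.unloadTTSVoice().catch(() => {})
  }
}, [])
```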
Models Reference
| Model ID | Size | Notes |
|---|---|---|
| vits-piper-en_US-lessac-medium | ~65MB | Natural US English |
What's Next
In Part 4, we'll combine everything into a complete voice assistant with automatic Voice Activity Detection.
Resources
Questions? Open an issue on GitHub or reach out on Twitter/X.