Backed by Y Combinator

February 6, 2026

RunAnywhere Flutter SDK Part 3: Text-to-Speech with Piper

DEVELOPERS

Natural Voice Synthesis Entirely On-Device


This is Part 3 of our RunAnywhere Flutter SDK tutorial series:

  1. Chat with LLMs — Project setup and streaming text generation
  2. Speech-to-Text — Real-time transcription with Whisper
  3. Text-to-Speech (this post) — Natural voice synthesis with Piper
  4. Voice Pipeline — Full voice assistant with VAD

Text-to-speech brings your app to life. With RunAnywhere, you can synthesize natural-sounding speech using Piper—completely on-device, with no network latency, working identically on iOS and Android.

Like STT, TTS has an audio format consideration: Piper outputs raw Float32 PCM samples that need to be converted for playback.

Prerequisites

  • Complete Part 1 first to set up your project with the RunAnywhere SDK
  • ~65MB additional storage for the Piper voice model

Dependencies

Add the audio playback package to your pubspec.yaml:

```yaml
dependencies:
  audioplayers: ^6.0.0
  # path_provider is used by the playback service below for temp-file storage;
  # add it here if an earlier part of the series hasn't already.
  path_provider: ^2.1.0
```

Then run:

```bash
flutter pub get
```

Register the TTS Voice

Add Piper to your model registration in your initialization code:

```dart
// Register TTS voice (Piper)
RunAnywhere.registerModel(
  id: 'vits-piper-en_US-lessac-medium',
  name: 'Piper US English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.gz',
  framework: InferenceFramework.onnx,
  modality: ModelCategory.speechSynthesis,
  artifactType: ArtifactType.tarGzArchive,
  memoryRequirement: 65000000,
);
```

Important: Piper Output Format

Piper outputs audio in a specific format:

| Parameter | Value |
| --- | --- |
| Sample Rate | 22,050 Hz |
| Channels | 1 (mono) |
| Format | 32-bit float (Float32) PCM |

Most audio players can't play raw Float32 PCM directly—you need to convert to a playable format or use a specialized player.
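The table also lets you sanity-check buffer sizes and durations before playback. A minimal standalone sketch (the sample count here is illustrative, not from the SDK):

```dart
void main() {
  // Piper's fixed output parameters (see the table above)
  const sampleRate = 22050; // Hz
  const bytesPerSample = 4; // 32-bit float

  // Example: a buffer of 66,150 mono samples
  const sampleCount = 66150;
  final durationSeconds = sampleCount / sampleRate;
  final rawBytes = sampleCount * bytesPerSample;

  print('$durationSeconds s');  // 3.0 s
  print('$rawBytes bytes');     // 264600 bytes of raw Float32 PCM
}
```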

Loading and Using TTS

```dart
// Download the voice (one-time, ~65MB)
final isDownloaded = await RunAnywhere.isModelDownloaded('vits-piper-en_US-lessac-medium');

if (!isDownloaded) {
  await for (final progress in RunAnywhere.downloadModel('vits-piper-en_US-lessac-medium')) {
    debugPrint('Download: ${(progress.progress * 100).toStringAsFixed(1)}%');
    if (progress.stage == DownloadStage.completed) break;
  }
}

// Load TTS voice into memory
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium');

// Synthesize speech
final result = await RunAnywhere.synthesize(
  'Hello, world!',
  rate: 1.0,
  pitch: 1.0,
  volume: 1.0,
);

// result.samples is a Float32List at 22,050 Hz
// result.sampleRate is 22050
// result.duration is the audio length in seconds
```

API Pattern: just as loadSTTModel() loads speech-recognition models, loadTTSVoice() loads speech-synthesis voices. LLM, STT, and TTS each have dedicated load/unload methods because they use different runtimes and memory pools, and all three can be loaded simultaneously.
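For instance, the three modalities can be resident at once. This sketch assumes the loader names from Parts 1 and 2 (loadModel() for the LLM is an assumption based on the naming pattern, and the model IDs are placeholders):

```dart
// Each modality has its own runtime and memory pool, so these don't conflict.
await RunAnywhere.loadModel('your-llm-id');        // LLM (Part 1)
await RunAnywhere.loadSTTModel('your-stt-id');     // Whisper STT (Part 2)
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium'); // Piper TTS

// Unload independently when a modality is no longer needed.
await RunAnywhere.unloadTTSVoice();
```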

Audio Playback Service

Create lib/services/audio_playback_service.dart:

```dart
import 'dart:io';
import 'dart:typed_data';

import 'package:audioplayers/audioplayers.dart';
import 'package:flutter/foundation.dart';
import 'package:path_provider/path_provider.dart';

class AudioPlaybackService {
  final AudioPlayer _player = AudioPlayer();

  /// Convert Float32 samples to a WAV file and play it
  Future<void> playFloat32Audio(Float32List samples, int sampleRate) async {
    // Convert Float32 to Int16
    final int16Data = _convertFloat32ToInt16(samples);

    // Create WAV file
    final wavData = _createWavFile(int16Data, sampleRate);

    // Save to temp file
    final directory = await getTemporaryDirectory();
    final wavPath = '${directory.path}/tts_output_${DateTime.now().millisecondsSinceEpoch}.wav';
    final file = File(wavPath);
    await file.writeAsBytes(wavData);

    debugPrint('Audio saved to: $wavPath (${wavData.length} bytes)');

    // Play the WAV file
    await _player.play(DeviceFileSource(wavPath));

    // Wait for playback to complete
    await _player.onPlayerComplete.first;

    // Clean up temp file
    try {
      await file.delete();
    } catch (_) {}
  }

  /// Stop current playback
  Future<void> stop() async {
    await _player.stop();
  }

  /// Convert Float32 samples to Int16 (little-endian bytes)
  Uint8List _convertFloat32ToInt16(Float32List samples) {
    final int16Bytes = ByteData(samples.length * 2);

    for (int i = 0; i < samples.length; i++) {
      // Clamp to [-1, 1] range and scale to Int16
      final clamped = samples[i].clamp(-1.0, 1.0);
      final int16Value = (clamped * 32767).toInt();
      int16Bytes.setInt16(i * 2, int16Value, Endian.little);
    }

    return int16Bytes.buffer.asUint8List();
  }

  /// Create a WAV file from Int16 audio data
  Uint8List _createWavFile(Uint8List audioData, int sampleRate) {
    const channels = 1;
    const bitsPerSample = 16;
    final byteRate = sampleRate * channels * (bitsPerSample ~/ 8);
    final blockAlign = channels * (bitsPerSample ~/ 8);
    final dataSize = audioData.length;
    final fileSize = 36 + dataSize;

    final header = ByteData(44);
    int offset = 0;

    // RIFF header
    header.setUint8(offset++, 0x52); // R
    header.setUint8(offset++, 0x49); // I
    header.setUint8(offset++, 0x46); // F
    header.setUint8(offset++, 0x46); // F
    header.setUint32(offset, fileSize, Endian.little);
    offset += 4;
    header.setUint8(offset++, 0x57); // W
    header.setUint8(offset++, 0x41); // A
    header.setUint8(offset++, 0x56); // V
    header.setUint8(offset++, 0x45); // E

    // fmt subchunk
    header.setUint8(offset++, 0x66); // f
    header.setUint8(offset++, 0x6D); // m
    header.setUint8(offset++, 0x74); // t
    header.setUint8(offset++, 0x20); // space
    header.setUint32(offset, 16, Endian.little); // Subchunk size
    offset += 4;
    header.setUint16(offset, 1, Endian.little); // PCM format
    offset += 2;
    header.setUint16(offset, channels, Endian.little);
    offset += 2;
    header.setUint32(offset, sampleRate, Endian.little);
    offset += 4;
    header.setUint32(offset, byteRate, Endian.little);
    offset += 4;
    header.setUint16(offset, blockAlign, Endian.little);
    offset += 2;
    header.setUint16(offset, bitsPerSample, Endian.little);
    offset += 2;

    // data subchunk
    header.setUint8(offset++, 0x64); // d
    header.setUint8(offset++, 0x61); // a
    header.setUint8(offset++, 0x74); // t
    header.setUint8(offset++, 0x61); // a
    header.setUint32(offset, dataSize, Endian.little);

    // Combine header and audio data
    final result = Uint8List(44 + audioData.length);
    result.setRange(0, 44, header.buffer.asUint8List());
    result.setRange(44, 44 + audioData.length, audioData);

    return result;
  }

  /// Dispose of resources
  void dispose() {
    _player.dispose();
  }
}
```

Important: The Float32-to-Int16 conversion is essential for standard audio players. The clamping step ensures no overflow during scaling.
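To see why, trace a few values through a standalone version of the conversion (it mirrors the per-sample logic in _convertFloat32ToInt16 above):

```dart
int float32SampleToInt16(double sample) {
  // Clamp to [-1, 1], then scale to the Int16 range
  final clamped = sample.clamp(-1.0, 1.0);
  return (clamped * 32767).toInt();
}

void main() {
  print(float32SampleToInt16(0.5));  // 16383
  print(float32SampleToInt16(1.2));  // 32767 (clamped; unclamped it would be 39320, past Int16's maximum)
  print(float32SampleToInt16(-1.0)); // -32767
}
```

Piper occasionally emits samples slightly outside [-1, 1], so skipping the clamp produces values that wrap around when written as Int16, which is audible as harsh clicks.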

Complete TTS View

Create lib/features/tts/text_to_speech_view.dart:

```dart
import 'package:flutter/material.dart';
import 'package:runanywhere/runanywhere.dart';
import '../../services/audio_playback_service.dart';

class TextToSpeechView extends StatefulWidget {
  const TextToSpeechView({super.key});

  @override
  State<TextToSpeechView> createState() => _TextToSpeechViewState();
}

class _TextToSpeechViewState extends State<TextToSpeechView> {
  final TextEditingController _controller = TextEditingController(
    text: 'Hello! This is text-to-speech running entirely on your device.',
  );
  final AudioPlaybackService _audioService = AudioPlaybackService();

  bool _isSynthesizing = false;
  bool _isModelLoaded = false;
  double _downloadProgress = 0.0;
  double _speechRate = 1.0;
  double _pitch = 1.0;

  @override
  void initState() {
    super.initState();
    _loadModel();
  }

  @override
  void dispose() {
    _controller.dispose();
    _audioService.dispose();
    super.dispose();
  }

  Future<void> _loadModel() async {
    const modelId = 'vits-piper-en_US-lessac-medium';

    final isDownloaded = await RunAnywhere.isModelDownloaded(modelId);

    if (!isDownloaded) {
      await for (final progress in RunAnywhere.downloadModel(modelId)) {
        if (!mounted) return;
        setState(() {
          _downloadProgress = progress.progress;
        });
        if (progress.stage == DownloadStage.completed) break;
      }
    }

    await RunAnywhere.loadTTSVoice(modelId);
    if (!mounted) return;
    setState(() {
      _isModelLoaded = true;
    });
  }

  Future<void> _synthesizeAndPlay() async {
    final text = _controller.text.trim();
    if (text.isEmpty || _isSynthesizing) return;

    setState(() {
      _isSynthesizing = true;
    });

    try {
      final result = await RunAnywhere.synthesize(
        text,
        rate: _speechRate,
        pitch: _pitch,
        volume: 1.0,
      );

      debugPrint('Synthesized: ${result.duration.toStringAsFixed(2)}s, ${result.sampleRate}Hz');

      // Play the audio
      await _audioService.playFloat32Audio(result.samples, result.sampleRate);
    } catch (e) {
      if (mounted) {
        ScaffoldMessenger.of(context).showSnackBar(
          SnackBar(content: Text('TTS Error: $e')),
        );
      }
    } finally {
      if (mounted) {
        setState(() {
          _isSynthesizing = false;
        });
      }
    }
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: const Text('Text to Speech'),
      ),
      body: Padding(
        padding: const EdgeInsets.all(24),
        child: Column(
          crossAxisAlignment: CrossAxisAlignment.start,
          children: [
            // Status indicator
            if (!_isModelLoaded)
              Column(
                children: [
                  const Text('Downloading voice model...'),
                  const SizedBox(height: 8),
                  LinearProgressIndicator(value: _downloadProgress),
                  const SizedBox(height: 24),
                ],
              ),

            // Text input
            TextField(
              controller: _controller,
              maxLines: 4,
              decoration: const InputDecoration(
                labelText: 'Text to speak',
                border: OutlineInputBorder(),
              ),
            ),

            const SizedBox(height: 24),

            // Speed slider
            Text('Speed: ${_speechRate.toStringAsFixed(1)}x'),
            Slider(
              value: _speechRate,
              min: 0.5,
              max: 2.0,
              divisions: 15,
              onChanged: (value) => setState(() => _speechRate = value),
            ),

            const SizedBox(height: 16),

            // Pitch slider
            Text('Pitch: ${_pitch.toStringAsFixed(1)}'),
            Slider(
              value: _pitch,
              min: 0.5,
              max: 1.5,
              divisions: 10,
              onChanged: (value) => setState(() => _pitch = value),
            ),

            const SizedBox(height: 32),

            // Speak button
            SizedBox(
              width: double.infinity,
              child: ElevatedButton.icon(
                onPressed: _isModelLoaded && !_isSynthesizing
                    ? _synthesizeAndPlay
                    : null,
                icon: Icon(_isSynthesizing ? Icons.hourglass_empty : Icons.volume_up),
                label: Text(_isSynthesizing ? 'Synthesizing...' : 'Speak'),
                style: ElevatedButton.styleFrom(
                  padding: const EdgeInsets.all(16),
                ),
              ),
            ),
          ],
        ),
      ),
    );
  }
}
```
Text-to-speech synthesis and playback controls

Memory Management

When you're done with TTS, unload the voice to free memory:

```dart
// Unload TTS voice
await RunAnywhere.unloadTTSVoice();
```

TTS voices can be loaded independently alongside the LLM and STT models—they don't conflict.
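One pattern this enables is scoping a TTS session so the ~65MB stays resident only while speech is needed. A sketch built from the calls shown above (whether repeated load/unload is cheap enough for your use case depends on the SDK's load time, so measure first):

```dart
// Sketch: hold the Piper voice in memory only for one utterance.
Future<void> speakOnce(String text, AudioPlaybackService audio) async {
  await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium');
  try {
    final result = await RunAnywhere.synthesize(
      text,
      rate: 1.0,
      pitch: 1.0,
      volume: 1.0,
    );
    await audio.playFloat32Audio(result.samples, result.sampleRate);
  } finally {
    // Always unload, even if synthesis or playback throws
    await RunAnywhere.unloadTTSVoice();
  }
}
```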

Models Reference

| Model ID | Size | Notes |
| --- | --- | --- |
| vits-piper-en_US-lessac-medium | ~65MB | Natural US English |

What's Next

In Part 4, we'll combine everything into a complete voice assistant with automatic Voice Activity Detection.


Frequently Asked Questions

  • Why don't I hear any audio when I tap Speak? Most players can't render raw Float32 PCM; convert it to 16-bit WAV first, as AudioPlaybackService does before calling _player.play().

  • Do I need to convert Piper's output? Yes. Piper emits Float32 PCM at 22,050 Hz mono; standard players expect a container format such as WAV with Int16 samples.

  • I used loadSTTModel() for Piper and it failed. Each modality has its own loader; TTS voices load with loadTTSVoice().

  • The AudioPlaybackService only prints to debug — how do I play audio? The debugPrint line is just logging; playback happens immediately after it via _player.play(DeviceFileSource(wavPath)).

  • The Piper model download hangs or is slow. The voice is a one-time ~65MB download from GitHub releases and is cached on-device, so subsequent launches skip it.

  • My registerModel call fails with a type error. Check the enum values against the registration snippet above: InferenceFramework.onnx, ModelCategory.speechSynthesis, and ArtifactType.tarGzArchive.

  • Can I load TTS alongside LLM and STT? Yes. Each uses a separate runtime and memory pool, so they don't conflict.

  • How much additional RAM does the Piper model use? Roughly 65MB, matching the memoryRequirement declared at registration.

Questions? Open an issue on GitHub or reach out on Twitter/X.

RunAnywhere

On-device AI inference research and infrastructure. Building the fastest engines for the hardware you already own.

© 2026 RunAnywhere, Inc.