January 20, 2026
RunAnywhere Swift SDK Part 1: Chat with LLMs On-Device
Run LLMs Entirely On-Device with iOS
This is Part 1 of our RunAnywhere Swift SDK tutorial series:
- Chat with LLMs (this post) — Project setup and streaming text generation
- Speech-to-Text — Real-time transcription with Whisper
- Text-to-Speech — Natural voice synthesis with Piper
- Voice Pipeline — Full voice assistant with VAD
The promise of on-device AI is compelling: instant responses, complete privacy, and no API costs. But implementing it has traditionally been challenging—managing model formats, optimizing for mobile hardware, and building reliable interfaces.
RunAnywhere changes this. It's a production-ready SDK that brings powerful AI capabilities to iOS with a simple, unified API. In this tutorial, we'll set up the SDK and build a streaming chat interface with LLMs.
Why On-Device AI?
| Aspect | Cloud AI | On-Device AI |
|---|---|---|
| Privacy | Data sent to servers | Data stays on device |
| Latency | Network round-trip | Instant local processing |
| Offline | Requires internet | Works anywhere |
| Cost | Per-request billing | One-time download |
For applications handling sensitive data—health, finance, personal conversations—on-device processing isn't just a feature, it's a requirement.
Prerequisites
- Xcode 15.0+ with Swift 5.9
- iOS 17.0+ device (a physical device is required for Parts 2-4, which use the microphone; the simulator works for this part)
- Basic familiarity with SwiftUI
- ~250MB storage for the LLM model (Parts 2-4 add ~140MB more)
Project Setup
1. Create a New Xcode Project
Create a new iOS App with SwiftUI interface. Use the following settings:
- Product Name: LocalAIPlayground (or your preferred name)
- Interface: SwiftUI
- Language: Swift
- Storage: None

2. Add the RunAnywhere SDK
Add via Swift Package Manager:
- File → Add Package Dependencies...
- Enter: `https://github.com/RunanywhereAI/runanywhere-sdks`
- For Dependency Rule, select Up to Next Major Version and enter `0.17.4`
- Add these products:
  - `RunAnywhere` (core SDK)
  - `RunAnywhereLlamaCPP` (LLM backend)
  - `RunAnywhereONNX` (STT/TTS/VAD backend)

After adding the package, verify the dependency is configured correctly in your project's Package Dependencies tab:

The dependency should show:
- Name: runanywhere-sdks
- Location: `https://github.com/RunanywhereAI/runanywhere-sdks`
- Dependency Rule: Up to Next Major Version, `0.17.4 < 1.0.0`
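If you manage dependencies in a `Package.swift` manifest (for example, in a shared Swift package) instead of the Xcode UI, the equivalent declaration looks roughly like the sketch below. The product names and version rule come from the steps above; the package and target names are just placeholders for your own project.

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "LocalAIPlayground",
    platforms: [.iOS(.v17)],
    dependencies: [
        // Same rule as the Xcode UI: up to next major version from 0.17.4
        .package(url: "https://github.com/RunanywhereAI/runanywhere-sdks", from: "0.17.4")
    ],
    targets: [
        .target(
            name: "LocalAIPlayground",
            dependencies: [
                .product(name: "RunAnywhere", package: "runanywhere-sdks"),
                .product(name: "RunAnywhereLlamaCPP", package: "runanywhere-sdks"),
                .product(name: "RunAnywhereONNX", package: "runanywhere-sdks")
            ]
        )
    ]
)
```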
3. Configure Info.plist
Add required permissions (needed for later tutorials in this series):
```xml
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to transcribe your speech using on-device AI.</string>

<key>CFBundleIdentifier</key>
<string>$(PRODUCT_BUNDLE_IDENTIFIER)</string>

<key>UIBackgroundModes</key>
<array>
    <string>audio</string>
</array>
```
SDK Initialization
The SDK requires a specific initialization order:
```swift
import SwiftUI
import RunAnywhere
import RunAnywhereLlamaCPP
import RunAnywhereONNX

@main
struct LocalAIPlaygroundApp: App {
    @State private var isSDKInitialized = false
    @State private var initError: String?

    var body: some Scene {
        WindowGroup {
            Group {
                if isSDKInitialized {
                    ContentView()
                } else {
                    ProgressView("Initializing AI...")
                }
            }
            .task { await initializeSDK() }
        }
    }

    @MainActor
    private func initializeSDK() async {
        do {
            // Step 1: Initialize core SDK
            try RunAnywhere.initialize(environment: .development)

            // Step 2: Register backends BEFORE registering models
            LlamaCPP.register()  // For LLM text generation
            ONNX.register()      // For STT, TTS, VAD (used in later tutorials)

            // Step 3: Register the LLM model
            RunAnywhere.registerModel(
                id: "lfm2-350m-q4_k_m",
                name: "LiquidAI LFM2 350M",
                url: URL(string: "https://huggingface.co/LiquidAI/LFM2-350M-GGUF/resolve/main/LFM2-350M-Q4_K_M.gguf")!,
                framework: .llamaCpp,
                memoryRequirement: 250_000_000
            )

            print("✅ RunAnywhere SDK v\(RunAnywhere.version) initialized")
            isSDKInitialized = true
        } catch {
            print("❌ SDK init failed: \(error)")
            initError = error.localizedDescription
            isSDKInitialized = true // Show UI with error state
        }
    }
}
```
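The snippet above stores `initError` but never displays it. One way to surface it is to check the error before showing `ContentView`; the branch below is a minimal sketch (the error view is illustrative, not part of the SDK):

```swift
// Inside the WindowGroup's Group: show the error first, since
// isSDKInitialized is set to true even when initialization fails.
if let initError {
    VStack(spacing: 8) {
        Text("AI initialization failed")
            .font(.headline)
        Text(initError)
            .font(.caption)
            .foregroundStyle(.secondary)
    }
    .padding()
} else if isSDKInitialized {
    ContentView()
} else {
    ProgressView("Initializing AI...")
}
```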

Architecture Overview
```
┌─────────────────────────────────────────────────────┐
│                  RunAnywhere Core                   │
│           (Unified API, Model Management)           │
├───────────────────────┬─────────────────────────────┤
│   LlamaCPP Backend    │       ONNX Backend          │
│   ─────────────────   │    ─────────────────        │
│  • Text Generation    │  • Speech-to-Text           │
│  • Chat Completion    │  • Text-to-Speech           │
│  • Streaming          │  • Voice Activity (VAD)     │
└───────────────────────┴─────────────────────────────┘
```
Downloading & Loading Models
After registering a model (as shown in the initialization code above), you need to download it once, then load it into memory:
```swift
// Check if already downloaded (avoids re-downloading)
let isDownloaded = await RunAnywhere.isModelDownloaded("lfm2-350m-q4_k_m")

if !isDownloaded {
    // Download with progress tracking (~250MB)
    let progressStream = try await RunAnywhere.downloadModel("lfm2-350m-q4_k_m")
    for await progress in progressStream {
        print("Download: \(Int(progress.overallProgress * 100))%")
        if progress.stage == .completed { break }
    }
}

// Load into memory (required before generating text)
try await RunAnywhere.loadModel("lfm2-350m-q4_k_m")

// Verify it's loaded
let isLoaded = await RunAnywhere.isModelLoaded
print("Model loaded: \(isLoaded)")
```
Note: Only one LLM model can be loaded at a time. Loading a different model automatically unloads the current one. The SDK uses `loadModel()` for LLMs, but Parts 2-3 will introduce `loadSTTModel()` and `loadTTSVoice()` for speech models; these use separate memory pools and can be loaded simultaneously.
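To keep this download-and-load sequence out of your views, one option is to wrap it in an observable helper that SwiftUI can bind to. The `ModelManager` type below is a hypothetical name, not part of the SDK; it only uses the calls shown above.

```swift
import SwiftUI
import RunAnywhere

// Illustrative helper that wraps the download/load sequence for SwiftUI.
@MainActor
final class ModelManager: ObservableObject {
    @Published var downloadProgress: Double = 0   // 0.0 ... 1.0 for a ProgressView
    @Published var isReady = false
    @Published var errorMessage: String?

    func prepareModel(id: String = "lfm2-350m-q4_k_m") async {
        do {
            // Skip the download if the model is already on disk
            let alreadyDownloaded = await RunAnywhere.isModelDownloaded(id)
            if !alreadyDownloaded {
                let progressStream = try await RunAnywhere.downloadModel(id)
                for await progress in progressStream {
                    downloadProgress = progress.overallProgress
                    if progress.stage == .completed { break }
                }
            }
            // Load into memory before any generation calls
            try await RunAnywhere.loadModel(id)
            isReady = await RunAnywhere.isModelLoaded
        } catch {
            errorMessage = error.localizedDescription
        }
    }
}
```

A view can then show `ProgressView(value: manager.downloadProgress)` while `isReady` is false.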

Streaming Text Generation
Now for the fun part—generating text with your on-device LLM:
```swift
func generateResponse(to prompt: String) async throws -> String {
    let options = LLMGenerationOptions(
        maxTokens: 256,
        temperature: 0.7
    )

    let result = try await RunAnywhere.generateStream(prompt, options: options)

    var fullResponse = ""
    for try await token in result.stream {
        fullResponse += token
        // Update UI with each token
        await MainActor.run {
            self.responseText = fullResponse
        }
    }

    // Get metrics
    let metrics = try await result.result.value
    print("Speed: \(metrics.tokensPerSecond) tok/s")

    return fullResponse
}
```
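To turn this into the streaming chat interface promised at the start, you can hold `responseText` in an observable view model and bind a simple SwiftUI view to it. The `ChatViewModel` and `ChatView` names below are illustrative, not part of the SDK; the generation call is the same one shown above.

```swift
import SwiftUI
import RunAnywhere

// Illustrative view model that streams tokens into a published property.
@MainActor
final class ChatViewModel: ObservableObject {
    @Published var responseText = ""
    @Published var isGenerating = false

    func send(_ prompt: String) async {
        isGenerating = true
        defer { isGenerating = false }
        responseText = ""
        do {
            let options = LLMGenerationOptions(maxTokens: 256, temperature: 0.7)
            let result = try await RunAnywhere.generateStream(prompt, options: options)
            for try await token in result.stream {
                responseText += token   // SwiftUI re-renders on each appended token
            }
        } catch {
            responseText = "Generation failed: \(error.localizedDescription)"
        }
    }
}

struct ChatView: View {
    @StateObject private var viewModel = ChatViewModel()
    @State private var prompt = ""

    var body: some View {
        VStack(spacing: 12) {
            ScrollView {
                Text(viewModel.responseText)
                    .frame(maxWidth: .infinity, alignment: .leading)
            }
            HStack {
                TextField("Ask something...", text: $prompt)
                    .textFieldStyle(.roundedBorder)
                Button("Send") {
                    let text = prompt
                    prompt = ""
                    Task { await viewModel.send(text) }
                }
                .disabled(viewModel.isGenerating || prompt.isEmpty)
            }
        }
        .padding()
    }
}
```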

Models Reference
| Model ID | Size | Notes |
|---|---|---|
| lfm2-350m-q4_k_m | ~250MB | LiquidAI LFM2, fast, efficient |
What's Next
In Part 2, we'll add speech-to-text capabilities using Whisper, including critical audio format handling that trips up most developers.
Resources
Questions? Open an issue on GitHub or reach out on Twitter/X.