January 20, 2026

RunAnywhere Swift SDK Part 1: Chat with LLMs On-Device

Run LLMs Entirely On-Device on iOS


This is Part 1 of our RunAnywhere Swift SDK tutorial series:

  1. Chat with LLMs (this post) — Project setup and streaming text generation
  2. Speech-to-Text — Real-time transcription with Whisper
  3. Text-to-Speech — Natural voice synthesis with Piper
  4. Voice Pipeline — Full voice assistant with VAD

The promise of on-device AI is compelling: instant responses, complete privacy, and no API costs. But implementing it has traditionally been challenging—managing model formats, optimizing for mobile hardware, and building reliable interfaces.

RunAnywhere changes this. It's a production-ready SDK that brings powerful AI capabilities to iOS with a simple, unified API. In this tutorial, we'll set up the SDK and build a streaming chat interface with LLMs.

Why On-Device AI?

| Aspect  | Cloud AI             | On-Device AI              |
|---------|----------------------|---------------------------|
| Privacy | Data sent to servers | Data stays on device      |
| Latency | Network round-trip   | Instant local processing  |
| Offline | Requires internet    | Works anywhere            |
| Cost    | Per-request billing  | One-time download         |

For applications handling sensitive data—health, finance, personal conversations—on-device processing isn't just a feature, it's a requirement.

Prerequisites

  • Xcode 15.0+ with Swift 5.9
  • iOS 17.0+ physical device (required for Parts 2-4, which use the microphone; the simulator works for this part)
  • Basic familiarity with SwiftUI
  • ~250MB storage for the LLM model (Parts 2-4 add ~140MB more)

Project Setup

1. Create a New Xcode Project

Create a new iOS App with the SwiftUI interface. Use the following settings:

  • Product Name: LocalAIPlayground (or your preferred name)
  • Interface: SwiftUI
  • Language: Swift
  • Storage: None
Xcode project options dialog

2. Add the RunAnywhere SDK

Add via Swift Package Manager:

  1. File → Add Package Dependencies...
  2. Enter: https://github.com/RunanywhereAI/runanywhere-sdks
  3. For Dependency Rule, select Up to Next Major Version and enter 0.17.4
  4. Add these products:
    • RunAnywhere — Core SDK
    • RunAnywhereLlamaCPP — LLM backend
    • RunAnywhereONNX — STT/TTS/VAD backend
Swift Package Manager adding RunAnywhere SDK

After adding the package, verify the dependency is configured correctly in your project's Package Dependencies tab:

Package Dependencies showing version-based dependency rule

The dependency should show:

  • Name: runanywhere-sdks
  • Location: https://github.com/RunanywhereAI/runanywhere-sdks
  • Dependency Rule: Up to Next Major Version — 0.17.4 < 1.0.0
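
If you manage dependencies in a Package.swift manifest instead of through the Xcode UI (for example, in a shared local package), the equivalent declaration looks like the sketch below. The product and package names match what the Xcode flow above adds; treat the exact target layout as an assumption and adapt it to your project.

swift
// swift-tools-version: 5.9
// A minimal manifest sketch mirroring the Xcode setup above.
import PackageDescription

let package = Package(
    name: "LocalAIPlayground",
    platforms: [.iOS(.v17)],
    dependencies: [
        // Same dependency rule as in Xcode: up to next major version from 0.17.4
        .package(url: "https://github.com/RunanywhereAI/runanywhere-sdks", from: "0.17.4")
    ],
    targets: [
        .target(
            name: "LocalAIPlayground",
            dependencies: [
                .product(name: "RunAnywhere", package: "runanywhere-sdks"),
                .product(name: "RunAnywhereLlamaCPP", package: "runanywhere-sdks"),
                .product(name: "RunAnywhereONNX", package: "runanywhere-sdks")
            ]
        )
    ]
)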

3. Configure Info.plist

Add the required Info.plist entries (the microphone permission is needed for later tutorials in this series):

xml
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to transcribe your speech using on-device AI.</string>

<key>CFBundleIdentifier</key>
<string>$(PRODUCT_BUNDLE_IDENTIFIER)</string>

<key>UIBackgroundModes</key>
<array>
    <string>audio</string>
</array>
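
As a quick sanity check, you can confirm at runtime that the microphone usage description actually shipped in the built app. This uses only standard Foundation APIs and is optional; nothing in the SDK requires it:

swift
import Foundation

// Optional: verify the usage description made it into the compiled Info.plist.
let micDescription = Bundle.main.object(
    forInfoDictionaryKey: "NSMicrophoneUsageDescription"
) as? String
assert(micDescription != nil, "Missing NSMicrophoneUsageDescription in Info.plist")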

SDK Initialization

The SDK requires a specific initialization order:

swift
import SwiftUI
import RunAnywhere
import RunAnywhereLlamaCPP
import RunAnywhereONNX

@main
struct LocalAIPlaygroundApp: App {
    @State private var isSDKInitialized = false
    @State private var initError: String?

    var body: some Scene {
        WindowGroup {
            Group {
                if isSDKInitialized {
                    ContentView()
                } else {
                    ProgressView("Initializing AI...")
                }
            }
            .task { await initializeSDK() }
        }
    }

    @MainActor
    private func initializeSDK() async {
        do {
            // Step 1: Initialize core SDK
            try RunAnywhere.initialize(environment: .development)

            // Step 2: Register backends BEFORE registering models
            LlamaCPP.register()  // For LLM text generation
            ONNX.register()      // For STT, TTS, VAD (used in later tutorials)

            // Step 3: Register the LLM model
            RunAnywhere.registerModel(
                id: "lfm2-350m-q4_k_m",
                name: "LiquidAI LFM2 350M",
                url: URL(string: "https://huggingface.co/LiquidAI/LFM2-350M-GGUF/resolve/main/LFM2-350M-Q4_K_M.gguf")!,
                framework: .llamaCpp,
                memoryRequirement: 250_000_000
            )

            print("✅ RunAnywhere SDK v\(RunAnywhere.version) initialized")
            isSDKInitialized = true
        } catch {
            print("❌ SDK init failed: \(error)")
            initError = error.localizedDescription
            isSDKInitialized = true  // Show UI with error state
        }
    }
}
App initializing SDK on launch
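
The snippet above stores any failure in initError but never displays it. A minimal ContentView that surfaces the error might look like the following sketch; passing the error down as a plain optional String is our own wiring choice, not something the SDK prescribes:

swift
import SwiftUI

// A sketch of an error-aware ContentView. The optional initError
// property is our own convention for passing the failure down.
struct ContentView: View {
    var initError: String? = nil

    var body: some View {
        if let initError {
            VStack(spacing: 12) {
                Image(systemName: "exclamationmark.triangle")
                Text("AI initialization failed")
                    .font(.headline)
                Text(initError)
                    .font(.caption)
                    .foregroundStyle(.secondary)
            }
            .padding()
        } else {
            Text("Ready to chat!") // Replaced by the chat UI later in this post
        }
    }
}

If you adopt this shape, pass the value through in the WindowGroup above: ContentView(initError: initError).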

Architecture Overview

text
┌─────────────────────────────────────────────────────┐
│                  RunAnywhere Core                   │
│           (Unified API, Model Management)           │
├───────────────────────┬─────────────────────────────┤
│   LlamaCPP Backend    │        ONNX Backend         │
│   ─────────────────   │      ─────────────────      │
│ • Text Generation     │ • Speech-to-Text            │
│ • Chat Completion     │ • Text-to-Speech            │
│ • Streaming           │ • Voice Activity (VAD)      │
└───────────────────────┴─────────────────────────────┘

Downloading & Loading Models

After registering a model (as shown in the initialization code above), you need to download it once, then load it into memory:

swift
// Check if already downloaded (avoids re-downloading)
let isDownloaded = await RunAnywhere.isModelDownloaded("lfm2-350m-q4_k_m")

if !isDownloaded {
    // Download with progress tracking (~250MB)
    let progressStream = try await RunAnywhere.downloadModel("lfm2-350m-q4_k_m")
    for await progress in progressStream {
        print("Download: \(Int(progress.overallProgress * 100))%")
        if progress.stage == .completed { break }
    }
}

// Load into memory (required before generating text)
try await RunAnywhere.loadModel("lfm2-350m-q4_k_m")

// Verify it's loaded
let isLoaded = await RunAnywhere.isModelLoaded
print("Model loaded: \(isLoaded)")

Note: Only one LLM model can be loaded at a time. Loading a different model automatically unloads the current one. The SDK uses loadModel() for LLMs, but Parts 2-3 will introduce loadSTTModel() and loadTTSVoice() for speech models—these use separate memory pools and can be loaded simultaneously.

Model download progress showing 77% complete
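
To show that progress in the UI rather than the console, you can drive a SwiftUI ProgressView from the same stream. The sketch below wraps the download-then-load flow in an observable object; the type and property names are illustrative, and only the RunAnywhere calls come from the snippet above:

swift
import SwiftUI
import RunAnywhere

// Illustrative wrapper around the download/load flow shown above.
@MainActor
final class ModelDownloader: ObservableObject {
    @Published var progress: Double = 0
    @Published var isReady = false

    func prepareModel(id: String) async throws {
        if !(await RunAnywhere.isModelDownloaded(id)) {
            let stream = try await RunAnywhere.downloadModel(id)
            for await update in stream {
                progress = update.overallProgress
                if update.stage == .completed { break }
            }
        }
        try await RunAnywhere.loadModel(id)
        isReady = true
    }
}

struct ModelSetupView: View {
    @StateObject private var downloader = ModelDownloader()

    var body: some View {
        Group {
            if downloader.isReady {
                Text("Model ready")
            } else {
                ProgressView("Downloading model…", value: downloader.progress)
                    .padding()
            }
        }
        .task {
            try? await downloader.prepareModel(id: "lfm2-350m-q4_k_m")
        }
    }
}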

Streaming Text Generation

Now for the fun part—generating text with your on-device LLM:

swift
func generateResponse(to prompt: String) async throws -> String {
    let options = LLMGenerationOptions(
        maxTokens: 256,
        temperature: 0.7
    )

    let result = try await RunAnywhere.generateStream(prompt, options: options)

    var fullResponse = ""
    for try await token in result.stream {
        fullResponse += token
        // Update UI with each token
        await MainActor.run {
            self.responseText = fullResponse
        }
    }

    // Get metrics
    let metrics = try await result.result.value
    print("Speed: \(metrics.tokensPerSecond) tok/s")

    return fullResponse
}
Chat interface with streaming response
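
Here is one way to wire generateResponse(to:) into a simple SwiftUI chat screen. The layout and state handling are our own sketch; the generation logic is exactly the streaming code above, folded in so the example is self-contained:

swift
import SwiftUI
import RunAnywhere

// A minimal chat screen around the streaming generation shown above.
struct ChatView: View {
    @State private var prompt = ""
    @State private var responseText = ""
    @State private var isGenerating = false

    var body: some View {
        VStack(spacing: 16) {
            ScrollView {
                Text(responseText)
                    .frame(maxWidth: .infinity, alignment: .leading)
            }

            HStack {
                TextField("Ask anything…", text: $prompt)
                    .textFieldStyle(.roundedBorder)
                Button("Send") {
                    let input = prompt
                    prompt = ""
                    isGenerating = true
                    Task {
                        _ = try? await generateResponse(to: input)
                        isGenerating = false
                    }
                }
                .disabled(prompt.isEmpty || isGenerating)
            }
        }
        .padding()
    }

    // Same streaming flow as above, updating responseText token by token.
    private func generateResponse(to prompt: String) async throws -> String {
        let options = LLMGenerationOptions(maxTokens: 256, temperature: 0.7)
        let result = try await RunAnywhere.generateStream(prompt, options: options)

        var fullResponse = ""
        for try await token in result.stream {
            fullResponse += token
            await MainActor.run { responseText = fullResponse }
        }
        return fullResponse
    }
}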

Models Reference

| Model ID         | Size   | Notes                          |
|------------------|--------|--------------------------------|
| lfm2-350m-q4_k_m | ~250MB | LiquidAI LFM2, fast, efficient |

What's Next

In Part 2, we'll add speech-to-text capabilities using Whisper, including critical audio format handling that trips up most developers.


Questions? Open an issue on GitHub or reach out on Twitter/X.
