Getting Started

The Llama provider enables you to run GGUF models directly on-device in React Native applications using llama.rn. You can download any GGUF model from HuggingFace and run it locally for privacy, performance, and offline capability.

Installation

Install the Llama provider and its peer dependencies:

npm install @react-native-ai/llama llama.rn

While you can use the Llama provider standalone, we recommend using it with the Vercel AI SDK for a much better developer experience: the AI SDK provides unified APIs, streaming support, and advanced features. To use it with the AI SDK, you'll need v5 and its required polyfills:

npm install ai

Requirements

  • React Native >= 0.76.0 - Required for native module functionality
  • llama.rn >= 0.10.0 - The underlying llama.cpp bindings

Expo Setup

For use with the Expo framework and CNG (prebuild) workflows, add the llama.rn config plugin to configure iOS entitlements and OpenCL support. Add the following to your app.json or app.config.js file:

module.exports = {
  expo: {
    // ...
    plugins: [
      // ...
      [
        'llama.rn',
        // optional fields, below are the default values
        {
          enableEntitlements: true,
          entitlementsProfile: 'production',
          forceCxx20: true,
          enableOpenCL: true,
        },
      ],
    ],
  },
}
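
If you use a static app.json rather than app.config.js, the same plugin entry (with the same default option values shown above) can be written as JSON:

```json
{
  "expo": {
    "plugins": [
      [
        "llama.rn",
        {
          "enableEntitlements": true,
          "entitlementsProfile": "production",
          "forceCxx20": true,
          "enableOpenCL": true
        }
      ]
    ]
  }
}
```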

For all other installation tips and tricks, refer to the llama.rn Expo documentation.

Available Model Types

The Llama provider supports multiple model types:

  • Language Model: llama.languageModel() - text generation, chat, reasoning
  • Embedding Model: llama.textEmbeddingModel() - text embeddings for RAG, similarity
  • Speech Model: llama.speechModel() - text-to-speech with vocoder

Basic Usage

Import the Llama provider and use it with the AI SDK:

import { llama, downloadModel } from '@react-native-ai/llama'
import { streamText } from 'ai'

// Download model from HuggingFace - returns the file path
const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

// Create model instance with the path
const model = llama.languageModel(modelPath)

// Initialize model (loads into memory)
await model.prepare()

const { textStream } = streamText({
  model,
  prompt: 'Explain quantum computing in simple terms',
})

for await (const delta of textStream) {
  console.log(delta)
}

// Cleanup when done
await model.unload()
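
Because prepare() loads the model into memory and unload() releases it, it's worth guaranteeing cleanup even when generation throws. Below is a minimal sketch of a try/finally wrapper. The PreparableModel interface and withModel helper are hypothetical illustrations, not part of the library; the real model object comes from llama.languageModel():

```typescript
// Hypothetical minimal shape of the lifecycle methods used below.
// The real object is created with llama.languageModel(modelPath).
interface PreparableModel {
  prepare(): Promise<void>
  unload(): Promise<void>
}

// Prepare the model, run `fn`, and always unload, even if `fn` throws.
async function withModel<T>(
  model: PreparableModel,
  fn: (m: PreparableModel) => Promise<T>
): Promise<T> {
  await model.prepare()
  try {
    return await fn(model)
  } finally {
    await model.unload()
  }
}
```

This keeps memory management in one place instead of scattering unload() calls across early returns and error paths.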

Model ID Format

Models are identified using the HuggingFace format: owner/repo/filename.gguf

For example:

  • ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf
  • Qwen/Qwen2.5-3B-Instruct-GGUF/qwen2.5-3b-instruct-q3_k_m.gguf
  • lmstudio-community/gemma-2-2b-it-GGUF/gemma-2-2b-it-Q3_K_M.gguf

You can find GGUF models on HuggingFace.
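
If you need to validate or split a model ID before passing it to downloadModel(), the three-part format above is easy to parse. The parseModelId helper below is a hypothetical sketch, not part of the library:

```typescript
// Hypothetical helper: split a HuggingFace-style model ID
// (owner/repo/filename.gguf) into its three parts.
function parseModelId(id: string): { owner: string; repo: string; filename: string } {
  const parts = id.split('/')
  if (parts.length !== 3 || !parts[2].endsWith('.gguf')) {
    throw new Error(`Invalid model ID: ${id}`)
  }
  const [owner, repo, filename] = parts
  return { owner, repo, filename }
}
```

For example, parseModelId('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf') yields owner 'ggml-org', repo 'SmolLM3-3B-GGUF', and filename 'SmolLM3-Q4_K_M.gguf'.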

Next Steps

  • Model Management - Complete guide to model lifecycle, downloading, and API reference
  • Generating - Learn how to generate text, use multimodal inputs, and stream responses
  • Embeddings - Generate text embeddings for RAG and similarity search
