Generating

You can generate responses using Llama models with the Vercel AI SDK's generateText or streamText functions.

Requirements

Models must be downloaded and prepared before use
Sufficient device storage for model files (typically 1-4GB per model depending on quantization)

Text Generation

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

// Download model - returns the file path
const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const result = await generateText({
  model,
  prompt: 'Explain quantum computing in simple terms',
})

console.log(result.text)

Streaming

Stream responses for real-time output:

import { llama, downloadModel } from '@react-native-ai/llama'
import { streamText } from 'ai'

// Download model - returns the file path
const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const { textStream } = streamText({
  model,
  prompt: 'Write a short story about a robot learning to paint',
})

for await (const delta of textStream) {
  console.log(delta)
}

Tool Calling

You can enable tool calling for both generateText and streamText by passing tools to the AI SDK.

Setup

Define tools using the AI SDK tool helper:

import { tool } from 'ai'
import { z } from 'zod'

const getWeather = tool({
  description: 'Get current weather information',
  inputSchema: z.object({
    city: z.string(),
  }),
  execute: async ({ city }) => {
    return `Weather in ${city}: Sunny, 25°C`
  },
})

Basic Tool Usage

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

const modelPath = await downloadModel('Qwen/Qwen2.5-3B-Instruct-GGUF/qwen2.5-3b-instruct-q3_k_m.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const result = await generateText({
  model,
  prompt: 'What is the weather in Paris?',
  tools: {
    getWeather,
  },
})

Multimodal (Vision & Audio)

The Llama provider supports multimodal models that can process images and audio. To enable multimodal capabilities, provide a projectorPath when creating the model:

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

const modelPath = await downloadModel('owner/repo/vision-model.gguf')

const model = llama.languageModel(modelPath, {
  projectorPath: '/path/to/mmproj-model.gguf',
})

await model.prepare()

// Use with images
const result = await generateText({
  model,
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What do you see in this image?' },
        {
          type: 'file',
          mediaType: 'image/jpeg',
          data: 'file:///path/to/image.jpg', // or base64 data URL
        },
      ],
    },
  ],
})

Supported Formats

Images: JPEG, PNG, BMP, GIF, TGA, HDR, PIC, PNM
Audio: WAV, MP3

Supported URL Patterns

file:// - Local file paths
data: - Base64 data URLs

Note: HTTP URLs are not yet supported. Use local files or base64 data URLs.

Reasoning Models

Models that support reasoning (like DeepSeek-R1) automatically handle <think> tags. The reasoning content is separated from the main response:

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

const modelPath = await downloadModel('owner/repo/deepseek-r1.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const result = await generateText({
  model,
  prompt: 'Solve this math problem step by step: 2x + 5 = 13',
})

// Access main response
console.log(result.text)

// Access reasoning content (if present)
console.log(result.reasoning)

When streaming, reasoning tokens are emitted separately via reasoning-start, reasoning-delta, and reasoning-end events.

JSON Mode

Generate structured JSON responses:

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateObject } from 'ai'
import { z } from 'zod'

const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const { object } = await generateObject({
  model,
  schema: z.object({
    name: z.string(),
    age: z.number(),
    hobbies: z.array(z.string()),
  }),
  prompt: 'Generate a fictional person profile',
})

console.log(object)
// { name: 'Alice', age: 28, hobbies: ['reading', 'hiking'] }

Available Options

Configure model behavior with generation options:

Option	Type	Description
`temperature`	number (0-1)	Controls randomness. Higher = more creative
`maxTokens`	number	Maximum tokens to generate
`topP`	number (0-1)	Nucleus sampling threshold
`topK`	number	Top-K sampling parameter
`presencePenalty`	number	Penalize tokens based on presence
`frequencyPenalty`	number	Penalize tokens based on frequency
`stopSequences`	string[]	Stop generation at these sequences
`seed`	number	Random seed for reproducibility

Example with all options:

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const result = await generateText({
  model,
  prompt: 'Write a creative story',
  temperature: 0.8,
  maxTokens: 500,
  topP: 0.9,
  topK: 40,
  presencePenalty: 0.5,
  frequencyPenalty: 0.5,
  stopSequences: ['THE END'],
  seed: 42,
})

Model Configuration Options

When creating a model instance, you can configure llama.rn specific options via contextParams:

const model = llama.languageModel(modelPath, {
  contextParams: {
    n_ctx: 4096, // Context size (default: 2048, or 4096 for multimodal)
    n_gpu_layers: 99, // Number of GPU layers (default: 99)
  },
})

For multimodal models:

const model = llama.languageModel(modelPath, {
  projectorPath: '/path/to/mmproj.gguf', // Required for multimodal
  projectorUseGpu: true, // Use GPU for multimodal (default: true)
  contextParams: {
    n_ctx: 4096,
    n_gpu_layers: 99,
  },
})

Need React or React Native expertise you can count on?

Let's talk Let's talk

Need help with React or React Native projects?

We support teams building scalable apps with React and React Native.

Let's talk Let's talk

#Generating

#Requirements

#Text Generation

#Streaming

#Tool Calling

#Setup

#Basic Tool Usage

#Multimodal (Vision & Audio)

#Supported Formats

#Supported URL Patterns

#Reasoning Models

#JSON Mode

#Available Options

#Model Configuration Options