Generating

You can generate responses using Llama models with the Vercel AI SDK's generateText or streamText functions.

Requirements

  • Models must be downloaded and prepared before use
  • Sufficient device storage for model files (typically 1-4GB per model depending on quantization)

Text Generation

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

// Download model - returns the file path
const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const result = await generateText({
  model,
  prompt: 'Explain quantum computing in simple terms',
})

console.log(result.text)

Streaming

Stream responses for real-time output:

import { llama, downloadModel } from '@react-native-ai/llama'
import { streamText } from 'ai'

// Download model - returns the file path
const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const { textStream } = streamText({
  model,
  prompt: 'Write a short story about a robot learning to paint',
})

for await (const delta of textStream) {
  console.log(delta)
}
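In a UI you will typically accumulate the deltas as they arrive and re-render on each update. A minimal sketch of that pattern; the onUpdate callback is a stand-in for something like a React state setter:

```typescript
// Accumulate streamed deltas, reporting each intermediate string
// (e.g. to a React state setter) so the UI can render partial output.
async function collectStream(
  stream: AsyncIterable<string>,
  onUpdate: (text: string) => void,
): Promise<string> {
  let text = ''
  for await (const delta of stream) {
    text += delta
    onUpdate(text) // e.g. setText(text) inside a component
  }
  return text
}
```

With the example above this would be called as collectStream(textStream, setText), resolving to the full generated text once the stream ends.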

Tool Calling

You can enable tool calling for both generateText and streamText by passing tools to the AI SDK.

Setup

Define tools using the AI SDK tool helper:

import { tool } from 'ai'
import { z } from 'zod'

const getWeather = tool({
  description: 'Get current weather information',
  inputSchema: z.object({
    city: z.string(),
  }),
  execute: async ({ city }) => {
    return `Weather in ${city}: Sunny, 25°C`
  },
})

Basic Tool Usage

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

const modelPath = await downloadModel('Qwen/Qwen2.5-3B-Instruct-GGUF/qwen2.5-3b-instruct-q3_k_m.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const result = await generateText({
  model,
  prompt: 'What is the weather in Paris?',
  tools: {
    getWeather,
  },
})
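Under the hood, the SDK matches each tool call emitted by the model against the tools map and runs its execute function with the parsed input. A conceptual sketch of that dispatch (not the SDK's actual internals):

```typescript
// Minimal tool registry and dispatcher, mirroring how a model-emitted
// tool call (a tool name plus parsed input) is routed to its handler.
type ToolHandler = { execute: (input: any) => Promise<string> }

async function runToolCall(
  tools: Record<string, ToolHandler>,
  call: { toolName: string; input: unknown },
): Promise<string> {
  const handler = tools[call.toolName]
  if (!handler) throw new Error(`Unknown tool: ${call.toolName}`)
  return handler.execute(call.input)
}
```

The value returned by execute is fed back to the model, which uses it to produce the final text answer.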

Multimodal (Vision & Audio)

The Llama provider supports multimodal models that can process images and audio. To enable multimodal capabilities, provide a projectorPath when creating the model:

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

const modelPath = await downloadModel('owner/repo/vision-model.gguf')

const model = llama.languageModel(modelPath, {
  projectorPath: '/path/to/mmproj-model.gguf',
})

await model.prepare()

// Use with images
const result = await generateText({
  model,
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What do you see in this image?' },
        {
          type: 'file',
          mediaType: 'image/jpeg',
          data: 'file:///path/to/image.jpg', // or base64 data URL
        },
      ],
    },
  ],
})

Supported Formats

  • Images: JPEG, PNG, BMP, GIF, TGA, HDR, PIC, PNM
  • Audio: WAV, MP3

Supported URL Patterns

  • file:// - Local file paths
  • data: - Base64 data URLs

Note: HTTP URLs are not yet supported. Use local files or base64 data URLs.
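If an image or audio clip is already in memory as base64 (for example from a camera or file-system library), you can wrap it in a data URL yourself. A small helper, assuming the base64 string does not already carry a data: prefix:

```typescript
// Wrap raw base64 content in a data URL of the given media type,
// matching the `data:` pattern the provider accepts.
function toDataUrl(mediaType: string, base64: string): string {
  return `data:${mediaType};base64,${base64}`
}
```

Pass the result as the data field of a file part, e.g. data: toDataUrl('image/jpeg', base64Jpeg).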

Reasoning Models

Models that support reasoning (like DeepSeek-R1) automatically handle <think> tags. The reasoning content is separated from the main response:

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

const modelPath = await downloadModel('owner/repo/deepseek-r1.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const result = await generateText({
  model,
  prompt: 'Solve this math problem step by step: 2x + 5 = 13',
})

// Access main response
console.log(result.text)

// Access reasoning content (if present)
console.log(result.reasoning)

When streaming, reasoning tokens are emitted separately via reasoning-start, reasoning-delta, and reasoning-end events.
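To consume those events, iterate fullStream instead of textStream and branch on each part's type. A sketch, assuming AI SDK v5 part shapes where both reasoning and text deltas carry a text field:

```typescript
// Split a full stream of parts into reasoning and answer text by
// branching on the part type of each delta.
type StreamPart = { type: string; text?: string }

async function splitReasoning(
  parts: AsyncIterable<StreamPart>,
): Promise<{ reasoning: string; text: string }> {
  let reasoning = ''
  let text = ''
  for await (const part of parts) {
    if (part.type === 'reasoning-delta') reasoning += part.text ?? ''
    else if (part.type === 'text-delta') text += part.text ?? ''
    // 'reasoning-start' / 'reasoning-end' mark the reasoning boundaries
  }
  return { reasoning, text }
}
```

With a streamText result this would be awaited as splitReasoning(result.fullStream), yielding the reasoning trace and the final answer separately.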

JSON Mode

Generate structured JSON responses:

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateObject } from 'ai'
import { z } from 'zod'

const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const { object } = await generateObject({
  model,
  schema: z.object({
    name: z.string(),
    age: z.number(),
    hobbies: z.array(z.string()),
  }),
  prompt: 'Generate a fictional person profile',
})

console.log(object)
// Example output: { name: 'Alice', age: 28, hobbies: ['reading', 'hiking'] }

Available Options

Configure model behavior with generation options:

  • temperature (number, 0-1): Controls randomness; higher values produce more creative output
  • maxTokens (number): Maximum number of tokens to generate
  • topP (number, 0-1): Nucleus sampling threshold
  • topK (number): Top-K sampling parameter
  • presencePenalty (number): Penalizes tokens that have already appeared in the output
  • frequencyPenalty (number): Penalizes tokens in proportion to how often they have appeared
  • stopSequences (string[]): Stops generation when any of these sequences is produced
  • seed (number): Random seed for reproducible output

Example with all options:

import { llama, downloadModel } from '@react-native-ai/llama'
import { generateText } from 'ai'

const modelPath = await downloadModel('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf')

const model = llama.languageModel(modelPath)
await model.prepare()

const result = await generateText({
  model,
  prompt: 'Write a creative story',
  temperature: 0.8,
  maxTokens: 500,
  topP: 0.9,
  topK: 40,
  presencePenalty: 0.5,
  frequencyPenalty: 0.5,
  stopSequences: ['THE END'],
  seed: 42,
})

Model Configuration Options

When creating a model instance, you can configure llama.rn specific options via contextParams:

const model = llama.languageModel(modelPath, {
  contextParams: {
    n_ctx: 4096, // Context size (default: 2048, or 4096 for multimodal)
    n_gpu_layers: 99, // Number of GPU layers (default: 99)
  },
})

For multimodal models:

const model = llama.languageModel(modelPath, {
  projectorPath: '/path/to/mmproj.gguf', // Required for multimodal
  projectorUseGpu: true, // Use GPU for multimodal (default: true)
  contextParams: {
    n_ctx: 4096,
    n_gpu_layers: 99,
  },
})
