The Llama provider runs GGUF models directly on-device in React Native applications using llama.rn. You can download and run any GGUF model from HuggingFace, gaining privacy, performance, and offline capability.
Install the Llama provider and its peer dependencies:
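For example, assuming npm as your package manager (the provider's published package name is a placeholder here; use the name from your registry), installation looks like:

```shell
# llama.rn supplies the native llama.cpp bindings (peer dependency)
npm install llama.rn

# iOS only: link the native pods after installing
npx pod-install
```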
While you can use the Llama provider standalone, we recommend pairing it with the Vercel AI SDK for a much better developer experience: unified APIs, streaming support, and advanced features. To use it with the AI SDK, you'll need AI SDK v5 and the required polyfills:
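The AI SDK's streaming APIs rely on web platform globals that React Native lacks. A polyfill file along these lines (adapted from the AI SDK's Expo guidance; the packages `@ungap/structured-clone` and `@stardazed/streams-text-encoding` are assumptions to verify against your setup) can be imported once at your app's entry point:

```typescript
// polyfills.ts -- register web globals the AI SDK expects.
import { Platform } from 'react-native';
import structuredClone from '@ungap/structured-clone';

if (Platform.OS !== 'web') {
  const setup = async () => {
    const { polyfillGlobal } = await import(
      'react-native/Libraries/Utilities/PolyfillFunctions'
    );
    const { TextEncoderStream, TextDecoderStream } = await import(
      '@stardazed/streams-text-encoding'
    );
    if (!('structuredClone' in global)) {
      polyfillGlobal('structuredClone', () => structuredClone);
    }
    polyfillGlobal('TextEncoderStream', () => TextEncoderStream);
    polyfillGlobal('TextDecoderStream', () => TextDecoderStream);
  };
  setup();
}

export {};
```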
If you use the Expo framework with CNG (Continuous Native Generation) builds, you will need expo-build-properties to configure the iOS and OpenCL (Android GPU) build features. Add the following to your app.json or app.config.js file:
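The plugin registration follows the standard expo-build-properties shape; the `ios` and `android` objects hold the build settings from the llama.rn Expo documentation (left empty here rather than guessed):

```json
{
  "expo": {
    "plugins": [
      ["expo-build-properties", { "ios": {}, "android": {} }]
    ]
  }
}
```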
For all other installation tips and tricks, refer to the llama.rn Expo documentation.
The Llama provider supports multiple model types:
| Model Type | Method | Use Case |
|---|---|---|
| Language Model | `llama.languageModel()` | Text generation, chat, reasoning |
| Embedding Model | `llama.textEmbeddingModel()` | Text embeddings for RAG, similarity |
| Speech Model | `llama.speechModel()` | Text-to-speech with vocoder |
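As a sketch, assuming a provider instance named `llama` (the import path below is a placeholder, not the real package name), each model type in the table takes a GGUF model ID:

```typescript
// Placeholder import path -- substitute the actual provider package.
import { llama } from '@example/llama-provider';

// Text generation, chat, reasoning
const chatModel = llama.languageModel(
  'ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf',
);

// Text embeddings for RAG or similarity search
// ('owner/repo/...' IDs below are generic placeholders)
const embedder = llama.textEmbeddingModel('owner/repo/embedding-model.gguf');

// Text-to-speech (paired with a vocoder per the table above)
const tts = llama.speechModel('owner/repo/tts-model.gguf');
```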
Import the Llama provider and use it with the AI SDK:
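A minimal sketch, again assuming the provider exposes a `llama` instance (the import path is a placeholder) and using the AI SDK's `generateText`:

```typescript
import { generateText } from 'ai';
// Placeholder import path -- use the actual provider package name.
import { llama } from '@example/llama-provider';

const { text } = await generateText({
  model: llama.languageModel(
    'ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf',
  ),
  prompt: 'Write a haiku about running models on-device.',
});

console.log(text);
```

`streamText` works the same way when you want token-by-token streaming instead of a single response.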
Models are identified using the HuggingFace format: `owner/repo/filename.gguf`
For example:
- `ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf`
- `Qwen/Qwen2.5-3B-Instruct-GGUF/qwen2.5-3b-instruct-q3_k_m.gguf`
- `lmstudio-community/gemma-2-2b-it-GGUF/gemma-2-2b-it-Q3_K_M.gguf`

You can find GGUF models on HuggingFace.
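The three-segment format splits mechanically if you need the parts separately (for example, to display the repo name in a model picker). This helper is purely illustrative and not part of the provider's API:

```typescript
// Split a HuggingFace-style GGUF model ID into owner, repo, and filename.
function parseModelId(id: string): {
  owner: string;
  repo: string;
  filename: string;
} {
  const parts = id.split('/');
  const filename = parts[parts.length - 1];
  if (parts.length < 3 || !filename.endsWith('.gguf')) {
    throw new Error(`Expected owner/repo/filename.gguf, got: ${id}`);
  }
  return {
    owner: parts[0],
    repo: parts[1],
    filename: parts.slice(2).join('/'),
  };
}

const id = parseModelId('ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf');
console.log(id.owner);    // "ggml-org"
console.log(id.repo);     // "SmolLM3-3B-GGUF"
console.log(id.filename); // "SmolLM3-Q4_K_M.gguf"
```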