This guide covers the complete lifecycle of Llama models - from discovery and download to cleanup and removal.
Unlike MLC, which ships with a prebuilt set of models, the Llama provider can run any GGUF model from HuggingFace. You can browse available models at HuggingFace GGUF Models.
Here are some popular models that work well on mobile devices:
| Model ID | Size | Best For |
|---|---|---|
| ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf | ~1.8GB | Balanced performance and quality |
| Qwen/Qwen2.5-3B-Instruct-GGUF/qwen2.5-3b-instruct-q3_k_m.gguf | ~1.9GB | General conversations |
| lmstudio-community/gemma-2-2b-it-GGUF/gemma-2-2b-it-Q3_K_M.gguf | ~2.3GB | High quality responses |
Note: When selecting models, consider the quantization level (Q3, Q4, Q5, etc.). Lower quantization means a smaller file but potentially lower output quality. Q4_K_M is a good balance for mobile.
Get the list of models that have been downloaded to the device:
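A minimal sketch, assuming the import path `@react-native-ai/llama` and that the `LlamaEngine` methods listed in the API reference below are exposed on the same `llama` export and return promises (all of these are assumptions):

```ts
// Import path is an assumption - adjust to your provider package
import { llama } from '@react-native-ai/llama';

// Lists the GGUF models already present in local storage
const models = await llama.getModels();
console.log(models);
```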
Create a model instance using the llama.languageModel() method:
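For example, using a model ID from the table above (the import path is an assumption):

```ts
import { llama } from '@react-native-ai/llama'; // path assumed

const model = llama.languageModel(
  'ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf'
);
```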
With configuration options:
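A sketch using the options documented in the API reference (`n_ctx`, `n_gpu_layers`); the values shown are illustrative:

```ts
const model = llama.languageModel(
  'ggml-org/SmolLM3-3B-GGUF/SmolLM3-Q4_K_M.gguf',
  {
    n_ctx: 4096,      // context size (default: 2048)
    n_gpu_layers: 99, // number of layers offloaded to the GPU (default: 99)
  }
);
```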
Check if a model is already downloaded:
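Either the engine-level or the model-level check works; this sketch uses the model instance and assumes the method returns a promise:

```ts
const downloaded = await model.isDownloaded();
if (!downloaded) {
  // Files are not on the device yet - trigger a download before preparing
}
```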
Models are downloaded from HuggingFace automatically:
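A minimal sketch of an explicit download, assuming `download()` resolves once the files are fully written:

```ts
// Fetches the GGUF file from HuggingFace into the storage directory
await model.download();
```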
You can track download progress:
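The exact shape of the progress callback argument is an assumption; here it is treated as a completion percentage:

```ts
await model.download((progress) => {
  // Assumed to be a number between 0 and 100
  console.log(`Downloaded: ${progress}%`);
});
```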
After downloading, prepare the model for inference (loads it into memory):
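For example:

```ts
// Loads the model weights into memory so it is ready for inference
await model.prepare();
```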
Once prepared, use the model with AI SDK functions:
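For example, with `generateText` from the AI SDK (`ai` package):

```ts
import { generateText } from 'ai';

const { text } = await generateText({
  model,
  prompt: 'Summarize what a GGUF file is in one sentence.',
});
console.log(text);
```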
Unload the model from memory to free resources:
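For example:

```ts
// Releases the weights from memory; the files stay on disk
await model.unload();
```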
Delete downloaded model files to free storage:
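For example:

```ts
// Removes the downloaded GGUF file from storage
await model.remove();
```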
By default, models are stored in ${DocumentDir}/llama-models/. You can customize this:
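A sketch using `setStoragePath` from the API reference below; the path value is illustrative, and treating the call as synchronous is an assumption:

```ts
// Point the engine at a custom directory before downloading models
llama.setStoragePath('/absolute/path/to/my-models');
```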
`llama.languageModel(modelId, options?)`

Creates a language model instance.

- `modelId`: Model identifier in format `owner/repo/filename.gguf`
- `options`:
  - `n_ctx`: Context size (default: 2048)
  - `n_gpu_layers`: Number of GPU layers (default: 99)
  - `contextParams`: Additional llama.rn context parameters

`LlamaEngine`

- `getModels()`: Get list of downloaded models
- `isDownloaded(modelId)`: Check if a model is downloaded
- `setStoragePath(path)`: Set custom storage directory

Model instance methods

- `download(progressCallback?)`: Download model from HuggingFace
- `isDownloaded()`: Check if this model is downloaded
- `prepare()`: Initialize/load model into memory
- `unload()`: Release model from memory
- `remove()`: Delete model from disk