This guide covers the complete lifecycle of MLC models: from discovery and download to cleanup and removal.
The package includes a prebuilt runtime optimized for the following models:
| Model ID | Size | Best For |
|---|---|---|
| Qwen2.5-0.5B-Instruct | ~600MB | Fast responses, basic conversations |
| Llama-3.2-1B-Instruct | ~1.2GB | Balanced performance and quality |
| Llama-3.2-3B-Instruct | ~2GB | High-quality responses, complex reasoning |
| Phi-3.5-mini-instruct | ~2.3GB | Code generation, technical tasks |
Note: These models use q4f16_1 quantization (4-bit weights, 16-bit activations) optimized for mobile devices. For other models, you'll need to build MLC from source (documentation coming soon).
Get the list of models included in the runtime:
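A minimal sketch of listing the bundled models. The import path `@react-native-ai/mlc`, the `getModels()` method, and the shape of the returned entries are assumptions — verify them against your installed package's exports:

```typescript
// Sketch only: the import path, getModels(), and the result shape
// are assumptions, not confirmed API — check your package version.
import { mlc } from '@react-native-ai/mlc';

const models = await mlc.getModels();
for (const model of models) {
  console.log(model.modelId); // e.g. "Llama-3.2-3B-Instruct"
}
```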
Create a model instance using the mlc.languageModel() method:
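For example, using one of the model IDs from the table above (the import path is an assumption; `languageModel()` is the method named in this guide):

```typescript
import { mlc } from '@react-native-ai/mlc'; // assumed import path

// Create an instance for one of the bundled models
const model = mlc.languageModel('Llama-3.2-3B-Instruct');
```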
Models need to be downloaded to the device before use.
You can track download progress:
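A hedged sketch of downloading a model while reporting progress. The `downloadModel()` name, the options object, and the progress payload shape are all assumptions — adapt them to the actual API of your package version:

```typescript
import { mlc } from '@react-native-ai/mlc'; // assumed import path

// downloadModel() and the progress callback shape are assumptions.
await mlc.downloadModel('Llama-3.2-3B-Instruct', {
  onProgress: (progress: { percentage: number }) => {
    console.log(`Downloading: ${progress.percentage.toFixed(1)}%`);
  },
});
```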
After downloading, prepare the model for inference:
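A sketch of the preparation step; `prepareModel()` is an assumed method name for loading the downloaded weights into memory:

```typescript
import { mlc } from '@react-native-ai/mlc'; // assumed import path

// prepareModel() is an assumed name for the step that loads the
// weights and readies the runtime for inference on this model.
await mlc.prepareModel('Llama-3.2-3B-Instruct');
```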
Once prepared, use the model with AI SDK functions:
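For instance, with the AI SDK's `generateText` (the `mlc` import path is an assumption; `streamText` works the same way for token streaming):

```typescript
import { generateText } from 'ai';
import { mlc } from '@react-native-ai/mlc'; // assumed import path

// Run a prompt through the prepared on-device model
const { text } = await generateText({
  model: mlc.languageModel('Llama-3.2-3B-Instruct'),
  prompt: 'Summarize 4-bit quantization in two sentences.',
});
console.log(text);
```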
Unload the current model from memory to free resources:
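A sketch of unloading; `unloadModel()` is an assumed method name:

```typescript
import { mlc } from '@react-native-ai/mlc'; // assumed import path

// unloadModel() is an assumed name; unloading frees memory but keeps
// the downloaded weights on disk, so preparing again later is cheap.
await mlc.unloadModel();
```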
Delete downloaded model files to free storage:
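A sketch of removing downloaded weights; `deleteModel()` is an assumed method name:

```typescript
import { mlc } from '@react-native-ai/mlc'; // assumed import path

// deleteModel() is an assumed name; this removes the weight files
// from storage, so the model must be re-downloaded before next use.
await mlc.deleteModel('Llama-3.2-1B-Instruct');
```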