Convert speech to text using Apple's on-device SpeechAnalyzer and SpeechTranscriber.
This provider uses Apple's SpeechAnalyzer and SpeechTranscriber to perform speech-to-text transcription entirely on-device. They are backed by Apple's new advanced speech recognition model and are available in iOS 26 and later.
The audio parameter accepts either an ArrayBuffer or a base64-encoded string.
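To illustrate what accepting both input shapes involves, here is a minimal sketch that normalizes either form into raw bytes. The names `AudioInput`, `decodeBase64`, and `toAudioBytes` are hypothetical helpers written for this example; they are not part of the provider's API, and the provider may handle the conversion differently.

```typescript
// Hypothetical helper types and functions for illustration only.
type AudioInput = ArrayBuffer | string

const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Decode a standard base64 string into bytes without relying on atob or Buffer.
function decodeBase64(input: string): Uint8Array {
  const clean = input.replace(/=+$/, '') // drop trailing padding
  const out = new Uint8Array(Math.floor((clean.length * 6) / 8))
  let buffer = 0 // accumulates bits not yet emitted
  let bits = 0 // number of valid bits currently held in buffer
  let index = 0
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i))
    if (value < 0) {
      throw new Error('Invalid base64 character at position ' + i)
    }
    buffer = (buffer << 6) | value
    bits += 6
    if (bits >= 8) {
      bits -= 8
      out[index++] = (buffer >> bits) & 0xff
      buffer &= (1 << bits) - 1 // keep only the bits that were not emitted
    }
  }
  return out
}

// Accept either an ArrayBuffer or a base64-encoded string and return raw bytes.
function toAudioBytes(audio: AudioInput): Uint8Array {
  if (typeof audio !== 'string') {
    return new Uint8Array(audio)
  }
  return decodeBase64(audio)
}
```

In practice you would pass your `ArrayBuffer` or base64 string directly as the audio parameter; the sketch only shows that both forms carry the same bytes.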
The API does not currently support streaming or live transcription. Support would be relatively easy to add; please let us know on GitHub if you need it.
The transcription model supports multiple languages with automatic language detection. You can configure a custom language when creating the model:
By default, the transcription model uses the device language.
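The fallback rule can be sketched as a small pure function. `TranscriptionOptions` and `resolveLanguage` are hypothetical names chosen for this example, and the device language is passed in as a plain argument rather than read from the system; the provider's actual option names and lookup may differ.

```typescript
// Hypothetical option shape for illustration; not the provider's real type.
interface TranscriptionOptions {
  language?: string
}

// Prefer an explicitly configured language, otherwise fall back to the device language.
function resolveLanguage(
  options: TranscriptionOptions,
  deviceLanguage: string
): string {
  const explicit = options.language ? options.language.trim() : ''
  return explicit.length > 0 ? explicit : deviceLanguage
}
```

For example, `resolveLanguage({ language: 'fr' }, 'en')` returns `'fr'`, while `resolveLanguage({}, 'en')` returns `'en'`.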
Apple's SpeechAnalyzer requires downloading language-specific assets to the device. While the provider automatically prepares assets when needed, you can call prepare() ahead of time for better performance:
Calling prepare() ahead of time is recommended to avoid delays on first use. If not called, the model will auto-prepare when first used, but a warning will be logged.
When you call prepare(), the system first checks if the required assets are already present on the device. If they are, the method resolves immediately without any network activity.
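The check-then-download behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not the provider's implementation: `AssetBackend` and `createAssetPreparer` are hypothetical names standing in for the native asset layer, and sharing one in-flight promise between concurrent calls is an extra design choice of this sketch rather than documented behavior.

```typescript
// Hypothetical interface standing in for the native asset layer.
interface AssetBackend {
  isInstalled(language: string): Promise<boolean>
  download(language: string): Promise<void>
}

// Returns a prepare(language) function that downloads assets only when they are missing.
function createAssetPreparer(backend: AssetBackend) {
  // Concurrent calls for the same language share a single in-flight promise.
  const inFlight = new Map<string, Promise<void>>()

  async function run(language: string): Promise<void> {
    try {
      // Fast path: assets already on the device, so no network activity.
      if (await backend.isInstalled(language)) return
      await backend.download(language)
    } finally {
      inFlight.delete(language)
    }
  }

  return function prepare(language: string): Promise<void> {
    let task = inFlight.get(language)
    if (!task) {
      task = run(language)
      inFlight.set(language, task)
    }
    return task
  }
}
```

Calling the returned function early, for example at app startup, moves any download out of the first transcription request, which is the motivation for calling prepare() ahead of time.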
All language models and assets are stored in Apple's system-wide asset catalog, separate from your app bundle, so they have no impact on your app's size. Assets may already be available if the user has previously used other apps that rely on them, or if system features have requested them.
For advanced use cases, you can access the speech transcription API directly:
Performance comparison showing transcription speed for a 34-minute audio file (source):
| System | Processing Time | Performance |
|---|---|---|
| Apple SpeechAnalyzer | 45 seconds | Baseline |
| MacWhisper Large V3 Turbo | 1 minute 41 seconds | 2.2× slower |
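
As a quick check of the arithmetic behind the table, the sketch below recomputes the relative slowdown and adds a real-time factor (audio duration divided by processing time). The real-time factor is a derived figure that does not appear in the original comparison, and the helper name `ratio` is only for illustration.

```typescript
// Figures taken from the table above, converted to seconds.
const audioSeconds = 34 * 60 // 34-minute file
const speechAnalyzerSeconds = 45
const macWhisperSeconds = 1 * 60 + 41 // 1 minute 41 seconds

// Divide two durations, guarding against a non-positive denominator.
function ratio(numerator: number, denominator: number): number {
  if (denominator <= 0) {
    throw new RangeError('denominator must be positive')
  }
  return numerator / denominator
}

// How much longer MacWhisper takes than SpeechAnalyzer.
const relativeSlowdown = ratio(macWhisperSeconds, speechAnalyzerSeconds)

// How many seconds of audio each system processes per second of wall-clock time.
const speechAnalyzerRealtime = ratio(audioSeconds, speechAnalyzerSeconds)
const macWhisperRealtime = ratio(audioSeconds, macWhisperSeconds)

console.log(relativeSlowdown.toFixed(1)) // "2.2"
console.log(speechAnalyzerRealtime.toFixed(1)) // "45.3"
console.log(macWhisperRealtime.toFixed(1)) // "20.2"
```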