Convert speech to text using Apple's on-device `SpeechAnalyzer` and `SpeechTranscriber`.
This provider uses Apple's `SpeechAnalyzer` and `SpeechTranscriber` to perform speech-to-text transcription entirely on-device. They are backed by Apple's new advanced speech recognition model, which is available on iOS 26 and later.
The `audio` parameter accepts either an `ArrayBuffer` or a base64-encoded string.
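
The two accepted forms are different representations of the same bytes. As a minimal sketch of that relationship, the helper below decodes a standard base64 string into an `ArrayBuffer`. `base64ToArrayBuffer` is a hypothetical name used for illustration and is not part of the provider; it assumes the standard alphabet (`+` and `/`) with optional `=` padding.

```typescript
// Hypothetical helper for illustration; not part of the provider.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64ToArrayBuffer(base64: string): ArrayBuffer {
  // Whitespace and trailing '=' padding carry no data, so drop them.
  const clean = base64.replace(/\s+/g, "").replace(/=+$/, "");

  // Every base64 character encodes 6 bits; only whole bytes are kept.
  const output = new ArrayBuffer(Math.floor((clean.length * 6) / 8));
  const bytes = new Uint8Array(output);

  let accumulator = 0; // holds bits that have not been emitted yet
  let bitCount = 0; // number of valid bits in the accumulator
  let offset = 0; // next write position in the output

  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value === -1) {
      throw new Error("Invalid base64 character: " + clean.charAt(i));
    }
    // At most 12 bits are ever pending, so mask to 12 bits.
    accumulator = ((accumulator << 6) | value) & 0xfff;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes[offset++] = (accumulator >> bitCount) & 0xff;
    }
  }
  return output;
}
```

Since the `audio` parameter takes either form, this conversion is optional. It is mainly useful when you want to inspect the audio's byte length or pass the same data to other code that expects binary input.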
The API does not currently support streaming or live transcription. Adding it would be relatively straightforward, so please let us know on GitHub if you need it.
The transcription model supports multiple languages with automatic language detection. To transcribe in a specific language, configure it with this API:
By default, the transcription model uses the device language.
Apple's SpeechAnalyzer requires downloading language-specific assets to the device. The provider automatically requests assets when needed, but you can also prepare them manually:
When you call `prepare()` for a language, the system first checks whether the required assets are already present on the device. If they are, the method resolves immediately without any network activity, so subsequent transcription calls do not have to wait for a download.
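
The check-then-download behaviour described above can be sketched as follows. This is an illustration of the pattern only, not the provider's implementation: `AssetBackend`, `createPreparer`, `isInstalled`, and `download` are hypothetical names, and the real checks happen inside Apple's system APIs.

```typescript
// Hypothetical types and names for illustration; not the provider's API.
type AssetBackend = {
  isInstalled: (language: string) => boolean;
  download: (language: string) => Promise<void>;
};

function createPreparer(backend: AssetBackend) {
  // At most one in-flight download per language, shared by concurrent callers.
  const inFlight: Record<string, Promise<void> | undefined> = {};

  return function prepare(language: string): Promise<void> {
    // Fast path: assets are already on the device, so skip the network.
    if (backend.isInstalled(language)) {
      return Promise.resolve();
    }

    // A download for this language is already running; reuse it.
    const existing = inFlight[language];
    if (existing) {
      return existing;
    }

    // Start the download and forget it once it settles either way.
    const clear = () => {
      delete inFlight[language];
    };
    const started = backend.download(language).then(clear, (error) => {
      clear();
      throw error;
    });
    inFlight[language] = started;
    return started;
  };
}
```

With this shape, calling `prepare` early (for example at app start) is cheap when the assets are already installed, and repeated calls during a download do not trigger duplicate requests.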
All language models and assets are stored in Apple's system-wide asset catalog, separate from your app bundle, so they add nothing to your app's size. Assets may already be available if other apps or system features have previously requested them.
For advanced use cases, you can access the speech transcription API directly:
Performance comparison showing transcription speed for a 34-minute audio file (source):
| System | Processing Time | Performance |
|---|---|---|
| Apple SpeechAnalyzer | 45 seconds | Baseline |
| MacWhisper Large V3 Turbo | 1 minute 41 seconds | 2.2× slower |
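
The "2.2× slower" figure follows directly from the two processing times. As a small worked example, the sketch below recomputes it and also derives a real-time factor (audio duration divided by processing time), which is not in the table. The function names are illustrative, and the only inputs are the numbers quoted above.

```typescript
// Figures taken from the table above; helper names are illustrative.
const audioSeconds = 34 * 60; // 34-minute file = 2040 seconds

const results = [
  { system: "Apple SpeechAnalyzer", seconds: 45 },
  { system: "MacWhisper Large V3 Turbo", seconds: 1 * 60 + 41 }, // 101 s
];

// How many times longer a run took than the baseline run.
function relativeSlowdown(seconds: number, baselineSeconds: number): number {
  return seconds / baselineSeconds;
}

// How many seconds of audio are processed per second of compute.
function realTimeFactor(audio: number, processing: number): number {
  return audio / processing;
}

const baselineSeconds = 45; // Apple SpeechAnalyzer row

results.forEach((result) => {
  const slowdown = relativeSlowdown(result.seconds, baselineSeconds);
  const rtf = realTimeFactor(audioSeconds, result.seconds);
  console.log(
    `${result.system}: ${slowdown.toFixed(1)}x baseline time, ` +
      `${rtf.toFixed(1)}x real time`
  );
});
```

The ratio 101 s / 45 s is about 2.24, which rounds to the 2.2× shown in the table. On this one file, the two systems work out to roughly 45× and 20× real time respectively.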