Convert text to speech using Apple's on-device speech synthesis capabilities.
This provider uses Apple's AVSpeechSynthesizer to perform text-to-speech entirely on-device. Audio is synthesized locally and returned to your app as a WAV byte stream.
Audio is rendered with AVSpeechSynthesizer.write(...). Sample rate and bit depth are determined by the system voice (commonly 44.1 kHz, 16‑bit PCM; some devices may return 32‑bit float, which is encoded accordingly in the WAV header).
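Because the sample format can vary by voice and device, it can be useful to read it from the returned bytes rather than assume it. The following is a minimal sketch, assuming the audio is available as a Uint8Array holding a standard RIFF/WAVE stream; readWavFormat and WavFormat are hypothetical names for illustration, not part of this provider.

```typescript
// Hypothetical helper (not part of the provider): reads the "fmt " chunk of a
// RIFF/WAVE byte stream. WAVE format codes: 1 = integer PCM, 3 = IEEE float.
interface WavFormat {
  encoding: 'pcm' | 'float' | 'other';
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

function readWavFormat(bytes: Uint8Array): WavFormat {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Read a four-character chunk tag such as "RIFF" or "fmt ".
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );

  if (bytes.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE byte stream');
  }

  // Walk the chunk list instead of assuming "fmt " always sits at byte 12.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // little-endian chunk size
    if (id === 'fmt ') {
      if (offset + 8 + 16 > bytes.byteLength) {
        throw new Error('Truncated fmt chunk');
      }
      const code = view.getUint16(offset + 8, true);
      return {
        encoding: code === 1 ? 'pcm' : code === 3 ? 'float' : 'other',
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No fmt chunk found');
}
```

A caller could branch on encoding and bitsPerSample before handing the samples to a player or encoder that only accepts one of the two formats.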
You can control the output language or select a specific voice by identifier. To see what voices are available on the device:
Use a specific voice by passing its identifier:
Or specify only language to use the system’s default voice for that locale:
If both voice and language are provided, voice takes priority.
If only language is provided, the default system voice for that locale is used.
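The precedence rules above can be sketched as a small resolver. This is an illustration only: SpeechOptions, VoiceChoice, and resolveVoiceChoice are hypothetical names, and the behaviour when neither option is set is an assumption, since the text above does not specify it.

```typescript
// Hypothetical option shape: both fields are optional.
interface SpeechOptions {
  voice?: string;    // a voice identifier
  language?: string; // a BCP-47 code such as "en-US"
}

type VoiceChoice =
  | { kind: 'voice'; identifier: string }
  | { kind: 'language'; language: string }
  | { kind: 'unspecified' };

function resolveVoiceChoice(options: SpeechOptions): VoiceChoice {
  // An explicit voice identifier takes priority over language.
  if (options.voice) {
    return { kind: 'voice', identifier: options.voice };
  }
  // With only a language, the system default voice for that locale applies.
  if (options.language) {
    return { kind: 'language', language: options.language };
  }
  // Assumption: with neither set, the choice is left to the system.
  return { kind: 'unspecified' };
}
```

Modelling the result as a tagged union keeps the three cases explicit, so calling code cannot accidentally apply both a voice and a language.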
The system provides a catalog of built‑in voices, including enhanced and premium variants, which may require a one‑time system download. If you have created a Personal Voice on device (iOS 17+), it appears in the list and is flagged accordingly.
Voice assets are managed by the operating system. To add or manage voices, use iOS Settings → Accessibility → Read & Speak → Voices. This provider does not bundle or manage voice downloads.
For advanced use cases, you can access the speech API directly:
Returned voice objects include:
- identifier: string
- name: string
- language: BCP‑47 code, e.g. en-US
- quality: default | enhanced | premium
- isPersonalVoice: boolean (iOS 17+)
- isNoveltyVoice: boolean (iOS 17+)

On iOS 17+, the provider requests Personal Voice authorization before listing voices so your Personal Voice can be surfaced if available.
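The voice fields listed above can be modelled as a type and used to choose a voice programmatically. The sketch below is hedged: the Voice interface mirrors the documented fields, but treating the two iOS 17+ flags as optional is an assumption, and pickBestVoice is a hypothetical helper, not a provider function.

```typescript
type VoiceQuality = 'default' | 'enhanced' | 'premium';

interface Voice {
  identifier: string;
  name: string;
  language: string; // BCP-47 code, e.g. "en-US"
  quality: VoiceQuality;
  isPersonalVoice?: boolean; // assumed absent before iOS 17
  isNoveltyVoice?: boolean;  // assumed absent before iOS 17
}

// Higher rank means higher quality.
const QUALITY_RANK: Record<VoiceQuality, number> = {
  default: 0,
  enhanced: 1,
  premium: 2,
};

// Hypothetical helper: returns the highest-quality non-novelty voice for a
// language, or undefined when no voice matches. Ties keep the earlier voice.
function pickBestVoice(voices: Voice[], language: string): Voice | undefined {
  const wanted = language.toLowerCase();
  let best: Voice | undefined;
  for (const voice of voices) {
    if (voice.language.toLowerCase() !== wanted || voice.isNoveltyVoice) {
      continue;
    }
    if (best === undefined || QUALITY_RANK[voice.quality] > QUALITY_RANK[best.quality]) {
      best = voice;
    }
  }
  return best;
}
```

The identifier of the returned voice can then be passed as the voice option, which takes priority over language.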