Convert text to speech using Apple's on-device speech synthesis capabilities.
This provider uses Apple's AVSpeechSynthesizer to perform text-to-speech entirely on-device. Audio is synthesized locally and returned to your app as a WAV byte stream.

Audio is rendered with AVSpeechSynthesizer.write(...). Sample rate and bit depth are determined by the system voice (commonly 44.1 kHz, 16‑bit PCM; some devices may return 32‑bit float, which is encoded accordingly in the WAV header).
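Because the sample rate and bit depth vary by voice and device, it can help to read them from the returned bytes rather than assume them. The following is a minimal sketch that walks the RIFF chunks of a WAV byte stream and reads the standard `fmt ` chunk. The names `WavFormat` and `readWavFormat` are hypothetical helpers for illustration and are not part of this provider.

```typescript
// Hypothetical helper: inspects a WAV byte stream and reports its audio format.
interface WavFormat {
  encoding: 'pcm' | 'float' | 'other';
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

function readWavFormat(bytes: Uint8Array): WavFormat {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Reads a four-character chunk identifier at the given offset.
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );

  if (bytes.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE stream');
  }

  // Chunks start after the 12-byte RIFF header: 4-byte id, 4-byte little-endian size, data.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === 'fmt ') {
      if (size < 16 || offset + 8 + 16 > bytes.byteLength) {
        throw new Error('Truncated fmt chunk');
      }
      const formatTag = view.getUint16(offset + 8, true);
      return {
        // Format tag 1 is integer PCM, 3 is IEEE float; anything else is reported as "other".
        encoding: formatTag === 1 ? 'pcm' : formatTag === 3 ? 'float' : 'other',
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    // Skip this chunk; RIFF chunks are padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }
  throw new Error('No fmt chunk found');
}
```

Walking the chunks instead of reading fixed offsets keeps the sketch working when other chunks precede `fmt `, which a fixed 44-byte header layout would not handle.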
You can control the output language or select a specific voice by identifier. To see what voices are available on the device:
Use a specific voice by passing its identifier:
Or specify only language to use the system’s default voice for that locale:
If both voice and language are provided, voice takes priority.
If only language is provided, the default system voice for that locale is used.
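The precedence rule above can be sketched as a small pure function. This is only an illustration of the described behavior, not the provider's implementation: `VoiceOptions`, `VoiceChoice`, and `chooseVoice` are hypothetical names, and the `system-default` branch for the case where neither option is set is an assumption, since that case is not described here.

```typescript
// Hypothetical option shape: both fields are optional.
interface VoiceOptions {
  voice?: string;    // a voice identifier
  language?: string; // a BCP-47 code such as "en-US"
}

type VoiceChoice =
  | { kind: 'identifier'; identifier: string }
  | { kind: 'language-default'; language: string }
  | { kind: 'system-default' };

function chooseVoice(options: VoiceOptions): VoiceChoice {
  // An explicit voice identifier wins, even if a language is also given.
  if (options.voice) {
    return { kind: 'identifier', identifier: options.voice };
  }
  // With only a language, fall back to that locale's default system voice.
  if (options.language) {
    return { kind: 'language-default', language: options.language };
  }
  // Assumption: with neither option, defer entirely to the system default.
  return { kind: 'system-default' };
}
```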
The system provides a catalog of built‑in voices, including enhanced and premium variants, which may require a one‑time system download. If you have created a Personal Voice on device (iOS 17+), it appears in the list and is flagged accordingly.
Voice assets are managed by the operating system. To add or manage voices, use iOS Settings → Accessibility → Read & Speak → Voices. This provider does not bundle or manage voice downloads.
For advanced use cases, you can access the speech API directly:
Returned voice objects include:
- identifier: string
- name: string
- language: BCP‑47 code, e.g. en-US
- quality: default | enhanced | premium
- isPersonalVoice: boolean (iOS 17+)
- isNoveltyVoice: boolean (iOS 17+)

On iOS 17+, the provider requests Personal Voice authorization before listing voices so your Personal Voice can be surfaced if available.