Azure Speech
Azure Speech is an Azure AI Speech text-to-speech provider. In OpenClaw it synthesizes outbound reply audio as MP3 by default, native Ogg/Opus for voice notes, and 8 kHz mulaw audio for telephony channels such as Voice Call.
OpenClaw uses the Azure Speech REST API directly with SSML and sends the
provider-owned output format through X-Microsoft-OutputFormat.
| Detail | Value |
|---|---|
| Website | Azure AI Speech |
| Docs | Speech REST text-to-speech |
| Auth | AZURE_SPEECH_KEY plus AZURE_SPEECH_REGION |
| Default voice | en-US-JennyNeural |
| Default file output | audio-24khz-48kbitrate-mono-mp3 |
| Default voice-note file | ogg-24khz-16bit-mono-opus |
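As a rough illustration of the request shape described above, the sketch below builds a minimal SSML body and the format-related headers. The helper names `build_ssml` and `build_format_headers` are hypothetical and are not OpenClaw's internal API. The `application/ssml+xml` content type follows the Azure Speech REST documentation, and the defaults come from the table above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal single-voice SSML document (hypothetical helper)."""
    return (
        f"<speak version=\"1.0\" xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_format_headers(output_format: str = "audio-24khz-48kbitrate-mono-mp3") -> dict:
    """Headers that tell Azure Speech the body is SSML and which audio format to return."""
    return {
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
    }


ssml = build_ssml("Tom & Jerry say hi")
headers = build_format_headers()
```

Escaping the text matters because reply content can contain characters such as `&` or `<` that would otherwise produce invalid SSML.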
Getting started
Create an Azure Speech resource
In the Azure portal, create a Speech resource. Copy KEY 1 from Resource Management > Keys and Endpoint, and copy the resource location, such as eastus. Set both values in the environment:

    AZURE_SPEECH_KEY=
    AZURE_SPEECH_REGION=eastus

Select Azure Speech in messages.tts
    {
      messages: {
        tts: {
          auto: "always",
          provider: "azure-speech",
          providers: {
            "azure-speech": {
              voice: "en-US-JennyNeural",
              lang: "en-US",
            },
          },
        },
      },
    }

Send a message
Send a reply through any connected channel. OpenClaw synthesizes the audio with Azure Speech and delivers MP3 for standard audio, or Ogg/Opus when the channel expects a voice note.
Configuration options
| Option | Path | Description |
|---|---|---|
| apiKey | messages.tts.providers.azure-speech.apiKey | Azure Speech resource key. Falls back to AZURE_SPEECH_KEY, AZURE_SPEECH_API_KEY, or SPEECH_KEY. |
| region | messages.tts.providers.azure-speech.region | Azure Speech resource region. Falls back to AZURE_SPEECH_REGION or SPEECH_REGION. |
| endpoint | messages.tts.providers.azure-speech.endpoint | Optional Azure Speech endpoint/base URL override. |
| baseUrl | messages.tts.providers.azure-speech.baseUrl | Optional Azure Speech base URL override. |
| voice | messages.tts.providers.azure-speech.voice | Azure voice ShortName (default en-US-JennyNeural). |
| lang | messages.tts.providers.azure-speech.lang | SSML language code (default en-US). |
| outputFormat | messages.tts.providers.azure-speech.outputFormat | Audio-file output format (default audio-24khz-48kbitrate-mono-mp3). |
| voiceNoteOutputFormat | messages.tts.providers.azure-speech.voiceNoteOutputFormat | Voice-note output format (default ogg-24khz-16bit-mono-opus). |
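The fallback behavior for apiKey and region can be pictured with a small sketch. It assumes that an explicit config value wins and that the environment variables are checked in the order the table lists them. That ordering is an assumption made for illustration, and `first_set` is a hypothetical helper, not part of OpenClaw.

```python
import os
from typing import Mapping, Optional, Sequence

# Environment variable names taken from the options table.
API_KEY_ENV_VARS = ("AZURE_SPEECH_KEY", "AZURE_SPEECH_API_KEY", "SPEECH_KEY")
REGION_ENV_VARS = ("AZURE_SPEECH_REGION", "SPEECH_REGION")


def first_set(
    configured: Optional[str],
    env_names: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the configured value, else the first non-empty environment variable."""
    env = os.environ if env is None else env
    if configured:
        return configured
    for name in env_names:
        value = env.get(name)
        if value:
            return value
    return None


# Example with an explicit environment mapping instead of the real process environment.
example_env = {"SPEECH_KEY": "legacy-key", "AZURE_SPEECH_REGION": "eastus"}
api_key = first_set(None, API_KEY_ENV_VARS, example_env)
region = first_set(None, REGION_ENV_VARS, example_env)
```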
Authentication
Azure Speech uses a Speech resource key, not an Azure OpenAI key. The key
is sent as Ocp-Apim-Subscription-Key; OpenClaw derives
`https://<region>.tts.speech.microsoft.com` from `region` unless you provide `endpoint` or `baseUrl`.
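A minimal sketch of the host derivation and key header described above. The function names are hypothetical, and the precedence of `endpoint` over `baseUrl` when both are set is an assumption; the source only says that either override replaces the region-derived host.

```python
from typing import Optional


def resolve_base_url(
    region: Optional[str] = None,
    endpoint: Optional[str] = None,
    base_url: Optional[str] = None,
) -> str:
    """Pick the Azure Speech base URL for text-to-speech requests."""
    # Assumption: endpoint is preferred over baseUrl if both are configured.
    override = endpoint or base_url
    if override:
        return override.rstrip("/")
    if not region:
        raise ValueError("region is required when no endpoint or baseUrl is set")
    return f"https://{region}.tts.speech.microsoft.com"


def auth_headers(api_key: str) -> dict:
    """The Speech resource key travels in the Ocp-Apim-Subscription-Key header."""
    return {"Ocp-Apim-Subscription-Key": api_key}


derived = resolve_base_url(region="eastus")
overridden = resolve_base_url(region="eastus", base_url="https://speech.example.test/")
```

The `speech.example.test` host is a placeholder used only to show that an override bypasses the region-derived URL.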
Voice names
Use the Azure Speech voice ShortName value, for example
en-US-JennyNeural. The bundled provider can list voices through the
same Speech resource and filters out voices marked deprecated or retired.
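The filtering step might look roughly like the sketch below. It assumes each entry in the voice list carries `ShortName` and `Status` fields, as the Azure voices list response does. The exact status strings the provider checks are an assumption, the sample records are invented for illustration, and `usable_voice_names` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping

# Assumption: statuses compared case-insensitively against these two values.
HIDDEN_STATUSES = {"deprecated", "retired"}


def usable_voice_names(voices: Iterable[Mapping[str, str]]) -> List[str]:
    """Return ShortName values for voices that are not deprecated or retired."""
    return [
        voice["ShortName"]
        for voice in voices
        if str(voice.get("Status", "")).lower() not in HIDDEN_STATUSES
    ]


# Invented sample records shaped like voice-list entries.
sample_voices = [
    {"ShortName": "en-US-JennyNeural", "Status": "GA"},
    {"ShortName": "en-US-ExampleOldNeural", "Status": "Deprecated"},
    {"ShortName": "en-US-ExamplePreviewNeural", "Status": "Preview"},
]
names = usable_voice_names(sample_voices)
```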
Audio outputs
Azure accepts output formats such as audio-24khz-48kbitrate-mono-mp3,
ogg-24khz-16bit-mono-opus, and riff-24khz-16bit-mono-pcm. OpenClaw
requests Ogg/Opus for voice-note targets so channels can send native
voice bubbles without an extra MP3 conversion.
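One way to picture the format selection is a small lookup keyed by delivery target. The target names (`"file"`, `"voice-note"`, `"telephony"`) and the function itself are hypothetical. The telephony value `raw-8khz-8bit-mono-mulaw` is an Azure output format name, but whether OpenClaw requests exactly that string for Voice Call is an assumption.

```python
from typing import Optional

DEFAULT_FILE_FORMAT = "audio-24khz-48kbitrate-mono-mp3"
DEFAULT_VOICE_NOTE_FORMAT = "ogg-24khz-16bit-mono-opus"
# Assumption: 8 kHz mulaw for telephony maps to this Azure format name.
TELEPHONY_FORMAT = "raw-8khz-8bit-mono-mulaw"


def pick_output_format(
    target: str,
    output_format: Optional[str] = None,
    voice_note_output_format: Optional[str] = None,
) -> str:
    """Choose the X-Microsoft-OutputFormat value for a delivery target."""
    if target == "telephony":
        return TELEPHONY_FORMAT
    if target == "voice-note":
        # Native Ogg/Opus avoids converting from MP3 before sending a voice bubble.
        return voice_note_output_format or DEFAULT_VOICE_NOTE_FORMAT
    return output_format or DEFAULT_FILE_FORMAT


file_format = pick_output_format("file")
voice_note_format = pick_output_format("voice-note")
```

Configured `outputFormat` and `voiceNoteOutputFormat` values would be passed in as the optional arguments, with the documented defaults used otherwise.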
Alias
azure is accepted as a provider alias for existing PRs and user config,
but new config should use azure-speech to avoid confusion with Azure
OpenAI model providers.
Related
TTS overview, providers, and messages.tts config.
Full config reference including messages.tts settings.
All bundled OpenClaw providers.
Common issues and debugging steps.