Deepgram
Deepgram is a speech-to-text API. In OpenClaw it is used for inbound
audio/voice-note transcription through tools.media.audio and for Voice Call
streaming STT through plugins.entries.voice-call.config.streaming.
For batch transcription, OpenClaw uploads the complete audio file to Deepgram
and injects the transcript into the reply pipeline ({{Transcript}} +
[Audio] block). For Voice Call streaming, OpenClaw forwards live G.711
u-law frames over Deepgram’s WebSocket listen endpoint and emits partial or
final transcripts as Deepgram returns them.
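
As a rough illustration of the batch path described above, the sketch below builds (but does not send) an upload request against Deepgram's pre-recorded `/v1/listen` endpoint and reads the transcript out of a response. The helper names `build_batch_request` and `extract_transcript` and the `audio/ogg` content type are assumptions made for this example; they are not OpenClaw's actual internals.

```python
import urllib.request
from urllib.parse import urlencode

# Deepgram's pre-recorded transcription endpoint.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_batch_request(
    audio: bytes,
    api_key: str,
    model: str = "nova-3",
    content_type: str = "audio/ogg",  # assumed voice-note format
) -> urllib.request.Request:
    """Build (without sending) a request that uploads a complete audio file."""
    url = f"{DEEPGRAM_LISTEN_URL}?{urlencode({'model': model})}"
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


def extract_transcript(response: dict) -> str:
    """Return the first alternative's transcript from a Deepgram response."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]
```

Passing the request to `urllib.request.urlopen` and decoding the JSON body would yield the dictionary that `extract_transcript` expects.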
| Detail | Value |
|---|---|
| Website | deepgram.com |
| Docs | developers.deepgram.com |
| Auth | DEEPGRAM_API_KEY |
| Default model | nova-3 |
Getting started
1. Set your API key

   Add your Deepgram API key to the environment:

   ```
   DEEPGRAM_API_KEY=dg_...
   ```

2. Enable the audio provider

   ```json5
   {
     tools: {
       media: {
         audio: {
           enabled: true,
           models: [{ provider: "deepgram", model: "nova-3" }],
         },
       },
     },
   }
   ```

3. Send a voice note

   Send an audio message through any connected channel. OpenClaw transcribes it via Deepgram and injects the transcript into the reply pipeline.
Configuration options
| Option | Path | Description |
|---|---|---|
| model | tools.media.audio.models[].model | Deepgram model id (default: nova-3) |
| language | tools.media.audio.models[].language | Language hint (optional) |
| detect_language | tools.media.audio.providerOptions.deepgram.detect_language | Enable language detection (optional) |
| punctuate | tools.media.audio.providerOptions.deepgram.punctuate | Enable punctuation (optional) |
| smart_format | tools.media.audio.providerOptions.deepgram.smart_format | Enable smart formatting (optional) |
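
The option names in this table match Deepgram's own query parameters, so one plausible way to picture them is as a query string on the listen request. The sketch below assumes a direct one-to-one mapping; `build_listen_query` is a hypothetical helper, and OpenClaw may assemble the request differently.

```python
from urllib.parse import urlencode


def build_listen_query(model="nova-3", language=None, provider_options=None) -> str:
    """Turn model, language, and provider options into a Deepgram query string."""
    params = {"model": model}
    if language is not None:
        params["language"] = language
    for key, value in (provider_options or {}).items():
        # Deepgram expects lowercase "true"/"false" for boolean flags.
        params[key] = str(value).lower() if isinstance(value, bool) else value
    return urlencode(params)


print(build_listen_query(language="en"))
print(build_listen_query(provider_options={
    "detect_language": True,
    "punctuate": True,
    "smart_format": True,
}))
```

Unset options are simply left out of the query, which leaves Deepgram's own defaults in effect.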
Language hint:

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "deepgram", model: "nova-3", language: "en" }],
      },
    },
  },
}
```

Deepgram options:

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        providerOptions: {
          deepgram: {
            detect_language: true,
            punctuate: true,
            smart_format: true,
          },
        },
        models: [{ provider: "deepgram", model: "nova-3" }],
      },
    },
  },
}
```

Voice Call streaming STT
The bundled deepgram plugin also registers a realtime transcription provider
for the Voice Call plugin.
| Setting | Config path | Default |
|---|---|---|
| API key | plugins.entries.voice-call.config.streaming.providers.deepgram.apiKey | Falls back to DEEPGRAM_API_KEY |
| Model | ...deepgram.model | nova-3 |
| Language | ...deepgram.language | (unset) |
| Encoding | ...deepgram.encoding | mulaw |
| Sample rate | ...deepgram.sampleRate | 8000 |
| Endpointing | ...deepgram.endpointingMs | 800 |
| Interim results | ...deepgram.interimResults | true |
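
To show how these defaults might translate into a Deepgram streaming session, the sketch below builds a WebSocket listen URL from them and works out the size of one audio frame. `build_stream_url` and `frame_bytes` are hypothetical helpers, and the 20 ms frame duration is an assumption (a common telephony packet size), not a documented OpenClaw value.

```python
from urllib.parse import urlencode

# Deepgram's streaming (WebSocket) listen endpoint.
DEEPGRAM_STREAM_URL = "wss://api.deepgram.com/v1/listen"


def build_stream_url(
    model="nova-3",
    language=None,
    encoding="mulaw",
    sample_rate=8000,
    endpointing_ms=800,
    interim_results=True,
) -> str:
    """Build a streaming listen URL from the Voice Call defaults."""
    params = {
        "model": model,
        "encoding": encoding,
        "sample_rate": sample_rate,
        "endpointing": endpointing_ms,
        "interim_results": str(interim_results).lower(),
    }
    if language:
        params["language"] = language
    return f"{DEEPGRAM_STREAM_URL}?{urlencode(params)}"


def frame_bytes(sample_rate=8000, frame_ms=20) -> int:
    """G.711 u-law uses one byte per sample, so bytes per frame = samples per frame."""
    return sample_rate * frame_ms // 1000


print(build_stream_url(language="en-US"))
print(frame_bytes())  # 160 bytes for a 20 ms frame at 8000 Hz
```

At 8000 Hz with one byte per sample, the stream carries 8000 bytes of audio per second regardless of the frame size chosen.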
```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "deepgram",
            providers: {
              deepgram: {
                apiKey: "${DEEPGRAM_API_KEY}",
                model: "nova-3",
                endpointingMs: 800,
                language: "en-US",
              },
            },
          },
        },
      },
    },
  },
}
```

Authentication
Authentication follows the standard provider auth order. DEEPGRAM_API_KEY is
the simplest path.
Proxy and custom endpoints
Override endpoints or headers with tools.media.audio.baseUrl and
tools.media.audio.headers when using a proxy.
Output behavior
Output follows the same audio rules as other providers (size caps, timeouts, transcript injection).
Related
Audio, image, and video processing pipeline overview.
Full config reference including media tool settings.
Common issues and debugging steps.
Frequently asked questions about OpenClaw setup.