Google (Gemini)

The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.

Provider: google
Auth: GEMINI_API_KEY or GOOGLE_API_KEY
API: Google Gemini API
Runtime option: provider/model agentRuntime.id: "google-gemini-cli" reuses Gemini CLI OAuth while keeping model refs canonical as google/*.

Getting started

Choose your preferred auth method and follow the setup steps.

Best for: standard Gemini API access through Google AI Studio.

Run onboarding

openclaw onboard --auth-choice gemini-api-key

Or pass the key directly:

openclaw onboard --non-interactive \
  --mode local \
  --auth-choice gemini-api-key \
  --gemini-api-key "$GEMINI_API_KEY"

Set a default model

{
  agents: {
    defaults: {
      model: { primary: "google/gemini-3.1-pro-preview" },
    },
  },
}

Verify the model is available
Terminal window
```
openclaw models list --provider google
```

Best for: reusing an existing Gemini CLI login via PKCE OAuth instead of a separate API key.

Install the Gemini CLI
The local gemini command must be available on PATH.
Terminal window
```
# Homebrew
brew install gemini-cli

# or npm
npm install -g @google/gemini-cli
```
OpenClaw supports both Homebrew installs and global npm installs, including common Windows/npm layouts.

openclaw models auth login --provider google-gemini-cli --set-default

Verify the model is available
Terminal window
```
openclaw models list --provider google
```

Default model: google/gemini-3.1-pro-preview
Runtime: google-gemini-cli
Alias: gemini-cli

Gemini 3.1 Pro’s Gemini API model id is gemini-3.1-pro-preview. OpenClaw accepts the shorter google/gemini-3.1-pro as a convenience alias and normalizes it before provider calls.

Environment variables:

OPENCLAW_GEMINI_OAUTH_CLIENT_ID
OPENCLAW_GEMINI_OAUTH_CLIENT_SECRET

(Or the GEMINI_CLI_* variants.)

google-gemini-cli/* model refs are legacy compatibility aliases. New configs should use google/* model refs plus the google-gemini-cli runtime when they want local Gemini CLI execution.

Capabilities

Capability	Supported
Chat completions	Yes
Image generation	Yes
Music generation	Yes
Text-to-speech	Yes
Realtime voice	Yes (Google Live API)
Image understanding	Yes
Audio transcription	Yes
Video understanding	Yes
Web search (Grounding)	Yes
Thinking/reasoning	Yes (Gemini 2.5+ / Gemini 3+)
Gemma 4 models	Yes

Web search

The bundled gemini web-search provider uses Gemini Google Search grounding. Configure a dedicated search key under plugins.entries.google.config.webSearch, or let it reuse models.providers.google.apiKey after GEMINI_API_KEY:

{
  plugins: {
    entries: {
      google: {
        config: {
          webSearch: {
            apiKey: "AIza...", // optional if GEMINI_API_KEY or models.providers.google.apiKey is set
            baseUrl: "https://generativelanguage.googleapis.com/v1beta", // falls back to models.providers.google.baseUrl
            model: "gemini-2.5-flash",
          },
        },
      },
    },
  },
}

Credential precedence is dedicated webSearch.apiKey, then GEMINI_API_KEY, then models.providers.google.apiKey. webSearch.baseUrl is optional and exists for operator proxies or compatible Gemini API endpoints; when omitted, Gemini web search reuses models.providers.google.baseUrl. See Gemini search for the provider-specific tool behavior.

Image generation

The bundled google image-generation provider defaults to google/gemini-3.1-flash-image-preview.

Also supports google/gemini-3-pro-image-preview
Generate: up to 4 images per request
Edit mode: enabled, up to 5 input images
Geometry controls: size, aspectRatio, and resolution

To use Google as the default image provider:

{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
    },
  },
}

Video generation

The bundled google plugin also registers video generation through the shared video_generate tool.

Default video model: google/veo-3.1-fast-generate-preview
Modes: text-to-video, image-to-video, and single-video reference flows
Supports aspectRatio (16:9, 9:16) and resolution (720P, 1080P); audio output is not supported by Veo today
Supported durations: 4, 6, or 8 seconds (other values snap to the nearest allowed value)

To use Google as the default video provider:

{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}

Music generation

The bundled google plugin also registers music generation through the shared music_generate tool.

Default music model: google/lyria-3-clip-preview
Also supports google/lyria-3-pro-preview
Prompt controls: lyrics and instrumental
Output format: mp3 by default, plus wav on google/lyria-3-pro-preview
Reference inputs: up to 10 images
Session-backed runs detach through the shared task/status flow, including action: "status"

To use Google as the default music provider:

{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}

Text-to-speech

The bundled google speech provider uses the Gemini API TTS path with gemini-3.1-flash-tts-preview.

Default voice: Kore
Auth: messages.tts.providers.google.apiKey, models.providers.google.apiKey, GEMINI_API_KEY, or GOOGLE_API_KEY
Output: WAV for regular TTS attachments, Opus for voice-note targets, PCM for Talk/telephony
Voice-note output: Google PCM is wrapped as WAV and transcoded to 48 kHz Opus with ffmpeg

Google’s batch Gemini TTS path returns generated audio in the completed generateContent response. For lowest-latency spoken conversations, use the Google realtime voice provider backed by the Gemini Live API instead of batch TTS.

To use Google as the default TTS provider:

{
  messages: {
    tts: {
      auto: "always",
      provider: "google",
      providers: {
        google: {
          model: "gemini-3.1-flash-tts-preview",
          speakerVoice: "Kore",
          audioProfile: "Speak professionally with a calm tone.",
        },
      },
    },
  },
}

Gemini API TTS uses natural-language prompting for style control. Set audioProfile to prepend a reusable style prompt before the spoken text. Set speakerName when your prompt text refers to a named speaker.

Gemini API TTS also accepts expressive square-bracket audio tags in the text, such as [whispers] or [laughs]. To keep tags out of the visible chat reply while sending them to TTS, put them inside a [[tts:text]]...[[/tts:text]] block:

Here is the clean reply text.

[[tts:text]][whispers] Here is the spoken version.[[/tts:text]]

Realtime voice

The bundled google plugin registers a realtime voice provider backed by the Gemini Live API for backend audio bridges such as Voice Call and Google Meet.

Setting	Config path	Default
Model	`plugins.entries.voice-call.config.realtime.providers.google.model`	`gemini-2.5-flash-native-audio-preview-12-2025`
Voice	`...google.voice`	`Kore`
Temperature	`...google.temperature`	(unset)
VAD start sensitivity	`...google.startSensitivity`	(unset)
VAD end sensitivity	`...google.endSensitivity`	(unset)
Silence duration	`...google.silenceDurationMs`	(unset)
Activity handling	`...google.activityHandling`	Google default, `start-of-activity-interrupts`
Turn coverage	`...google.turnCoverage`	Google default, `only-activity`
Disable auto VAD	`...google.automaticActivityDetectionDisabled`	`false`
Session resumption	`...google.sessionResumption`	`true`
Context compression	`...google.contextWindowCompression`	`true`
API key	`...google.apiKey`	Falls back to `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY`

Example Voice Call realtime config:

{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                speakerVoice: "Kore",
                activityHandling: "start-of-activity-interrupts",
                turnCoverage: "only-activity",
              },
            },
          },
        },
      },
    },
  },
}

For maintainer live verification, run OPENAI_API_KEY=... GEMINI_API_KEY=... node --import tsx scripts/dev/realtime-talk-live-smoke.ts. The smoke also covers OpenAI backend/WebRTC paths; the Google leg mints the same constrained Live API token shape used by Control UI Talk, opens the browser WebSocket endpoint, sends the initial setup payload, and waits for setupComplete.

Advanced configuration

Direct Gemini cache reuse

For direct Gemini API runs (api: "google-generative-ai"), OpenClaw passes a configured cachedContent handle through to Gemini requests.

Configure per-model or global params with either cachedContent or legacy cached_content
If both are present, cachedContent wins
Example value: cachedContents/prebuilt-context
Gemini cache-hit usage is normalized into OpenClaw cacheRead from upstream cachedContentTokenCount

{
  agents: {
    defaults: {
      models: {
        "google/gemini-2.5-pro": {
          params: {
            cachedContent: "cachedContents/prebuilt-context",
          },
        },
      },
    },
  },
}

Gemini CLI usage notes

When using the google-gemini-cli OAuth provider, OpenClaw uses Gemini CLI stream-json output by default and normalizes usage from the final stats payload. Legacy --output-format json overrides still use the JSON parser.

Streamed reply text comes from assistant message events.
For legacy JSON output, reply text comes from the CLI JSON response field.
Usage falls back to stats when the CLI leaves usage empty.
stats.cached is normalized into OpenClaw cacheRead.
If stats.input is missing, OpenClaw derives input tokens from stats.input_tokens - stats.cached.

Environment and daemon setup

If the Gateway runs as a daemon (launchd/systemd), make sure GEMINI_API_KEY is available to that process (for example, in ~/.openclaw/.env or via env.shellEnv).

Model selection

Choosing providers, model refs, and failover behavior.

Image generation

Shared image tool parameters and provider selection.

Video generation

Shared video tool parameters and provider selection.

Music generation

Shared music tool parameters and provider selection.

Google (Gemini)

Getting started

Capabilities

Web search

Image generation

Video generation

Music generation

Text-to-speech

Realtime voice

Advanced configuration

Related