Image generation

The image_generate tool lets the agent create and edit images using your configured providers. In chat sessions, image generation runs asynchronously: OpenClaw records a background task, returns the task id immediately, and wakes the agent when the provider finishes. The completion agent must send generated images through the message tool. If the requester session is inactive and some generated images are still missing from message-tool delivery, OpenClaw sends an idempotent direct fallback with only the missing images.
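
The "only the missing images" rule can be pictured as a set difference that is safe to re-run. The sketch below is illustrative only: the helper names and the string image ids are hypothetical, and it does not reflect OpenClaw's internal task records.

```python
def missing_images(generated_ids, delivered_ids):
    """Return generated image ids that were not delivered, in generation order."""
    delivered = set(delivered_ids)
    return [image_id for image_id in generated_ids if image_id not in delivered]


def plan_direct_fallback(generated_ids, delivered_ids, requester_active):
    """Pick the images a direct fallback would send (hypothetical helper)."""
    if requester_active:
        # An active requester session is left to the message-tool path.
        return []
    return missing_images(generated_ids, delivered_ids)


generated = ["img-1", "img-2", "img-3"]
first = plan_direct_fallback(generated, ["img-2"], requester_active=False)
# Re-running after the fallback delivered its images finds nothing left to send.
second = plan_direct_fallback(generated, ["img-2"] + first, requester_active=False)
```

Because the second pass returns an empty list, repeating the fallback cannot deliver the same image twice, which is the property "idempotent" refers to.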

  1. Configure auth

    Set an API key for at least one provider (for example OPENAI_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY) or sign in with OpenAI Codex OAuth.

  2. Pick a default model (optional)

    {
      agents: {
        defaults: {
          imageGenerationModel: {
            primary: "openai/gpt-image-2",
            timeoutMs: 180_000,
          },
        },
      },
    }

    Codex OAuth uses the same openai/gpt-image-2 model ref. When an openai-codex OAuth profile is configured, OpenClaw routes image requests through that OAuth profile instead of first trying OPENAI_API_KEY. Explicit models.providers.openai config (API key, custom/Azure base URL) opts back into the direct OpenAI Images API route.

  3. Ask the agent

    “Generate an image of a friendly robot mascot.”

    The agent calls image_generate automatically. No tool allow-listing is needed; the tool is enabled by default whenever a provider is available. It returns a background task id, and the completion agent sends the generated attachment through the message tool once it is ready.

| Goal | Model ref | Auth |
| --- | --- | --- |
| OpenAI image generation with API billing | openai/gpt-image-2 | OPENAI_API_KEY |
| OpenAI image generation with Codex subscription auth | openai/gpt-image-2 | OpenAI Codex OAuth |
| OpenAI transparent-background PNG/WebP | openai/gpt-image-1.5 | OPENAI_API_KEY or OpenAI Codex OAuth |
| DeepInfra image generation | deepinfra/black-forest-labs/FLUX-1-schnell | DEEPINFRA_API_KEY |
| OpenRouter image generation | openrouter/google/gemini-3.1-flash-image-preview | OPENROUTER_API_KEY |
| LiteLLM image generation | litellm/gpt-image-2 | LITELLM_API_KEY |
| Google Gemini image generation | google/gemini-3.1-flash-image-preview | GEMINI_API_KEY or GOOGLE_API_KEY |

The same image_generate tool handles text-to-image and reference-image editing. Use image for one reference or images for multiple references. Provider-supported output hints such as quality, outputFormat, and background are forwarded when available and reported as ignored when a provider does not support them. Bundled transparent-background support is OpenAI-specific; other providers may still preserve PNG alpha if their backend emits it.

| Provider | Default model | Edit support | Auth |
| --- | --- | --- | --- |
| ComfyUI | workflow | Yes (1 image, workflow-configured) | COMFY_API_KEY or COMFY_CLOUD_API_KEY for cloud |
| DeepInfra | black-forest-labs/FLUX-1-schnell | Yes (1 image) | DEEPINFRA_API_KEY |
| fal | fal-ai/flux/dev | Yes (model-specific limits) | FAL_KEY |
| Google | gemini-3.1-flash-image-preview | Yes | GEMINI_API_KEY or GOOGLE_API_KEY |
| LiteLLM | gpt-image-2 | Yes (up to 5 input images) | LITELLM_API_KEY |
| MiniMax | image-01 | Yes (subject reference) | MINIMAX_API_KEY or MiniMax OAuth (minimax-portal) |
| OpenAI | gpt-image-2 | Yes (up to 4 images) | OPENAI_API_KEY or OpenAI Codex OAuth |
| OpenRouter | google/gemini-3.1-flash-image-preview | Yes (up to 5 input images) | OPENROUTER_API_KEY |
| Vydra | grok-imagine | No | VYDRA_API_KEY |
| xAI | grok-imagine-image | Yes (up to 5 images) | XAI_API_KEY |

Use action: "list" to inspect available providers and models at runtime:

/tool image_generate action=list

Use action: "status" to inspect the active image-generation task for the current session:

/tool image_generate action=status
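
| Capability | ComfyUI | DeepInfra | fal | Google | MiniMax | OpenAI | Vydra | xAI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Generate (max count) | Workflow-defined | 4 | 4 | 4 | 9 | 4 | 1 | 4 |
| Edit / reference | 1 image (workflow) | 1 image | Flux: 1; GPT: 10; NB2: 14 | Up to 5 images | 1 image (subject ref) | Up to 5 images | - | Up to 5 images |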
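
Size control, aspect ratio, and resolution (1K/2K/4K) support also varies by provider: size control reaches up to 4K where it is supported, and xAI accepts 1K and 2K resolutions.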

  • `prompt` - Image generation prompt. Required for `action: "generate"`.
  • `action` - Use `"status"` to inspect the active session task or `"list"` to inspect available providers and models at runtime.
  • `model` - Provider/model override (e.g. `openai/gpt-image-2`). Use `openai/gpt-image-1.5` for transparent OpenAI backgrounds.
  • `image` - Single reference image path or URL for edit mode.
  • `images` - Multiple reference images for edit mode (up to 5 on supporting providers).
  • `size` - Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `3840x2160`.
  • `aspectRatio` - Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`.
  • `resolution` - Resolution hint.
  • `quality` - Quality hint when the provider supports it.
  • `outputFormat` - Output format hint when the provider supports it.
  • `background` - Background hint when the provider supports it. Use `transparent` with `outputFormat: "png"` or `"webp"` for transparency-capable providers.
  • `count` - Number of images to generate (1-4).
  • `timeoutMs` - Optional provider request timeout in milliseconds. When Codex calls `image_generate` through dynamic tools, this per-call value still overrides the configured default and is capped at 600000 ms.
  • Output filename hint.
  • `openai` - OpenAI-only hints: `background`, `moderation`, `outputCompression`, and `user`.
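
As a rough illustration of the two hard constraints stated in the parameter descriptions (a prompt is required for `action: "generate"`, and the image count runs from 1 to 4), the sketch below checks a dictionary of tool arguments. The function name is hypothetical, and treating a missing `action` as `"generate"` is an assumption rather than documented behavior.

```python
def generate_arg_problems(args):
    """Return a list of human-readable problems for image_generate arguments."""
    problems = []
    action = args.get("action", "generate")  # assumed default, not confirmed
    if action not in ("generate", "status", "list"):
        problems.append(f"unknown action: {action}")
    if action == "generate":
        if not args.get("prompt"):
            problems.append("prompt is required for action 'generate'")
        count = args.get("count", 1)
        if not isinstance(count, int) or not 1 <= count <= 4:
            problems.append("count must be an integer from 1 to 4")
    return problems


ok = generate_arg_problems({"prompt": "A friendly robot mascot", "count": 2})
bad = generate_arg_problems({"count": 9})
```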

{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        timeoutMs: 180_000,
        fallbacks: [
          "openrouter/google/gemini-3.1-flash-image-preview",
          "google/gemini-3.1-flash-image-preview",
          "fal/fal-ai/flux/dev",
        ],
      },
    },
  },
}

OpenClaw tries providers in this order:

  1. model parameter from the tool call (if the agent specifies one).
  2. imageGenerationModel.primary from config.
  3. imageGenerationModel.fallbacks in order.
  4. Auto-detection - auth-backed provider defaults only:
    • current default provider first;
    • remaining registered image-generation providers in provider-id order.

If a provider fails (auth error, rate limit, etc.), the next configured candidate is tried automatically. If all fail, the error includes details from each attempt.
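
The failover behavior can be sketched as a loop that records each failed attempt and only raises once every candidate has been tried. This is a minimal sketch under assumptions: `generate_with_failover`, `AllProvidersFailed`, and the `flaky_generate` stub are hypothetical names, not OpenClaw APIs.

```python
class AllProvidersFailed(Exception):
    """Raised when every candidate fails; keeps the per-attempt details."""

    def __init__(self, attempts):
        self.attempts = attempts
        details = "; ".join(f"{ref}: {err}" for ref, err in attempts)
        super().__init__(f"all image providers failed ({details})")


def generate_with_failover(candidates, generate):
    """Try each model ref in order and return the first successful result."""
    attempts = []
    for model_ref in candidates:
        try:
            return model_ref, generate(model_ref)
        except Exception as err:  # auth error, rate limit, and so on
            attempts.append((model_ref, str(err)))
    raise AllProvidersFailed(attempts)


def flaky_generate(model_ref):
    """Stub provider call: the OpenAI ref fails, everything else succeeds."""
    if model_ref.startswith("openai/"):
        raise RuntimeError("rate limited")
    return b"fake-image-bytes"


winner = generate_with_failover(
    ["openai/gpt-image-2", "google/gemini-3.1-flash-image-preview"],
    flaky_generate,
)

try:
    generate_with_failover(["openai/gpt-image-2"], flaky_generate)
except AllProvidersFailed as err:
    failure = err
```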

Per-call model overrides are exact

A per-call model override tries only that provider/model and does not continue to configured primary/fallback or auto-detected providers.

Auto-detection is auth-aware

A provider default only enters the candidate list when OpenClaw can actually authenticate that provider. Set agents.defaults.mediaGenerationAutoProviderFallback: false to use only explicit model, primary, and fallbacks entries.
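
Taken together, the selection rules can be read as one function that builds an ordered candidate list. The sketch below is an interpretation, not the shipped resolver: `build_candidates` is a hypothetical name, `defaults` stands in for the `agents.defaults` object, the caller is assumed to pass auth-backed provider defaults already sorted (current default provider first, then provider-id order), and both the `True` default for `mediaGenerationAutoProviderFallback` and the de-duplication step are assumptions.

```python
def build_candidates(defaults, call_model=None, authenticated_defaults=()):
    """Return model refs in the order they would be attempted."""
    if call_model:
        # A per-call override is exact: no primary, fallbacks, or auto-detection.
        return [call_model]

    image_cfg = defaults.get("imageGenerationModel", {})
    candidates = []
    if image_cfg.get("primary"):
        candidates.append(image_cfg["primary"])
    candidates.extend(image_cfg.get("fallbacks", []))

    if defaults.get("mediaGenerationAutoProviderFallback", True):
        # Only providers that can actually authenticate are passed in here.
        candidates.extend(authenticated_defaults)

    # Drop repeats while keeping the first occurrence of each ref.
    return list(dict.fromkeys(candidates))


defaults = {
    "imageGenerationModel": {
        "primary": "openai/gpt-image-2",
        "fallbacks": ["google/gemini-3.1-flash-image-preview"],
    }
}
detected = ["google/gemini-3.1-flash-image-preview", "xai/grok-imagine-image"]

ordered = build_candidates(defaults, authenticated_defaults=detected)
exact = build_candidates(defaults, call_model="xai/grok-imagine-image",
                         authenticated_defaults=detected)
explicit_only = build_candidates(
    {**defaults, "mediaGenerationAutoProviderFallback": False},
    authenticated_defaults=detected,
)
```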

Timeouts

Set agents.defaults.imageGenerationModel.timeoutMs for slow image backends. A per-call timeoutMs tool parameter overrides the configured default, and configured defaults override plugin-authored provider defaults. Google and OpenRouter hosted image providers use 180 second defaults; xAI and Azure OpenAI image generation use 600 seconds. Codex dynamic-tool calls use a 120 second image_generate bridge default and honor the same timeout budget when configured, bounded by OpenClaw’s 600000 ms dynamic-tool bridge maximum.
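
The precedence described above (per-call value, then configured default, then the provider's own default) and the Codex bridge cap can be sketched as two small functions. The function names are hypothetical, and how provider defaults interact with the bridge is not stated here, so the bridge helper deliberately ignores them.

```python
DYNAMIC_TOOL_BRIDGE_DEFAULT_MS = 120_000  # 120 second bridge default
DYNAMIC_TOOL_BRIDGE_MAX_MS = 600_000      # bridge maximum


def resolve_provider_timeout_ms(call_ms, configured_ms, provider_default_ms):
    """Per-call value wins, then the configured default, then the provider default."""
    if call_ms is not None:
        return call_ms
    if configured_ms is not None:
        return configured_ms
    return provider_default_ms


def resolve_bridge_timeout_ms(call_ms, configured_ms):
    """Timeout budget for a Codex dynamic-tool call, capped at the bridge maximum."""
    budget = call_ms if call_ms is not None else configured_ms
    if budget is None:
        budget = DYNAMIC_TOOL_BRIDGE_DEFAULT_MS
    return min(budget, DYNAMIC_TOOL_BRIDGE_MAX_MS)
```

For example, a configured `timeoutMs` of 180000 overrides a 600 second provider default, while a per-call request for 900000 ms through the bridge is reduced to 600000 ms.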

Inspect at runtime

Use action: "list" to inspect the currently registered providers, their default models, and auth env-var hints.

OpenAI, OpenRouter, Google, DeepInfra, fal, MiniMax, ComfyUI, and xAI support editing reference images. Pass a reference image path or URL:

"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"

OpenAI, OpenRouter, Google, and xAI support up to 5 reference images via the images parameter. fal supports 1 reference image for Flux image-to-image, up to 10 for GPT Image 2 edits, and up to 14 for Nano Banana 2 edits. MiniMax and ComfyUI support 1.
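
A pre-flight check against these limits might look like the sketch below. The limits are copied from the paragraph above; the lowercase provider keys follow the model-ref prefixes used elsewhere on this page. fal and ComfyUI are left out because fal's limit depends on the model and the ComfyUI provider id is not shown here. The helper names, and counting `image` and `images` together, are assumptions.

```python
# Maximum reference images per provider, as listed in the text above.
REFERENCE_LIMITS = {
    "openai": 5,
    "openrouter": 5,
    "google": 5,
    "xai": 5,
    "minimax": 1,
}


def reference_count(image=None, images=None):
    """Count references passed through `image` and `images` combined."""
    return (1 if image else 0) + len(images or [])


def within_reference_limit(provider, image=None, images=None):
    """True when the request stays within the provider's reference limit."""
    limit = REFERENCE_LIMITS.get(provider, 0)  # unknown providers: no references
    return reference_count(image, images) <= limit
```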

OpenAI gpt-image-2 (and gpt-image-1.5)

OpenAI image generation defaults to openai/gpt-image-2. If an openai-codex OAuth profile is configured, OpenClaw reuses the same OAuth profile used by Codex subscription chat models and sends the image request through the Codex Responses backend. Legacy Codex base URLs such as https://chatgpt.com/backend-api are canonicalized to https://chatgpt.com/backend-api/codex for image requests. OpenClaw does not silently fall back to OPENAI_API_KEY for that request - to force direct OpenAI Images API routing, configure models.providers.openai explicitly with an API key, custom base URL, or Azure endpoint.
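
The routing decision and the base URL canonicalization can be summarized in a few lines. This is a sketch of the rules as described, not OpenClaw's code: the function names and the route labels `"openai-images-api"` and `"codex-responses"` are hypothetical, and stripping a trailing slash before comparing is an assumption.

```python
LEGACY_CODEX_BASE_URL = "https://chatgpt.com/backend-api"


def choose_openai_image_route(has_explicit_openai_config, has_codex_oauth, has_api_key):
    """Return a label for the route an OpenAI image request would take."""
    if has_explicit_openai_config:
        # Explicit models.providers.openai config opts into the direct API.
        return "openai-images-api"
    if has_codex_oauth:
        # No silent fallback to OPENAI_API_KEY once an OAuth profile exists.
        return "codex-responses"
    if has_api_key:
        return "openai-images-api"
    return None


def canonicalize_codex_base_url(base_url):
    """Map the legacy Codex base URL to the /codex path used for image requests."""
    trimmed = base_url.rstrip("/")
    if trimmed == LEGACY_CODEX_BASE_URL:
        return trimmed + "/codex"
    return trimmed
```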

The openai/gpt-image-1.5, openai/gpt-image-1, and openai/gpt-image-1-mini models can still be selected explicitly. Use gpt-image-1.5 for transparent-background PNG/WebP output; the current gpt-image-2 API rejects background: "transparent".

gpt-image-2 supports both text-to-image generation and reference-image editing through the same image_generate tool. OpenClaw forwards prompt, count, size, quality, outputFormat, and reference images to OpenAI. OpenAI does not receive aspectRatio or resolution directly; when possible OpenClaw maps those into a supported size, otherwise the tool reports them as ignored overrides.
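
One plausible way to map an aspect ratio onto a supported size is an exact ratio match, sketched below. The actual mapping OpenClaw applies is not documented on this page, so treat this purely as an illustration: the function name is hypothetical, and using the size hints from the tool parameters as the supported set is an assumption.

```python
from fractions import Fraction

# Size hints listed for the `size` parameter; assumed to be the supported set.
SUPPORTED_SIZES = ["1024x1024", "1536x1024", "1024x1536", "2048x2048", "3840x2160"]


def size_for_aspect_ratio(aspect_ratio):
    """Return the first supported size with exactly this ratio, or None."""
    width, height = (int(part) for part in aspect_ratio.split(":"))
    target = Fraction(width, height)
    for size in SUPPORTED_SIZES:
        size_w, size_h = (int(part) for part in size.split("x"))
        if Fraction(size_w, size_h) == target:
            return size
    return None  # the caller would report aspectRatio as an ignored override
```

Under these assumptions `16:9` resolves to `3840x2160`, while `4:3` has no exact match and would be reported as ignored.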

OpenAI-specific options live under the openai object:

{
  "quality": "low",
  "outputFormat": "jpeg",
  "openai": {
    "background": "opaque",
    "moderation": "low",
    "outputCompression": 60,
    "user": "end-user-42"
  }
}

openai.background accepts transparent, opaque, or auto; transparent outputs require outputFormat png or webp and a transparency-capable OpenAI image model. OpenClaw routes default gpt-image-2 transparent-background requests to gpt-image-1.5. openai.outputCompression applies to JPEG/WebP outputs and is ignored for PNG outputs.
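
The transparent-background rules can be sketched as a small planning step. The function name is hypothetical, and the sketch reroutes every `openai/gpt-image-2` request, which may be broader than the "default gpt-image-2" case described above.

```python
TRANSPARENT_FORMATS = {"png", "webp"}


def plan_openai_background(model, background, output_format):
    """Return (model_to_use, problem); exactly one of the two is None."""
    if background not in ("transparent", "opaque", "auto"):
        return None, f"unsupported background: {background}"
    if background != "transparent":
        return model, None
    if output_format not in TRANSPARENT_FORMATS:
        return None, "transparent backgrounds require outputFormat png or webp"
    if model == "openai/gpt-image-2":
        # gpt-image-2 rejects background "transparent", so use gpt-image-1.5.
        return "openai/gpt-image-1.5", None
    return model, None
```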

The top-level background hint is provider-neutral and currently maps to the same OpenAI background request field when the OpenAI provider is selected. Providers that do not declare background support return it in ignoredOverrides instead of receiving the unsupported parameter.
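
Conceptually, the ignored-override report is a partition of the requested hints by what the provider declares. The sketch below assumes `ignoredOverrides` is a list of hint names; the real shape may differ, and `split_overrides` is a hypothetical helper.

```python
def split_overrides(requested, supported):
    """Split hints into those forwarded to the provider and those ignored."""
    forwarded = {name: value for name, value in requested.items() if name in supported}
    ignored = sorted(name for name in requested if name not in supported)
    return forwarded, ignored


forwarded, ignored_overrides = split_overrides(
    {"background": "transparent", "quality": "low"},
    supported={"quality", "outputFormat"},
)
```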

To route OpenAI image generation through an Azure OpenAI deployment instead of api.openai.com, see Azure OpenAI endpoints.

OpenRouter image models

OpenRouter image generation uses the same OPENROUTER_API_KEY and routes through OpenRouter’s chat completions image API. Select OpenRouter image models with the openrouter/ prefix:

{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openrouter/google/gemini-3.1-flash-image-preview",
      },
    },
  },
}

OpenClaw forwards prompt, count, reference images, and Gemini-compatible aspectRatio / resolution hints to OpenRouter. Current built-in OpenRouter image model shortcuts include google/gemini-3.1-flash-image-preview, google/gemini-3-pro-image-preview, and openai/gpt-5.4-image-2. Use action: "list" to see what your configured plugin exposes.
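
Model refs such as `openrouter/google/gemini-3.1-flash-image-preview` carry a provider prefix followed by a model id that may itself contain slashes. A minimal sketch of splitting on the first slash only, using a hypothetical `parse_model_ref` helper:

```python
def parse_model_ref(ref):
    """Split 'provider/model' on the first slash; the model id may contain more."""
    provider, separator, model = ref.partition("/")
    if not separator or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {ref!r}")
    return provider, model


openrouter_ref = parse_model_ref("openrouter/google/gemini-3.1-flash-image-preview")
openai_ref = parse_model_ref("openai/gpt-image-2")
```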

MiniMax dual-auth

MiniMax image generation is available through both bundled MiniMax auth paths:

  • minimax/image-01 for API-key setups
  • minimax-portal/image-01 for OAuth setups

xAI grok-imagine-image

The bundled xAI provider uses /v1/images/generations for prompt-only requests and /v1/images/edits when image or images is present.

  • Models: xai/grok-imagine-image, xai/grok-imagine-image-quality
  • Count: up to 4
  • References: one image or up to five images
  • Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2
  • Resolutions: 1K, 2K
  • Outputs: returned as OpenClaw-managed image attachments
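
The endpoint choice and the limits listed above can be sketched as a pre-flight check. The endpoint paths and limits come from this section; the function names are hypothetical, and counting `image` and `images` together is an assumption.

```python
XAI_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2"}
XAI_RESOLUTIONS = {"1K", "2K"}


def xai_endpoint(image=None, images=None):
    """Edits endpoint when any reference image is present, else generations."""
    return "/v1/images/edits" if image or images else "/v1/images/generations"


def xai_request_problems(count=1, image=None, images=None,
                         aspect_ratio=None, resolution=None):
    """Return a list of problems for a request to the bundled xAI provider."""
    problems = []
    if not 1 <= count <= 4:
        problems.append("count must be between 1 and 4")
    references = ([image] if image else []) + list(images or [])
    if len(references) > 5:
        problems.append("at most 5 reference images are supported")
    if aspect_ratio is not None and aspect_ratio not in XAI_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution is not None and resolution not in XAI_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    return problems
```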

OpenClaw intentionally does not expose xAI-native quality, mask, user, or extra native-only aspect ratios until those controls exist in the shared cross-provider image_generate contract.

/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1

The same --output-format and --background flags are available on openclaw infer image edit; --openai-background remains as an OpenAI-specific alias. Bundled providers other than OpenAI do not declare explicit background control today, so background: "transparent" is reported as ignored for them.

  • Tools overview - all available agent tools
  • ComfyUI - local ComfyUI and Comfy Cloud workflow setup
  • fal - fal image and video provider setup
  • Google (Gemini) - Gemini image provider setup
  • MiniMax - MiniMax image provider setup
  • OpenAI - OpenAI Images provider setup
  • Vydra - Vydra image, video, and speech setup
  • xAI - Grok image, video, search, code execution, and TTS setup
  • Configuration reference - imageGenerationModel config
  • Models - model configuration and failover