Image generation

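The image_generate tool lets the agent create and edit images using your configured providers. In chat sessions, image generation runs asynchronously: OpenClaw records a background task, returns the task ID immediately, and wakes the agent when the provider finishes. The completion agent follows the session's normal visible-reply mode: it sends the final reply automatically when so configured, or calls message(action="send") when the session requires the message tool. If the requester session is inactive or its active wake fails, and some generated images are still missing from the completion reply, OpenClaw sends an idempotent direct fallback that contains only the missing images.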
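The fallback rule above can be sketched as follows. This is only an illustration under stated assumptions: each generated image is assumed to have a stable identifier, and the names `missing_images` and `FallbackSender` are hypothetical, not OpenClaw APIs.

```python
def missing_images(generated: list[str], delivered: list[str]) -> list[str]:
    """Return generated image IDs that are absent from the completion reply."""
    seen = set(delivered)
    return [image_id for image_id in generated if image_id not in seen]


class FallbackSender:
    """Hypothetical direct-fallback sender that never resends an image for a task."""

    def __init__(self) -> None:
        # Remember (task_id, image_id) pairs so a repeated fallback is a no-op.
        self._sent: set[tuple[str, str]] = set()

    def send(self, task_id: str, generated: list[str], delivered: list[str]) -> list[str]:
        to_send = [
            image_id
            for image_id in missing_images(generated, delivered)
            if (task_id, image_id) not in self._sent
        ]
        self._sent.update((task_id, image_id) for image_id in to_send)
        return to_send
```

Calling `send` twice with the same arguments returns the missing images once and an empty list afterwards, which is the sense in which the fallback is idempotent in this sketch.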

Quick start

Configure auth
Set an API key for at least one provider (for example OPENAI_API_KEY, GEMINI_API_KEY, or OPENROUTER_API_KEY), or sign in with OpenAI Codex OAuth.
Choose a default model (optional)
```
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        timeoutMs: 180_000,
      },
    },
  },
}
```
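ChatGPT/Codex OAuth uses the same openai/gpt-image-2 model ref. When an openai OAuth profile is configured, OpenClaw routes image requests through that OAuth profile instead of trying OPENAI_API_KEY first. An explicit models.providers.openai config (API key, custom or Azure base URL) opts back into the direct OpenAI Images API route.
Ask the agent
"Generate an image of a friendly robot mascot."

The agent calls image_generate automatically. No tool allowlist is needed: the tool is enabled by default when a provider is available. The tool returns a background task ID, and the completion agent then sends the generated attachments through the message tool when they are ready.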

Common routes

Goal	Model ref	Auth
OpenAI image generation with API billing	`openai/gpt-image-2`	`OPENAI_API_KEY`
OpenAI image generation with subscription auth	`openai/gpt-image-2`	OpenAI ChatGPT/Codex OAuth
OpenAI transparent-background PNG/WebP	`openai/gpt-image-1.5`	`OPENAI_API_KEY` or OpenAI Codex OAuth
DeepInfra image generation	`deepinfra/black-forest-labs/FLUX-1-schnell`	`DEEPINFRA_API_KEY`
fal Krea 2 expressive, style-directed generation	`fal/krea/v2/medium/text-to-image`	`FAL_KEY`
OpenRouter image generation	`openrouter/google/gemini-3.1-flash-image-preview`	`OPENROUTER_API_KEY`
LiteLLM image generation	`litellm/gpt-image-2`	`LITELLM_API_KEY`
Google Gemini image generation	`google/gemini-3.1-flash-image-preview`	`GEMINI_API_KEY` or `GOOGLE_API_KEY`

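The same image_generate tool handles text-to-image generation and reference-image editing. Use image for a single reference and images for multiple references. For Krea 2 models on fal, these references are sent as style references rather than edit inputs. Provider-supported output hints such as quality, outputFormat, and background are forwarded when available and reported as ignored when the provider does not support them. Bundled transparent-background support is OpenAI-specific; other providers may still preserve a PNG alpha channel if their backend emits one.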

Supported providers

Provider	Default model	Edit support	Auth
ComfyUI	`workflow`	Yes (1 image, configured by the workflow)	`COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud
DeepInfra	`black-forest-labs/FLUX-1-schnell`	Yes (1 image)	`DEEPINFRA_API_KEY`
fal	`fal-ai/flux/dev`	Yes (model-specific limits)	`FAL_KEY`
Google	`gemini-3.1-flash-image-preview`	Yes	`GEMINI_API_KEY` or `GOOGLE_API_KEY`
LiteLLM	`gpt-image-2`	Yes (up to 5 input images)	`LITELLM_API_KEY`
MiniMax	`image-01`	Yes (subject reference)	`MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`)
OpenAI	`gpt-image-2`	Yes (up to 4 images)	`OPENAI_API_KEY` or OpenAI ChatGPT/Codex OAuth
OpenRouter	`google/gemini-3.1-flash-image-preview`	Yes (up to 5 input images)	`OPENROUTER_API_KEY`
Vydra	`grok-imagine`	No	`VYDRA_API_KEY`
xAI	`grok-imagine-image`	Yes (up to 5 images)	`XAI_API_KEY`

Use action: "list" to inspect the available providers and models at runtime:

/tool image_generate action=list

Use action: "status" to inspect the active image generation task for the current session:

/tool image_generate action=status

Provider capabilities

Capability	ComfyUI	DeepInfra	fal	Google	MiniMax	OpenAI	Vydra	xAI
Generate (max count)	Workflow-defined	4	4	4	9	4	1	4
Edit / reference	1 image (workflow)	1 image	Flux: 1; GPT: 10; Krea style refs: 10; NB2: 14	Up to 5 images	1 image (subject reference)	Up to 5 images	-	Up to 5 images
Size control	-	✓	✓	✓	-	Up to 4K	-	-
Aspect ratio	-	-	✓	✓	✓	-	-	✓
Resolution (1K/2K/4K)	-	-	✓	✓	-	-	-	1K, 2K

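Tool parameters

- `prompt`: Image generation prompt. Required for `action: "generate"`.
- `action`: Use `"status"` to inspect the active session task, or `"list"` to inspect the available providers and models at runtime.
- `model`: Provider/model override (for example `openai/gpt-image-2`). Use `openai/gpt-image-1.5` for transparent OpenAI backgrounds.
- `image`: Path or URL of a single reference image for edit mode.
- `images`: Multiple reference images for edit mode or style-reference models (up to 10 through the shared tool; provider-specific limits still apply).
- `size`: Size hint: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `3840x2160`.
- `aspectRatio`: Aspect ratio: `1:1`, `2:3`, `3:2`, `2.35:1`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`, `4:1`, `1:4`, `8:1`, `1:8`. Providers validate the subset that their specific models support.
- `resolution`: Resolution hint.
- `quality`: Quality hint, when the provider supports it.
- `outputFormat`: Output format hint, when the provider supports it.
- `background`: Background hint, when the provider supports it. For providers that support transparency, combine `transparent` with `outputFormat: "png"` or `"webp"`.
- `count`: Number of images to generate (1-4).
- `timeoutMs`: Optional provider request timeout in milliseconds. When Codex calls `image_generate` through a dynamic tool, this per-call value still overrides the configured default and is capped at 600000 ms.
- Output filename hint.
- `openai`: OpenAI-only hints: `background`, `moderation`, `outputCompression`, and `user`.
- `fal.creativity`: fal Krea 2 creativity control. Defaults to `medium`.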
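As a rough illustration of the shared-tool constraints listed above, the sketch below checks a call before it would reach a provider. `validate_call` is a hypothetical helper, and treating `generate` as the default action is an assumption; providers may apply stricter model-specific checks on top of these.

```python
SHARED_ASPECT_RATIOS = {
    "1:1", "2:3", "3:2", "2.35:1", "3:4", "4:3", "4:5", "5:4",
    "9:16", "16:9", "21:9", "4:1", "1:4", "8:1", "1:8",
}
MAX_SHARED_REFERENCES = 10  # shared-tool cap; providers may allow fewer


def validate_call(params: dict) -> list[str]:
    """Return human-readable problems; an empty list means the call looks valid."""
    problems = []
    # Assumption: a call without an explicit action is a generate call.
    if params.get("action", "generate") == "generate" and not params.get("prompt"):
        problems.append("prompt is required for action=generate")
    count = params.get("count", 1)
    if not 1 <= count <= 4:
        problems.append("count must be between 1 and 4")
    ratio = params.get("aspectRatio")
    if ratio is not None and ratio not in SHARED_ASPECT_RATIOS:
        problems.append(f"unknown aspectRatio: {ratio}")
    if len(params.get("images", [])) > MAX_SHARED_REFERENCES:
        problems.append("images accepts at most 10 references")
    return problems
```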

Configuration

Model selection

{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openai/gpt-image-2",
        timeoutMs: 180_000,
        fallbacks: ["openrouter/google/gemini-3.1-flash-image-preview", "google/gemini-3.1-flash-image-preview", "fal/fal-ai/flux/dev"],
      },
    },
  },
}

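Provider selection order

OpenClaw tries providers in this order:

1. The model parameter from the tool call (if the agent specifies one).
2. imageGenerationModel.primary from config.
3. imageGenerationModel.fallbacks, in order.
4. Auto-detection, limited to auth-backed provider defaults:
   - the current default provider first;
   - the remaining registered image generation providers, in provider ID order.

If a provider fails (auth error, rate limit, and so on), the next configured candidate is tried automatically. If all candidates fail, the error includes details from each attempt.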
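The fall-through behavior can be sketched as a loop that records each failure and raises a combined error only when every candidate has failed. The names `generate_with_fallback` and `AllCandidatesFailed` are hypothetical, and the `generate` callable stands in for a real provider request.

```python
from typing import Callable


class AllCandidatesFailed(Exception):
    """Raised when every candidate failed; keeps one (model_ref, reason) per attempt."""

    def __init__(self, attempts: list[tuple[str, str]]) -> None:
        self.attempts = attempts
        details = "; ".join(f"{ref}: {reason}" for ref, reason in attempts)
        super().__init__(f"all image generation candidates failed ({details})")


def generate_with_fallback(
    candidates: list[str], generate: Callable[[str], bytes]
) -> tuple[str, bytes]:
    """Try candidates in order and return the first success with its model ref."""
    attempts: list[tuple[str, str]] = []
    for model_ref in candidates:
        try:
            return model_ref, generate(model_ref)
        except Exception as error:  # auth error, rate limit, and so on
            attempts.append((model_ref, str(error)))
    raise AllCandidatesFailed(attempts)
```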

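Per-call model overrides are exact

A per-call model override tries only that provider/model and does not continue to the configured primary, the fallbacks, or auto-detected providers.

Auto-detection is auth-aware

A provider default enters the candidate list only when OpenClaw can actually authenticate with that provider. Set agents.defaults.mediaGenerationAutoProviderFallback: false to use only the explicit model, primary, and fallbacks entries.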
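Putting the ordering, the exact-override rule, and auth-aware auto-detection together, a candidate list might be built roughly as below. This is a sketch, not OpenClaw's implementation: `build_candidates` and its parameters are hypothetical, and removing duplicate entries is an assumption.

```python
from typing import Optional


def build_candidates(
    model_override: Optional[str],
    primary: Optional[str],
    fallbacks: list[str],
    registered_defaults: dict[str, str],  # provider ID -> default model
    authenticated: set[str],              # provider IDs with usable auth
    current_default_provider: Optional[str] = None,
    auto_fallback: bool = True,           # mediaGenerationAutoProviderFallback
) -> list[str]:
    # A per-call override is exact: nothing else is tried.
    if model_override:
        return [model_override]
    candidates = [primary] if primary else []
    candidates.extend(fallbacks)
    if auto_fallback:
        providers = sorted(registered_defaults)  # provider ID order
        if current_default_provider in registered_defaults:
            providers.remove(current_default_provider)
            providers.insert(0, current_default_provider)
        for provider in providers:
            if provider in authenticated:  # auth-aware: skip unusable providers
                candidates.append(f"{provider}/{registered_defaults[provider]}")
    # Assumption: keep the first occurrence of each model ref.
    return list(dict.fromkeys(candidates))
```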

Timeouts

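Set agents.defaults.imageGenerationModel.timeoutMs for slow image backends. The per-call timeoutMs tool parameter overrides the configured default, and the configured default overrides plugin-authored provider defaults. Google-hosted and OpenRouter-hosted image providers use a 180-second default; xAI and Azure OpenAI image generation use 600 seconds. Codex dynamic-tool calls use a 120-second image_generate bridge default and, when a timeout is configured, honor the same timeout budget, limited by OpenClaw's 600000 ms dynamic-tool bridge maximum.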
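The precedence described above can be sketched as a small resolver. `resolve_timeout_ms` is a hypothetical helper, and how the 120-second bridge default interacts with provider defaults is an assumption made for illustration: here it applies only when neither a per-call value nor a configured default exists.

```python
from typing import Optional

CODEX_BRIDGE_DEFAULT_MS = 120_000
CODEX_BRIDGE_MAX_MS = 600_000


def resolve_timeout_ms(
    per_call: Optional[int] = None,
    configured: Optional[int] = None,
    provider_default: Optional[int] = None,
    codex_dynamic_tool: bool = False,
) -> Optional[int]:
    """Pick the effective timeout: per-call, then config, then a default."""
    if per_call is not None:
        chosen = per_call
    elif configured is not None:
        chosen = configured
    elif codex_dynamic_tool:
        chosen = CODEX_BRIDGE_DEFAULT_MS  # assumed to replace the provider default
    else:
        chosen = provider_default
    if codex_dynamic_tool and chosen is not None:
        chosen = min(chosen, CODEX_BRIDGE_MAX_MS)  # bridge maximum
    return chosen
```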

Inspect at runtime

Use action: "list" to inspect the currently registered providers, their default models, and auth environment variable hints.

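Image editing

OpenAI, OpenRouter, Google, DeepInfra, fal, MiniMax, ComfyUI, and xAI support editing reference images. Krea 2 models on fal use the same image / images fields as style references rather than edit inputs. Pass a reference image path or URL: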

"Generate a watercolor version of this photo" + image: "/path/to/photo.jpg"

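OpenAI, OpenRouter, Google, and xAI support up to 5 reference images through the images parameter. fal supports 1 reference image for Flux image-to-image, up to 10 for GPT Image 2 edits, up to 10 style references for Krea 2, and up to 14 for Nano Banana 2 edits. MiniMax and ComfyUI support 1.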
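The limits quoted in the paragraph above can be expressed as a lookup table. This is an illustrative sketch: the dictionary keys (including the fal family labels) and the helper names are hypothetical, and a provider that is not listed is treated as not accepting references.

```python
from typing import Optional

REFERENCE_LIMITS = {
    "openai": 5, "openrouter": 5, "google": 5, "xai": 5,
    "minimax": 1, "comfyui": 1,
}
# fal limits depend on the model family; these labels are illustrative only.
FAL_REFERENCE_LIMITS = {"flux": 1, "gpt-image-2": 10, "krea-2": 10, "nano-banana-2": 14}


def reference_limit(provider: str, fal_family: Optional[str] = None) -> Optional[int]:
    """Return the maximum reference count, or None when it is unknown."""
    if provider == "fal":
        return FAL_REFERENCE_LIMITS.get(fal_family)
    return REFERENCE_LIMITS.get(provider)


def within_limit(provider: str, references: list[str], fal_family: Optional[str] = None) -> bool:
    """Check a list of reference paths or URLs against the provider limit."""
    limit = reference_limit(provider, fal_family)
    return limit is not None and len(references) <= limit
```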

Provider deep dives

OpenAI gpt-image-2 (and gpt-image-1.5)

OpenAI image generation defaults to openai/gpt-image-2. If an openai OAuth profile is configured, OpenClaw reuses the same OAuth profile that Codex subscription chat models use and sends image requests through the Codex Responses backend. Legacy Codex base URLs such as https://chatgpt.com/backend-api are normalized to https://chatgpt.com/backend-api/codex for image requests. OpenClaw does not automatically fall back to OPENAI_API_KEY for that request. To force the direct OpenAI Images API route, configure models.providers.openai explicitly with an API key, a custom base URL, or an Azure endpoint.
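A minimal sketch of the routing decision and the base URL normalization described above. The function names and the route labels (`images-api`, `codex-responses`, `unavailable`) are hypothetical, and trimming a trailing slash before normalizing is an assumption.

```python
def normalize_codex_base_url(base_url: str) -> str:
    """Map a legacy Codex base URL to the /codex path used for image requests."""
    trimmed = base_url.rstrip("/")
    if trimmed.endswith("/backend-api"):
        return trimmed + "/codex"
    return trimmed


def choose_openai_route(
    has_explicit_provider_config: bool,
    has_oauth_profile: bool,
    has_api_key: bool,
) -> str:
    """Pick a route label for an OpenAI image request."""
    if has_explicit_provider_config:
        return "images-api"       # explicit models.providers.openai wins
    if has_oauth_profile:
        return "codex-responses"  # no automatic fallback to the API key
    if has_api_key:
        return "images-api"
    return "unavailable"
```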

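The openai/gpt-image-1.5, openai/gpt-image-1, and openai/gpt-image-1-mini models can still be selected explicitly. Use gpt-image-1.5 for transparent-background PNG/WebP output; the current gpt-image-2 API rejects background: "transparent".

gpt-image-2 supports both text-to-image generation and reference-image editing through the same image_generate tool. OpenClaw forwards prompt, count, size, quality, outputFormat, and reference images to OpenAI. OpenAI does not receive aspectRatio or resolution directly; where possible, OpenClaw maps them to a supported size, and otherwise the tool reports them as ignored overrides.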
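One way such a mapping could work is an exact-ratio lookup over a known size list, returning nothing when no size matches so the caller can report the hint as ignored. This sketch assumes the size list from the Tool parameters section and a hypothetical helper name; it does not claim to reproduce OpenClaw's actual mapping.

```python
from fractions import Fraction
from typing import Optional

# Assumption: the shared size hints listed under Tool parameters.
KNOWN_SIZES = ["1024x1024", "1536x1024", "1024x1536", "2048x2048", "3840x2160"]


def size_for_aspect_ratio(aspect_ratio: str) -> Optional[str]:
    """Return the first known size with exactly this ratio, else None."""
    width, height = aspect_ratio.split(":")
    target = Fraction(width) / Fraction(height)
    for size in KNOWN_SIZES:
        size_width, size_height = size.split("x")
        if Fraction(int(size_width), int(size_height)) == target:
            return size
    return None  # caller would report aspectRatio as an ignored override
```

Using `Fraction` avoids floating-point comparison issues and also accepts decimal ratios such as `2.35:1`.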

OpenAI-specific options live under the openai object:

{
  "quality": "low",
  "outputFormat": "jpeg",
  "openai": {
    "background": "opaque",
    "moderation": "low",
    "outputCompression": 60,
    "user": "end-user-42"
  }
}

openai.background accepts transparent, opaque, or auto; transparent output requires an outputFormat of png or webp and a transparency-capable OpenAI image model. OpenClaw routes default gpt-image-2 transparent-background requests to gpt-image-1.5. openai.outputCompression applies to JPEG/WebP output and is ignored for PNG output.
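The transparent-background rule can be sketched as follows. `resolve_openai_model` is a hypothetical helper, and reading "default gpt-image-2 requests" as "requests with no explicit model" is an interpretation of the text above rather than confirmed behavior.

```python
from typing import Optional

DEFAULT_MODEL = "openai/gpt-image-2"
TRANSPARENT_MODEL = "openai/gpt-image-1.5"


def resolve_openai_model(model: Optional[str], background: str, output_format: str) -> str:
    """Return the model ref to use for a request with the given background."""
    if background not in ("transparent", "opaque", "auto"):
        raise ValueError(f"unsupported background: {background}")
    if background != "transparent":
        return model or DEFAULT_MODEL
    if output_format not in ("png", "webp"):
        raise ValueError("transparent output requires outputFormat png or webp")
    # Assumption: only requests without an explicit model are rerouted.
    return model or TRANSPARENT_MODEL
```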

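The top-level background hint is provider-neutral and currently maps to the same OpenAI background request field when the OpenAI provider is selected. Providers that do not declare background support return it in ignoredOverrides instead of receiving an unsupported parameter.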
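The ignored-override behavior amounts to splitting the requested hints by what a provider declares it supports. A minimal sketch, assuming a hypothetical `split_overrides` helper and a plain set of supported hint names:

```python
def split_overrides(requested: dict, supported: set) -> tuple:
    """Return (forwarded hints, names to report in ignoredOverrides)."""
    forwarded = {name: value for name, value in requested.items() if name in supported}
    ignored = [name for name in requested if name not in supported]
    return forwarded, ignored
```

For example, a provider that declares only `quality` and `outputFormat` would receive `quality` and have `background` reported back as ignored.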

To route OpenAI image generation to an Azure OpenAI deployment instead of api.openai.com, see Azure OpenAI endpoints.

OpenRouter image models

OpenRouter image generation uses the same OPENROUTER_API_KEY and routes through OpenRouter's chat completions image API. Select OpenRouter image models with the openrouter/ prefix:

```
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "openrouter/google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
```

OpenClaw forwards `prompt`, `count`, reference images, and Gemini-compatible `aspectRatio` / `resolution` hints to OpenRouter. The current built-in OpenRouter image model shortcuts include `google/gemini-3.1-flash-image-preview`, `google/gemini-3-pro-image-preview`, and `openai/gpt-5.4-image-2`. Use `action: "list"` to see what your configured plugin exposes.

fal Krea 2

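Krea 2 models on fal use fal's native Krea schema rather than the generic image_size schema used by Flux. OpenClaw sends:

- aspect_ratio for aspect ratio hints
- creativity, which defaults to medium
- image_style_references when image or images is provided

Choose Krea 2 Medium for faster, expressive illustration, and Krea 2 Large for slower, more detailed photorealistic and textured looks: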

{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "fal/krea/v2/medium/text-to-image",
      },
    },
  },
}

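Krea 2 currently returns one image per request. Prefer aspectRatio for Krea; OpenClaw maps size to the closest supported Krea aspect ratio, and rejects resolution for Krea rather than dropping it. Use fal.creativity when you want the native Krea creativity level: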

{
  "model": "fal/krea/v2/medium/text-to-image",
  "prompt": "A cyber zine portrait with risograph texture",
  "aspectRatio": "9:16",
  "fal": {
    "creativity": "high"
  }
}
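To make the Krea 2 field mapping concrete, the sketch below builds a request payload from the shared tool parameters. It is illustrative only: `build_krea_payload` is a hypothetical helper, and the `prompt` key and the list-of-strings shape of `image_style_references` are assumptions about the fal schema, not confirmed details.

```python
from typing import Optional


def build_krea_payload(
    prompt: str,
    aspect_ratio: Optional[str] = None,
    creativity: str = "medium",
    image: Optional[str] = None,
    images: Optional[list] = None,
    resolution: Optional[str] = None,
) -> dict:
    """Translate shared image_generate parameters into a Krea-style payload."""
    if resolution is not None:
        # Rejected rather than silently dropped.
        raise ValueError("resolution is not supported for Krea 2")
    payload = {"prompt": prompt, "creativity": creativity}
    if aspect_ratio:
        payload["aspect_ratio"] = aspect_ratio
    references = ([image] if image else []) + list(images or [])
    if references:
        # References act as style references, not edit inputs.
        payload["image_style_references"] = references
    return payload
```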

MiniMax dual auth

MiniMax image generation is available through both built-in MiniMax auth paths:

- minimax/image-01 for API key setups
- minimax-portal/image-01 for OAuth setups

xAI grok-imagine-image

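The built-in xAI provider uses /v1/images/generations for prompt-only requests and /v1/images/edits when image or images is present.

- Models: xai/grok-imagine-image, xai/grok-imagine-image-quality
- Count: up to 4
- References: one image or up to five images
- Aspect ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2
- Resolution: 1K, 2K
- Output: returned as OpenClaw-managed image attachments

OpenClaw intentionally does not expose xAI-native quality, mask, user, or extra native-only aspect ratios until those controls exist in the shared cross-provider image_generate contract.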
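The endpoint choice and the limits listed above can be sketched as a small planner. `plan_xai_request` is a hypothetical helper, and counting `image` and `images` together against the five-reference limit is an assumption.

```python
from typing import Optional

XAI_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2"}
XAI_RESOLUTIONS = {"1K", "2K"}


def plan_xai_request(
    count: int = 1,
    image: Optional[str] = None,
    images: Optional[list] = None,
    aspect_ratio: Optional[str] = None,
    resolution: Optional[str] = None,
) -> str:
    """Validate the request and return the endpoint path it would use."""
    references = ([image] if image else []) + list(images or [])
    if not 1 <= count <= 4:
        raise ValueError("count must be between 1 and 4")
    if len(references) > 5:
        raise ValueError("at most five reference images are supported")
    if aspect_ratio is not None and aspect_ratio not in XAI_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution is not None and resolution not in XAI_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    # Any reference image switches the request to the edits endpoint.
    return "/v1/images/edits" if references else "/v1/images/generations"
```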

Examples

/tool image_generate action=generate model=openai/gpt-image-2 prompt="A clean editorial poster for OpenClaw image generation" size=3840x2160 count=1

/tool image_generate action=generate model=openai/gpt-image-1.5 prompt="A simple red circle sticker on a transparent background" outputFormat=png background=transparent

Equivalent CLI:

```bash
openclaw infer image generate \
  --model openai/gpt-image-1.5 \
  --output-format png \
  --background transparent \
  --prompt "A simple red circle sticker on a transparent background" \
  --json
```

/tool image_generate action=generate model=openai/gpt-image-2 prompt="Two visual directions for a calm productivity app icon" size=1024x1024 count=2

/tool image_generate action=generate model=openai/gpt-image-2 prompt="Keep the subject, replace the background with a bright studio setup" image=/path/to/reference.png size=1024x1536

/tool image_generate action=generate model=openai/gpt-image-2 prompt="Combine the character identity from the first image with the color palette from the second" images='["/path/to/character.png","/path/to/palette.jpg"]' size=1536x1024

/tool image_generate action=generate model=fal/krea/v2/medium/text-to-image prompt="An expressive editorial portrait using this color palette and print texture" images='["/path/to/palette.png","/path/to/texture.jpg"]' aspectRatio=9:16 fal='{"creativity":"high"}'

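The --output-format and --background flags are also available on openclaw infer image edit; --openai-background remains an OpenAI-specific alias. Bundled providers other than OpenAI do not currently declare explicit background control, so background: "transparent" is reported as ignored for them.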