Inferrs

inferrs can serve local models behind an OpenAI-compatible /v1 API. OpenClaw works with inferrs through the generic openai-completions path.

Property	Value
Provider ID	`inferrs` (custom; configured under `models.providers.inferrs`)
Plugin	None; `inferrs` is not a bundled OpenClaw provider plugin
Auth env var	Optional. Any value works if your inferrs server has no auth
API	OpenAI-compatible (`openai-completions`)
Recommended base URL	`http://127.0.0.1:8080/v1` (or wherever your inferrs server runs)

Getting started

Start inferrs with a model
inferrs serve google/gemma-4-E2B-it \
  --host 127.0.0.1 \
  --port 8080 \
  --device metal
Verify the server is reachable
curl http://127.0.0.1:8080/health
curl http://127.0.0.1:8080/v1/models
Add an OpenClaw provider entry
Add an explicit provider entry and point your default model at it. See the full config example below.
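
The reachability check above (curl against /health and /v1/models) can also be scripted. The following is a minimal Python sketch, assuming the default host and port used in this guide and assuming a healthy server answers both paths with a 2xx status. `http_status` and `check_inferrs` are hypothetical helper names introduced for illustration; they are not part of inferrs or OpenClaw.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure, and similar


def check_inferrs(origin: str = "http://127.0.0.1:8080", fetch=http_status) -> dict:
    """Probe /health and /v1/models and report which ones answered 2xx."""
    paths = ["/health", "/v1/models"]
    return {path: 200 <= fetch(origin + path) < 300 for path in paths}
```

Calling `check_inferrs()` against a running server should report `True` for both paths. The `fetch` parameter exists only so the probe can be exercised without a live server.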

Full config example

This example uses Gemma 4 on a local inferrs server.

{
  agents: {
    defaults: {
      model: { primary: "inferrs/google/gemma-4-E2B-it" },
      models: {
        "inferrs/google/gemma-4-E2B-it": {
          alias: "Gemma 4 (inferrs)",
        },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}
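
The config above refers to the model as inferrs/google/gemma-4-E2B-it in agents.defaults, while the provider entry lists its id as google/gemma-4-E2B-it. The Python sketch below illustrates that relationship, assuming the reference is the provider ID followed by the model id and that the split happens at the first slash only. `split_model_ref` is a hypothetical helper, not an OpenClaw API.

```python
def split_model_ref(ref: str) -> tuple[str, str]:
    """Split "provider/model-id" at the first slash only.

    Assumption: everything before the first "/" is the provider ID; the rest,
    including any further slashes, is the model id listed under that provider.
    """
    provider, sep, model_id = ref.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider/model-id', got {ref!r}")
    return provider, model_id
```

Under that assumption, the provider part must match the key under models.providers and the remainder must match an id in that provider's models list.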

On-demand startup

OpenClaw can also start inferrs on demand, only when an inferrs/... model is selected. Add localService to the same provider entry:

{
  models: {
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        timeoutSeconds: 300,
        localService: {
          command: "/opt/homebrew/bin/inferrs",
          args: ["serve", "google/gemma-4-E2B-it", "--host", "127.0.0.1", "--port", "8080", "--device", "metal"],
          healthUrl: "http://127.0.0.1:8080/v1/models",
          readyTimeoutMs: 180000,
          idleStopMs: 0,
        },
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}

command must be an absolute path. Run which inferrs on the Gateway host and put that path in the config. For the full field reference, see Local model services.
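
If the path needs to be resolved from a script rather than by hand, the standard library offers an equivalent lookup. This is a minimal Python sketch meant to run on the Gateway host; `resolve_command` is a hypothetical helper name, and the result depends on the PATH of the process that runs it, which may differ from your interactive shell.

```python
import os
import shutil
from typing import Optional


def resolve_command(name: str = "inferrs") -> Optional[str]:
    """Return an absolute path for `name` found on PATH, or None if absent."""
    found = shutil.which(name)
    if found is None:
        return None
    # which() can return a relative hit when PATH contains relative entries.
    return os.path.abspath(found)
```

A `None` result means the binary is not on PATH for that process, in which case the absolute path has to be supplied manually.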

Advanced configuration

Why requiresStringContent matters

Some inferrs Chat Completions routes accept only a string for messages[].content, not an array of structured content parts.

compat: {
  requiresStringContent: true
}

OpenClaw then flattens pure-text content parts into a plain string before sending the request.
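
To illustrate what flattening means here, the following is a minimal Python sketch. It is not OpenClaw's implementation. It assumes OpenAI-style content parts of the form {"type": "text", "text": "..."} and joins them with newlines, which is an arbitrary choice made for this example. `flatten_text_content` is a hypothetical name.

```python
from typing import Union

Content = Union[str, list]


def _is_text_part(part: object) -> bool:
    """True for a dict shaped like {"type": "text", "text": ...}."""
    return isinstance(part, dict) and part.get("type") == "text"


def flatten_text_content(content: Content, joiner: str = "\n") -> Content:
    """Collapse a list of text-only content parts into one string.

    Strings pass through unchanged. Lists containing any non-text part
    (for example an image part) are returned as-is, because only pure-text
    content can be represented as a plain string.
    """
    if isinstance(content, str):
        return content
    if not all(_is_text_part(part) for part in content):
        return content
    return joiner.join(part.get("text", "") for part in content)
```

With this sketch, a message whose content is two text parts becomes a single string, which is the shape the string-only routes expect.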

Gemma and tool-schema caveat

Some current inferrs + Gemma combinations accept small direct /v1/chat/completions requests but still fail on full OpenClaw agent-runtime turns.

If that happens, try this first:

compat: {
  requiresStringContent: true,
  supportsTools: false
}

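This disables OpenClaw's tool schema surface for the model and can reduce prompt pressure on stricter local backends.

If tiny direct requests still work but normal OpenClaw agent turns keep crashing inside inferrs, the remaining issue usually lies in upstream model or server behavior rather than in OpenClaw's transport layer.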

Manual smoke test

Once configured, test both layers:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'

openclaw infer model run \
  --model inferrs/google/gemma-4-E2B-it \
  --prompt "What is 2 + 2? Reply with one short sentence." \
  --json

If the first command works but the second fails, check the Troubleshooting section below.
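
The first layer of the smoke test (the direct curl request) can also be driven from Python using only the standard library. This sketch mirrors the curl payload above. `build_chat_request` and `send_chat_request` are hypothetical helper names, and the response handling assumes the usual OpenAI-compatible shape (choices[0].message.content), which this guide does not spell out for inferrs.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"  # same base URL as the provider entry


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the non-streaming payload with string content, as in the curl test."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def send_chat_request(payload: dict, base_url: str = BASE_URL, timeout: float = 300.0) -> str:
    """POST the payload to /chat/completions and return the reply text."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed OpenAI-compatible response shape; adjust if inferrs differs.
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("google/gemma-4-E2B-it", "What is 2 + 2?"))` performs the same request as the curl command.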

Proxy-style behavior

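inferrs is treated as a proxy-style OpenAI-compatible /v1 backend, not a native OpenAI endpoint.

Native OpenAI-only request shaping does not apply here
No service_tier, no Responses store, no prompt-cache hints, and no OpenAI reasoning-compat payload shaping
Hidden OpenClaw attribution headers (originator, version, User-Agent) are not injected on custom inferrs base URLs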

Troubleshooting

curl /v1/models fails

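inferrs is not running, is unreachable, or is not bound to the expected host/port. Make sure the server is started and listening on the address you configured.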

messages[].content expected a string

Set compat.requiresStringContent: true in the model entry. See the requiresStringContent section above for details.

Direct /v1/chat/completions calls pass but openclaw infer model run fails

Try setting compat.supportsTools: false to disable the tool schema surface. See the Gemma tool-schema caveat above.

inferrs still crashes on larger agent turns

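If OpenClaw no longer hits schema errors but inferrs still crashes on larger agent turns, treat it as an upstream inferrs or model limitation. Reduce prompt pressure, or switch to a different local backend or model.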