Inferrs

inferrs can serve local models through an OpenAI-compatible /v1 API. OpenClaw works with inferrs via the generic openai-completions path.

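Property	Value
Provider ID	`inferrs` (custom; configured under `models.providers.inferrs`)
Plugin	None. `inferrs` is not a built-in OpenClaw provider plugin
Auth env var	Optional. If your inferrs server has no authentication, any value works
API	OpenAI-compatible (`openai-completions`)
Suggested base URL	`http://127.0.0.1:8080/v1` (or wherever your inferrs server is located)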

Getting started

Start inferrs with a model
inferrs serve google/gemma-4-E2B-it \
  --host 127.0.0.1 \
  --port 8080 \
  --device metal
Verify that the server is reachable
curl http://127.0.0.1:8080/health
curl http://127.0.0.1:8080/v1/models
Add an OpenClaw provider entry
Add an explicit provider entry and point your default model at it. See the full config example below.
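
The reachability check can also be scripted. The following is a minimal sketch, assuming that inferrs returns the usual OpenAI-style list shape (`{"data": [{"id": ...}]}`) from /v1/models; the helper names `model_ids` and `fetch_model_ids` are hypothetical and not part of inferrs or OpenClaw.

```python
import json
import urllib.request

# Suggested base URL from the table above; adjust to wherever your server listens.
BASE_URL = "http://127.0.0.1:8080/v1"


def model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style list response.

    Assumption: the response looks like {"data": [{"id": "..."}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list:
    """GET {base_url}/models and return the ids the server reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

Calling `fetch_model_ids()` against a running server should list the model passed to `inferrs serve`; that id is the one to use as `id` in the provider's `models` array.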

Full config example

This example uses Gemma 4 on a local inferrs server.

{
  agents: {
    defaults: {
      model: { primary: "inferrs/google/gemma-4-E2B-it" },
      models: {
        "inferrs/google/gemma-4-E2B-it": {
          alias: "Gemma 4 (inferrs)",
        },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}

On-demand startup

OpenClaw can also start inferrs on demand, only when an inferrs/... model is selected. Add localService to the same provider entry:

{
  models: {
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        timeoutSeconds: 300,
        localService: {
          command: "/opt/homebrew/bin/inferrs",
          args: ["serve", "google/gemma-4-E2B-it", "--host", "127.0.0.1", "--port", "8080", "--device", "metal"],
          healthUrl: "http://127.0.0.1:8080/v1/models",
          readyTimeoutMs: 180000,
          idleStopMs: 0,
        },
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}

command must be an absolute path. Run which inferrs on the Gateway host and put that path in the config. For the full field reference, see Local model services.
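
To make the roles of healthUrl and readyTimeoutMs concrete, here is a rough sketch of a readiness loop: poll a health URL until it answers or a deadline passes. This is an illustration of the general pattern only, not OpenClaw's implementation; `wait_until_ready`, `http_probe`, and the polling interval are all assumptions.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError and connection failures are OSError subclasses
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    ready_timeout_ms: int = 180_000,       # mirrors readyTimeoutMs in the config
    poll_interval_s: float = 0.5,          # hypothetical; not a documented setting
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it succeeds or ready_timeout_ms elapses."""
    deadline = clock() + ready_timeout_ms / 1000.0
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

For example, `wait_until_ready(lambda: http_probe("http://127.0.0.1:8080/v1/models"))` would block until the server lists its models or three minutes pass. A generous timeout matters here because the first start may need to load model weights.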

Advanced configuration

Why requiresStringContent matters

Some inferrs chat-completion routes accept only a string messages[].content, not a structured array of content parts.

compat: {
  requiresStringContent: true
}

OpenClaw flattens pure-text content parts into a plain string before sending the request.
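
The flattening step can be pictured roughly as follows. This is a sketch under stated assumptions, not OpenClaw's code: the function name `flatten_text_parts` is hypothetical, parts are assumed to use the OpenAI-style `{"type": "text", "text": ...}` shape, and joining parts with a newline is a guess at a reasonable separator.

```python
def flatten_text_parts(content):
    """Collapse a text-only array of content parts into one string.

    Strings pass through unchanged. Arrays containing any non-text part
    (for example an image part) are also returned unchanged, since they
    cannot be represented as a plain string.
    """
    if isinstance(content, str):
        return content
    if isinstance(content, list) and all(
        isinstance(part, dict)
        and part.get("type") == "text"
        and isinstance(part.get("text"), str)
        for part in content
    ):
        # Assumption: newline as the separator between text parts.
        return "\n".join(part["text"] for part in content)
    return content
```

With this shape, `[{"type": "text", "text": "Hello"}, {"type": "text", "text": "world"}]` becomes the single string `"Hello\nworld"`, which a string-only route can accept.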

Gemma and tool-schema caveats

Some current inferrs + Gemma combinations accept small direct /v1/chat/completions requests but still fail on full OpenClaw agent-runtime turns.

If that happens, try this first:

compat: {
  requiresStringContent: true,
  supportsTools: false
}

This disables OpenClaw's tool-schema surface for that model and can reduce prompt pressure on stricter local backends.
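
Conceptually, turning tools off means the request no longer carries tool definitions. The sketch below illustrates that idea on an OpenAI-style chat-completions payload; `apply_tool_compat` is a hypothetical helper, and treating `tools` and `tool_choice` as the affected fields is an assumption rather than a description of OpenClaw internals.

```python
# Standard OpenAI chat-completions fields that carry tool schemas.
TOOL_FIELDS = ("tools", "tool_choice")


def apply_tool_compat(payload: dict, supports_tools: bool) -> dict:
    """Return a payload without tool fields when the model cannot use tools."""
    if supports_tools:
        return payload
    # Build a new dict so the caller's payload is not mutated.
    return {key: value for key, value in payload.items() if key not in TOOL_FIELDS}
```

Tool schemas are often the largest part of an agent request, so removing them shrinks the prompt considerably, which is one plausible reason this setting helps with strict backends.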

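If small direct requests still work but normal OpenClaw agent turns keep crashing inside inferrs, the remaining issue is usually upstream model/server behavior rather than OpenClaw's transport layer.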

Manual smoke test

Once configured, test both layers:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'

openclaw infer model run \
  --model inferrs/google/gemma-4-E2B-it \
  --prompt "What is 2 + 2? Reply with one short sentence." \
  --json

If the first command works but the second fails, see the Troubleshooting section below.
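
The direct-request half of the smoke test can also be run from a script. The following is a minimal sketch of the same request as the curl command, assuming the server replies in the standard OpenAI chat-completions shape; `build_chat_request` and `reply_text` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST that the curl smoke test sends."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],  # plain string content
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def reply_text(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat response.

    Assumption: the reply is at choices[0].message.content.
    """
    return response["choices"][0]["message"]["content"]
```

To send it, pass the request to `urllib.request.urlopen`, decode the body with `json.load`, and hand the result to `reply_text`. Note that the message content is a plain string, which matches what string-only inferrs routes expect.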

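Proxy-style behavior

inferrs is treated as a proxy-style OpenAI-compatible /v1 backend, not a native OpenAI endpoint.

Native OpenAI-only request shaping does not apply here
No service_tier, no Responses store, no prompt-cache hints, and no OpenAI reasoning-compat payload shaping
Hidden OpenClaw attribution headers (originator, version, User-Agent) are not injected on custom inferrs base URLs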
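
One way to picture the distinction is a shaping step that only keeps OpenAI-specific extras for a native endpoint. The sketch below is purely illustrative: the host check, the field list, and the function names are assumptions made for this example, and OpenClaw's actual logic may differ.

```python
from urllib.parse import urlparse

# Assumption: illustrative stand-ins for the OpenAI-only extras listed above.
OPENAI_ONLY_FIELDS = ("service_tier", "store")
ATTRIBUTION_HEADERS = ("originator", "version", "user-agent")


def is_native_openai(base_url: str) -> bool:
    """Hypothetical check: treat only api.openai.com as a native endpoint."""
    return urlparse(base_url).hostname == "api.openai.com"


def shape_request(base_url: str, payload: dict, headers: dict):
    """Drop OpenAI-only fields and attribution headers for non-native backends."""
    if is_native_openai(base_url):
        return payload, headers
    clean_payload = {k: v for k, v in payload.items() if k not in OPENAI_ONLY_FIELDS}
    clean_headers = {
        k: v for k, v in headers.items() if k.lower() not in ATTRIBUTION_HEADERS
    }
    return clean_payload, clean_headers
```

Under these assumptions, a request to `http://127.0.0.1:8080/v1` reaches inferrs with only the generic chat-completions fields, which keeps strict local servers from rejecting parameters they do not recognize.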

Troubleshooting

curl /v1/models fails

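inferrs is not running, is unreachable, or is not bound to the expected host/port. Make sure the server is started and listening on the address you configured.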

messages[].content expected a string

Set compat.requiresStringContent: true in the model entry. See the requiresStringContent section above for details.

Direct /v1/chat/completions calls pass but openclaw infer model run fails

Try setting compat.supportsTools: false to disable the tool-schema surface. See the Gemma and tool-schema caveats above.

inferrs still crashes on larger agent turns

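If OpenClaw no longer gets schema errors but inferrs still crashes on larger agent turns, treat it as an upstream inferrs or model limitation. Reduce prompt pressure, or switch to a different local backend or model.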