
Prometheus metrics

OpenClaw can expose diagnostics metrics through the official diagnostics-prometheus plugin. The plugin listens to trusted internal diagnostics and renders a Prometheus text endpoint at:

GET /api/diagnostics/prometheus

The content type is text/plain; version=0.0.4; charset=utf-8, the standard Prometheus text exposition format.

For traces, logs, OTLP push, and OpenTelemetry GenAI semantic attributes, see OpenTelemetry export.

  1. Install the plugin

    openclaw plugins install clawhub:@openclaw/diagnostics-prometheus

  2. Enable the plugin

    {
      plugins: {
        allow: ["diagnostics-prometheus"],
        entries: {
          "diagnostics-prometheus": { enabled: true },
        },
      },
      diagnostics: {
        enabled: true,
      },
    }

  3. Restart the Gateway

    HTTP routes are registered when the plugin starts, so restart the Gateway after enabling the plugin.

  4. Scrape the protected route

    Send the same gateway authentication that your operator clients use:

    curl -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
      http://127.0.0.1:18789/api/diagnostics/prometheus

  5. Connect Prometheus

    # prometheus.yml
    scrape_configs:
      - job_name: openclaw
        scrape_interval: 30s
        metrics_path: /api/diagnostics/prometheus
        authorization:
          credentials_file: /etc/prometheus/openclaw-gateway-token
        static_configs:
          - targets: ["openclaw-gateway:18789"]
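Before wiring up Prometheus, it can help to confirm the scrape from a script. The following is a minimal sketch, assuming the token is available in the OPENCLAW_GATEWAY_TOKEN environment variable as in the curl step. The helper names (build_request, parse_samples, scrape) are hypothetical and not part of OpenClaw, and the parser is deliberately simplified.

```python
import os
import urllib.request

METRICS_URL = "http://127.0.0.1:18789/api/diagnostics/prometheus"


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the gateway bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def parse_samples(text: str) -> dict[str, float]:
    """Map each sample line (metric name plus labels) to its value.

    Simplified on purpose: skips comment lines and assumes there are
    no timestamps and no spaces inside label values.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(url: str = METRICS_URL) -> dict[str, float]:
    """Fetch the endpoint once and return the parsed samples."""
    token = os.environ["OPENCLAW_GATEWAY_TOKEN"]
    with urllib.request.urlopen(build_request(url, token), timeout=10) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

Calling scrape() against a running Gateway returns a dictionary keyed by series. An empty dictionary usually means no events have been recorded yet.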

Metric                                       Type       Labels
openclaw_run_completed_total                 counter    channel, model, outcome, provider, trigger
openclaw_run_duration_seconds                histogram  channel, model, outcome, provider, trigger
openclaw_model_call_total                    counter    api, error_category, model, outcome, provider, transport
openclaw_model_call_duration_seconds         histogram  api, error_category, model, outcome, provider, transport
openclaw_model_tokens_total                  counter    agent, channel, model, provider, token_type
openclaw_gen_ai_client_token_usage           histogram  model, provider, token_type
openclaw_model_cost_usd_total                counter    agent, channel, model, provider
openclaw_tool_execution_total                counter    error_category, outcome, params_kind, tool
openclaw_tool_execution_duration_seconds     histogram  error_category, outcome, params_kind, tool
openclaw_harness_run_total                   counter    channel, error_category, harness, model, outcome, phase, plugin, provider
openclaw_harness_run_duration_seconds        histogram  channel, error_category, harness, model, outcome, phase, plugin, provider
openclaw_message_processed_total             counter    channel, outcome, reason
openclaw_message_processed_duration_seconds  histogram  channel, outcome, reason
openclaw_message_delivery_started_total      counter    channel, delivery_kind
openclaw_message_delivery_total              counter    channel, delivery_kind, error_category, outcome
openclaw_message_delivery_duration_seconds   histogram  channel, delivery_kind, error_category, outcome
openclaw_talk_event_total                    counter    brain, event_type, mode, provider, transport
openclaw_talk_event_duration_seconds         histogram  brain, event_type, mode, provider, transport
openclaw_talk_audio_bytes                    histogram  brain, event_type, mode, provider, transport
openclaw_queue_lane_size                     gauge      lane
openclaw_queue_lane_wait_seconds             histogram  lane
openclaw_session_state_total                 counter    reason, state
openclaw_session_queue_depth                 gauge      state
openclaw_session_recovery_total              counter    action, active_work_kind, state, status
openclaw_session_recovery_age_seconds        histogram  action, active_work_kind, state, status
openclaw_memory_bytes                        gauge      kind
openclaw_memory_rss_bytes                    histogram  (none)
openclaw_memory_pressure_total               counter    level, reason
openclaw_telemetry_exporter_total            counter    exporter, reason, signal, status
openclaw_prometheus_series_dropped_total     counter    (none)

Bounded, low-cardinality labels

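Prometheus labels stay bounded and low-cardinality. The exporter does not emit raw diagnostic identifiers such as runId, sessionKey, sessionId, callId, toolCallId, message IDs, chat IDs, or provider request IDs.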

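Label values are redacted and must match OpenClaw's low-cardinality character policy. Values that do not match are replaced with unknown, other, or none, depending on the metric.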
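The replacement rule can be pictured with a short sketch. The character policy below is an assumption made for illustration; the source does not spell out OpenClaw's actual rule, and bounded_label is a hypothetical helper rather than an OpenClaw API.

```python
import re

# Assumed policy, for illustration only: short values built from lowercase
# letters, digits, and a few separators. OpenClaw's real rule may differ.
_LOW_CARDINALITY = re.compile(r"[a-z0-9][a-z0-9_.:-]{0,63}")


def bounded_label(value, fallback="unknown"):
    """Return value if it matches the assumed policy, otherwise the fallback.

    The fallback stands in for the per-metric choice of
    "unknown", "other", or "none".
    """
    if isinstance(value, str) and _LOW_CARDINALITY.fullmatch(value):
        return value
    return fallback
```

A character policy like this keeps free-form text out of labels, but it cannot bound the number of distinct values by itself. That is the job of the series cap described next.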

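Series cap and overflow accounting

The exporter caps the time series it retains in memory at 2048, counted across counters, gauges, and histograms combined. A new series beyond the cap is dropped, and openclaw_prometheus_series_dropped_total increments by one for each drop.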

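Monitor this counter as a clear signal that an upstream attribute is leaking high-cardinality values. The exporter never lifts the cap automatically; if the counter rises, fix the source rather than disabling the cap.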
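The cap-and-count behavior can be sketched as follows. This is a simplified model covering counters only, not the exporter's actual implementation; the class and method names are hypothetical.

```python
class BoundedSeriesRegistry:
    """Keep at most max_series distinct series and count the ones dropped."""

    def __init__(self, max_series=2048):
        self.max_series = max_series
        self.series = {}   # (metric name, sorted label pairs) -> value
        self.dropped = 0   # plays the role of openclaw_prometheus_series_dropped_total

    def inc(self, name, labels, amount=1.0):
        """Increment a series; return False if it was dropped at the cap."""
        key = (name, tuple(sorted(labels.items())))
        if key not in self.series:
            if len(self.series) >= self.max_series:
                self.dropped += 1   # new series past the cap: drop and count
                return False
            self.series[key] = 0.0
        self.series[key] += amount  # existing series keep updating at the cap
        return True
```

In this model, series that already exist keep updating after the cap is reached. Only series that would be new are rejected, which is why a climbing drop counter points at a label whose values keep changing.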

What never appears in Prometheus output

  • Prompt text, response text, tool inputs, tool outputs, system prompts
  • Transcripts, audio payloads, call IDs, room IDs, handoff tokens, turn IDs, and raw session IDs
  • Raw provider request IDs (bounded hashes appear only on spans, where applicable, and never on metrics)
  • Session keys and session IDs
  • Hostnames, file paths, secret values

PromQL examples

# Tokens per minute, split by provider
sum by (provider) (rate(openclaw_model_tokens_total[1m]))

# Spend (USD) over the last hour, by model
sum by (model) (increase(openclaw_model_cost_usd_total[1h]))

# 95th percentile model run duration
histogram_quantile(
  0.95,
  sum by (le, provider, model)
    (rate(openclaw_run_duration_seconds_bucket[5m]))
)

# Queue wait time SLO (95p under 2s)
histogram_quantile(
  0.95,
  sum by (le, lane) (rate(openclaw_queue_lane_wait_seconds_bucket[5m]))
) < 2

# Dropped Prometheus series (cardinality alarm)
increase(openclaw_prometheus_series_dropped_total[15m]) > 0
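To make the percentile queries concrete, here is a simplified Python sketch of the estimate that histogram_quantile computes from cumulative buckets: find the bucket containing the target rank, then interpolate linearly inside it. It ignores several edge cases that Prometheus handles, and the bucket boundaries in the usage note are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with (math.inf, total).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the last finite bound.
                return lower_bound
            width = count - lower_count
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / width if width else 0.0
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

With hypothetical buckets [(0.5, 50), (1.0, 90), (2.0, 100), (math.inf, 100)], the 95th percentile falls halfway through the 1.0 to 2.0 bucket, so the estimate is about 1.5 seconds and the queue wait SLO of under 2 seconds would hold. The result is only as precise as the bucket boundaries allow.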

Choosing between Prometheus and OpenTelemetry export

OpenClaw supports both surfaces independently. You can run either one, both, or neither.

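  • Pull model: Prometheus scrapes /api/diagnostics/prometheus.
  • No external collector is required.
  • Requests are authenticated through normal Gateway auth.
  • The surface is metrics only (no traces or logs).
  • Best for stacks already standardized on Prometheus + Grafana.

Empty response body

  • Check that diagnostics.enabled: true is set in your config.
  • Confirm the plugin is enabled and loaded with openclaw plugins list --enabled.
  • Generate some traffic; counters and histograms emit lines only after at least one event has occurred.
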
401 / unauthorized

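This endpoint requires Gateway operator scope (auth: "gateway" with gatewayRuntimeScopeSurface: "trusted-operator"). Use the same token or password that Prometheus uses for any other Gateway operator route. There is no public unauthenticated mode.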

openclaw_prometheus_series_dropped_total is climbing

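A new attribute has pushed the exporter past the 2048-series cap. Inspect recent metrics for a label with unexpectedly high cardinality and fix it at the source. The exporter deliberately drops new series rather than silently rewriting labels.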

Prometheus shows stale series after a restart

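The plugin keeps state in memory only. After a Gateway restart, counters reset to zero and gauges restart from the next reported value. Use PromQL rate() and increase() to handle resets cleanly.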
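The reset handling can be illustrated with a small sketch. It mirrors the core idea behind increase(), which is to treat any drop in a counter as a restart from zero, but it leaves out the extrapolation to the window boundaries that Prometheus also applies. The function name is hypothetical.

```python
def counter_increase(samples):
    """Total increase across samples, treating any decrease as a reset to zero."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter restarted from 0 and has climbed to current since.
            total += current
    return total
```

For the samples [10, 15, 2, 6], where the Gateway restarted between the second and third scrape, the sketch reports an increase of 11 (5 before the restart, then 2 and 4 after it) instead of a negative jump.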