Prometheus metrics

OpenClaw can expose diagnostics metrics through the official diagnostics-prometheus plugin. It listens to trusted internal diagnostics and renders a Prometheus text endpoint at:

GET /api/diagnostics/prometheus

Content type is text/plain; version=0.0.4; charset=utf-8, the standard Prometheus exposition format.
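To make the format concrete, here is a minimal sketch of how exposition text splits into a metric name, labels, and a value. The sample lines and their label values are illustrative assumptions, not captured OpenClaw output, and the parser covers only simple cases (no escaped quotes inside label values, no timestamps). A real consumer should rely on Prometheus itself or a client library.

```python
import re

# Illustrative exposition text; the label values are assumptions, not real output.
SAMPLE = """\
# TYPE openclaw_run_completed_total counter
openclaw_run_completed_total{channel="cli",outcome="ok"} 3
openclaw_queue_lane_size{lane="main"} 0
"""

# Metric name, optional {labels}, then a numeric value.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines carry no samples
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Calling `parse_exposition(SAMPLE)` yields one tuple per sample line, which is enough to spot-check a scrape by hand.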

For traces, logs, OTLP push, and OpenTelemetry GenAI semantic attributes, see OpenTelemetry export.

  1. Install the plugin

    openclaw plugins install clawhub:@openclaw/diagnostics-prometheus
  2. Enable the plugin

    {
      plugins: {
        allow: ["diagnostics-prometheus"],
        entries: {
          "diagnostics-prometheus": { enabled: true },
        },
      },
      diagnostics: {
        enabled: true,
      },
    }
  3. Restart the Gateway

    The HTTP route is registered at plugin startup, so restart the Gateway after enabling the plugin.

  4. Scrape the protected route

    Send the same gateway auth your operator clients use:

    curl -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
    http://127.0.0.1:18789/api/diagnostics/prometheus
  5. Wire Prometheus

    prometheus.yml
    scrape_configs:
      - job_name: openclaw
        scrape_interval: 30s
        metrics_path: /api/diagnostics/prometheus
        authorization:
          credentials_file: /etc/prometheus/openclaw-gateway-token
        static_configs:
          - targets: ["openclaw-gateway:18789"]
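
The authenticated scrape from step 4 can also be run from a script, for example in a smoke test. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the base URL and token are whatever your own Gateway uses.

```python
import urllib.request

METRICS_PATH = "/api/diagnostics/prometheus"


def build_scrape_request(base_url, token):
    """Build an authenticated GET request for the metrics route."""
    url = base_url.rstrip("/") + METRICS_PATH
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def scrape(base_url, token, timeout=5.0):
    """Fetch the exposition text; urlopen raises HTTPError on 401 and other errors."""
    request = build_scrape_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For a local Gateway, `scrape("http://127.0.0.1:18789", os.environ["OPENCLAW_GATEWAY_TOKEN"])` would return the same text as the curl command, assuming the token is exported in the environment.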

| Metric | Type | Labels |
| --- | --- | --- |
| openclaw_run_completed_total | counter | channel, model, outcome, provider, trigger |
| openclaw_run_duration_seconds | histogram | channel, model, outcome, provider, trigger |
| openclaw_model_call_total | counter | api, error_category, model, outcome, provider, transport |
| openclaw_model_call_duration_seconds | histogram | api, error_category, model, outcome, provider, transport |
| openclaw_model_tokens_total | counter | agent, channel, model, provider, token_type |
| openclaw_gen_ai_client_token_usage | histogram | model, provider, token_type |
| openclaw_model_cost_usd_total | counter | agent, channel, model, provider |
| openclaw_skill_used_total | counter | activation, agent, skill, source |
| openclaw_tool_execution_total | counter | error_category, outcome, params_kind, tool, tool_owner, tool_source |
| openclaw_tool_execution_duration_seconds | histogram | error_category, outcome, params_kind, tool, tool_owner, tool_source |
| openclaw_harness_run_total | counter | channel, error_category, harness, model, outcome, phase, plugin, provider |
| openclaw_harness_run_duration_seconds | histogram | channel, error_category, harness, model, outcome, phase, plugin, provider |
| openclaw_message_received_total | counter | channel, source |
| openclaw_message_dispatch_started_total | counter | channel, source |
| openclaw_message_dispatch_completed_total | counter | channel, outcome, reason, source |
| openclaw_message_dispatch_duration_seconds | histogram | channel, outcome, reason, source |
| openclaw_message_processed_total | counter | channel, outcome, reason |
| openclaw_message_processed_duration_seconds | histogram | channel, outcome, reason |
| openclaw_message_delivery_started_total | counter | channel, delivery_kind |
| openclaw_message_delivery_total | counter | channel, delivery_kind, error_category, outcome |
| openclaw_message_delivery_duration_seconds | histogram | channel, delivery_kind, error_category, outcome |
| openclaw_talk_event_total | counter | brain, event_type, mode, provider, transport |
| openclaw_talk_event_duration_seconds | histogram | brain, event_type, mode, provider, transport |
| openclaw_talk_audio_bytes | histogram | brain, event_type, mode, provider, transport |
| openclaw_queue_lane_size | gauge | lane |
| openclaw_queue_lane_wait_seconds | histogram | lane |
| openclaw_session_state_total | counter | reason, state |
| openclaw_session_queue_depth | gauge | state |
| openclaw_session_turn_created_total | counter | agent, channel, trigger |
| openclaw_session_recovery_total | counter | action, active_work_kind, state, status |
| openclaw_session_recovery_age_seconds | histogram | action, active_work_kind, state, status |
| openclaw_memory_bytes | gauge | kind |
| openclaw_memory_rss_bytes | histogram | none |
| openclaw_memory_pressure_total | counter | level, reason |
| openclaw_telemetry_exporter_total | counter | exporter, reason, signal, status |
| openclaw_prometheus_series_dropped_total | counter | none |

Bounded, low-cardinality labels

Prometheus labels stay bounded and low-cardinality. The exporter does not emit raw diagnostic identifiers such as runId, sessionKey, sessionId, callId, toolCallId, message IDs, chat IDs, or provider request IDs.

Label values are redacted and must match OpenClaw’s low-cardinality character policy. Values that fail the policy are replaced with unknown, other, or none, depending on the metric. Labels that look like scoped agent session keys are also replaced with unknown.
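
The replacement rule can be pictured with a small sketch. Everything specific here is an assumption made for illustration: the allowed character set, the 64-character limit, and the function names are hypothetical and do not reproduce OpenClaw's actual policy. Only the idea comes from the text above: a value that fails the policy is swapped for a bounded fallback such as unknown, other, or none.

```python
import re

# Hypothetical policy: short values made of letters, digits, and a few
# separators. The real OpenClaw policy is not reproduced here.
ALLOWED_VALUE = re.compile(r"[A-Za-z0-9_.\-]{1,64}")


def sanitize_label(value, fallback="unknown"):
    """Return value if it fits the assumed policy, else the bounded fallback."""
    if isinstance(value, str) and ALLOWED_VALUE.fullmatch(value):
        return value
    return fallback


def sanitize_labels(labels, fallback="unknown"):
    """Sanitize every value in a label dict, iterating keys in sorted order."""
    return {key: sanitize_label(labels[key], fallback) for key in sorted(labels)}
```

Because every rejected value collapses into the same fallback, an unexpected input adds at most one extra series per label instead of one per distinct value.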

Series cap and overflow accounting

The exporter caps retained time series in memory at 2048 series across counters, gauges, and histograms combined. New series beyond that cap are dropped, and openclaw_prometheus_series_dropped_total increments by one for each dropped series.

Watch this counter as a hard signal that an attribute upstream is leaking high-cardinality values. The exporter never lifts the cap automatically; if the counter climbs, fix the source rather than disabling the cap.
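
The cap-and-count behavior can be sketched as a small in-memory registry. This is an illustration under assumptions, not the plugin's code: the class and attribute names are hypothetical, it models counters only, and it counts a drop on every rejected increment, which may differ from how the exporter accounts for repeated attempts on the same series.

```python
class CappedCounterRegistry:
    """Counter store that refuses new series once a cap is reached."""

    def __init__(self, max_series=2048):
        self.max_series = max_series
        self.series = {}   # (name, sorted label items) -> running total
        self.dropped = 0   # plays the role of the series-dropped counter

    def inc(self, name, labels, amount=1.0):
        """Increment a series; return False if it was dropped by the cap."""
        key = (name, tuple(sorted(labels.items())))
        if key not in self.series:
            if len(self.series) >= self.max_series:
                self.dropped += 1  # new series beyond the cap: drop and count
                return False
            self.series[key] = 0.0
        self.series[key] += amount
        return True
```

Existing series keep updating after the cap is hit; only label combinations that have not been seen before are rejected, which is why a climbing drop counter points at a label with too many distinct values.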

What never appears in Prometheus output
  • prompt text, response text, tool inputs, tool outputs, system prompts
  • Talk transcripts, audio payloads, call ids, room ids, handoff tokens, turn ids, and raw session ids
  • raw provider request IDs (only bounded hashes, where applicable, on spans — never on metrics)
  • session keys and session IDs
  • hostnames, file paths, secret values

# Tokens per minute, split by provider (rate() is per second, so scale by 60)
60 * sum by (provider) (rate(openclaw_model_tokens_total[1m]))

# Spend (USD) over the last hour, by model
sum by (model) (increase(openclaw_model_cost_usd_total[1h]))

# 95th percentile run duration, by provider and model
histogram_quantile(
  0.95,
  sum by (le, provider, model)
    (rate(openclaw_run_duration_seconds_bucket[5m]))
)

# Queue wait time SLO (p95 under 2s)
histogram_quantile(
  0.95,
  sum by (le, lane) (rate(openclaw_queue_lane_wait_seconds_bucket[5m]))
) < 2

# Skill usage, split by bounded source
sum by (skill, source) (increase(openclaw_skill_used_total[24h]))

# Dropped Prometheus series (cardinality alarm)
increase(openclaw_prometheus_series_dropped_total[15m]) > 0
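
The two percentile queries rely on histogram_quantile, which estimates a quantile from cumulative buckets by linear interpolation. The sketch below shows that idea in simplified form. The bucket boundaries in the usage note are made up for illustration, and Prometheus applies extra edge-case handling that is left out here.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    The bucket list must include the +Inf bucket, as Prometheus histograms do.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0 or not math.isinf(buckets[-1][0]):
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # rank falls in +Inf: report highest finite bound
            width = count - prev_count
            if width == 0:
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / width
        prev_bound, prev_count = bound, count
    return math.nan
```

With buckets `[(0.5, 50), (1.0, 80), (2.0, 95), (5.0, 100), (math.inf, 100)]`, the 0.5 quantile lands exactly on the 0.5 bound, and the 0.9 quantile is interpolated two thirds of the way between 1.0 and 2.0. The estimate is only as fine as the bucket layout, so an SLO threshold such as 2s is best placed on a bucket boundary.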

Choosing between Prometheus and OpenTelemetry export

OpenClaw supports both surfaces independently. You can run either, both, or neither.

  • Pull model: Prometheus scrapes /api/diagnostics/prometheus.
  • No external collector required.
  • Authenticated through normal Gateway auth.
  • Surface is metrics only (no traces or logs).
  • Best for stacks already standardized on Prometheus + Grafana.

Empty response body

  • Check diagnostics.enabled: true in config.
  • Confirm the plugin is enabled and loaded with openclaw plugins list --enabled.
  • Generate some traffic; counters and histograms only emit lines after at least one event.

401 / unauthorized

The endpoint requires the Gateway operator scope (auth: "gateway" with gatewayRuntimeScopeSurface: "trusted-operator"). Give Prometheus the same token or password that you use for any other Gateway operator route. There is no public unauthenticated mode.

`openclaw_prometheus_series_dropped_total` is climbing

A new attribute is producing enough distinct values to exceed the 2048-series cap. Inspect recent metrics for an unexpectedly high-cardinality label and fix it at the source. The exporter intentionally drops new series instead of silently rewriting labels.

Prometheus shows stale series after a restart

The plugin keeps state in memory only. After a Gateway restart, counters reset to zero and gauges restart at their next reported value. Use PromQL rate() and increase() to handle resets cleanly.
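
The reset handling that rate() and increase() provide can be illustrated with a simplified calculation: whenever a sample is lower than the one before it, the counter is assumed to have restarted from zero. This sketch is not Prometheus's implementation (it skips the extrapolation to window boundaries), and the function name is hypothetical.

```python
def increase_with_resets(samples):
    """Total increase across samples, treating any decrease as a counter reset."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # counter restarted from zero; count what accrued since
    return total
```

For the samples `[10, 15, 3, 8]`, where the Gateway restarted between the second and third scrape, the result is 13 (5 before the restart, then 3 and 5 after it), whereas a naive last-minus-first subtraction would give -2.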