Prompt Caching
Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. The first matching request writes cache tokens (`cacheWrite`), and later matching requests can read them back (`cacheRead`).
Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most input did not change.
This page covers all cache-related knobs that affect prompt reuse and token cost.
For Anthropic pricing details, see: https://docs.anthropic.com/docs/build-with-claude/prompt-caching
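To make the savings concrete, here is a rough back-of-envelope calculation. The rates and multipliers are illustrative assumptions loosely following Anthropic's prompt-caching pricing model (cache writes billed at a premium over base input tokens, cache reads at a steep discount); exact numbers vary by model and retention tier.

```python
# Illustrative cost comparison: cached vs. uncached prompt-prefix reuse.
# All rates below are example assumptions, not real pricing.
BASE_INPUT = 3.00 / 1_000_000   # $ per input token (example rate)
WRITE_MULT = 1.25               # cache-write premium over base input
READ_MULT = 0.10                # cache-read discount vs. base input

def session_cost(prefix_tokens: int, turns: int, cached: bool) -> float:
    """Cost of re-sending a stable prompt prefix across `turns` requests."""
    if not cached:
        return turns * prefix_tokens * BASE_INPUT
    write = prefix_tokens * BASE_INPUT * WRITE_MULT          # first turn writes the cache
    reads = (turns - 1) * prefix_tokens * BASE_INPUT * READ_MULT
    return write + reads

uncached = session_cost(50_000, turns=20, cached=False)
cached = session_cost(50_000, turns=20, cached=True)
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")
# uncached: $3.00  cached: $0.47
```

The larger the stable prefix and the longer the session, the more the one-time write premium is dwarfed by the per-turn read discount.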
Primary knobs
cacheRetention (global default, model, and per-agent)

Set cache retention as a global default for all models:
```yaml
agents:
  defaults:
    params:
      cacheRetention: "long" # none | short | long
```

Override per-model:
```yaml
agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "short" # none | short | long
```

Per-agent override:
```yaml
agents:
  list:
    - id: "alerts"
      params:
        cacheRetention: "none"
```

Config merge order:
1. `agents.defaults.params` (global default, applies to all models)
2. `agents.defaults.models["provider/model"].params` (per-model override)
3. `agents.list[].params` (matching agent id; overrides by key)
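The merge order can be sketched as a layered dict merge. The helper below is a hypothetical illustration of the precedence rules, not OpenClaw's actual implementation:

```python
# Hypothetical sketch of the params merge order: later layers override
# earlier ones key by key.
def effective_params(defaults: dict, model_overrides: dict, agent_params: dict) -> dict:
    merged = dict(defaults)          # 1. agents.defaults.params
    merged.update(model_overrides)   # 2. agents.defaults.models["provider/model"].params
    merged.update(agent_params)      # 3. agents.list[].params (highest precedence)
    return merged

print(effective_params(
    {"cacheRetention": "long"},      # global default
    {"cacheRetention": "short"},     # per-model override
    {"cacheRetention": "none"},      # agent-level override, e.g. "alerts"
))
# {'cacheRetention': 'none'}
```

The agent-level value wins whenever the same key is set at multiple layers.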
Legacy cacheControlTtl

Legacy values are still accepted and mapped:

- `5m` maps to `short`
- `1h` maps to `long`

Prefer `cacheRetention` for new config.
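The legacy mapping can be sketched as a simple lookup. The helper name below is hypothetical; OpenClaw's actual normalization code may differ:

```python
# Hypothetical helper illustrating the legacy cacheControlTtl mapping.
LEGACY_TTL_MAP = {"5m": "short", "1h": "long"}

def normalize_cache_retention(value: str) -> str:
    """Map legacy cacheControlTtl values onto cacheRetention tiers;
    non-legacy values pass through unchanged."""
    return LEGACY_TTL_MAP.get(value, value)

print(normalize_cache_retention("5m"))    # short
print(normalize_cache_retention("long"))  # long
```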
contextPruning.mode: "cache-ttl"
Prunes old tool-result context after cache TTL windows so post-idle requests do not re-cache oversized history.

```yaml
agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"
```

See Session Pruning for full behavior.
Heartbeat keep-warm
Heartbeat can keep cache windows warm and reduce repeated cache writes after idle gaps.

```yaml
agents:
  defaults:
    heartbeat:
      every: "55m"
```

Per-agent heartbeat is supported at `agents.list[].heartbeat`.
Provider behavior
Section titled “Provider behavior”Anthropic (direct API)
- `cacheRetention` is supported.
- With Anthropic API-key auth profiles, OpenClaw seeds `cacheRetention: "short"` for Anthropic model refs when unset.
Amazon Bedrock
- Anthropic Claude model refs (`amazon-bedrock/*anthropic.claude*`) support explicit `cacheRetention` pass-through.
- Non-Anthropic Bedrock models are forced to `cacheRetention: "none"` at runtime.
OpenRouter Anthropic models
For `openrouter/anthropic/*` model refs, OpenClaw injects Anthropic `cache_control` on system/developer prompt blocks to improve prompt-cache reuse.
Other providers
If the provider does not support this cache mode, `cacheRetention` has no effect.
Tuning patterns
Mixed traffic (recommended default)

Keep a long-lived baseline on your main agent, and disable caching on bursty notifier agents:

```yaml
agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-6"
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "long"
  list:
    - id: "research"
      default: true
      heartbeat:
        every: "55m"
    - id: "alerts"
      params:
        cacheRetention: "none"
```

Cost-first baseline
- Set baseline `cacheRetention: "short"`.
- Enable `contextPruning.mode: "cache-ttl"`.
- Keep heartbeat below your TTL only for agents that benefit from warm caches.
Cache diagnostics
OpenClaw exposes dedicated cache-trace diagnostics for embedded agent runs.
diagnostics.cacheTrace config
```yaml
diagnostics:
  cacheTrace:
    enabled: true
    filePath: "~/.openclaw/logs/cache-trace.jsonl" # optional
    includeMessages: false # default true
    includePrompt: false # default true
    includeSystem: false # default true
```

Defaults:

- `filePath`: `$OPENCLAW_STATE_DIR/logs/cache-trace.jsonl`
- `includeMessages`: `true`
- `includePrompt`: `true`
- `includeSystem`: `true`
Env toggles (one-off debugging)
- `OPENCLAW_CACHE_TRACE=1` enables cache tracing.
- `OPENCLAW_CACHE_TRACE_FILE=/path/to/cache-trace.jsonl` overrides the output path.
- `OPENCLAW_CACHE_TRACE_MESSAGES=0|1` toggles full message payload capture.
- `OPENCLAW_CACHE_TRACE_PROMPT=0|1` toggles prompt text capture.
- `OPENCLAW_CACHE_TRACE_SYSTEM=0|1` toggles system prompt capture.
What to inspect
- Cache trace events are JSONL and include staged snapshots like `session:loaded`, `prompt:before`, `stream:context`, and `session:after`.
- Per-turn cache token impact is visible in normal usage surfaces via `cacheRead` and `cacheWrite` (for example `/usage full` and session usage summaries).
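Since the trace file is JSONL, per-session cache behavior can be summarized with a few lines of Python. The exact event schema is an assumption here (a `usage` object carrying `cacheRead`/`cacheWrite` counts), so adjust the keys to what your trace actually contains:

```python
import json
from collections import Counter

def summarize_cache_trace(path: str) -> Counter:
    """Sum cacheRead/cacheWrite tokens across all events in a JSONL trace.

    Assumes each event may carry a `usage` object with `cacheRead` and
    `cacheWrite` token counts; events without one are skipped.
    """
    totals = Counter()
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            usage = event.get("usage", {})
            totals["cacheRead"] += usage.get("cacheRead", 0)
            totals["cacheWrite"] += usage.get("cacheWrite", 0)
    return totals
```

A healthy long-lived session should show `cacheWrite` concentrated in the first turn (and after idle gaps), with `cacheRead` dominating afterwards.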
Quick troubleshooting
- High `cacheWrite` on most turns: check for volatile system-prompt inputs and verify the model/provider supports your cache settings.
- No effect from `cacheRetention`: confirm the model key matches `agents.defaults.models["provider/model"]`.
- Bedrock Nova/Mistral requests with cache settings: the runtime forcing these to `none` is expected.