ds4
ds4 serves DeepSeek V4 Flash from a local
Metal backend with an OpenAI-compatible /v1 API. OpenClaw connects to ds4
through the generic openai-completions provider family.
ds4 is not a bundled OpenClaw provider plugin. Configure it under
models.providers.ds4, then select ds4/deepseek-v4-flash.
- Provider id:
ds4 - Plugin: none
- API: OpenAI-compatible Chat Completions (
openai-completions) - Suggested base URL:
http://127.0.0.1:18000/v1 - Model id:
deepseek-v4-flash - Tool calls: supported through OpenAI-style
toolsandtool_calls - Reasoning: DeepSeek-style
thinkingandreasoning_effort
Requirements
Section titled “Requirements”- macOS with Metal support.
- A working ds4 checkout with
ds4-serverand the DeepSeek V4 Flash GGUF file. - Enough memory for the context you choose. Larger
--ctxvalues allocate more KV memory when the server starts.
Quickstart
Section titled “Quickstart”Start ds4-server
Replace `
` with your ds4 checkout path.
```bash/ds4-server
—model/ds4flash.gguf
—host 127.0.0.1
—port 18000
—ctx 32768
—tokens 128 ```Verify the OpenAI-compatible endpoint
Terminal window curl http://127.0.0.1:18000/v1/modelsThe response should include
deepseek-v4-flash.Add the OpenClaw provider config
Add the config from Full config, then run a one-shot model check:
Terminal window openclaw infer model run \--local \--model ds4/deepseek-v4-flash \--thinking off \--prompt "Reply with exactly: openclaw-ds4-ok" \--json
Full config
Section titled “Full config”Use this config when ds4 is already running on 127.0.0.1:18000.
{ agents: { defaults: { model: { primary: "ds4/deepseek-v4-flash" }, models: { "ds4/deepseek-v4-flash": { alias: "DS4 local", }, }, }, }, models: { mode: "merge", providers: { ds4: { baseUrl: "http://127.0.0.1:18000/v1", apiKey: "ds4-local", api: "openai-completions", timeoutSeconds: 300, models: [ { id: "deepseek-v4-flash", name: "DeepSeek V4 Flash (ds4)", reasoning: true, input: ["text"], cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }, contextWindow: 32768, maxTokens: 128, compat: { supportsUsageInStreaming: true, supportsReasoningEffort: true, maxTokensField: "max_tokens", supportsStrictMode: false, thinkingFormat: "deepseek", supportedReasoningEfforts: ["low", "medium", "high", "xhigh"], }, }, ], }, }, },}Keep contextWindow aligned with the ds4-server --ctx value. Keep maxTokens
aligned with --tokens unless you intentionally want OpenClaw to request less
output than the server default.
On-demand startup
Section titled “On-demand startup”OpenClaw can start ds4 only when a ds4/... model is selected. Add
localService to the same provider entry:
{ models: { providers: { ds4: { baseUrl: "http://127.0.0.1:18000/v1", apiKey: "ds4-local", api: "openai-completions", timeoutSeconds: 300, localService: { command: "<DS4_DIR>/ds4-server", args: [ "--model", "<DS4_DIR>/ds4flash.gguf", "--host", "127.0.0.1", "--port", "18000", "--ctx", "32768", "--tokens", "128", ], cwd: "<DS4_DIR>", healthUrl: "http://127.0.0.1:18000/v1/models", readyTimeoutMs: 300000, idleStopMs: 0, }, models: [ { id: "deepseek-v4-flash", name: "DeepSeek V4 Flash (ds4)", reasoning: true, input: ["text"], cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }, contextWindow: 32768, maxTokens: 128, compat: { supportsUsageInStreaming: true, supportsReasoningEffort: true, maxTokensField: "max_tokens", supportsStrictMode: false, thinkingFormat: "deepseek", supportedReasoningEfforts: ["low", "medium", "high", "xhigh"], }, }, ], }, }, },}command must be an absolute executable path. Shell lookup and ~ expansion are
not used. See Local model services for every
localService field.
Think Max
Section titled “Think Max”ds4 applies Think Max only when both conditions are true:
ds4-serverstarts with--ctx 393216or higher.- The request uses
reasoning_effort: "max"or the equivalent ds4 effort field.
If you run that large context, update both the server flags and OpenClaw model metadata:
{ contextWindow: 393216, maxTokens: 384000, compat: { supportsUsageInStreaming: true, supportsReasoningEffort: true, maxTokensField: "max_tokens", supportsStrictMode: false, thinkingFormat: "deepseek", supportedReasoningEfforts: ["low", "medium", "high", "xhigh", "max"], },}Start with a direct HTTP check:
curl http://127.0.0.1:18000/v1/chat/completions \ -H 'content-type: application/json' \ -d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"Reply with exactly: ds4-ok"}],"max_tokens":16,"stream":false,"thinking":{"type":"disabled"}}'Then test OpenClaw model routing:
openclaw infer model run \ --local \ --model ds4/deepseek-v4-flash \ --thinking off \ --prompt "Reply with exactly: openclaw-ds4-ok" \ --jsonFor a full agent and tool-call smoke, use a context of at least 32768:
openclaw agent \ --local \ --session-id ds4-tool-smoke \ --model ds4/deepseek-v4-flash \ --thinking off \ --message "Use the shell command pwd once, then reply exactly: tool-ok <output>" \ --json \ --timeout 240Expected result:
executionTrace.winnerProviderisds4executionTrace.winnerModelisdeepseek-v4-flashtoolSummary.callsis at least1finalAssistantVisibleTextstarts withtool-ok
Troubleshooting
Section titled “Troubleshooting”curl /v1/models cannot connect
ds4 is not running or not bound to the host and port in baseUrl. Start
ds4-server, then retry:
curl http://127.0.0.1:18000/v1/models500 prompt exceeds context
The configured --ctx is too small for the OpenClaw turn. Raise
ds4-server --ctx, then update models.providers.ds4.models[].contextWindow
to match. Full agent turns with tools need substantially more context than a
direct one-message curl request.
Think Max does not activate
ds4 only uses Think Max when --ctx is at least 393216 and the request
asks for reasoning_effort: "max". Smaller contexts fall back to high
reasoning.
The first request is slow
ds4 has a cold Metal residency and model warmup phase. Use
localService.readyTimeoutMs: 300000 when OpenClaw starts the server on
demand.
Related
Section titled “Related”Start local model servers on demand before model requests.
Choose and operate local model backends.
Configure provider refs, auth, and failover.
Native DeepSeek provider behavior and thinking controls.