Web Fetch

The web_fetch tool performs a plain HTTP GET request and extracts the readable content, converting HTML to markdown or plain text. It does not execute JavaScript.

For JavaScript-heavy sites or pages that require a login, use the Web Browser instead.

Quick start

web_fetch is enabled by default, so no extra configuration is needed. Agents can call it right away:

await web_fetch({ url: "https://example.com/article" });

Tool parameters

Parameter	Type	Description
`url`	`string`	URL to fetch (required; http/https only)
`extractMode`	`string`	`"markdown"` (default) or `"text"`
`maxChars`	`number`	Truncate the output to this many characters
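As a rough illustration of the parameter contract in the table, the sketch below types the arguments and applies the documented rules: http/https URLs only, with `"markdown"` as the default mode. `WebFetchParams` and `normalizeParams` are hypothetical names for this example, not part of web_fetch's API, and the requirement that `maxChars` be a positive integer is an assumption.

```typescript
// Hypothetical types mirroring the parameter table; not the tool's real source.
type ExtractMode = "markdown" | "text";

interface WebFetchParams {
  url: string;               // required; http/https only
  extractMode?: ExtractMode; // defaults to "markdown"
  maxChars?: number;         // optional output truncation limit
}

interface NormalizedParams {
  url: URL;
  extractMode: ExtractMode;
  maxChars?: number;
}

// Validates the arguments and fills in the documented default extractMode.
function normalizeParams(params: WebFetchParams): NormalizedParams {
  const url = new URL(params.url); // throws TypeError on malformed URLs
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported scheme: ${url.protocol}`);
  }
  const extractMode = params.extractMode ?? "markdown";
  if (extractMode !== "markdown" && extractMode !== "text") {
    throw new Error(`invalid extractMode: ${String(extractMode)}`);
  }
  // Assumption: maxChars, when given, must be a positive integer.
  if (params.maxChars !== undefined && (!Number.isInteger(params.maxChars) || params.maxChars <= 0)) {
    throw new Error("maxChars must be a positive integer");
  }
  return { url, extractMode, maxChars: params.maxChars };
}
```

For example, `normalizeParams({ url: "https://example.com/article", extractMode: "text", maxChars: 2000 })` passes, while an `ftp://` URL is rejected before any request would be made.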

How it works

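Fetch
Sends an HTTP GET with a Chrome-like User-Agent and an Accept-Language header. Blocks private/internal hostnames and re-checks every redirect target.
Extract
Runs Readability (main-content extraction) on HTML responses.
Fallback (optional)
If Readability fails and Firecrawl is configured, retries through the Firecrawl API with bot circumvention.
Cache
Results are cached for 15 minutes (configurable) to avoid repeatedly fetching the same URL.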
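The four stages can be sketched as a single function with the network and extraction steps injected, which keeps the control flow visible. Everything here is an assumption for illustration: `fetchHtml`, `readability`, and `firecrawl` are hypothetical stand-ins for the real implementations, content-type checks are omitted, and the cache is a plain in-memory `Map` keyed by URL. When Readability fails and no fallback is configured, this sketch simply throws; what web_fetch itself returns in that case is not described on this page.

```typescript
// Hypothetical dependency signatures; the real tool's internals may differ.
interface Deps {
  fetchHtml: (url: string) => Promise<string>;     // stage 1: HTTP GET
  readability: (html: string) => string | null;    // stage 2: null means extraction failed
  firecrawl?: (url: string) => Promise<string>;    // stage 3: optional fallback
  now: () => number;                               // clock in ms, injectable for tests
}

interface CacheEntry {
  value: string;
  expiresAt: number; // epoch ms after which the entry is stale
}

const cache = new Map<string, CacheEntry>();

async function webFetchSketch(url: string, deps: Deps, ttlMinutes = 15): Promise<string> {
  // Stage 4 (checked first): serve a fresh cached result for the same URL.
  const hit = cache.get(url);
  if (hit && hit.expiresAt > deps.now()) return hit.value;

  const html = await deps.fetchHtml(url);
  let content = deps.readability(html);
  if (content === null) {
    if (!deps.firecrawl) throw new Error("extraction failed and no fallback is configured");
    content = await deps.firecrawl(url);
  }

  // Store the result so repeat calls within the TTL skip the network.
  cache.set(url, { value: content, expiresAt: deps.now() + ttlMinutes * 60_000 });
  return content;
}
```

Injecting `now` lets a test move the clock past `ttlMinutes` instead of waiting 15 real minutes.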

Configuration

{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for maxChars param
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
      },
    },
  },
}
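To show how these settings interact, the sketch below merges a partial user config over defaults and derives the per-call character limit. It assumes the values in the sample above are the defaults (the page states this only for `enabled` and the 15-minute cache TTL), leaves out `userAgent` because no default is given, and uses the hypothetical helpers `resolveFetchConfig` and `effectiveMaxChars`.

```typescript
// Assumed shape of tools.web.fetch (userAgent omitted).
interface FetchConfig {
  enabled: boolean;
  maxChars: number;
  maxCharsCap: number;
  maxResponseBytes: number;
  timeoutSeconds: number;
  cacheTtlMinutes: number;
  maxRedirects: number;
  readability: boolean;
}

// Assumed defaults, copied from the sample configuration.
const DEFAULTS: FetchConfig = {
  enabled: true,
  maxChars: 50_000,
  maxCharsCap: 50_000,
  maxResponseBytes: 2_000_000,
  timeoutSeconds: 30,
  cacheTtlMinutes: 15,
  maxRedirects: 3,
  readability: true,
};

// Shallow merge: any key the user sets overrides the default.
function resolveFetchConfig(user: Partial<FetchConfig> = {}): FetchConfig {
  return { ...DEFAULTS, ...user };
}

// The per-call maxChars falls back to the configured maxChars
// and can never exceed maxCharsCap.
function effectiveMaxChars(cfg: FetchConfig, requested?: number): number {
  return Math.min(requested ?? cfg.maxChars, cfg.maxCharsCap);
}
```

With `maxCharsCap: 20000`, a call asking for 100,000 characters would be held to 20,000, while a request for 500 is honored unchanged.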

Firecrawl fallback

If Readability extraction fails, web_fetch can fall back to Firecrawl to get past bot detection and obtain better extraction:

{
  tools: {
    web: {
      fetch: {
        firecrawl: {
          enabled: true,
          apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,
          maxAgeMs: 86400000, // cache duration (1 day)
          timeoutSeconds: 60,
        },
      },
    },
  },
}
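The comment on `apiKey` implies a simple precedence: an explicit key wins, otherwise `FIRECRAWL_API_KEY` is used. Below is a minimal sketch of deciding whether the fallback is usable, assuming the environment is passed in as a plain record (for example `process.env` in Node). `resolveFirecrawl` is a hypothetical helper; treating a `firecrawl` block without `enabled: false` as enabled, and filling unset fields with the sample's values, are assumptions. SecretRef handling is omitted because its shape is not described here.

```typescript
// Assumed shape of tools.web.fetch.firecrawl (SecretRef support omitted).
interface FirecrawlConfig {
  enabled?: boolean;
  apiKey?: string;
  baseUrl?: string;
  onlyMainContent?: boolean;
  maxAgeMs?: number;
  timeoutSeconds?: number;
}

interface ResolvedFirecrawl {
  apiKey: string;
  baseUrl: string;
  onlyMainContent: boolean;
  maxAgeMs: number;
  timeoutSeconds: number;
}

// Returns null when the fallback should not be attempted.
function resolveFirecrawl(
  cfg: FirecrawlConfig | undefined,
  env: Record<string, string | undefined>,
): ResolvedFirecrawl | null {
  if (!cfg || cfg.enabled === false) return null;
  const apiKey = cfg.apiKey ?? env.FIRECRAWL_API_KEY; // explicit key takes precedence
  if (!apiKey) return null;                             // no key, no fallback
  return {
    apiKey,
    baseUrl: cfg.baseUrl ?? "https://api.firecrawl.dev",
    onlyMainContent: cfg.onlyMainContent ?? true,
    maxAgeMs: cfg.maxAgeMs ?? 86_400_000, // 1 day, as in the sample
    timeoutSeconds: cfg.timeoutSeconds ?? 60,
  };
}
```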

tools.web.fetch.firecrawl.apiKey also accepts a SecretRef object.

Limits and security

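maxChars is clamped to tools.web.fetch.maxCharsCap
The response body is capped at maxResponseBytes before parsing; oversized responses are truncated with a warning
Private/internal hostnames are blocked
Redirects are checked and limited by maxRedirects
web_fetch is best-effort; some sites require the Web Browser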
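Two of these rules, the private-host block and the redirect limit, can be sketched together: every hop, including the first URL, is validated before it is followed. This is an illustration under assumptions, not web_fetch's code. `isPrivateHost` only inspects the hostname string (a real guard should also check the addresses a name resolves to), and `nextLocation` is a hypothetical callback standing in for the HTTP request, returning a redirect target or `null`.

```typescript
// Returns true for hostnames this sketch treats as private/internal.
function isPrivateHost(hostname: string): boolean {
  const h = hostname.toLowerCase().replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  if (h === "localhost" || h.endsWith(".localhost") || h.endsWith(".local") || h.endsWith(".internal")) {
    return true;
  }
  const m = h.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (m) {
    const a = Number(m[1]);
    const b = Number(m[2]);
    return a === 0 || a === 10 || a === 127 ||
      (a === 169 && b === 254) ||          // link-local
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
      (a === 192 && b === 168);            // 192.168.0.0/16
  }
  // IPv6 loopback, unique-local (fc00::/7) and link-local (fe80::/10)
  return h === "::1" || /^f[cd][0-9a-f]{2}:/.test(h) || /^fe[89ab][0-9a-f]:/.test(h);
}

// Follows redirects, re-validating every hop and stopping after maxRedirects.
function resolveFinalUrl(
  start: string,
  nextLocation: (url: URL) => string | null,
  maxRedirects = 3,
): URL {
  let current = new URL(start);
  let hops = 0;
  while (true) {
    if (current.protocol !== "http:" && current.protocol !== "https:") {
      throw new Error(`blocked scheme: ${current.protocol}`);
    }
    if (isPrivateHost(current.hostname)) {
      throw new Error(`blocked private host: ${current.hostname}`);
    }
    const location = nextLocation(current);
    if (location === null) return current; // no redirect: done
    if (hops >= maxRedirects) throw new Error("too many redirects");
    current = new URL(location, current);  // resolves relative Location values
    hops++;
  }
}
```

Checking the target of each hop, not just the starting URL, is what stops a public page from redirecting the fetcher to an internal address.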

Tool profiles

If you use tool profiles or an allowlist, add web_fetch or group:web:

{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"]  (includes both web_fetch and web_search)
  },
}
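A sketch of how an allowlist entry such as `group:web` could expand to individual tools, assuming only the membership stated in the comment above (`web_fetch` and `web_search`). `TOOL_GROUPS` and `isToolAllowed` are hypothetical; real profiles may define more groups and resolve them differently.

```typescript
// Hypothetical group table; only group:web's members come from the docs.
const TOOL_GROUPS: Record<string, string[]> = {
  "group:web": ["web_fetch", "web_search"],
};

// True if the tool is listed directly or through a group entry.
function isToolAllowed(tool: string, allow: string[]): boolean {
  return allow.some(
    (entry) => entry === tool || (TOOL_GROUPS[entry]?.includes(tool) ?? false),
  );
}
```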
