Zhipu Web Page Reader

Fetch and parse web page content via Zhipu AI's Reader API (/paas/v4/reader), using lightweight cURL. Returns parsed page content in Markdown or plain text format, along with metadata like title and description.

Quick Start

Basic cURL Usage

curl --request POST 
  --url https://open.bigmodel.cn/api/paas/v4/reader 
  --header "Authorization: Bearer $ZHIPU_API_KEY" 
  --header 'Content-Type: application/json' 
  --data '{
    "url": "https://www.example.com"
  }'

Script Usage

A wrapper shell script is provided for convenience.

# Basic Fetch (returns Markdown by default)
bash scripts/zhipu_fetch.sh --url "https://www.example.com"

# Fetch as plain text, no cache
bash scripts/zhipu_fetch.sh 
  --url "https://docs.python.org/3/" 
  --format text 
  --no-cache

# Fetch with image and link summaries
bash scripts/zhipu_fetch.sh 
  --url "https://news.example.com/article" 
  --images-summary 
  --links-summary

# Fetch without images, disable GFM
bash scripts/zhipu_fetch.sh 
  --url "https://blog.example.com/post" 
  --no-images 
  --no-gfm

API Parameter Reference

Parameter	Type	Required	Default	Description
`url`	string	✅	-	URL of the web page to fetch
`timeout`	integer	-	`20`	Request timeout in seconds
`no_cache`	boolean	-	`false`	Disable caching (`true`/`false`)
`return_format`	string	-	`markdown`	Return format: `markdown` or `text`
`retain_images`	boolean	-	`true`	Retain images in output (`true`/`false`)
`no_gfm`	boolean	-	`false`	Disable GitHub Flavored Markdown (`true`/`false`)
`keep_img_data_url`	boolean	-	`false`	Keep image data URLs (`true`/`false`)
`with_images_summary`	boolean	-	`false`	Include images summary (`true`/`false`)
`with_links_summary`	boolean	-	`false`	Include links summary (`true`/`false`)

Response Structure

The API returns JSON with the parsed page content.

{
  "id": "task-id",
  "created": 1704067200,
  "request_id": "request-id",
  "model": "model-name",
  "reader_result": {
    "title": "Page Title",
    "description": "Brief page description",
    "url": "https://www.example.com",
    "content": "Parsed page content (Markdown or text)",
    "external": {
      "stylesheet": {}
    },
    "metadata": {
      "keywords": "page, keywords",
      "viewport": "width=device-width",
      "description": "Meta description",
      "format-detection": "telephone=no"
    }
  }
}

Key Response Fields

Field	Description
`reader_result.content`	Main parsed content (body text, images, links)
`reader_result.title`	Page title
`reader_result.description`	Brief page description
`reader_result.url`	Original page URL
`reader_result.metadata`	Page metadata (keywords, viewport, etc.)

Common Use Cases

Scenario	Command
Read a documentation page	`--url <doc_url>`
Extract text only (no images)	`--url <url> --no-images --format text`
Force fresh fetch (bypass cache)	`--url <url> --no-cache`
Get content with all summaries	`--url <url> --images-summary --links-summary`
Long page with extended timeout	`--url <url> --timeout 60`

Environment Requirements

Environment variable ZHIPU_API_KEY must be configured.
curl command must be available in your system path.

zhipu-web-fetch

Installation