Tavily Crawl

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.

Authentication

Get your API key at https://tavily.com and add to your OpenClaw config:

{
  "skills": {
    "entries": {
      "tavily-crawl": {
        "enabled": true,
        "apiKey": "tvly-YOUR_API_KEY_HERE"
      }
    }
  }
}

Or set in environment variable:

export TAVILY_API_KEY="tvly-YOUR_API_KEY_HERE"

Quick Start

Using the Script

node {baseDir}/scripts/crawl.mjs "https://docs.example.com"
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --output ./docs
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 --limit 50

Examples

# Basic crawl
node {baseDir}/scripts/crawl.mjs "https://docs.example.com"

# Deeper crawl with limits
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --limit 50

# Save to files
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" --depth 2 --output ./docs

# Focused crawl with path filters
node {baseDir}/scripts/crawl.mjs "https://example.com" --depth 2 
  --select "/docs/.*" --exclude "/blog/.*"

# With semantic instructions
node {baseDir}/scripts/crawl.mjs "https://docs.example.com" 
  --instructions "Find API documentation" --chunks 3

Options

Option	Description	Default
`--depth <n>`	Crawl depth (1-5)	1
`--breadth <n>`	Links per page	20
`--limit <n>`	Total pages cap	50
`--output <dir>`	Save pages to directory	-
`--instructions <text>`	Natural language guidance	-
`--chunks <n>`	Chunks per page (1-5, requires instructions)	-
`--depth-mode <mode>`	Extract depth: `basic` or `advanced`	`basic`
`--select <pattern>`	Regex pattern to include	-
`--exclude <pattern>`	Regex pattern to exclude	-
`--timeout <sec>`	Max wait time (10-150 seconds)	150
`--json`	Output raw JSON	false

Depth vs Performance

Depth	Typical Pages	Time
1	10-50	Seconds
2	50-500	Minutes
3	500-5000	Many minutes

Start with --depth 1 and increase only if needed.

Crawl for Context vs Data Collection

For agentic use (feeding results into context): Always use --instructions + --chunks. This returns only relevant chunks instead of full pages, preventing context window explosion.

For data collection (saving to files): Omit --chunks to get full page content.

Tips

Always use --chunks for agentic workflows - prevents context explosion when feeding results to LLMs
Omit --chunks only for data collection - when saving full pages to files
Start conservative (--depth 1, --limit 20) and scale up
Use path patterns to focus on relevant sections
Always set a --limit to prevent runaway crawls

liang-tavily-crawl

Installation