Curated Search Skill

Summary

Domain-restricted full-text search over a curated whitelist of technical documentation (MDN, Python docs, etc.). Provides clean, authoritative results without web spam.

External Endpoints

This skill does not call any external network endpoints during search operations. The crawler optionally makes outbound HTTP requests during index builds (one‑time setup), but those are user‑initiated (npm run crawl) and respect the configured domain whitelist.

Security & Privacy

Search is fully local – After the index is built, all queries run offline; no data leaves your machine.
Crawling is optional and whitelist‑scoped – The crawler only accesses domains you explicitly list in config.yaml. It respects robots.txt and configurable delays.
No telemetry – No usage data is transmitted externally.
Configuration is read from local config.yaml and the index file in data/.

Model Invocation Note

The curated-search.search tool is invoked only when the user explicitly calls it. It does not run autonomously. OpenClaw calls the tool handler (scripts/search.js) when the user asks to search the curated index.

Trust Statement

By using this skill, you trust that the code operates locally and only crawls domains you approve. The skill does not send your queries or workspace data to any third party. Review the open‑source implementation before installing.

Tool: curated-search.search

Search the curated index.

Parameters

Name	Type	Required	Default	Description
`query`	string	yes	—	Search query terms
`limit`	number	no	5	Maximum results (capped by config.max_limit, typically 100)
`domain`	string	no	null	Filter to specific domain (e.g., `docs.python.org`)
`min_score`	number	no	0.0	Minimum relevance score (0.0–1.0); filters out low-quality matches
`offset`	number	no	0	Pagination offset (skip first N results)

Response

JSON array of result objects:

[
  {
    "title": "Python Tutorial",
    "url": "https://docs.python.org/3/tutorial/",
    "snippet": "Python is an easy to learn, powerful programming language...",
    "domain": "docs.python.org",
    "score": 0.87,
    "crawled_at": 1707712345678
  }
]

Fields: - title — Document title (cleaned) - url — Source URL (canonical) - snippet — Excerpt (~200 chars) from content - domain — Hostname of source - score — BM25 relevance score (higher is better; not normalized 0–1 but typically 0–1 range) - crawled_at — Unix timestamp when page was crawled

Example Agent Calls

search CuratedSearch for "python tutorial"
search CuratedSearch for "async await" limit=3 domain=developer.mozilla.org
search CuratedSearch for "linux man page" min_score=0.3

Errors

If an error occurs, the tool exits non-zero and prints a JSON error object to stderr, e.g.:

{
  "error": "index_not_found",
  "message": "Search index not found. The index has not been built yet.",
  "suggestion": "Run the crawler first: npm run crawl",
  "details": { "path": "data/index.json" }
}

Common error codes:

Code	Meaning	Suggested Fix
`config_missing`	Configuration file not found	Specify `--config` path or ensure config.yaml exists
`config_invalid`	YAML parsing failed	Check syntax in config.yaml
`config_missing_index_path`	`index.path` not set	Add index.path to config
`index_not_found`	Index file missing	Run `npm run crawl` to build index
`index_corrupted`	Index file incompatible or corrupted	Rebuild index with `npm run crawl`
`index_init_failed`	Unexpected index initialization error	Check permissions, reinstall dependencies
`missing_query`	No query provided	Provide `--query` argument
`query_too_long`	Query exceeds 1000 characters	Shorten the query
`limit_exceeded`	Limit > config.max_limit	Use a smaller limit
`invalid_domain`	Domain filter malformed	Use format like `docs.python.org`
`conflicting_flags`	Mutually exclusive flags used (e.g., `--stats` with `--query`)	Use flags correctly
`stats_failed`	Could not retrieve index stats	Ensure index is accessible
`search_failed`	Search execution threw an error	Check query and index integrity

Configuration

Edit config.yaml in the skill directory. Key sections:

domains — whitelist of allowed domains (required)
seeds — starting URLs for crawling
crawl — depth, delay, timeout, max_documents
content — min_content_length, max_content_length
index — path to index files
search — default_limit, max_limit, min_score

See README.md for full configuration docs.

Support

Full documentation: README.md
Technical specs: specs/
Build plan: PLAN.md
Contributor guide: CONTRIBUTING.md
Issues: Report on GitHub (or via OpenClaw maintainers)