SkillHub

rupert-web-scraper

v1.0.0

提取并导出结构化网页数据(文本、表格、图片等),支持JSON、CSV、Markdown或SQL格式,遵守道德与合法的爬取规范。

Sourced from ClawHub, Authored by rupertnt034

Installation

Please help me install the skill `rupert-web-scraper` from SkillHub official store. npx skills add rupertnt034/rupert-web-scraper

Web Scraper Skill

Overview

Extract data from websites efficiently and ethically.

Capabilities

1. Data Extraction

  • Extract text content
  • Pull structured data
  • Capture tables
  • Get images/media

2. Formats

  • JSON output
  • CSV export
  • Markdown
  • SQL inserts

3. Features

  • Rate limiting
  • Caching
  • Retry logic
  • Error handling
  • Proxy support

4. Ethical Scraping

  • Respect robots.txt
  • Rate limits
  • User agent rotation
  • Legal compliance

Usage

Commands

  • scrape [URL] for [data]
  • extract [element] from [URL]
  • get table from [URL]
  • crawl [website] depth [n]
  • export [URL] to [format]

Examples

Input: "scrape example.com for product names and prices" Output:

{
  "products": [
    {"name": "Product A", "price": "$19.99"},
    {"name": "Product B", "price": "$29.99"}
  ]
}

Configuration

Rate Limits

  • Default: 1 request/second
  • Configurable: 0.1-10 req/s
  • Respect site limits

Output Options

  • JSON (default)
  • CSV
  • Markdown
  • SQL
  • Custom template

Best Practices

  1. Always identify yourself
  2. Cache responses
  3. Handle errors gracefully
  4. Stay within legal bounds
  5. Don't overwhelm servers