SkillHub

scrapling-web-scraper

v1.0.0

Zero-bot-detection web scraping for OpenClaw. Bypass Cloudflare, handle JavaScript-heavy sites, and adapt to website changes automatically. Use when you need to scrape protected websites, extract data from dynamic JavaScript SPAs, or bypass anti-bot detection systems. Supports three modes - basic (f...

Sourced from ClawHub, Authored by zhengxinjipai

Installation

Please help me install the skill `scrapling-web-scraper` from the SkillHub official store.

npx skills add zhengxinjipai/scrapling-web-scraper

Scrapling Web Scraping

Zero-bot-detection web scraping for OpenClaw. Bypass Cloudflare, handle JavaScript-heavy sites, and adapt to website changes automatically.

Quick Start

# Install Scrapling
pip install "scrapling[all]"
scrapling install

# Basic usage
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://example.com

# Bypass Cloudflare
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://protected-site.com --mode stealth --cloudflare

# Extract specific data
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://example.com --selector ".product-title"

# JavaScript-heavy sites
python3 /root/.openclaw/skills/scrapling-web-scraping/scrapling_tool.py https://spa-app.com --mode dynamic --wait ".content-loaded"

Usage with OpenClaw

Natural Language Commands

Basic scraping:

"Use Scrapling to scrape the title and all links from https://example.com"

Bypass protection:

"Scrape https://protected-site.com in stealth mode and bypass Cloudflare"

Extract data:

"Scrape the product names and prices from https://shop.com; the CSS selector is .product"

Dynamic content:

"Scrape https://spa-app.com and wait for the .data-loaded element to finish loading"

Python Code

# Basic scraping
from scrapling.fetchers import Fetcher
page = Fetcher.get('https://example.com')
title = page.css('title::text').get()

# Bypass Cloudflare
from scrapling.fetchers import StealthyFetcher
page = StealthyFetcher.fetch(
    'https://protected.com',
    headless=True,
    solve_cloudflare=True,
)

# JavaScript sites
from scrapling.fetchers import DynamicFetcher
page = DynamicFetcher.fetch(
    'https://spa-app.com',
    headless=True,
    network_idle=True,
)
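
The three snippets above can be folded into a single helper that picks a fetcher by mode name. The following is only a sketch: the names plan_fetch, fetch, and FETCHERS are hypothetical, and the mapping from mode names to fetcher classes is an assumption based on the snippets, not a description of how scrapling_tool.py is implemented. Scrapling is imported lazily, so the planning step can be inspected without the library installed.

```python
from importlib import import_module

# Mode names follow this skill's CLI; the mapping to fetcher classes mirrors
# the three snippets above and is an assumption about the tool's behaviour.
FETCHERS = {
    "basic": ("Fetcher", "get"),
    "stealth": ("StealthyFetcher", "fetch"),
    "dynamic": ("DynamicFetcher", "fetch"),
}


def plan_fetch(mode="basic", cloudflare=False):
    """Return (class_name, method_name, kwargs) for a mode without fetching."""
    if mode not in FETCHERS:
        raise ValueError(f"unknown mode: {mode!r}")
    if cloudflare and mode != "stealth":
        raise ValueError("Cloudflare solving is only used with stealth mode")
    class_name, method_name = FETCHERS[mode]
    kwargs = {}
    if mode != "basic":
        kwargs["headless"] = True          # browser-based fetchers
    if mode == "dynamic":
        kwargs["network_idle"] = True      # as in the JavaScript snippet
    if cloudflare:
        kwargs["solve_cloudflare"] = True  # as in the Cloudflare snippet
    return class_name, method_name, kwargs


def fetch(url, mode="basic", cloudflare=False):
    """Fetch url with the fetcher planned for the given mode."""
    class_name, method_name, kwargs = plan_fetch(mode, cloudflare)
    # Imported here so that plan_fetch works even when Scrapling is missing.
    fetchers = import_module("scrapling.fetchers")
    fetcher_class = getattr(fetchers, class_name)
    return getattr(fetcher_class, method_name)(url, **kwargs)
```

With Scrapling installed, fetch('https://protected.com', mode='stealth', cloudflare=True) would make the same call as the Cloudflare snippet above.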

Features

Feature        Command              Description
Basic Scrape   --mode basic         Fast HTTP requests
Stealth Mode   --mode stealth       Bypass Cloudflare/anti-bot
Dynamic Mode   --mode dynamic       Handle JavaScript sites
CSS Selectors  --selector ".class"  Extract specific elements
JSON Output    --json               Machine-readable output

Examples

1. Scrape with CSS Selector

python3 scrapling_tool.py https://quotes.toscrape.com --selector ".quote .text" --json

2. Bypass Cloudflare

python3 scrapling_tool.py https://nopecha.com/demo/cloudflare --mode stealth --cloudflare

3. Wait for Dynamic Content

python3 scrapling_tool.py https://spa-app.com --mode dynamic --wait ".loaded" --json

CLI Reference

python3 scrapling_tool.py URL [options]

Options:
  --mode {basic,stealth,dynamic}  Scraping mode (default: basic)
  --selector, -s CSS_SELECTOR     Extract specific elements
  --cloudflare                    Solve Cloudflare (stealth mode only)
  --wait SELECTOR                 Wait for element (dynamic mode only)
  --json, -j                      Output as JSON
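
For readers writing their own wrapper, the option list above could be reproduced with argparse roughly as follows. This is a reconstruction from the reference text only, not the source of scrapling_tool.py. The function names build_parser and parse_args are hypothetical, and rejecting --cloudflare or --wait outside their mode is an assumption; the real tool may simply ignore them.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the CLI reference."""
    parser = argparse.ArgumentParser(prog="scrapling_tool.py")
    parser.add_argument("url", metavar="URL")
    parser.add_argument("--mode", choices=["basic", "stealth", "dynamic"],
                        default="basic", help="Scraping mode")
    parser.add_argument("--selector", "-s", metavar="CSS_SELECTOR",
                        help="Extract specific elements")
    parser.add_argument("--cloudflare", action="store_true",
                        help="Solve Cloudflare (stealth mode only)")
    parser.add_argument("--wait", metavar="SELECTOR",
                        help="Wait for element (dynamic mode only)")
    parser.add_argument("--json", "-j", action="store_true",
                        help="Output as JSON")
    return parser


def parse_args(argv):
    """Parse argv and enforce the mode-specific options (assumed behaviour)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.cloudflare and args.mode != "stealth":
        parser.error("--cloudflare requires --mode stealth")
    if args.wait and args.mode != "dynamic":
        parser.error("--wait requires --mode dynamic")
    return args
```

Calling parse_args(["https://spa-app.com", "--mode", "dynamic", "--wait", ".loaded", "--json"]) yields a namespace with mode, wait, and json set, while an invalid combination exits with a usage error, as argparse does by default.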

Advanced: Custom Scripts

Create custom scraping scripts in /root/.openclaw/skills/scrapling-web-scraping/:

from scrapling.fetchers import StealthyFetcher

# Custom scraper: collect the name, price, and link of each .product element
def scrape_products(url):
    page = StealthyFetcher.fetch(url, headless=True)
    products = []
    for item in page.css('.product'):
        products.append({
            'name': item.css('.name::text').get(),
            'price': item.css('.price::text').get(),
            'link': item.css('a::attr(href)').get()
        })
    return products
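
The dictionaries returned by scrape_products usually need some cleanup before they are stored. The sketch below is one possible post-processing step, using only the standard library. The helper name normalize_products is hypothetical, and it assumes that a selector with no match yields None or an empty string and that links may be relative to the page URL.

```python
import json
from urllib.parse import urljoin


def normalize_products(products, base_url):
    """Strip whitespace, drop nameless items, and make links absolute."""
    cleaned = []
    for product in products:
        name = (product.get("name") or "").strip()
        if not name:
            continue  # skip items where the .name selector matched nothing
        price = (product.get("price") or "").strip() or None
        link = product.get("link")
        cleaned.append({
            "name": name,
            "price": price,
            # urljoin leaves absolute URLs untouched and resolves relative ones
            "link": urljoin(base_url, link) if link else None,
        })
    return cleaned


def to_json(products):
    """Serialize the cleaned products; ensure_ascii=False keeps non-ASCII names."""
    return json.dumps(products, ensure_ascii=False, indent=2)
```

A typical call would be to_json(normalize_products(scrape_products(url), url)), which keeps the scraping and the cleanup steps separate so each can be checked on its own.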

Notes

  • Requires Python 3.10+
  • Before first use, run scrapling install to download the required browsers
  • Respect website Terms of Service
  • Use responsibly
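
The first two notes can be checked before a scraper runs. The following is a minimal preflight sketch; the function name preflight and its messages are hypothetical. It only verifies the Python version and that the scrapling package can be found, and it cannot tell whether scrapling install has already downloaded the browsers.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 10)  # minimum version stated in the notes above


def preflight(version_info=None, module_name="scrapling"):
    """Return a list of human-readable problems; an empty list means OK."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.10+ is required, found {found}")
    if find_spec(module_name) is None:
        problems.append(
            f"Package '{module_name}' is not installed; "
            'run: pip install "scrapling[all]"'
        )
    return problems
```

A script could call preflight() at startup and print each returned message before exiting.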

Created: 2026-03-05 by 老二
Source: https://github.com/D4Vinci/Scrapling