moochmaniac-firecrawl-scraper
v1.0.0Web scraping, crawling, and search via Firecrawl API. Converts web pages to clean markdown/JSON optimized for AI consumption. Use when you need to extract content from websites, crawl entire sites, or search and scrape web results. Handles JavaScript-heavy sites, dynamic content, and provides struct...
Installation
Firecrawl Scraper
Professional web scraping powered by Firecrawl API. Converts websites to clean, AI-ready markdown or structured JSON.
When to Use
- Extract content from web pages for analysis
- Scrape documentation sites or knowledge bases
- Crawl entire websites systematically
- Search the web and get scraped content
- Parse JavaScript-heavy or dynamic sites
- Convert HTML to clean markdown for LLM processing
- Competitive research or content aggregation
Quick Start
Scrape a single page:
python3 scripts/scrape.py https://example.com
Crawl a website:
python3 scripts/scrape.py --crawl https://docs.example.com --depth 2 --limit 10
Search and scrape:
python3 scripts/scrape.py --search "AI agent frameworks" --limit 5
Check crawl status:
python3 scripts/scrape.py --crawl-status abc123
Commands
Scrape (Single Page)
Extract content from a single URL:
python3 scripts/scrape.py <url> [options]
Options:
- --formats markdown,html,screenshot — Output formats (default: markdown)
- --full — Include full page (no main content extraction)
- --json — Output raw JSON response
Examples:
# Basic scrape
python3 scripts/scrape.py https://docs.example.com
# Get HTML and markdown
python3 scripts/scrape.py https://site.com --formats markdown,html
# Full page (no content filtering)
python3 scripts/scrape.py https://site.com --full
# JSON output
python3 scripts/scrape.py https://site.com --json
Crawl (Entire Website)
Systematically crawl and scrape multiple pages:
python3 scripts/scrape.py --crawl <url> [options]
Options:
- --depth N — Maximum crawl depth (default: 2)
- --limit N — Maximum pages to crawl (default: 10)
- --json — Output raw JSON response
Examples:
# Basic crawl
python3 scripts/scrape.py --crawl https://docs.site.com
# Deep crawl with limit
python3 scripts/scrape.py --crawl https://blog.com --depth 3 --limit 50
# Shallow crawl
python3 scripts/scrape.py --crawl https://site.com --depth 1 --limit 5
Note: Crawl returns a job ID. Use --crawl-status to check progress and retrieve results.
Search (Web Search + Scrape)
Search the web and get scraped content from results:
python3 scripts/scrape.py --search <query> [options]
Options:
- --limit N — Number of results (default: 5)
- --json — Output raw JSON response
Examples:
# Search and scrape
python3 scripts/scrape.py --search "WordPress security best practices"
# More results
python3 scripts/scrape.py --search "AI agents 2026" --limit 10
# JSON output
python3 scripts/scrape.py --search "casino bonuses" --json
Crawl Status
Check status of a crawl job:
python3 scripts/scrape.py --crawl-status <job-id>
Returns JSON with:
- Status: scraping, completed, failed
- Progress: Pages scraped
- Data: Scraped content (when completed)
- Credits used
Output Formats
Markdown (default): Clean, LLM-ready text with preserved structure - Headings, links, lists, code blocks maintained - No HTML noise or styling artifacts - Perfect for RAG, summarization, analysis
HTML: Full HTML source (useful for parsing specific elements)
Screenshot: Base64-encoded PNG of rendered page
JSON: Structured data extraction (custom schemas supported)
Features
Smart Content Extraction: - Automatically identifies main content - Removes navigation, ads, footers - Preserves document structure
JavaScript Support: - Handles SPAs (React, Vue, Angular) - Waits for dynamic content to load - 96% web coverage
Anti-Bot Handling: - Proxy management built-in - Rate limiting handled automatically - CAPTCHA avoidance
Caching: - Smart caching reduces credits - Configurable cache behavior
API Key Setup
The script looks for the Firecrawl API key in:
1. workspace/secrets/firecrawl_api_key (OpenClaw workspace)
2. secrets/firecrawl_api_key (relative to current directory)
3. FIRECRAWL_API_KEY environment variable
Current key is stored at: workspace/secrets/firecrawl_api_key
Credits & Pricing
- Scrape: 1 credit per page
- Crawl: 1 credit per page crawled
- Search: 1 credit per result scraped
- Screenshot: +1 credit
- Advanced features: May use additional credits
Free tier: 500 credits
Paid plans: Starting at $16/month (3,000 credits)
Use Cases
Documentation Extraction:
python3 scripts/scrape.py --crawl https://docs.framework.com --depth 2 --limit 50
Competitive Research:
python3 scripts/scrape.py --search "top casino affiliate sites" --limit 10
Content Migration:
python3 scripts/scrape.py https://old-site.com/page1 --formats markdown
News Monitoring:
python3 scripts/scrape.py --search "WordPress security updates" --limit 5
Blog Scraping:
python3 scripts/scrape.py --crawl https://blog.site.com --depth 1 --limit 20
Tips
- Start with low
--limitvalues to test - Use
--depth 1for blog homepages (gets all posts) --depth 2-3for documentation sites--searchis faster than manual crawling for research- Check
--crawl-statusregularly for long crawls - Use
--jsonfor programmatic processing - Markdown format is best for LLM consumption
Comparison to Other Tools
vs web_fetch tool: - Firecrawl: Better JS support, cleaner output, handles complex sites - web_fetch: Faster, simpler, no API credits needed - Use Firecrawl for: Modern sites, heavy JS, need high-quality markdown - Use web_fetch for: Simple pages, quick checks, no credit usage
vs browser tool: - Firecrawl: Optimized for scraping, structured output, no browser management - browser: Full control, visual interaction, debugging - Use Firecrawl for: Content extraction at scale - Use browser for: Interactive tasks, testing, visual verification