SkillHub

tra-extract-text

v1.0.0

Extract text content from web pages using the trafilatura CLI. Use when the user wants to extract readable text, Markdown, or HTML from a URL. Triggers on requests like: "extract text from URL", "scrape web page content", "get article text", "convert web page to markdown", "trafilatura".

Sourced from ClawHub, authored by goog

Installation

Install the skill from the SkillHub official store:

npx skills add goog/tra-extract-text

tra-extract-text

Extract text from web pages using the trafilatura command-line tool.

Installing trafilatura

pip install trafilatura
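If the CLI may not be installed yet, a small guard can check for it before running any extraction. This is a minimal sketch, not part of trafilatura or the skill: the function name `ensure_trafilatura` is hypothetical, and the fallback assumes `python3` with pip is available.

```shell
# Hypothetical helper (not part of trafilatura): succeed if the trafilatura
# CLI can be found, otherwise try to install it with pip.
ensure_trafilatura() {
  if command -v trafilatura >/dev/null 2>&1; then
    return 0
  fi
  echo "trafilatura not found; installing with pip..." >&2
  python3 -m pip install trafilatura || return 1
  # pip may place the script in a directory that is not on PATH.
  if ! command -v trafilatura >/dev/null 2>&1; then
    echo "trafilatura was installed but is not on PATH" >&2
    return 1
  fi
}
```

Calling `ensure_trafilatura || exit 1` at the top of a script stops early instead of failing on the first extraction command.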

Usage

Basic text extraction (Markdown)

trafilatura -u URL --markdown

Extract plain text (no formatting; plain text is the default output)

trafilatura -u URL

Output to file

trafilatura -u URL --markdown > output.md
trafilatura -u URL > output.txt
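With plain shell redirection the output file is created even when nothing could be extracted, which leaves an empty file behind. A hedged sketch of a wrapper that keeps only non-empty output; `save_markdown` is a hypothetical helper name, not a trafilatura command.

```shell
# Hypothetical helper: extract URL ($1) as Markdown into FILE ($2).
# Writes to a temporary file first and only moves it into place when
# the output is non-empty, so a failed extraction leaves no empty file.
save_markdown() {
  local url=$1 out=$2
  local tmp="$out.tmp.$$"
  if trafilatura -u "$url" --markdown > "$tmp" && [ -s "$tmp" ]; then
    mv "$tmp" "$out"
  else
    rm -f "$tmp"
    echo "no text extracted from $url" >&2
    return 1
  fi
}
```

For example, `save_markdown "https://example.com/post" post.md` returns a non-zero status when the page yields no text, so callers can react to it.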

CLI Options

Option               Description
-u, --URL            URL to download and extract
--markdown           Output as Markdown
--output-format txt  Output as plain text (the default)
--html               Output as HTML
--json               Output as JSON
--xml                Output as XML
-o, --output-dir     Write results as files in this directory instead of stdout
--with-metadata      Include metadata (title, author, date)
--license            Show license info
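When the output format comes from the user's request, the shorthand flags in the table can be chosen with a small lookup. A sketch assuming only the formats listed there; `format_flag` is a hypothetical helper, and plain text maps to no flag because it is trafilatura's default output.

```shell
# Hypothetical helper: map a requested format name to the trafilatura
# shorthand flag listed in the options table. Plain text is the default
# output, so "txt"/"text" print nothing.
format_flag() {
  case "$1" in
    markdown|md) echo "--markdown" ;;
    txt|text) ;;
    html) echo "--html" ;;
    json) echo "--json" ;;
    xml) echo "--xml" ;;
    *)
      echo "unsupported format: $1" >&2
      return 1
      ;;
  esac
}
```

Typical use is `flag=$(format_flag md) || exit 1` followed by `trafilatura -u "$url" $flag`, with `$flag` left unquoted on purpose so that an empty value adds no argument.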

Examples

Extract a Medium article to markdown:

trafilatura -u "https://medium.com/example/article" --markdown

Extract and save:

trafilatura -u "https://news.example.com/article" --markdown > article.md
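To extract several pages, a loop can save each one to its own file. A minimal sketch under stated assumptions: `extract_all` is a hypothetical helper, the list file holds one URL per line, and outputs are numbered `001.md`, `002.md`, and so on in list order.

```shell
# Hypothetical helper: read one URL per line from LIST_FILE ($1) and save
# each page as Markdown into OUT_DIR ($2) as 001.md, 002.md, ...
# Blank lines and lines starting with '#' are skipped. Failed or empty
# extractions are reported and removed, but do not stop the loop.
extract_all() {
  local list=$1 outdir=$2 n=0 failed=0 url out
  mkdir -p "$outdir" || return 1
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in ''|'#'*) continue ;; esac
    n=$((n + 1))
    out=$(printf '%s/%03d.md' "$outdir" "$n")
    # stdin comes from /dev/null so the command cannot consume the URL list.
    if ! trafilatura -u "$url" --markdown > "$out" < /dev/null || [ ! -s "$out" ]; then
      echo "failed: $url" >&2
      rm -f "$out"
      failed=$((failed + 1))
    fi
  done < "$list"
  # Non-zero status if any URL failed.
  [ "$failed" -eq 0 ]
}
```

For example, `extract_all urls.txt articles/` writes one Markdown file per URL into `articles/`.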

Extract with metadata:

trafilatura -u "https://example.com/post" --markdown --with-metadata
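With `--with-metadata`, text and Markdown output typically begin with a header between two `---` lines holding `key: value` pairs such as `title: ...`. Treat that layout as an assumption and check it against your installed version. Under that assumption, a small awk-based sketch can read one field back from a saved file; `front_matter_field` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of FIELD ($2) from a '---'-delimited
# "key: value" header at the top of FILE ($1), the layout assumed for
# --with-metadata output. Prints nothing if there is no header or no field.
front_matter_field() {
  awk -v key="$2" '
    NR == 1 && $0 != "---" { exit }      # no header at all
    NR > 1 && $0 == "---"  { exit }      # end of header reached
    NR > 1 && index($0, key ": ") == 1 {
      print substr($0, length(key) + 3)  # text after the "key: " prefix
      exit
    }
  ' "$1"
}
```

For example, after saving the metadata example above to `post.md`, `front_matter_field post.md title` prints the page title if trafilatura found one.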