tra-extract-text

Extract text from web pages using the trafilatura command-line tool.

Installation

pip install trafilatura

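Usage

Extract a page as Markdown:

trafilatura -u URL --markdown

Extract a page as plain text (the default output format, so no format flag is needed):

trafilatura -u URL

Save the output to a file with shell redirection:

trafilatura -u URL --markdown > output.md
trafilatura -u URL > output.txt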

Options

Option	Description
`-u, --URL`	Target URL to download and extract
`--markdown`	Output as Markdown
`--output-format txt`	Output as plain text (default)
`--html`	Output as HTML
`--json`	Output as JSON
`--xml`	Output as XML
`-o, --output-dir`	Write results to a directory instead of stdout
`--with-metadata`	Include metadata (title, author, date)
`--license`	Show license info
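
When the tool is called from a script, the options above can be assembled into a command line programmatically. The following is a minimal sketch, not part of trafilatura itself: `build_command` and `extract` are hypothetical helper names, and the sketch assumes that plain text is the CLI's default output and therefore needs no format flag.

```python
import subprocess

# Format flags from the options table. Plain text maps to None on the
# assumption that it is the CLI's default output format.
FORMAT_FLAGS = {
    "markdown": "--markdown",
    "html": "--html",
    "json": "--json",
    "xml": "--xml",
    "text": None,
}


def build_command(url, fmt="markdown", with_metadata=False):
    """Return the argv list for one trafilatura call (hypothetical helper)."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["trafilatura", "-u", url]
    flag = FORMAT_FLAGS[fmt]
    if flag is not None:
        cmd.append(flag)
    if with_metadata:
        cmd.append("--with-metadata")
    return cmd


def extract(url, fmt="markdown", with_metadata=False):
    """Run the CLI and return whatever it printed to stdout."""
    result = subprocess.run(
        build_command(url, fmt, with_metadata),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with URLs that contain `&` or `?`.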

Examples

Extract a Medium article to Markdown:

trafilatura -u "https://medium.com/example/article" --markdown

Extract and save to a file (via shell redirection, since `-o` names an output directory rather than a file):

trafilatura -u "https://news.example.com/article" --markdown > article.md

Extract with metadata:

trafilatura -u "https://example.com/post" --markdown --with-metadata
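
If the metadata is needed by a program rather than a reader, combining `--json` with `--with-metadata` gives output that is easier to parse than Markdown. The sketch below shows one way this might look; `parse_metadata` and `fetch_metadata` are hypothetical helpers, and the key names `title`, `author`, and `date` are assumptions taken from the fields listed in the options table, so check them against the actual output.

```python
import json
import subprocess


def parse_metadata(json_text):
    """Pick the title, author, and date fields out of JSON output."""
    record = json.loads(json_text)
    # .get() returns None for keys that are absent, so a page without
    # an author or date does not raise KeyError.
    return {key: record.get(key) for key in ("title", "author", "date")}


def fetch_metadata(url):
    """Run the CLI with JSON output and return the selected fields."""
    result = subprocess.run(
        ["trafilatura", "-u", url, "--json", "--with-metadata"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_metadata(result.stdout)
```

Keeping the parsing step separate from the subprocess call makes it possible to test the parser against a saved JSON file without network access.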