youtube-transcript-yt-dlp
v1.0.5Extract YouTube video transcripts from existing captions (manual or auto-generated) using yt-dlp, with optional timestamps and local SQLite caching. Use when the user asks for a YouTube transcript, captions, subtitles, or wants to turn a YouTube link into text for summarization/search.
Installation
YouTube Transcript (Captions-Only)
This skill extracts transcripts from existing YouTube captions.
Primary behavior
- Prefer manual subtitles when available.
- Fall back to auto-generated captions.
- Output either:
- JSON segments (default) or
- plain text (--text)
- Cache results locally in SQLite for speed.
Reliability behavior
- If YouTube blocks anonymous access (bot-check), provide cookies.txt.
- If yt-dlp reports no captions for a video, the script tries a fallback:
1) YouTube’s transcript panel (youtubei get_transcript) when accessible
This published version intentionally does not call third-party transcript providers.
Privacy note: This published version only contacts YouTube directly (via yt-dlp and the transcript panel fallback). It does not send video IDs/URLs to third-party transcript providers.
Cookies: Cookies are treated as secrets.
- The script supports --cookies / YT_TRANSCRIPT_COOKIES, but does not auto-load cookies from inside the skill directory.
- Store cookies under ~/.config/yt-transcript/.
Path safety: This skill restricts --cookies and --cache paths to approved directories.
- cookies allowed under: ~/.config/yt-transcript/
- cache allowed under: {baseDir}/cache/ and ~/.config/yt-transcript/
How to run
Script path:
- {baseDir}/scripts/yt_transcript.py
Typical usage:
- python3 {baseDir}/scripts/yt_transcript.py <youtube_url_or_id>
- python3 {baseDir}/scripts/yt_transcript.py <url> --lang en
- python3 {baseDir}/scripts/yt_transcript.py <url> --text
- python3 {baseDir}/scripts/yt_transcript.py <url> --no-ts
Cookies (optional, but often required on VPS IPs):
- python3 {baseDir}/scripts/yt_transcript.py <url> --cookies /path/to/youtube-cookies.txt
- or set env var: YT_TRANSCRIPT_COOKIES=/path/to/youtube-cookies.txt
Publishing safety note: Cookies are optional, so YT_TRANSCRIPT_COOKIES is intentionally not required by skill metadata. Only set it if you need authenticated access.
Best practice: store cookies outside the skill folder (so you never accidentally publish them), e.g. ~/.config/yt-transcript/youtube-cookies.txt, and point to it via --cookies or YT_TRANSCRIPT_COOKIES.
What the script returns
JSON mode (default)
A JSON object:
- video_id: 11-char id
- lang: chosen language
- source: manual | auto | panel
- segments: list of { start, duration, text } (or text-only when --no-ts)
Text mode (--text)
A newline-separated transcript.
- By default timestamps are included as [12.34s].
- Use --no-ts to output only the text lines.
Caching
Default cache DB:
- {baseDir}/cache/transcripts.sqlite
Cache key includes:
- video_id, lang, source, include_timestamp, format
Cookie handling (important)
- Cookies must be in Netscape cookies.txt format.
- Treat cookies as secrets.
- Never commit / publish cookies to ClawHub.
Recommended local path (ignored by git/publish):
- {baseDir}/cache/youtube-cookies.txt (chmod 600)
Notes (safety + reliability)
- Only accept a YouTube URL or an 11-character video ID.
- Do not forward arbitrary user-provided flags into the command.
- If
yt-dlpis missing, instruct the user to install it (recommended): - install pipx
pipx install yt-dlp- ensure
yt-dlpis on PATH