SkillHub

youtube-transcript-yt-dlp

v1.0.5

Extract YouTube video transcripts from existing captions (manual or auto-generated) using yt-dlp, with optional timestamps and local SQLite caching. Use when the user asks for a YouTube transcript, captions, subtitles, or wants to turn a YouTube link into text for summarization/search.

Sourced from ClawHub, Authored by Subhadip Sarkar

Installation

Please help me install the skill `youtube-transcript-yt-dlp` from SkillHub official store. npx skills add ItzSubhadip/youtube-transcript-yt-dlp

YouTube Transcript (Captions-Only)

This skill extracts transcripts from existing YouTube captions.

Primary behavior - Prefer manual subtitles when available. - Fall back to auto-generated captions. - Output either: - JSON segments (default) or - plain text (--text) - Cache results locally in SQLite for speed.

Reliability behavior - If YouTube blocks anonymous access (bot-check), provide cookies.txt. - If yt-dlp reports no captions for a video, the script tries a fallback: 1) YouTube’s transcript panel (youtubei get_transcript) when accessible

This published version intentionally does not call third-party transcript providers.

Privacy note: This published version only contacts YouTube directly (via yt-dlp and the transcript panel fallback). It does not send video IDs/URLs to third-party transcript providers.

Cookies: Cookies are treated as secrets. - The script supports --cookies / YT_TRANSCRIPT_COOKIES, but does not auto-load cookies from inside the skill directory. - Store cookies under ~/.config/yt-transcript/.

Path safety: This skill restricts --cookies and --cache paths to approved directories. - cookies allowed under: ~/.config/yt-transcript/ - cache allowed under: {baseDir}/cache/ and ~/.config/yt-transcript/

How to run

Script path: - {baseDir}/scripts/yt_transcript.py

Typical usage: - python3 {baseDir}/scripts/yt_transcript.py <youtube_url_or_id> - python3 {baseDir}/scripts/yt_transcript.py <url> --lang en - python3 {baseDir}/scripts/yt_transcript.py <url> --text - python3 {baseDir}/scripts/yt_transcript.py <url> --no-ts

Cookies (optional, but often required on VPS IPs): - python3 {baseDir}/scripts/yt_transcript.py <url> --cookies /path/to/youtube-cookies.txt - or set env var: YT_TRANSCRIPT_COOKIES=/path/to/youtube-cookies.txt

Publishing safety note: Cookies are optional, so YT_TRANSCRIPT_COOKIES is intentionally not required by skill metadata. Only set it if you need authenticated access.

Best practice: store cookies outside the skill folder (so you never accidentally publish them), e.g. ~/.config/yt-transcript/youtube-cookies.txt, and point to it via --cookies or YT_TRANSCRIPT_COOKIES.

What the script returns

JSON mode (default)

A JSON object: - video_id: 11-char id - lang: chosen language - source: manual | auto | panel - segments: list of { start, duration, text } (or text-only when --no-ts)

Text mode (--text)

A newline-separated transcript. - By default timestamps are included as [12.34s]. - Use --no-ts to output only the text lines.

Caching

Default cache DB: - {baseDir}/cache/transcripts.sqlite

Cache key includes: - video_id, lang, source, include_timestamp, format

  • Cookies must be in Netscape cookies.txt format.
  • Treat cookies as secrets.
  • Never commit / publish cookies to ClawHub.

Recommended local path (ignored by git/publish): - {baseDir}/cache/youtube-cookies.txt (chmod 600)

Notes (safety + reliability)

  • Only accept a YouTube URL or an 11-character video ID.
  • Do not forward arbitrary user-provided flags into the command.
  • If yt-dlp is missing, instruct the user to install it (recommended):
  • install pipx
  • pipx install yt-dlp
  • ensure yt-dlp is on PATH