SkillHub

video-understanding

v1.1.0

Analyze videos with Google Gemini multimodal AI. Download from any URL (Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, 1000+ sites) and get transcripts, descriptions, and answers to questions. Use when asked to watch, analyze, summarize, or transcribe a video, or answer questions about video co...

Sourced from ClawHub, Authored by bill492

Installation

Please help me install the skill `video-understanding` from SkillHub official store. npx skills add bill492/video-understanding

Video Understanding (Gemini)

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.

Requirements

  • yt-dlpbrew install yt-dlp / pip install yt-dlp
  • ffmpegbrew install ffmpeg (for merging video+audio streams)
  • GEMINI_API_KEY environment variable

Default Output

Returns structured JSON: - transcript — Verbatim transcript with [MM:SS] timestamps - description — Visual description (people, setting, UI, text on screen, flow) - summary — 2-3 sentence summary - duration_seconds — Estimated duration - speakers — Identified speakers

Usage

Analyze a video (structured JSON output)

uv run {baseDir}/scripts/analyze_video.py "<video-url>"

Ask a question (adds "answer" field)

uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"

Override prompt entirely

uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw

Download only (no analysis)

uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4

Options

Flag Description Default
-q / --question Question to answer (added to default fields) none
-p / --prompt Override entire prompt (ignores -q) structured JSON
-m / --model Gemini model gemini-2.5-flash
-o / --output Save output to file stdout
--keep Keep downloaded video file false
--download-only Download only, skip analysis false
--max-size Max file size in MB 500
--raw Raw text output instead of JSON false

How It Works

  1. YouTube URLs → Passed directly to Gemini (no download needed)
  2. All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
  3. Gemini analyzes video with structured prompt → returns JSON
  4. Temp files and Gemini uploads cleaned up automatically

Supported Sources

Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.

Tips

  • Use -q for targeted questions on top of the full analysis
  • YouTube is fastest (no download step)
  • Large videos (10min+) work fine — Gemini File API supports up to 2GB (free) / 20GB (paid)
  • The script auto-installs Python dependencies via uv