Faster-Whisper

Specific Model: ./scripts/transcribe audio.mp3 --model large-v3-turbo
Word Timestamps: ./scripts/transcribe audio.mp3 --word-timestamps
JSON Output: ./scripts/transcribe audio.mp3 --json
VAD (Silence Removal): ./scripts/transcribe audio.mp3 --vad

High-performance local speech-to-text using faster-whisper.

Setup

Execute the setup script to create a virtual environment and install dependencies. It will automatically detect NVIDIA GPUs for CUDA acceleration.

./setup.sh

Requirements: - Python 3.10 or later - ffmpeg (installed on the system)

Use the transcription script to process audio files.

./scripts/transcribe audio.mp3

No GPU detected: Ensure NVIDIA drivers and CUDA are correctly installed. CPU transcription is significantly slower.
OOM Error: Use a smaller model (e.g., small or base) or use --compute-type int8.