5/21/2026 at 5:45:03 AM
If anyone is interested, these are the super-short zsh/bash functions I keep in .zshrc for doing the same thing with plain whisper.cpp, ffmpeg and yt-dlp (`brew install whisper-cpp yt-dlp` on a Mac). Despite the function names, they output VTT subtitles, but it's easy enough to switch that to txt.
yt_to_srt() {
    local url="$1"
    local output_base="$2"
    local language="${3:-en}"
    # %(ext)s lets yt-dlp name the download by its real extension;
    # after audio extraction the result is "$output_base.wav".
    yt-dlp -x --audio-format wav --postprocessor-args "-ar 16000" -o "$output_base.%(ext)s" "$url"
    whisper-cli --language "$language" --model "$WHISPER_MODEL" --split-on-word --max-len 65 --output-vtt --output-file "$output_base" --file "$output_base.wav"
    rm "$output_base.wav"
}
file_to_srt() {
    local filepath="$1"
    local language="${2:-en}"
    local filename="$(basename "$filepath")"
    local filename_no_ext="${filename%.*}"
    local output_base="$filename_no_ext"
    # Distinct temp name, so a .wav input in the current directory
    # is not overwritten by ffmpeg or deleted by the rm below.
    local temp_wav="$output_base.whisper-tmp.wav"
    ffmpeg -i "$filepath" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$temp_wav"
    whisper-cli --language "$language" --model "$WHISPER_MODEL" --split-on-word --max-len 65 --output-vtt --output-file "$output_base" --file "$temp_wav"
    rm "$temp_wav"
}
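Switching the output format comes down to passing a different whisper-cli flag. A minimal sketch, assuming the `--output-srt` and `--output-txt` flags that whisper-cli offers alongside `--output-vtt`; the helper name `whisper_output_flag` is hypothetical and not part of whisper.cpp:

```shell
# Map a format name to the matching whisper-cli output flag.
# whisper_output_flag is a hypothetical helper; defaults to vtt.
whisper_output_flag() {
    case "${1:-vtt}" in
        vtt) echo "--output-vtt" ;;
        srt) echo "--output-srt" ;;
        txt) echo "--output-txt" ;;
        *)   echo "unsupported format: $1" >&2; return 1 ;;
    esac
}
```

In either function, `--output-vtt` could then be replaced with `"$(whisper_output_flag txt)"`, with the format taken from an extra argument.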
Plus an additional bootstrap script for the large-v3-turbo model, from my chezmoi dotfiles:
#!/bin/bash
# Download whisper.cpp models from Hugging Face (runs once per machine).
set -euo pipefail
MODELS_DIR="$HOME/whisper-models"
BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"
MODELS=("ggml-large-v3-turbo.bin" "ggml-tiny.bin")
mkdir -p "$MODELS_DIR"
for model in "${MODELS[@]}"; do
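    if [ ! -f "$MODELS_DIR/$model" ]; then
        echo "Downloading $model..."
        # --fail keeps an HTTP error page from being saved as a model; the
        # .part name keeps an interrupted download from looking complete.
        curl -L --fail --progress-bar -o "$MODELS_DIR/$model.part" "$BASE_URL/$model"
        mv "$MODELS_DIR/$model.part" "$MODELS_DIR/$model"
    else
        echo "$model already exists, skipping."
    fi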
done
echo "Whisper models ready at $MODELS_DIR"
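The functions above read `$WHISPER_MODEL`, which none of the snippets shown sets. A minimal sketch of one way to point it at the downloaded files, assuming the `~/whisper-models` directory and file names from the bootstrap script; the function name `pick_whisper_model` is hypothetical:

```shell
# Print the path of the first model file that exists, preferring large-v3-turbo.
# pick_whisper_model is a hypothetical helper; directory and file names
# mirror the bootstrap script.
pick_whisper_model() {
    local dir="${1:-$HOME/whisper-models}"
    local name
    for name in ggml-large-v3-turbo.bin ggml-tiny.bin; do
        if [ -f "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    echo "no whisper model found in $dir" >&2
    return 1
}

# In .zshrc: stays empty if nothing has been downloaded yet.
export WHISPER_MODEL="$(pick_whisper_model 2>/dev/null || true)"
```

Falling back to the tiny model is a design choice for this sketch; it keeps the functions usable on a machine where only the small download has finished.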
by piotrrojek
5/21/2026 at 7:28:49 AM
yt-dlp can download auto-subtitles and regular subtitles; why not do that and fall back to whisper?
by ramon156
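The subtitles-first idea could be sketched roughly as follows, using yt-dlp's `--write-subs`, `--write-auto-subs`, `--sub-langs`, `--convert-subs` and `--skip-download` options. The function name `yt_subs_first` is hypothetical, and the `<base>.<lang>.vtt` file name is an assumption about how yt-dlp names subtitle files for this output template:

```shell
# Try to fetch existing subtitles with yt-dlp; return non-zero if none arrive.
# yt_subs_first is a hypothetical name. The subtitle file is assumed to land
# at "$output_base.<language>.vtt".
yt_subs_first() {
    local url="$1"
    local output_base="$2"
    local language="${3:-en}"
    # --write-subs: uploaded subtitles; --write-auto-subs: auto-generated ones.
    yt-dlp --skip-download --write-subs --write-auto-subs \
        --sub-langs "$language" --convert-subs vtt \
        -o "$output_base.%(ext)s" "$url" || return 1
    # yt-dlp may exit 0 even when no subtitles exist, so check for the file.
    [ -f "$output_base.$language.vtt" ]
}
```

A fallback would then read `yt_subs_first "$url" talk || yt_to_srt "$url" talk`, so whisper only runs when no subtitle file was produced.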
5/21/2026 at 11:58:21 AM
To be frank, I didn't know there was such an option :-)
by piotrrojek
5/21/2026 at 1:16:59 PM
In my experience Whisper is several orders of magnitude slower, though.
by ranger_danger