I use Generate-subtitles as an alternative to YouTube's closed captions (CC), since the auto-generated CC does not always work as expected.
When creating video content, it comes in very handy to have high-quality generated transcripts to start with. My favorite tools are Subtitle Edit and Aegisub. Subtitle Edit (SE) provides a great online version and works with the .SRT format, which fits nicely into my Adobe Premiere Pro workflow; it also has built-in auto translation and Whisper support. Aegisub works with the .ASS format, which fits into the NixieVideoKit automated workflow.
Today, I’m focusing on installing Generate-subtitles for my mini server.
whisper
Before working with generate-subtitles, we need to install whisper first.
Install requirements if needed
sudo apt install python3 python3-pip ffmpeg
pip install torch torchvision torchaudio
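Since the choice of model later depends on GPU memory, it's worth a quick sanity check that the CUDA build of PyTorch actually sees the GPU (skip this if you plan to run on CPU anyway):
# Should print True if torch can use the GPU
python3 -c "import torch; print(torch.cuda.is_available())"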
Install whisper
pip install git+https://github.com/openai/whisper.git
or:
pip install -U openai-whisper
However, I tried both install methods, and even after
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
I still couldn't get a proper response from whisper -h.
After some troubleshooting, it turned out whisper had been installed to the wrong path. To make things right:
sudo cp -rf /home/$username/.local/bin/* /usr/local/bin
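If the only real problem is that pip's user bin directory isn't on PATH, an alternative to copying the binaries is to add it to the shell profile instead (a sketch assuming bash; adjust the rc file for your shell):
# Put pip's user install directory on PATH
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
which whisper # should now resolve to ~/.local/bin/whisper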
Use a test file to trigger the model download, then remove the test files:
wget https://github.com/openai/whisper/raw/f296bcd3fac41525f1c5ab467062776f8e13e4d0/tests/jfk.flac
whisper jfk.flac --model tiny
rm -rf jfk.*
Refer to the Model Card to choose a model. My 4GB of VRAM can only handle up to the small model, although I can still use large on the CPU, given some patience:
whisper jfk.flac --model large --device cpu --language en
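Since the end goal is .SRT files for the Premiere Pro workflow, a small loop can batch-transcribe a whole folder of clips. This is just a sketch: the clips/ folder and .mp4 extension are placeholders, and the --output_format / --output_dir flags should be double-checked against whisper -h on your install:
# Transcribe every clip in clips/ and write the .srt files into subs/
mkdir -p subs
for f in clips/*.mp4; do
  whisper "$f" --model small --language en --output_format srt --output_dir subs
done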
If you have problems downloading the large-v2 model, try downloading it manually with this link and putting it into place:
cd /home/$username/.cache/whisper/
wget https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt
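A quick check that the file landed where whisper looks for cached models:
# whisper loads models from ~/.cache/whisper by default
ls -lh ~/.cache/whisper/large-v2.pt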
generate-subtitles
The official scripts work great:
# install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.2/install.sh | bash
# setup nvm
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
nvm install 14
nvm use 14
sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp # Download yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp # Make executable
git clone https://github.com/mayeaux/generate-subtitles
cd generate-subtitles
npm install
npm start
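npm start ties the app to the current terminal, so to keep it running after logging out of the mini server, a simple nohup works (pm2 or a systemd unit would be the more robust choice):
# Run in the background and keep a log to tail for errors
nohup npm start > generate-subtitles.log 2>&1 &
tail -f generate-subtitles.log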
Now the basic functions are enabled.
Enabling translation
To enable translation:
sudo apt-get install python3.9-dev -y
pip3 install --upgrade distlib
sudo apt-get install pkg-config libicu-dev -y
pip3 install libretranslate
Run libretranslate --host 192.168.x.x to start downloading the language models; once they're in place, the service listens on that address at port 5000, so it can be reached from the LAN.
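To confirm the service is reachable from the LAN before wiring it into generate-subtitles, a quick request against LibreTranslate's /translate endpoint helps (replace 192.168.x.x with the server's real address):
# Expect a JSON response containing "translatedText"
curl -s -X POST http://192.168.x.x:5000/translate -H 'Content-Type: application/json' -d '{"q": "Hello world", "source": "en", "target": "es", "format": "text"}'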
Create the config file with nano .env:
CONCURRENT_AMOUNT=1
LIBRETRANSLATE='http://192.168.x.x:5000'
UPLOAD_FILE_SIZE_LIMIT_IN_MB=500
MULTIPLE_GPUS=false
FILES_PASSWORD=password
NODE_ENV='development'
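Assuming the usual dotenv behaviour of reading .env once at startup, restart generate-subtitles after creating or editing the file:
# Restart so the new values (LibreTranslate URL, limits, password) are picked up
npm start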
Hacks for CPU mode
A dirty workaround to toggle CPU mode is to edit transcribe/transcribe-wrapped.js (nano transcribe/transcribe-wrapped.js) so that the multi-GPU branch forces the CPU device:
if (multipleGpusEnabled) {
  arguments.push('--device', 'cpu');
}
Now set MULTIPLE_GPUS=true in .env to enable CPU mode for the cases where the smaller models aren't capable enough and I need to use large.
Although I could still add translation dropdown options, link pasting, and some UI cleanup, it is enough for my use case for now, and I have to stop here for another project.