I use Generate-subtitles as an alternative or substitute for YouTube closed captions (CC), since the built-in captioning does not always work as expected.

When creating video content, it is very handy to have a high-quality generated transcript to start with. My favorite tools are Subtitle Edit and Aegisub. Subtitle Edit provides a great online version and works with the .SRT format, which fits neatly into my Adobe Premiere Pro workflow; it also has built-in auto-translation and Whisper support. Aegisub works with the .ASS format, which fits into the NixieVideoKit automated workflow.

Today, I’m focusing on installing Generate-subtitles on my mini server.


Before working with generate-subtitles, we need to install Whisper first.

Install the requirements if needed:

sudo apt install python3 python3-pip ffmpeg

pip install torch torchvision torchaudio

Install Whisper:

pip install git+https://github.com/openai/whisper.git

or pip install -U openai-whisper

However, I tried both install methods, and even pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git, yet whisper -h still did not give a correct response.

After some troubleshooting, it turned out the whisper executable had been installed to a directory that was not on my PATH. To make things right:

sudo cp -rf /home/$username/.local/bin/* /usr/local/bin
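A less invasive alternative to copying the files is to put pip's user install directory on the PATH itself. This is a sketch assuming the entry points landed in ~/.local/bin, the default location for pip --user installs:

```shell
# Put pip's user-level bin directory on the PATH for the current shell
export PATH="$HOME/.local/bin:$PATH"

# Persist the change for future shells
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
```

With this in place, whisper -h should resolve without copying anything into /usr/local/bin.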

Use a test file to trigger the model download, then remove the test files:

wget https://github.com/openai/whisper/raw/f296bcd3fac41525f1c5ab467062776f8e13e4d0/tests/jfk.flac

whisper jfk.flac --model tiny

rm -rf jfk.*

Refer to the Model Card to choose a model. My 4 GB of VRAM can only handle up to the small model, although I can still use large on the CPU, with some patience:

whisper jfk.flac --model large --device cpu --language en

If you have trouble downloading the large-v2 model, try downloading it manually with this link and putting it into place:

cd /home/$username/.cache/whisper/

wget https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt
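After a manual download, it is worth verifying the file, because Whisper checks model integrity by SHA-256 and the expected checksum is embedded in the download URL itself (the long hex path segment before the filename). A small sketch that extracts it:

```shell
url="https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt"

# The second-to-last path segment of the URL is the expected SHA-256 checksum
expected=$(echo "$url" | awk -F/ '{print $(NF-1)}')
echo "expected: $expected"

# Compare against the downloaded file (run after the download completes):
# actual=$(sha256sum large-v2.pt | cut -d' ' -f1)
# [ "$actual" = "$expected" ] && echo "checksum OK" || echo "checksum MISMATCH"
```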


The official scripts work great:

# install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.2/install.sh | bash

# setup nvm
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion

nvm install 14
nvm use 14

sudo curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp #download yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp  # Make executable

git clone https://github.com/mayeaux/generate-subtitles
cd generate-subtitles
npm install
npm start

Now the basic functions are enabled.

Enabling translation

To enable translation:

sudo apt-get install python3.9-dev -y
pip3 install --upgrade distlib
sudo apt-get install pkg-config libicu-dev
pip3 install libretranslate

Run libretranslate --host 192.168.x.x to start downloading the language models; the service then listens on that host at port 5000, so it can be accessed from the LAN.

Create the config file with nano .env
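As a rough sketch of what that .env might contain — the exact variable names should be checked against the repo's own example config, and LIBRETRANSLATE here is an assumed name for the translation-endpoint setting, while MULTIPLE_GPUS is the flag referenced later in this post:

```shell
# .env sketch (variable names are assumptions, verify against the repo)

# Point the app at the LAN LibreTranslate instance started above
LIBRETRANSLATE=http://192.168.x.x:5000

# Flag used by the CPU-mode hack below
MULTIPLE_GPUS=true
```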


Hacks for CPU mode

A dirty workaround toggles CPU mode by editing the file with nano transcribe/transcribe-wrapped.js:

      if (multipleGpusEnabled) {
          arguments.push('--device', 'cpu');
      }

Now setting MULTIPLE_GPUS=true in .env enables CPU mode, for when the smaller models are not capable enough and the large model is needed.

Although I could still add translation dropdown options, link pasting, and some UI cleanup, this is enough for my use case for now, and I have to stop here for another project.