Easily run OpenAI’s Whisper to transcribe speech

Whisper is an OpenAI model that approaches human-level accuracy in speech recognition. Check OpenAI’s blog for more information.

Like the vast majority of ML projects, the code for Whisper is in Python, which is a bit of a pain to run. Especially so when there are external dependencies to take care of (as is the case with Whisper).

Thankfully, dependency management is exactly the problem that Nix aims to solve. With a bit of help from this example, I wrote a Nix flake that tames all the dependencies and lets you run Whisper with a couple of commands.
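For a rough idea of what such a flake involves, here is a hypothetical minimal flake.nix for a dev shell like this one. It is a sketch, not the actual flake from the fork: the pinned nixpkgs branch, the hard-coded system, and the exact package list (ffmpeg plus a Python environment with Whisper’s core dependencies) are all illustrative assumptions.

```nix
{
  description = "Dev shell for running Whisper (illustrative sketch)";

  # Assumption: pinning to the unstable branch of nixpkgs.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      # Assumption: a single hard-coded system for brevity.
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      # `nix develop` picks up devShells.<system>.default.
      devShells.${system}.default = pkgs.mkShell {
        buildInputs = [
          # Whisper shells out to ffmpeg to decode audio.
          pkgs.ffmpeg
          # Illustrative subset of Whisper's Python dependencies.
          (pkgs.python3.withPackages (ps: [ ps.numpy ps.torch ps.tqdm ]))
        ];
      };
    };
}
```

The nice part of this approach is that the shell is fully declarative: anyone entering it gets the same ffmpeg and the same Python environment, with nothing installed globally.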

After installing Nix and enabling Flakes, simply clone this fork of Whisper. Then, inside the cloned directory, run nix develop. This will download all the necessary dependencies (Python or otherwise) and drop you into a shell where they are available. Finally, to actually run the transcriber, run python -m whisper --model base <audio file>. This will run the base model of Whisper, which has 74 million parameters and is able to run with about 1 GB of VRAM. Other than base, there are four more models available, each aiming for a different balance of size, accuracy, and speed. Check the README on Whisper’s repo for more on the models that are available.
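Put together, the whole workflow looks something like this. The clone URL and the audio filename are placeholders, since they depend on where the fork lives and what you want to transcribe:

```shell
# Clone the fork (placeholder URL; use the fork linked above)
git clone https://example.com/whisper-fork.git
cd whisper-fork

# Enter a shell with every dependency the flake declares
# (first run downloads Python, PyTorch, ffmpeg, etc.)
nix develop

# Transcribe with the 74M-parameter base model
python -m whisper --model base recording.mp3
```

Swap base for one of the other models if you want to trade speed for accuracy (or vice versa).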