Purpose
This guide walks you through installing OpenAI’s Whisper and using it to transcribe WAV files locally on Linux.
Prerequisites
- A Debian-based Linux distribution should already be installed.
- While other Linux distributions work, this guide focuses on Debian-based distributions.
- You can also use Windows Subsystem for Linux (WSL) on a Windows computer.
- Prepare a WAV file (1-5 minutes in length) for transcription.
- The following dependencies should be installed:
- python3
- Required for running Python packages and scripts.
- pipx
- A Python application installer that will be used to install Whisper into an isolated environment, minimizing interference with other Python packages.
- ffmpeg
- Required by Whisper for processing audio files.
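On a Debian-based system, the dependencies above can be installed with apt and then verified from the terminal. The snippet below is a sketch assuming Debian/Ubuntu package names; the loop simply reports whether each tool is reachable on your PATH.

```shell
# Install the dependencies (Debian/Ubuntu package names assumed).
# Run this step manually first, as it requires root privileges:
#   sudo apt update && sudo apt install -y python3 pipx ffmpeg

# Confirm each tool is reachable on your PATH:
for cmd in python3 pipx ffmpeg; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: not found (install it before continuing)"
  fi
done
```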
Recommended
- Although selecting the best GPU for LLM workloads is beyond the scope of this tutorial, the author used the following GPUs while developing it:
- NVIDIA Quadro RTX 4000 with 8 GB of VRAM
- NVIDIA RTX A2000 with 8 GB of VRAM
- If you are making use of a GPU:
- Ensure that the latest non-free (proprietary) NVIDIA drivers are installed.
- Use nvidia-smi to monitor GPU usage.
Steps to Install OpenAI Whisper
1.) Install openai-whisper using pipx
pipx install openai-whisper
2.) You will likely receive a message indicating the following:
NOTE: ‘/home/username/.local/bin’ is not on your PATH environment variable. These apps will not be globally accessible until your PATH is updated. Run `pipx ensurepath` to automatically add it, or manually modify your PATH in your shell’s config file (i.e. ~/.bashrc).
Issue the following command to add ‘/home/username/.local/bin’ to your PATH environment variable.
pipx ensurepath
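If you prefer to update PATH yourself, the change pipx ensurepath makes is roughly the following. This is a session-only sketch assuming bash; pipx ensurepath makes the change persistent by editing your shell’s config file.

```shell
# Manual alternative (session-only sketch; `pipx ensurepath` makes this
# persistent by editing your shell's config file):
BIN_DIR="$HOME/.local/bin"
case ":$PATH:" in
  *":$BIN_DIR:"*)
    echo "$BIN_DIR is already on PATH" ;;
  *)
    export PATH="$PATH:$BIN_DIR"
    echo "added $BIN_DIR to PATH for this session" ;;
esac
```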
3.) Re-launch your terminal session.
4.) Confirm that Whisper installed successfully by running the help command.
whisper --help
Steps to Test Transcription Capabilities
Begin transcribing your WAV file. Ensure that your terminal session is in the same directory as the file.
When using a GPU:
whisper your-audio-file.wav --model turbo --device cuda
When using CPU only:
whisper your-audio-file.wav --model turbo --device cpu
- Replace your-audio-file.wav with the name of your WAV file.
- You can change the model size (e.g., ‘large’) based on your system’s performance and transcription speed needs.
- Review OpenAI’s Whisper model card on GitHub for a comparison of the available models.
- The author finds the ‘turbo’ model performs well on 8 GB VRAM GPUs, transcribing a 55-minute WAV file in approximately 10 minutes.
- --device cuda indicates that your GPU’s CUDA cores should be used. Use --device cpu instead for CPU-only transcription, but be prepared for slower transcriptions.
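The single-file commands above extend naturally to a batch job. The sketch below (assuming whisper is on your PATH) transcribes every WAV file in the current directory to plain text; swap --device cuda for --device cpu on CPU-only machines.

```shell
# Batch sketch: transcribe every WAV file in the current directory.
# --output_format txt writes a plain-text transcript next to each file.
for f in *.wav; do
  [ -e "$f" ] || continue   # skip cleanly if no WAV files match
  echo "Transcribing $f ..."
  whisper "$f" --model turbo --device cuda --output_format txt
done
echo "All transcriptions complete."
```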