AndyMelton.net
How to Set Up OpenAI Whisper for Locally Hosted Audio Transcription on Linux
Purpose

This guide will walk you through installing and using OpenAI’s Whisper to transcribe WAV files locally on Linux.

Prerequisites
Required

  • A Debian-based Linux distribution should already be installed.
    • While other Linux distributions work, this guide focuses on Debian-based distributions.
    • You can also use Windows Subsystem for Linux (WSL) on a Windows computer.
  • Prepare a WAV file (1-5 minutes in length) for transcription.
  • The following dependencies should be installed:
    • python3
      • Required for running Python packages and scripts.
    • pipx
      • A tool for installing Python applications into isolated environments. It will be used to install Whisper, minimizing interference with other Python packages.
    • ffmpeg
      • Required by Whisper for processing audio files.
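
On a Debian-based distribution, all three dependencies can be installed from the standard repositories. A minimal sketch, assuming sudo privileges (the pipx package is available in Debian 12 / Ubuntu 23.04 and later):

```shell
# Install the dependencies on a Debian-based system
# (assumes sudo privileges; package names may differ on other distributions)
sudo apt update
sudo apt install -y python3 pipx ffmpeg
```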

Recommended

  • Although selecting the best GPU for machine-learning workloads is beyond the scope of this tutorial, the author developed it using the following GPUs:
    • NVIDIA Quadro RTX 4000 with 8 GB of VRAM
    • NVIDIA RTX A2000 with 8 GB of VRAM
  • If you are using a GPU:
    • You must ensure that the latest non-free drivers are installed.
    • Use nvidia-smi to monitor GPU usage.
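
For example, GPU utilization and VRAM usage can be watched from a second terminal while a transcription runs:

```shell
# Refresh NVIDIA GPU statistics every second
# (requires the proprietary NVIDIA driver to be installed)
watch -n 1 nvidia-smi
```
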
Steps to Install OpenAI Whisper

1.) Install openai-whisper using pipx

pipx install openai-whisper

2.) You will likely receive a message indicating the following:

NOTE: ‘/home/username/.local/bin’ is not on your PATH environment variable. These apps will not be globally accessible until your PATH is updated. Run `pipx ensurepath` to automatically add it, or manually modify your PATH in your shell’s config file (i.e. ~/.bashrc).

Issue the following command to add ‘/home/username/.local/bin’ to your PATH environment variable.

pipx ensurepath
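
If you prefer to modify your PATH manually instead, appending a line like the following to ~/.bashrc has the same effect (the path shown assumes pipx's default install location):

```shell
# Manual alternative to `pipx ensurepath`: put pipx's bin directory
# on PATH for future shells, and for the current session as well
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
export PATH="$HOME/.local/bin:$PATH"
```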

3.) Re-launch your terminal session so the updated PATH takes effect.

4.) Confirm that Whisper installed successfully by running its help command.

whisper --help
Steps to Test Transcription Capabilities

Make sure your terminal session is in the same directory as your WAV file, then begin transcribing it.

When using a GPU:

whisper your-audio-file.wav --model turbo --device cuda

When using CPU only:

whisper your-audio-file.wav --model turbo --device cpu

  • Replace your-audio-file.wav with the name of your WAV file.
  • You can change the model size (e.g., ‘large’) based on your system’s performance and transcription speed needs.
    • Review OpenAI’s Whisper model card on GitHub for a comparison of each option available.
    • The author finds the ‘turbo’ model performs well on 8 GB VRAM GPUs, transcribing a 55-minute WAV file in approximately 10 minutes.
  • --device cuda indicates that the CUDA cores of your GPU should be used. You can remove it for CPU-only transcription, but be prepared for slower transcription times.
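
Once a single file transcribes successfully, the same command can be wrapped in a shell loop to process several recordings in sequence. A sketch, using the same flags as above (drop --device cuda on CPU-only machines):

```shell
# Transcribe every WAV file in the current directory,
# writing plain-text output alongside each recording
for f in *.wav; do
    whisper "$f" --model turbo --device cuda --output_format txt
done
```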