Whisper AI Tutorial

I tried out voice-to-text AI and spent an afternoon compiling the information below.

The two tools covered are Whisper Desktop and the official open-source Whisper.

Computer specs:

Pros and Cons

| Tool | Pros | Cons |
| --- | --- | --- |
| Whisper Desktop | Convenient, easy to download and set up | Not updated for a long time, poor Spanish transcription accuracy |
| Official Whisper | High accuracy | Hard to download, no UI |

To use Whisper Desktop, download the latest zip from GitHub and extract it.

Then download a model from Hugging Face (ggml-medium.bin was the most stable in my testing; the q5_0 and q8_0 quantized variants did not work).

Open the application, choose Model Path, select the downloaded .bin model file, and click OK.

Use Transcribe File to pick an audio file (mp3 or m4a).

In Output Format, choose the output file type and whether to include timestamps.

After setting everything, click Transcribe.
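To give a feel for what timestamped output looks like, here is a minimal Python sketch that formats a few transcript segments as SRT-style subtitles. It assumes the chosen output type is a subtitle format such as SRT; the function names and the sample segments are made up for illustration and are not taken from Whisper Desktop.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Hypothetical segments, only to show the layout.
sample_segments = [(0.0, 2.5, " Hello "), (2.5, 4.0, "world")]
print(segments_to_srt(sample_segments))
```

Without timestamps, the output would simply be the text lines with the index and time-range lines omitted.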

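To use the official Whisper, first set up the basic environment (see the above site for a full illustrated guide):
Python 3.12.7; git version 2.48.1.windows.1; PyTorch 2.6.0+cu118; CUDA 11.8
Download and configure ffmpeg
Install Whisper by opening cmd and entering the two commands below. The first installs the latest version from GitHub; the second force-reinstalls it, which also updates an existing installation.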
```
pip install git+https://github.com/openai/whisper.git
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
```
Once both commands finish, the installation is complete. If any problems arise, paste the warning or error message into ChatGPT for further troubleshooting.
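Before transcribing, it can help to confirm that each piece of the environment is visible to Python. The following is a rough sketch, not an official check: the function name `check_environment` is my own, and it only reports whether ffmpeg and git are on the PATH, whether the `torch` and `whisper` packages can be found, and whether PyTorch sees a CUDA device.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict:
    """Collect a few facts about the local setup without raising on missing parts."""
    report = {
        "python": sys.version.split()[0],
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "git_on_path": shutil.which("git") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "whisper_installed": importlib.util.find_spec("whisper") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["torch_version"] = torch.__version__
            report["cuda_available"] = torch.cuda.is_available()
        except Exception as exc:  # a broken install should not crash the check
            report["torch_error"] = str(exc)
    return report


for name, value in check_environment().items():
    print(f"{name}: {value}")
```

If `cuda_available` prints `False`, the `--device cuda` option is unlikely to work. A common cause is a CPU-only PyTorch build.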

In cmd, cd to the folder that contains your audio (e.g., cd desktop\example):
```
cd desktop
```
Run:
```
whisper filename.mp4 --device cuda
```
The above command uses CUDA and auto-detects the transcription language; the process looks like this:

The official version works well and lets you use the latest models.
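The command can also be assembled from a script, which is handy when you want to pin the model or language instead of relying on auto-detection. The sketch below only builds and prints the argument list. The helper name `build_whisper_command` is hypothetical, and the values `medium`, `es`, and `srt` are illustrative choices. The `--model`, `--language`, and `--output_format` options belong to the openai-whisper command-line tool.

```python
def build_whisper_command(audio_path, device="cuda", model=None,
                          language=None, output_format=None):
    """Return the whisper CLI call as an argument list (nothing is executed here)."""
    command = ["whisper", audio_path, "--device", device]
    if model:
        command += ["--model", model]
    if language:
        # Skips language auto-detection when the spoken language is known.
        command += ["--language", language]
    if output_format:
        command += ["--output_format", output_format]
    return command


command = build_whisper_command(
    "filename.mp4", model="medium", language="es", output_format="srt"
)
print(" ".join(command))
```

To actually run it, pass the list to `subprocess.run(command, check=True)` from the folder that contains the audio file.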

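"YouTube AI Subtitle Tutorial | How to Use the Free Automatic Subtitle (Transcript) Generation Software WhisperDesktop | OpenAI Whisper Tutorial", posted 2025-01-23 ( https://notesstartup.com/youtube-ai-subtitle-tutorial/ )

"OpenAI's Free Open-Source Speech Recognition System: Whisper Installation Overview and Principles", posted by M.H., 2023-04-25 ( https://ithelp.ithome.com.tw/articles/10311957 )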

Author

David Chang