Transcribe audio files and YouTube videos using OpenAI's Whisper API with automatic subtitle generation.
This application transcribes any audio file or YouTube video into text with timestamps. Upload an MP3, WAV, or other audio file (or provide a YouTube URL) and receive a timestamped JSON transcript, a plain-text transcript, and an SRT subtitle file. Files over 25MB are handled automatically by splitting them into chunks for processing.
Before you begin, you'll need:
- Python 3.8 or higher - Check with python --version
- FFmpeg - Required for audio processing
- OpenAI API Key - Required for transcription
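The three prerequisites above can also be checked from Python. This is a minimal sketch, not part of the application: `check_prerequisites` is a hypothetical helper, and it assumes the API key is supplied through the `OPENAI_API_KEY` environment variable.

```python
import os
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a dict mapping each prerequisite name to True or False.

    Hypothetical helper for illustration; the setup scripts may check
    these things differently.
    """
    return {
        # Tuple comparison: (3, 11, ...) >= (3, 8) is True.
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Assumes the key is provided via the OPENAI_API_KEY variable.
        "openai_api_key": bool(os.environ.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

The check only confirms that a key is set, not that OpenAI accepts it.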
Use the provided setup scripts for automatic installation:
# Windows:
setup.bat
# Mac/Linux:
chmod +x setup.sh
./setup.sh
These scripts will:
- Check for Python and FFmpeg
- Create a virtual environment
- Install all dependencies
- Help configure your OpenAI API key
- Create necessary directories
- Generate a run script for easy launching
After setup, run the application with:
- Windows: run.bat
- Mac/Linux: ./run.sh
If you prefer to set up manually or the scripts don't work for your system:
git clone https://github.com/yourusername/youtube-whisper.git
cd youtube-whisper
Or download and extract the ZIP file.
First, create a virtual environment to keep dependencies isolated:
# Create virtual environment
python -m venv venv
# Activate it
# On Windows:
venv\Scripts\activate
# On Mac/Linux:
source venv/bin/activate
Then install the required packages:
pip install -r requirements.txt
Or install manually:
pip install flask openai yt-dlp pydub
FFmpeg is required for audio processing.
Check if you have it:
ffmpeg -version
If not installed:
Windows:
- Download from ffmpeg.org ↗
- Extract the ZIP to C:\ffmpeg
- Add C:\ffmpeg\bin to your system PATH:
  - Right-click "This PC" → Properties → Advanced System Settings
  - Click "Environment Variables"
  - Under "System Variables", find "Path", click Edit
  - Add C:\ffmpeg\bin
  - Click OK and restart your terminal
- Detailed guide: FFmpeg Windows Installation ↗
Mac:
# Using Homebrew (install from brew.sh first)
brew install ffmpeg
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install ffmpeg
For other systems, see FFmpeg Download Page ↗
You need an API key from OpenAI to use the Whisper transcription service.
Get your API key:
- Go to OpenAI API Keys ↗
- Sign in or create an account (requires payment method)
- Click "Create new secret key"
- Copy the key that starts with sk-
- Save it somewhere safe - you won't see it again
- See OpenAI API Quickstart ↗ for detailed setup
Set the API key in your environment:
Windows (Command Prompt):
set OPENAI_API_KEY=sk-your-key-here
Windows (PowerShell):
$env:OPENAI_API_KEY="sk-your-key-here"
Mac/Linux:
export OPENAI_API_KEY="sk-your-key-here"
To make it permanent:
- Windows: Add it to your system environment variables
- Mac/Linux: Add the export line to ~/.bashrc or ~/.zshrc
Verify it's set:
# Mac/Linux:
echo $OPENAI_API_KEY
# Windows:
echo %OPENAI_API_KEY%
The application needs directories to store files:
mkdir -p data/audio data/transcripts data/subtitles
On Windows:
mkdir data\audio
mkdir data\transcripts
mkdir data\subtitles
- Make sure your virtual environment is activated:
  # Windows:
  venv\Scripts\activate
  # Mac/Linux:
  source venv/bin/activate
- Start the application:
  python app.py
- Open your browser to: http://localhost:5000
- Use the interface:
  - Select "Audio File" or "YouTube URL" from the dropdown
  - For audio files: Upload any audio file up to 25MB
  - For YouTube: Paste the video URL
  - Select the language
  - Click "Process"
  - Download your results (JSON, TXT, or SRT)
For age-restricted, private, or member-only YouTube videos, you need to provide your browser cookies.
- Install a browser extension for cookie export:
  Recommended (exports directly in Netscape format):
  - Chrome/Edge: Get cookies.txt LOCALLY ↗ - Open source, safe alternative
  - Firefox: cookies.txt ↗
  Alternative options:
  - Cookie-Editor ↗ - Works on Chrome, Firefox, Edge (select Netscape export format)
  - EditThisCookie ↗ - Exports in JSON format (requires conversion)
- Export your YouTube cookies:
  - Go to YouTube.com and sign in to your account
  - Navigate to the private video you want to download
  - Click the cookie extension icon in your browser
  - For "Get cookies.txt LOCALLY": Click the download button
  - For "Cookie-Editor": Click Export → Select "Netscape" format
  - For "EditThisCookie": Click Export (you'll get JSON format - see note below)
  - Copy all the text that appears
- Use the cookies in the application:
  - Select "YouTube URL" as input type
  - Paste your YouTube URL
  - Paste the entire cookie text into the "Cookies" field
  - Process as normal
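For orientation, here is one way pasted cookie text could be handed to yt-dlp. This is a sketch under assumptions, not the application's actual code: `build_ytdlp_options` is a hypothetical helper, and the output template is illustrative. The option names `format`, `outtmpl`, and `cookiefile` are yt-dlp's own.

```python
import os
import tempfile


def build_ytdlp_options(output_dir, cookie_text=None):
    """Build an options dict for yt_dlp.YoutubeDL (hypothetical helper)."""
    options = {
        # Prefer an audio-only stream, fall back to the best combined one.
        "format": "bestaudio/best",
        # Illustrative file naming: <video id>.<extension> in output_dir.
        "outtmpl": os.path.join(output_dir, "%(id)s.%(ext)s"),
    }
    if cookie_text and cookie_text.strip():
        # yt-dlp reads cookies from a file path, so persist the pasted text.
        # mkstemp creates the file readable only by the current user.
        fd, cookie_path = tempfile.mkstemp(suffix=".txt", prefix="cookies_")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(cookie_text)
        options["cookiefile"] = cookie_path
    return options
```

With yt-dlp installed, the result would be used as `yt_dlp.YoutubeDL(options).download([url])`. Because the cookie file holds a login session, a caller should delete it once the download finishes.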
Note about cookie formats:
- yt-dlp requires Netscape format (starts with # Netscape HTTP Cookie File)
- If your extension exports JSON format, you'll need to convert it or use a different extension
- "Get cookies.txt LOCALLY" is recommended as it exports directly in the correct format
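The format rules above can be checked before a download is attempted. The sketch below classifies pasted cookie text by looking for the Netscape header line or for parseable JSON. `detect_cookie_format` is a hypothetical name, and the check is deliberately shallow: it does not validate the individual cookie rows.

```python
import json


def detect_cookie_format(cookie_text):
    """Classify pasted cookie text as 'netscape', 'json', or 'unknown'."""
    stripped = cookie_text.lstrip()
    first_line = stripped.splitlines()[0] if stripped else ""
    # Header line named in the notes above.
    if first_line.startswith("# Netscape HTTP Cookie File"):
        return "netscape"
    try:
        parsed = json.loads(stripped)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "unknown"
    # JSON cookie exports are a list (or object) of cookie records.
    return "json" if isinstance(parsed, (list, dict)) else "unknown"
```

A result of `"json"` means the text needs conversion or a different extension; `"unknown"` usually means the paste was incomplete.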
Important notes about cookies:
- Cookies contain your login session - keep them private
- They expire when you log out of YouTube
- If videos stop working, export fresh cookies
- Never share your cookies file with others
- Cookie format requirements: See yt-dlp cookie documentation ↗
  - Must be in Netscape format (first line: # Netscape HTTP Cookie File)
  - Some extensions export JSON format which needs conversion
flowchart TD
A[Input Audio/URL] --> B{Source Type}
B -->|Audio File| C[Upload File]
B -->|YouTube URL| D[Download Audio]
C --> E{File Size Check}
D --> E
E -->|Under 25MB| F[Send to Whisper API]
E -->|Over 25MB| G[Split into Chunks]
G --> H[Transcribe Each Chunk]
H --> I[Merge Transcripts]
F --> J[Generate Outputs]
I --> J
J --> K[JSON + TXT + SRT Files]
The process:
- Audio is extracted from YouTube or uploaded directly
- Large files are automatically split into 10-minute chunks
- Each chunk is transcribed using OpenAI's Whisper API
- Timestamps are preserved and adjusted when merging
- Three output formats are generated with full timestamp data
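The size check, chunking, and timestamp adjustment described above can be sketched in plain Python. This is an illustration, not the code in app.py: the function names are hypothetical, the 25MB limit is assumed to mean 25 × 1024 × 1024 bytes, and each segment is assumed to be a dict with `start`, `end`, and `text` keys, with times relative to the start of its chunk.

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed interpretation of "25MB"
CHUNK_SECONDS = 10 * 60              # 10-minute chunks


def needs_chunking(size_bytes):
    """True when a file is over the size limit and must be split."""
    return size_bytes > MAX_UPLOAD_BYTES


def chunk_ranges(duration_seconds, chunk_seconds=CHUNK_SECONDS):
    """Return (start, end) pairs in seconds covering the full duration."""
    ranges = []
    start = 0.0
    total = float(duration_seconds)
    while start < total:
        end = min(start + chunk_seconds, total)
        ranges.append((start, end))
        start = end
    return ranges


def merge_segments(chunks):
    """Merge per-chunk segments into one timeline.

    `chunks` is a list of (offset_seconds, segments) pairs, where the
    offset is the chunk's start time in the original audio.
    """
    merged = []
    for offset, segments in chunks:
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged
```

For a 25-minute recording, `chunk_ranges(1500)` yields two full 10-minute chunks and one 5-minute remainder; a segment at 1.0s in the second chunk lands at 601.0s after merging.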
OpenAI charges for Whisper API usage:
- Current pricing: See OpenAI API Pricing ↗ (Audio models section)
- At time of writing: $0.006 per minute of audio
- Example: A 1-hour video costs ~$0.36, a 10-minute video costs ~$0.06
- You can set spending limits in your OpenAI account dashboard ↗
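To estimate the cost of a job before running it, the arithmetic above is duration in minutes times the per-minute rate. The sketch below hard-codes the $0.006 rate quoted above as a default; treat it as an assumption and check the pricing page, since the rate may have changed.

```python
PRICE_PER_MINUTE_USD = 0.006  # rate quoted above; may be out of date


def estimate_cost_usd(duration_seconds, price_per_minute=PRICE_PER_MINUTE_USD):
    """Estimate the transcription cost in US dollars for a given duration."""
    return round(duration_seconds / 60 * price_per_minute, 4)


if __name__ == "__main__":
    for label, seconds in [("10-minute video", 600), ("1-hour video", 3600)]:
        print(f"{label}: ~${estimate_cost_usd(seconds):.2f}")
```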
The application generates three file types:
- JSON - Complete transcript with timestamp data for each segment
- TXT - Plain text transcript without timestamps
- SRT - Subtitle file compatible with video editors and players (SRT format spec ↗)
Files are saved in:
data/
├── transcripts/ # JSON and TXT files
└── subtitles/ # SRT files
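To show how the timestamped segments relate to the SRT output, here is a small converter. It is a sketch, not the application's own writer: the function names are hypothetical, and segments are assumed to be dicts with `start` and `end` in seconds plus a `text` field.

```python
def format_srt_timestamp(seconds):
    """Convert seconds to the SRT HH:MM:SS,mmm timestamp form."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Each cue is a sequence number, a `start --> end` line with a comma before the milliseconds, and the text.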
- Make sure you've set the API key in your environment
- Check it's set: echo $OPENAI_API_KEY (Mac/Linux) or echo %OPENAI_API_KEY% (Windows)
- Make sure you activated the virtual environment
- Check your internet connection
- For private videos, make sure your cookies are fresh
- Try with a public YouTube video to test
- Make sure you activated the virtual environment
- Install dependencies: pip install -r requirements.txt
- Install FFmpeg (see Step 3 above)
- Restart your terminal after installation
- Check it's in PATH: ffmpeg -version
- Check Python version: python --version (needs 3.8+)
- Make sure port 5000 is not in use
- Try a different port: python app.py --port 5001
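If you are unsure whether port 5000 is taken, a quick check from Python is to try binding to it. This is a standalone sketch using only the standard library; `port_is_free` and `first_free_port` are hypothetical helpers, not part of the application.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
    return True


def first_free_port(candidates=(5000, 5001, 5002)):
    """Return the first candidate port that can be bound, or None."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    print("First free port:", first_free_port())
```

The result is only a snapshot: another process can claim the port between the check and the application starting.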
- Files over 25MB are split into chunks
- Each 10-minute chunk takes 30-60 seconds typically
- Check the processing log in the browser for progress
youtube-whisper/
├── app.py # Main application
├── requirements.txt # Python dependencies
├── README.md # This file
├── .gitignore # Git ignore rules
└── data/ # Generated files
    ├── audio/ # Temporary audio files
    ├── transcripts/ # JSON and TXT outputs
    └── subtitles/ # SRT subtitle files
- Your audio files are processed locally and sent only to OpenAI's API
- Temporary audio files are deleted after processing (unless KEEP_AUDIO_FILES is set)
- YouTube cookies contain your login session - never share them
- Your OpenAI API key is never stored by the application
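The cleanup behavior mentioned above could look roughly like the sketch below. How the application actually reads KEEP_AUDIO_FILES is not documented here, so this assumes it is an environment variable and that any non-empty value means "keep". `cleanup_audio` is a hypothetical helper.

```python
import os


def cleanup_audio(path, environ=os.environ):
    """Delete a temporary audio file unless KEEP_AUDIO_FILES is set.

    Returns True if the file was deleted, False if it was kept or missing.
    Assumption: KEEP_AUDIO_FILES is an environment variable and any
    non-empty value disables cleanup.
    """
    if environ.get("KEEP_AUDIO_FILES"):
        return False
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True
```

Passing `environ` explicitly keeps the function easy to test without changing the real environment.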