Add chapters to video transcripts #200
Conversation
- Add create_chapters() function to generate YouTube-style timestamps using OpenAI
- Integrate chapters section into markdown output between summary and transcript
- Add comprehensive rate limiting to avoid YouTube API quota issues
- Implement get_video_transcript_with_retry() with exponential backoff
- Add robust error handling for quota exceeded and API failures
- Improve transcript validation and filtering ([Music], [Applause], etc.)
- Fix Japanese language code from 'jp' to 'ja'
- Increase batch size from 10 to 50 for better efficiency
- Add progress indicators and better logging throughout
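The retry-with-exponential-backoff behavior this commit describes could look roughly like the sketch below. This is a hypothetical outline, not the PR's actual code: `fetch_fn` stands in for the real transcript call, the generic `Exception` handler stands in for the specific API errors the script catches, and `base_delay` is an assumed parameter.

```python
import random
import time


def get_video_transcript_with_retry(video_id, fetch_fn, max_retries=5, base_delay=1.0):
    """Sketch: retry a transcript fetch with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fetch_fn(video_id)
        except Exception as exc:  # the real script catches specific API errors here
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Backoff doubles each attempt: base, 2*base, 4*base, ... plus jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

The jitter term spreads retries out so that many parallel runs don't hammer the API in lockstep.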
- Document new chapters/timestamps generation feature
- Add comprehensive usage examples with command-line options
- Document output format and file structure
- Add Features & Reliability section covering rate limiting and error handling
- Clarify OpenAI API key requirements and AI-enhanced features
- Document multi-language transcript support
Update pydantic_core from 2.39.0 to 2.33.2 to match the version required by pydantic 2.11.9, resolving pip installation error.
Major changes:
- Upgrade youtube-transcript-api from 0.6.3 to 1.2.3 (fixes empty response issue)
- Update code to use new 1.x API (YouTubeTranscriptApi().fetch())
- Remove deprecated fallback to old static methods
- Enhance error handling for 429 rate limits with 60-120s delays
- Detect XML parse errors as potential rate limiting
- Increase max retries from 3 to 5 attempts

Root cause: YouTube changed their API and version 0.6.3 was returning empty responses (not rate limiting). The library can now successfully fetch transcripts and generate chapters.
Reduced unnecessary delays now that transcript API is fixed:
- Remove initial 5-15s startup delay
- Reduce pagination delays from 3-8s/5-15s to 1-3s
- Reduce inter-video delays from 10-30s to 2-5s
- Reduce API call separation from 10-20s to 2-4s

Keep essential protections:
- YouTube Data API quota error handling (60s retry)
- 429 rate limit detection and handling (60-120s retry)
- XML parse error detection
- Small delays to avoid API hammering

Result: ~3-5x faster processing while maintaining API safety.
- Detect IP block errors specifically (vs rate limiting)
- Stop retries immediately when IP is blocked (no point retrying)
- Add comprehensive troubleshooting section to README
- Provide clear workaround options for users
- Import TooManyRequests exception for better error handling

IP blocks are different from rate limits and require different solutions like waiting 24-48 hours, switching networks, or using cookie auth.
@avillela, @danielgblanco, and @reese-lee, would you take a look at this PR when you get a chance? Thank you!
reese-lee left a comment
Thank you for doing this. It's interesting to see the differences between some of the old AI-generated transcripts and the new one.
I think it works in general for most videos, but I did notice that with the Humans of OTel interviews, it summarizes everyone's thoughts instead of leaving them individual, whereas the whole point of a video like that IS to showcase the individuals.
Thanks @reese-lee, yeah, I think human-generated summaries/chapters are still the best... but in my experience they often don't get done. I think the AI-generated ones are a good starting point at least! Some videos will probably still be exceptions, and it's hard to include that in a general prompt for all videos.
Hi @nicolevanderhoeven, we reviewed some of the summarized transcripts, and have a couple questions:
- Add clean_ai_preamble() function to remove conversational preamble lines
- Remove phrases like 'Sure! Here are the key moments...' from chapter output
- Preserve only actual timestamp lines in generated transcripts
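A function like the `clean_ai_preamble()` this commit names could be sketched as a simple timestamp filter. This is an illustrative assumption about its shape, not the PR's actual implementation; the regex and behavior are guesses based on the commit description.

```python
import re

# Matches lines that begin with an HH:MM:SS or MM:SS timestamp.
TIMESTAMP_LINE = re.compile(r"^\s*(\d{1,2}:)?\d{1,2}:\d{2}\b")


def clean_ai_preamble(chapter_text):
    """Keep only lines that start with a timestamp, dropping chatty preamble."""
    return "\n".join(
        line for line in chapter_text.splitlines() if TIMESTAMP_LINE.match(line)
    )
```

With input like `"Sure! Here are the key moments:\n00:00:00 Intro\n00:02:15 Demo"`, only the two timestamped lines survive.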
- Insert chapter timestamps inline in cleaned transcripts at matching text positions
- Add video duration constraint to prevent AI from generating timestamps beyond video length
- Improve transcript cleanup to preserve exact wording and speaker names
- Add post-processing validation to verify chapter timestamps match content
- Limit timestamp corrections to ±60 second window around original time
- Use window-based matching with chapter titles for accurate timestamp placement
- Lower AI temperature for more accurate and deterministic results
- Move chapter limit to CRITICAL INSTRUCTIONS section
- Use stronger mandatory language (MUST, NO MORE than 10)
- Add reminder at end of prompt to reinforce limit
- Prevents AI from generating 19-24 chapters per video
- Replace segment-based windowing with simpler 10-second intervals
- Remove complex window_key logic and seen_times tracking
- Create windows directly at 10-second intervals within search range
- Makes timestamp finding more predictable and easier to debug
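The fixed-interval windowing described here could be sketched roughly as follows; the helper name and grid-alignment detail are assumptions for illustration, not the PR's actual code.

```python
def build_search_windows(start, end, interval=10):
    """Sketch: create fixed-size windows at `interval`-second steps in [start, end)."""
    windows = []
    t = start - (start % interval)  # align the first window to the interval grid
    while t < end:
        windows.append((t, min(t + interval, end)))
        t += interval
    return windows
```

Fixed windows make the search deterministic: the same search range always produces the same windows, which is what makes the matching "more predictable and easier to debug" than per-segment windowing.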
- Test timeline building with various intervals and edge cases
- Test filtering of [Music] and [Applause] markers
- Test timestamp parsing and formatting utilities
- Test roundtrip conversions and precision handling
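The timestamp parsing/formatting utilities and roundtrip conversions being tested could look something like this; these helper names and signatures are illustrative assumptions, not the PR's actual functions.

```python
def format_timestamp(seconds):
    """Format whole seconds as an HH:MM:SS string."""
    seconds = int(seconds)
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def parse_timestamp(text):
    """Parse an HH:MM:SS or MM:SS string back into total seconds."""
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:  # pad missing hour (and minute) fields with zeros
        parts.insert(0, 0)
    h, m, s = parts
    return h * 3600 + m * 60 + s
```

A roundtrip test then just checks that `parse_timestamp(format_timestamp(n)) == n` for whole-second values.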
- Add venv/, .venv/, env/, ENV/ directories
- Add .env file (for API keys/secrets)
- Prevents accidentally committing large virtual environments
@reese-lee Thanks for reviewing! I've just done another pass to address your comments.
Sure is! I've just added it here.
Oops, I just put in some logic to add the timestamps of the chapters within the transcript itself, but now I'm rereading this and thinking that you meant the descriptions, not just the timestamps. Before I change this, I just wanted to double-check what you're asking for here. Currently (here's an example of a generated transcript), there is a chapter at the beginning. Would you prefer for that line to look like:

### Guest introduction: Diana
**Reese:** Diana, welcome. And thank you guys.

instead? Do you want the timestamp there at that point too, or just the heading for the chapter description?
@nicolevanderhoeven I think the example you had there would be great. Having the chapters in the transcripts as different sections would be great :) So, instead of the current format, something like this?
```python
videos = []
next_page_token = None
page_count = 0
max_pages = 1  # Limit to 1 page (50 videos max) to avoid pagination issues
```
If this is only 1, and we want to limit this scrape to 50 videos for the whole channel, then do we need the while loop? If getting all videos for a channel is challenging, I'd opt for removing this method (and associated docs) and thinking of how we can add it safely.
```python
if i < len(videos) - 1:  # Don't sleep after the last video
    delay = random.uniform(2, 5)  # Random delay between 2-5 seconds
    print(f"Waiting {delay:.1f} seconds before next video...")
    time.sleep(delay)
```
Why do we need to wait? Is it related to limits?
Yep, this is a proactive delay to avoid hitting the YouTube rate limits. It might not be such a big deal if people are just running the script to fetch a few new videos, but I ran into rate limits a lot when regenerating the transcripts for 44 videos. I recommend keeping the delay even if it does slow down generation.
There is quite a lot of code related to text transformation contributed in this PR. I don't have access to ChatGPT to test this, but I've created a Gemini Gem with the same system prompt we're currently using here, and added the following to the prompt. With that, I got the following result with the latest OTel Night in Berlin. I'm not against having any necessary code here, but considering we already use ChatGPT for the transcript cleanup, I think we can use it to create these chapters and link back to the seconds?
- Add timeline skeleton to chapter generation prompt showing actual timestamps every 30s
- Remove skip logic for 00:00:00 chapter heading in transcript insertion
- Reuse existing time-to-text mapping logic for consistency
- Fixes issue where chapter timestamps could be off by several minutes
- Pass video summary to chapter generation to guide topic selection
- Add explicit guidance to skip small talk not mentioned in summary
- Fix timeline sampling to cover entire video duration dynamically
  * Was limited to 40 samples (20 minutes) regardless of video length
  * Now uses ~55 samples distributed across full video duration
  * Sample interval adjusts based on video length (min 15s)
- Update prompt to emphasize reviewing ENTIRE video before selecting chapters
- Add better logging showing sample count, interval, and video duration
- Fixes issues with:
  * Chapters for unimportant small talk (e.g., weather discussion)
  * Missing chapters in second half of longer videos
  * AI only seeing beginning of video
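The dynamic sampling described in this commit (~55 samples, minimum 15s interval, scaling with video length) could be sketched like this; the function name and exact rounding are assumptions for illustration.

```python
def timeline_sample_times(duration, target_samples=55, min_interval=15):
    """Sketch: pick sample timestamps spread across the full video duration.

    The interval grows with video length so long videos still get roughly
    `target_samples` samples, but never drops below `min_interval` seconds.
    """
    interval = max(min_interval, int(duration) // target_samples)
    return list(range(0, int(duration), interval))
```

For a 10-minute video the interval floors at 15s (40 samples); for a 2-hour video it stretches to ~130s so the samples still cover the whole duration instead of only the first 20 minutes.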
- Remove unused TooManyRequests import and defensive fallback
- Add TestYouTubeAPIErrorHandling class to validate exception API contract
- Tests ensure YouTube API exceptions are importable and catchable
- Will catch breaking changes if youtube-transcript-api is upgraded
…ards

- Add max_pages parameter (default 100) to prevent infinite loops
- Add max_retries (3) for quota exceeded errors
- Track page_count and retry_count for better control flow
- Make loop condition explicit: while page_count < max_pages
- Reset retry_count on successful requests
- Raise RuntimeError when max retries exceeded

This addresses reviewer feedback about the risks of using while True loops.
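The guarded pagination loop this commit describes could be sketched as below. This is a hypothetical outline: `fetch_page` stands in for the real YouTube Data API call and returns `(items, next_token)`, and a quota error is modeled as a plain `RuntimeError`.

```python
def fetch_all_pages(fetch_page, max_pages=100, max_retries=3):
    """Sketch: paginate with explicit guards instead of `while True`."""
    items, token = [], None
    page_count = retry_count = 0
    while page_count < max_pages:  # explicit loop bound prevents infinite loops
        try:
            page_items, token = fetch_page(token)
        except RuntimeError:  # stand-in for a quota-exceeded error
            retry_count += 1
            if retry_count > max_retries:
                raise RuntimeError("max retries exceeded")
            continue  # retry the same page
        retry_count = 0  # reset the counter on each successful request
        items.extend(page_items)
        page_count += 1
        if token is None:  # no more pages
            break
    return items
```

The key design point from the reviewer feedback: every exit path is explicit (page limit, retry limit, or end of pagination), so the loop can never spin forever.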
- Extract video ID fetching logic into _fetch_playlist_video_ids
- Extract video details fetching logic into _fetch_video_details_batch
- Simplify main get_playlist_videos function to coordinate the two helpers
- Each function now has a single, clear responsibility
- Improve code readability and maintainability
- Removed 200ms delay in _fetch_video_details_batch
- Removed 1-3 second delays in _fetch_playlist_video_ids and get_channel_videos
- Removed 2-4 second delay between playlist and channel fetches in main
- Fixed retry bug in _fetch_video_details_batch where continue would skip to the next batch instead of retrying the failed one
- Added proper retry loop with max_retries tracking per batch
- 403 quota handlers now properly catch and retry after 60 second wait
- Add explanatory comment in transcripts.py explaining that the 2-5s delay prevents YouTube rate limiting (429 errors)
- Update README to accurately reflect actual delays used (2-5s, not 10-30s)
- Clarify distinction between proactive delays and reactive retry waits
- Condense multiline IP blocking error messages to 2 lines
- Shorten rate limiting error messages
- Simplify XML parse error messages
- All changes reference README for detailed troubleshooting
- Instruct AI to output only the chapter list without preamble text
- Prevent conversational phrases like 'Here are the chapters' or 'Sure, I'll help'
- Addresses reviewer feedback to handle this in the prompt rather than post-processing
- Extract _build_time_to_text_mapping() for building time-to-text mapping
- Extract _get_window_texts() for getting text from time windows
- Extract _extract_key_words() for keyword extraction with filtering
- Extract _calculate_line_score() for scoring line matches
- Extract _find_best_insertion_line() for finding best insertion points
- Extract _build_transcript_with_chapters() for final transcript assembly
- Main function now acts as orchestrator (36 lines, down from 113)
- Fix syntax error on line 318 (invalid character in return statement)

Benefits: improved testability, readability, and maintainability
Agree with @danielgblanco. Don't think we need this atm.
This PR adds chapters generation to the video transcript Python script originally added here, adding a new function to use OpenAI to create chapters (timestamped sections of the video). When added to the YouTube description of a video, these chapters allow viewers to skip to relevant portions of the video and also give everyone a better idea of what the video entails.
In addition, I made the following improvements to video transcription:
- `env.example` to show what the `.env` file should look like.