Workflow discussion #7344
Replies: 14 comments
-
What language do you transcribe/translate from? I think the workflow depends quite a bit on the language. I don't need much editing for English audio, but for Japanese I have spent 8 hrs for each 1 hr of transcription.
-
I transcribe English audio into English subs, mostly for podcasts. I manually change the colour for the guest speaker. The main problem I have is when the transcription generates large character counts per subtitle. Using Break/split long subtitles throws the timing off significantly, and using the command-line character-limit options for transcribing gives strange results that look very different from the limit I entered. The transcription also has a habit of ending or starting a subtitle one word shy of a sentence, or one word into the next one. The word accuracy is very good; the timing and duration less so. If the guest has a heavy accent, I've spent up to 10 hrs adjusting a 1 hr podcast. I don't know why or how the transcription can vary so much in its subtitle generation. Sometimes it's rather good, requiring minimal adjustments; other times it's just a mess, and re-transcribing or using another model yields similar results on that particular interview.
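For what it's worth, the timing drift after splitting can be tamed by allocating duration in proportion to each part's character count. This is just a sketch of the idea in Python, not anything SE does internally, and the 42-character limit is an arbitrary example:

```python
from datetime import timedelta

def split_cue(start, end, text, max_chars=42):
    """Split one overlong cue into smaller cues, giving each new cue
    a duration proportional to its share of the characters."""
    # Greedily pack words into lines of at most max_chars characters.
    parts, cur = [], ""
    for w in text.split():
        if cur and len(cur) + 1 + len(w) > max_chars:
            parts.append(cur)
            cur = w
        else:
            cur = f"{cur} {w}".strip()
    if cur:
        parts.append(cur)

    # Distribute the original duration proportionally to text length.
    total = (end - start).total_seconds()
    total_chars = sum(len(p) for p in parts)
    cues, t = [], start
    for p in parts:
        dur = timedelta(seconds=total * len(p) / total_chars)
        cues.append((t, t + dur, p))
        t += dur
    return cues
```

The split points still need a manual pass (a proportional split knows nothing about pauses in the audio), but it keeps the overall start/end anchored.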
-
So I don't know if I'm doing something wrong, or if I need to adjust settings to improve my workflow.
-
You might want to check out
-
Is VAD on by default? (--vad_filter VAD_FILTER, -vad VAD_FILTER) And I can't find the option for Demucs; where is it located?
-
Stable and beta give the same results for me with regard to transcribing to subtitles.
-
Short answer:

Long answer: Not sure if we can reach any kind of consensus here, but in my opinion, Whisper-like tools can save you a huge amount of typing work, which is 100% true. But that doesn't mean time can always be saved; yes, sometimes it can take even more time to fix, if you really have to! For the time being, we have to admit that human work cannot be skipped after Whisper's work, for sure. Maybe the results will get better and better as AI improves, but you will still need to check them carefully for many years ahead (if quality matters).

I think everybody loves AND hates Whisper, because it can indeed save huge amounts of time with very good quality on both text and timecodes. BUT that's only for some very typical cases, such as those with clear voices, short sentences, and distinctive breaks/gaps. And hey, those are very rare to come by, at least for me. XD That's why we have to spend time further tweaking the result, and guess what, sometimes it takes even longer than doing it the traditional way from scratch...

As a captioner for 10+ years, I highly recommend that you adopt a collaboration workflow. Normally you would have a transcriptionist, a transcript reviewer, a captioner, and a caption reviewer on the team; it's just that, in your case, you happen to play all four roles yourself. However, I really don't recommend doing the four jobs in parallel. In that case, believe it or not, 1+1<2; I hope you get my point. The reason is simple: you have to constantly move your hand between the mouse and the keyboard, as well as the cursor focus. That said, you could still review the transcript while making timecodes (as long as the transcript is of high quality).

My best practice is to always have the reviewed transcript ready before making captions. A 1-hour video costs me 2 hours or so for that (aided by a paid service plus manual review, which doesn't have to be done in SE). Then I make the timecodes for the ready-to-caption text in SE. Normally that takes me another hour (or less) with some tricks and a faster play rate. So, for me, 3 hours in total (from scratch). But in many cases it can also cost me around 4 hours (or more) on a typical broadcast source, because it's usually quite speech-intensive, unlike movies. However, if you insist on using Whisper in the workflow, and you are confident about the result, you may want to try this way:

BTW, as you know, there are tons of customizable shortcuts out there. I'm sure you can find something useful. If not, try convincing our lovely dev to add it, and good luck. Personally, I've been doing some research on how to improve different workflows with SE. It takes time, but let's see how it goes. Stay tuned. :)
-
Hi LeonCheung, thanks for your input. I was curious how others worked, to see if my workflow could be sped up. We could each just develop our own methods, but sometimes a hive mind creates better results, as there can be significant crossover between users. For me, Whisper works rather well and certainly beats hand typing, as I'm not quite quick enough to keep up with the conversations.

My workflow currently is as follows: I get the raw files from my client. I use Whisper to generate a raw transcript that I use as a reference; transcribing before editing the podcast is sometimes more accurate. I then edit the podcast, with the intro, music, pre-roll baked-in ad, conversation, and outro with music and "previously on", and send it back to the client for approval. Once approved, I bring it back into SE and use Whisper again, because I don't know how to import from the raw transcript without the timing getting messed up, and there is now extra content to cover. I'd love to be able to import sections that are used often, like baked-in ads. Then I tweak timing, separate speakers, and do a general clean-up.

Sometimes Whisper gives large chunks and it takes a while to edit; sometimes it's super easy and only minor tweaks are required. I figure that's just the nature of Whisper currently. I was hoping people might have quicker methods for the clean-up. For guests I set a colour shortcut: I highlight the first subtitle and make it yellow so I have a reference to go back to, then find the last subtitle before the host speaks, highlight them all, and bulk-change them to yellow. This saved me a significant portion of time compared to before.

The biggest challenge I have is sending words back and forward between the previous and next subtitles, just because a subtitle ends in an odd spot for viewing. Am I using the wrong software? No idea; it's the only software I know. I've only been doing this since June, so I'm very much a rookie. I'll try your method on the next video; sounds interesting. Thank you for your input. I'll have to spend more time studying the shortcuts menu to see how I can better use them to save time.
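One possible workaround for the re-transcription step: if the only change before the conversation starts is an intro of known length, every cue in the raw transcript can be shifted by that fixed offset instead of running Whisper again. A minimal sketch, assuming plain SRT input and a single constant offset (inserted mid-roll content would need piecewise offsets, which this doesn't handle):

```python
import re
from datetime import timedelta

# Matches SRT timestamps like 00:01:31,500
STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(srt_text, offset):
    """Shift every timestamp in an SRT document by a fixed timedelta,
    e.g. the length of an intro added in front of the raw recording."""
    def bump(m):
        h, mi, s, ms = map(int, m.groups())
        t = timedelta(hours=h, minutes=mi, seconds=s, milliseconds=ms) + offset
        total_ms = int(t.total_seconds() * 1000)
        h2, rem = divmod(total_ms, 3_600_000)
        mi2, rem = divmod(rem, 60_000)
        s2, ms2 = divmod(rem, 1000)
        return f"{h2:02}:{mi2:02}:{s2:02},{ms2:03}"
    return STAMP.sub(bump, srt_text)
```

For example, `shift_srt(text, timedelta(seconds=90))` pushes everything 90 seconds later to make room for a 90-second intro.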
-
OK, just an update: crawling through the shortcuts, I found some solutions to my problems that others may not be aware of. Is there an equivalent to "Move text after cursor position to next subtitle and go to next" that moves text to the previous subtitle?
-
Yes, to tweak the Whisper result. Sorry, I haven't picked up all the lingo yet. I use split text, and then sometimes the words don't match the waveform. This means moving text to the previous subtitle or the next one. (I guess you say "up" and "down" because of the list view; I'm still thinking left and right, based on the timeline.) Adjusting the timing is a mouse drag per subtitle. The most recent project was really bad; the audio seemed clean, but perhaps the guest's cadence was hard for Whisper?

The duration of the project changes, so the timing doesn't align anymore. I generate the raw transcript from the raw files. After they are edited, extra footage gets added to the video, so the duration becomes longer and the timecodes no longer match. There is an added intro, pre-roll ad, mid-roll ad, graphics, outro, and exit clip. That's the video that gets sent away for approval. The raw transcript still gets used: I use ChatGPT with a plugin to reference the transcript via rentry, to assist with show notes, titles, etc. It's good for helping me brainstorm. This gets done before the edit starts, so the thumbnail editor has text to work with. Sometimes the client wants further editing or a different version of the sponsors used; this again changes the duration, so until approval I don't have a working file to start the subtitle process with. The raw file might be 1 hr long and the final might be 1 hr 10 min long.

The podcast my client works on is a bit technical in nature, so I do check every line to ensure correct references are made. Whisper has surprised me a fair bit, as some references are obscure to me, yet it gets them right almost 75% of the time. I'll have a play with "insert subtitle here" and see, because the wording in the sponsor spots doesn't change. Most of this is probably user error, but the above shortcut list has already helped speed me up on the project I was working on this week. I just need to work out which shortcuts reduce that hand-off between mouse and keyboard that you mentioned. I really do appreciate your help. Thank you.
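Since the sponsor wording never changes, one way to reuse a spot is to keep it as a small saved SRT snippet and re-anchor it wherever it lands in the final cut. A sketch of the idea (the function names are mine, not SE's; it assumes standard SRT timestamps and preserves the cues' relative spacing):

```python
import re
from datetime import timedelta

TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_td(m):
    """Convert a regex match of HH:MM:SS,mmm to a timedelta."""
    h, mi, s, ms = map(int, m.groups())
    return timedelta(hours=h, minutes=mi, seconds=s, milliseconds=ms)

def fmt(t):
    """Format a timedelta back to HH:MM:SS,mmm."""
    ms = int(t.total_seconds() * 1000)
    h, r = divmod(ms, 3_600_000)
    mi, r = divmod(r, 60_000)
    s, ms = divmod(r, 1000)
    return f"{h:02}:{mi:02}:{s:02},{ms:03}"

def retime_snippet(snippet, new_start):
    """Re-anchor a saved SRT snippet (e.g. a recurring sponsor spot)
    so its first cue begins at new_start."""
    delta = new_start - to_td(TIME.search(snippet))
    return TIME.sub(lambda m: fmt(to_td(m) + delta), snippet)
```

The retimed snippet's cue numbers would still need renumbering when pasted into the main file, but the timecodes line up without any manual dragging.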
-
You are welcome. ;)

Hah, right, I was talking about list view when mentioning up/down. I'm not much of a Whisper user, and normally I prefer to review the raw transcript in an external text editor, so I'm not sure I can answer that, sorry. But I believe the accuracy will get better and better in the future.

I see. Yes, I've experienced that often as well, and I don't like such a workflow either. Maybe you'll find the "set start and offset the rest" shortcut useful here. I hope you'll also learn something useful there if you haven't checked it yet.
-
Thanks LeonCheung, I had seen many of those videos, but I'd missed some. I just completed my first project using some of the shortcuts. While I haven't fully built up muscle memory for them yet, I cut my subtitle adjustment time in HALF! I need to tweak them slightly, as my left hand was getting a bit cramped reaching some shortcuts, but that's massive progress! The biggest savers have been grabbing the last word from the previous subtitle, sending the last word to the next, and grabbing the first word from the next. So much faster than copy, paste, auto-break, adjust timing.
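For anyone curious, those word-shuffling shortcuts amount to a simple text operation. A toy sketch of the "send last word to next subtitle" behaviour (my own illustration, not SE's actual implementation, and it ignores the timing adjustment SE also performs):

```python
def send_last_word_to_next(cur, nxt):
    """Pop the final word off the current cue's text and prepend it
    to the next cue's text, returning the two updated strings."""
    words = cur.split()
    if len(words) < 2:
        return cur, nxt  # never leave a cue empty
    moved = words.pop()
    return " ".join(words), f"{moved} {nxt}".strip()
```

"Grab first word from next" is just the same operation run in the other direction.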
-
Other than using a version of Whisper that creates the best time codes, and in addition to the suggestions above, I would suggest this: try to improve the subtitles created by Whisper as much as possible automatically, before starting to edit them manually. This is what I usually do:

[1] Run "Tools/Batch convert" with the "srt" file with subtitles, with these 5 options:

[2] Open the result of [1] in two copies of SE, to automatically split lines that are too long and, after that, to check (and correct, if necessary) each of those automatic splits. Choose in both copies "Tools/Break/Split long lines" (no checkmark at "Split at line breaks"). See the first comment here for how to do this splitting. Close the second copy of SE after all split subtitles have been checked (and manually corrected if necessary).

[3] Select all subtitles in the list, and click on "Auto Br" to auto-balance the lines that were split in [2]. After that, undo the selection of all subtitles in the list.

[4] Choose "Tools/Beautify time codes". Make sure that "Use FFprobe to extract..." is checked, and that "Snap cues to shot changes" is not checked. ("Edit profile" button: make sure that the profile/preset is "SDI", which can be recognized by the gap of 4 in "General".) Click on "Extract time codes". When the time codes have been loaded, click on "OK" in the "Beautify time codes" dialog box.

[5] Save the result of [4] as an "srt" file under another name than the earlier version of this "srt" file.

[6] Run (again) "Tools/Batch convert" with the "srt" file that resulted from [5], this time with these 6 options:

[7] Start the actual manual editing/checking/improving of the subtitles with the result from [6].

You may, of course, have preferences other than those incorporated into the approach above. If so, you can incorporate your preferences into the approach.
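On top of the steps above: Subtitle Edit also has a command-line /convert mode that can apply some of the same automatic fixes unattended. Here's a sketch that just builds the command; the flag names are from SE's command-line help as I remember them, so please verify them against SE's own documentation before relying on this:

```python
import subprocess  # only needed if you actually run the command

def build_se_convert(srt_path, output_dir):
    """Build a Subtitle Edit command-line call roughly mirroring the
    Batch-convert and split/balance steps above. Flag names are my
    recollection of SE's /convert options -- double-check them."""
    return [
        "SubtitleEdit", "/convert", srt_path, "SubRip",
        "/fixcommonerrors",             # roughly step [1]/[6]
        "/splitlonglines",              # roughly step [2]
        "/balancelines",                # roughly step [3]
        f"/outputfolder:{output_dir}",
        "/overwrite",
    ]

# To actually run it (requires Subtitle Edit on PATH):
# subprocess.run(build_se_convert("episode.srt", "out"), check=True)
```

The Beautify-time-codes step ([4]) still has to be done in the GUI as far as I know, but front-loading the rest this way means opening SE with most of the mechanical clean-up already applied.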
-
Hi Guys,
I'm a freelance editor and for one particular client, I do their subtitles for their videos as well.
I've been using the program for a few months now, and it's mostly great. However, it's the longest part of my editing process.
The primary time-consuming part is adjusting the timing of the subtitles, in particular where it splits the subtitles, often a single word before the end of a sentence or pause. I typically cut and paste the word, then adjust the timing to suit before moving on. It often does this when I split a long subtitle as well.
I tried "Break/split long lines", and it mostly worked well to cut overly long subtitles into smaller ones when Whisper decides to produce an essay instead of nicely flowing subtitles. However, it leaves a lot of clean-up to do.
On a 1 hr video, it takes me about 4 hrs on average of adjusting if the speakers' language is clear, and more if Whisper had a hard time understanding their accent. This time includes correcting grammar and spelling, and checking that names and places have been correctly transcribed.
I'm curious how long it takes others here to edit the subtitles and if you have any tips on speeding up the process?
The biggest help for me was using keyboard shortcuts to change the colour of the subtitles per speaker, but the timing and sentence adjustment takes me the longest.