Workflow discussion #7344
Replies: 14 comments
-
What language do you transcribe/translate from? I think the workflow depends quite a bit on the language. I don't need much editing for English audio, but for Japanese I have spent 8 hrs for each 1 hr of transcription.
-
I transcribe English audio into English subs, mostly for podcasts. I manually change the colour for the guest speaker. The main problem I have is when the transcription generates large character counts per subtitle. Using Break/split long subtitles throws the timing off significantly, and using the command-line character-limit options for transcribing gives strange results that look very different from the limit I entered. The transcription also has a habit of ending or starting a subtitle one word shy of a sentence, or one word into the next one. The word accuracy is very good; the timing and duration less so. If the guest has a heavy accent, I've spent up to 10 hrs adjusting a 1 hr podcast. I don't know why or how the transcription can vary so much in its subtitle generation. Sometimes it's rather good, requiring minimal adjustments; other times it's just a mess, and re-transcribing or using another model yields similar results on that particular interview.
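For what it's worth, the timing drift after splitting can be tamed by allocating duration in proportion to each part's character count. This is just a sketch of the idea in Python, not anything SE does internally, and the 42-character limit is an arbitrary example:

```python
from datetime import timedelta

def split_cue(start, end, text, max_chars=42):
    """Split one overlong cue into smaller cues, giving each new cue
    a duration proportional to its share of the characters."""
    # Greedily pack words into lines of at most max_chars characters.
    parts, cur = [], ""
    for w in text.split():
        if cur and len(cur) + 1 + len(w) > max_chars:
            parts.append(cur)
            cur = w
        else:
            cur = f"{cur} {w}".strip()
    if cur:
        parts.append(cur)

    # Distribute the original duration proportionally to text length.
    total = (end - start).total_seconds()
    total_chars = sum(len(p) for p in parts)
    cues, t = [], start
    for p in parts:
        dur = timedelta(seconds=total * len(p) / total_chars)
        cues.append((t, t + dur, p))
        t += dur
    return cues
```

The split points still need a manual pass (a proportional split knows nothing about pauses in the audio), but it keeps the overall start/end anchored.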
-
So I don't know if I'm doing something wrong, or if I need to adjust settings to improve my workflow.
-
You might want to check out
-
Is VAD on by default? (--vad_filter VAD_FILTER, -vad VAD_FILTER) And I can't find the option for Demucs; where is it located?
-
Stable and beta give the same results for me with regard to transcribing to subtitles.
-
Short answer:

Long answer: Not sure if we can reach any kind of consensus here, but in my opinion, Whisper-like tools can save you a huge amount of typing work, which is 100% true. But that doesn't mean time can always be saved; yes, sometimes it can take even more time to fix, if you really have to! For the time being, we have to admit that human work cannot be skipped after Whisper's work, for sure. Maybe the results will get better and better as AI improves, but you will still need to check them carefully for many years ahead (if quality matters).

I think everybody loves AND hates Whisper, because it can indeed save huge amounts of time with very good quality on both text and timecodes. BUT that's only for some very typical cases, such as those with clear voices, short sentences, and distinctive breaks/gaps. And hey, those are very rare to come by, at least for me. XD That's why we have to spend time further tweaking the result, and guess what, sometimes it takes even longer than doing it the traditional way from scratch...

As a captioner for 10+ years, I highly recommend that you adopt a collaboration workflow. Normally you would have a transcriptionist, a transcript reviewer, a captioner, and a caption reviewer on the team; it's just that, in your case, you happen to play all four roles yourself. However, I really don't recommend doing the four jobs in parallel. In that case, believe it or not, 1+1<2; I hope you get my point. The reason is simple: you have to constantly move your hand between the mouse and the keyboard, as well as the cursor focus. That said, you could still review the transcript while making timecodes (as long as the transcript is of high quality).

My best practice is to always have the reviewed transcript ready before making captions. A 1-hour video costs me 2 hours or so for that (aided by a paid service plus manual review, which doesn't have to be done in SE). Then I make the timecodes for the ready-to-caption text in SE. Normally that takes me another hour (or less) with some tricks and a faster play rate. So, for me, 3 hours in total (from scratch). But in many cases it can also cost me around 4 hours (or more) on a typical broadcast source, because it's usually quite speech-intensive, unlike movies. However, if you insist on using Whisper in the workflow, and you are confident about the result, you may want to try this way:

BTW, as you know, there are tons of customizable shortcuts out there. I'm sure you can find something useful. If not, try convincing our lovely dev to add it, and good luck. Personally, I've been doing some research on how to improve different workflows with SE. It takes time, but let's see how it goes. Stay tuned. :)
-
Hi LeonCheung, thanks for your input. I was curious how others worked, to see if my workflow could be sped up. We could each just develop our own methods, but sometimes a hive mind creates better results, as there can be significant crossover between users. For me, Whisper works rather well and certainly beats hand typing, as I'm not quite quick enough to keep up with the conversations.

My workflow currently is as follows: I get the raw files from my client. I use Whisper to generate a raw transcript that I use as a reference; transcribing before editing the podcast is sometimes more accurate. I then edit the podcast, with the intro, music, pre-roll baked-in ad, conversation, and outro with music and "previously on", and send it back to the client for approval. Once approved, I bring it back into SE and use Whisper again, because I don't know how to import from the raw transcript without the timing getting messed up, and there is now extra content to cover. I'd love to be able to import sections that are used often, like baked-in ads. Then I tweak timing, separate speakers, and do a general clean-up.

Sometimes Whisper gives large chunks and it takes a while to edit; sometimes it's super easy and only minor tweaks are required. I figure that's just the nature of Whisper currently. I was hoping people might have quicker methods for the clean-up. For guests I set a colour shortcut: I highlight the first subtitle and make it yellow so I have a reference to go back to, then find the last subtitle before the host speaks, highlight them all, and bulk-change them to yellow. This saved me a significant portion of time compared to before.

The biggest challenge I have is sending words back and forward between the previous and next subtitles, just because a subtitle ends in an odd spot for viewing. Am I using the wrong software? No idea; it's the only software I know. I've only been doing this since June, so I'm very much a rookie. I'll try your method on the next video; sounds interesting. Thank you for your input. I'll have to spend more time studying the shortcuts menu to see how I can better use them to save time.
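One possible workaround for the re-transcription step: if the only change before the conversation starts is an intro of known length, every cue in the raw transcript can be shifted by that fixed offset instead of running Whisper again. A minimal sketch, assuming plain SRT input and a single constant offset (inserted mid-roll content would need piecewise offsets, which this doesn't handle):

```python
import re
from datetime import timedelta

# Matches SRT timestamps like 00:01:31,500
STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(srt_text, offset):
    """Shift every timestamp in an SRT document by a fixed timedelta,
    e.g. the length of an intro added in front of the raw recording."""
    def bump(m):
        h, mi, s, ms = map(int, m.groups())
        t = timedelta(hours=h, minutes=mi, seconds=s, milliseconds=ms) + offset
        total_ms = int(t.total_seconds() * 1000)
        h2, rem = divmod(total_ms, 3_600_000)
        mi2, rem = divmod(rem, 60_000)
        s2, ms2 = divmod(rem, 1000)
        return f"{h2:02}:{mi2:02}:{s2:02},{ms2:03}"
    return STAMP.sub(bump, srt_text)
```

For example, `shift_srt(text, timedelta(seconds=90))` pushes everything 90 seconds later to make room for a 90-second intro.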
-
OK, just an update: crawling through the shortcuts, I found some solutions to my problems that others may not be aware of. Is there an equivalent to "Move text after cursor position to next subtitle and go to next" that moves text to the previous subtitle?
-
Yes, to tweak the Whisper result. Sorry, I haven't picked up all the lingo yet. I use split text, and then sometimes the words don't match the waveform. This means moving text to the previous subtitle or the next one. (I guess you say "up" and "down" because of the list view; I'm still thinking left and right, based on the timeline.) Adjusting the timing is a mouse drag per subtitle. The most recent project was really bad; the audio seemed clean, but perhaps the guest's cadence was hard for Whisper?

The duration of the project changes, so the timing doesn't align anymore. I generate the raw transcript from the raw files. After they are edited, extra footage gets added to the video, so the duration becomes longer and the timecodes no longer match. There is an added intro, pre-roll ad, mid-roll ad, graphics, outro, and exit clip. That's the video that gets sent away for approval. The raw transcript still gets used: I use ChatGPT with a plugin to reference the transcript via rentry, to assist with show notes, titles, etc. It's good for helping me brainstorm. This gets done before the edit starts, so the thumbnail editor has text to work with. Sometimes the client wants further editing or a different version of the sponsors used; this again changes the duration, so until approval I don't have a working file to start the subtitle process with. The raw file might be 1 hr long and the final might be 1 hr 10 min long.

The podcast my client works on is a bit technical in nature, so I do check every line to ensure correct references are made. Whisper has surprised me a fair bit, as some references are obscure to me, yet it gets them right almost 75% of the time. I'll have a play with "insert subtitle here" and see, because the wording in the sponsor spots doesn't change. Most of this is probably user error, but the above shortcut list has already helped speed me up on the project I was working on this week. I just need to work out which shortcuts reduce that hand-off between mouse and keyboard that you mentioned. I really do appreciate your help. Thank you.
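Since the sponsor wording never changes, one way to reuse a spot is to keep it as a small saved SRT snippet and re-anchor it wherever it lands in the final cut. A sketch of the idea (the function names are mine, not SE's; it assumes standard SRT timestamps and preserves the cues' relative spacing):

```python
import re
from datetime import timedelta

TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_td(m):
    """Convert a regex match of HH:MM:SS,mmm to a timedelta."""
    h, mi, s, ms = map(int, m.groups())
    return timedelta(hours=h, minutes=mi, seconds=s, milliseconds=ms)

def fmt(t):
    """Format a timedelta back to HH:MM:SS,mmm."""
    ms = int(t.total_seconds() * 1000)
    h, r = divmod(ms, 3_600_000)
    mi, r = divmod(r, 60_000)
    s, ms = divmod(r, 1000)
    return f"{h:02}:{mi:02}:{s:02},{ms:03}"

def retime_snippet(snippet, new_start):
    """Re-anchor a saved SRT snippet (e.g. a recurring sponsor spot)
    so its first cue begins at new_start."""
    delta = new_start - to_td(TIME.search(snippet))
    return TIME.sub(lambda m: fmt(to_td(m) + delta), snippet)
```

The retimed snippet's cue numbers would still need renumbering when pasted into the main file, but the timecodes line up without any manual dragging.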
-
You are welcome. ;)

Hah, right, I was talking about list view when mentioning up/down. I'm not much of a Whisper user, and normally I prefer to review the raw transcript in an external text editor, so I'm not sure I can answer that, sorry. But I believe the accuracy will get better and better in the future.

I see. Yes, I've experienced that often as well, and I don't like such a workflow either. Maybe you'll find the "set start and offset the rest" shortcut useful here. I hope you'll also learn something useful there if you haven't checked it yet.
-
Thanks LeonCheung, I had seen many of those videos, but I'd missed some. I just completed my first project using some of the shortcuts. While I haven't fully built up muscle memory for them yet, I cut my subtitle adjustment time in HALF! I need to tweak them slightly, as my left hand was getting a bit cramped reaching some shortcuts, but that's massive progress! The biggest savers have been grabbing the last word from the previous subtitle, sending the last word to the next, and grabbing the first word from the next. So much faster than copy, paste, auto-break, adjust timing.
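For anyone curious, those word-shuffling shortcuts amount to a simple text operation. A toy sketch of the "send last word to next subtitle" behaviour (my own illustration, not SE's actual implementation, and it ignores the timing adjustment SE also performs):

```python
def send_last_word_to_next(cur, nxt):
    """Pop the final word off the current cue's text and prepend it
    to the next cue's text, returning the two updated strings."""
    words = cur.split()
    if len(words) < 2:
        return cur, nxt  # never leave a cue empty
    moved = words.pop()
    return " ".join(words), f"{moved} {nxt}".strip()
```

"Grab first word from next" is just the same operation run in the other direction.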
-
Other than using a version of Whisper that creates the best time codes, and in addition to the suggestions above, I would suggest this: try to improve the subtitles created by Whisper as much as possible automatically, before starting to edit them manually. This is what I usually do:

[1] Run "Tools/Batch convert" with the "srt" file with subtitles, with these 5 options:

[2] Open the result of [1] in two copies of SE, to automatically split lines that are too long and, after that, to check (and correct, if necessary) each of those automatic splits. Choose in both copies "Tools/Break/Split long lines" (no checkmark at "Split at line breaks"). See the first comment here for how to do this splitting. Close the second copy of SE after all split subtitles have been checked (and manually corrected if necessary).

[3] Select all subtitles in the list, and click on "Auto Br" to auto-balance the lines that were split in [2]. After that, undo the selection of all subtitles in the list.

[4] Choose "Tools/Beautify time codes". Make sure that "Use FFprobe to extract..." is checked, and that "Snap cues to shot changes" is not checked. ("Edit profile" button: make sure that the profile/preset is "SDI", which can be recognized by the gap of 4 in "General".) Click on "Extract time codes". When the time codes have been loaded, click on "OK" in the "Beautify time codes" dialog box.

[5] Save the result of [4] as an "srt" file under another name than the earlier version of this "srt" file.

[6] Run (again) "Tools/Batch convert" with the "srt" file that resulted from [5], this time with these 6 options:

[7] Start the actual manual editing/checking/improving of the subtitles with the result from [6].

You may, of course, have preferences other than those incorporated into the approach above. If so, you can incorporate your preferences into the approach.
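On top of the steps above: Subtitle Edit also has a command-line /convert mode that can apply some of the same automatic fixes unattended. Here's a sketch that just builds the command; the flag names are from SE's command-line help as I remember them, so please verify them against SE's own documentation before relying on this:

```python
import subprocess  # only needed if you actually run the command

def build_se_convert(srt_path, output_dir):
    """Build a Subtitle Edit command-line call roughly mirroring the
    Batch-convert and split/balance steps above. Flag names are my
    recollection of SE's /convert options -- double-check them."""
    return [
        "SubtitleEdit", "/convert", srt_path, "SubRip",
        "/fixcommonerrors",             # roughly step [1]/[6]
        "/splitlonglines",              # roughly step [2]
        "/balancelines",                # roughly step [3]
        f"/outputfolder:{output_dir}",
        "/overwrite",
    ]

# To actually run it (requires Subtitle Edit on PATH):
# subprocess.run(build_se_convert("episode.srt", "out"), check=True)
```

The Beautify-time-codes step ([4]) still has to be done in the GUI as far as I know, but front-loading the rest this way means opening SE with most of the mechanical clean-up already applied.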
-
Hi Guys,
I'm a freelance editor and for one particular client, I do their subtitles for their videos as well.
I've been using the program for a few months now, and it's mostly great. However, it's the longest part of my editing process.
The primary time-consuming part is adjusting the timing of the subtitles, in particular where it splits the subtitles, often a single word before the end of a sentence or pause. I typically cut and paste the word, then adjust the timing to suit before moving on. It often does this when I split a long subtitle as well.
I tried "Break/split long lines", and it mostly worked well to cut overly long subtitles into smaller ones when Whisper decides to produce an essay instead of nicely flowing subtitles. However, it leaves a lot of clean-up to do.
On a 1 hr video, it takes me about 4 hrs on average of adjusting if the speakers' language is clear, and more if Whisper had a hard time understanding their accent. This time includes correcting grammar and spelling, and checking that names and places have been correctly transcribed.
I'm curious how long it takes others here to edit the subtitles and if you have any tips on speeding up the process?
The biggest help for me was using keyboard shortcuts to change the colour of the subtitles per speaker, but the timing and sentence adjustment takes me the longest.