Replies: 13 comments
-
Oh, I just spotted one big flaw: if I have an hour-long video and there is a mistake or two (a typo, or the TTS did a poor job on some part), I have to redo everything from scratch.
So how about an optimized interface that works like this: if we mess something up, or punctuation and pauses get messed up in the process, we can make changes and refresh the TTS for that sentence only. We should be able to play everything from SE, inspect how it sounds, perfect it, and when we are done, SE does the magic of merging everything into a single audio file, as it currently does. It sounds and looks much better and simpler in my head; hopefully you get the gist of what I am saying. :D Oh, and I am honored to be mentioned in the *FIXED part of the updates :)
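A minimal sketch of the "refresh only that sentence" idea, not how SE implements it. It assumes a hypothetical `synthesize(voice, text)` callable that returns audio bytes, and it caches one clip per (voice, text) pair so that only edited or explicitly regenerated lines go back to the TTS engine. The names `ClipCache` and `fake_tts` are made up for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ClipCache:
    """Keeps one audio clip per (voice, text) pair."""
    synthesize: Callable[[str, str], bytes]  # hypothetical TTS call: (voice, text) -> audio
    clips: Dict[str, bytes] = field(default_factory=dict)

    @staticmethod
    def key(voice: str, text: str) -> str:
        return hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()

    def get(self, voice: str, text: str) -> bytes:
        # Only call the TTS engine when this exact line has not been rendered yet.
        k = self.key(voice, text)
        if k not in self.clips:
            self.clips[k] = self.synthesize(voice, text)
        return self.clips[k]

    def regenerate(self, voice: str, text: str) -> bytes:
        # Force a fresh take for a line that sounded wrong, even if the text is unchanged.
        self.clips.pop(self.key(voice, text), None)
        return self.get(voice, text)

    def render(self, lines: List[str], voice: str) -> List[bytes]:
        # Clips come back in subtitle order, ready to be merged into one file.
        return [self.get(voice, line) for line in lines]


calls: List[str] = []


def fake_tts(voice: str, text: str) -> bytes:
    """Stand-in for a real engine; records which lines were synthesized."""
    calls.append(text)
    return f"<{voice}:{text}>".encode("utf-8")


cache = ClipCache(fake_tts)
lines = ["We first washed the orange.", "Then we peeled it."]
cache.render(lines, "Rachel")              # two TTS calls
lines[1] = "Then we peeled it carefully."  # fix one line
clips = cache.render(lines, "Rachel")      # one more call, for the edited line only
cache.regenerate("Rachel", lines[0])       # forced retake of an unchanged line
```

With this layout an hour-long project costs one TTS call per changed line rather than a full re-run, and the final merge step only has to concatenate the cached clips.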
-
What do you mean by [...]?
I think line breaks should normally be removed before doing text to speech...
The list of voices for ElevenLabs will be in the [...]
A preview before merging is a good idea, but somewhat complex too. But perhaps something where voices can be removed.
-
Beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.5/SubtitleEditBeta.zip
-
Oh, wow, that was blazing fast, and it works 👍 Rough, but it does the job. I also found two, hmmm, let's say 'missing options'.
I mean, if a subtitle has a line break (two lines), the last word of line 1 and the first word of line 2 are treated as one word.
Or could line breaks be ignored automatically, so the text is processed as if there were none? That way we don't have to edit the subtitle to get lines without breaks.
EDIT: I've just noticed a strange bug in the 'regenerate' option. I'm not sure how to reproduce it or give a 'recipe' for it, but when I edited one line that I had tried to regenerate, only removing the line break to make it a single-line sentence, that part became silent, and I couldn't regenerate it again no matter what. I tried shortening the entire line to a single word: no difference. I tried changing the voice: nope. It just stays silent. That was with the ElevenLabs engine. Yes, I had enough credits. With Piper it works well, but the quality is nowhere near ElevenLabs. Could it be related to how you ask EL for that single line?
I tried multiple times, and editing doesn't work well at all with ElevenLabs. The latest result was that the sound of all the other lines glitches and plays too fast.
I also noticed that Piper likes to randomly hard-cut the last letter of the last word, around 50-100 ms at the end, and adds a pop sound. Regenerate solves that problem. :)
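A small sketch of what "process as if there were no line breaks" could look like before text is sent to a TTS engine. This is an illustration, not SE's code: `flatten_line_breaks` is a made-up helper that replaces each break with a space, so the last word of line 1 and the first word of line 2 stay separate.

```python
import re


def flatten_line_breaks(text: str) -> str:
    """Join subtitle rows with a single space so words across a break stay separate."""
    # Cover real newlines as well as the literal "\N" / "\n" markers used in ASS/SSA text.
    text = re.sub(r"\r\n|\r|\n|\\N|\\n", " ", text)
    # Collapse any doubled spaces the replacement may have produced.
    return re.sub(r"\s+", " ", text).strip()


example = "We first washed the\norange with baking soda."
flattened = flatten_line_breaks(example)
print(flattened)  # We first washed the orange with baking soda.
```

Replacing the break with a space, rather than deleting it, is the important part: plain deletion would produce "theorange", which matches the symptom described above.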
-
Well, that would be incredibly weird, because they have thousands of voices and only a few dozen are pulled from their library. If I enter my API key and pull the voices manually through their interface, I get 3 new voices from the VoiceLab on their website, which I picked from those thousands of voices. But SE doesn't pull those voices. I've tried deleting eleven-labs-voices.json so that SE would pull a new one, but still nothing new in there. It keeps creating the same file and doesn't refresh it. If I request the response manually and copy the result into the ElevenLabs JSON file, I get those 3 extra voices.
-
Beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.5/SubtitleEditBeta.zip
Continuing a workflow could be done by adding more and more voices.
Can you try to download the file from the API endpoint and overwrite the eleven-labs-voices.json file with it? Does that give more voices?
Is re-generating still working for this problem? (It could also be caused by the silence trimming done by SE.)
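For anyone who wants to try the download-and-overwrite step by script, here is a hedged sketch. It assumes the ElevenLabs voices endpoint `GET https://api.elevenlabs.io/v1/voices` with the API key in the `xi-api-key` header, and a response containing a `voices` array whose entries have a `name`. Where SE stores eleven-labs-voices.json and exactly what layout it expects are not stated in this thread, so the target path is left to the reader. The helper names are made up.

```python
import json
import urllib.request
from typing import List

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def fetch_voices(api_key: str) -> bytes:
    """Download the raw voice list for the account that owns api_key."""
    req = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def reformat_voices(raw: bytes) -> str:
    """Re-serialize the response with indentation instead of one long line."""
    return json.dumps(json.loads(raw), indent=2, ensure_ascii=False)


def voice_names(raw: bytes) -> List[str]:
    """List voice names, assuming a top-level "voices" array (an assumption here)."""
    return [v.get("name", "") for v in json.loads(raw).get("voices", [])]


# Usage (not run here, needs a real key and network access):
#   raw = fetch_voices("YOUR_API_KEY")
#   print(voice_names(raw))
#   with open("eleven-labs-voices.json", "w", encoding="utf-8") as f:
#       f.write(reformat_voices(raw))
```

The indentation is only for readability; a JSON parser should treat the single-line and the indented file the same way.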
-
Yes, nice addition.
I am not sure I understand correctly what you wrote, but I'd like to be able to load the project and continue where I left off, fixing and regenerating the lines that I feel are awkwardly accented, have weird gurgling in the voice, or similar. Which is strange, because Piper has no issues with regenerating sentences (EDIT: yes it does, here's the error when I tried to do a final mix of the audio). ElevenLabs freezes on regenerate, so the regenerate function is useless for that engine at the moment.
Yes, I can and I did, but it's all on one single line for some reason, and yes, it gives me the three extra voices that I've created. But regenerate with ElevenLabs still crashes when I try to edit something.
-
Beta updated with a few minor fixes - it should now be possible to re-generate with ElevenLabs: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.5/SubtitleEditBeta.zip
-
Could be for version two... will take some time to develop.
Hm, I cannot re-create this.
-
Yes, regenerating single lines with ELabs and merging them together now works without errors, as far as I've tested.
It would be a great help to be able to open the 'regenerate' window as it is, but without any TTS generated yet, so we can pick whether we want a single line generated, several lines, or everything. That would improve usability through the roof. We could play with different voices, do voiceovers for entire movies and handle complex tasks, coming back to generate that single line of text that wasn't perfect...
-
If you use ASSA format with actors, you can map actors to voices:
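To illustrate the idea outside of SE's dialog, here is a rough sketch of reading the actor from each ASSA `Dialogue:` line and looking up a voice for it. It assumes the standard field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text), where Name holds the actor. The `ACTOR_VOICES` table, the voice names, and the sample lines are hypothetical, and this is not SE's implementation.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical actor-to-voice table; in SE this mapping is set up in the UI.
ACTOR_VOICES: Dict[str, str] = {"Narrator": "Rachel", "Guest": "Adam"}
DEFAULT_VOICE = "Rachel"


def dialogue_voices(ass_text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (actor, voice, text) for every Dialogue line."""
    for line in ass_text.splitlines():
        if not line.startswith("Dialogue:"):
            continue
        # Split on the first 9 commas only, because the Text field may contain commas.
        fields = line[len("Dialogue:"):].strip().split(",", 9)
        if len(fields) < 10:
            continue  # malformed line, skip it
        actor = fields[4].strip()
        # Turn ASSA line-break markers into spaces before handing the text to TTS.
        text = fields[9].replace("\\N", " ").replace("\\n", " ")
        yield actor, ACTOR_VOICES.get(actor, DEFAULT_VOICE), text


sample = """[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:03.00,Default,Narrator,0,0,0,,We first washed the\\Norange with baking soda.
Dialogue: 0,0:00:03.50,0:00:05.00,Default,Guest,0,0,0,,Why baking soda, though?
"""

rows = list(dialogue_voices(sample))
for actor, voice, text in rows:
    print(f"{actor} -> {voice}: {text}")
```

Lines whose actor is missing from the table fall back to `DEFAULT_VOICE`, so a partially tagged file can still be voiced.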
-
You misunderstood me. The point was to have the 'Review audio clips' window open without any audio having been generated yet. At the moment we can change voices per line even without ASSA, which is supercool.
-
Using the 4.0.5 beta, and absolutely blown away by the TTS integrated in the program. It's insanely close to what I wanted, almost like you've been reading my mind on how it should work. OK, 'nuff of this 'magic' stuff, now the issues... :)
Not sure if it happens with other languages or engines, but in short, here's what is going on: TTS doesn't respect line breaks.
ElevenLabs TTS
American Rachel
It is best heard when 'the' is at the end of the first row.
Example:
'We first washed the
orange with baking soda.'
I also have a question. Since I am using the free version of ElevenLabs just to test the quality: does the selection of voices and languages increase with a paid account, and can we use our own trained voice in the software?
13.mp4