'waveform' must be provided as a (channel, time) torch Tensor. #363
This might be because VideoLingo splits the input video into many smaller segments, each with a duration of no more than 30 minutes (1800 seconds), and then utilizes Whisper for recognition. However, when your video’s duration is close to 1800 seconds, the second segment generated from the split might contain no audio at all, causing Whisper to fail in recognition. Here is the error message I encountered:
Before an official fix is released, I’d like to offer a temporary solution. In the
to
In that case, simply change the |
Note: if it is working correctly, you will see progress output.
WhisperX processing error: 'waveform' must be provided as a (channel, time) torch Tensor.
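
To illustrate the failure mode described above, here is a minimal sketch of how a fixed 1800-second split can leave a near-empty trailing segment, along with one possible guard. This is not VideoLingo's actual code: the function names `split_segments` and `drop_tiny_segments` and the `MIN_SEGMENT_SECONDS` threshold are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

SEGMENT_SECONDS = 1800.0    # 30-minute limit described in this issue
MIN_SEGMENT_SECONDS = 0.5   # hypothetical threshold, not taken from VideoLingo


def split_segments(duration: float,
                   max_len: float = SEGMENT_SECONDS) -> List[Tuple[float, float]]:
    """Cut [0, duration) into consecutive (start, end) windows of at most max_len seconds."""
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + max_len, duration)
        segments.append((start, end))
        start = end
    return segments


def drop_tiny_segments(segments: List[Tuple[float, float]],
                       min_len: float = MIN_SEGMENT_SECONDS) -> List[Tuple[float, float]]:
    """Discard windows too short to hold usable audio (assumed guard, for illustration)."""
    return [(s, e) for s, e in segments if e - s >= min_len]


if __name__ == "__main__":
    # A video only slightly longer than 1800 s yields a second, almost empty segment.
    raw = split_segments(1800.02)
    print(raw)
    # Filtering leaves only the first segment, so nothing empty reaches the recognizer.
    print(drop_tiny_segments(raw))
```

With a duration just over 1800 seconds, the second window covers only a few hundredths of a second, which is the kind of segment that could reach Whisper with no usable audio. Skipping such segments before recognition is one way to avoid the error until an official fix is available.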