May I ask why the RTF of TTS is only 0.09 for a 12-second sentence? I use the fastspeech2_HIFiGAN model and the GPU is an A2000 (compute capability 8.0). I thought the speedup should be at least 50x, because the FastSpeech 2 paper reports roughly 50x over Transformer TTS and the HiFi-GAN paper reports about a 1000x speedup. Can anyone tell me what's wrong?
Thank you!
Based on my experiments, it should be a bit faster.
Using an Nvidia T4, CFS2 + HiFiGAN V1 resulted in RTF = 0.008 (averaged over 250 utterances). An RTF of 0.008 corresponds to roughly 125x faster than real time.
Could you paste your pseudo-code for calculating RTF?
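For reference, this is a minimal sketch of the kind of measurement I mean; the `synthesize` callable and the sampling rate are placeholders for whatever your pipeline actually exposes, not your code:

```python
import time

def measure_rtf(synthesize, texts, fs=22050):
    """Average RTF over a list of texts.

    `synthesize` is a placeholder for the full text -> waveform pipeline;
    it is assumed to return a 1-D waveform sampled at `fs` Hz.
    """
    total_compute, total_audio = 0.0, 0.0
    for text in texts:
        start = time.perf_counter()
        wav = synthesize(text)                 # full end-to-end inference
        total_compute += time.perf_counter() - start
        total_audio += len(wav) / fs           # generated audio duration in seconds
    return total_compute / total_audio         # RTF = compute time / audio time
```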
I timed 3 parts of the inference for a 24-second Chinese sentence:
- Part 1, preprocessing: 387 ms (`text = self.preprocess_fn("<dummy>", dict(text=text))["text"]`)
- Part 2, CFS2 model: 701 ms
- Part 3, HiFiGAN: 24 ms

RTF = sum of the parts / 24 s = 0.046 (yes, RTF improves for longer sentences because FastSpeech generates non-autoregressively).
It seems that preprocessing and CFS2 take most of the time.
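In case it helps to compare numbers, here is a rough sketch of how the per-stage timing can be done; `preprocess_fn`, `acoustic_model`, `vocoder`, and the 24000 Hz sampling rate are assumptions standing in for the actual pipeline objects:

```python
import time
import torch

def timed(fn):
    """Run a no-arg callable and return (result, elapsed_seconds).
    The GPU is synchronized so asynchronous CUDA kernels are included."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

def profile_tts(preprocess_fn, acoustic_model, vocoder, text, fs=24000):
    """Time the three stages separately and report the overall RTF.
    The three callables are placeholders for your own pipeline components."""
    tokens, t_pre = timed(lambda: preprocess_fn("<dummy>", dict(text=text))["text"])
    mel, t_am = timed(lambda: acoustic_model(tokens))   # e.g. CFS2 / FastSpeech2
    wav, t_voc = timed(lambda: vocoder(mel))            # e.g. HiFiGAN
    rtf = (t_pre + t_am + t_voc) / (len(wav) / fs)
    print(f"preprocess {t_pre*1e3:.0f} ms | acoustic {t_am*1e3:.0f} ms | "
          f"vocoder {t_voc*1e3:.0f} ms | RTF {rtf:.3f}")
    return rtf
```

Synchronizing before and after each stage matters here: without it, the GPU stages would appear almost free and the last stage would absorb all the queued kernel time.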