
GENERATING DIVERSE AND NATURAL TEXT-TO-SPEECH SAMPLES USING A QUANTIZED FINE-GRAINED VAE AND AUTOREGRESSIVE PROSODY PRIOR #5

Open
supikiti opened this issue Sep 16, 2020 · 1 comment

supikiti commented Sep 16, 2020

Link

https://arxiv.org/pdf/2002.03788.pdf

What is it?

  • Proposes a VQ-VAE-based TTS model

How does it improve on prior work?

  • Achieves high-quality speech synthesis by modeling how the latent variables change over time

What is the key technical idea?

  • Training consists of the following two stages (see the sketch after this list)
  1. A VQ-VAE extracts prosody as discrete latent variables
  2. An autoregressive (AR) prior is trained over the prosody latents, conditioned on the embeddings from the Tacotron encoder
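A minimal PyTorch sketch of those two pieces, not the authors' implementation: a vector quantizer that snaps frame-level prosody latents to discrete codebook entries, and an LSTM-based AR prior over those codes conditioned on text-side (Tacotron-encoder-like) embeddings. Codebook size, latent/embedding dimensions, and the LSTM choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Stage 1: quantize each frame-level prosody latent to its nearest codebook entry."""
    def __init__(self, num_codes=256, dim=3, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e):                        # z_e: (batch, frames, dim)
        flat = z_e.reshape(-1, z_e.size(-1))
        dists = torch.cdist(flat, self.codebook.weight)   # distance to every code
        indices = dists.argmin(dim=-1)             # nearest code per frame
        z_q = self.codebook(indices).view_as(z_e)
        # codebook + commitment losses; straight-through estimator for the decoder path
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices.view(z_e.shape[:-1]), loss

class ProsodyPrior(nn.Module):
    """Stage 2: autoregressive prior over discrete prosody codes, conditioned on text embeddings."""
    def __init__(self, num_codes=256, code_dim=3, text_dim=512, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(num_codes, code_dim)
        self.rnn = nn.LSTM(code_dim + text_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_codes)

    def forward(self, codes, text_emb):            # codes: (B, T) ints, text_emb: (B, T, text_dim)
        prev = F.pad(codes, (1, 0))[:, :-1]        # shift right: predict code t from codes < t
        x = torch.cat([self.embed(prev), text_emb], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                         # logits over the next code at each frame
```

Usage sketch with dummy tensors (shapes are assumptions):

```python
vq, prior = VectorQuantizer(), ProsodyPrior()
z_e = torch.randn(2, 100, 3)                       # fine-grained prosody latents (B, frames, dim)
text_emb = torch.randn(2, 100, 512)                # encoder outputs aligned to frames
z_q, codes, vq_loss = vq(z_e)                      # stage 1: quantized latents + codes
logits = prior(codes, text_emb)                    # stage 2: AR prediction of the code sequence
prior_loss = F.cross_entropy(logits.transpose(1, 2), codes)
```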

How was it shown to be effective?

  • Evaluated with both objective and subjective metrics

Any discussion?

Papers to read next

supikiti added the VAE label Sep 23, 2020