@@ -463,7 +463,7 @@ The following list contains speech corpora supported by this script collection.
- [German Speechdata Package Version 2 (German, 148 hours)](http://www.repository.voxforge1.org/downloads/de/german-speechdata-package-v2.tar.gz):
+ Unpack the archive such that the directories `dev`, `test`, and `train` are
direct subdirectories of `<~/.speechrc:speech_arc>/gspv2`.
- + Then run the script `./gspv2_to_vf.py` to convert the corpus to the VoxForge
+ + Then run the script `./import_gspv2.py` to convert the corpus to the VoxForge
format. The resulting corpus will be written to `<~/.speechrc:speech_corpora>/gspv2`.
- [Noise](http://goofy.zamia.org/zamia-speech/corpora/noise.tar.xz):
@@ -474,35 +474,35 @@ The following list contains speech corpora supported by this script collection.
+ Download the 360-hour "clean" speech tarball
+ Unpack the archive such that the directory `LibriSpeech` is a direct
subdirectory of `<~/.speechrc:speech_arc>`.
- + Then run the script `./librispeech_to_vf.py` to convert the corpus to the VoxForge
+ + Then run the script `./import_librispeech.py` to convert the corpus to the VoxForge
format. The resulting corpus will be written to `<~/.speechrc:speech_corpora>/librispeech`.
- [The LJ Speech Dataset (English, 24 hours)](https://keithito.com/LJ-Speech-Dataset/):
+ Download the tarball
+ Unpack the archive such that the directory `LJSpeech-1.1` is a direct
subdirectory of `<~/.speechrc:speech_arc>`.
- + Then run the script `ljspeech_to_vf.py` to convert the corpus to the VoxForge
+ + Then run the script `import_ljspeech.py` to convert the corpus to the VoxForge
format. The resulting corpus will be written to `<~/.speechrc:speech_corpora>/lindajohnson-11`.
- [Mozilla Common Voice German (German, 140 hours)](https://voice.mozilla.org/en/datasets):
+ Download `de.tar.gz`
+ Unpack the archive such that the directory `cv_de` is a direct
subdirectory of `<~/.speechrc:speech_arc>`.
- + Then run the script `./mozde_to_vf.py` to convert the corpus to the VoxForge
+ + Then run the script `./import_mozde.py` to convert the corpus to the VoxForge
format. The resulting corpus will be written to `<~/.speechrc:speech_corpora>/cv_de`.
- [Mozilla Common Voice V1 (English, 252 hours)](https://voice.mozilla.org/en/data):
+ Download `cv_corpus_v1.tar.gz`
+ Unpack the archive such that the directory `cv_corpus_v1` is a direct
subdirectory of `<~/.speechrc:speech_arc>`.
- + Then run the script `./mozcv1_to_vf.py` to convert the corpus to the VoxForge
+ + Then run the script `./import_mozcv1.py` to convert the corpus to the VoxForge
format. The resulting corpus will be written to `<~/.speechrc:speech_corpora>/cv_corpus_v1`.
- [Munich Artificial Intelligence Laboratories GmbH (M-AILABS) Speech Dataset (English, 147 hours, German, 237 hours)](http://www.m-ailabs.bayern/en/):
+ Download `de_DE.tgz`, `en_UK.tgz`, `en_US.tgz` ([Mirror](https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/))
+ Create a subdirectory `m_ailabs` in `<~/.speechrc:speech_arc>`
+ Unpack the downloaded tarballs inside the `m_ailabs` subdirectory
- + Then run the script `./mailabs_to_vf.py` to convert the corpus to the VoxForge
+ + Then run the script `./import_mailabs.py` to convert the corpus to the VoxForge
format. The resulting corpus will be written to `<~/.speechrc:speech_corpora>/m_ailabs_en` and `<~/.speechrc:speech_corpora>/m_ailabs_de`.
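
The unpack steps above all follow the same pattern: a known directory must end up directly below `<~/.speechrc:speech_arc>` before the matching import script is run. A minimal sketch of a pre-flight check for that layout follows. It is not part of the script collection. The names `EXPECTED_DIRS`, `read_speech_arc`, and `missing_dirs` are hypothetical, and the sketch assumes `~/.speechrc` is an INI file with a `[speech]` section holding `speech_arc`.

```python
import configparser
from pathlib import Path

# Directories each corpus needs below speech_arc, taken from the unpack
# steps listed above. The dictionary keys are illustrative labels only.
EXPECTED_DIRS = {
    "gspv2": ["gspv2/dev", "gspv2/test", "gspv2/train"],
    "librispeech": ["LibriSpeech"],
    "ljspeech": ["LJSpeech-1.1"],
    "cv_de": ["cv_de"],
    "cv_corpus_v1": ["cv_corpus_v1"],
    "m_ailabs": ["m_ailabs"],
}


def read_speech_arc(rc_path=Path.home() / ".speechrc", section="speech"):
    """Return the speech_arc directory from the rc file.

    Assumption: the rc file is INI-formatted and keeps speech_arc in a
    [speech] section. Adjust `section` if your file differs.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(rc_path)
    return Path(parser.get(section, "speech_arc"))


def missing_dirs(speech_arc, corpus):
    """List the expected directories for `corpus` that do not exist yet."""
    base = Path(speech_arc)
    return [d for d in EXPECTED_DIRS[corpus] if not (base / d).is_dir()]
```

Calling `missing_dirs(read_speech_arc(), "gspv2")` returns an empty list once `dev`, `test`, and `train` are in place, so a non-empty result indicates that the archive was unpacked one level too deep or too shallow.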
- [VoxForge (English, 75 hours)](http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/):