Commit de715b3

Add script for deep speech benchmark (#4750)
* Add script for benchmark
* Add random seed
1 parent 3158727 commit de715b3

3 files changed: +68 -5 lines changed

research/deep_speech/README.md (+10 -2)

@@ -36,7 +36,9 @@ or
 pip install -r requirements.txt
 ```

-### Download and preprocess dataset
+### Run each step individually
+
+#### Download and preprocess dataset
 To download the dataset, issue the following command:
 ```
 python data/download.py
@@ -46,7 +48,7 @@ Arguments:

 Use the `--help` or `-h` flag to get a full list of possible arguments.

-### Train and evaluate model
+#### Train and evaluate model
 To train and evaluate the model, issue the following command:
 ```
 python deep_speech.py
@@ -59,3 +61,9 @@ Arguments:

 There are other arguments about DeepSpeech2 model and training/evaluation process. Use the `--help` or `-h` flag to get a full list of possible arguments with detailed descriptions.

+### Run the benchmark
+A shell script [run_deep_speech.sh](run_deep_speech.sh) is provided to run the whole pipeline with default parameters. Issue the following command to run the benchmark:
+```
+sh run_deep_speech.sh
+```
+Note that by default the training dataset in the benchmark includes train-clean-100, train-clean-360 and train-other-500, and the evaluation dataset includes dev-clean and dev-other.

research/deep_speech/deep_speech.py (+8 -3)

@@ -212,6 +212,7 @@ def generate_dataset(data_dir):

 def run_deep_speech(_):
   """Run deep speech training and eval loop."""
+  tf.set_random_seed(flags_obj.seed)
   # Data preprocessing
   tf.logging.info("Data preprocessing...")
   train_speech_dataset = generate_dataset(flags_obj.train_data_dir)
@@ -319,19 +320,23 @@ def define_deep_speech_flags():
   flags_core.set_defaults(
       model_dir="/tmp/deep_speech_model/",
       export_dir="/tmp/deep_speech_saved_model/",
-      train_epochs=200,
+      train_epochs=10,
       batch_size=128,
       hooks="")

   # Deep speech flags
+  flags.DEFINE_integer(
+      name="seed", default=1,
+      help=flags_core.help_wrap("The random seed."))
+
   flags.DEFINE_string(
       name="train_data_dir",
-      default="/tmp/librispeech_data/train-clean/LibriSpeech/train-clean.csv",
+      default="/tmp/librispeech_data/test-clean/LibriSpeech/test-clean.csv",
       help=flags_core.help_wrap("The csv file path of train dataset."))

   flags.DEFINE_string(
       name="eval_data_dir",
-      default="/tmp/librispeech_data/dev-clean/LibriSpeech/dev-clean.csv",
+      default="/tmp/librispeech_data/test-clean/LibriSpeech/test-clean.csv",
       help=flags_core.help_wrap("The csv file path of evaluation dataset."))

   flags.DEFINE_bool(
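
The new `--seed` flag is passed to `tf.set_random_seed`, which fixes the graph-level seed so that op-level random draws are reproducible from run to run. A minimal sketch of that behavior under the TF 1.x API used here (the op and shape are illustrative):

```python
import tensorflow as tf  # TF 1.x, as in deep_speech.py

tf.set_random_seed(1)  # graph-level seed; matches the new flag's default

# Op seeds are derived deterministically from the graph-level seed at op
# creation time, so this op produces the same draws on every fresh run.
x = tf.random_uniform([3])

with tf.Session() as sess:
  print(sess.run(x))  # identical output across separate runs of this script
```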

research/deep_speech/run_deep_speech.sh (new file, +50)

@@ -0,0 +1,50 @@
+#!/bin/bash
+# Script to run the deep speech model to achieve the MLPerf target (WER = 0.23).
+# Step 1: download the LibriSpeech dataset.
+echo "Data downloading..."
+python data/download.py
+
+## After data downloading, the dataset directories are:
+train_clean_100="/tmp/librispeech_data/train-clean-100/LibriSpeech/train-clean-100.csv"
+train_clean_360="/tmp/librispeech_data/train-clean-360/LibriSpeech/train-clean-360.csv"
+train_other_500="/tmp/librispeech_data/train-other-500/LibriSpeech/train-other-500.csv"
+dev_clean="/tmp/librispeech_data/dev-clean/LibriSpeech/dev-clean.csv"
+dev_other="/tmp/librispeech_data/dev-other/LibriSpeech/dev-other.csv"
+test_clean="/tmp/librispeech_data/test-clean/LibriSpeech/test-clean.csv"
+test_other="/tmp/librispeech_data/test-other/LibriSpeech/test-other.csv"
+
+# Step 2: generate the train dataset and the evaluation dataset.
+echo "Data preprocessing..."
+train_file="/tmp/librispeech_data/train_dataset.csv"
+eval_file="/tmp/librispeech_data/eval_dataset.csv"
+
+head -1 $train_clean_100 > $train_file
+for filename in $train_clean_100 $train_clean_360 $train_other_500
+do
+  sed 1d $filename >> $train_file
+done
+
+head -1 $dev_clean > $eval_file
+for filename in $dev_clean $dev_other
+do
+  sed 1d $filename >> $eval_file
+done
+
+# Step 3: filter out the audio files that exceed the max time duration.
+final_train_file="/tmp/librispeech_data/final_train_dataset.csv"
+final_eval_file="/tmp/librispeech_data/final_eval_dataset.csv"
+
+MAX_AUDIO_LEN=27.0
+awk -v maxlen="$MAX_AUDIO_LEN" 'BEGIN{FS="\t";} NR==1{print $0} NR>1{cmd="soxi -D "$1""; cmd|getline x; if(x<=maxlen) {print $0}; close(cmd);}' $train_file > $final_train_file
+awk -v maxlen="$MAX_AUDIO_LEN" 'BEGIN{FS="\t";} NR==1{print $0} NR>1{cmd="soxi -D "$1""; cmd|getline x; if(x<=maxlen) {print $0}; close(cmd);}' $eval_file > $final_eval_file
+
+# Step 4: run the training and evaluation loop in the background, and save the running info to a log file.
+echo "Model training and evaluation..."
+start=`date +%s`
+
+log_file=log_`date +%Y-%m-%d`
+nohup python deep_speech.py --train_data_dir=$final_train_file --eval_data_dir=$final_eval_file --num_gpus=-1 --wer_threshold=0.23 --seed=1 >$log_file 2>&1 &
+
+wait  # block until the background job finishes, so the elapsed time below is meaningful
+end=`date +%s`
+runtime=$((end-start))
+echo "Model training time is" $runtime "seconds."
