Small dataset subsampling for speech commands #74

turian · 2021-09-09T02:06:39Z

The small-dataset subsampling ends up with a weird number of samples in each split:

test: 96
train: 56
valid: 132

This is caused by the background_noise subsampling in tasks/sampler.py. In Speech Commands, all the background noise samples (which are labelled as silence) are delivered as long audio files that are expected to be sliced into smaller chunks. When we subsample this dataset, only one background noise file is included (running_tap.wav), and it happens to be in the validation set. Once that single file is sliced, we end up with a validation set that is almost exclusively silence samples, while the other splits get none.
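One possible way to avoid this is to subsample separately within each (split, label) group, so that every split keeps at least one background noise file when one exists. The sketch below is only an illustration of that idea, not the actual logic in tasks/sampler.py: the function name `subsample_per_group`, the `(path, split, label)` tuple layout, and the toy file names are all assumptions made for this example.

```python
import random
from collections import defaultdict


def subsample_per_group(files, max_per_group, seed=0):
    """Keep at most `max_per_group` files for every (split, label) pair.

    `files` is a list of (path, split, label) tuples. This layout is a
    hypothetical stand-in, not the format used by tasks/sampler.py.
    """
    rng = random.Random(seed)

    # Bucket the files by the split and label they belong to.
    groups = defaultdict(list)
    for path, split, label in files:
        groups[(split, label)].append((path, split, label))

    chosen = []
    for key in sorted(groups):
        members = sorted(groups[key])  # sort first so the result is deterministic
        rng.shuffle(members)
        chosen.extend(members[:max_per_group])
    return chosen


# Toy, made-up file list: 3 long "silence" files and 5 "yes" clips per split.
files = []
for split in ("train", "valid", "test"):
    files += [(f"{split}_noise_{i}.wav", split, "silence") for i in range(3)]
    files += [(f"{split}_yes_{i}.wav", split, "yes") for i in range(5)]

subset = subsample_per_group(files, max_per_group=1)

# Every split now retains one background noise file to slice into chunks.
splits_with_silence = {split for _, split, label in subset if label == "silence"}
print(sorted(splits_with_silence))
```

Because the selection happens per group before any slicing, no single split can absorb all of the silence chunks. Whether the long noise files should also be capped after slicing is a separate choice that this sketch does not address.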
