
Commit 33c8407

[SpeechCommands] Clean up training/browser-fft directory (#513)
- Removes obsolete Python files, now that we have notebooks showing the complete workflow.
- Removes supporting TypeScript scripts and the package.json file.
- Updates README.md.
- Also adds the step to write meta
1 parent cd17028 · commit 33c8407

11 files changed (+55 −1562 lines)

run_python_tests.sh (+2 −1)

@@ -20,4 +20,5 @@ set -e
 
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
-"${SCRIPT_DIR}/speech-commands/training/browser-fft/py_lint_and_test.sh"
+# This file is currently an empty placeholder. Add Python tests here when
+# they are added in the future.
speech-commands/training/browser-fft/README.md (+20 −83)

@@ -1,85 +1,22 @@
 # Training a TensorFlow.js model for Speech Commands Using Browser FFT
 
-## Preparing data for training
-
-Before you can train a model that uses spectrograms from the browser's
-WebAudio as input features, you need to convert the speech-commands
-data set into a format that TensorFlow.js can ingest, by running the
-data through the browser's native WebAudio frequency analyzer (FFT).
-The following steps are involved:
-
-1. Download the speech-commands data set from
-   https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.01.tar.gz
-   or
-   https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
-   Version 0.02 is a larger dataset than 0.01.
-
-2. Use `prep_wavs.py` to convert the raw .wav files into a binary format
-   ready for FFT conversion in the browser. E.g.,
-
-   ```sh
-   python prep_wavs.py \
-     --words zero,one,two,three,four,five,six,seven,eight,nine,go,stop,left,right,up,down \
-     --test_split 0.15 \
-     --include_noise \
-     "${HOME}/ml-data/speech_commands_data" \
-     "${HOME}/ml-data/speech_commands_data_converted"
-   ```
-
-   With the `--words` flag, you can specify which words to include in the
-   training of the model. With the `--test_split` flag, you can specify the
-   fraction of the .wav files that will be randomly drawn for testing after
-   training. The `--include_noise` flag asks the script to randomly draw
-   segments from the long .wav files in the `_background_noise_` folder to
-   generate training (and test) examples for background noise. (N.B.: this is
-   *not* about adding noise to the word examples.) The last two arguments
-   point to the input and output directories, respectively.
-
-   Under the output path (i.e., `speech_commands_data_converted` in this
-   example), there will be two subfolders, called `train` and `test`, which
-   hold the training and testing splits, respectively. Under each of `train`
-   and `test`, there are subfolders with names matching the words (e.g.,
-   `zero`, `one`, etc.). In each of those subfolders, there will be subfolders
-   with names such as `0` and `1`, which contain a number of `.dat` files.
-
-3. Run WebAudio FFT on the `.dat` files generated in step 2 in the browser.
-   TODO(cais): Provide more details here.
-
-## Training the TensorFlow.js Model in tfjs-node or tfjs-node-gpu
-
-1. Download and extract the browser-FFT version of the speech-commands dataset:
-
-   ```sh
-   curl -fSsL https://storage.googleapis.com/learnjs-data/speech-commands/speech-commands-data-v0.02-browser.tar.gz -o speech-commands-data-v0.02-browser.tar.gz && \
-     tar xzvf speech-commands-data-v0.02-browser.tar.gz
-   ```
-
-2. Start training. First, download the JavaScript dependencies using:
-
-   ```sh
-   yarn
-   ```
-
-   Then, to train the model using the CPU (tfjs-node):
-
-   ```sh
-   yarn train speech-commands-data-v0.02-browser/ ./my-model/
-   ```
-
-   Or, to train the model using a GPU (tfjs-node-gpu, which requires a
-   CUDA-enabled GPU and drivers):
-
-   ```sh
-   yarn train --gpu speech-commands-data-v0.02-browser/ ./my-model/
-   ```
-
-## Development
-
-### Python
-
-To run linting and tests of the Python files in this directory, use the script:
-
-```sh
-./py_lint_and_test.sh
-```
+This directory contains two example notebooks. They demonstrate how to train
+custom TensorFlow.js audio models and deploy them for inference. The models
+trained this way expect their inputs to be spectrograms in a format consistent
+with [WebAudio's `getFloatFrequencyData`](https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/getFloatFrequencyData),
+so they can be deployed to the browser for inference using the speech-commands
+library.
+
+Specifically,
+
+- [training_custom_audio_model_in_python.ipynb](./training_custom_audio_model_in_python.ipynb)
+  shows how to preprocess a directory of audio examples stored as .wav files
+  and how to train a tf.keras model on the preprocessed data. It then
+  demonstrates how the trained tf.keras model can be converted to a
+  TensorFlow.js `LayersModel` that can be loaded with the speech-commands
+  library's `create()` API, as well as the steps to convert the trained
+  tf.keras model to a TFLite model for inference on mobile devices.
+- [tflite_conversion.ipynb](./tflite_conversion.ipynb) illustrates how an
+  audio model trained on [Teachable Machine](https://teachablemachine.withgoogle.com/train/audio)
+  can be converted to TFLite directly.
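The spectrogram format the new README refers to is the one produced by WebAudio's `AnalyserNode`. As a rough illustration of that format only (this is not the library's actual capture code, and the FFT size below is an arbitrary assumption), a browser obtains one frequency frame like this:

```ts
// Sketch: producing one spectrogram frame in the getFloatFrequencyData
// format. The speech-commands library does this internally; the fftSize
// value here is an illustrative assumption.
async function captureFrequencyFrame(): Promise<Float32Array> {
  // Request microphone input from the user.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext();
  const source = audioContext.createMediaStreamSource(stream);

  // An AnalyserNode runs an FFT over the incoming audio.
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 2048;
  source.connect(analyser);

  // getFloatFrequencyData fills the array with per-bin magnitudes in dB.
  const frame = new Float32Array(analyser.frequencyBinCount);
  analyser.getFloatFrequencyData(frame);
  return frame;
}
```

Each value in the returned array is a decibel magnitude for one frequency bin, which is the representation the models trained by these notebooks expect.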
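And a minimal sketch of the deployment step the README mentions: loading the converted model through the speech-commands library's `create()` API and running streaming recognition. The URLs below are placeholders for wherever the converted `model.json` and `metadata.json` end up hosted.

```ts
import * as speechCommands from '@tensorflow-models/speech-commands';

async function main() {
  // Load a custom browser-FFT model; the example.com URLs are placeholders.
  const recognizer = speechCommands.create(
      'BROWSER_FFT',
      undefined,  // Use the word list from the custom metadata.json.
      'https://example.com/my-model/model.json',
      'https://example.com/my-model/metadata.json');
  await recognizer.ensureModelLoaded();

  // Stream microphone audio and report the highest-scoring word per frame.
  await recognizer.listen(async (result) => {
    const scores = result.scores as Float32Array;
    const labels = recognizer.wordLabels();
    let best = 0;
    for (let i = 1; i < scores.length; ++i) {
      if (scores[i] > scores[best]) {
        best = i;
      }
    }
    console.log(`Recognized: ${labels[best]}`);
  }, { probabilityThreshold: 0.75 });
}

main();
```

`probabilityThreshold` suppresses callbacks for low-confidence frames; the notebooks remain the end-to-end reference for training and conversion.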

speech-commands/training/browser-fft/package.json (−19)

This file was deleted.
