# Training a TensorFlow.js model for Speech Commands Using Browser FFT

## Preparing data for training

Before you can train a model that uses spectrograms from the browser's
WebAudio API as input features, you need to convert the speech-commands
data set into a format that TensorFlow.js can ingest, by running the data
through the browser's native WebAudio frequency analyzer (FFT). The
following steps are involved:

1. Download the speech-commands data set from
   https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.01.tar.gz
   or
   https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
   Version 0.02 is a larger dataset than 0.01.

2. Use `prep_wavs.py` to convert the raw .wav files into a binary format
   ready for FFT conversion in the browser. E.g.:

   ```sh
   python prep_wavs.py \
       --words zero,one,two,three,four,five,six,seven,eight,nine,go,stop,left,right,up,down \
       --test_split 0.15 \
       --include_noise \
       "${HOME}/ml-data/speech_commands_data" \
       "${HOME}/ml-data/speech_commands_data_converted"
   ```

   With the `--words` flag, you can specify which words to include in the
   training of the model. With the `--test_split` flag, you can specify the
   fraction of the .wav files that will be randomly drawn for testing after
   training. The `--include_noise` flag asks the script to randomly draw
   segments from the long .wav files in the `_background_noise_` folder to
   generate training (and test) examples for background noise. (N.B.: this
   is *not* about adding noise to the word examples.) The last two arguments
   point to the input and output directories, respectively.

   Under the output path (i.e., `speech_commands_data_converted` in this
   example), there will be two subfolders, called `train` and `test`, which
   hold the training and testing splits, respectively. Under each of `train`
   and `test`, there are subfolders with names matching the words (e.g.,
   `zero`, `one`, etc.). In each of those subfolders, there will be
   subfolders with names such as `0` and `1`, which contain a number of
   `.dat` files.

3. Run WebAudio FFT on the `.dat` files generated in step 2 in the browser
   (see the sketch after this list). TODO(cais): Provide more details here.

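As a rough illustration of what step 3 involves (the project's own conversion
tooling is not documented here, so the function below is only a hypothetical
sketch), the browser-side FFT comes from WebAudio's `AnalyserNode`:

```js
// Sketch only: capture one frame of frequency-domain data with WebAudio.
// How the converted .dat files are fed through the analyser is project-specific.
async function captureOneFftFrame() {
  const audioContext = new AudioContext();
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 1024;             // 1024-point FFT -> 512 frequency bins
  analyser.smoothingTimeConstant = 0;  // no smoothing across frames

  // Use the microphone as an example source.
  const stream = await navigator.mediaDevices.getUserMedia({audio: true});
  audioContext.createMediaStreamSource(stream).connect(analyser);

  // Each call fills the array with the current spectrum, in dB.
  const frame = new Float32Array(analyser.frequencyBinCount);
  analyser.getFloatFrequencyData(frame);
  return frame;
}
```
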
## Training the TensorFlow.js Model in tfjs-node or tfjs-node-gpu

1. Download and extract the browser-FFT version of the speech-commands dataset:

   ```sh
   curl -fSsL https://storage.googleapis.com/learnjs-data/speech-commands/speech-commands-data-v0.02-browser.tar.gz -o speech-commands-data-v0.02-browser.tar.gz && \
     tar xzvf speech-commands-data-v0.02-browser.tar.gz
   ```

2. Start training. First, download the JavaScript dependencies:

   ```sh
   yarn
   ```

   Then, to train the model on the CPU (tfjs-node):

   ```sh
   yarn train speech-commands-data-v0.02-browser/ ./my-model/
   ```

   Or, to train the model on a GPU (tfjs-node-gpu; requires a CUDA-enabled
   GPU and drivers):

   ```sh
   yarn train --gpu speech-commands-data-v0.02-browser/ ./my-model/
   ```

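Once training finishes, the output directory (here `./my-model/`) should
contain a TensorFlow.js model artifact. Assuming a standard `model.json` plus
weight files, a minimal (hypothetical) way to load it in the browser is:

```js
import * as tf from '@tensorflow/tfjs';

async function loadTrainedModel() {
  // Placeholder URL: serve the contents of ./my-model/ over HTTP first,
  // e.g. `npx http-server ./my-model -p 8080`.
  const model = await tf.loadLayersModel('http://localhost:8080/model.json');
  model.summary();
  return model;
}
```
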
## Development

### Python

To run linting and tests on the Python files in this directory, use this
script:

```sh
./py_lint_and_test.sh
```

## Example notebooks

This directory also contains two example notebooks. They demonstrate how to
train custom TensorFlow.js audio models and deploy them for inference. The
models trained this way expect their inputs to be spectrograms in a format
consistent with
[WebAudio's `getFloatFrequencyData`](https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/getFloatFrequencyData),
so they can be deployed to the browser for inference using the speech-commands
library.

Specifically:

- [training_custom_audio_model_in_python.ipynb](./training_custom_audio_model_in_python.ipynb)
  walks through preprocessing a directory of audio examples stored as .wav
  files and training a tf.keras model on the preprocessed data. It then
  demonstrates how the trained tf.keras model can be converted to a
  TensorFlow.js `LayersModel` that can be loaded with the speech-commands
  library's `create()` API (see the sketch after this list). The notebook
  also shows how to convert the trained tf.keras model to a TFLite model for
  inference on mobile devices.
- [tflite_conversion.ipynb](./tflite_conversion.ipynb) illustrates how an
  audio model trained with [Teachable Machine](https://teachablemachine.withgoogle.com/train/audio)
  can be converted to TFLite directly.

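For reference, a model converted this way can be loaded in the browser with
the speech-commands library's `create()` API roughly as follows; the model and
metadata URLs below are placeholders for wherever you host the converted
files:

```js
import * as speechCommands from '@tensorflow-models/speech-commands';

async function run() {
  // Placeholder URLs: point these at the converted model.json and the
  // accompanying metadata file listing the word labels.
  const checkpointURL = 'https://example.com/my-audio-model/model.json';
  const metadataURL = 'https://example.com/my-audio-model/metadata.json';

  const recognizer = speechCommands.create(
      'BROWSER_FFT',  // use the browser's native FFT via WebAudio
      undefined,      // no built-in vocabulary; the custom model defines it
      checkpointURL, metadataURL);
  await recognizer.ensureModelLoaded();

  // Stream microphone audio and log the top-scoring label.
  const labels = recognizer.wordLabels();
  recognizer.listen(async result => {
    const scores = Array.from(result.scores);
    const top = scores.indexOf(Math.max(...scores));
    console.log(labels[top], scores[top]);
  }, {probabilityThreshold: 0.75});
}

run();
```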