TensorFlow Lite MicroSpeech
I have adapted the MicroSpeech example from TensorFlow Lite to follow the philosophy of this framework. The example uses a pre-trained model that can recognise the words 'yes' and 'no'. The output stream class is TfLiteAudioOutput. In the example I am using an AudioKit board, but you can replace this with any type of microphone.
Here is the complete Arduino Sketch:
#include "AudioTools.h"
#include "AudioLibs/AudioKit.h"
#include "AudioLibs/TfLiteAudioOutput.h"
#include "model.h" // tensorflow model
AudioKitStream kit; // Audio source
TfLiteAudioOutput<4> tfl; // Audio sink
const char* kCategoryLabels[4] = {
"silence",
"unknown",
"yes",
"no",
};
StreamCopy copier(tfl, kit); // copy mic to tfl
int channels = 1;
int samples_per_second = 16000;
// Command callback handler
void respondToCommand(const char* found_command, uint8_t score,
bool is_new_command) {
if (is_new_command) {
char buffer[80];
sprintf(buffer, "Result: %s, score: %d, is_new: %s", found_command, score,
is_new_command ? "true" : "false");
Serial.println(buffer);
}
}
void setup() {
Serial.begin(115200);
AudioLogger::instance().begin(Serial, AudioLogger::Warning);
// setup Audiokit
auto cfg = kit.defaultConfig(RX_MODE);
cfg.input_device = AUDIO_HAL_ADC_INPUT_LINE2;
cfg.channels = channels;
cfg.sample_rate = samples_per_second;
cfg.use_apll = false;
cfg.auto_clear = true;
cfg.buffer_size = 512;
cfg.buffer_count = 16;
kit.begin(cfg);
// Setup tensorflow
auto tcfg = tfl.defaultConfig();
tcfg.channels = channels;
tcfg.sample_rate = samples_per_second;
tcfg.kTensorArenaSize = 10 * 1024;
tcfg.respondToCommand = respondToCommand;
tcfg.model = g_model;
tcfg.labels = kCategoryLabels;
tfl.begin(tcfg);
}
void loop() { copier.copy(); }
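When one of the trained words is detected, the callback prints a line of the form `Result: <command>, score: <score>, is_new: true` to the serial monitor.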
The key information that needs to be provided in the TensorFlow configuration is:
- the number of channels
- the sample rate
- the kTensorArenaSize
- a callback for handling the responses (respondToCommand)
- the model (a sketch of the model.h header is shown after this list)
- the labels
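For reference, model.h usually just exposes the trained network as a C byte array. The following is a minimal sketch of such a header; the exact layout is an assumption and depends on how the model was exported (a common approach is to convert the .tflite file with xxd and rename the array to g_model):

```cpp
// model.h - minimal sketch of a generated model header (assumed layout).
// The array data is typically produced from the .tflite file, e.g. with:
//   xxd -i micro_speech.tflite > model.h
#ifndef MODEL_H
#define MODEL_H

extern const unsigned char g_model[];  // TFLite flatbuffer with the trained model
extern const int g_model_len;          // size of g_model in bytes

#endif  // MODEL_H
```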
The TfLiteAudioOutput class uses a Fast Fourier Transform (FFT) to convert slices of audio data (defined by kFeatureSliceStrideMs and kFeatureSliceDurationMs) into frequency arrays of length kFeatureSliceSize. Each FFT result is used to update a spectrogram of size kFeatureSliceSize x kFeatureSliceCount. After kSlicesToProcess (2) new FFT results have been appended at the end, TensorFlow evaluates the updated spectrogram to calculate the classification result. These results are then post-processed (in the TfLiteRecognizeCommands class) to make sure that the output is stable.
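To illustrate the sliding-window update described above, here is a minimal sketch. The constant values are assumptions, and computeFftSlice and runInference are hypothetical placeholders standing in for the library's FFT and inference steps, not its actual internals:

```cpp
#include <stdint.h>
#include <string.h>

// Assumed values; the real constants come from the library configuration.
const int kFeatureSliceSize  = 40;  // frequency bins per FFT result
const int kFeatureSliceCount = 49;  // FFT results kept in the spectrogram
const int kSlicesToProcess   = 2;   // new results added before each inference

int8_t spectrogram[kFeatureSliceCount][kFeatureSliceSize];

// Hypothetical helpers standing in for the library's FFT and inference steps.
void computeFftSlice(int8_t* slice) { /* FFT of the latest audio slice */ }
void runInference(int8_t (*features)[kFeatureSliceSize]) { /* TFLite invoke */ }

void addSlicesAndClassify() {
  // Drop the oldest slices to make room at the end of the spectrogram
  memmove(spectrogram[0], spectrogram[kSlicesToProcess],
          (kFeatureSliceCount - kSlicesToProcess) * kFeatureSliceSize);
  // Append kSlicesToProcess freshly computed FFT results
  for (int i = kFeatureSliceCount - kSlicesToProcess; i < kFeatureSliceCount; i++) {
    computeFftSlice(spectrogram[i]);
  }
  // Evaluate the updated spectrogram with the model
  runInference(spectrogram);
}
```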
The sketch depends on the following libraries:
- Arduino Audio Tools
- tflite-micro-arduino-examples
- arduino-audiokit (only needed if you use an AudioKit board)
The full example can be found on GitHub.