TensorFlow Lite MicroSpeech
I have adapted the MicroSpeech example from TensorFlow Lite to follow the philosophy of this framework. The example uses a pre-trained model that can recognise the words 'yes' and 'no'. The output stream class is TfLiteAudioOutput. In the example I am using an AudioKit board, but you can replace this with any type of microphone.
Here is the complete Arduino Sketch:
#include "AudioTools.h"
#include "AudioLibs/AudioKit.h"
#include "AudioLibs/TfLiteAudioOutput.h"
#include "model.h" // tensorflow model
AudioKitStream kit; // Audio source
TfLiteAudioOutput<4> tfl; // Audio sink
const char* kCategoryLabels[4] = {
"silence",
"unknown",
"yes",
"no",
};
StreamCopy copier(tfl, kit); // copy mic to tfl
int channels = 1;
int samples_per_second = 16000;
// Command callback handler
void respondToCommand(const char* found_command, uint8_t score,
bool is_new_command) {
if (is_new_command) {
char buffer[80];
sprintf(buffer, "Result: %s, score: %d, is_new: %s", found_command, score,
is_new_command ? "true" : "false");
Serial.println(buffer);
}
}
void setup() {
Serial.begin(115200);
AudioLogger::instance().begin(Serial, AudioLogger::Warning);
// setup Audiokit
auto cfg = kit.defaultConfig(RX_MODE);
cfg.input_device = AUDIO_HAL_ADC_INPUT_LINE2;
cfg.channels = channels;
cfg.sample_rate = samples_per_second;
cfg.use_apll = false;
cfg.auto_clear = true;
cfg.buffer_size = 512;
cfg.buffer_count = 16;
kit.begin(cfg);
// Setup tensorflow
auto tcfg = tfl.defaultConfig();
tcfg.channels = channels;
tcfg.sample_rate = samples_per_second;
tcfg.kTensorArenaSize = 10 * 1024;
tcfg.respondToCommand = respondToCommand;
tcfg.model = g_model;
tcfg.labels = kCategoryLabels;
tfl.begin(tcfg);
}
void loop() { copier.copy(); }
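When one of the trained words is detected, the callback prints a line of the form `Result: <command>, score: <score>, is_new: true` to the serial monitor.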
The key information that needs to be provided in the TensorFlow configuration is:
- the number of channels
- the sample rate
- the kTensorArenaSize
- a callback for handling the responses (respondToCommand)
- the model (a sketch of the model.h header is shown after this list)
- the labels
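For reference, model.h usually just exposes the trained network as a C byte array. The following is a minimal sketch of such a header; the exact layout is an assumption and depends on how the model was exported (a common approach is to convert the .tflite file with xxd and rename the array to g_model):

```cpp
// model.h - minimal sketch of a generated model header (assumed layout).
// The array data is typically produced from the .tflite file, e.g. with:
//   xxd -i micro_speech.tflite > model.h
#ifndef MODEL_H
#define MODEL_H

extern const unsigned char g_model[];  // TFLite flatbuffer with the trained model
extern const int g_model_len;          // size of g_model in bytes

#endif  // MODEL_H
```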
The TfLiteAudioOutput class uses a Fast Fourier Transform (FFT) to convert slices of audio data (defined by kFeatureSliceStrideMs and kFeatureSliceDurationMs) into frequency arrays of length kFeatureSliceSize. Each FFT result is used to update a spectrogram of size kFeatureSliceSize x kFeatureSliceCount. After kSlicesToProcess (2) new FFT results have been appended at the end, TensorFlow evaluates the updated spectrogram to calculate the classification result. These results are then post-processed (in the TfLiteRecognizeCommands class) to make sure that the output is stable.
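To illustrate the sliding-window update described above, here is a minimal sketch. The constant values are assumptions, and computeFftSlice and runInference are hypothetical placeholders standing in for the library's FFT and inference steps, not its actual internals:

```cpp
#include <stdint.h>
#include <string.h>

// Assumed values; the real constants come from the library configuration.
const int kFeatureSliceSize  = 40;  // frequency bins per FFT result
const int kFeatureSliceCount = 49;  // FFT results kept in the spectrogram
const int kSlicesToProcess   = 2;   // new results added before each inference

int8_t spectrogram[kFeatureSliceCount][kFeatureSliceSize];

// Hypothetical helpers standing in for the library's FFT and inference steps.
void computeFftSlice(int8_t* slice) { /* FFT of the latest audio slice */ }
void runInference(int8_t (*features)[kFeatureSliceSize]) { /* TFLite invoke */ }

void addSlicesAndClassify() {
  // Drop the oldest slices to make room at the end of the spectrogram
  memmove(spectrogram[0], spectrogram[kSlicesToProcess],
          (kFeatureSliceCount - kSlicesToProcess) * kFeatureSliceSize);
  // Append kSlicesToProcess freshly computed FFT results
  for (int i = kFeatureSliceCount - kSlicesToProcess; i < kFeatureSliceCount; i++) {
    computeFftSlice(spectrogram[i]);
  }
  // Evaluate the updated spectrogram with the model
  runInference(spectrogram);
}
```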
The sketch depends on the following libraries:
- Arduino Audio Tools
- tflite-micro-arduino-examples
- arduino-audiokit (only needed if you use an AudioKit board)
The full example can be found on GitHub.