TinyML Audio Wakeword Detection
TinyML Keyword Spotting
This article looks at the methods and algorithms that use microphone data for tasks such as audio wake-word detection and keyword spotting. We examine the techniques behind services like "OK Google" and "Alexa," focusing on the neural network at the core of these applications and, more specifically, on building a keyword spotting app for IoT devices such as microcontrollers without writing code.
What is Keyword Spotting?
Keyword spotting is a low-power, end-to-end solution in which an always-on microphone listens to the audio in its surroundings for a known keyword, such as "hey google" or "alexa," and generates a signal that triggers further processing or a meaningful action.
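To make the "listen, then signal" idea concrete, here is a minimal sketch in Python. It assumes some model has already produced a keyword score between 0 and 1 for each short audio frame; the function name detect_keyword, the threshold, and the smoothing window are illustrative choices, not part of any specific product.

```python
from collections import deque

def detect_keyword(scores, threshold=0.8, window=3):
    """Return frame indices at which a keyword trigger would fire.

    scores: per-frame keyword scores in [0, 1] (assumed to come from a model).
    The last `window` scores are averaged so one noisy frame cannot trigger.
    """
    recent = deque(maxlen=window)  # sliding window of the latest scores
    armed = True                   # prevents repeated triggers for one utterance
    triggers = []
    for i, score in enumerate(scores):
        recent.append(score)
        avg = sum(recent) / len(recent)
        if armed and len(recent) == window and avg >= threshold:
            triggers.append(i)     # this is where the device would "wake up"
            armed = False
        elif avg < threshold:
            armed = True           # re-arm once the smoothed score drops again
    return triggers

# Two simulated utterances of the keyword in a stream of frame scores
scores = [0.1, 0.2, 0.9, 0.95, 0.9, 0.3, 0.1, 0.1, 0.85, 0.9, 0.95, 0.2]
triggers = detect_keyword(scores)
print(triggers)  # [4, 10]
```

Averaging over a few frames trades a small delay for fewer false wake-ups, which matters for a device that listens all the time.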
Applications of Keyword Spotting
Keyword spotting (KWS) provides a crucial user interface for many mobile and edge applications, including phones, wearables, and automobiles. Because KWS systems are often "always on," their usefulness depends on maximizing both accuracy and power efficiency. In this article, we demonstrate hardware-aware training (HAT), a method for creating new KWS neural networks for the Arduino Nano BLE Sense that achieve low parameter counts and state-of-the-art (SotA) accuracy.
Privacy and low cost are highlights of this app, making it scalable across industries and verticals, including smart homes (e.g., appliances and switches), smart cities (e.g., parking meters and smart locks), and many others.
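To see why parameter count matters on a microcontroller, the following sketch counts the parameters of a small fully connected network and estimates how much memory its weights need. The layer sizes are hypothetical and chosen only for illustration; they are not the architecture used in this article.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical network: 250 input features, two hidden layers, 2 output classes
layers = [250, 32, 32, 2]
params = dense_param_count(layers)

bytes_int8 = params * 1      # one byte per parameter after 8-bit quantization
bytes_float32 = params * 4   # four bytes per parameter in 32-bit floating point

print(params)                           # 9154
print(round(bytes_int8 / 1024, 2))      # 8.94 (KiB)
print(round(bytes_float32 / 1024, 2))   # 35.76 (KiB)
```

Under these assumptions, storing the same network in 8-bit form takes a quarter of the memory of the 32-bit version, which is often the difference between fitting on a small board or not.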
Building Audio Wake-Word App on No Code Platform Cainvas
As shown in the picture above, the app requires a board with a microphone to capture the real-time audio waveform, which is converted to the frequency domain and fed to a small deep neural network with a memory footprint of less than 12 KB. The network is compact and computationally efficient, drawing only a few milliwatts of power. The app is optimized for boards built on Cortex™-M series CPUs and can run on a coin battery for months.
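The frequency-domain step can be sketched as follows. This is a generic log-magnitude spectrogram written with the Python standard library only, not the feature extractor used on Cainvas; the frame length, hop size, and the synthetic 1 kHz test tone are assumptions made for illustration.

```python
import cmath
import math

def hann(n):
    """Hann window that tapers each frame to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def log_spectrogram(samples, frame_len=64, hop=32):
    """Split the waveform into overlapping frames and take the log spectrum of each."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frame = [s * w for s, w in zip(chunk, window)]
        frames.append([math.log(m + 1e-6) for m in magnitude_spectrum(frame)])
    return frames

# Synthetic input: a 1 kHz sine wave sampled at 8 kHz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = log_spectrogram(tone)

# Each bin spans sample_rate / frame_len = 125 Hz, so 1 kHz lands in bin 8
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)  # 7 33 8
```

A real deployment would use an FFT and usually a mel filter bank to keep the feature count small, but the shape of the data is the same: one short vector of frequency energies per time frame, which is what the neural network consumes.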
Most importantly, the application can be customized and compiled in a matter of a few minutes.
Steps on Cainvas
Step 1: Train
The AUWW platform lets you record live hot-word audio samples or upload recorded samples as WAV files. It analyzes the recordings and converts them into an audio dataset suitable for training the proprietary model, as shown in Figure 1 above.
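The platform's own data pipeline is proprietary, so the following is only a rough sketch of what turning WAV recordings into a training dataset can look like. It assumes 16-bit mono recordings and a fixed example length of one second; the helper names and the in-memory demo recordings are made up for the example so that it runs without any files on disk.

```python
import io
import math
import struct
import wave

def read_wav_mono16(source):
    """Read 16-bit mono PCM and return (sample_rate, samples scaled to [-1, 1))."""
    with wave.open(source, "rb") as wav:
        assert wav.getnchannels() == 1 and wav.getsampwidth() == 2
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, [v / 32768.0 for v in ints]

def fix_length(samples, length):
    """Pad with zeros or truncate so every example has the same length."""
    return (samples + [0.0] * length)[:length]

def make_demo_wav(seconds, rate=16000, freq=440.0):
    """Build an in-memory WAV file that stands in for a user recording."""
    n = int(seconds * rate)
    data = [int(32767 * 0.5 * math.sin(2 * math.pi * freq * t / rate))
            for t in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % n, *data))
    buf.seek(0)
    return buf

# Hypothetical recordings of different lengths, each with a label
recordings = [(make_demo_wav(0.5), "hotword"), (make_demo_wav(1.5), "background")]

dataset = []
for source, label in recordings:
    rate, samples = read_wav_mono16(source)
    dataset.append((fix_length(samples, rate), label))  # exactly one second each

print([(len(x), label) for x, label in dataset])
# [(16000, 'hotword'), (16000, 'background')]
```

Giving every example the same length is what lets recordings of different durations be stacked into one dataset for training.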
Step 2: Test
Once the model is trained, you can record a speech sample on the test screen to perform live testing. The result for the live speech sample is shown on the next screen, with the time regions where the hot-word was detected highlighted in green.
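One plausible way to produce such time regions from a model's output is to merge consecutive frames whose score is above a threshold. The sketch below assumes per-frame scores and a fixed hop between frames; it illustrates the idea and is not the platform's actual implementation.

```python
def detection_regions(scores, hop_seconds, threshold=0.5):
    """Merge consecutive above-threshold frames into (start, end) times in seconds."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a detection region opens here
        elif score < threshold and start is not None:
            regions.append((start * hop_seconds, i * hop_seconds))
            start = None                   # the region closes
    if start is not None:                  # region still open at the end of the clip
        regions.append((start * hop_seconds, len(scores) * hop_seconds))
    return regions

# Simulated frame scores, one frame every 0.5 seconds
scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6, 0.9]
regions = detection_regions(scores, hop_seconds=0.5)
print(regions)  # [(1.0, 2.5), (3.5, 4.5)]
```

Each tuple corresponds to one highlighted span on a timeline of the recording.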
Step 3: Download & Flash
Once you’re satisfied with the testing and the accuracy of speech detection, you can choose the board and click compile. The compiled binaries are downloaded to your disk, ready to flash onto the device so you can see the app in action.
Step-by-Step Video
Here is a quick 5-minute video that takes you step by step through building a keyword spotting application on tiny electronics boards such as microcontrollers and IoT devices.
Low Code Audio Wake-Word Detection
If you are a developer and need more flexibility in your application, there is a low-code version of the audio wake-word detection application that you can copy and edit to suit your needs. Here is the link to start editing the notebook and build your app, along with a detailed guide that explains the steps in the notebook.
Summary
There is an entire chapter dedicated to audio wake-word detection in the book "Introduction to TinyML" by Rohit Sharma. If you liked the article and would like to build such applications yourself, the link to purchase the book is in the footer.