|
| 1 | +# AI in a Box repo introduction. |
| 2 | + |
| 3 | +AI in a Box from [Useful Sensors](https://usefulsensors.com/) showcases |
| 4 | +speech-based AI applications. All models are on-device and run locally with |
| 5 | +no internet connection, so they are private by design. It ships with a bootable |
| 6 | +microSD card containing the Ubuntu Server operating system and application code. |
| 7 | + |
| 8 | +This repo provides open source code and setup instructions for the microSD card. |
| 9 | + |
| 10 | +We do not plan to maintain this repo and encourage interested parties to make |
| 11 | +forks. |
| 12 | + |
| 13 | +- [AI in a Box repo introduction.](#ai-in-a-box-repo-introduction) |
| 14 | +- [Application modes.](#application-modes) |
| 15 | +- [Connectors and buttons.](#connectors-and-buttons) |
| 16 | + - [Support for external devices.](#support-for-external-devices) |
| 17 | +- [Installation.](#installation) |
| 18 | + - [Quick setup.](#quick-setup) |
| 19 | + - [Full installation from baseline image.](#full-installation-from-baseline-image) |
| 20 | + - [Boot and initial sanity checks.](#boot-and-initial-sanity-checks) |
| 21 | + - [Software.](#software) |
| 22 | + - [Model download and extraction.](#model-download-and-extraction) |
| 23 | + - [Permissions for scripts.](#permissions-for-scripts) |
| 24 | + - [Test run AI in a Box.](#test-run-ai-in-a-box) |
| 25 | + - [Startup service.](#startup-service) |
| 26 | + - [Optional steps.](#optional-steps) |
| 27 | +- [Model details.](#model-details) |
| 28 | +- [Contributors.](#contributors) |
| 29 | + |
| 30 | +# Application modes. |
| 31 | + |
| 32 | +AI in a Box has three speech-driven modes with different display layouts. |
| 33 | + |
| 34 | +| Mode | Wake word(s) | Notes | |
| 35 | +| --------- | ------------------ | ------------------------------------------------- | |
| 36 | +| Caption | "caption" | Transcription in English. USB keyboard. | |
| 37 | +| Chatty | "chatty" | Answers questions in English. LLM 4-bit weights. | |
| 38 | +| Translate | "translate x to y" | e.g.: translate French to German. | |
| 39 | + |
| 40 | +Apply power to the top USB-C connector (not the side USB-C connector). |
| 41 | + |
| 42 | +AI in a Box boots into caption mode for continuous transcription in English. |
| 43 | + |
| 44 | +<img src="images/caption_mode.jpg" alt="caption mode on boot" width="500"/> |
| 45 | + |
| 46 | +Chatty mode. |
| 47 | + |
| 48 | +<img src="images/chatty_mode.jpg" alt="chatty mode" width="500"/> |
| 49 | + |
| 50 | +Translate mode. |
| 51 | + |
| 52 | +<img src="images/translate_mode.jpg" alt="translate mode" width="500"/> |
| 53 | + |
| 54 | +The translate mode supports the selection of languages defined in |
| 55 | +`lang_to_flores200_dict` in this [code](/state_machine.py). It uses non-Latin |
| 56 | +font typefaces for Chinese, Japanese, Korean and Thai languages based on this |
| 57 | +[code](/fontfile.py). All other selectable languages use a default Latin font. |
| 58 | + |
| 59 | +We describe the models used in AI in a Box [here](#model-details). |
| 60 | + |
| 61 | +# Connectors and buttons. |
| 62 | + |
| 63 | +Power to the top USB-C connector boots AI in a Box (do not connect power to the |
| 64 | +side USB-C connector). |
| 65 | + |
| 66 | +<img src="images/power.jpg" alt="power" width="500"/> |
| 67 | + |
| 68 | +An optional HDMI display works if connected before boot (some display |
| 69 | +resolutions, such as 800x480, may not work). We find this physical connection |
| 70 | +is not always reliable. |
| 71 | + |
| 72 | +A LAN connection is needed for |
| 73 | +[full installation](#full-installation-from-baseline-image). |
| 74 | + |
| 75 | +<img src="images/lan_usbc_keyboard.jpg" alt="lan_usbc" width="200"/> |
| 76 | + |
| 77 | +Optional USB-C keyboard for caption mode transcription in English. This USB-C |
| 78 | +connector does not support powering AI in a Box. |
| 79 | + |
| 80 | +More details on external device support are |
| 81 | +[here](#support-for-external-devices). |
| 82 | + |
| 83 | +There are four buttons for navigating the pop-up menu: |
| 84 | +* Up/Down keys toggle between the three [modes](#application-modes). |
| 85 | +* Right key triggers a menu for volume and language selection. |
| 86 | +  * Use Up/Down to navigate and Right key to select. |
| 87 | +  * Use Left key to navigate back. |
| 88 | + |
| 89 | +<img src="images/buttons.jpg" alt="buttons" width="200"/> |
| 90 | + |
| 91 | +The volume selection is retained when rebooted. Our default value is `50`. |
| 92 | + |
| 93 | +## Support for external devices. |
| 94 | +* Power supply of at least 20 W to the top connector. For USB protocol details see Rock 5A [power](https://radxa.com/products/rock5/5a#techspec) support. |
| 95 | +* HDMI monitor requires reboot. However, some HDMI displays may not work, for example those with 800x480 resolution. Because this connector is recessed, some third-party cables may not function. |
| 96 | +* Headset audio jack is not supported by AI in a Box. |
| 97 | +* USB audio devices are not supported by AI in a Box. We added experimental script support for USB devices [here](/configure_devices.sh) but it is not reliable in our testing. |
| 98 | +* USB keyboard requires a USB-C cable that supports data. This side connector is not used to power AI in a Box. USB keyboard has been tested with the TextEdit application on a MacBook. We ignore the macOS pop-up prompt for the unknown keyboard layout. |
| 99 | + |
| 100 | + |
| 101 | +# Installation. |
| 102 | + |
| 103 | +For this project we use Ubuntu Server, specifically the Jammy CLI b18 release |
| 104 | +from |
| 105 | +[here](https://github.com/radxa-build/rock-5a/releases). This release was |
| 106 | +marked "latest" on 01/26/2024. It is installed in the microSD images |
| 107 | +described below. |
| 108 | + |
| 109 | +The application is written as Python scripts and runs on Python 3.10. |
| 110 | + |
| 111 | +The microSD card images have username `ubuntu` and password `ubunturock` for |
| 112 | +SSH. |
| 113 | + |
| 114 | +## Quick setup. |
| 115 | + |
| 116 | +Download this compressed |
| 117 | +[image](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/ai_in_a_box_11gb_20240126.img.gz) |
| 118 | +then flash to a microSD card of 16GB or larger. |
| 119 | +```console |
| 120 | +cd |
| 121 | +curl -L -O https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/ai_in_a_box_11gb_20240126.img.gz |
| 122 | +``` |
| 123 | +Flash the compressed image file `ai_in_a_box_11gb_20240126.img.gz` using |
| 124 | +[BalenaEtcher](https://etcher.balena.io/) or another method. |
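
A truncated or corrupted download will produce a card that does not boot, so it may be worth checking the archive before flashing. The following is a minimal sketch, assuming `gzip` is installed on the computer used for flashing; `verify_image_archive` is a hypothetical helper name and is not part of this repo.

```shell
# Sketch: confirm a downloaded .img.gz archive is intact before flashing it.
# verify_image_archive is a hypothetical helper, not a script from this repo.
verify_image_archive() {
  local archive="$1"
  if [ ! -f "$archive" ]; then
    echo "missing: $archive" >&2
    return 1
  fi
  # gzip -t tests the compressed stream and its checksum without extracting it.
  if gzip -t "$archive" 2>/dev/null; then
    echo "ok: $archive"
  else
    echo "corrupt: $archive" >&2
    return 1
  fi
}
```

For example, `verify_image_archive ~/ai_in_a_box_11gb_20240126.img.gz` prints an `ok:` line when the archive passes the integrity test. This only checks the gzip container, not the contents of the image.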
| 125 | + |
| 126 | +Insert the flashed microSD card in AI in a Box after removing the four screws |
| 127 | +securing the rear panel. Connect USB-C power to boot AI in a Box into the |
| 128 | +caption mode. |
| 129 | + |
| 130 | +<img src="images/booting.jpg" alt="booting..." width="150"/> |
| 131 | + |
| 132 | +After around 60 seconds "Ready..." appears on the |
| 133 | +display. |
| 134 | + |
| 135 | +<img src="images/ready.jpg" alt="ready" width="150"/> |
| 136 | + |
| 137 | +AI in a Box is now listening for speech. |
| 138 | + |
| 139 | +## Full installation from baseline image. |
| 140 | + |
| 141 | +AI in a Box has custom hardware for the display, audio and USB |
| 142 | +keyboard. For the full installation or experimentation we provide a baseline |
| 143 | +microSD card image with the OS and needed overlays and configuration for the |
| 144 | +custom hardware. You will also need GitHub access to complete these steps. |
| 145 | + |
| 146 | +This baseline image does not include our application code which is added during |
| 147 | +this installation. The preparation of this image is not documented in this |
| 148 | +repo. It was created on a SanDisk A1 16GB microSD card (SDSQUAR-O16G-GN6MN |
| 149 | +with 15,931,539,456 bytes of storage). |
| 150 | + |
| 151 | +Download this |
| 152 | +[compressed image](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/ai_in_a_box_baseline_16gb_20240125.img.gz). |
| 153 | +```console |
| 154 | +cd |
| 155 | +curl -L -O https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/ai_in_a_box_baseline_16gb_20240125.img.gz |
| 156 | +``` |
| 157 | +Flash the compressed image file `ai_in_a_box_baseline_16gb_20240125.img.gz` |
| 158 | +using BalenaEtcher or another method to a microSD card. The image file is not |
| 159 | +needed for the rest of this installation. |
| 160 | + |
| 161 | +Insert the flashed microSD card into AI in a Box after removing the four screws |
| 162 | +securing the rear panel. |
| 163 | + |
| 164 | +### Boot and initial sanity checks. |
| 165 | + |
| 166 | +Connect AI in a Box to the LAN and power up using the provided USB-C |
| 167 | +charger. A prompt will appear on the display. |
| 168 | + |
| 169 | +Identify the IP address with the command `nmap -Pn -p22 --open 192.168.1.0/24` on |
| 170 | +a Mac computer (adjust the subnet to match your LAN), or with a USB keyboard and |
| 171 | +the `ip a` command. Log in through SSH with `ubuntu` / `ubunturock`. |
| 172 | + |
| 173 | +Optional: check custom hardware interfaces are available with these commands. |
| 174 | + |
| 175 | +For input device check for |
| 176 | +`<alsa_input.platform-uctronics-sound.stereo-fallback>`. |
| 177 | +```console |
| 178 | +pacmd list-sources | grep -e 'name:' -e 'index:' -e 'spec:' |
| 179 | +``` |
| 180 | + |
| 181 | +For output device check for |
| 182 | +`<alsa_output.platform-uctronics-sound.stereo-fallback>`. |
| 183 | +```console |
| 184 | +pacmd list-sinks | grep -e 'name:' -e 'index:' -e 'spec:' |
| 185 | +``` |
| 186 | + |
| 187 | +For serial port needed for the USB keyboard feature check `/dev/ttyS6`. |
| 188 | +```console |
| 189 | +ls /dev/ttyS* |
| 190 | +``` |
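
The three checks above could be combined into one pass. This is a sketch only: `has_audio_device` and `check_custom_hardware` are hypothetical helper names, and the matching assumes the device name appears in angle brackets in the `pacmd` listing, as shown above.

```shell
# Sketch: report whether the expected custom hardware interfaces are visible.
# Both function names are hypothetical and not part of this repo.

# Reads a pacmd listing on stdin and succeeds if the named device is present.
has_audio_device() {
  local expected="$1"
  # -F matches the text literally, -q suppresses output.
  grep -qF "<${expected}>"
}

check_custom_hardware() {
  if pacmd list-sources | has_audio_device "alsa_input.platform-uctronics-sound.stereo-fallback"; then
    echo "input device found"
  else
    echo "input device missing"
  fi
  if pacmd list-sinks | has_audio_device "alsa_output.platform-uctronics-sound.stereo-fallback"; then
    echo "output device found"
  else
    echo "output device missing"
  fi
  # Serial port used by the USB keyboard feature.
  if [ -e /dev/ttyS6 ]; then
    echo "serial port found"
  else
    echo "serial port missing"
  fi
}
```

Run `check_custom_hardware` in the SSH session on AI in a Box. Any `missing` line suggests the baseline image overlays are not active.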
| 191 | + |
| 192 | +### Software. |
| 193 | + |
| 194 | +Set up GitHub access with an SSH key or another method, then clone this repo. |
| 195 | +```console |
| 196 | +git clone git@github.com:usefulsensors/ai_in_a_box.git --depth=1 |
| 197 | +``` |
| 198 | + |
| 199 | +Run installs including packages. |
| 200 | +```console |
| 201 | +cd |
| 202 | +sudo apt update |
| 203 | +sudo apt upgrade -y |
| 204 | + |
| 205 | +sudo apt-get install -y pulseaudio |
| 206 | +sudo apt-get install -y libasound-dev portaudio19-dev |
| 207 | +sudo apt-get install -y libportaudio2 libportaudiocpp0 |
| 208 | +sudo apt install -y libegl-dev libegl1 |
| 209 | +sudo apt-get install -y python3-dev |
| 210 | + |
| 211 | +sudo apt install -y python3.10 pip |
| 212 | +sudo apt install -y python3-pygame |
| 213 | + |
| 214 | +# Run pip install as root to allow booting into demo. |
| 215 | +sudo python3 -m pip install -r ai_in_a_box/requirements.txt |
| 216 | +``` |
| 217 | +During the above installs you may get prompted. |
| 218 | +```bash |
| 219 | +*** panfrost.conf.bak (Y/I/N/O/D/Z) [default=N] ? |
| 220 | +``` |
| 221 | +If you see this prompt choose default `N`. |
| 222 | + |
| 223 | +Check the memlock limits. |
| 224 | +```console |
| 225 | +sudo nano /etc/security/limits.conf |
| 226 | + |
| 227 | +# Add these two lines before end, uncomment and save. |
| 228 | +#* soft memlock unlimited |
| 229 | +#* hard memlock unlimited |
| 230 | +``` |
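
As an alternative to editing the file by hand, the two lines can be appended only when they are missing. This is a minimal sketch, assuming a root shell; `add_memlock_limits` is a hypothetical helper name. It appends at the very end of the file, and it assumes the file's closing comment is informational only.

```shell
# Sketch: append the two memlock lines only if they are not already present.
# add_memlock_limits is a hypothetical helper; pass the file to edit.
add_memlock_limits() {
  local conf="$1"
  local kind line
  for kind in soft hard; do
    line="* ${kind} memlock unlimited"
    # -x matches whole lines, -F treats the text literally (the * is not a pattern).
    if ! grep -qxF "$line" "$conf"; then
      printf '%s\n' "$line" >> "$conf"
    fi
  done
}
```

From a root shell this would be called as `add_memlock_limits /etc/security/limits.conf`. Running it twice leaves the file unchanged the second time.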
| 231 | + |
| 232 | + |
| 233 | +### Model download and extraction. |
| 234 | + |
| 235 | +The five models used on AI in a Box are outlined [below](#model-details). |
| 236 | + |
| 237 | +We download ~ 3 GB of archives over the internet and move them to locations on the |
| 238 | +card. This step is best run inside a terminal multiplexer such as `tmux` |
| 239 | +in case the SSH session disconnects. Sudo password |
| 240 | +`ubunturock` is needed during the first install. |
| 241 | + |
| 242 | +```console |
| 243 | +cd |
| 244 | +ai_in_a_box/get_model_archives.sh |
| 245 | +``` |
| 246 | +After the download, readme files and license texts are in the `models/` folder |
| 247 | +and model files are in the `downloaded/` folder. |
| 248 | + |
| 249 | + |
| 250 | +### Permissions for scripts. |
| 251 | + |
| 252 | +This script configures audio devices and is used by the launcher script. |
| 253 | +```console |
| 254 | +cd |
| 255 | +chmod +x ai_in_a_box/configure_devices.sh |
| 256 | +``` |
| 257 | + |
| 258 | +This script is the launcher for AI in a Box boot. |
| 259 | +```console |
| 260 | +cd |
| 261 | +chmod +x ai_in_a_box/run_chatty.sh |
| 262 | +``` |
| 263 | + |
| 264 | +### Test run AI in a Box. |
| 265 | + |
| 266 | +We can make a test run of AI in a Box. This step is optional and you may |
| 267 | +proceed to the [next section](#startup-service). |
| 268 | + |
| 269 | +First reboot AI in a Box after the above installation. |
| 270 | +```console |
| 271 | +sudo reboot |
| 272 | +``` |
| 273 | + |
| 274 | +SSH back in to AI in a Box and start the launcher script. |
| 275 | +```console |
| 276 | +cd |
| 277 | +sudo ai_in_a_box/run_chatty.sh |
| 278 | +``` |
| 279 | +AI in a Box takes around 60 seconds to start caption mode `Ready...`. Note the |
| 280 | +launcher script is run with superuser privileges. |
| 281 | + |
| 282 | +Ignore this warning in the SSH session. |
| 283 | +```bash |
| 284 | +/usr/local/lib/python3.10/dist-packages/pygame_menu/sound.py:204: UserWarning: sound error: No such device. |
| 285 | + warn('sound error: ' + str(e)) |
| 286 | +``` |
| 287 | +The above warning is superseded by this log status. |
| 288 | +```bash |
| 289 | +audio input stream started successfully: True |
| 290 | +``` |
| 291 | + |
| 292 | +If needed we can exit the application in another SSH session with this |
| 293 | +command. |
| 294 | +```console |
| 295 | +sudo pkill -9 python |
| 296 | +``` |
| 297 | +AI in a Box now displays Ubuntu's console prompt. |
| 298 | + |
| 299 | +### Startup service. |
| 300 | + |
| 301 | +This section describes how to configure AI in a Box to boot to the application. |
| 302 | + |
| 303 | +Create a startup service. It will be run as superuser. |
| 304 | +```console |
| 305 | +sudo nano /etc/systemd/system/run-chatty-startup.service |
| 306 | +``` |
| 307 | + |
| 308 | +Add this text and save. |
| 309 | +```bash |
| 310 | +[Unit] |
| 311 | +Description=AI in a Box Startup Service |
| 312 | + |
| 313 | +[Service] |
| 314 | +ExecStart=/bin/sh -c '/home/ubuntu/ai_in_a_box/run_chatty.sh > /tmp/run_chatty_log.txt 2>&1' |
| 315 | +WorkingDirectory=/home/ubuntu |
| 316 | +StandardOutput=file:/tmp/run_chatty_log.txt |
| 317 | +StandardError=file:/tmp/run_chatty_log.txt |
| 318 | + |
| 319 | +[Install] |
| 320 | +WantedBy=default.target |
| 321 | + |
| 322 | +``` |
| 323 | + |
| 324 | +Reload the systemd configuration and enable the service to start automatically. |
| 325 | +```console |
| 326 | +sudo systemctl daemon-reload |
| 327 | +sudo systemctl enable run-chatty-startup |
| 328 | +``` |
| 329 | + |
| 330 | +AI in a Box does not need a LAN or internet connection following this step. |
| 331 | +```console |
| 332 | +sudo reboot |
| 333 | +``` |
| 334 | +AI in a Box will boot into [caption mode](#application-modes) `Ready...` after |
| 335 | +about 60 seconds. Speak to the box to see a transcription on the display. |
| 336 | + |
| 337 | +The full installation is now complete. |
| 338 | + |
| 339 | +You may now remove and reinsert the USB-C power to hard boot AI in a Box. |
| 340 | + |
| 341 | +### Optional steps. |
| 342 | + |
| 343 | +Optional: we reduced our |
| 344 | +[quick setup](#quick-setup) image size using the third-party tools `gparted` to |
| 345 | +shrink the microSD card partition and `dd` to clone the image at ~ 11GB. |
| 346 | +If your workflow requires this we recommend leaving at |
| 347 | +least 1GB of unused space to run AI in a Box. Otherwise use all free space on |
| 348 | +your card (16GB or larger) when running AI in a Box. |
| 349 | + |
| 350 | +Optional: inspect the application log in an SSH session. |
| 351 | +```console |
| 352 | +watch -n 1 tail -n 20 /tmp/run_chatty_log.txt |
| 353 | +``` |
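
Instead of watching the log by eye, a script can wait for the audio stream status line. This is a sketch under the assumption that the status line seen in the test run is also written to `/tmp/run_chatty_log.txt` by the startup service; `wait_for_audio_ready` is a hypothetical helper name.

```shell
# Sketch: poll the application log until the audio stream line appears,
# or give up after a timeout. wait_for_audio_ready is a hypothetical helper.
wait_for_audio_ready() {
  local log="${1:-/tmp/run_chatty_log.txt}"
  local timeout="${2:-120}"   # seconds to wait before giving up
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ -f "$log" ] && grep -qF "audio input stream started successfully: True" "$log"; then
      echo "ready"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "timed out after ${timeout}s" >&2
  return 1
}
```

Calling `wait_for_audio_ready` with no arguments checks the default log path for up to 120 seconds. The timeout is an arbitrary choice that allows for the roughly 60 second startup.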
| 354 | + |
| 355 | +Optional: remove the system startup configuration in an SSH session if booting |
| 356 | +into AI in a Box application is not wanted. |
| 357 | +```console |
| 358 | +sudo systemctl disable run-chatty-startup |
| 359 | +sudo rm /etc/systemd/system/run-chatty-startup.service |
| 360 | +``` |
| 361 | + |
| 362 | +Optional: remove GitHub SSH key authentication and configuration. Note that the first command deletes every file in `~/.ssh`, including any `authorized_keys`. |
| 363 | +```console |
| 364 | +rm ~/.ssh/* |
| 365 | +git config --global user.email "" |
| 366 | +git config --global user.name "" |
| 367 | +``` |
| 368 | + |
| 369 | +# Model details. |
| 370 | + |
| 371 | +We provide copies of all models used in AI in a Box, each with license, original |
| 372 | +source URL and readme, in compressed tarball archive files. Details for |
| 373 | +each archive file are provided in the table below. |
| 374 | + |
| 375 | +During the full installation above we used this |
| 376 | +[script](/get_model_archives.sh) to automate the download and extraction onto |
| 377 | +the AI in a Box microSD card. |
| 378 | + |
| 379 | +| Name and source URL | download URL | microSD location | Task | |
| 380 | +| -------------------------------- | ------------ | ------------------- | --------------------------------------- | |
| 381 | +| [useful-transformers_wheel.tar.gz](https://github.com/usefulsensors/useful-transformers) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/useful-transformers_wheel.tar.gz) | python3.10 package | Speech to text (S2T) in all modes | |
| 382 | +| [nllb-200-distilled-600M.tar.gz](https://huggingface.co/facebook/nllb-200-distilled-600M) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/nllb-200-distilled-600M.tar.gz) | downloaded/ | Language translation for translate mode | |
| 383 | +| [orca-mini-3b.tar.gz](https://huggingface.co/TheBloke/orca_mini_3B-GGML) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/orca-mini-3b.tar.gz) | downloaded/ | LLM for chatty mode | |
| 384 | +| [piper_tts_en_US.tar.gz](https://github.com/rhasspy/piper) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/piper_tts_en_US.tar.gz) | downloaded/ | Text to speech (TTS) for chatty mode | |
| 385 | +| [silero_vad.tar.gz](https://github.com/snakers4/silero-vad) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/silero_vad.tar.gz) | downloaded/ | Voice activity detection | |
| 386 | + |
| 387 | + |
| 388 | +# Contributors. |
| 389 | +* Nat Jeffries (@njeffrie) |
| 390 | +* Manjunath Kudlur (@keveman) |
| 391 | +* William Meng (@wlmeng11) |
| 392 | +* Guy Nicholson (@guynich) |
| 393 | +* James Wang (@JamesUseful) |
| 394 | +* Pete Warden (@petewarden) |
| 395 | +* Ali Zartash (@aliz64) |