Commit cdd4187

Initial commit from 20240126 11GB card image. (#1)
* Initial commit from 20240126 11GB card image. Identical files flashed to AI in a Box microSD.
* Update README. Adds image download link for quick setup. Adds two new images. Update existing image. Table of Contents. Text clarifications on functionality and operation such as translation language support and external devices.
* Removes invalid script line. Copied in from staging repo.
* Moves support for external devices. Move this section under "Connectors and Buttons". Some test changes on support.
* Emphasize the top USB-C connector is for power. The side USB-C connector is not for powering AI in a Box (side USB-C is for USB keyboard, not power).
1 parent 8d1c1ae commit cdd4187

40 files changed, +3353 -0 lines changed

Diff for: LICENSE

+674
Large diffs are not rendered by default.

Diff for: README.md

+395
@@ -0,0 +1,395 @@
# AI in a Box repo introduction.

AI in a Box from [Useful Sensors](https://usefulsensors.com/) showcases
speech-based AI applications. All models are on-device and run locally with
no internet connection, so they are private by design. It ships with a bootable
microSD card containing the Ubuntu Server operating system and application
code.

This repo provides open source code and setup instructions for the microSD
card.

We do not plan to maintain this repo and encourage interested parties to make
forks.

- [AI in a Box repo introduction.](#ai-in-a-box-repo-introduction)
- [Application modes.](#application-modes)
- [Connectors and buttons.](#connectors-and-buttons)
  - [Support for external devices.](#support-for-external-devices)
- [Installation.](#installation)
  - [Quick setup.](#quick-setup)
  - [Full installation from baseline image.](#full-installation-from-baseline-image)
    - [Boot and initial sanity checks.](#boot-and-initial-sanity-checks)
    - [Software.](#software)
    - [Model download and extraction.](#model-download-and-extraction)
    - [Permissions for scripts.](#permissions-for-scripts)
    - [Test run AI in a Box.](#test-run-ai-in-a-box)
    - [Startup service.](#startup-service)
    - [Optional steps.](#optional-steps)
- [Model details.](#model-details)
- [Contributors.](#contributors)

# Application modes.

AI in a Box has three speech-driven modes with different display layouts.

| Mode      | Wake word(s)       | Notes                                             |
| --------- | ------------------ | ------------------------------------------------- |
| Caption   | "caption"          | Transcription in English. USB keyboard.           |
| Chatty    | "chatty"           | Answers questions in English. LLM 4-bit weights.  |
| Translate | "translate x to y" | e.g.: translate French to German.                 |

Apply power to the top USB-C connector (not the side USB-C connector).

AI in a Box boots into caption mode for continuous transcription in English.

<img src="images/caption_mode.jpg" alt="caption mode on boot" width="500"/>

Chatty mode.

<img src="images/chatty_mode.jpg" alt="chatty mode" width="500"/>

Translate mode.

<img src="images/translate_mode.jpg" alt="translate mode" width="500"/>

The translate mode supports the languages defined in
`lang_to_flores200_dict` in this [code](/state_machine.py). It uses non-Latin
typefaces for Chinese, Japanese, Korean and Thai based on this
[code](/fontfile.py). All other selectable languages use a default Latin font.
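
The "translate x to y" wake phrase amounts to looking up two spoken language names in a table of FLORES-200 codes. The sketch below illustrates that idea only and is not the code in `state_machine.py`: `LANG_TO_FLORES200` is a hypothetical three-entry stand-in for `lang_to_flores200_dict`, and `parse_translate_command` is an illustrative helper name.

```python
import re
from typing import Optional, Tuple

# Hypothetical subset for illustration; the real mapping is
# `lang_to_flores200_dict` in state_machine.py.
LANG_TO_FLORES200 = {
    "english": "eng_Latn",
    "french": "fra_Latn",
    "german": "deu_Latn",
}

# Matches "translate <source> to <target>" anywhere in the transcript.
_COMMAND = re.compile(r"translate\s+(\w+)\s+to\s+(\w+)", re.IGNORECASE)


def parse_translate_command(text: str) -> Optional[Tuple[str, str]]:
    """Return (source_code, target_code), or None if the phrase is not usable."""
    match = _COMMAND.search(text)
    if match is None:
        return None
    source = LANG_TO_FLORES200.get(match.group(1).lower())
    target = LANG_TO_FLORES200.get(match.group(2).lower())
    if source is None or target is None:
        return None  # at least one language is not in the table
    return source, target


print(parse_translate_command("Translate French to German."))
```

Returning `None` for an unknown language keeps the caller free to stay in the current mode rather than guess.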

We describe the models used in AI in a Box [here](#model-details).

# Connectors and buttons.

Power to the top USB-C connector boots AI in a Box (do not connect power to the
side USB-C connector).

<img src="images/power.jpg" alt="power" width="500"/>

An HDMI display is optional and must be connected before boot. Some display
resolutions, such as 800x480, may not work. We find this physical connection
is not always reliable.

A LAN connection is needed for the
[full installation](#full-installation-from-baseline-image).

<img src="images/lan_usbc_keyboard.jpg" alt="lan_usbc" width="200"/>

Optional USB-C keyboard for caption mode transcription in English. This USB-C
connector does not support powering AI in a Box.

More details on external device support are
[here](#support-for-external-devices).

There are four buttons for navigating the pop-up menu:
* Up/Down keys toggle between the three [modes](#application-modes).
* Right key triggers a menu for volume and language selection.
  * Use Up/Down to navigate and Right key to select.
  * Use Left key to navigate back.

<img src="images/buttons.jpg" alt="buttons" width="200"/>

The volume selection is retained across reboots. Our default value is `50`.

## Support for external devices.
* Power supply of at least 20 W to the top connector. For USB protocol details see Rock 5A [power](https://radxa.com/products/rock5/5a#techspec) support.
* An HDMI monitor must be connected before boot, so attaching one requires a reboot. Some HDMI displays may not work, for example those with 800x480 resolution. Because this connector is recessed, some third-party cables may not function.
* Headset audio jack is not supported by AI in a Box.
* USB audio devices are not supported by AI in a Box. We added experimental script support for USB devices [here](/configure_devices.sh) but it is not reliable in our testing.
* USB keyboard requires a USB-C cable that supports data. This side connector is not used to power AI in a Box. USB keyboard has been tested with the TextEdit application on a MacBook. We ignore the macOS pop-up prompt for the unknown keyboard layout.

# Installation.

For this project we use Ubuntu Server, specifically the Jammy CLI b18 release
from
[here](https://github.com/radxa-build/rock-5a/releases). This was the release
marked "latest" on January 26, 2024. It is installed in the microSD images
described below.

The application is written as Python scripts and runs on Python 3.10.

The microSD card images have username `ubuntu` and password `ubunturock` for
SSH.

## Quick setup.

Download this compressed
[image](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/ai_in_a_box_11gb_20240126.img.gz)
then flash it to a microSD card of 16GB or larger.
```console
cd
curl -L -O https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/ai_in_a_box_11gb_20240126.img.gz
```
Flash the compressed image file `ai_in_a_box_11gb_20240126.img.gz` using
[BalenaEtcher](https://etcher.balena.io/) or another method.
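
Before flashing, it can be worth confirming that the multi-gigabyte download finished cleanly. The following is a minimal sketch, assuming only that the image is a standard gzip file as its `.img.gz` name suggests; `gzip_is_intact` is an illustrative helper and not part of this repo.

```python
import gzip
import zlib


def gzip_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the whole .gz file decompresses without error.

    Reading to the end makes the gzip module check the CRC32 and length
    stored in the file trailer, so truncated or corrupted downloads fail.
    """
    try:
        with gzip.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass  # discard the data; only the integrity check matters
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

For example, `gzip_is_intact("ai_in_a_box_11gb_20240126.img.gz")` reads the archive in 1 MiB chunks, so memory use stays small regardless of image size.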

Insert the flashed microSD card in AI in a Box after removing the four screws
securing the rear panel. Connect USB-C power to the top connector to boot
AI in a Box into caption mode.

<img src="images/booting.jpg" alt="booting..." width="150"/>

After around 60 seconds "Ready..." appears on the display.

<img src="images/ready.jpg" alt="ready" width="150"/>

AI in a Box is now listening for speech.

## Full installation from baseline image.

AI in a Box has custom hardware for the display, audio and USB keyboard. For
the full installation or experimentation we provide a baseline microSD card
image with the OS and the overlays and configuration needed for the custom
hardware. You will also need GitHub access to complete these steps.

This baseline image does not include our application code, which is added
during this installation. The preparation of this image is not documented in
this repo. It was created on a SanDisk A1 16GB microSD card
(SDSQUAR-O16G-GN6MN with 15,931,539,456 bytes of storage).

Download this
[compressed image](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/ai_in_a_box_baseline_16gb_20240125.img.gz).
```console
cd
curl -L -O https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/ai_in_a_box_baseline_16gb_20240125.img.gz
```
Flash the compressed image file `ai_in_a_box_baseline_16gb_20240125.img.gz`
using BalenaEtcher or another method to a microSD card. The image file is not
needed for the rest of this installation.

Insert the flashed microSD card into AI in a Box after removing the four screws
securing the rear panel.

### Boot and initial sanity checks.

Connect AI in a Box to the LAN and power up using the provided USB-C
charger. A prompt will appear on the display.

Identify the IP address with the command `nmap -Pn -p22 --open 192.168.1.0/24`
on a Mac computer (adjust the subnet to match your network), or with a USB
keyboard and the `ip a` command. Log in through SSH
with `ubuntu` / `ubunturock`.
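
If `nmap` is not available, the same search for hosts with port 22 open can be approximated with the Python standard library. This is a rough sketch and not part of this repo: the helper names are illustrative, the `192.168.1.0/24` default is simply the subnet from the command above, and the scan is sequential, so a /24 network can take over a minute with the timeout shown.

```python
import ipaddress
import socket
from typing import List


def candidate_hosts(subnet: str) -> List[str]:
    """List the usable host addresses in a subnet such as 192.168.1.0/24."""
    return [str(ip) for ip in ipaddress.ip_network(subnet).hosts()]


def port_is_open(host: str, port: int = 22, timeout: float = 0.3) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def find_ssh_hosts(subnet: str = "192.168.1.0/24") -> List[str]:
    """Return the addresses in the subnet that accept connections on port 22."""
    return [host for host in candidate_hosts(subnet) if port_is_open(host)]
```

Calling `find_ssh_hosts()` returns every address that answers on port 22, which may include machines other than AI in a Box, so confirm by logging in.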

Optional: check that the custom hardware interfaces are available with these
commands.

For the input device, check for
`<alsa_input.platform-uctronics-sound.stereo-fallback>`.
```console
pacmd list-sources | grep -e 'name:' -e 'index:' -e 'spec:'
```

For the output device, check for
`<alsa_output.platform-uctronics-sound.stereo-fallback>`.
```console
pacmd list-sinks | grep -e 'name:' -e 'index:' -e 'spec:'
```

For the serial port needed by the USB keyboard feature, check for `/dev/ttyS6`.
```console
ls /dev/ttyS*
```
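
The three checks above can also be scripted. This is a minimal sketch under stated assumptions: it relies on `pacmd` printing device names as `name: <...>`, as the `grep` commands above expect, and the helper names are illustrative rather than taken from this repo.

```python
import os
import re
import subprocess
from typing import List

# Device names and serial port taken from the checks described above.
EXPECTED_SOURCE = "alsa_input.platform-uctronics-sound.stereo-fallback"
EXPECTED_SINK = "alsa_output.platform-uctronics-sound.stereo-fallback"
SERIAL_PORT = "/dev/ttyS6"


def device_names(pacmd_output: str) -> List[str]:
    """Extract every `name: <...>` entry from pacmd list output."""
    return re.findall(r"name: <([^>]+)>", pacmd_output)


def has_device(kind: str, expected: str) -> bool:
    """Run `pacmd list-sources` or `pacmd list-sinks` and look for a name.

    `kind` is "sources" or "sinks". This only works where pacmd is installed.
    """
    result = subprocess.run(
        ["pacmd", f"list-{kind}"], capture_output=True, text=True, check=False
    )
    return expected in device_names(result.stdout)


def serial_port_present() -> bool:
    """Return True if the serial port used by the USB keyboard feature exists."""
    return os.path.exists(SERIAL_PORT)
```

On the box, `has_device("sources", EXPECTED_SOURCE)`, `has_device("sinks", EXPECTED_SINK)` and `serial_port_present()` should all return `True` if the custom hardware is configured.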

### Software.

Set up GitHub access with an SSH key or another method and clone this repo
into the home directory.
```console
cd
git clone git@github.com:usefulsensors/ai_in_a_box.git --depth=1
```

Install the required packages.
```console
cd
sudo apt update
sudo apt upgrade -y

sudo apt-get install -y pulseaudio
sudo apt-get install -y libasound-dev portaudio19-dev
sudo apt-get install -y libportaudio2 libportaudiocpp0
sudo apt install -y libegl-dev libegl1
sudo apt-get install -y python3-dev

sudo apt install -y python3.10 pip
sudo apt install -y python3-pygame

# Run pip install as root to allow booting into demo.
sudo python3 -m pip install -r ai_in_a_box/requirements.txt
```
During the above installs you may see this prompt.
```bash
*** panfrost.conf.bak (Y/I/N/O/D/Z) [default=N] ?
```
If you see this prompt, choose the default `N`.

Set the memlock limits.
```console
sudo nano /etc/security/limits.conf

# Add these two lines before the end of the file, without the leading '#', and save.
#* soft memlock unlimited
#* hard memlock unlimited
```
![/etc/security/limits.conf](images/memlock.jpg)
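
To confirm the memlock change took effect, the limits seen by a process can be read with Python's standard `resource` module. This is a small sketch rather than part of the installation; `memlock_is_unlimited` is an illustrative name, and it should be run from a fresh SSH login because limits from `limits.conf` are applied when a session starts.

```python
import resource


def memlock_is_unlimited() -> bool:
    """Return True if both the soft and hard memlock limits are unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY and hard == resource.RLIM_INFINITY


print("memlock unlimited:", memlock_is_unlimited())
```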

### Model download and extraction.

The five models used on AI in a Box are outlined [below](#model-details).

This step downloads about 3 GB of archives over the internet and moves their
contents to locations on the card. It is best run inside a terminal
multiplexer such as `tmux` in case the SSH session disconnects. The sudo
password `ubunturock` is needed during the first install.

```console
cd
ai_in_a_box/get_model_archives.sh
```
After the download, readme files and license texts are in the `models/` folder
and model files are in the `downloaded/` folder.

### Permissions for scripts.

This script configures audio devices and is used in the launcher script.
```console
cd
chmod +x ai_in_a_box/configure_devices.sh
```

This script is the launcher for AI in a Box boot.
```console
cd
chmod +x ai_in_a_box/run_chatty.sh
```

### Test run AI in a Box.

We can make a test run of AI in a Box. This step is optional and you may
proceed to the [next section](#startup-service).

First reboot AI in a Box after the above installation.
```console
sudo reboot
```

SSH back in to AI in a Box and start the launcher script.
```console
cd
sudo ai_in_a_box/run_chatty.sh
```
AI in a Box takes around 60 seconds to start caption mode `Ready...`. Note the
launcher script is run with superuser privileges.

Ignore this warning in the SSH session.
```bash
/usr/local/lib/python3.10/dist-packages/pygame_menu/sound.py:204: UserWarning: sound error: No such device.
warn('sound error: ' + str(e))
```
The above warning is superseded by this log status.
```bash
audio input stream started successfully: True
```

If needed we can exit the application from another SSH session with this
command.
```console
sudo pkill -9 python
```
AI in a Box now displays Ubuntu's console prompt.

### Startup service.

This section describes how to configure AI in a Box to boot to the application.

Create a startup service. It will be run as superuser.
```console
sudo nano /etc/systemd/system/run-chatty-startup.service
```

Add this text and save.
```ini
[Unit]
Description=AI in a Box Startup Service

[Service]
ExecStart=/bin/sh -c '/home/ubuntu/ai_in_a_box/run_chatty.sh > /tmp/run_chatty_log.txt 2>&1'
WorkingDirectory=/home/ubuntu
StandardOutput=file:/tmp/run_chatty_log.txt
StandardError=file:/tmp/run_chatty_log.txt

[Install]
WantedBy=default.target
```
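
A typo in the unit file is easy to miss until the next reboot. The following sketch does a very shallow structural check of the unit text with Python's `configparser`; it is an illustration only, `check_unit` is a hypothetical helper, and it does not replace systemd's own validation (for example `systemd-analyze verify`).

```python
import configparser
from typing import List


def check_unit(text: str) -> List[str]:
    """Return a list of problems found in a simple systemd unit file."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # unit files use '=' only; values may contain ':'
        interpolation=None,  # do not treat '%' in values specially
        strict=False,        # tolerate repeated keys
    )
    parser.optionxform = str  # keep the case of keys such as ExecStart
    parser.read_string(text)

    problems = []
    for section in ("Unit", "Service", "Install"):
        if not parser.has_section(section):
            problems.append(f"missing [{section}] section")
    if parser.has_section("Service") and not parser.has_option("Service", "ExecStart"):
        problems.append("missing ExecStart in [Service]")
    return problems
```

Passing the contents of `/etc/systemd/system/run-chatty-startup.service` should return an empty list when the three sections and `ExecStart` are present.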

Reload the systemd configuration and enable the service to start
automatically.
```console
sudo systemctl daemon-reload
sudo systemctl enable run-chatty-startup
```

AI in a Box does not need a LAN or internet connection after this step.
Reboot to start the application.
```console
sudo reboot
```
AI in a Box will boot into [caption mode](#application-modes) `Ready...` after
about 60 seconds. Speak to the box to see a transcription on the display.

The full installation is now complete.

You may now remove and reinsert the USB-C power to power cycle AI in a Box.

### Optional steps.

Optional: we reduced our
[quick setup](#quick-setup) image size using the third-party tools `gparted`
to shrink the microSD card partition and `dd` to clone the image to ~11GB.
If your workflow requires this we recommend leaving at least 1GB of unused
space to run AI in a Box. Otherwise use all free space on your card (16GB or
larger) when running AI in a Box.

Optional: inspect the application log in an SSH session.
```console
watch -n 1 tail -n 20 /tmp/run_chatty_log.txt
```
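
The same log can be checked from a script, for example to wait until audio capture has started. This is a minimal sketch: the log path and the status line are the ones shown in this README, while `tail_lines` and `audio_ready` are illustrative helper names.

```python
from pathlib import Path
from typing import Iterable, List

LOG_PATH = "/tmp/run_chatty_log.txt"
# Status line the application logs once audio capture is running.
READY_MARKER = "audio input stream started successfully: True"


def tail_lines(path: str = LOG_PATH, count: int = 20) -> List[str]:
    """Return the last `count` lines of the log file."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-count:]


def audio_ready(lines: Iterable[str]) -> bool:
    """Return True if any line reports that the audio input stream started."""
    return any(READY_MARKER in line for line in lines)
```

On the box, `audio_ready(tail_lines())` reports whether the status line appears in the last 20 lines; pass a larger `count` if the log has grown since startup.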

Optional: remove the system startup configuration in an SSH session if booting
into the AI in a Box application is not wanted.
```console
sudo systemctl disable run-chatty-startup
sudo rm /etc/systemd/system/run-chatty-startup.service
```

Optional: remove GitHub SSH key authentication and configuration. Note that
this deletes every file in `~/.ssh`, not only the GitHub key.
```console
rm ~/.ssh/*
git config --global user.email ""
git config --global user.name ""
```

# Model details.

We provide copies of all models used in AI in a Box, each with its license,
original source URL and readme, in compressed tarball archive files. Details
for each archive file are provided in the table below.

During the full installation above we used this
[script](/get_model_archives.sh) to automate the download and extraction onto
the AI in a Box microSD card.

| Name and source URL              | Download URL | microSD location    | Task                                    |
| -------------------------------- | ------------ | ------------------- | --------------------------------------- |
| [useful-transformers_wheel.tar.gz](https://github.com/usefulsensors/useful-transformers) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/useful-transformers_wheel.tar.gz) | python3.10 package | Speech to text (S2T) in all modes |
| [nllb-200-distilled-600M.tar.gz](https://huggingface.co/facebook/nllb-200-distilled-600M) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/nllb-200-distilled-600M.tar.gz) | downloaded/ | Language translation for translate mode |
| [orca-mini-3b.tar.gz](https://huggingface.co/TheBloke/orca_mini_3B-GGML) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/orca-mini-3b.tar.gz) | downloaded/ | LLM for chatty mode |
| [piper_tts_en_US.tar.gz](https://github.com/rhasspy/piper) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/piper_tts_en_US.tar.gz) | downloaded/ | Text to speech (TTS) for chatty mode |
| [silero_vad.tar.gz](https://github.com/snakers4/silero-vad) | [link](https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/silero_vad.tar.gz) | downloaded/ | Voice activity detection |
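
All five download links in the table share one base URL, so the list can be rebuilt from the archive names alone. The sketch below shows that relationship and a plain standard-library download; it is not a transcription of `get_model_archives.sh`, and `archive_urls` and `download_archive` are illustrative names.

```python
import urllib.request
from typing import List

# Base URL and archive names copied from the table above.
BASE_URL = "https://storage.googleapis.com/download.usefulsensors.com/ai_in_a_box/"
ARCHIVES = [
    "useful-transformers_wheel.tar.gz",
    "nllb-200-distilled-600M.tar.gz",
    "orca-mini-3b.tar.gz",
    "piper_tts_en_US.tar.gz",
    "silero_vad.tar.gz",
]


def archive_urls() -> List[str]:
    """Return the full download URL for each model archive."""
    return [BASE_URL + name for name in ARCHIVES]


def download_archive(name: str, destination: str) -> None:
    """Download one archive to `destination` (not called in this sketch)."""
    urllib.request.urlretrieve(BASE_URL + name, destination)


for url in archive_urls():
    print(url)
```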

# Contributors.
* Nat Jeffries (@njeffrie)
* Manjunath Kudlur (@keveman)
* William Meng (@wlmeng11)
* Guy Nicholson (@guynich)
* James Wang (@JamesUseful)
* Pete Warden (@petewarden)
* Ali Zartash (@aliz64)
