Summary
The Reachy Mini Conversation App connects successfully to OpenAI Realtime, the
camera tool fires, the speaker plays audio — but the user's voice is never
recognized as input. The model stays idle and autonomously calls the
camera tool with self-generated questions ("What interesting detail can we
notice in the background?"), as if it never receives any user speech.
I can speak loudly 20–30 cm from the robot for 30+ seconds and see no transcript
events and no `response.created` from a user turn; only `is_idle=True` tool calls.
Environment
- Robot: Reachy Mini (Wi-Fi version)
- Daemon: v1.7.0, App: v0.9.29 (latest at the time of writing)
- OS: Reachy Mini OS (whatever ships installed by Pollen)
- Network: robot connected to my home Wi-Fi (`xft`), IP 10.1.0.34
- HF account: signed in via the dashboard
- Backend: OpenAI Realtime with the bundled OpenAI access (status: Connected)
- Personality: default profile, voice `marin` (also tried switching voices; got
  "Failed to apply voice: Load failed", but the base session stayed healthy)
What works
- Camera capture works (`tool 'camera' executed successfully` +
  `Added camera image to conversation`).
- Speaker playback works (`Using ALSA device reachymini_audio_sink for playback`).
- Antennas/head animate when the model is speaking, so audio output reaches the speaker.
- Microphone slider in the dashboard reports activity when toggled, so the device is exposed.
- Identical behaviour in two parallel sessions (after fresh install, after daemon restart).
What does not work
- No user-turn events in the daemon logs while I am speaking.
- The model never replies to anything I say; it only generates autonomous
observations of the camera scene.
- Same symptom whether I sit very close to the robot or further away.
- Speaker volume and microphone volume are both at 100% in the dashboard, yet
  nothing I say is picked up.
Daemon logs (representative window)
```
reachy_mini.media.media_server - INFO - Using ALSA device reachymini_audio_sink for playback.
... uvicorn pings, GST WebRTC pongs ...
reachy_mini.apps.manager.runner - WARNING - Tool call received — tool_name='camera', call_id=..., is_idle=True, args={
  "question": "What interesting detail can we notice in the background?"
}
reachy_mini.apps.manager.runner - WARNING - Started background tool: camera (id=call_W1MYMJRxQr7Z0bmv)
reachy_mini.apps.manager.runner - WARNING - Tool call: camera question=What interesting detail can we notice in the background?
reachy_mini.apps.manager.runner - WARNING - Tool 'camera' (id=call_W1MYMJRxQr7Z0bmv) executed successfully.
reachy_mini.apps.manager.runner - WARNING - Added camera image to conversation
```
This pattern repeats every ~15 seconds with different self-generated questions.
There are never any `transcript` / `input_audio_buffer.committed` /
`response.audio_transcript.delta` events. The session looks alive on the OpenAI
side (the camera tool call goes through and the image is attached), but the
audio stream from the Reachy microphone never produces a transcribable user turn.
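For completeness, the absence of user-turn events can be checked mechanically. A
minimal sketch that scans a captured log window for any marker of a committed
user turn (the event names are from the OpenAI Realtime API; how the daemon
surfaces them in its logs is my assumption):

```python
# Scan a daemon log window for OpenAI Realtime user-turn event names.
# Any one of these appearing would indicate user speech reached the model.

USER_TURN_MARKERS = (
    "input_audio_buffer.speech_started",
    "input_audio_buffer.committed",
    "conversation.item.input_audio_transcription.completed",
    "response.audio_transcript.delta",
)

def has_user_turn(log_lines):
    """Return the sorted list of markers found (empty list = no user turn)."""
    found = set()
    for line in log_lines:
        for marker in USER_TURN_MARKERS:
            if marker in line:
                found.add(marker)
    return sorted(found)

if __name__ == "__main__":
    import sys
    hits = has_user_turn(sys.stdin)
    print("user-turn events:", hits or "NONE")
```

Piping my captured log window through this prints `NONE`, which is the symptom
in one line.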
Independent diagnostic — what the mic ALSA layer reports
I built a small diagnostic app (Python, runs inside the Reachy app sandbox) that
runs `arecord`/`amixer` and tries every plausible capture device:

```
card 2: Audio [Reachy Mini Audio], device 0: USB Audio
'Headset' Capture 60 [100%] [0.00dB] [on]          ← unmuted, max gain
[record default]                    FAIL: Device or resource busy
[record plughw:2,0]                 FAIL: Device or resource busy
[record reachymini_audio_src]       FAIL: Channels count non available
[record plug:reachymini_audio_src]  OK peak=0 (0%) rms=0   ← opens but pure silence
```
So the USB capture device is held exclusively by the daemon's GStreamer
pipeline, and the only path open to userland (`plug:reachymini_audio_src`)
returns digital silence. From inside an app it is impossible to capture audio
in parallel. That would be fine if the daemon's pipeline were the one feeding
OpenAI Realtime, but in my case that pipeline produces no usable speech for
the model either.
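The diagnostic app boils down to the sketch below, in case anyone wants to
reproduce it. The device names are the ones observed above; the peak/RMS math
assumes signed 16-bit little-endian PCM, which is what `arecord -f S16_LE`
emits (recording duration and rate are arbitrary choices of mine):

```python
import struct
import subprocess

def pcm_stats(raw: bytes):
    """Peak and RMS of signed 16-bit little-endian PCM bytes."""
    n = len(raw) // 2
    if n == 0:
        return 0, 0.0
    samples = struct.unpack(f"<{n}h", raw[: n * 2])
    peak = max(abs(s) for s in samples)
    rms = (sum(s * s for s in samples) / n) ** 0.5
    return peak, rms

def probe(device: str, seconds: int = 2):
    """Try to record from an ALSA device; return (status, peak, rms)."""
    cmd = ["arecord", "-D", device, "-f", "S16_LE", "-r", "16000",
           "-c", "1", "-d", str(seconds), "-t", "raw"]  # raw PCM to stdout
    try:
        out = subprocess.run(cmd, capture_output=True, timeout=seconds + 5)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return f"FAIL: {exc}", 0, 0.0
    if out.returncode != 0:
        return "FAIL: " + out.stderr.decode(errors="replace").strip(), 0, 0.0
    peak, rms = pcm_stats(out.stdout)
    return "OK", peak, rms

if __name__ == "__main__":
    for dev in ("default", "plughw:2,0",
                "reachymini_audio_src", "plug:reachymini_audio_src"):
        print(dev, *probe(dev))
```

On my robot only the last device opens at all, and its peak/RMS are both 0.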
Things I have already tried
- Multiple uninstalls/reinstalls of `reachy_mini_conversation_app`.
- Daemon and app restarts (full power cycle of the Wi-Fi robot once).
- Toggling the dashboard `LISTEN` button.
- Toggling the microphone volume slider 0% → 100% several times.
- Switching personalities and voices.
- Running my own minimal app (`sparky_mini`) using the SDK's
  `media_manager.audio.start_recording()` + `get_audio_sample()`; it returns
  96000 samples per 6 seconds at peak=0.0000 (pure silence).
- Verifying the app stays alive (no crashes after I added the proper
  `if __name__ == "__main__":` block and `wrapped_run()`).
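The `sparky_mini` check is essentially the following sketch. The
`start_recording()`/`get_audio_sample()` calls are the SDK names used above;
the `stop_recording()` call, the silence threshold, and the assumption that
samples are floats in [-1, 1] are mine and may not match the real SDK exactly:

```python
import time

SILENCE_PEAK = 1e-4  # assumed float samples in [-1, 1]; below this = silence

def peak_of(samples):
    """Peak absolute amplitude of a flat sequence of audio samples."""
    return max((abs(s) for s in samples), default=0.0)

def record_and_check(audio, seconds=6.0):
    """Record for `seconds` via the SDK audio interface.

    Returns (sample_count, peak, is_silent).
    """
    audio.start_recording()
    time.sleep(seconds)
    chunk = audio.get_audio_sample()   # SDK call as used in sparky_mini
    audio.stop_recording()             # hypothetical; SDK name may differ
    peak = peak_of(chunk)
    return len(chunk), peak, peak < SILENCE_PEAK
```

With the real `media_manager.audio` object this reports 96000 samples,
peak=0.0000, is_silent=True on my robot, i.e. the numbers quoted above.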
Questions for Pollen
- Is there a known issue where the daemon's GStreamer mic pipeline runs but
produces silence (peak=0) on the Wi-Fi version of Reachy Mini?
- Is there a way from the dashboard to see the microphone real-time level
(a VU meter) so an end user can confirm the hardware actually captures
anything at all? The slider only exposes volume but no signal indication.
- Is the conversation app expected to work on the bundled OpenAI access for
end users, or do users need their own `OPENAI_API_KEY` for OpenAI Realtime?
- Any debug flag to dump the raw input audio buffer that the conversation app
sends to OpenAI Realtime, so we can confirm whether non-zero audio is
reaching the API?
Happy to run any additional diagnostics; I have full app-level access on the
robot via the dashboard and via custom apps, but no SSH access.
Thanks!