Conversation app on Wi-Fi Reachy Mini: OpenAI Realtime never receives user audio (model stays idle, autonomous camera tool calls) #337

## Summary

The Reachy Mini Conversation App connects successfully to OpenAI Realtime, the
camera tool fires, and the speaker plays audio, but **the user's voice is never
recognized as input**. The model stays idle and autonomously calls the
`camera` tool with self-generated questions (e.g. "What interesting detail can
we notice in the background?"), as if it never receives any user speech.

I can talk loudly 20–30 cm from the robot for 30+ seconds and see no transcript
events and no `response.created` for a user turn, only tool calls logged with
`is_idle=True`.

## Environment

- Robot: **Reachy Mini** (Wi-Fi version)
- Daemon: `v1.7.0`, App: `v0.9.29` (latest at the time of writing)
- OS: Reachy Mini OS (stock image as shipped by Pollen)
- Network: Robot connected to my home Wi-Fi (`xft`), IP `10.1.0.34`
- HF account: signed in via the dashboard
- Backend: **OpenAI Realtime** with the **bundled** OpenAI access (status: `Connected`)
- Personality: default profile, voice `marin` (also tried switching voice — got
  "Failed to apply voice: Load failed", but base session was still healthy)

## What works

- Camera capture works (`tool 'camera' executed successfully` + `Added camera image to conversation`).
- Speaker playback works (`Using ALSA device reachymini_audio_sink for playback`).
- Antennas/head animate when the model is speaking, so audio output reaches the speaker.
- Microphone slider in the dashboard reports activity when toggled, so the device is exposed.
- Behaviour is identical across two separate sessions (one after a fresh install, one after a daemon restart).

## What does not work

- No user-turn events in the daemon logs while I am speaking.
- The model never replies to anything I say; it only generates autonomous
  observations of the camera scene.
- Same symptom whether I sit very close to the robot or further away.
- Speaker volume and microphone volume both at 100% in the dashboard.

## Daemon logs (representative window)

```
reachy_mini.media.media_server - INFO - Using ALSA device reachymini_audio_sink for playback.
... uvicorn pings, GST WebRTC pongs ...
reachy_mini.apps.manager.runner - WARNING - Tool call received — tool_name='camera', call_id=..., is_idle=True, args={
  "question": "What interesting detail can we notice in the background?"
}
reachy_mini.apps.manager.runner - WARNING - Started background tool: camera (id=call_W1MYMJRxQr7Z0bmv)
reachy_mini.apps.manager.runner - WARNING - Tool call: camera question=What interesting detail can we notice in the background?
reachy_mini.apps.manager.runner - WARNING - Tool 'camera' (id=call_W1MYMJRxQr7Z0bmv) executed successfully.
reachy_mini.apps.manager.runner - WARNING - Added camera image to conversation
```

This pattern repeats every ~15 seconds with different self-generated questions.
No `transcript`, `input_audio_buffer.committed`, or
`response.audio_transcript.delta` events ever appear. The session looks alive
on the OpenAI side (the camera tool call goes through and the image is
attached), but the audio stream from the Reachy microphone never produces a
transcribable user turn.
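For anyone trying to reproduce this, the check I did by eye can be sketched as a small log scan. This is only an illustration: `summarize_log` and the regular expression are my own hypothetical helpers, written against the log format shown above, and the event names are simply the ones I was looking for.

```python
import re

# Substrings that would indicate speech-related events, as listed in this report.
SPEECH_EVENT_MARKERS = (
    "transcript",
    "input_audio_buffer.committed",
    "response.audio_transcript.delta",
)

# Matches the "Tool call received" lines in the daemon log excerpt above.
TOOL_CALL_RE = re.compile(
    r"Tool call received .*tool_name='(?P<tool>[^']+)'.*is_idle=(?P<idle>True|False)"
)


def summarize_log(lines):
    """Count idle-triggered tool calls and lines mentioning speech events."""
    idle_tool_calls = 0
    speech_events = 0
    for line in lines:
        match = TOOL_CALL_RE.search(line)
        if match and match.group("idle") == "True":
            idle_tool_calls += 1
        if any(marker in line for marker in SPEECH_EVENT_MARKERS):
            speech_events += 1
    return {"idle_tool_calls": idle_tool_calls, "speech_events": speech_events}
```

In my logs the first counter keeps growing while the second stays at zero for the whole session.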

## Independent diagnostic — what the mic ALSA layer reports

I built a small diagnostic app (Python, runs inside the Reachy app sandbox) that
runs `arecord`/`amixer` and tries every plausible capture device:

```
card 2: Audio [Reachy Mini Audio], device 0: USB Audio
'Headset' Capture 60 [100%] [0.00dB] [on]            ← unmuted, max gain

[record default]                    FAIL: Device or resource busy
[record plughw:2,0]                 FAIL: Device or resource busy
[record reachymini_audio_src]       FAIL: Channels count non available
[record plug:reachymini_audio_src]  OK peak=0 (0%) rms=0    ← opens but pure silence
```

So the USB capture device is held exclusively by the daemon's GStreamer
pipeline, and the only path that's open to userland (`plug:reachymini_audio_src`)
returns digital silence. From inside an app it is impossible to capture audio
in parallel — which is fine if the daemon's pipeline is the one feeding
OpenAI Realtime, but in my case that pipeline produces no usable speech for
the model either.
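The core of the diagnostic is roughly the following. Treat it as a minimal sketch rather than the exact app: the function names are mine, the 16 kHz mono S16_LE capture settings are an assumption, and it needs `arecord` on the PATH.

```python
import math
import struct
import subprocess


def pcm16_levels(raw: bytes):
    """Return (peak, rms) for little-endian signed 16-bit PCM bytes."""
    count = len(raw) // 2
    if count == 0:
        return 0, 0.0
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / count)
    return peak, rms


def probe(device: str, seconds: int = 2) -> str:
    """Record raw PCM from one ALSA device and report its level or the failure."""
    cmd = ["arecord", "-q", "-D", device, "-f", "S16_LE", "-r", "16000",
           "-c", "1", "-d", str(seconds), "-t", "raw"]  # raw PCM goes to stdout
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=seconds + 5)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"FAIL: {exc}"
    if result.returncode != 0:
        err = result.stderr.decode(errors="replace").strip()
        return f"FAIL: {err.splitlines()[-1] if err else result.returncode}"
    peak, rms = pcm16_levels(result.stdout)
    return f"OK peak={peak} ({100 * peak // 32768}%) rms={rms:.0f}"

# Example: for name in ("default", "plughw:2,0", "plug:reachymini_audio_src"):
#              print(f"[record {name}]", probe(name))
```

A device that opens but yields `peak=0` for several seconds is returning digital silence, not just a quiet room, since a live microphone always carries some noise floor.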

## Things I have already tried

- Uninstalling and reinstalling `reachy_mini_conversation_app` several times.
- Restarting the daemon and the app (plus one full power cycle of the Wi-Fi robot).
- Toggling the dashboard `LISTEN` button.
- Toggling microphone volume slider 0% → 100% several times.
- Switching personalities and voices.
- Running my own minimal app (`sparky_mini`) using the SDK
  `media_manager.audio.start_recording()` + `get_audio_sample()`: it returns
  96000 samples per 6 seconds (i.e. 16 kHz) at peak=0.0000 (pure silence).
- Verifying the app stays alive (no crashes after I added the proper
  `if __name__ == "__main__"` block and `wrapped_run()`).

## Questions for Pollen

1. Is there a known issue where the daemon's GStreamer mic pipeline runs but
   produces silence (peak=0) on the Wi-Fi version of Reachy Mini?
2. Is there a way from the dashboard to see the microphone *real-time level*
   (a VU meter) so an end user can confirm the hardware actually captures
   anything at all? The slider only sets the volume; it gives no indication of signal.
3. Is the conversation app expected to work on the bundled OpenAI access for
   end users, or do users need their own `OPENAI_API_KEY` for OpenAI Realtime?
4. Any debug flag to dump the raw input audio buffer that the conversation app
   sends to OpenAI Realtime, so we can confirm whether non-zero audio is
   reaching the API?
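To make question 4 concrete, this is the kind of check I have in mind, as a sketch only. It assumes the app sends audio over the WebSocket transport as `input_audio_buffer.append` events carrying base64-encoded 16-bit PCM (the Realtime API's `pcm16` format); I do not know whether the conversation app actually works this way, and `append_event_peak` is a hypothetical helper that would need a hook on the outgoing messages.

```python
import base64
import json
import struct


def append_event_peak(event_json: str):
    """Return the peak sample of an `input_audio_buffer.append` event.

    Returns None for any other event type. Assumes the `audio` field holds
    base64-encoded little-endian 16-bit PCM.
    """
    event = json.loads(event_json)
    if event.get("type") != "input_audio_buffer.append":
        return None
    raw = base64.b64decode(event["audio"])
    count = len(raw) // 2
    if count == 0:
        return 0
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return max(abs(s) for s in samples)
```

If every outgoing append event reports a peak of 0, the silence originates on the robot before the audio reaches OpenAI; a non-zero peak would point at the session or turn-detection configuration instead.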

Happy to run any additional diagnostics. I have full app-level access on the
robot via the dashboard and via custom apps; I just don't have SSH access.

Thanks!

