Conversation app on Wi-Fi Reachy Mini: OpenAI Realtime never receives user audio (model stays idle, autonomous camera tool calls) #337

@francescotripepi

Summary

The Reachy Mini Conversation App connects successfully to OpenAI Realtime, the
camera tool fires, and the speaker plays audio, but the user's voice is never
recognized as input. The model stays idle and autonomously calls the
camera tool with self-generated questions ("What interesting detail can we
notice in the background?"), as if it never receives any user speech.

I can talk loudly 20–30 cm from the robot for 30+ seconds and see no transcript
events and no response.created for a user turn, only is_idle=True tool calls.

Environment

  • Robot: Reachy Mini (Wi-Fi version)
  • Daemon: v1.7.0, App: v0.9.29 (latest at the time of writing)
  • OS: Reachy Mini OS (the stock image shipped by Pollen)
  • Network: Robot connected to my home Wi-Fi (xft), IP 10.1.0.34
  • HF account: signed in via the dashboard
  • Backend: OpenAI Realtime with the bundled OpenAI access (status: Connected)
  • Personality: default profile, voice marin (switching voice failed with
    "Failed to apply voice: Load failed", but the base session stayed healthy)

What works

  • Camera capture works (tool 'camera' executed successfully + Added camera image to conversation).
  • Speaker playback works (Using ALSA device reachymini_audio_sink for playback).
  • Antennas/head animate when the model is speaking, so audio output reaches the speaker.
  • Microphone slider in the dashboard reports activity when toggled, so the device is exposed.
  • Behaviour is identical in two separate sessions (one after a fresh install, one after a daemon restart).

What does not work

  • No user-turn events in the daemon logs while I am speaking.
  • The model never replies to anything I say; it only generates autonomous
    observations of the camera scene.
  • Same symptom whether I sit very close to the robot or further away.
  • Speaker volume and microphone volume both at 100% in the dashboard.

Daemon logs (representative window)

reachy_mini.media.media_server - INFO - Using ALSA device reachymini_audio_sink for playback.
... uvicorn pings, GST WebRTC pongs ...
reachy_mini.apps.manager.runner - WARNING - Tool call received — tool_name='camera', call_id=..., is_idle=True, args={
  "question": "What interesting detail can we notice in the background?"
}
reachy_mini.apps.manager.runner - WARNING - Started background tool: camera (id=call_W1MYMJRxQr7Z0bmv)
reachy_mini.apps.manager.runner - WARNING - Tool call: camera question=What interesting detail can we notice in the background?
reachy_mini.apps.manager.runner - WARNING - Tool 'camera' (id=call_W1MYMJRxQr7Z0bmv) executed successfully.
reachy_mini.apps.manager.runner - WARNING - Added camera image to conversation

This pattern repeats every ~15 seconds with different self-generated questions.
There are never any transcript, input_audio_buffer.committed, or
response.audio_transcript.delta events. The session looks alive on the OpenAI
side (the camera tool call goes through and the image is attached), but the
audio stream from the Reachy microphone never produces a transcribable user turn.
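
For anyone trying to reproduce this, the check described above can be scripted. The sketch below counts idle tool calls and lines that mention a user-turn event in a saved copy of the daemon log. It assumes the log format matches the excerpt above and that user-turn events, if any, would be logged under the names quoted in this issue; summarize_log is a hypothetical helper, not part of the Reachy SDK.

```python
import re

# Markers taken from the event names quoted in this issue. Whether the daemon
# actually logs them under these names is an assumption.
USER_TURN_MARKERS = ("input_audio_buffer.committed", "transcript")
IDLE_TOOL_CALL = re.compile(r"Tool call received .*is_idle=True")


def summarize_log(lines):
    """Count idle tool calls and lines that mention a user-turn event."""
    idle_calls = 0
    user_turn_lines = 0
    for line in lines:
        if IDLE_TOOL_CALL.search(line):
            idle_calls += 1
        if any(marker in line for marker in USER_TURN_MARKERS):
            user_turn_lines += 1
    return {"idle_tool_calls": idle_calls, "user_turn_lines": user_turn_lines}
```

On the window shown above this would report one or more idle tool calls and zero user-turn lines, which is the symptom in a single number pair.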

Independent diagnostic — what the mic ALSA layer reports

I built a small diagnostic app (Python, runs inside the Reachy app sandbox) that
runs arecord/amixer and tries every plausible capture device:

card 2: Audio [Reachy Mini Audio], device 0: USB Audio
'Headset' Capture 60 [100%] [0.00dB] [on]            ← unmuted, max gain

[record default]                    FAIL: Device or resource busy
[record plughw:2,0]                 FAIL: Device or resource busy
[record reachymini_audio_src]       FAIL: Channels count non available
[record plug:reachymini_audio_src]  OK peak=0 (0%) rms=0    ← opens but pure silence

So the USB capture device is held exclusively by the daemon's GStreamer
pipeline, and the only path that's open to userland (plug:reachymini_audio_src)
returns digital silence. From inside an app it is impossible to capture audio
in parallel. That would be fine if the daemon's pipeline were the one feeding
OpenAI Realtime, but in my case that pipeline produces no usable speech for
the model either.
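
A minimal sketch of that kind of probe, for reference. It is not the exact diagnostic app I ran. It assumes arecord is on the PATH inside the sandbox and records 16 kHz mono S16_LE; probe and peak_rms are illustrative names.

```python
import array
import math
import subprocess
import sys
import tempfile
import wave


def peak_rms(pcm16: bytes):
    """Return (peak, rms) of signed 16-bit little-endian PCM, in raw sample units."""
    samples = array.array("h")
    samples.frombytes(pcm16[: len(pcm16) // 2 * 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # WAV payloads are little-endian
    if not samples:
        return 0, 0.0
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak, rms


def probe(device: str, seconds: int = 2) -> str:
    """Record briefly from one ALSA device with arecord and report its level."""
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        cmd = ["arecord", "-q", "-D", device, "-d", str(seconds),
               "-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "wav", tmp.name]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return f"FAIL: {result.stderr.strip()}"
        with wave.open(tmp.name, "rb") as wav:
            peak, rms = peak_rms(wav.readframes(wav.getnframes()))
    return f"OK peak={peak} rms={rms:.0f}"
```

Calling probe() on each of "default", "plughw:2,0", and "plug:reachymini_audio_src" gives one result line per device in the same shape as the listing above.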

Things I have already tried

  • Multiple uninstall/reinstall cycles of reachy_mini_conversation_app.
  • Daemon and app restart (full power cycle of the Wi-Fi robot once).
  • Toggling the dashboard LISTEN button.
  • Toggling microphone volume slider 0% → 100% several times.
  • Switching personalities and voices.
  • Running my own minimal app (sparky_mini) using the SDK
    media_manager.audio.start_recording() + get_audio_sample(): it returns
    96000 samples per 6 seconds (16 kHz) at peak=0.0000 (pure silence).
  • Verifying the app stays alive (no crashes after I added the proper
    if __name__ == "__main__" block and wrapped_run()).

Questions for Pollen

  1. Is there a known issue where the daemon's GStreamer mic pipeline runs but
    produces silence (peak=0) on the Wi-Fi version of Reachy Mini?
  2. Is there a way from the dashboard to see the microphone's real-time level
    (a VU meter) so an end user can confirm the hardware actually captures
    anything at all? The slider only sets the volume and gives no signal indication.
  3. Is the conversation app expected to work on the bundled OpenAI access for
    end users, or do users need their own OPENAI_API_KEY for OpenAI Realtime?
  4. Any debug flag to dump the raw input audio buffer that the conversation app
    sends to OpenAI Realtime, so we can confirm whether non-zero audio is
    reaching the API?
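
To make question 4 concrete: the Realtime API takes microphone audio as input_audio_buffer.append client events carrying base64-encoded audio. If the app could log those events, a check like the sketch below would be enough. It assumes the session uses the pcm16 input format and that the raw JSON of each outgoing event is available somewhere; I don't know whether the conversation app exposes such a hook, and append_event_peak is a hypothetical helper.

```python
import array
import base64
import json
import sys


def append_event_peak(raw_event: str):
    """Return the peak sample of one append event, or None for other event types."""
    event = json.loads(raw_event)
    if event.get("type") != "input_audio_buffer.append":
        return None
    pcm = base64.b64decode(event["audio"])
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) // 2 * 2])  # assumes 16-bit PCM payload
    if sys.byteorder == "big":
        samples.byteswap()  # pcm16 is little-endian
    return max((abs(s) for s in samples), default=0)
```

A peak of 0 on every append event would mean silence is being sent to the API; non-zero peaks would point the investigation at the session or turn-detection side instead.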

Happy to run any additional diagnostics. I have full app-level access on the
robot via the dashboard and via custom apps, but no SSH access.

Thanks!
