Refactor RealtimeClient & RealtimeAPI to specify transport of either WebRTC (default) or WebSocket #99

Open: paulpv wants to merge 1 commit into main

paulpv commented Feb 7, 2025

Example usage for each class:

RealtimeAPI:

    const connectionType = RealtimeTransportType.WEBRTC; // or `RealtimeTransportType.WEBSOCKET`

    realtime = new RealtimeAPI({
        transportType: connectionType,
        apiKey: dangerousApiKey,
        dangerouslyAllowAPIKeyInBrowser: true,
        debug: debugRealtimeApi,
    });

    switch (connectionType) {
        case RealtimeTransportType.WEBRTC:
            setAudioOutputCallback = (audioSource) => {
                audioControl.srcObject = audioSource;
            };
            getMicrophoneCallback = async () => {
                const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
                microphone = ms.getAudioTracks()[0];
                microphone.enabled = false;
                return microphone;
            };
            break;
        case RealtimeTransportType.WEBSOCKET:
            wavStreamPlayer = new WavStreamPlayer({ sampleRate: 24000 });
            await wavStreamPlayer.connect();
            realtime.on('server.response.audio.delta', (event) => {
                const itemId = event.item_id;
                const delta = event.delta;
                const arrayBuffer = RealtimeUtils.base64ToArrayBuffer(delta);
                wavStreamPlayer.add16BitPCM(arrayBuffer, itemId);
            });
            inputAudioBuffer = new Int16Array(0);
            wavRecorder = new WavRecorder({ sampleRate: 24000 });
            await wavRecorder.begin();
            await wavRecorder.record((data) => appendInputAudio(data.mono)); // copy `appendInputAudio` from client
            break;
        default:
            throw new Error(`Unknown connection type: "${connectionType}"`);
    }

    const sessionConfig = {
        model: 'gpt-4o-mini-realtime-preview',
        voice: 'ash',
        turn_detection: null,
    };

    await realtime.connect({ sessionConfig, setAudioOutputCallback, getMicrophoneCallback });
    await realtime.updateSession(sessionConfig);
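The WebSocket branch above leans on two things it does not show: decoding a base64 audio delta into 16-bit PCM, and the `appendInputAudio` function that is meant to be copied from the client. The following is a minimal sketch of both, not the library's code. It assumes that the recorder hands over `Int16Array` chunks, that audio goes out as an `input_audio_buffer.append` event carrying a base64 `audio` field, and that the host is little-endian. `createInputAudioAppender`, `int16ToBase64`, `base64ToInt16`, and the `send` callback are hypothetical names introduced for illustration.

```javascript
// Sketch only: stand-ins for the library's base64 and buffer helpers.
// Uses Node's Buffer; a browser version would use atob/btoa instead.

// Decode a base64 audio delta into 16-bit PCM samples.
function base64ToInt16(base64) {
    const bytes = Buffer.from(base64, 'base64');
    // Copy into a fresh ArrayBuffer so the Int16Array view starts at offset 0.
    const copy = new Uint8Array(bytes.length);
    copy.set(bytes);
    // Ignore a trailing odd byte, if any.
    return new Int16Array(copy.buffer, 0, Math.floor(bytes.length / 2));
}

// Encode 16-bit PCM samples as base64 for sending.
function int16ToBase64(samples) {
    return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength).toString('base64');
}

// Concatenate two sample arrays into a new one.
function mergeInt16Arrays(left, right) {
    const merged = new Int16Array(left.length + right.length);
    merged.set(left, 0);
    merged.set(right, left.length);
    return merged;
}

// Hypothetical helper: `send(eventName, payload)` stands in for whatever
// the transport exposes for client events.
function createInputAudioAppender(send) {
    let inputAudioBuffer = new Int16Array(0);
    return {
        appendInputAudio(chunk) {
            if (chunk.length > 0) {
                send('input_audio_buffer.append', { audio: int16ToBase64(chunk) });
                // Keep a local copy of everything sent so far.
                inputAudioBuffer = mergeInt16Arrays(inputAudioBuffer, chunk);
            }
            return true;
        },
        get bufferedSamples() {
            return inputAudioBuffer.length;
        },
    };
}

// Usage with a recording stub in place of a real connection.
const sent = [];
const appender = createInputAudioAppender((type, payload) => sent.push({ type, payload }));
appender.appendInputAudio(new Int16Array([1, -2, 3]));
```

In the example above, the recorder callback would then become `(data) => appender.appendInputAudio(data.mono)`, with `send` bound to the real connection.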

RealtimeClient:

    const connectionType = RealtimeTransportType.WEBRTC; // or `RealtimeTransportType.WEBSOCKET`

    client = new RealtimeClient({
        transportType: connectionType,
        apiKey: dangerousApiKey,
        dangerouslyAllowAPIKeyInBrowser: true,
        debug: debugRealtimeApi,
    });
    client.on('close', (data) => {
        log('close', data);
        disconnect();
    });

    switch (connectionType) {
        case RealtimeTransportType.WEBRTC:
            setAudioOutputCallback = (audioSource) => {
                audioControl.srcObject = audioSource;
            };
            getMicrophoneCallback = async () => {
                const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
                microphone = ms.getAudioTracks()[0];
                microphone.enabled = false;
                return microphone;
            };
            break;
        case RealtimeTransportType.WEBSOCKET:
            wavStreamPlayer = new WavStreamPlayer({ sampleRate: 24000 });
            await wavStreamPlayer.connect();
            client.on('conversation.updated', ({ item, delta }) => {
                if (delta?.audio) {
                    wavStreamPlayer.add16BitPCM(delta.audio, item.id);
                }
            });
            wavRecorder = new WavRecorder({ sampleRate: 24000 });
            await wavRecorder.begin();
            await wavRecorder.record((data) => client.appendInputAudio(data.mono));
            break;
        default:
            throw new Error(`Unknown connection type: "${connectionType}"`);
    }

    const sessionConfig = {
        model: 'gpt-4o-mini-realtime-preview',
        voice: 'ash',
        turn_detection: null,
    };

    await client.connect({ sessionConfig, setAudioOutputCallback, getMicrophoneCallback });
    await client.updateSession(sessionConfig);
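The WebRTC branch is the same in both examples, so it could be factored into one helper. This is only a sketch of that idea, not part of the PR: `createWebRtcCallbacks` is a hypothetical name, and the audio element and `mediaDevices` object are passed in as arguments (an assumption made here so the helper can be exercised with fakes outside a browser).

```javascript
// Hypothetical helper: builds the two callbacks that the WebRTC branch
// passes to connect(). Dependencies are injected rather than read from globals.
function createWebRtcCallbacks(audioElement, mediaDevices) {
    let microphone = null;
    return {
        // Route the remote audio stream to the given audio element.
        setAudioOutputCallback: (audioSource) => {
            audioElement.srcObject = audioSource;
        },
        // Ask for microphone access and return the first audio track, muted.
        getMicrophoneCallback: async () => {
            const stream = await mediaDevices.getUserMedia({ audio: true });
            const track = stream.getAudioTracks()[0];
            if (!track) {
                throw new Error('No audio track available');
            }
            track.enabled = false; // start muted, as in the examples above
            microphone = track;
            return microphone;
        },
        // Lets the caller unmute later, e.g. for push-to-talk.
        getMicrophone: () => microphone,
    };
}
```

In a browser this would be called as `createWebRtcCallbacks(audioControl, navigator.mediaDevices)`, and the two callbacks passed to `connect` alongside `sessionConfig`.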

paulpv commented Feb 7, 2025

⚠️ NOTE: I am making a few tweaks to RealtimeClient to make it work with both WebRTC and WebSocket...

paulpv commented Feb 8, 2025

Updated, and confirmed that the following sequence appears to work over both WebRTC and WebSocket:

  1. Connect.
  2. Send the text "hello"; get an audio response.
  3. Say "knock knock"; get an audio "who's there" response.
  4. Say "tell me a story"; get an audio response, send cancelResponse, and the audio stops soon after.

paulpv commented Feb 11, 2025

NOTE that I have not added any tests [yet] to cover the WebRTC code. 😏

paulpv force-pushed the webrtc branch 3 times, most recently from df6847a to 50170ae (April 4, 2025 00:13)