Add WebRTC option #12

Open
paulpv opened this issue Feb 7, 2025 · 3 comments

paulpv commented Feb 7, 2025

I just stumbled across your excellent refactor of https://github.com/openai/openai-realtime-api-beta .

Before finding you...

While writing my https://github.com/swooby/AlfredAI native Android Phone/Wear app, which uses my own https://github.com/swooby/openai-openapi-kotlin Kotlin OpenAI Realtime API lib, I started to run into what look like inconsistencies between how my WebRTC implementation behaves and how the plethora of JavaScript WebSocket implementations behave.

I decided I "needed" a good, stable JavaScript test app in which I could easily A/B toggle between WebRTC and WebSocket to compare and contrast their behavior.

I came up with this:
https://github.com/swooby/AlfredAI/blob/main/openai-realtime-js/index.html

It is fugly, but it [mostly] works for my A/B testing purposes.

But, while writing that I also wanted to implement a WebRTC version of:
https://github.com/openai/openai-realtime-api-beta/blob/main/lib/api.js

I came up with this:
https://github.com/swooby/AlfredAI/blob/main/openai-realtime-js/api_webrtc.js

It implements RealtimeApiWebRTC, which is a near drop-in replacement for OpenAI's original RealtimeAPI.

I show how to flip between the two in:
https://github.com/swooby/AlfredAI/blob/3a3ffb31c44a8462313dacfa5111d789f0a887ae/openai-realtime-js/index.js#L196-L259

THEN I FOUND THIS/YOUR EXCELLENT REPO!

You are more than welcome to port my https://github.com/swooby/AlfredAI/blob/main/openai-realtime-js/api_webrtc.js to TypeScript, clean it up, and do the needful to implement a WebRTC version of your excellent looking lib!

I did not go the extra distance to make sure that it fully works with RealtimeClient; RealtimeClient would need a small tweak so that it can be passed a parameter saying which connection type to use.

Alternatively, checking whether the given url starts with wss:// seems like a reasonable way to auto-detect which connection type to use.
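
The URL-based auto-detection could look something like the sketch below. It is only an illustration: detectTransport is a hypothetical helper name, and treating https:// as "use WebRTC" is an assumption based on the two default URLs in api.js and api_webrtc.js.

```javascript
// Hypothetical helper (not part of either library): choose a connection
// type from the URL scheme. Assumes wss:// means WebSocket and https://
// means WebRTC, mirroring the defaultUrl values of the two classes.
function detectTransport(url) {
  const { protocol } = new URL(url);
  if (protocol === 'wss:' || protocol === 'ws:') {
    return 'websocket';
  }
  if (protocol === 'https:' || protocol === 'http:') {
    return 'webrtc';
  }
  throw new Error(`Unsupported URL scheme: ${protocol}`);
}
```

An explicit constructor parameter would still be worth keeping as an override, since a scheme check cannot express every case (for example, a relay that proxies one transport over the other).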

Let me know how/if I can help.

🍻

paulpv commented Feb 7, 2025

Diff of [my 1% altered] api.js to api_webrtc.js:

pv@Pauls-MacBook-Pro-M4-Pro openai-realtime-js % diff -y -b --suppress-common-lines api.js api_webrtc.js 
export class RealtimeAPI extends RealtimeEventHandler {       | export class RealtimeApiWebRTC extends RealtimeEventHandler {
   * Create a new RealtimeAPI instance                        |      * Create a new RealtimeClientWebRTC instance
   * @returns {RealtimeAPI}                                   |      * @returns {RealtimeClientWebRTC}
    this.defaultUrl = 'wss://api.openai.com/v1/realtime';     |         this.defaultUrl = 'https://api.openai.com/v1/realtime
    this.ws = null;                                           |         this.peerConnection = null;
                                                              >         this.dataChannel = null;
   * Tells us whether or not the WebSocket is connected       |      * Tells us whether or not the WebRTC is connected
    return !!this.ws;                                         |         return !!this.peerConnection;
   * Writes WebSocket logs to console                         |      * Writes WebRTC logs to console
                                                              >         if (this.debug) {
    const logs = [`[Websocket/${date}]`].concat(args).map((ar |             const logs = [`[WebRTC/${date}]`].concat(args).ma
    if (this.debug) {                                         <
   * Connects to Realtime API Websocket Server                |      * Connects to Realtime API WebRTC Server
  async connect({ model } = { model: 'gpt-4o-realtime-preview |     async connect(sessionConfig = { model: 'gpt-4o-realtime-p
                                                              >         getMicrophoneCallback,
                                                              >         setAudioOutputCallback,
                                                              >     ) {
                                                              >         sessionConfig = {
                                                              >             model: 'gpt-4o-realtime-preview-2024-12-17',
                                                              >             voice: 'verse',
                                                              >             ...sessionConfig,
                                                              >         };
                                                              >         log(`connect(sessionConfig=${JSON.stringify(sessionCo
    if (globalThis.WebSocket) {                               <
      /**                                                     <
       * Web browser                                          <
       */                                                     <
      const WebSocket = globalThis.WebSocket;                 |         const emphemeralApiToken = await this._requestEphemer
      const ws = new WebSocket(`${this.url}${model ? `?model= |         await this._init(emphemeralApiToken, sessionConfig.mo
        'realtime',                                           |     }
        `openai-insecure-api-key.${this.apiKey}`,             |
        'openai-beta.realtime-v1',                            <
      ]);                                                     <
      ws.addEventListener('message', (event) => {             <
        const message = JSON.parse(event.data);               <
        this.receive(message.type, message);                  <
      });                                                     <
      return new Promise((resolve, reject) => {               <
        const connectionErrorHandler = () => {                <
          this.disconnect(ws);                                <
          reject(new Error(`Could not connect to "${this.url} <
        };                                                    <
        ws.addEventListener('error', connectionErrorHandler); <
        ws.addEventListener('open', () => {                   <
          this.log(`Connected to "${this.url}"`);             <
          ws.removeEventListener('error', connectionErrorHand <
          ws.addEventListener('error', () => {                <
            this.disconnect(ws);                              <
            this.log(`Error, disconnected from "${this.url}"` <
            this.dispatch('close', { error: true });          <
          });                                                 <
          ws.addEventListener('close', () => {                <
            this.disconnect(ws);                              <
            this.log(`Disconnected from "${this.url}"`);      <
            this.dispatch('close', { error: false });         <
          });                                                 <
          this.ws = ws;                                       <
          resolve(true);                                      <
        });                                                   <
      });                                                     <
    } else {                                                  <
       * Node.js                                              |      * Initially from:
                                                              >      * https://platform.openai.com/docs/guides/realtime-webrt
      const moduleName = 'ws';                                |     async _requestEphemeralApiToken(dangerousApiKey, sessionC
      const wsModule = await import(/* webpackIgnore: true */ |         const r = await fetch(`${this.url}/sessions`, {
      const WebSocket = wsModule.default;                     |             method: 'POST',
      const ws = new WebSocket(                               |             headers: {
        'wss://api.openai.com/v1/realtime?model=gpt-4o-realti |                 'Authorization': `Bearer ${dangerousApiKey}`,
        [],                                                   |                 'Content-Type': 'application/json',
        {                                                     <
          finishRequest: (request) => {                       <
            // Auth                                           <
            request.setHeader('Authorization', `Bearer ${this <
            request.setHeader('OpenAI-Beta', 'realtime=v1');  <
            request.end();                                    <
        },                                                    |             body: JSON.stringify(sessionConfig),
      );                                                      <
      ws.on('message', (data) => {                            <
        const message = JSON.parse(data.toString());          <
        this.receive(message.type, message);                  <
      return new Promise((resolve, reject) => {               |         const data = await r.json();
        const connectionErrorHandler = () => {                |         return data.client_secret.value;
          this.disconnect(ws);                                |     }
          reject(new Error(`Could not connect to "${this.url} |
        };                                                    |     /**
        ws.on('error', connectionErrorHandler);               |      * Initially from:
        ws.on('open', () => {                                 |      * https://platform.openai.com/docs/guides/realtime-webrt
          this.log(`Connected to "${this.url}"`);             |      */
          ws.removeListener('error', connectionErrorHandler); |     async _init(ephemeralApiToken, model, getMicrophoneCallba
          ws.on('error', () => {                              |         log(`init(...)`);
            this.disconnect(ws);                              |         this.peerConnection = new RTCPeerConnection();
            this.log(`Error, disconnected from "${this.url}"` |
            this.dispatch('close', { error: true });          |         this.peerConnection.addTrack(await getMicrophoneCallb
                                                              >         this.peerConnection.ontrack = (e) => setAudioOutputCa
                                                              >
                                                              >         return new Promise(async (resolve, reject) => {
                                                              >             const dataChannel = this.peerConnection?.createDa
                                                              >             if (!dataChannel) {
                                                              >                 reject(new Error('dataChannel == null'));
                                                              >                 return;
                                                              >             }
                                                              >             dataChannel.addEventListener('open', () => {
                                                              >                 log('Data channel is open');
                                                              >                 this.dataChannel = dataChannel;
                                                              >                 resolve(true);
          ws.on('close', () => {                              |             dataChannel.addEventListener('closing', () => {
            this.disconnect(ws);                              |                 log('Data channel is closing');
            this.log(`Disconnected from "${this.url}"`);      |             });
                                                              >             dataChannel.addEventListener('close', () => {
                                                              >                 this.disconnect();
                                                              >                 log('Data channel is closed');
          this.ws = ws;                                       |             dataChannel.addEventListener('message', (e) => {
          resolve(true);                                      |                 const message = JSON.parse(e.data);
                                                              >                 this.receive(message.type, message);
                                                              >
                                                              >             // Start the session using the Session Descriptio
                                                              >             const offer = await this.peerConnection?.createOf
                                                              >             if (!offer) {
                                                              >                 reject(new Error('offer == null'));
                                                              >                 return;
                                                              >             }
                                                              >             await this.peerConnection?.setLocalDescription(of
                                                              >             const sdpResponse = await fetch(`${this.url}?mode
                                                              >                 method: 'POST',
                                                              >                 body: offer.sdp,
                                                              >                 headers: {
                                                              >                     Authorization: `Bearer ${ephemeralApiToke
                                                              >                     'Content-Type': 'application/sdp'
                                                              >                 },
                                                              >             await this.peerConnection?.setRemoteDescription({
                                                              >                 type: 'answer',
                                                              >                 sdp: await sdpResponse.text(),
                                                              >             });
                                                              >         });
  }                                                           <
   * @param {WebSocket} [ws]                                  <
   * @returns {true}                                          <
  disconnect(ws) {                                            |     disconnect() {
    if (!ws || this.ws === ws) {                              |         log('disconnect()');
      this.ws && this.ws.close();                             |         if (this.dataChannel) {
      this.ws = null;                                         |             this.dataChannel.close();
      return true;                                            |             this.dataChannel = null;
                                                              >         if (this.peerConnection) {
                                                              >             this.peerConnection.close();
                                                              >             this.peerConnection = null;
                                                              >     }
   * Receives an event from WebSocket and dispatches as "serv |      * Receives an event from WebRTC and dispatches as "serve
    if (this.debug) {                                         <
      if (eventName === 'response.audio.delta') {             <
        const delta = event.delta;                            <
        this.log(`received:`, eventName, { ...event, delta: d <
      } else {                                                <
      }                                                       <
    }                                                         <
   * Sends an event to WebSocket and dispatches as "client.{e |      * Sends an event to WebRTC and dispatches as "client.{ev
    if (this.debug) {                                         |         this.log(`sent:`, eventName, event);
      if (eventName === 'input_audio_buffer.append') {        |         this.dataChannel.send(JSON.stringify(event));
        const audio = event.audio;                            <
        this.log(`sending:`, eventName, { ...event, audio: au <
      } else {                                                <
        this.log(`sending:`, eventName, event);               <
      }                                                       <
    }                                                         <
    this.ws.send(JSON.stringify(event));                      <
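
Because the side-by-side diff truncates long lines, here is a standalone sketch of the ephemeral-token step it shows (_requestEphemeralApiToken). The request shape (POST to the /sessions path, Bearer header, JSON body, client_secret.value in the response) is taken from the diff; the fetchImpl parameter and the response.ok check are additions of mine for testability and error reporting, not something api_webrtc.js is known to contain.

```javascript
// Sketch of the ephemeral-token request from the diff above.
// `fetchImpl` is a hypothetical injection point so the function can be
// exercised without a network; it defaults to the global fetch.
async function requestEphemeralApiToken(
  baseUrl,
  dangerousApiKey,
  sessionConfig,
  fetchImpl = globalThis.fetch,
) {
  const response = await fetchImpl(`${baseUrl}/sessions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${dangerousApiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(sessionConfig),
  });
  // Assumption: fail loudly on a non-2xx status instead of trying to
  // read client_secret from an error body.
  if (!response.ok) {
    throw new Error(`Session request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.client_secret.value;
}
```

The returned token is what the diff's _init then sends as the Bearer credential when posting the SDP offer.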

paulpv commented Feb 7, 2025

Looks like longseespace's "add support for WebRTCClient" change has already done something like this.

Now that I think about this more, I recommend:

  1. Leave RealtimeClient almost entirely alone and common to both WebRTC and WebSocket (i.e., do not create a separate WebRTCClient class that may duplicate much of RealtimeClient, like longseespace did).
  2. Do NOT create a separate RealtimeApiWebRTC class that 80% duplicates RealtimeAPI, like I did [for my JavaScript code].
  3. Instead, have RealtimeTransportWebRTC and RealtimeTransportWebSocket classes, and have RealtimeAPI take a constructor parameter that selects which one to use (like my RealtimeClient.kt class; I chose [maybe unwisely] not to abstract out an equivalent RealtimeAPI in my Kotlin implementation).
    Something like what works fairly well for my Kotlin project at https://github.com/swooby/AlfredAI/tree/main/shared/src/main/java/com/swooby/alfredai/openai/realtime .
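
To make recommendation 3 concrete, here is a minimal JavaScript sketch of the transport split. It is not the Kotlin code linked above, nor code from any of the libraries: the class names RealtimeAPISketch and LoopbackTransport, the transport contract (connect, disconnect, send, onMessage), and the on/dispatch helpers are all assumptions made up for illustration. The "client." and "server." event prefixes follow the doc comments visible in the diff.

```javascript
// Hypothetical transport contract: connect(), disconnect(), send(text),
// a `connected` flag, and an `onMessage` callback assigned by the owner.
// A real RealtimeTransportWebSocket or RealtimeTransportWebRTC would wrap
// a WebSocket or an RTCDataChannel behind the same surface.
class LoopbackTransport {
  constructor() {
    this.connected = false;
    this.onMessage = null;
  }
  async connect() {
    this.connected = true;
  }
  disconnect() {
    this.connected = false;
  }
  send(text) {
    if (!this.connected) {
      throw new Error('Transport is not connected');
    }
    // Demo only: echo every outgoing message back as if the server sent it.
    if (this.onMessage) {
      this.onMessage(text);
    }
  }
}

// Transport-agnostic API layer: everything here is shared, and only the
// injected transport differs between WebSocket and WebRTC.
class RealtimeAPISketch {
  constructor(transport) {
    this.transport = transport;
    this.handlers = new Map();
    this.transport.onMessage = (text) => {
      const event = JSON.parse(text);
      this.dispatch(`server.${event.type}`, event);
    };
  }
  on(eventName, handler) {
    const list = this.handlers.get(eventName) || [];
    list.push(handler);
    this.handlers.set(eventName, list);
  }
  dispatch(eventName, event) {
    for (const handler of this.handlers.get(eventName) || []) {
      handler(event);
    }
  }
  isConnected() {
    return this.transport.connected;
  }
  connect() {
    return this.transport.connect();
  }
  disconnect() {
    this.transport.disconnect();
  }
  send(eventName, data = {}) {
    const event = { type: eventName, ...data };
    this.dispatch(`client.${eventName}`, event);
    this.transport.send(JSON.stringify(event));
  }
}
```

With this shape, RealtimeClient would only need to forward a transport (or a transport type) to RealtimeAPI, which is what keeps it common to both connection types.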

paulpv commented Feb 7, 2025

I implemented what I have in mind in [weak] JavaScript and submitted it as a PR to OpenAI:
openai/openai-realtime-api-beta#99

They have a backlog of PRs, so I don't expect them to take this anytime soon.

For this [more "agile"] repo, someone better at this than I am is free to improve the JavaScript and/or port it to TypeScript.
