Add WebRTC option #12
Diff of [my 1% altered] api.js to api_webrtc.js:
pv@Pauls-MacBook-Pro-M4-Pro openai-realtime-js % diff -y -b --suppress-common-lines api.js api_webrtc.js
export class RealtimeAPI extends RealtimeEventHandler { | export class RealtimeApiWebRTC extends RealtimeEventHandler {
* Create a new RealtimeAPI instance | * Create a new RealtimeClientWebRTC instance
* @returns {RealtimeAPI} | * @returns {RealtimeClientWebRTC}
this.defaultUrl = 'wss://api.openai.com/v1/realtime'; | this.defaultUrl = 'https://api.openai.com/v1/realtime
this.ws = null; | this.peerConnection = null;
> this.dataChannel = null;
* Tells us whether or not the WebSocket is connected | * Tells us whether or not the WebRTC is connected
return !!this.ws; | return !!this.peerConnection;
* Writes WebSocket logs to console | * Writes WebRTC logs to console
> if (this.debug) {
const logs = [`[Websocket/${date}]`].concat(args).map((ar | const logs = [`[WebRTC/${date}]`].concat(args).ma
if (this.debug) { <
* Connects to Realtime API Websocket Server | * Connects to Realtime API WebRTC Server
async connect({ model } = { model: 'gpt-4o-realtime-preview | async connect(sessionConfig = { model: 'gpt-4o-realtime-p
> getMicrophoneCallback,
> setAudioOutputCallback,
> ) {
> sessionConfig = {
> model: 'gpt-4o-realtime-preview-2024-12-17',
> voice: 'verse',
> ...sessionConfig,
> };
> log(`connect(sessionConfig=${JSON.stringify(sessionCo
if (globalThis.WebSocket) { <
/** <
* Web browser <
*/ <
const WebSocket = globalThis.WebSocket; | const emphemeralApiToken = await this._requestEphemer
const ws = new WebSocket(`${this.url}${model ? `?model= | await this._init(emphemeralApiToken, sessionConfig.mo
'realtime', | }
`openai-insecure-api-key.${this.apiKey}`, |
'openai-beta.realtime-v1', <
]); <
ws.addEventListener('message', (event) => { <
const message = JSON.parse(event.data); <
this.receive(message.type, message); <
}); <
return new Promise((resolve, reject) => { <
const connectionErrorHandler = () => { <
this.disconnect(ws); <
reject(new Error(`Could not connect to "${this.url} <
}; <
ws.addEventListener('error', connectionErrorHandler); <
ws.addEventListener('open', () => { <
this.log(`Connected to "${this.url}"`); <
ws.removeEventListener('error', connectionErrorHand <
ws.addEventListener('error', () => { <
this.disconnect(ws); <
this.log(`Error, disconnected from "${this.url}"` <
this.dispatch('close', { error: true }); <
}); <
ws.addEventListener('close', () => { <
this.disconnect(ws); <
this.log(`Disconnected from "${this.url}"`); <
this.dispatch('close', { error: false }); <
}); <
this.ws = ws; <
resolve(true); <
}); <
}); <
} else { <
* Node.js | * Initially from:
> * https://platform.openai.com/docs/guides/realtime-webrt
const moduleName = 'ws'; | async _requestEphemeralApiToken(dangerousApiKey, sessionC
const wsModule = await import(/* webpackIgnore: true */ | const r = await fetch(`${this.url}/sessions`, {
const WebSocket = wsModule.default; | method: 'POST',
const ws = new WebSocket( | headers: {
'wss://api.openai.com/v1/realtime?model=gpt-4o-realti | 'Authorization': `Bearer ${dangerousApiKey}`,
[], | 'Content-Type': 'application/json',
{ <
finishRequest: (request) => { <
// Auth <
request.setHeader('Authorization', `Bearer ${this <
request.setHeader('OpenAI-Beta', 'realtime=v1'); <
request.end(); <
}, | body: JSON.stringify(sessionConfig),
); <
ws.on('message', (data) => { <
const message = JSON.parse(data.toString()); <
this.receive(message.type, message); <
return new Promise((resolve, reject) => { | const data = await r.json();
const connectionErrorHandler = () => { | return data.client_secret.value;
this.disconnect(ws); | }
reject(new Error(`Could not connect to "${this.url} |
}; | /**
ws.on('error', connectionErrorHandler); | * Initially from:
ws.on('open', () => { | * https://platform.openai.com/docs/guides/realtime-webrt
this.log(`Connected to "${this.url}"`); | */
ws.removeListener('error', connectionErrorHandler); | async _init(ephemeralApiToken, model, getMicrophoneCallba
ws.on('error', () => { | log(`init(...)`);
this.disconnect(ws); | this.peerConnection = new RTCPeerConnection();
this.log(`Error, disconnected from "${this.url}"` |
this.dispatch('close', { error: true }); | this.peerConnection.addTrack(await getMicrophoneCallb
> this.peerConnection.ontrack = (e) => setAudioOutputCa
>
> return new Promise(async (resolve, reject) => {
> const dataChannel = this.peerConnection?.createDa
> if (!dataChannel) {
> reject(new Error('dataChannel == null'));
> return;
> }
> dataChannel.addEventListener('open', () => {
> log('Data channel is open');
> this.dataChannel = dataChannel;
> resolve(true);
ws.on('close', () => { | dataChannel.addEventListener('closing', () => {
this.disconnect(ws); | log('Data channel is closing');
this.log(`Disconnected from "${this.url}"`); | });
> dataChannel.addEventListener('close', () => {
> this.disconnect();
> log('Data channel is closed');
this.ws = ws; | dataChannel.addEventListener('message', (e) => {
resolve(true); | const message = JSON.parse(e.data);
> this.receive(message.type, message);
>
> // Start the session using the Session Descriptio
> const offer = await this.peerConnection?.createOf
> if (!offer) {
> reject(new Error('offer == null'));
> return;
> }
> await this.peerConnection?.setLocalDescription(of
> const sdpResponse = await fetch(`${this.url}?mode
> method: 'POST',
> body: offer.sdp,
> headers: {
> Authorization: `Bearer ${ephemeralApiToke
> 'Content-Type': 'application/sdp'
> },
> await this.peerConnection?.setRemoteDescription({
> type: 'answer',
> sdp: await sdpResponse.text(),
> });
> });
} <
* @param {WebSocket} [ws] <
* @returns {true} <
disconnect(ws) { | disconnect() {
if (!ws || this.ws === ws) { | log('disconnect()');
this.ws && this.ws.close(); | if (this.dataChannel) {
this.ws = null; | this.dataChannel.close();
return true; | this.dataChannel = null;
> if (this.peerConnection) {
> this.peerConnection.close();
> this.peerConnection = null;
> }
* Receives an event from WebSocket and dispatches as "serv | * Receives an event from WebRTC and dispatches as "serve
if (this.debug) { <
if (eventName === 'response.audio.delta') { <
const delta = event.delta; <
this.log(`received:`, eventName, { ...event, delta: d <
} else { <
} <
} <
* Sends an event to WebSocket and dispatches as "client.{e | * Sends an event to WebRTC and dispatches as "client.{ev
if (this.debug) { | this.log(`sent:`, eventName, event);
if (eventName === 'input_audio_buffer.append') { | this.dataChannel.send(JSON.stringify(event));
const audio = event.audio; <
this.log(`sending:`, eventName, { ...event, audio: au <
} else { <
this.log(`sending:`, eventName, event); <
} <
} <
this.ws.send(JSON.stringify(event)); <
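Because the side-by-side diff truncates long lines, the ephemeral-token step of the WebRTC path may be easier to read on its own. The following is a minimal sketch based only on the `_requestEphemeralApiToken` lines visible in the diff above. The standalone function name `requestEphemeralToken` and the injectable `fetchImpl` parameter are assumptions added so the snippet can be exercised without a network, and the `response.ok` check is an extra guard that does not appear in the diff.

```javascript
// Sketch only: mirrors the POST to `${url}/sessions` shown in the diff.
// `requestEphemeralToken` and `fetchImpl` are hypothetical names,
// not part of api_webrtc.js.
async function requestEphemeralToken(url, apiKey, sessionConfig, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`${url}/sessions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(sessionConfig),
  });

  // Added guard (assumption, not in the diff): fail loudly on HTTP errors
  // instead of hitting an undefined `client_secret` further down.
  if (!response.ok) {
    throw new Error(`Session request failed with status ${response.status}`);
  }

  const data = await response.json();
  // The diff reads the short-lived token from `client_secret.value`.
  return data.client_secret.value;
}
```

The returned value is what the diff then passes to `_init(...)` as the bearer token for the SDP exchange, so the long-lived API key is only used for this one request.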
Looks like longseespace Now that I think about this more, I recommend:
I implemented what I am thinking in [weak] JavaScript and submitted it as a PR to OpenAI. They have a backlog of PRs, so I don't expect them to take this anytime soon. For this [more "agile"] repo, someone better than I is free to improve the JavaScript and/or port this to TypeScript.
I just stumbled across your excellent refactor of https://github.com/openai/openai-realtime-api-beta .
Before finding you...
While writing my https://github.com/swooby/AlfredAI native Android Phone/Wear app that uses my own https://github.com/swooby/openai-openapi-kotlin Kotlin OpenAI Realtime API lib, I think I started to run into some inconsistencies with how my WebRTC implementation behaves versus how I see the plethora of JavaScript WebSocket implementations behave.
I decided I "needed" a good stable JavaScript test app that I could easily A/B toggle between WebRTC and WebSocket to compare/contrast behavior with.
I came up with this:
https://github.com/swooby/AlfredAI/blob/main/openai-realtime-js/index.html
It is fugly, but it [mostly] works for my A/B testing purposes.
But, while writing that I also wanted to implement a WebRTC version of:
https://github.com/openai/openai-realtime-api-beta/blob/main/lib/api.js
I came up with this:
https://github.com/swooby/AlfredAI/blob/main/openai-realtime-js/api_webrtc.js
It implements `RealtimeApiWebRTC`, which is a near drop-in replacement for OpenAI's original `RealtimeAPI`. I show how to flip between the two in:
https://github.com/swooby/AlfredAI/blob/3a3ffb31c44a8462313dacfa5111d789f0a887ae/openai-realtime-js/index.js#L196-L259
THEN I FOUND THIS/YOUR EXCELLENT REPO!
You are more than welcome to TypeScript and clean up my https://github.com/swooby/AlfredAI/blob/main/openai-realtime-js/api_webrtc.js and do the needful to implement a WebRTC version of your excellent looking lib!
I did not go the extra distance to make sure that it fully works with RealtimeClient, which would need to be tweaked a little to accept a parameter specifying what connection type to use.
Basing the logic off of whether the given url starts with `wss://` seemed like a reasonable way to attempt to auto-detect what connection type to use.
Let me know how/if I can help.
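The URL-based auto-detection suggested above could be sketched roughly as follows. This is only an illustration: `detectConnectionType` is a hypothetical helper, not something that exists in either library, and treating `ws:` and `http:` the same as their secure counterparts is an assumption made for completeness.

```javascript
// Hypothetical helper: picks a transport from the URL scheme.
// Not part of api.js or api_webrtc.js.
function detectConnectionType(url) {
  const scheme = new URL(url).protocol; // e.g. 'wss:' or 'https:'
  if (scheme === 'wss:' || scheme === 'ws:') {
    return 'websocket';
  }
  if (scheme === 'https:' || scheme === 'http:') {
    return 'webrtc';
  }
  throw new Error(`Unsupported URL scheme: ${scheme}`);
}
```

With the two default URLs from the diff, `detectConnectionType('wss://api.openai.com/v1/realtime')` selects the WebSocket path and `detectConnectionType('https://api.openai.com/v1/realtime')` selects the WebRTC path. An explicit connection-type parameter could still override the detected value.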
🍻