Commit 203733a

Release voice agent API (#497)
* Agent API Early Access
* adds changes for API GA
* additional changes for API GA
* reverts FunctionCallingMessage to FunctionCalling
* resolved code review
* adds InjectionRefused
* updates readme with agent examples
* adds 3rd party TTS options
* resolves linter errors in Readme
* runs make lint
* readme lint fixes

---------

Co-authored-by: David vonThenen <[email protected]>
1 parent 25dadca commit 203733a

23 files changed: +2800 -25 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,8 @@ venv/
 venv.bak/
 .vscode/
 .DS_Store
+Pipfile
+Pipfile.lock

 # python artifacts
 __pycache__
@@ -18,3 +20,4 @@ dist/
 # build
 build/
 poetry.lock
+

README.md

Lines changed: 11 additions & 6 deletions
@@ -175,19 +175,26 @@ Before running any of these examples, then you need to take a look at the README
 pip install -r examples/requirements-examples.txt
 ```

-Text to Speech:
+To run each example set the `DEEPGRAM_API_KEY` as an environment variable, then `cd` into each example folder and execute the example with: `python main.py` or `python3 main.py`.
+
+### Agent
+
+- Simple - [examples/agent/simple](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/agent/simple/main.py)
+- Async Simple - [examples/agent/async_simple](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/agent/async_simple/main.py)
+
+### Text to Speech

 - Asynchronous - [examples/text-to-speech](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/text-to-speech/rest/file/async_hello_world/main.py)
 - Synchronous - [examples/text-to-speech](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/text-to-speech/rest/file/hello_world/main.py)

-Analyze Text:
+### Analyze Text

 - Intent Recognition - [examples/analyze/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/intent/main.py)
 - Sentiment Analysis - [examples/sentiment/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/sentiment/main.py)
 - Summarization - [examples/analyze/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/summary/main.py)
 - Topic Detection - [examples/analyze/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/topic/main.py)

-PreRecorded Audio:
+### PreRecorded Audio

 - Transcription From an Audio File - [examples/prerecorded/file](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/file/main.py)
 - Transcription From an URL - [examples/prerecorded/url](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/url/main.py)
@@ -196,7 +203,7 @@ PreRecorded Audio:
 - Summarization - [examples/speech-to-text/rest/summary](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/summary/main.py)
 - Topic Detection - [examples/speech-to-text/rest/topic](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/topic/main.py)

-Live Audio Transcription:
+### Live Audio Transcription

 - From a Microphone - [examples/streaming/microphone](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/stream_file/main.py)
 - From an HTTP Endpoint - [examples/streaming/http](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/async_url/main.py)
@@ -211,8 +218,6 @@ Management API exercise the full [CRUD](https://en.wikipedia.org/wiki/Create,_re
 - Scopes - [examples/manage/scopes](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/manage/scopes/main.py)
 - Usage - [examples/manage/usage](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/manage/usage/main.py)

-To run each example set the `DEEPGRAM_API_KEY` as an environment variable, then `cd` into each example folder and execute the example: `go run main.py`.
-
 ## Logging

 This SDK provides logging as a means to troubleshoot and debug issues encountered. By default, this SDK will enable `Information` level messages and higher (ie `Warning`, `Error`, etc) when you initialize the library as follows:
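
Note on the README change: the relocated instruction asks readers to set `DEEPGRAM_API_KEY` before running `python main.py`. A minimal sketch of how an example script might check for that variable up front is shown here. The helper name `load_api_key` is hypothetical and is not part of the SDK; it only uses the standard library.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a clear error.

    `load_api_key` is an illustrative helper, not an SDK function.
    """
    source = os.environ if env is None else env
    key = source.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set; export it before running main.py"
        )
    return key


# Example with an explicit mapping, so the sketch runs without a real key.
# Only the length is printed, to avoid echoing a secret.
print(len(load_api_key({"DEEPGRAM_API_KEY": "example-key"})))
```

Failing early with a readable message tends to be easier to debug than an authentication error returned later by the service.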

deepgram/__init__.py

Lines changed: 55 additions & 1 deletion
@@ -34,7 +34,7 @@
 from .errors import DeepgramApiKeyError

 # listen/read client
-from .client import Listen, Read
+from .client import ListenRouter, ReadRouter, SpeakRouter, AgentRouter

 # common
 from .client import (
@@ -302,6 +302,60 @@
     AsyncSelfHostedClient,
 )

+
+# agent
+from .client import AgentWebSocketEvents
+
+# websocket
+from .client import (
+    AgentWebSocketClient,
+    AsyncAgentWebSocketClient,
+)
+
+from .client import (
+    #### common websocket response
+    # OpenResponse,
+    # CloseResponse,
+    # ErrorResponse,
+    # UnhandledResponse,
+    #### unique
+    WelcomeResponse,
+    SettingsAppliedResponse,
+    ConversationTextResponse,
+    UserStartedSpeakingResponse,
+    AgentThinkingResponse,
+    FunctionCalling,
+    FunctionCallRequest,
+    AgentStartedSpeakingResponse,
+    AgentAudioDoneResponse,
+    InjectionRefusedResponse,
+)
+
+from .client import (
+    # top level
+    SettingsConfigurationOptions,
+    UpdateInstructionsOptions,
+    UpdateSpeakOptions,
+    InjectAgentMessageOptions,
+    FunctionCallResponse,
+    AgentKeepAlive,
+    # sub level
+    Listen,
+    Speak,
+    Header,
+    Item,
+    Properties,
+    Parameters,
+    Function,
+    Provider,
+    Think,
+    Agent,
+    Input,
+    Output,
+    Audio,
+    Context,
+)
+
 # utilities
 # pylint: disable=wrong-import-position
 from .audio import Microphone, DeepgramMicrophoneError

deepgram/audio/microphone/microphone.py

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@
 import logging

 from ...utils import verboselogs
+
 from .constants import LOGGING, CHANNELS, RATE, CHUNK

 if TYPE_CHECKING:

deepgram/audio/speaker/speaker.py

Lines changed: 3 additions & 1 deletion
@@ -50,7 +50,6 @@ class Speaker: # pylint: disable=too-many-instance-attributes
     # _asyncio_loop: asyncio.AbstractEventLoop
     # _asyncio_thread: threading.Thread
     _receiver_thread: Optional[threading.Thread] = None
-
     _loop: Optional[asyncio.AbstractEventLoop] = None

     _push_callback_org: Optional[Callable] = None
@@ -265,6 +264,7 @@ async def _start_asyncio_receiver(self):
                     await self._push_callback(message)
                 elif isinstance(message, bytes):
                     self._logger.verbose("Received audio data...")
+                    await self._push_callback(message)
                     self.add_audio_to_queue(message)
         except websockets.exceptions.ConnectionClosedOK as e:
             self._logger.debug("send() exiting gracefully: %d", e.code)
@@ -297,6 +297,7 @@ def _start_threaded_receiver(self):
                     self._push_callback(message)
                 elif isinstance(message, bytes):
                     self._logger.verbose("Received audio data...")
+                    self._push_callback(message)
                     self.add_audio_to_queue(message)
         except Exception as e: # pylint: disable=broad-except
             self._logger.notice("_start_threaded_receiver exception: %s", str(e))
@@ -365,6 +366,7 @@ def _play(self, audio_out, stream, stop):
                                 "LastPlay delta is greater than threshold. Unmute!"
                             )
                             self._microphone.unmute()
+
                 data = audio_out.get(True, TIMEOUT)
                 with self._lock_wait:
                     self._last_datagram = datetime.now()
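
Note on the `speaker.py` change: both receiver loops now pass binary audio messages to the push callback before queueing them for playback, so a caller can observe the audio as well as the text control messages. The following is a simplified sketch of that dispatch pattern, not the SDK's implementation. The function name `dispatch_messages` and its parameters are hypothetical, and the real receivers read from a WebSocket rather than from an in-memory list.

```python
import queue
from typing import Callable, Iterable, Union

Message = Union[str, bytes]


def dispatch_messages(
    messages: Iterable[Message],
    push_callback: Callable[[Message], None],
    audio_queue: "queue.Queue[bytes]",
) -> None:
    """Forward every message to the callback; also queue audio bytes for playback."""
    for message in messages:
        if isinstance(message, str):
            # Text control message: the callback is the only consumer.
            push_callback(message)
        elif isinstance(message, bytes):
            # Audio data: notify the callback first, then queue it for playback.
            push_callback(message)
            audio_queue.put(message)


seen = []
audio = queue.Queue()
dispatch_messages(['{"type": "Welcome"}', b"\x00\x01"], seen.append, audio)
print(len(seen), audio.qsize())
```

In this sketch the callback sees both messages, while only the binary one reaches the playback queue.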

deepgram/client.py

Lines changed: 66 additions & 4 deletions
@@ -55,7 +55,7 @@
 )

 # listen client
-from .clients import Listen, Read, Speak
+from .clients import ListenRouter, ReadRouter, SpeakRouter, AgentRouter

 # speech-to-text
 from .clients import LiveClient, AsyncLiveClient # backward compat
@@ -308,6 +308,61 @@
     AsyncSelfHostedClient,
 )

+
+# agent
+from .clients import AgentWebSocketEvents
+
+# websocket
+from .clients import (
+    AgentWebSocketClient,
+    AsyncAgentWebSocketClient,
+)
+
+from .clients import (
+    #### common websocket response
+    # OpenResponse,
+    # CloseResponse,
+    # ErrorResponse,
+    # UnhandledResponse,
+    #### unique
+    WelcomeResponse,
+    SettingsAppliedResponse,
+    ConversationTextResponse,
+    UserStartedSpeakingResponse,
+    AgentThinkingResponse,
+    FunctionCalling,
+    FunctionCallRequest,
+    AgentStartedSpeakingResponse,
+    AgentAudioDoneResponse,
+    InjectionRefusedResponse,
+)
+
+from .clients import (
+    # top level
+    SettingsConfigurationOptions,
+    UpdateInstructionsOptions,
+    UpdateSpeakOptions,
+    InjectAgentMessageOptions,
+    FunctionCallResponse,
+    AgentKeepAlive,
+    # sub level
+    Listen,
+    Speak,
+    Header,
+    Item,
+    Properties,
+    Parameters,
+    Function,
+    Provider,
+    Think,
+    Agent,
+    Input,
+    Output,
+    Audio,
+    Context,
+)
+
+
 # client errors and options
 from .options import DeepgramClientOptions, ClientOptionsFromEnv
 from .errors import DeepgramApiKeyError
@@ -397,21 +452,21 @@ def listen(self):
         """
         Returns a Listen dot-notation router for interacting with Deepgram's transcription services.
         """
-        return Listen(self._config)
+        return ListenRouter(self._config)

     @property
     def read(self):
         """
         Returns a Read dot-notation router for interacting with Deepgram's read services.
         """
-        return Read(self._config)
+        return ReadRouter(self._config)

     @property
     def speak(self):
         """
         Returns a Speak dot-notation router for interacting with Deepgram's speak services.
         """
-        return Speak(self._config)
+        return SpeakRouter(self._config)

     @property
     @deprecation.deprecated(
@@ -480,6 +535,13 @@ def asyncselfhosted(self):
         """
         return self.Version(self._config, "asyncselfhosted")

+    @property
+    def agent(self):
+        """
+        Returns a Agent dot-notation router for interacting with Deepgram's speak services.
+        """
+        return AgentRouter(self._config)
+
     # INTERNAL CLASSES
     class Version:
         """
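
Note on the `client.py` change: the new `agent` property follows the same shape as `listen`, `read`, and `speak`, returning a router built from the client's config on each access. A minimal, self-contained sketch of that property-based "dot-notation router" pattern is given here. `ClientSketch`, `AgentRouterSketch`, and `describe` are hypothetical stand-ins and do not reflect the real `AgentRouter` interface.

```python
class AgentRouterSketch:
    """Hypothetical stand-in for a router; it only keeps the config it was given."""

    def __init__(self, config: dict):
        self._config = config

    def describe(self) -> str:
        # A real router would hand back client objects; this returns a label.
        return f"agent router for {self._config['url']}"


class ClientSketch:
    """Hypothetical client exposing a router through a read-only property."""

    def __init__(self, url: str):
        self._config = {"url": url}

    @property
    def agent(self) -> AgentRouterSketch:
        # A new router is built on every access, mirroring
        # `return AgentRouter(self._config)` in the diff.
        return AgentRouterSketch(self._config)


client = ClientSketch("wss://example.invalid")
print(client.agent.describe())
```

Because the property constructs a fresh router each time, the router itself stays stateless and always reflects the client's current config.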

deepgram/clients/__init__.py

Lines changed: 57 additions & 3 deletions
@@ -48,9 +48,10 @@
 )
 from .errors import DeepgramModuleError

-from .listen_router import Listen
-from .read_router import Read
-from .speak_router import Speak
+from .listen_router import ListenRouter
+from .read_router import ReadRouter
+from .speak_router import SpeakRouter
+from .agent_router import AgentRouter

 # listen
 from .listen import LiveTranscriptionEvents
@@ -318,3 +319,56 @@
     SelfHostedClient,
     AsyncSelfHostedClient,
 )
+
+# agent
+from .agent import AgentWebSocketEvents
+
+# websocket
+from .agent import (
+    AgentWebSocketClient,
+    AsyncAgentWebSocketClient,
+)
+
+from .agent import (
+    #### common websocket response
+    # OpenResponse,
+    # CloseResponse,
+    # ErrorResponse,
+    # UnhandledResponse,
+    #### unique
+    WelcomeResponse,
+    SettingsAppliedResponse,
+    ConversationTextResponse,
+    UserStartedSpeakingResponse,
+    AgentThinkingResponse,
+    FunctionCalling,
+    FunctionCallRequest,
+    AgentStartedSpeakingResponse,
+    AgentAudioDoneResponse,
+    InjectionRefusedResponse,
+)
+
+from .agent import (
+    # top level
+    SettingsConfigurationOptions,
+    UpdateInstructionsOptions,
+    UpdateSpeakOptions,
+    InjectAgentMessageOptions,
+    FunctionCallResponse,
+    AgentKeepAlive,
+    # sub level
+    Listen,
+    Speak,
+    Header,
+    Item,
+    Properties,
+    Parameters,
+    Function,
+    Provider,
+    Think,
+    Agent,
+    Input,
+    Output,
+    Audio,
+    Context,
+)
