
Commit 3c477f2

Merge pull request #502 from deepgram/feat/keyterms+nova-3
Feat/keyterms+nova 3
2 parents 203733a + f48103f commit 3c477f2


84 files changed (+1681, -131 lines changed)


README.md

Lines changed: 2 additions & 2 deletions
@@ -85,7 +85,7 @@ deepgram: DeepgramClient = DeepgramClient("", ClientOptionsFromEnv())
 
 ## STEP 2 Call the transcribe_url method on the prerecorded class
 options: PrerecordedOptions = PrerecordedOptions(
-    model="nova-2",
+    model="nova-3",
     smart_format=True,
 )
 response = deepgram.listen.rest.v("1").transcribe_url(AUDIO_URL, options)
@@ -134,7 +134,7 @@ dg_connection.on(LiveTranscriptionEvents.Error, on_error)
 dg_connection.on(LiveTranscriptionEvents.Close, on_close)
 
 options: LiveOptions = LiveOptions(
-    model="nova-2",
+    model="nova-3",
     punctuate=True,
     language="en-US",
     encoding="linear16",
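
Combined with the keyterm option this PR adds to PrerecordedOptions, the updated README snippet can be exercised end to end. A minimal sketch, assuming DEEPGRAM_API_KEY is set in the environment; the audio URL and keyterm values are illustrative, not from this commit:

from deepgram import (
    ClientOptionsFromEnv,
    DeepgramClient,
    PrerecordedOptions,
)

# Illustrative audio source
AUDIO_URL = {"url": "https://dpgr.am/spacewalk.wav"}

deepgram: DeepgramClient = DeepgramClient("", ClientOptionsFromEnv())

options: PrerecordedOptions = PrerecordedOptions(
    model="nova-3",
    smart_format=True,
    keyterm=["Deepgram", "spacewalk"],  # new field in this PR; plain list of strings
)

response = deepgram.listen.rest.v("1").transcribe_url(AUDIO_URL, options)
print(response.to_json(indent=4))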

deepgram/clients/agent/v1/websocket/options.py

Lines changed: 3 additions & 1 deletion
@@ -23,7 +23,9 @@ class Listen(BaseResponse):
     This class defines any configuration settings for the Listen model.
     """
 
-    model: Optional[str] = field(default="nova-2")
+    model: Optional[str] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )
 
 
 @dataclass
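
This exclude-if-None field pattern is what lets the SDK stop pinning a model name here: a field left as None is dropped from the serialized payload, so the Deepgram API applies its own default. A minimal standalone sketch of the mechanism, assuming dataclass_config is dataclasses_json's config (this simplified Listen is an illustrative stand-in, not the SDK class):

from dataclasses import dataclass, field
from typing import Optional

from dataclasses_json import config as dataclass_config, dataclass_json


@dataclass_json
@dataclass
class Listen:
    # exclude is a predicate on the field value: returning True drops the
    # key from to_dict()/to_json() output entirely.
    model: Optional[str] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )


print(Listen().to_json())                # {}  -- model omitted, API default wins
print(Listen(model="nova-3").to_json())  # {"model": "nova-3"}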

deepgram/clients/listen/v1/rest/options.py

Lines changed: 4 additions & 1 deletion
@@ -82,6 +82,9 @@ class PrerecordedOptions(BaseResponse):  # pylint: disable=too-many-instance-att
     intents: Optional[bool] = field(
         default=None, metadata=dataclass_config(exclude=lambda f: f is None)
     )
+    keyterm: Optional[List[str]] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )
     keywords: Optional[Union[List[str], str]] = field(
         default=None, metadata=dataclass_config(exclude=lambda f: f is None)
     )
@@ -92,7 +95,7 @@ class PrerecordedOptions(BaseResponse):  # pylint: disable=too-many-instance-att
         default=None, metadata=dataclass_config(exclude=lambda f: f is None)
     )
     model: Optional[str] = field(
-        default="nova-2", metadata=dataclass_config(exclude=lambda f: f is None)
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
     )
     multichannel: Optional[bool] = field(
         default=None, metadata=dataclass_config(exclude=lambda f: f is None)
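
One way to sanity-check the two changes above together: keyterm serializes as a plain list of strings, and a model left unset no longer appears in the payload at all. A minimal sketch; the printed shape is inferred from the exclude-if-None configuration, not a captured output:

from deepgram import PrerecordedOptions

options = PrerecordedOptions(
    smart_format=True,
    keyterm=["Deepgram", "Nova"],
)

# model was left unset, so it should be absent from the payload entirely
print(options.to_dict())
# expected shape: {'smart_format': True, 'keyterm': ['Deepgram', 'Nova']}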

deepgram/clients/listen/v1/websocket/options.py

Lines changed: 4 additions & 1 deletion
@@ -68,11 +68,14 @@ class LiveOptions(BaseResponse):  # pylint: disable=too-many-instance-attributes
     keywords: Optional[Union[List[str], str]] = field(
         default=None, metadata=dataclass_config(exclude=lambda f: f is None)
     )
+    keyterm: Optional[List[str]] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )
     language: Optional[str] = field(
         default=None, metadata=dataclass_config(exclude=lambda f: f is None)
     )
     model: Optional[str] = field(
-        default="nova-2", metadata=dataclass_config(exclude=lambda f: f is None)
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
     )
     multichannel: Optional[bool] = field(
         default=None, metadata=dataclass_config(exclude=lambda f: f is None)
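
LiveOptions gains the same field, so keyterms can be supplied on a streaming connection as well. A minimal sketch, assuming DEEPGRAM_API_KEY is set in the environment and that this SDK version exposes the sync websocket client as deepgram.listen.websocket.v("1"); the keyterm values are illustrative:

from deepgram import (
    ClientOptionsFromEnv,
    DeepgramClient,
    LiveOptions,
    LiveTranscriptionEvents,
)

deepgram: DeepgramClient = DeepgramClient("", ClientOptionsFromEnv())
dg_connection = deepgram.listen.websocket.v("1")


def on_message(self, result, **kwargs):
    # print each transcript chunk as it arrives
    print(result.channel.alternatives[0].transcript)


dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)

options: LiveOptions = LiveOptions(
    model="nova-3",  # keyterm is a Nova-3 feature
    language="en-US",
    keyterm=["Deepgram", "Aura"],  # new field in this PR
)

if dg_connection.start(options) is False:
    print("Failed to connect to Deepgram")
# ... stream audio with dg_connection.send(...), then dg_connection.finish()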

examples/advanced/rest/direct_invocation/main.py

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ def main():
 
     # STEP 2 Call the transcribe_url method on the prerecorded class
     options: PrerecordedOptions = PrerecordedOptions(
-        model="nova-2",
+        model="nova-3",
         smart_format=True,
         summarize="v2",
     )

examples/advanced/websocket/direct_invocation/main.py

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ def on_error(self, error, **kwargs):
     liveClient.on(LiveTranscriptionEvents.Error, on_error)
 
     # connect to websocket
-    options: LiveOptions = LiveOptions(model="nova-2", language="en-US")
+    options: LiveOptions = LiveOptions(model="nova-3", language="en-US")
 
     if liveClient.start(options) is False:
         print("Failed to connect to Deepgram")

examples/advanced/websocket/microphone_inheritance/main.py

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ def main():
     liveClient: MyLiveClient = MyLiveClient(ClientOptionsFromEnv())
 
     options: LiveOptions = LiveOptions(
-        model="nova-2",
+        model="nova-3",
         punctuate=True,
         language="en-US",
         encoding="linear16",

examples/advanced/websocket/mute-microphone/main.py

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ def on_error(self, error, **kwargs):
     dg_connection.on(LiveTranscriptionEvents.Error, on_error)
 
     options: LiveOptions = LiveOptions(
-        model="nova-2",
+        model="nova-3",
         punctuate=True,
         language="en-US",
         encoding="linear16",

examples/analyze/intent/conversation.txt

Lines changed: 5 additions & 5 deletions
@@ -16,7 +16,7 @@ Thanks to ChatGPT and the advent of the LLM era, the conversational AI tech stac
 
 While these AI agents hold immense potential, many customers have expressed their dissatisfaction with the current crop of voice AI vendors, citing roadblocks related to speed, cost, reliability, and conversational quality. That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents.
 
-Whether used on its own or in conjunction with our industry-leading Nova-2 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.
+Whether used on its own or in conjunction with our industry-leading Nova-3 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.
 
 We are thrilled about the progress our initial group of developers has made using Aura, so much so that we are extending limited access to a select few partners who will be free to begin integrating with Aura immediately. With their feedback, we’ll continue to enhance our suite of voices and API features, as well as ensure a smooth launch of their production-grade applications.
 
@@ -51,15 +51,15 @@ Here are some sample clips generated by one of the earliest iterations of Aura.
 
 Our Approach
 ----------
-For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions. Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.
+For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions. Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.
 
-And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure.
+And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure.
 
 We also have our own in-house data labeling and data ops team with years of experience building bespoke workflows to record, store, and transfer vast amounts of audio in order to label it and continuously grow our bank of high-quality data (millions of hours and counting) used in our model training.
 
 These combined experiences have made us experts in processing and modeling speech audio, especially in support of streaming use cases with our real-time STT models. Our customers have been asking if we could apply the same approach for TTS, and we can.
 
-So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.
+So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-3 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.
 
 "Deepgram is a valued partner, providing our customers with high throughput speech-to-text that delivers unrivaled performance without tradeoffs between quality, speed, and cost. We're excited to see Deepgram extend their speech AI platform and bring this approach to the text-to-speech market." - Richard Dumas, VP AI Product Strategy at Five9
 
@@ -68,4 +68,4 @@ What's Next
 ----------
 As we’ve discussed, scaled voice agents are a high throughput use case, and we believe their success will ultimately depend on a unified approach to audio, one that strikes the right balance between natural voice quality, responsiveness, and cost-efficiency. And with Aura, we’re just getting started. We’re looking forward to continuing to work with customers like Asurion and partners like Five9 across speech-to-text AND text-to-speech as we help them define the future of AI agents, and we invite you to join us on this journey.
 
-We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.
+We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.

examples/analyze/legacy_dict_intent/conversation.txt

Lines changed: 5 additions & 5 deletions
@@ -16,7 +16,7 @@ Thanks to ChatGPT and the advent of the LLM era, the conversational AI tech stac
 
 While these AI agents hold immense potential, many customers have expressed their dissatisfaction with the current crop of voice AI vendors, citing roadblocks related to speed, cost, reliability, and conversational quality. That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents.
 
-Whether used on its own or in conjunction with our industry-leading Nova-2 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.
+Whether used on its own or in conjunction with our industry-leading Nova-3 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.
 
 We are thrilled about the progress our initial group of developers has made using Aura, so much so that we are extending limited access to a select few partners who will be free to begin integrating with Aura immediately. With their feedback, we’ll continue to enhance our suite of voices and API features, as well as ensure a smooth launch of their production-grade applications.
 
@@ -51,15 +51,15 @@ Here are some sample clips generated by one of the earliest iterations of Aura.
 
 Our Approach
 ----------
-For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions. Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.
+For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions. Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.
 
-And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure.
+And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure.
 
 We also have our own in-house data labeling and data ops team with years of experience building bespoke workflows to record, store, and transfer vast amounts of audio in order to label it and continuously grow our bank of high-quality data (millions of hours and counting) used in our model training.
 
 These combined experiences have made us experts in processing and modeling speech audio, especially in support of streaming use cases with our real-time STT models. Our customers have been asking if we could apply the same approach for TTS, and we can.
 
-So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.
+So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-3 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.
 
 "Deepgram is a valued partner, providing our customers with high throughput speech-to-text that delivers unrivaled performance without tradeoffs between quality, speed, and cost. We're excited to see Deepgram extend their speech AI platform and bring this approach to the text-to-speech market." - Richard Dumas, VP AI Product Strategy at Five9
 
@@ -68,4 +68,4 @@ What's Next
 ----------
 As we’ve discussed, scaled voice agents are a high throughput use case, and we believe their success will ultimately depend on a unified approach to audio, one that strikes the right balance between natural voice quality, responsiveness, and cost-efficiency. And with Aura, we’re just getting started. We’re looking forward to continuing to work with customers like Asurion and partners like Five9 across speech-to-text AND text-to-speech as we help them define the future of AI agents, and we invite you to join us on this journey.
 
-We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.
+We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.
