Supports continuous speech recognition and barge-in #5426

compulim · 2025-02-12T21:06:24Z

Fixes #2661. Fixes #5352.

Initial work done in #5397.

Changelog Entry

Added

Resolved #2661 and #5352. Added speech recognition continuous mode with barge-in support, in PR #5426, by @RushikeshGavali and @compulim
- Set styleOptions.speechRecognitionContinuous to true with a Web Speech API provider with continuous mode support

Changed

Bumped dependencies to the latest versions, by @compulim in PR #5385, #5400, and #5426
- Production dependencies
  - [email protected]

Description

Continuous mode is designed for hands-off/kiosk scenarios. End-users can hold a speech-primary conversation with the bot and occasionally interact through gestures (e.g. tapping on a card). Speech recognition is kept active for as long as possible, until the end-user turns it off.

Added new styleOptions.speechRecognitionContinuous to enable continuous mode for speech recognition.
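
A minimal sketch of how the option might be set, for illustration only. The `StyleOptionsSketch` type and the `withContinuousRecognition` helper are hypothetical names that are not part of this PR; only `speechRecognitionContinuous` comes from the change itself.

```typescript
// Hypothetical, trimmed-down shape of Web Chat style options. Only the option
// introduced by this PR is modeled explicitly; other options pass through.
type StyleOptionsSketch = {
  speechRecognitionContinuous?: boolean;
  [key: string]: unknown;
};

// Hypothetical helper: returns a copy of the given style options with
// continuous mode enabled, leaving all other options untouched.
function withContinuousRecognition(base: StyleOptionsSketch = {}): StyleOptionsSketch {
  return { ...base, speechRecognitionContinuous: true };
}

const styleOptions = withContinuousRecognition({ hideUploadButton: true });

// The resulting object would then be passed to Web Chat together with a
// Web Speech API provider that supports continuous mode, for example
// (not executed here):
//   WebChat.renderWebChat({ directLine, styleOptions, webSpeechPonyfillFactory }, element);
```

Setting the option alone is not expected to be enough: as noted in the changelog entry, the Web Speech API provider also needs to support continuous mode.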

Design

Interactive mode: speech recognition is active only for a minimal time; the focus is on privacy
Continuous mode: speech recognition is active for as long as possible and persists across non-speech interactions; barge-in is supported; the focus is on a hands-off experience

Behavioral differences

Continuous mode will not turn off the microphone after speech is recognized
- This is a behavior exhibited by the Web Speech API provider
  - Technically, Web Chat turns off the microphone only when the end event is received, not when a result event is received
- If the Web Speech API provider does not support continuous mode, it should send the end event after speech is recognized
While the bot response is being synthesized and the input mode is "expecting input":
- Interactive mode:
  - While synthesis is ongoing, speech recognition is paused
  - After synthesis has completed, speech recognition is resumed
- Continuous mode:
  - While synthesis is ongoing, speech recognition continues to be active
  - When an interim result is recognized, synthesis is interrupted (a.k.a. barge-in)
  - Logically, "expecting input" is ignored (speech recognition is always active and never paused)
While speech recognition is active, tapping on a card action:
- Interactive mode: will stop speech recognition, will not speak bot response
- Continuous mode: will not stop speech recognition, will speak bot response
While speech recognition is active, receiving a proactive bot message:
- Interactive mode: will not synthesize the bot message
- Continuous mode: will synthesize the bot message
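
The differences listed above can be summarized as a small lookup. This is only an illustrative model of the described behavior, not code from Web Chat; `RecognitionMode`, `Behavior`, and `describeBehavior` are hypothetical names.

```typescript
// Hypothetical names: the two modes described in this PR.
type RecognitionMode = 'interactive' | 'continuous';

// One flag per behavioral difference listed in the description.
type Behavior = {
  // Recognition stays active while the bot response is being synthesized.
  recognizesDuringSynthesis: boolean;
  // An interim result interrupts ongoing synthesis (barge-in).
  bargeIn: boolean;
  // Tapping a card action while recognizing stops speech recognition.
  stopsRecognitionOnCardAction: boolean;
  // The bot response to a tapped card action is spoken.
  speaksResponseAfterCardAction: boolean;
  // A proactive bot message received while recognizing is synthesized.
  synthesizesProactiveMessage: boolean;
};

function describeBehavior(mode: RecognitionMode): Behavior {
  const continuous = mode === 'continuous';

  return {
    recognizesDuringSynthesis: continuous,
    bargeIn: continuous,
    stopsRecognitionOnCardAction: !continuous,
    speaksResponseAfterCardAction: continuous,
    synthesizesProactiveMessage: continuous
  };
}
```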

Technical details

Web Chat relies on the Web Speech API provider behaving correctly, including:
- Web Chat assumes the microphone is on when the start event is received
- Web Chat assumes the microphone is off when the end event is received
Web Chat does not look at the SpeechRecognition.continuous property; it depends only on the events dispatched by the Web Speech API provider
- The microphone will be turned off when the end event is received
- The message will be sent when a result event is received whose resultIndex points to a result whose isFinal property is true
  - event.results[event.resultIndex].isFinal === true
If an interim result is received, Web Chat will stop speech synthesis
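
The event-driven rules above could be sketched roughly as follows. This is an assumption-laden illustration and not the actual Web Chat implementation: `Effects`, `createHandlers`, and the structural `ResultLike`/`ResultEventLike` types are hypothetical, and only mirror the parts of the Web Speech API result event that the rules refer to.

```typescript
// Structural stand-ins for the Web Speech API shapes used by the rules above.
// A result exposes isFinal and its first alternative at index 0.
type ResultLike = { isFinal: boolean; 0: { transcript: string } };
type ResultEventLike = { resultIndex: number; results: ResultLike[] };

// Hypothetical side effects that the host application would provide.
type Effects = {
  sendMessage(text: string): void;
  stopSynthesis(): void;
  setMicrophoneOn(on: boolean): void;
};

function createHandlers(effects: Effects) {
  return {
    // The microphone is assumed to be on once "start" is received.
    onStart(): void {
      effects.setMicrophoneOn(true);
    },
    // The microphone is only considered off once "end" is received.
    onEnd(): void {
      effects.setMicrophoneOn(false);
    },
    onResult(event: ResultEventLike): void {
      const result = event.results[event.resultIndex];

      if (!result) {
        return;
      }

      if (result.isFinal) {
        // Final result: send the recognized text as a message.
        effects.sendMessage(result[0].transcript);
      } else {
        // Interim result: interrupt ongoing synthesis (barge-in).
        effects.stopSynthesis();
      }
    }
  };
}
```

Note that `onResult` never touches the microphone state: whether recognition continues after a final result is left entirely to the provider, which signals it by dispatching (or not dispatching) the end event.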

Specific Changes

Added new styleOptions.speechRecognitionContinuous

I have added tests and executed them locally
I have updated CHANGELOG.md
~~I have updated documentation~~

Review Checklist

This section is for contributors to review your work.

~~Accessibility reviewed (tab order, content readability, alt text, color contrast)~~
~~Browser and platform compatibilities reviewed~~
~~CSS styles reviewed (minimal rules, no z-index)~~
~~Documents reviewed (docs, samples, live demo)~~
~~Internationalization reviewed (strings, unit formatting)~~
package.json and package-lock.json reviewed
~~Security reviewed (no data URIs, check for nonce leak)~~
Tests reviewed (coverage, legitimacy)

__tests__/html2/speech/js/MockedSpeechSynthesisUtterance.js

packages/api/src/StyleOptions.ts

packages/component/src/Composer.tsx

OEvgeny

Looks solid, couple of nits and questions

compulim added 30 commits February 7, 2025 21:34

Add mock SpeechSynthesis

20de20a

Clean up

f352b5a

Use jest-mock

8019538

Add expectingInput

34d0ee6

Complete the test

379881e

Add import map

1a48a92

Use import map

f7de58a

Add await to resolveAll()

3cfc548

Complete case

14d5511

No need to wait for send when barge-in

5c2cb35

Add interims

6fa4857

Support barge-in

a3da8a8

Bump version

42cde33

Bump version

b5d215d

Continue to show "Listening..."

da5fa78

Bump react-dictate-button

8ef07af

Add more expectations

5ca1ea9

Clean up

c9c9e69

Clean up

14822c0

Add tests

1c62085

Clean up

1815c57

Clean up

de01d5a

Add more scenarios

b4edd0a

Ignore html2

ce4af27

Ported test

d5fc3c2

Added test

8cb2ae5

Bump react-dictate-button

206c6f2

Add entry

1e0f1a3

Update entries

282cd99

Bump to [email protected]

4fd345f

compulim added 11 commits February 13, 2025 08:47

Bump to [email protected]

e88f40e

Clean up

07c84b3

Clean up

9e5abc2

More comments

4936171

Add perform card action

e6f41dc

Add perform card action tests

ddefad0

Add test

d8608d1

More scenarios

2e2797e

Merge branch 'main' into feat-speech-barge-in

1a81e19

Better comments

6cd4a83

Better comment

1311d17

compulim changed the title ~~DRAFT: Supports barge-in for speech recognition~~ Supports continuous speech recognition and barge-in Feb 13, 2025

compulim mentioned this pull request Feb 13, 2025

feat: Added continuous listening functionality which is controlled by prop #5397

Closed

11 tasks

compulim marked this pull request as ready for review February 13, 2025 11:06

compulim requested review from a-b-r-o-w-n, cwhitten, srinaath, tdurnford and beyackle2 as code owners February 13, 2025 11:06

OEvgeny reviewed Feb 13, 2025

__tests__/html2/speech/js/MockedSpeechSynthesisUtterance.js (outdated, resolved)

OEvgeny reviewed Feb 13, 2025

packages/api/src/StyleOptions.ts (resolved)

OEvgeny reviewed Feb 13, 2025

packages/component/src/Composer.tsx (outdated, resolved)

OEvgeny previously approved these changes Feb 13, 2025

compulim added 2 commits February 13, 2025 18:32

Add comment

50f1628

Add speech error telemetry

5be4691

compulim dismissed OEvgeny’s stale review via 5be4691 February 13, 2025 18:53

Add types

0e65ee5

OEvgeny approved these changes Feb 13, 2025

compulim merged commit c8c5744 into microsoft:main Feb 13, 2025
25 checks passed

compulim deleted the feat-speech-barge-in branch February 13, 2025 19:21
