Commit cd2a5bd

Mateusz Kopciński committed
review changes
1 parent 293a03b commit cd2a5bd

9 files changed (+166, -108 lines)

docs/docs/natural-language-processing/useSpeechToText.md

Lines changed: 84 additions & 9 deletions
Large diffs are not rendered by default.

docs/docs/typescript-api/SpeechToTextModule.md

Lines changed: 14 additions & 7 deletions
@@ -37,20 +37,27 @@ const transcribedText = await SpeechToTextModule.transcribe(waveform);
3737

3838
### Methods
3939

40-
| Method | Type | Description |
41-
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
42-
| `load` | <code>(modelName: 'whisper' &#124 'moonshine' &#124 'whisperMultilingual', transcribeCallback?: (sequence: string) => void, modelDownloadProgressCallback?: (downloadProgress: number) => void, encoderSource?: ResourceSource, decoderSource?: ResourceSource, tokenizerSource?: ResourceSource)</code> | Loads the model specified with `modelName`, where `encoderSource`, `decoderSource`, `tokenizerSource` are strings specifying the location of the binaries for the models. `modelDownloadProgressCallback` allows you to monitor the current progress of the model download, while `transcribeCallback` is invoked with each generated token |
43-
| `transcribe` | `(waveform: number[], audioLanguage?: SpeechToTextLanguage): Promise<string>` | Starts a transcription process for a given input array, which should be a waveform at 16kHz. Resolves a promise with the output transcription when the model is finished. For multilingual models, you have to specify the audioLanguage flag, which is the language of the spoken language in the audio. |
44-
| `encode` | `(waveform: number[]) => Promise<number[]>` | Runs the encoding part of the model. Returns a float array representing the output of the encoder. |
45-
| `decode` | `(tokens: number[], encodings?: number[]) => Promise<number[]>` | Runs the decoder of the model. Returns a single token representing a next token in the output sequence. If `encodings` are provided then they are used for decoding process, if not then the cached encodings from most recent `encode` call are used. The cached option is much faster due to very large overhead for communication between native and react layers. |
46-
| `configureStreaming` | <code>(overlapSeconds?: number, windowSize?: number, streamingConfig?: 'fast' &#124; 'balanced' &#124; 'quality') => void</code> | Configures options for the streaming algorithm: <ul><li>`overlapSeconds` determines how much adjacent audio chunks overlap (increasing it slows down transcription, decreases probability of weird wording at the chunks intersection, setting it larger than 3 seconds generally is discouraged), </li><li>`windowSize` describes size of the audio chunks (increasing it speeds up the end to end transcription time, but increases latency for the first token to be returned),</li><li> `streamingConfig` predefined configs for `windowSize` and `overlapSeconds` values.</li></ul> Keep `windowSize + 2 * overlapSeconds <= 30`. |
40+
| Method | Type | Description |
41+
| --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
42+
| `load` | <code>(modelName: 'whisper' &#124 'moonshine' &#124 'whisperMultilingual', transcribeCallback?: (sequence: string) => void, modelDownloadProgressCallback?: (downloadProgress: number) => void, encoderSource?: ResourceSource, decoderSource?: ResourceSource, tokenizerSource?: ResourceSource)</code> | Loads the model specified with `modelName`, where `encoderSource`, `decoderSource`, `tokenizerSource` are strings specifying the location of the binaries for the models. `modelDownloadProgressCallback` allows you to monitor the current progress of the model download, while `transcribeCallback` is invoked with each generated token |
43+
| `transcribe` | `(waveform: number[], audioLanguage?: SpeechToTextLanguage): Promise<string>` | Starts a transcription process for a given input array, which should be a waveform at 16kHz. Resolves a promise with the output transcription when the model is finished. For multilingual models, you have to specify the audioLanguage flag, which is the language of the spoken language in the audio. |
44+
| `streamingTranscribe` | `(streamingAction: STREAMING_ACTION, waveform?: number[], audioLanguage?: SpeechToTextLanguage) => Promise<string>` | This allows for running transcription process on-line, which means where the whole audio is not known beforehand i.e. when transcribing from a live microphone feed. `streamingAction` defines the type of package sent to the model: <li>`START` - initializes the process, allows for optional `waveform` data</li><li>`DATA` - this package should contain consecutive audio data chunks sampled in 16k Hz</li><li>`STOP` - the last data chunk for this transcription, ends the transcription process and flushes internal buffers</li> Each call returns most recent transcription. Returns error when called when module is in use (i.e. processing `transcribe` call) |
45+
| `encode` | `(waveform: number[]) => Promise<number[]>` | Runs the encoding part of the model. Returns a float array representing the output of the encoder. |
46+
| `decode` | `(tokens: number[], encodings?: number[]) => Promise<number[]>` | Runs the decoder of the model. Returns a single token representing a next token in the output sequence. If `encodings` are provided then they are used for decoding process, if not then the cached encodings from most recent `encode` call are used. The cached option is much faster due to very large overhead for communication between native and react layers. |
47+
| `configureStreaming` | <code>(overlapSeconds?: number, windowSize?: number, streamingConfig?: 'fast' &#124; 'balanced' &#124; 'quality') => void</code> | Configures options for the streaming algorithm: <ul><li>`overlapSeconds` determines how much adjacent audio chunks overlap (increasing it slows down transcription, decreases probability of weird wording at the chunks intersection, setting it larger than 3 seconds generally is discouraged), </li><li>`windowSize` describes size of the audio chunks (increasing it speeds up the end to end transcription time, but increases latency for the first token to be returned),</li><li> `streamingConfig` predefined configs for `windowSize` and `overlapSeconds` values.</li></ul> Keep `windowSize + 2 * overlapSeconds <= 30`. |
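
The `streamingTranscribe` row describes a `START`, `DATA`, `STOP` call sequence. As a minimal sketch of that call order (not code from this commit), the snippet below drives a hypothetical `StreamingTranscriber` interface that mirrors only the documented method shape. The interface name, `transcribeChunks`, and `onPartial` are assumptions made for illustration; in a real app the enum and the module would come from the library rather than being redeclared.

```typescript
// Mirrors the enum shown in this commit's type definitions.
enum STREAMING_ACTION {
  START,
  DATA,
  STOP,
}

// Hypothetical interface: only the documented method shape, minus audioLanguage.
interface StreamingTranscriber {
  streamingTranscribe(
    streamingAction: STREAMING_ACTION,
    waveform?: number[]
  ): Promise<string>;
}

// Sends START, then one DATA call per 16 kHz chunk, then STOP.
// Resolves with whatever the STOP call returns (the latest transcription).
async function transcribeChunks(
  stt: StreamingTranscriber,
  chunks: number[][],
  onPartial?: (text: string) => void
): Promise<string> {
  await stt.streamingTranscribe(STREAMING_ACTION.START);
  for (const chunk of chunks) {
    // Each call returns the most recent transcription, so it can update the UI.
    const partial = await stt.streamingTranscribe(STREAMING_ACTION.DATA, chunk);
    onPartial?.(partial);
  }
  return stt.streamingTranscribe(STREAMING_ACTION.STOP);
}
```

Awaiting each call before sending the next keeps the chunks in order, which matters because the table notes that the method returns an error while the module is busy.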
 
 <details>
 <summary>Type definitions</summary>
 
 ```typescript
 type ResourceSource = string | number | object;
 
+enum STREAMING_ACTION {
+  START,
+  DATA,
+  STOP,
+}
+
 enum SpeechToTextLanguage {
   Afrikaans = 'af',
   Albanian = 'sq',

examples/llm/ios/Podfile.lock

Lines changed: 8 additions & 14 deletions
@@ -3,7 +3,7 @@ PODS:
   - DoubleConversion (1.1.6)
   - EXConstants (17.1.6):
     - ExpoModulesCore
-  - Expo (53.0.8):
+  - Expo (53.0.9):
     - DoubleConversion
     - ExpoModulesCore
     - glog
@@ -36,13 +36,13 @@ PODS:
     - ExpoModulesCore
   - ExpoCalendar (14.1.4):
     - ExpoModulesCore
-  - ExpoFileSystem (18.1.9):
+  - ExpoFileSystem (18.1.10):
     - ExpoModulesCore
   - ExpoFont (13.3.1):
     - ExpoModulesCore
   - ExpoKeepAwake (14.1.4):
     - ExpoModulesCore
-  - ExpoModulesCore (2.3.12):
+  - ExpoModulesCore (2.3.13):
     - DoubleConversion
     - glog
     - hermes-engine
@@ -67,8 +67,6 @@ PODS:
     - ReactCommon/turbomodule/bridging
     - ReactCommon/turbomodule/core
     - Yoga
-  - ExpoSpeech (13.1.4):
-    - ExpoModulesCore
   - fast_float (6.1.4)
   - FBLazyVector (0.79.2)
   - fmt (11.0.2)
@@ -1401,7 +1399,7 @@ PODS:
     - React-jsiexecutor
     - React-RCTFBReactNativeSpec
     - ReactCommon/turbomodule/core
-  - react-native-executorch (0.3.1-stt-12):
+  - react-native-executorch (0.3.3):
     - DoubleConversion
     - glog
     - hermes-engine
@@ -2087,7 +2085,6 @@ DEPENDENCIES:
   - ExpoFont (from `../node_modules/expo-font/ios`)
   - ExpoKeepAwake (from `../node_modules/expo-keep-awake/ios`)
   - ExpoModulesCore (from `../node_modules/expo-modules-core`)
-  - ExpoSpeech (from `../node_modules/expo-speech/ios`)
   - fast_float (from `../node_modules/react-native/third-party-podspecs/fast_float.podspec`)
   - FBLazyVector (from `../node_modules/react-native/Libraries/FBLazyVector`)
   - fmt (from `../node_modules/react-native/third-party-podspecs/fmt.podspec`)
@@ -2193,8 +2190,6 @@ EXTERNAL SOURCES:
     :path: "../node_modules/expo-keep-awake/ios"
   ExpoModulesCore:
     :path: "../node_modules/expo-modules-core"
-  ExpoSpeech:
-    :path: "../node_modules/expo-speech/ios"
   fast_float:
     :podspec: "../node_modules/react-native/third-party-podspecs/fast_float.podspec"
   FBLazyVector:
@@ -2349,15 +2344,14 @@ SPEC CHECKSUMS:
   boost: 7e761d76ca2ce687f7cc98e698152abd03a18f90
   DoubleConversion: cb417026b2400c8f53ae97020b2be961b59470cb
   EXConstants: 9f310f44bfedba09087042756802040e464323c0
-  Expo: 769ab5c190382eedebc733af6708bbc9ca5f643b
+  Expo: a9fc723f6c8f673f0e7e036c9021772d3a1a0707
   ExpoAsset: 3bc9adb7dbbf27ae82c18ca97eb988a3ae7e73b1
   ExpoBrightness: c335c6ccc082d5249a4b38dba5cd9a08aa0bf62b
   ExpoCalendar: f5f94ea8dcd957b1434beb4e1c0da1af063322e6
-  ExpoFileSystem: 0f3f466ecd3560f55768cd3f94ac3a17f093b8e6
+  ExpoFileSystem: c36eb8155eb2381c83dda7dc210e3eec332368b6
   ExpoFont: abbb91a911eb961652c2b0a22eef801860425ed6
   ExpoKeepAwake: bf0811570c8da182bfb879169437d4de298376e7
-  ExpoModulesCore: 3ac17421302df62928fc99c133cf25bdbcf0b004
-  ExpoSpeech: 4db7ef7888b9edc39ca9afee54e9c4b3df269ccb
+  ExpoModulesCore: 5d37821c36f3781dcd0ea9a393800c90eaa6259d
   fast_float: 06eeec4fe712a76acc9376682e4808b05ce978b6
   FBLazyVector: 84b955f7b4da8b895faf5946f73748267347c975
   fmt: a40bb5bd0294ea969aaaba240a927bd33d878cdd
@@ -2395,7 +2389,7 @@ SPEC CHECKSUMS:
   React-logger: 8edfcedc100544791cd82692ca5a574240a16219
   React-Mapbuffer: c3f4b608e4a59dd2f6a416ef4d47a14400194468
   React-microtasksnativemodule: 054f34e9b82f02bd40f09cebd4083828b5b2beb6
-  react-native-executorch: 8835fcfdfc71b1d42d30525ee047b2811c359cb8
+  react-native-executorch: d0c3dffa0a4a4111ea9c7b97f3fbf088a48d3b2a
   react-native-safe-area-context: 562163222d999b79a51577eda2ea8ad2c32b4d06
   React-NativeModulesApple: 2c4377e139522c3d73f5df582e4f051a838ff25e
   React-oscompat: ef5df1c734f19b8003e149317d041b8ce1f7d29c
