chore: fix fetching files in speech to text controller #274

Merged

@@ -1,6 +1,7 @@
 <manifest xmlns:android="http://schemas.android.com/apk/res/android">
   <uses-permission android:name="android.permission.INTERNET"/>
   <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
+  <uses-permission android:name="android.permission.RECORD_AUDIO"/>
   <uses-permission android:name="android.permission.SYSTEM_ALERT_WINDOW"/>
   <uses-permission android:name="android.permission.VIBRATE"/>
   <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
Collaborator

@pweglik pweglik May 16, 2025

Let's leave it for now, but in the future try not to include irrelevant changes like this, caused by rebuilds, that have nothing to do with the changes described in the PR.

Member Author

Some of the relevant changes were actually autogenerated (build)

Collaborator

Yeah, changes in permissions can be autogenerated, but I see no reason to include Kotlin formatting changes in this PR. I'm not even sure why this is happening; these files should have been properly formatted after the Expo update, along with the others.
Anyway, I think these kinds of changes should be limited to PRs that bump dependency versions, etc.

@@ -27,8 +27,8 @@ class MainActivity : ReactActivity() {
    * Returns the instance of the [ReactActivityDelegate]. We use [DefaultReactActivityDelegate]
    * which allows you to enable New Architecture with a single boolean flags [fabricEnabled]
    */
-  override fun createReactActivityDelegate(): ReactActivityDelegate {
-    return ReactActivityDelegateWrapper(
+  override fun createReactActivityDelegate(): ReactActivityDelegate =
+    ReactActivityDelegateWrapper(
       this,
       BuildConfig.IS_NEW_ARCHITECTURE_ENABLED,
       object : DefaultReactActivityDelegate(
@@ -37,7 +37,6 @@ class MainActivity : ReactActivity() {
         fabricEnabled,
       ) {},
     )
-  }

   /**
    * Align the back button behavior with Android S

@@ -14,15 +14,17 @@ import com.facebook.soloader.SoLoader
 import expo.modules.ApplicationLifecycleDispatcher
 import expo.modules.ReactNativeHostWrapper

-class MainApplication : Application(), ReactApplication {
+class MainApplication :
+  Application(),
+  ReactApplication {
   override val reactNativeHost: ReactNativeHost =
     ReactNativeHostWrapper(
       this,
       object : DefaultReactNativeHost(this) {
         override fun getPackages(): List<ReactPackage> {
           val packages = PackageList(this).packages
           // Packages that cannot be autolinked yet can be added manually here, for example:
-          // packages.add(new MyReactNativePackage());
+          // packages.add(MyReactNativePackage())
           return packages
         }

5 changes: 4 additions & 1 deletion examples/speech-to-text/app.json
@@ -14,7 +14,10 @@
     },
     "ios": {
       "supportsTablet": true,
-      "bundleIdentifier": "com.anonymous.speechtotext"
+      "bundleIdentifier": "com.anonymous.speechtotext",
+      "infoPlist": {
+        "NSMicrophoneUsageDescription": "This app needs access to your microphone to record audio."
+      }
     },
     "android": {
       "adaptiveIcon": {

2 changes: 2 additions & 0 deletions examples/speech-to-text/ios/speechtotext/Info.plist
@@ -44,6 +44,8 @@
       <key>NSAllowsLocalNetworking</key>
       <true/>
     </dict>
+    <key>NSMicrophoneUsageDescription</key>
+    <string>This app needs access to your microphone to record audio.</string>
     <key>UILaunchStoryboardName</key>
     <string>SplashScreen</string>
     <key>UIRequiredDeviceCapabilities</key>

20 changes: 19 additions & 1 deletion examples/speech-to-text/screens/SpeechToTextScreen.tsx
@@ -5,6 +5,8 @@ import {
   StyleSheet,
   SafeAreaView,
   TouchableOpacity,
+  PermissionsAndroid,
+  Platform,
 } from 'react-native';
 import LiveAudioStream from 'react-native-live-audio-stream';
 import SWMIcon from '../assets/swm_icon.svg';
@@ -75,6 +77,21 @@ export const SpeechToTextScreen = () => {
   };

   const handleRecordPress = async () => {
+    if (Platform.OS === 'android') {
+      const permission = await PermissionsAndroid.check(
+        PermissionsAndroid.PERMISSIONS.RECORD_AUDIO
+      );
+      if (!permission) {
+        const granted = await PermissionsAndroid.request(
+          PermissionsAndroid.PERMISSIONS.RECORD_AUDIO
+        );
+        if (granted !== PermissionsAndroid.RESULTS.GRANTED) {
+          console.log('Microphone permission denied');
+          return;
+        }
+      }
+    }
+

Comment on lines +80 to +94

Collaborator

We're not asking for permission on iOS?

Member Author

No, it's handled automatically.

     if (isRecording) {
       LiveAudioStream.stop();
       setIsRecording(false);
@@ -162,7 +179,7 @@ export const SpeechToTextScreen = () => {
           }}
         >
           <Text style={[styles.recordingButtonText, styles.font13]}>
-            {'TRANSCRIBE FROM URL'}
+            TRANSCRIBE FROM URL
           </Text>
         </TouchableOpacity>
       </View>
@@ -226,6 +243,7 @@ const styles = StyleSheet.create({
     justifyContent: 'center',
     alignItems: 'center',
     marginBottom: 20,
+    backgroundColor: 'white',
   },
   recordingButtonWrapper: {
     flex: 1,

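Note on the permission flow in `handleRecordPress`: the Android-only check could be pulled out into a small helper. This is a minimal sketch, not code from this PR. The names `ensureMicrophonePermission` and `PermissionsApi` are hypothetical, and the platform and permissions objects are passed in as arguments so the snippet runs without React Native. In the app you would pass `Platform.OS` and `PermissionsAndroid`. The two string constants mirror the values of `PermissionsAndroid.PERMISSIONS.RECORD_AUDIO` and `PermissionsAndroid.RESULTS.GRANTED`.

```typescript
// Minimal shape of the parts of PermissionsAndroid used here. Declared locally
// (an assumption for this sketch) so the example is self-contained.
interface PermissionsApi {
  check(permission: string): Promise<boolean>;
  request(permission: string): Promise<string>;
}

const RECORD_AUDIO = 'android.permission.RECORD_AUDIO';
const GRANTED = 'granted';

// Hypothetical helper: resolves to true when recording may proceed.
async function ensureMicrophonePermission(
  os: string,
  permissions: PermissionsApi
): Promise<boolean> {
  // iOS shows its own prompt the first time the microphone is accessed, using
  // the NSMicrophoneUsageDescription string, so only Android asks explicitly.
  if (os !== 'android') return true;
  // Skip the dialog when the permission was already granted earlier.
  if (await permissions.check(RECORD_AUDIO)) return true;
  return (await permissions.request(RECORD_AUDIO)) === GRANTED;
}
```

With a helper like this, the handler would reduce to an early return when the promise resolves to `false`, which keeps the recording logic free of platform branches.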
30 changes: 10 additions & 20 deletions src/controllers/SpeechToTextController.ts
@@ -6,10 +6,8 @@ import {
   NUM_TOKENS_TO_SLICE,
 } from '../constants/sttDefaults';
 import { AvailableModels, ModelConfig } from '../types/stt';
-import {
-  SpeechToTextNativeModule,
-  TokenizerNativeModule,
-} from '../native/RnExecutorchModules';
+import { SpeechToTextNativeModule } from '../native/RnExecutorchModules';
+import { TokenizerModule } from '../modules/natural_language_processing/TokenizerModule';
 import { ResourceSource } from '../types/common';
 import { ResourceFetcher } from '../utils/ResourceFetcher';
 import { longCommonInfPref } from '../utils/stt';
@@ -24,7 +22,7 @@ export class SpeechToTextController {
   public sequence: number[] = [];
   public isReady = false;
   public isGenerating = false;
-  private nativeTokenizer = TokenizerNativeModule;
+  private nativeTokenizer = TokenizerModule;

   // User callbacks
   private decodedTranscribeCallback: (sequence: number[]) => void;
@@ -85,24 +83,16 @@ export class SpeechToTextController {
     this.config = MODEL_CONFIGS[modelName];

     try {
-      encoderSource = await ResourceFetcher.fetch(
-        encoderSource || this.config.sources.encoder,
-        (progress) => this.modelDownloadProgressCallback?.(progress / 2)
-      );
-
-      decoderSource = await ResourceFetcher.fetch(
-        decoderSource || this.config.sources.decoder,
-        (progress) => this.modelDownloadProgressCallback?.(0.5 + progress / 2)
-      );
-
-      let tokenizerUri = await ResourceFetcher.fetch(
+      await this.nativeTokenizer.load(
         tokenizerSource || this.config.tokenizer.source
       );

-      // The tokenizer native module does not accept the file:// prefix
-      await this.nativeTokenizer.loadModule(
-        tokenizerUri.replace('file://', '')
-      );
+      [encoderSource, decoderSource] =
+        await ResourceFetcher.fetchMultipleResources(
+          this.modelDownloadProgressCallback,
+          encoderSource || this.config.sources.encoder,
+          decoderSource || this.config.sources.decoder
+        );
     } catch (e) {
       this.onErrorCallback?.(e);
       return;
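Note on the progress reporting: the removed code mapped the encoder download onto `progress / 2` and the decoder download onto `0.5 + progress / 2`, so that one callback saw a single 0..1 range. The sketch below generalizes that arithmetic to any number of files. It is an illustration only and makes no claim about how `ResourceFetcher.fetchMultipleResources` is implemented; `scaledProgress` and `ProgressCallback` are hypothetical names.

```typescript
type ProgressCallback = (progress: number) => void;

// Hypothetical helper: returns a callback for the file at `index` (0-based)
// out of `total` files, mapping that file's local 0..1 progress onto its
// slice of the overall 0..1 range.
function scaledProgress(
  index: number,
  total: number,
  onProgress?: ProgressCallback
): ProgressCallback {
  return (progress) => onProgress?.((index + progress) / total);
}

// With two files this reproduces the removed arithmetic:
// file 0 reports progress / 2, file 1 reports 0.5 + progress / 2.
const reported: number[] = [];
const report: ProgressCallback = (p) => {
  reported.push(p);
};
scaledProgress(0, 2, report)(0.5); // reports 0.25
scaledProgress(1, 2, report)(0.5); // reports 0.75
scaledProgress(1, 2, report)(1); // reports 1
```

Each file gets an equal slice of the range regardless of its size, which matches the old two-file behavior. Weighting by byte size would need the sizes up front.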