English-Speech-Accent-classification

Detects an English speaker's accent using SpeechBrain models.

How to install

I use Windows 11, Anaconda, Python 3.11, and PyTorch 2.6.0+cu124. You can use a different PyTorch version that matches your CUDA version; see https://pytorch.org/get-started/locally/

Before installing, you also need to install ffmpeg. You can download it from https://ffmpeg.org/download.html

# create a conda environment
conda create --name speech_accent python=3.11
conda activate speech_accent

# clone this repo
git clone https://github.com/RaffelRavionaldo/English-Speech-Accent-classification.git
cd English-Speech-Accent-classification

# install PyTorch for your CUDA version; to get the same version as my local setup (CUDA 12.4), use this
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
# if you don't have a GPU (CPU only), use this
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0

# install the requirements
pip install -r requirements.txt
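
To confirm the setup before launching the app, a small check like the one below can help. It is only a sketch: the module list is an assumption based on the tools named in this README (adjust it to match requirements.txt), and it only checks that the packages and the ffmpeg executable can be found, not that they work.

```python
import importlib.util
import shutil

# Import names to look for. This list is an assumption, not taken from
# requirements.txt; edit it to match the packages you actually installed.
REQUIRED_MODULES = ["torch", "torchaudio", "streamlit", "speechbrain"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each dependency name to True if it was found."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    # ffmpeg is a command-line tool, so look for it on PATH instead of importing it.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated speech_accent environment. If ffmpeg shows as missing, its folder is probably not on your PATH yet.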

How to run/use this app

In your Anaconda prompt (with the speech_accent environment activated), run: streamlit run app.py
After you run it, your default browser will open the Streamlit UI.

Enter a video link in the text input or upload a video from your computer. For this test I used a YouTube video (https://www.youtube.com/watch?v=0nLIGzLOXgg).
Click the Process button and wait for the output to appear. Known limitation: while the app is predicting the accent, you can still enter another input and click Process again. The button should be disabled while a prediction is running, but I'm still new to Streamlit and haven't worked out how to do that yet.

Flow of this app

After the app receives your video or video link and you click Process, processing starts. If you gave a link, the video is first downloaded and saved in the temp folder.
The app extracts the audio from the video, detects the non-silent parts, and keeps 10 seconds of them. I use only 10 seconds to keep detection fast; classifying all of the speech would make the run time grow with the video length.
The audio is sent to the speech accent model. I use this model: https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english
The app shows the output (an accent and its confidence score).
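
The audio steps of this flow can be sketched roughly as follows. This is not the code from app.py: the function names, the 16 kHz mono output, the 50 ms frame size, and the RMS threshold are all assumptions chosen for illustration, and silence detection is done here with a simple energy threshold in plain Python rather than with whatever library the app uses.

```python
import math


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts mono WAV audio from a video."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz (assumed value)
        wav_path,
    ]


def frame_rms(frame):
    """Root-mean-square level of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def first_non_silent_clip(samples, sample_rate, clip_seconds=10.0,
                          frame_seconds=0.05, threshold=0.01):
    """Collect non-silent frames from the start until clip_seconds is reached.

    samples is a list of floats in [-1.0, 1.0]. Frames whose RMS level is
    below threshold are treated as silence and skipped.
    """
    frame_len = max(1, int(sample_rate * frame_seconds))
    target_len = int(sample_rate * clip_seconds)
    clip = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) >= threshold:
            clip.extend(frame)
        if len(clip) >= target_len:
            break  # enough audio collected, stop scanning the rest
    return clip[:target_len]
```

The command list can be passed to subprocess.run(cmd, check=True) once ffmpeg is installed. Because the loop stops as soon as 10 seconds of speech are collected, the cost depends mostly on where the speech starts rather than on the full video length, which matches the reason given above for using a short clip.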
