Comprehensive Awadhi dialect speech dataset for linguistic research and speech recognition enhancement

PrashantShuklaa/Awadhi_Speech_Dataset

Awadhi_Speech_Dataset

Comprehensive Awadhi dialect speech dataset for linguistic research and speech recognition enhancement

Awadhi Language Dataset

Introduction

Welcome to the repository for the Awadhi_Speech_Dataset. Awadhi, commonly treated as a dialect of Hindi, is an Indo-Aryan language spoken primarily in the Awadh region of Uttar Pradesh, India, and in parts of Nepal. As of 2011, it had approximately 3.85 million speakers in India and 500,000 in Nepal. This dataset aims to support linguistic research, natural language processing, and cultural preservation efforts related to the Awadhi language.

Dataset Overview

The dataset includes a variety of linguistic resources for Awadhi, such as:

  • Speech recordings and transcriptions
  • Text samples and translations
  • Annotations for linguistic analysis

External Dataset

| Dataset | Description |
| --- | --- |
| KMI Awadhi Corpus | Contains a raw corpus of approximately 70,000 tokens and a POS-annotated corpus of approximately 20,000 tokens. The raw corpus is in the 'source' directory and the annotated corpus is in the 'annotation' directory. The annotations are in CoNLL 2000 format. |
| VarDial 2018 Language Identification Dataset | Includes Awadhi as part of a larger dataset. |
| Awadhi speech dataset | Contains transcriptions of the speech data. |
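
Since the KMI Awadhi Corpus annotations are described as CoNLL 2000 format, a minimal Python sketch of reading such a file may help. It assumes the usual layout of one token per line with whitespace-separated columns and a blank line between sentences, and that the POS tag is the second column; the actual column layout of the corpus files should be checked before relying on this. The function name `read_conll` and the sample tokens (`tok1`, `tok2`, ...) are hypothetical placeholders, not data from the corpus.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A sentence is a list of rows; each row is the tuple of columns on one line.
Sentence = List[Tuple[str, ...]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group whitespace-separated rows into sentences split on blank lines."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(tuple(line.split()))
    if current:
        # Handle a final sentence that is not followed by a blank line.
        sentences.append(current)
    return sentences


# Placeholder input: token, POS tag, chunk tag (assumed column order).
sample = """\
tok1 NN B-NP
tok2 VB B-VP

tok3 NN B-NP
"""

sentences = read_conll(sample.splitlines())

# Count POS tags, assuming the tag sits in the second column.
tag_counts = Counter(row[1] for sent in sentences for row in sent if len(row) > 1)
print(len(sentences), dict(tag_counts))
```

To read a real file, pass an open file handle instead of `sample.splitlines()`, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points at a file inside the corpus's 'annotation' directory.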

Getting Started

To use this dataset in your projects, clone this repository:

git clone https://github.com/PrashantShuklaa/Awadhi_Speech_Dataset.git

Contribution

Contributions to expand and enhance this dataset are welcome. Please refer to our Contribution.md for guidelines.

Community and Support

Join our community to discuss potential applications of this dataset and collaborate on Awadhi language projects. For support or inquiries, please open an issue in this repository.

Acknowledgements

We extend our gratitude to all contributors and researchers who have enriched this repository with valuable data and insights.
