Awadhi Language Dataset

Awadhi_Speech_Dataset

Comprehensive Awadhi dialect speech dataset for linguistic research and speech recognition enhancement

Introduction

Welcome to the repository for the Awadhi_Speech_Dataset. Awadhi is an Indo-Aryan language, often regarded as a dialect of Hindi, spoken primarily in the Awadh region of Uttar Pradesh, India, and in parts of Nepal. It has approximately 3.85 million speakers in India and 500,000 in Nepal (as of 2011). This dataset aims to support linguistic research, natural language processing, and cultural preservation efforts for the Awadhi language.

Dataset Overview

The dataset includes a variety of linguistic resources for Awadhi, such as:

Speech recordings and transcriptions
Text samples and translations
Annotations for linguistic analysis

External Dataset

Dataset	Description
KMI Awadhi Corpus	Contains a raw corpus of approximately 70,000 tokens and a POS-annotated corpus of approximately 20,000 tokens. The raw corpus is in the 'source' directory and the annotated corpus is in the 'annotation' directory. The annotations are in CoNLL-2000 format.
VarDial 2018 Language Identification Dataset	Includes Awadhi among the languages covered by the dataset.
Awadhi speech dataset	Contains the transcriptions of the speech data.
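
For readers new to the CoNLL 2000 format mentioned for the KMI Awadhi Corpus, the following is a minimal sketch of how such a file might be read in Python. It assumes the usual layout: one token per line with whitespace-separated columns (token first, POS tag second, any further columns ignored) and a blank line between sentences. The function names `read_conll2000` and `tag_counts` are hypothetical helpers, and the sample lines are made-up placeholders, not data from the corpus.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A sentence is a list of (token, pos_tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll2000(lines: Iterable[str]) -> List[Sentence]:
    """Parse CoNLL-2000-style lines into sentences of (token, POS) pairs.

    Assumption: columns are whitespace-separated, the first column is the
    token and the second is the POS tag; extra columns are ignored.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        if len(columns) < 2:
            # Skip lines that lack a tag column rather than guess one.
            continue
        current.append((columns[0], columns[1]))
    if current:
        # Keep the last sentence when the input has no trailing blank line.
        sentences.append(current)
    return sentences


def tag_counts(sentences: List[Sentence]) -> Counter:
    """Count how often each POS tag occurs across all sentences."""
    return Counter(tag for sentence in sentences for _, tag in sentence)


# Placeholder input (NOT taken from the corpus): two short sentences.
sample = """\
tok1 NNP B-NP
tok2 NN B-NP
tok3 VM B-VP

tok4 PRP B-NP
tok5 VM B-VP
"""

sentences = read_conll2000(sample.splitlines())
print(len(sentences), "sentences")
print(tag_counts(sentences).most_common())
```

To run this on a real file, pass an open file object (opened with `encoding="utf-8"`) to `read_conll2000` instead of the sample lines. The file names inside the 'annotation' directory are not listed here, so the path has to be filled in by hand.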

Getting Started

To use this dataset in your projects, clone this repository:

git clone https://github.com/yourusername/awadhi-language-dataset.git

Contribution

Contributions to expand and enhance this dataset are welcome. Please refer to our Contribution.md for guidelines.

Community and Support

Join our community to discuss potential applications of this dataset and collaborate on Awadhi language projects. For support or inquiries, please open an issue in this repository.

Acknowledgements

We extend our gratitude to all contributors and researchers who have enriched this repository with valuable data and insights.

PrashantShuklaa/Awadhi_Speech_Dataset
