E-DAIC dataset preprocessing

This is a simple project to clean and filter the original files in the E-DAIC dataset. The original data structure should be:

-301_P
    -301_Transcript.csv
    -301_AUDIO.wav

steps:

-trans
    -301.json
    -302.json
    -...

-audio
    -301.wav
    -302.wav
    -...
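
The reorganization from per-participant folders into the `trans` and `audio` folders above could look roughly like the sketch below. It is not the code in `steps.py`: the function name `reorganize` is hypothetical, each `NNN_P` folder is assumed to hold `NNN_Transcript.csv` and `NNN_AUDIO.wav`, and the JSON layout (a list of CSV rows with the CSV's own column names) is an assumption.

```python
import csv
import json
import shutil
from pathlib import Path


def reorganize(src_root: Path, dst_root: Path) -> list:
    """Copy each NNN_P folder into dst_root/trans/NNN.json and dst_root/audio/NNN.wav.

    Returns the participant IDs that were converted.
    """
    trans_dir = dst_root / "trans"
    audio_dir = dst_root / "audio"
    trans_dir.mkdir(parents=True, exist_ok=True)
    audio_dir.mkdir(parents=True, exist_ok=True)

    done = []
    for folder in sorted(src_root.glob("*_P")):
        if not folder.is_dir():
            continue
        pid = folder.name[:-2]  # "301_P" -> "301"
        csv_path = folder / f"{pid}_Transcript.csv"
        wav_path = folder / f"{pid}_AUDIO.wav"
        # Skip incomplete folders instead of failing halfway through.
        if not (csv_path.is_file() and wav_path.is_file()):
            continue

        # Assumed JSON layout: one dict per CSV row, keys taken from the header.
        with csv_path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        (trans_dir / f"{pid}.json").write_text(
            json.dumps(rows, indent=2), encoding="utf-8"
        )
        shutil.copyfile(wav_path, audio_dir / f"{pid}.wav")
        done.append(pid)
    return done
```

Calling `reorganize(Path("raw"), Path("clean"))` would leave the original folders untouched and write the new layout under `clean`; both directory names are placeholders.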

Alternatively, simply use Whisper to transcribe the audio files -,-

step 3.1: segment the audio files using the original JSON files, automatically correcting errors such as a start time later than the end time or an end time beyond the audio length.
step 3.2: feed each segment to Whisper and replace the original text with the new transcription.
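
A minimal sketch of steps 3.1 and 3.2 follows, using only the standard-library `wave` module for cutting. Several things here are assumptions rather than the repository's actual behavior: the function names, the segment keys `start`, `end` and `text`, and the repair policy (reversed timestamps are swapped, out-of-range times are clipped, and segments left empty are dropped). The transcriber is passed in as a callable so the cutting logic can be checked without loading a model.

```python
import wave
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def fix_times(start: float, end: float, audio_len: float) -> Optional[Tuple[float, float]]:
    """Return a corrected (start, end) in seconds, or None if nothing usable remains."""
    if start > end:
        start, end = end, start  # assumed repair: swap reversed timestamps
    start = max(0.0, start)
    end = min(end, audio_len)  # clip an end time that runs past the audio
    return (start, end) if end > start else None


def cut_wav(src, dst, start: float, end: float) -> None:
    """Write the [start, end) slice of src to dst, keeping the audio format."""
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        reader.setpos(round(start * rate))
        frames = reader.readframes(round((end - start) * rate))
    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)  # frame count in the header is fixed up on close
        writer.writeframes(frames)


def retranscribe(segments: List[dict], wav_path, transcribe: Callable[[str], str], clip_dir) -> List[dict]:
    """Cut every valid segment and replace its text with transcribe(clip_path)."""
    clip_dir = Path(clip_dir)
    clip_dir.mkdir(parents=True, exist_ok=True)
    with wave.open(str(wav_path), "rb") as reader:
        audio_len = reader.getnframes() / reader.getframerate()

    result = []
    for i, seg in enumerate(segments):
        fixed = fix_times(float(seg["start"]), float(seg["end"]), audio_len)
        if fixed is None:
            continue  # segment lies entirely outside the audio
        clip = clip_dir / f"{i}.wav"
        cut_wav(wav_path, clip, *fixed)
        result.append({"start": fixed[0], "end": fixed[1], "text": transcribe(str(clip))})
    return result


# Possible wiring with the openai-whisper package (not run here):
#   import whisper
#   model = whisper.load_model("base")
#   new_segments = retranscribe(segments, "audio/301.wav",
#                               lambda p: model.transcribe(p)["text"], "clips")
```

Dropping unusable segments is only one option; keeping them with their original text would be an equally reasonable policy.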
