sustech-topology/TopCap

Topology-enhanced Machine Learning for Consonant Recognition

Code for Topological Data Analysis in Consonant Recognition. The data that support the findings of this study are openly available from SpeechBox (ALLSSTAR Corpus, L1-ENG division); see the Home Page of SpeechBox.

Data Preprocessing

We use the Montreal Forced Aligner (MFA) to align each speech recording into phonetic segments. Detailed guidance on MFA can be found on its Installation Page, and the Montreal Forced Aligner Tutorial gives further explanation. The steps are as follows:

  • Download the acoustic model and dictionary.
    mfa model download acoustic english_us_arpa
    mfa model download dictionary english_us_arpa
    
  • Convert the sampling rate to 16 kHz with wav_modification.
  • Align the recordings. The output files are in .TextGrid format.
    mfa align ~/mfa_data/my_corpus english_us_arpa english_us_arpa ~/mfa_data/my_corpus_aligned
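
The aligned .TextGrid files then have to be turned into per-phone time intervals before any segment can be analysed. The sketch below shows one way to do that in plain Python. It assumes the long (verbose) Praat TextGrid layout and a tier named `phones`, which is what MFA normally writes; the function name `read_phone_intervals` is hypothetical and is not taken from this repository.

```python
import re

# One interval entry in a long-format Praat TextGrid (assumed layout).
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([\d.eE+-]+)\s*'
    r'xmax = ([\d.eE+-]+)\s*'
    r'text = "([^"]*)"'
)


def read_phone_intervals(textgrid_text, tier_name="phones"):
    """Return (start, end, label) tuples for the labelled intervals of one tier."""
    # Each tier starts with a line such as "item [1]:"; the header comes first.
    tiers = re.split(r'item \[\d+\]:', textgrid_text)[1:]
    for tier in tiers:
        name = re.search(r'name = "([^"]*)"', tier)
        if name and name.group(1) == tier_name:
            return [
                (float(start), float(end), label)
                for start, end, label in INTERVAL_RE.findall(tier)
                if label  # empty labels mark silence and are skipped
            ]
    return []  # tier not found
```

Reading a file with `open(path, encoding="utf-8").read()` and passing the text to `read_phone_intervals` yields the start time, end time, and ARPAbet label of every phone, from which the consonant segments can be selected.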
    

TopCap Construction

Before constructing TopCap, we ran a preliminary experiment that measures how well topological methods perform on time series. fre_amp_av helps show how topological methods distinguish different vibration patterns in time series. The results are shown in observation_result_refined.
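
To give a feel for why a vibration pattern can have a topological signature, here is a small illustrative sketch. It is not the code in fre_amp_av, and it assumes a time-delay embedding is used to turn a series into a point cloud, which this README does not state. With a delay of a quarter period, a sine wave maps onto a circle whose radius equals its amplitude, and a loop of that kind is what one-dimensional persistent homology detects. The function names `delay_embed` and `sine_wave` are hypothetical.

```python
import math


def delay_embed(series, delay, dim=2):
    """Map a 1-D series to points (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    span = (dim - 1) * delay
    return [
        tuple(series[t + k * delay] for k in range(dim))
        for t in range(len(series) - span)
    ]


def sine_wave(amplitude, period, n_samples):
    """Sample amplitude * sin(2*pi*t/period) at integer times t."""
    return [amplitude * math.sin(2 * math.pi * t / period) for t in range(n_samples)]


# Quarter-period delay: each point is (A*sin(x), A*cos(x)), so it lies on a
# circle of radius A. A larger amplitude therefore gives a larger loop.
points = delay_embed(sine_wave(amplitude=2.0, period=40, n_samples=200), delay=10)
radii = [math.hypot(x, y) for x, y in points]
```

Changing the amplitude rescales the loop, while changing the frequency changes how well a fixed delay opens it up, which is one way frequency and amplitude can show up in a persistence diagram.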

TopCap is implemented in csv_writer_consonant, which captures the most significant topological features of the segmented phonetic time series. The output is a .csv file containing the birthtime and lifetime of the point in the persistence diagram with the longest lifetime.
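
The selection step described above can be sketched as follows. This is a minimal illustration and not the contents of csv_writer_consonant: it assumes the persistence diagram is already available as a list of (birth, death) pairs, and the function names and CSV column names are hypothetical. Lifetime is taken as death minus birth, and points that never die are ignored.

```python
import csv
import io


def most_persistent(diagram):
    """Return (birth, lifetime) of the finite point with the longest lifetime."""
    finite = [(b, d) for b, d in diagram if d != float("inf")]
    if not finite:
        return None
    birth, death = max(finite, key=lambda point: point[1] - point[0])
    return birth, death - birth


def write_features(rows, out):
    """Write (label, birthtime, lifetime) rows to a CSV stream with a header."""
    writer = csv.writer(out)
    writer.writerow(["label", "birthtime", "lifetime"])  # assumed column names
    writer.writerows(rows)


# Example with a made-up diagram for one segment labelled "S".
diagram = [(0.125, 0.5), (0.25, 1.0), (0.0, float("inf"))]
birth, lifetime = most_persistent(diagram)
buffer = io.StringIO()
write_features([("S", birth, lifetime)], buffer)
```

In practice `out` would be a file opened with `newline=""`, and the diagram would come from a persistent homology library applied to each segment.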

Further discussions of TopCap are involved in

Machine Learning for Topological Features

Classification is done in the MATLAB (R2022b) Classification Learner app with 5-fold cross-validation, setting aside 30% of the records as test data. The following built-in algorithms are used: Optimizable Tree, Optimizable Discriminant, Efficient Logistic Regression, Optimizable Naive Bayes, Optimizable SVM, Optimizable KNN, Kernel, and Optimizable Ensemble.
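
For readers without MATLAB, the data split described above can be mimicked in a few lines of Python. This is only a sketch of the partitioning idea: the function name `holdout_and_folds` and the seed are hypothetical, the split ignores class balance, and it is not claimed to reproduce what Classification Learner does internally.

```python
import random


def holdout_and_folds(n_records, test_fraction=0.3, n_folds=5, seed=0):
    """Return (test_indices, folds); the folds partition the non-test indices."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = round(n_records * test_fraction)
    test, train = indices[:n_test], indices[n_test:]
    # Deal the training indices round-robin so fold sizes differ by at most one.
    folds = [train[k::n_folds] for k in range(n_folds)]
    return test, folds
```

Each fold serves once as the validation set while the other four are used for training, and the held-out 30% is touched only for the final evaluation.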

Model Comparison

We built other state-of-the-art models for comparison with TopCap to evaluate its performance comprehensively. The MFCC-GRU classification model is in MFCC_GRU_classification_model, and the MFCC-Transformer classification model is in MFCC_Transformer_classification_model. Both the STFT-CNN classification model and the STFT-CNN^+ classification model are in STFT_CNN_classification_model.

Data Preprocessing of Other Datasets

The comparison experiments use the LJSpeech, TIMIT, and LibriSpeech datasets, along with four additional corpora from ALLSSTAR that do not appear in our main experiments. The data preprocessing files can be found in the folder dataset preprocessing.

Supplements

The folder supplements includes supplementary files for this project.

  • The results folder contains the ROC curves and AUC values for the machine learning models, as well as the birthtime and lifetime of consonants.
  • The consonants_waveforms folder contains waveforms of pulmonic consonants. The audio for these consonants comes from Wiki-List of consonants. The waveforms give readers a concrete picture of each consonant.
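
As background for the AUC values in the results folder, the sketch below computes AUC for a binary task directly from its definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. It is an illustration only. The function name `roc_auc` is hypothetical, labels are assumed to be 0 or 1, and the repository's own figures come from MATLAB rather than from this code.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs won by the positive example; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise count is quadratic in the number of examples, which is fine for small result tables; a rank-based formula would be preferable for large ones.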
