Code material for Topological Data Analysis in Consonant Recognition. The data that support the findings of this study are openly available in SpeechBox (ALLSSTAR Corpus, L1-ENG division) at the Home Page of SpeechBox.
We use the Montreal Forced Aligner (MFA) to align each speech recording into phonetic segments. Detailed guidance on MFA can be found on the Installation Page, and the Montreal Forced Aligner Tutorial gives further explanation. The following steps align each recording into phonetic segments.
- Download the acoustic model and dictionary.
mfa model download acoustic english_us_arpa
mfa model download dictionary english_us_arpa
- Convert the sampling rate to 16 kHz with wav_modification (a minimal resampling sketch is given after this list).
- Align the speech recordings; the output files are in .TextGrid format.
mfa align ~/mfa_data/my_corpus english_us_arpa english_us_arpa ~/mfa_data/my_corpus_aligned
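The resampling step is handled by wav_modification in this repository. Purely for illustration, a minimal sketch of the same operation, assuming the librosa and soundfile packages and placeholder file paths (not the repository's actual code), could look like this:

```python
# Illustrative sketch only; the actual logic lives in wav_modification.
# Assumes librosa and soundfile are installed; file paths are placeholders.
import librosa
import soundfile as sf

def resample_to_16k(in_path, out_path, target_sr=16000):
    """Load a wav file at its native rate and rewrite it at 16 kHz for MFA."""
    audio, sr = librosa.load(in_path, sr=None)  # keep the original sampling rate
    audio_16k = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
    sf.write(out_path, audio_16k, target_sr)

# Hypothetical usage:
# resample_to_16k("my_corpus/speaker01/utt01.wav", "my_corpus_16k/speaker01/utt01.wav")
```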
Before constructing TopCap, a preliminary experiment measures the performance of topological methods on time series. fre_amp_av shows how topological methods distinguish different vibration patterns in time series; the results are collected in observation_result_refined.
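As a hedged illustration of the underlying idea (not the fre_amp_av script itself), the following sketch, assuming numpy and ripser, compares two synthetic vibration patterns through the longest lifetime in their H1 persistence diagrams. The signal frequencies, amplitudes, and embedding parameters are made up for illustration.

```python
# Sketch of the preliminary experiment's idea: different vibration patterns give
# different maximal H1 lifetimes after time delay embedding.
import numpy as np
from ripser import ripser

def delay_embed(signal, dim=3, delay=5):
    """Sliding-window (time delay) embedding of a 1-D signal into R^dim."""
    n = len(signal) - (dim - 1) * delay
    return np.stack([signal[i:i + n] for i in range(0, dim * delay, delay)], axis=1)

def max_h1_lifetime(signal, dim=3, delay=5):
    """Longest lifetime in the H1 persistence diagram of the embedded signal."""
    dgm = ripser(delay_embed(signal, dim, delay), maxdim=1)["dgms"][1]
    return 0.0 if len(dgm) == 0 else float(np.max(dgm[:, 1] - dgm[:, 0]))

t = np.linspace(0, 1, 2000)
slow = np.sin(2 * np.pi * 10 * t)        # low-frequency, high-amplitude vibration
fast = 0.3 * np.sin(2 * np.pi * 80 * t)  # faster, lower-amplitude vibration
print(max_h1_lifetime(slow), max_h1_lifetime(fast))
```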
TopCap is implemented in csv_writer_consonant, which captures the most significant topological features of the segmented phonetic time series. The output is a .csv file containing the birthtime and lifetime of the point with the longest lifetime in the persistence diagram.
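For a concrete picture before opening csv_writer_consonant, the following hedged sketch shows the general shape of such a computation: delay-embed a phonetic segment, take its H1 persistence diagram, and record the (birthtime, lifetime) of the most persistent point as one CSV row. The function names, embedding parameters, and random dummy segments are placeholders, not the repository's code.

```python
# Sketch of the TopCap output step: one (birthtime, lifetime) pair per phonetic segment.
import csv
import numpy as np
from ripser import ripser

def topcap_feature(segment, dim=3, delay=5):
    """Return (birthtime, lifetime) of the most persistent H1 point of a delay-embedded segment."""
    n = len(segment) - (dim - 1) * delay
    cloud = np.stack([segment[i:i + n] for i in range(0, dim * delay, delay)], axis=1)
    dgm = ripser(cloud, maxdim=1)["dgms"][1]
    if len(dgm) == 0:
        return 0.0, 0.0
    lifetimes = dgm[:, 1] - dgm[:, 0]
    k = int(np.argmax(lifetimes))
    return float(dgm[k, 0]), float(lifetimes[k])

# Hypothetical usage with dummy segments standing in for real consonant recordings.
with open("topcap_features.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["label", "birthtime", "lifetime"])
    for label, seg in [("b", np.random.randn(1500)), ("s", np.random.randn(1500))]:
        writer.writerow([label, *topcap_feature(seg)])
```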
Further discussion of TopCap can be found in:
- observation_dimension illustrates how the embedding dimension influences time delay embedding and persistence diagrams.
- observation_dimension_plot includes the parameters and graphs used in the discussion section.
- observation_skip illustrates how the skip parameter influences computation time.
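As a rough companion to these files (parameter meanings assumed here: dim is the time delay embedding dimension, skip keeps every skip-th point of the embedded cloud), a sketch like the following makes the trade-off between point cloud size, computation time, and the most persistent feature visible:

```python
# Hedged sketch of the dimension / skip trade-off; parameters are illustrative only.
import time
import numpy as np
from ripser import ripser

def embed(signal, dim, delay=5, skip=1):
    """Delay-embed a 1-D signal, then subsample the point cloud by keeping every skip-th point."""
    n = len(signal) - (dim - 1) * delay
    cloud = np.stack([signal[i:i + n] for i in range(0, dim * delay, delay)], axis=1)
    return cloud[::skip]

t = np.linspace(0, 1, 2000)
signal = np.sin(2 * np.pi * 25 * t)

for dim in (2, 3, 5):
    for skip in (1, 4, 16):
        cloud = embed(signal, dim=dim, skip=skip)
        start = time.perf_counter()
        dgm = ripser(cloud, maxdim=1)["dgms"][1]
        elapsed = time.perf_counter() - start
        top = 0.0 if len(dgm) == 0 else float(np.max(dgm[:, 1] - dgm[:, 0]))
        print(f"dim={dim} skip={skip} points={len(cloud)} "
              f"max_lifetime={top:.3f} time={elapsed:.2f}s")
```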
Classification is carried out with the MATLAB (R2022b) Classification Learner app, using 5-fold cross-validation and setting aside 30% of the records as test data. The following automatic built-in algorithms are used: Optimizable Tree, Optimizable Discriminant, Efficient Logistic Regression, Optimizable Naive Bayes, Optimizable SVM, Optimizable KNN, Kernel, and Optimizable Ensemble.
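The Classification Learner workflow above is GUI based. For readers without MATLAB, a rough Python analogue of the same protocol (30% hold-out test set, 5-fold cross-validation on the training split) can be written with scikit-learn. In this sketch an SVM stands in for the optimizable models, and the feature matrix is random placeholder data rather than the TopCap features; it is not the authors' pipeline.

```python
# Rough scikit-learn analogue of the MATLAB Classification Learner protocol described above.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder feature matrix: one (birthtime, lifetime) pair per consonant segment.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = rng.integers(0, 2, size=500)  # e.g. voiced vs. voiceless

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv_scores = cross_val_score(model, X_train, y_train, cv=5)  # 5-fold cross-validation
model.fit(X_train, y_train)
print("CV accuracy:", cv_scores.mean(), "test accuracy:", model.score(X_test, y_test))
```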
We built other state-of-the-art models to compare with TopCap and evaluate its performance comprehensively. The MFCC-GRU classification model is provided in MFCC_GRU_classification_model, the MFCC-Transformer classification model in MFCC_Transformer_classification_model, and both the STFT-CNN and STFT-CNN^+ classification models in STFT_CNN_classification_model.
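The exact architectures are in the linked model files. As an illustration only, a minimal MFCC-GRU classifier of the kind named above, assuming PyTorch and torchaudio and using arbitrary layer sizes and class counts, might look like this:

```python
# Minimal sketch of an MFCC + GRU classifier (not the authors' exact model).
import torch
import torch.nn as nn
import torchaudio

class MFCCGRUClassifier(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_classes=2, sample_rate=16000):
        super().__init__()
        self.mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=n_mfcc)
        self.gru = nn.GRU(input_size=n_mfcc, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, waveform):       # waveform: (batch, samples)
        feats = self.mfcc(waveform)    # (batch, n_mfcc, frames)
        feats = feats.transpose(1, 2)  # (batch, frames, n_mfcc)
        _, h = self.gru(feats)         # h: (num_layers, batch, hidden)
        return self.head(h[-1])        # class logits

logits = MFCCGRUClassifier()(torch.randn(4, 16000))  # four one-second dummy segments
print(logits.shape)                                  # torch.Size([4, 2])
```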
The comparison experiments cover the LJSpeech, TIMIT, and LibriSpeech corpora, along with four additional corpora from ALLSSTAR that do not appear in our main experiments. The data preprocessing files can be found in the folder dataset preprocessing.
The folder supplements includes supplementary files for this project.
- The results folder contains the ROC curves and AUC values for the machine learning models, as well as the birthtime and lifetime of the consonants.
- The consonants_waveforms folder contains waveforms of pulmonic consonants. The audio for these consonants comes from the Wiki-List of consonants; the waveforms give readers a concrete picture of each consonant.