The original code for this VCM can be found here.
The code was cleaned and tested by William N. Havard (07&10/01/22).
Clone this repository, for example, if you use SSH:
git clone git@github.com:LAAC-LSCP/vcm.git
If you wish to install the dependencies directly, you can run:
pip install -r requirements.txt
If you want to keep everything in a conda environment:
conda create -p /my/environment/vcm pip
conda activate /my/environment/vcm
pip install -r requirements.txt
You will need the SMILExtract binary to run VCM. Download it, for example, into your vcm directory:
wget -O SMILExtract
chmod u+x SMILExtract
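Before launching VCM, it can help to confirm that the downloaded binary is present and executable. The helper below, `check_smilextract`, is a hypothetical convenience function (it is not part of the vcm repository); it only checks the file and its permissions and does not verify that the binary is openSMILE v2.3.

```python
import os
from pathlib import Path


def check_smilextract(bin_path: str) -> Path:
    """Hypothetical helper: return the resolved SMILExtract path or raise if it cannot be run."""
    path = Path(bin_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"SMILExtract binary not found at {path}")
    # Fails when `chmod u+x SMILExtract` has not been applied.
    if not os.access(path, os.X_OK):
        raise PermissionError(f"{path} is not executable; run: chmod u+x {path}")
    return path.resolve()


if __name__ == "__main__":
    print(check_smilextract("./SMILExtract"))
```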
If you use conda, remember to check that your environment is activated:
conda activate /my/environment/vcm
(vcm) myname@mycomputer$
VCM is run with the following script:
[-x AUDIO_EXTENSION] [--all-children] [--remove-others]
[--from-batched-vtc] [--keep-temp] [--reuse-temp]
[--temp-dir TEMP_DIR] [--skip-done] [-j N_JOBS]
optional arguments:
-h, --help show this help message and exit
-a INPUT_AUDIO_PATH, --input-audio-path INPUT_AUDIO_PATH
Path to the audio file to be processed.
-r INPUT_RTTM_PATH, --input-rttm-path INPUT_RTTM_PATH
Path to the VTC output of the file to be processed.
Path to the SMILExtract (v2.3) binary.
-o OUTPUT_VCM_PATH, --output-vcm-path OUTPUT_VCM_PATH
Output path where the results of the VCM should be
stored. Default: same as the RTTM file.
Audio file extension (an empty extension '' is also
accepted). Default: '.wav'
--all-children Whether speech segments produced by children other
than the key child (KCHI) should be analysed. (Default:
--remove-others Whether the VTC annotations for the other speakers
should be removed from the VCM output file. If set,
segments from speaker types SPEECH, MAL, FEM, etc.
will be removed. (Default: False.)
--from-batched-vtc Whether the VTC files were generated using the
LSCP/LAAC batch-voice-type-classifier. /!\
LSCP/LAAC specific; you shouldn't need this
option. (Default: False.)
--keep-temp Whether temporary files should be kept.
(Default: False.)
--reuse-temp Whether temporary files should be reused instead of
being recomputed. (Default: False.)
--temp-dir TEMP_DIR Set path to temporary directory. (Default:
--skip-done Whether RTTM files for which a VCM file already exists
should be skipped. (Default: False.)
-j N_JOBS, -J N_JOBS, --n-jobs N_JOBS
Number of parallel jobs to run.
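The --all-children option controls which VTC segments are analysed. RTTM is a whitespace-separated format, and VTC writes the speaker class in the eighth field of each SPEAKER line. The sketch below assumes that layout and assumes other children are labelled CHI; `parse_rttm` and `select_segments` are hypothetical names used for illustration, not functions from the vcm code.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    file_id: str
    onset: float     # seconds from the start of the recording
    duration: float  # seconds
    speaker: str     # VTC class, e.g. KCHI, CHI, FEM, MAL, SPEECH


def parse_rttm(lines: List[str]) -> List[Segment]:
    """Parse RTTM SPEAKER lines: type, file, channel, onset, duration, <NA>, <NA>, speaker, ..."""
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        segments.append(Segment(fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return segments


def select_segments(segments: List[Segment], all_children: bool = False) -> List[Segment]:
    """Hypothetical mirror of --all-children: KCHI only, or KCHI plus other children (CHI)."""
    targets = {"KCHI", "CHI"} if all_children else {"KCHI"}
    return [s for s in segments if s.speaker in targets]


rttm = [
    "SPEAKER example 1 0.50 1.20 <NA> <NA> KCHI <NA> <NA>",
    "SPEAKER example 1 2.00 0.80 <NA> <NA> FEM <NA> <NA>",
    "SPEAKER example 1 3.10 0.40 <NA> <NA> CHI <NA> <NA>",
]
print([s.speaker for s in select_segments(parse_rttm(rttm), all_children=True)])
```

With `all_children=False`, only the KCHI segment would be kept; the FEM segment is never analysed in either case.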
Launch your computation:
./ -a audio/file/or/directory -r rttm/file/or/directory -s path/SMILExtract -o output/path -j 8
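When -a and -r point to directories, each RTTM file has to be matched with an audio file. The sketch below assumes the audio file shares the RTTM file's basename and that -x supplies the extension; `pair_files` and the `.vcm` output suffix used to emulate --skip-done are hypothetical and may differ from what the vcm script actually does.

```python
from pathlib import Path
from typing import List, Tuple


def pair_files(audio_dir, rttm_dir, output_dir, audio_ext: str = ".wav",
               skip_done: bool = False, output_suffix: str = ".vcm") -> List[Tuple[Path, Path]]:
    """Match each RTTM file with the audio file sharing its basename (hypothetical helper)."""
    audio_dir, rttm_dir, output_dir = Path(audio_dir), Path(rttm_dir), Path(output_dir)
    pairs = []
    for rttm in sorted(rttm_dir.glob("*.rttm")):
        audio = audio_dir / (rttm.stem + audio_ext)  # audio_ext may be '' (no extension)
        if not audio.is_file():
            print(f"warning: no audio file for {rttm.name}, skipping")
            continue
        # Emulates --skip-done, assuming outputs are named <basename><output_suffix>.
        if skip_done and (output_dir / (rttm.stem + output_suffix)).exists():
            continue
        pairs.append((audio, rttm))
    return pairs
```

Each resulting pair could then be dispatched to one of the N_JOBS workers requested with -j.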
To test the installation, run the following command:
pytest tst --smilextract-bin-path=path/SMILExtract
The original code was written in Python 2 and used an unknown version of PyTorch (presumably 0.3.0). However, the code
runs seamlessly with Python 3.9 and PyTorch 1.11. requirements.txt
is provided only for reproducibility purposes;
packages with lower version numbers might work as well.
- example.rttm was extracted using Marvin's VTC as packaged by ALICE
- Feature extraction is done using OpenSMILE v2.3. The code was run and tested with the precompiled version: opensmile/bin/linux_x64_standalone_static/
- LSCP/LAAC specific: to process RTTM files generated with the batched version of the voice-type-classifier, use the --from-batched-vtc option.
- Extracted features (88 eGeMAPS) reference: "The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing". Eyben et al., 2015.
- VCM reference: "VCMNet: Weakly Supervised Learning for Automatic Infant Vocalisation Maturity Analysis". Futaisi et al., 2019. DOI: 10.1145/3340555.3353751
- VTC code (Lavechin et al.)
- ALICE Code (Räsänen et al.)