added modified MFCC features based on DNN-c and fDNN-c features; it i… #2908
base: master
…s activated using --modified option.
…fied-mel-kaldi
Any preprint available of the paper mentioned?
The documentation seems to be out of date w.r.t. the code here.
Can you please let me know if this configuration is the one that you are currently recommending, or did you change it somehow since this?
src/feat/mel-computations.h
Outdated
@@ -48,14 +48,16 @@ struct MelBanksOptions {
BaseFloat vtln_low; // vtln lower cutoff of warping function.
BaseFloat vtln_high; // vtln upper cutoff of warping function: if negative, added
// to the Nyquist frequency to get the cutoff.
bool modified; // If true, use 'modified' MFCC, which uses a breakpoint of
// 900 instead of 700.
changes to documentation needed here
@@ -69,6 +71,13 @@ struct MelBanksOptions {
opts->Register("vtln-high", &vtln_high,
"High inflection point in piecewise linear VTLN warping function"
" (if negative, offset from high-mel-freq");
opts->Register("modified", &modified,
"Modified MFCCs, based on paper 'An alternative to MFCCs for ASR' "
Please update this documentation to be accurate and fix typos. (The stuff about the 1st and 2nd formants isn't accurate any more, I believe.)
src/feat/mel-computations.cc
Outdated
a lot of bins, their diamter is defined by a formula and it's a function of
the center frequency f of the bin:
diameter = 30 + 60 f / (f + 500).
so it increases from 30Hz to 90Hz with a knee around 500Hz.
This documentation seems a bit out of date.
src/feat/mel-computations.h
Outdated
// breakpoint_ is 700 for normal mel, or 900 for modified.
inline BaseFloat InverseMelScale(BaseFloat mel_freq) {
if (sec_breakpoint_ > 0.0)
return 3500.0 * (expf((expf(mel_freq) - breakpoint_) / 3500.0) - 1.0);
I imagine this should be sec_breakpoint_ instead of 3500.
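A minimal sketch of what that suggestion might look like, written as free functions so it compiles on its own. The function and parameter names are hypothetical. The forward mapping `ModifiedMelScale` does not appear in the quoted diff: it is an assumption, obtained by algebraically inverting the quoted expression, and is included only to check that the two functions round-trip.

```cpp
#include <cmath>

// Hypothetical standalone version of the quoted inverse, with the hard-coded
// 3500 replaced by the second breakpoint as the review comment suggests.
inline float ModifiedInverseMelScale(float mel_freq, float breakpoint,
                                     float second_breakpoint) {
  return second_breakpoint *
         (std::exp((std::exp(mel_freq) - breakpoint) / second_breakpoint) -
          1.0f);
}

// ASSUMPTION: forward mapping derived by inverting the expression above;
// it is not taken from the PR.
//   mel = log(breakpoint + s * log(1 + f / s)),  s = second_breakpoint
inline float ModifiedMelScale(float freq, float breakpoint,
                              float second_breakpoint) {
  return std::log(breakpoint +
                  second_breakpoint * std::log1p(freq / second_breakpoint));
}
```

With `breakpoint = 900` and `second_breakpoint = 3500`, the values mentioned in the quoted code, converting a frequency to this scale and back should return the original frequency up to floating-point error.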
src/feat/mel-computations.h
Outdated
// and for other purposes.
BaseFloat breakpoint_; // The breakpoint in the mel scale: 700 normally;
// 500 if opts.modified is true.
BaseFloat sec_breakpoint_; // The second breakpoint used in the modified
please call this either second_breakpoint_ or breakpoint2_.
src/feat/mel-computations.cc
Outdated
BaseFloat diameter_floor = (next_center - center_freq) * 1.1,
    diameter = 30.0 + 60.0 * (center_freq / (center_freq + breakpoint_));

diameter = pow(diameter * diameter + diameter_floor * diameter_floor, 0.5);
I think sqrt would be easier than pow(.., 0.5).
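As a sketch of that suggestion, the last quoted line could be written with `std::sqrt`. `CombineDiameters` is a hypothetical helper name used only to make the snippet self-contained; in the PR the expression is inline.

```cpp
#include <cmath>

// Hypothetical helper: combines the formula-based diameter with the floor
// as a root-sum-of-squares, the same quantity as the quoted
// pow(d * d + floor * floor, 0.5).
inline float CombineDiameters(float diameter, float diameter_floor) {
  return std::sqrt(diameter * diameter + diameter_floor * diameter_floor);
}
```

`std::hypot(diameter, diameter_floor)` from `<cmath>` computes the same value and would be another option.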
Tangential thought unrelated to the contents of this MR. It pleases me that someone has at last had a look at the feature engineering part of our overall business. MFCCs were invented to drop as much "irrelevant" information as possible, when ASR was tiny and puny. With the DNN renaissance, our general approach has changed: just give the network all the information you have, and let it figure out what is really correlated. I am not at all sure that the currently "standard" features discard mostly useless information. The field mostly got rid of HMMs (hooray!), which make no sense in modeling speech signals: they decay exponentially, which speech obviously does not, ye-e-e-e-eah. My general feeling is that our features are another dinosaur that has outlived its time.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.