Commit 71e046b

E2E/Streaming Transformer/Conformer ASR (#578)
* add cmvn and label smoothing loss layer
* add layer for transformer
* add glu and conformer conv
* add torch compatible hack, mask funcs
* not hack size since it exists
* add test; attention
* add attention, common utils, hack paddle
* add audio utils
* conformer batch padding mask bug fix #223
* fix typo, python infer fix rnn mem opt name error and batchnorm1d, will be available at 2.0.2
* fix ci
* fix ci
* add encoder
* refactor egs
* add decoder
* refactor ctc, add ctc align, refactor ckpt, add warmup lr scheduler, cmvn utils
* refactor docs
* add fix
* fix readme
* fix bugs, refactor collator, add pad_sequence, fix ckpt bugs
* fix docstring
* refactor data feed order
* add u2 model
* refactor cmvn, test
* add utils
* add u2 config
* fix bugs
* fix bugs
* fix autograd maybe has problem when using inplace operation
* refactor data, build vocab; add format data
* fix text featurizer
* refactor build vocab
* add fbank, refactor feature of speech
* refactor audio feat
* refactor data prepare
* refactor data
* model init from config
* add u2 bins
* flake8
* can train
* fix bugs, add coverage, add scripts
* test can run
* fix data
* speed perturb with sox
* add spec aug
* fix for train
* fix train logic
* fix logger
* log valid loss, time dataset process
* using np for speed perturb, remove some debug log of grad clip
* fix logger
* fix build vocab
* fix logger name
* using module logger as default
* fix
* fix install
* reorder imports
* fix board logger
* fix logger
* kaldi fbank and mfcc
* fix cmvn and print params
* fix add_eos_sos and cmvn
* fix cmvn compute
* fix logger and cmvn
* fix subsampling, label smoothing loss, remove useless
* add notebook test
* fix log
* fix tb logger
* multi gpu valid
* fix log
* fix log
* fix config
* fix compute cmvn, need paddle 2.1
* add cmvn notebook
* fix layer tools
* fix compute cmvn
* add rtf
* fix decoding
* fix layer tools
* fix log, add avg script
* more avg and test info
* fix dataset pickle problem; using 2.1 paddle; num_workers can > 0; ckpt save in exp dir; fix setup.sh
* add vimrc
* refactor tiny script, add transformer and stream conf
* spm demo; librispeech scripts and confs
* fix log
* add librispeech scripts
* refactor data pipe; fix conf; fix u2 default params
* fix bugs
* refactor aishell scripts
* fix test
* fix cmvn
* fix s0 scripts
* fix ds2 scripts and bugs
* fix dev & test dataset filter
* fix dataset filter
* filter dev
* fix ckpt path
* filter test, since librispeech will cause OOM, but all test wer will be worse, since mismatch train with test
* add comment
* add syllable doc
* fix ds2 configs
* add doc
* add pypinyin tools
* fix decoder using blank_id=0
* mmseg with pybind11
* format code
1 parent 3a2de9e commit 71e046b

446 files changed, +1414633 -2732 lines changed


.clang-format

+2 -2
@@ -16,8 +16,8 @@
 ---
 Language: Cpp
 BasedOnStyle: Google
-IndentWidth: 2
-TabWidth: 2
+IndentWidth: 4
+TabWidth: 4
 ContinuationIndentWidth: 4
 MaxEmptyLinesToKeep: 2
 AccessModifierOffset: -2 # The private/protected/public has no indent in class

.flake8

+50
@@ -0,0 +1,50 @@
+[flake8]
+
+########## OPTIONS ##########
+# Set the maximum length that any line (with some exceptions) may be.
+max-line-length = 120
+
+
+################### FILE PATTERNS ##########################
+# Provide a comma-separated list of glob patterns to exclude from checks.
+exclude =
+    # git folder
+    .git,
+    # python cache
+    __pycache__,
+    third_party/,
+# Provide a comma-separate list of glob patterns to include for checks.
+filename =
+    *.py
+
+
+########## RULES ##########
+
+# ERROR CODES
+#
+# E/W - PEP8 errors/warnings (pycodestyle)
+# F - linting errors (pyflakes)
+# C - McCabe complexity error (mccabe)
+#
+# W503 - line break before binary operator
+
+# Specify a list of codes to ignore.
+ignore =
+    W503
+    E252,E262,E127,E265,E126,E266,E241,E261,E128,E125
+    W291,W293,W605
+    E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
+    # shebang has extra meaning in fbcode lints, so I think it's not worth trying
+    # to line this up with executable bit
+    EXE001,
+    # these ignores are from flake8-bugbear; please fix!
+    B007,B008,
+    # these ignores are from flake8-comprehensions; please fix!
+    C400,C401,C402,C403,C404,C405,C407,C411,C413,C414,C415
+
+# Specify the list of error codes you wish Flake8 to report.
+select =
+    E,
+    W,
+    F,
+    C
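
The `ignore` option above lists some codes more than once. The following is a minimal sketch, not part of the commit, that uses the standard-library `configparser` to read an abridged copy of that option and report repeated codes. The helper names `parse_codes` and `duplicated_codes` are hypothetical, the 4-space continuation indent in the sample is an assumption (the page does not preserve indentation), and splitting on commas and whitespace is assumed to approximate how flake8 reads such lists.

```python
import configparser
from collections import Counter

# Abridged copy of the `ignore` option from the .flake8 in this diff.
# Assumption: continuation lines are indented by 4 spaces in the real file.
SAMPLE = """\
[flake8]
max-line-length = 120
ignore =
    W503
    E252,E262,E127,E265,E126,E266,E241,E261,E128,E125
    W291,W293,W605
    E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
    # these ignores are from flake8-bugbear; please fix!
    B007,B008,
"""


def parse_codes(raw):
    """Split an option value on commas and whitespace (hypothetical helper)."""
    return raw.replace(",", " ").split()


def duplicated_codes(config_text, option="ignore"):
    """Return the sorted codes that occur more than once in the given option."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # configparser skips full-line comments, including those inside a multi-line value.
    counts = Counter(parse_codes(parser.get("flake8", option)))
    return sorted(code for code, n in counts.items() if n > 1)


print(duplicated_codes(SAMPLE))  # ['W291', 'W503']
```

Repeated codes are harmless to the linter, so this is only a tidiness check; removing the duplicates would not change which warnings are suppressed.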

.gitconfig

+48
@@ -0,0 +1,48 @@
+[alias]
+st = status
+ci = commit
+br = branch
+co = checkout
+df = diff
+l = log --pretty=format:\"%h %ad | %s%d [%an]\" --graph --date=short
+ll = log --stat
+
+[merge]
+tool = vimdiff
+
+[core]
+excludesfile = ~/.gitignore
+editor = vim
+
+[color]
+branch = auto
+diff = auto
+status = auto
+
+[color "branch"]
+current = yellow reverse
+local = yellow
+remote = green
+
+[color "diff"]
+meta = yellow bold
+frag = magenta bold
+old = red bold
+new = green bold
+
+[color "status"]
+added = yellow
+changed = green
+untracked = cyan
+
+[push]
+default = matching
+
+[credential]
+helper = store
+
+[user]
+name =
+email =
+
+

.gitignore

+5
@@ -5,3 +5,8 @@ tools/venv
 *.log
 *.pdmodel
 *.pdiparams*
+*.zip
+*.tar
+*.tar.gz
+.ipynb_checkpoints
+*.npz
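
To see which paths the five added patterns would cover, here is a rough sketch using `fnmatch.fnmatchcase` from the standard library. It only approximates gitignore matching for slash-free patterns like these (each pattern is tested against every path component), and it ignores negation, anchoring, and the other gitignore rules. The function `is_ignored` and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns added to .gitignore in this commit.
NEW_PATTERNS = ["*.zip", "*.tar", "*.tar.gz", ".ipynb_checkpoints", "*.npz"]


def is_ignored(path, patterns=NEW_PATTERNS):
    """Approximate check: True if any path component matches a slash-free pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pat) for part in parts for pat in patterns)


# Hypothetical paths, for illustration only.
for candidate in ["data/train.tar.gz", "notebook/.ipynb_checkpoints/a.ipynb",
                  "exp/mean_std.npz", "src/model.py"]:
    print(candidate, is_ignored(candidate))
```

Note that `*.tar` alone would not match `train.tar.gz`, which is presumably why `*.tar.gz` is listed separately.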
