WMT'14 En-De results #1

Open
ganeshjawahar opened this issue Feb 12, 2023 · 3 comments

@ganeshjawahar

Hi, thanks for sharing the code. Following the instructions in the README, I tried to reproduce the WMT'14 En-De Fully NAT + KD + CTC results. With the commands below I got only 25.2 test BLEU, while the paper reports 26.5 (Table 2). There are similar discrepancies for KD + CTC + VAE (26.22 vs. 27.49) and KD + CTC + GLAT (26.16 vs. 27.2).

@MultiPath @shawnkx: Can you please help?
Thanks in advance.

Software versions:
torch 1.7.1
fairseq 0.9.0
python 3.6.13

python train.py Fully-NAT/data/wmt14.en-de.dist.bin \
    --fp16 \
    --left-pad-source False --left-pad-target False \
    --arch cmlm_transformer_ctc --task translation_lev \
    --noise 'full_mask' --valid-noise 'full_mask' \
    --dynamic-upsample --src-upsample 3 \
    --decoder-learned-pos --encoder-learned-pos \
    --apply-bert-init --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-06 \
    --clip-norm 2.4 --dropout 0.3 --lr-scheduler inverse_sqrt \
    --warmup-init-lr 1e-07 --warmup-updates 10000 --lr 0.0005 --min-lr 1e-09 \
    --criterion nat_loss --predict-target 'all' --loss-type 'ctc' \
    --axe-eps --force-eps-zero \
    --label-smoothing 0.1 --weight-decay 0.01 \
    --max-tokens 4096 --update-freq 1 \
    --max-update 300000 --save-dir fullynat/feb10_kd_ctc --save-interval-updates 5000 \
    --no-epoch-checkpoints --keep-interval-updates 10 --keep-best-checkpoints 5 \
    --seed 2 --log-interval 100 --no-progress-bar \
    --eval-bleu --eval-bleu-args '{"iter_decode_max_iter":0,"iter_decode_collapse_repetition":true}' \
    --eval-bleu-detok 'space' \
    --eval-tokenized-bleu --eval-bleu-remove-bpe '@@ ' --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --tensorboard-logdir fullynat/feb10_kd_ctc/tensorboard

python fairseq_cli/generate.py Fully-NAT/data/wmt14.en-de.dist.bin \
    --task translation_lev \
    --path fullynat/feb10_kd_ctc/checkpoint_best.pt \
    --gen-subset test \
    --axe-eps --iter-decode-collapse-repetition --force-eps-zero \
    --left-pad-source False --left-pad-target False \
    --iter-decode-max-iter 0 --beam 1 \
    --remove-bpe --batch-size 10
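
(For reference: if the BLEU printed by generate.py needs to be recomputed offline, the usual fairseq recipe is to save the generation log and rescore it with fairseq_cli/score.py. This is a sketch; gen.out is a hypothetical log-file name, not part of the original commands.)

# Assumes the generate.py output above was redirected to gen.out.
grep ^H gen.out | cut -f3- > gen.out.sys   # hypotheses: 3rd tab field of H-lines
grep ^T gen.out | cut -f2- > gen.out.ref   # references: 2nd tab field of T-lines
python fairseq_cli/score.py --sys gen.out.sys --ref gen.out.ref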

@MultiPath
Collaborator

How many GPUs did you train on?

@ganeshjawahar
Author

A single node with 4 NVIDIA V100 GPUs (16 GB each).

@shawnkx
Owner

shawnkx commented Apr 4, 2023

Since our batch size is 128K tokens, you need to make sure that --max-tokens * num_devices * --update-freq = 128K.
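
(With the setup reported above, 4 GPUs at --max-tokens 4096 and --update-freq 1, the effective batch is only 4096 * 4 * 1 = 16K tokens. A minimal sketch of the fix, assuming gradient accumulation via --update-freq is what's intended:)

# Keep every other flag of the original train.py command and change only:
#   --max-tokens 4096 --update-freq 8
# Check: 4096 tokens/GPU * 4 GPUs * 8 accumulation steps = 131072 ≈ 128K tokens/update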
