[WIP] Support target features #2227

Closed. Wants to merge 62 commits.

Commits:
- `ff6a605` Added target features support to build_vocab (anderleich, Oct 24, 2022)
- `0e5cc73` Merge branch 'v3.0' into support_target_features (anderleich, Oct 24, 2022)
- `e42c209` make prefetch_factor configurable in DataLoader (l-k-11235, Oct 26, 2022)
- `ddf5ddf` bucket size ramp up (l-k-11235, Nov 3, 2022)
- `24c0c4f` Merge pull request #2239 from vince62s/v3.0 (vince62s, Nov 3, 2022)
- `69e09db` bucket size ramp up is configurable (l-k-11235, Nov 3, 2022)
- `3f86f7d` more docs update (vince62s, Nov 3, 2022)
- `2dda423` changelog (vince62s, Nov 3, 2022)
- `f4cbb03` readme (vince62s, Nov 3, 2022)
- `ffac0bc` Merge pull request #2240 from vince62s/docs (vince62s, Nov 3, 2022)
- `9b79dfb` fixed flake error (l-k-11235, Nov 4, 2022)
- `a981b43` fixed unit test error (l-k-11235, Nov 4, 2022)
- `06232c0` Better docs (vince62s, Nov 4, 2022)
- `7f8d26a` typo (vince62s, Nov 4, 2022)
- `4584bfe` forgot to add moved modules (vince62s, Nov 4, 2022)
- `51b99f0` onmt_server works with ctranslate2 (Ehsan-Jahanbakhsh, Nov 6, 2022)
- `21338bf` Fixed Indet. (Ehsan-Jahanbakhsh, Nov 7, 2022)
- `8132c8a` Merge pull request #2245 from Ehsan-Jahanbakhsh/ctranslate2server (vince62s, Nov 7, 2022)
- `793cba1` Add CTranslate2 in requirements (guillaumekln, Sep 16, 2022)
- `0937d36` Merge pull request #2247 from guillaumekln/add-ct2-requirement (vince62s, Nov 7, 2022)
- `bcded17` add comment for LM upgrade (vince62s, Nov 8, 2022)
- `d30366e` Merge pull request #2241 from vince62s/betterdoc (vince62s, Nov 8, 2022)
- `f8e69a2` Merge branch 'master' into facilitate-dataloading-optimization (l-k-11235, Nov 10, 2022)
- `168f8df` fix LM_scoring with v3 (vince62s, Nov 11, 2022)
- `13ad153` Merge pull request #2248 from vince62s/fixlmscoring (vince62s, Nov 11, 2022)
- `33b9110` LMprior with CT2 to infer LM model (vince62s, Nov 15, 2022)
- `2c756f8` simpler fix for special tokens order (vince62s, Nov 15, 2022)
- `27c51fb` back to list for consistent order (vince62s, Nov 15, 2022)
- `77e7ae7` moved comments in the docstring of build_dynamic_dataset_iter (l-k-11235, Nov 15, 2022)
- `7de309a` Merge pull request #2246 from l-k-11235/facilitate-dataloading-optimi… (vince62s, Nov 15, 2022)
- `0df961f` better stats (vince62s, Nov 17, 2022)
- `c02029a` add onmt LM (vince62s, Nov 18, 2022)
- `8204f5e` LMprior working fine (vince62s, Nov 22, 2022)
- `9be00db` Merge pull request #2252 from vince62s/lmpriorv3 (vince62s, Nov 22, 2022)
- `5686adc` Fix dynamic scoring (#2253) (l-k-11235, Nov 22, 2022)
- `0373bd2` reinstate apex.amp (O1 O2) (#2220) (#2256) (vince62s, Nov 23, 2022)
- `3c4f6bc` v3.0.1 (#2257) (vince62s, Nov 23, 2022)
- `d61f22c` Update Translation.md (vince62s, Dec 1, 2022)
- `2eaeed3` Fix tensorboard logging (#2260) (l-k-11235, Dec 2, 2022)
- `dd28db1` Fix validation scoring (#2263) (l-k-11235, Dec 2, 2022)
- `08d2b99` fixes (#2265) (vince62s, Dec 3, 2022)
- `cadd99c` revisit tgt_prefix (#2267) (vince62s, Dec 6, 2022)
- `70799ae` Optimize validation scoring (#2266) (l-k-11235, Dec 7, 2022)
- `874e18a` Bucket processing (#2261) (l-k-11235, Dec 7, 2022)
- `9698acd` pickable Vocab / v3.0.2 (#2268) (vince62s, Dec 7, 2022)
- `9d617b8` Use native CrossEntropyLoss including label_smoothing + more optimisa… (vince62s, Dec 9, 2022)
- `b430e24` fixed coverage attention and translator for attn_debug (#2272) (sanghyuk-choi, Dec 15, 2022)
- `7ccfd23` Fix detok in scoring utils (#2271) (l-k-11235, Dec 15, 2022)
- `ff9effd` fix no tgt at inference (#2273) (vince62s, Dec 16, 2022)
- `386e9be` keep Label Smoothing for Validation (same as Train) (#2274) (vince62s, Dec 16, 2022)
- `3b7c92b` revert approx normalization to accurate per item (#2275) (vince62s, Dec 16, 2022)
- `5f32750` Bump 3.0.3 (#2277) (vince62s, Dec 19, 2022)
- `0ed5dac` Wmt17 example (#2278) (vince62s, Dec 20, 2022)
- `6c70fc4` better batching (#2279) (vince62s, Dec 20, 2022)
- `f9c7ac8` fixes (#2281) (vince62s, Dec 23, 2022)
- `3ea1b66` fix LM scoring (#2284) (vince62s, Dec 31, 2022)
- `f64ac05` emb dropout in all cases (#2285) (vince62s, Jan 3, 2023)
- `edf4140` fix bad mistake in lm prior (#2286) (vince62s, Jan 5, 2023)
- `156f646` Merge remote-tracking branch 'upstream/master' into support_target_fe… (anderleich, Jan 6, 2023)
- `8b5600f` Update comment (anderleich, Jan 6, 2023)
- `a8e6fe6` Add target features support to training part (anderleich, Jan 7, 2023)
- `471e12c` Merge branch 'v3.0' into support_target_features (anderleich, Jan 7, 2023)
73 changes: 69 additions & 4 deletions .github/workflows/push.yml
@@ -78,8 +78,11 @@ jobs:
-word_vec_size 5 \
-report_every 5\
-hidden_size 10 \
-train_steps 10
- name: Test RNN training with copy
-train_steps 10 \
-tensorboard "true" \
-tensorboard_log_dir /tmp/logs_train
python onmt/tests/test_events.py --logdir /tmp/logs_train -tensorboard_checks train
- name: Test RNN training and validation with copy
run: |
python train.py \
-config data/data.yaml \
@@ -93,8 +96,11 @@
-word_vec_size 5 \
-report_every 5 \
-hidden_size 10 \
-train_steps 10 \
-train_steps 10 -valid_steps 5 \
-tensorboard "true" \
-tensorboard_log_dir /tmp/logs_train_valid \
-copy_attn
python onmt/tests/test_events.py --logdir /tmp/logs_train_valid -tensorboard_checks train_valid
- name: Test RNN training with coverage
run: |
python train.py \
@@ -116,7 +122,6 @@
-tgt_vocab /tmp/onmt.vocab.tgt \
-src_vocab_size 1000 \
-tgt_vocab_size 1000 \
-max_generator_batches 0 \
-encoder_type transformer \
-decoder_type transformer \
-layers 4 \
@@ -133,6 +138,66 @@
-attention_dropout 0.2 0.1 0.1 \
-report_every 5 \
-train_steps 10
- name : Test Transformer training with dynamic scoring
run: |
python3 train.py \
-config data/data.yaml \
-src_vocab /tmp/onmt.vocab.src \
-tgt_vocab /tmp/onmt.vocab.tgt \
-src_vocab_size 1000 \
-tgt_vocab_size 1000 \
-encoder_type transformer \
-decoder_type transformer \
-layers 4 \
-word_vec_size 16 \
-hidden_size 16 \
-num_workers 0 -bucket_size 1024 \
-heads 2 \
-transformer_ff 64 \
-num_workers 0 -bucket_size 1024 \
-accum_count 2 4 8 \
-accum_steps 0 15000 30000 \
-save_model /tmp/onmt.model \
-train_steps 20 \
-report_every 5 \
-train_eval_steps 10 \
-train_metrics "BLEU" "TER" \
-tensorboard "true" \
-scoring_debug "true" \
-tensorboard_log_dir /tmp/logs_train_metrics \
-dump_preds /tmp/dump_preds
python onmt/tests/test_events.py --logdir /tmp/logs_train_metrics -tensorboard_checks train_metrics
- name : Test Transformer training and validation with dynamic scoring and copy
run: |
python3 train.py \
-config data/data.yaml \
-src_vocab /tmp/onmt.vocab.src \
-tgt_vocab /tmp/onmt.vocab.tgt \
-src_vocab_size 1000 \
-tgt_vocab_size 1000 \
-encoder_type transformer \
-decoder_type transformer \
-layers 4 \
-word_vec_size 16 \
-hidden_size 16 \
-num_workers 0 -bucket_size 1024 \
-heads 2 \
-transformer_ff 64 \
-num_workers 0 -bucket_size 1024 \
-accum_count 2 4 8 \
-accum_steps 0 15000 30000 \
-save_model /tmp/onmt.model \
-train_steps 10 -valid_steps 5 \
-report_every 2 \
-train_eval_steps 8 \
-train_metrics "BLEU" "TER" \
-valid_metrics "BLEU" "TER" \
-tensorboard "true" \
-scoring_debug "true" \
-tensorboard_log_dir /tmp/logs_train_valid_metrics \
-dump_preds /tmp/dump_preds \
-copy_attn
python onmt/tests/test_events.py --logdir /tmp/logs_train_valid_metrics -tensorboard_checks train_valid_metrics
- name: Test LM training
run: |
python train.py \
34 changes: 34 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,40 @@


## [Unreleased]
## [3.0.3](https://github.com/OpenNMT/OpenNMT-py/tree/3.0.3) (2022-12-16)
* fix loss normalization when using gradient accumulation or more than 1 GPU
* use native CrossEntropyLoss with label smoothing; reported loss/ppl is impacted by label smoothing
* fix long-standing coverage loss bug (thanks to Sanghyuk-Choi)
* fix detokenization at scoring / fix tokenization with subword-nmt + SentencePiece
* various small bug fixes
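For intuition on the "accurate per item" normalization mentioned above, here is a minimal sketch (not the actual OpenNMT-py code) contrasting a per-batch approximation with one global normalization over all tokens accumulated across micro-batches:

```python
# Sketch: two accumulated micro-batches with different token counts.
losses = [12.0, 20.0]   # summed (unreduced) loss per micro-batch
num_tokens = [4, 8]     # target tokens per micro-batch

# Approximate: mean of per-batch means, biased when batch sizes differ.
approx = sum(l / n for l, n in zip(losses, num_tokens)) / len(losses)   # 2.75

# Accurate per item: one normalization over all accumulated tokens.
accurate = sum(losses) / sum(num_tokens)   # 32 / 12, about 2.667
```

The two values differ whenever the micro-batches are unequal in size, which is why the approximation was reverted.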

## [3.0.2](https://github.com/OpenNMT/OpenNMT-py/tree/3.0.2) (2022-12-07)
* pyonmttok.Vocab is now picklable; the dataloader switched to spawn (MacOS/Windows compatible)
* fix scoring with specific metrics (BLEU, TER)
* fix tensorboard logging
* fix dedup in the batch iterator (now only for TRAIN; it was also happening at inference)
* Change: tgt_prefix renamed to tgt_file_prefix
* New: tgt_prefix / src_prefix used for the "prefix" transform (onmt/transforms/misc.py)
* New: transforms are applied to buckets in batches (vs per example), which is faster

## [3.0.1](https://github.com/OpenNMT/OpenNMT-py/tree/3.0.1) (2022-11-23)

* fix dynamic scoring
* reinstate apex.amp level O1/O2 for benchmarking
* New: LM distillation for NMT training
* New: bucket_size ramp-up to avoid a slow start
* fix special tokens order
* remove the Library section and add a link to Yasmin's tutorial

## [3.0.0](https://github.com/OpenNMT/OpenNMT-py/tree/3.0.0) (2022-11-3)

* Completely removed torchtext; use the [Vocab object of pyonmttok](https://github.com/OpenNMT/Tokenizer/tree/master/bindings/python#vocabulary) instead
* Data loading changed accordingly, using the PyTorch DataLoader (num_workers)
* queue_size / pool_factor are no longer needed; the optimal bucket_size value is > 64K
* options renamed: rnn_size => hidden_size (enc/dec_rnn_size => enc/dec_hid_size)
* new tools/convertv2_v3.py to upgrade v2 models.pt
* inference with length_penalty=avg is now the default
* add_qkvbias (default false, but true for old models)

## [2.3.0](https://github.com/OpenNMT/OpenNMT-py/tree/2.3.0) (2022-09-14)

36 changes: 30 additions & 6 deletions README.md
@@ -16,6 +16,11 @@ Unless there is a bug, please use the [forum](https://forum.opennmt.net) or [Git

----

There is a new step-by-step tutorial (thanks to Yasmin Moslem):
please read and/or follow it before raising beginner issues: [Tutorial](https://github.com/ymoslem/OpenNMT-Tutorial)

----

# OpenNMT-py 3.0

**We're happy to announce the release v3.0 of OpenNMT-py.**
@@ -52,14 +57,34 @@ If you want to optimize the training performance:

### Breaking changes

A few features were dropped between v1 and v2:
Changes between v2 and v3:

Options removed:
`queue_size` and `pool_factor` are no longer needed. Just adjust `bucket_size` to the number of examples to be loaded by each of the PyTorch DataLoader's `num_workers`.

New options:
`num_workers`: number of workers for each process. The recommended value is 4 when running on a single GPU, and 2 when running on more than one GPU.
`add_qkvbias`: default is false; however, old models trained with v2 will have it set to true. The original Transformer paper used no bias for the Q/K/V nn.Linear layers of the multi-head attention module.

Options renamed:
`rnn_size` => `hidden_size`
`enc_rnn_size` => `enc_hid_size`
`dec_rnn_size` => `dec_hid_size`

Note: `tools/convertv2_v3.py` will modify these options stored in the checkpoint to make things compatible with v3.0
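As a rough illustration of the rename, here is a minimal sketch of the kind of key remapping such a conversion script could apply to the options stored in a checkpoint (a hypothetical sketch, not the actual tools/convertv2_v3.py code, which may do more, such as setting add_qkvbias for old models):

```python
# Hypothetical sketch: rename v2 option keys to their v3 names.
V2_TO_V3 = {
    "rnn_size": "hidden_size",
    "enc_rnn_size": "enc_hid_size",
    "dec_rnn_size": "dec_hid_size",
}

def upgrade_opts(opts: dict) -> dict:
    """Return a copy of the option dict with v2 keys renamed to v3."""
    return {V2_TO_V3.get(key, key): value for key, value in opts.items()}

old = {"rnn_size": 512, "enc_rnn_size": 512, "dec_rnn_size": 512, "layers": 6}
new = upgrade_opts(old)
print(new)  # {'hidden_size': 512, 'enc_hid_size': 512, 'dec_hid_size': 512, 'layers': 6}
```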

Inference:
The translator now uses the same dynamic iterator as the trainer.
The new default for inference is `length_penalty=avg`, which provides better BLEU scores in most cases (and is comparable to other toolkits' defaults).
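For intuition, the `avg` length penalty is conventionally the cumulative log-probability divided by the hypothesis length; a minimal sketch, assuming that conventional definition rather than OpenNMT-py's exact implementation:

```python
def avg_penalty_score(cum_logprob: float, length: int) -> float:
    """Score a beam hypothesis by its average log-probability per token."""
    return cum_logprob / length

# Without normalization the shorter hypothesis (-4.0 total) beats the
# longer one (-7.0 total); with avg normalization the longer one wins
# because it is better per token, reducing the bias toward short outputs.
shorter = avg_penalty_score(-4.0, 4)    # -1.0 per token
longer = avg_penalty_score(-7.0, 10)    # -0.7 per token
assert longer > shorter
```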



Reminder: a few features were dropped between v1 and v2:

- audio, image and video inputs;

For any user who still needs these features, the previous codebase is retained as `legacy` in a separate branch. It will no longer receive extensive development from the core team, but PRs may still be accepted.

- For inference, we default to length_penalty: avg which usually gives better BLEU and is comparable to other toolkits.

Feel free to check it out and let us know what you think of the new paradigm!

----
@@ -79,7 +104,7 @@ Table of Contents

OpenNMT-py requires:

- Python >= 3.6
- Python >= 3.7
- PyTorch >= 1.9.0

Install `OpenNMT-py` from `pip`:
@@ -104,8 +129,7 @@ pip install -r requirements.opt.txt

## Features

- :warning: **New in OpenNMT-py 2.0**: [On the fly data processing]([here](https://opennmt.net/OpenNMT-py/FAQ.html#what-are-the-readily-available-on-the-fly-data-transforms).)

- [On the fly data processing](https://opennmt.net/OpenNMT-py/FAQ.html#what-are-the-readily-available-on-the-fly-data-transforms)
- [Encoder-decoder models with multiple RNN cells (LSTM, GRU) and attention types (Luong, Bahdanau)](https://opennmt.net/OpenNMT-py/options/train.html#model-encoder-decoder)
- [Transformer models](https://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model)
- [Copy and Coverage Attention](https://opennmt.net/OpenNMT-py/options/train.html#model-attention)
2 changes: 0 additions & 2 deletions config/config-transformer-base-1GPU.yml
@@ -30,8 +30,6 @@ normalization: tokens
dropout: 0.1
label_smoothing: 0.1

max_generator_batches: 2

param_init: 0.0
param_init_glorot: 'true'
position_encoding: 'true'
2 changes: 0 additions & 2 deletions config/config-transformer-base-4GPU.yml
@@ -30,8 +30,6 @@ normalization: tokens
dropout: 0.1
label_smoothing: 0.1

max_generator_batches: 2

param_init: 0.0
param_init_glorot: 'true'
position_encoding: 'true'
2 changes: 1 addition & 1 deletion data/README.md
@@ -4,4 +4,4 @@

> python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/data -src_vocab_size 1000 -tgt_vocab_size 1000

> python train.py -data data/data -save_model /n/rush_lab/data/tmp_ -world_size 1 -gpu_ranks 0 -rnn_size 100 -word_vec_size 50 -layers 1 -train_steps 100 -optim adam -learning_rate 0.001
> python train.py -data data/data -save_model /n/rush_lab/data/tmp_ -world_size 1 -gpu_ranks 0 -hidden_size 100 -word_vec_size 50 -layers 1 -train_steps 100 -optim adam -learning_rate 0.001