
Commit 94cdad7

Authored by Tobias Domhan (tdomhan)
Neural vocabulary selection. (#1046)
Co-authored-by: Tobias Domhan <[email protected]>
1 parent 63286ff commit 94cdad7

24 files changed: +930 −163 lines

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -11,6 +11,12 @@ Note that Sockeye has checks in place to not translate with an old model that wa

 Each version section may have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_.

+## [3.1.14]
+
+### Added
+- Added an implementation of Neural Vocabulary Selection (NVS) to Sockeye, as presented in our NAACL 2022 paper "The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation" (Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber).
+- To use NVS, simply specify `--neural-vocab-selection` to `sockeye-train`. This trains a model with Neural Vocabulary Selection that is automatically used by `sockeye-translate`. If you want to look at translations without vocabulary selection, specify `--skip-nvs` as an argument to `sockeye-translate`.
+
 ## [3.1.13]

 ### Added
```

README.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -84,17 +84,18 @@ For more information about Sockeye, see our papers ([BibTeX](sockeye.bib)).
 ## Research with Sockeye

 Sockeye has been used for both academic and industrial research. A list of known publications that use Sockeye is shown below.
-If you know more, please let us know or submit a pull request (last updated: April 2022).
+If you know more, please let us know or submit a pull request (last updated: May 2022).

 ### 2022
 * Weller-Di Marco, Marion, Matthias Huck, Alexander Fraser. "Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies". arXiv preprint arXiv:2203.13550 (2022)
+* Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber. "The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation". Proceedings of NAACL-HLT (2022)

 ### 2021

 * Bergmanis, Toms, Mārcis Pinnis. "Facilitating Terminology Translation with Target Lemma Annotations". arXiv preprint arXiv:2101.10035 (2021)
 * Briakou, Eleftheria, Marine Carpuat. "Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation". arXiv preprint arXiv:2105.15087 (2021)
-* Hasler, Eva, Tobias Domhan, Jonay Trenous, Ke Tran, Bill Byrne, Felix Hieber. "Improving the Quality Trade-Off for Neural Machine Translation Multi-Domain Adaptation". Proceedings of EMNLP (2021)
+* Hasler, Eva, Tobias Domhan, Sony Trenous, Ke Tran, Bill Byrne, Felix Hieber. "Improving the Quality Trade-Off for Neural Machine Translation Multi-Domain Adaptation". Proceedings of EMNLP (2021)
 * Tang, Gongbo, Philipp Rönchen, Rico Sennrich, Joakim Nivre. "Revisiting Negation in Neural Machine Translation". Transactions of the Association for Computational Linguistics 9 (2021)
 * Vu, Thuy, Alessandro Moschitti. "Machine Translation Customization via Automatic Training Data Selection from the Web". arXiv preprint arXiv:2102.1024 (2021)
 * Xu, Weijia, Marine Carpuat. "EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints." Transactions of the Association for Computational Linguistics 9 (2021)
```

docs/training.md

Lines changed: 10 additions & 0 deletions
```diff
@@ -175,3 +175,13 @@ that can be enabled by setting `--length-task`, respectively, to `ratio` or to `
 Specify `--length-task-layers` to set the number of layers in the prediction MLP.
 The weight of the loss in the global training objective is controlled with `--length-task-weight` (standard cross-entropy loss has weight 1.0).
 During inference the predictions can be used to reward longer translations by enabling `--brevity-penalty-type`.
+
+
+## Neural Vocabulary Selection (NVS)
+
+When Neural Vocabulary Selection (NVS) is enabled, a target bag-of-words model is trained alongside the translation model.
+During decoding, the output vocabulary is reduced to the set of predicted target words, which speeds up decoding.
+This is similar to using `--restrict-lexicon` with `sockeye-translate`, with the advantage that no external alignment model is required and that the contextualized hidden encoder representations are used to predict the set of target words.
+To use NVS, simply specify `--neural-vocab-selection` to `sockeye-train`.
+This will train a model with NVS that is automatically used by `sockeye-translate`.
+If you want to look at translations without vocabulary selection, specify `--skip-nvs` as an argument to `sockeye-translate`.
```
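For concreteness, a minimal usage sketch of the workflow this section describes. Data and model paths are placeholders, and the non-NVS training arguments shown are just the usual `sockeye-train` inputs; `logit_max` is one of the two `--neural-vocab-selection` choices (the other is `eos`):

```bash
# Train with NVS enabled (placeholder paths; other standard training
# arguments omitted for brevity).
sockeye-train --prepared-data train_data \
              --validation-source dev.src \
              --validation-target dev.trg \
              --neural-vocab-selection logit_max \
              --output nvs_model

# NVS is picked up automatically at translation time:
sockeye-translate -m nvs_model --input test.src --output test.out

# Decode with the full target vocabulary instead (NVS skipped):
sockeye-translate -m nvs_model --input test.src --output test.full.out --skip-nvs

# Lower the selection threshold for a larger, higher-recall vocabulary:
sockeye-translate -m nvs_model --input test.src --output test.big.out --nvs-thresh 0.1
```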

sockeye/__init__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -11,4 +11,4 @@
 # express or implied. See the License for the specific language governing
 # permissions and limitations under the License.

-__version__ = '3.1.13'
+__version__ = '3.1.14'
```

sockeye/arguments.py

Lines changed: 70 additions & 9 deletions
```diff
@@ -326,18 +326,23 @@ def add_rerank_args(params):
                                 help="Returns the reranking scores as scores in output JSON objects.")


-def add_lexicon_args(params):
+def add_lexicon_args(params, is_for_block_lexicon: bool = False):
     lexicon_params = params.add_argument_group("Model & Top-k")
     lexicon_params.add_argument("--model", "-m", required=True,
                                 help="Model directory containing source and target vocabularies.")
-    lexicon_params.add_argument("-k", type=int, default=200,
-                                help="Number of target translations to keep per source. Default: %(default)s.")
+    if not is_for_block_lexicon:
+        lexicon_params.add_argument("-k", type=int, default=200,
+                                    help="Number of target translations to keep per source. Default: %(default)s.")


-def add_lexicon_create_args(params):
+def add_lexicon_create_args(params, is_for_block_lexicon: bool = False):
     lexicon_params = params.add_argument_group("I/O")
+    if is_for_block_lexicon:
+        input_help = "A text file with tokens that shall be blocked. All tokens must be in the model vocabulary."
+    else:
+        input_help = "Probabilistic lexicon (fast_align format) to build top-k lexicon from."
     lexicon_params.add_argument("--input", "-i", required=True,
-                                help="Probabilistic lexicon (fast_align format) to build top-k lexicon from.")
+                                help=input_help)
     lexicon_params.add_argument("--output", "-o", required=True, help="File name to write top-k lexicon to.")
```
```diff
@@ -743,6 +748,21 @@ def add_model_parameters(params):
                               'PyTorch AMP with some additional risk and requires installing Apex: '
                               'https://github.com/NVIDIA/apex')

+    model_params.add_argument('--neural-vocab-selection',
+                              type=str,
+                              default=None,
+                              choices=C.NVS_TYPES,
+                              help='When enabled, the model contains a neural vocabulary selection model that '
+                                   'restricts the target output vocabulary to speed up inference. '
+                                   'logit_max: predictions are made per source token and combined by max pooling. '
+                                   'eos: the prediction is based on the hidden representation of the <eos> token.')
+
+    model_params.add_argument('--neural-vocab-selection-block-loss',
+                              action='store_true',
+                              help='When enabled, gradients for NVS are blocked from propagating back to the encoder. '
+                                   'This means that NVS learns to work with the main model\'s representations but '
+                                   'does not influence its training.')
+

 def add_batch_args(params, default_batch_size=4096, default_batch_type=C.BATCH_TYPE_WORD):
     params.add_argument('--batch-size', '-b',
```
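For intuition, here is a hedged sketch of the two pooling strategies named in the help text above. The function and tensor names are illustrative, not Sockeye's internal API:

```python
import torch

def nvs_bow_logits(encoder_hidden: torch.Tensor,
                   source_mask: torch.Tensor,
                   output_layer: torch.nn.Linear,
                   strategy: str = "logit_max") -> torch.Tensor:
    """Illustrative sketch of NVS pooling, not Sockeye's internal code.

    encoder_hidden: (batch, src_len, hidden) contextualized encoder states.
    source_mask:    (batch, src_len), 1 for real tokens, 0 for padding.
    output_layer:   scores a hidden state over the target vocabulary.
    Returns bag-of-words logits of shape (batch, target_vocab_size).
    """
    if strategy == "logit_max":
        # Per-source-token predictions, combined by max pooling over the source axis.
        logits = output_layer(encoder_hidden)  # (batch, src_len, vocab)
        logits = logits.masked_fill(source_mask.unsqueeze(-1) == 0, float("-inf"))
        return logits.max(dim=1).values
    if strategy == "eos":
        # Predict from the hidden state at the last real source position (<eos>).
        eos_pos = source_mask.sum(dim=1).long() - 1  # (batch,)
        batch_idx = torch.arange(encoder_hidden.size(0))
        return output_layer(encoder_hidden[batch_idx, eos_pos])
    raise ValueError(f"unknown strategy: {strategy}")
```

With `--neural-vocab-selection-block-loss`, the same computation would run on `encoder_hidden.detach()` so the auxiliary loss cannot influence encoder training; again, this is a sketch of the stated behavior, not the actual implementation.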
```diff
@@ -773,6 +793,25 @@ def add_batch_args(params, default_batch_size=4096, default_batch_type=C.BATCH_T
                         'size 10240). Default: %(default)s.')


+def add_nvs_train_parameters(params):
+    params.add_argument(
+        '--bow-task-weight',
+        type=float_greater_or_equal(0.0),
+        default=1.0,
+        help='The weight of the auxiliary bag-of-words (BOW) loss when --neural-vocab-selection is enabled. '
+             'Default: %(default)s.')
+
+    params.add_argument(
+        '--bow-task-pos-weight',
+        type=float_greater_or_equal(0.0),
+        default=10,
+        help='The weight of the positive class (the set of words present on the target side) for the BOW loss '
+             'when --neural-vocab-selection is set, computed as x * num_negative_class / num_positive_class '
+             'where x is --bow-task-pos-weight. Higher values bias towards recall, trading larger vocabularies '
+             'at test time for higher translation quality. Default: %(default)s.')
+
+
 def add_training_args(params):
     train_params = params.add_argument_group("Training parameters")
```
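To make the weighting formula concrete, a hedged sketch of how such a positive-class weight could enter a binary cross-entropy over the target vocabulary. Function and variable names are illustrative, not Sockeye's internals:

```python
import torch
import torch.nn.functional as F

def bow_loss(bow_logits: torch.Tensor,
             target_bow: torch.Tensor,
             bow_task_pos_weight: float = 10.0,
             bow_task_weight: float = 1.0) -> torch.Tensor:
    """Illustrative weighted bag-of-words loss, not Sockeye's internal code.

    bow_logits: (batch, vocab) NVS scores.
    target_bow: (batch, vocab) 0/1 indicators of words on the target side.
    """
    target_bow = target_bow.float()
    num_positive = target_bow.sum().clamp(min=1.0)
    num_negative = target_bow.numel() - num_positive
    # Positive-class weight per the help text:
    # x * num_negative_class / num_positive_class, where x = --bow-task-pos-weight.
    pos_weight = bow_task_pos_weight * num_negative / num_positive
    loss = F.binary_cross_entropy_with_logits(bow_logits, target_bow,
                                              pos_weight=pos_weight)
    # --bow-task-weight scales the auxiliary loss in the global objective.
    return bow_task_weight * loss
```

Raising `--bow-task-pos-weight` increases the penalty for missing target-side words, biasing selection towards recall and hence towards larger decode-time vocabularies.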

```diff
@@ -803,6 +842,8 @@ def add_training_args(params):
                         default=1,
                         help='Number of fully-connected layers for predicting the length ratio. Default %(default)s.')

+    add_nvs_train_parameters(train_params)
+
     train_params.add_argument('--target-factors-weight',
                               type=float,
                               nargs='+',
```
```diff
@@ -1203,18 +1244,38 @@ def add_inference_args(params):
                                nargs='+',
                                type=multiple_values(num_values=2, data_type=str),
                                default=None,
-                               help="Specify top-k lexicon to restrict output vocabulary to the k most likely context-"
-                                    "free translations of the source words in each sentence (Devlin, 2017). See the "
-                                    "lexicon module for creating top-k lexicons. To use multiple lexicons, provide "
+                               help="Specify a block or top-k lexicon. A top-k lexicon poses a positive constraint "
+                                    "by providing the set of allowed target words, while a block lexicon poses a "
+                                    "negative constraint by providing a set of target words to be avoided. "
+                                    "Specifically, a top-k lexicon restricts the output vocabulary to the k most "
+                                    "likely context-free translations of the source words in each sentence "
+                                    "(Devlin, 2017). See the lexicon module for creating lexicons, e.g. by running "
+                                    "sockeye-lexicon. To use multiple lexicons, provide "
                                     "'--restrict-lexicon key1:path1 key2:path2 ...' and use JSON input to specify the "
                                     "lexicon for each sentence: "
                                     "{\"text\": \"some input string\", \"restrict_lexicon\": \"key\"}. "
+                                    "If a single lexicon is specified, it is applied to all inputs. "
+                                    "If multiple lexicons are specified, one can be selected via the JSON input, or "
+                                    "lexicon restriction can be skipped by not providing one in the JSON input. "
                                     "Default: %(default)s.")
     decode_params.add_argument('--restrict-lexicon-topk',
                                type=int,
                                default=None,
                                help="Specify the number of translations to load for each source word from the lexicon "
-                                    "given with --restrict-lexicon. Default: Load all entries from the lexicon.")
+                                    "given with --restrict-lexicon (top-k lexicons only). "
+                                    "Default: Load all entries from the lexicon.")
+
+    decode_params.add_argument('--skip-nvs',
+                               action='store_true',
+                               help='Manually turn off Neural Vocabulary Selection (NVS) to do a softmax over the '
+                                    'full target vocabulary.',
+                               default=False)
+
+    decode_params.add_argument('--nvs-thresh',
+                               type=float,
+                               help='The probability threshold for a word to be added to the set of target words. '
+                                    'Default: %(default)s.',
+                               default=0.5)
+
     decode_params.add_argument('--strip-unknown-words',
                                action='store_true',
                                default=False,
```
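A hedged usage sketch of the extended lexicon interface described in the help text above. Lexicon paths and keys are placeholders, and this assumes the standard `sockeye-translate` JSON input mode (`--json-input`):

```bash
# Two top-k lexicons, selectable per sentence via JSON input.
# The first sentence uses the "med" lexicon; the second skips
# lexicon restriction by omitting the "restrict_lexicon" field.
sockeye-translate -m model \
                  --restrict-lexicon med:lexicon.med general:lexicon.general \
                  --restrict-lexicon-topk 50 \
                  --json-input <<'EOF'
{"text": "some input string", "restrict_lexicon": "med"}
{"text": "another input string"}
EOF
```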
