Language Translation using TensorFlow's new seq2seq Library

TensorFlow 1.2 recently shipped a new library for sequence-to-sequence models, making some of the old seq2seq code obsolete. This is an attempt to redo the Udacity Deep Learning Nanodegree Assignment 4 - Language Translation from English to French using the new seq2seq library.

The main changes were:

  • the old APIs (simple_decoder_fn_train and dynamic_rnn_decoder) were replaced
  • the new APIs (TrainingHelper/GreedyEmbeddingHelper, BasicDecoder and dynamic_decode) are used to create the training and inference decoders (a short sketch of how they fit together follows this list)
  • additional placeholders (source_sequence_length and target_sequence_length) were added to support the new APIs
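
For reference, here is a minimal sketch of how the new pieces fit together. Names such as dec_cell, dec_embed_input, dec_embeddings, initial_state, batch_size, start_token_id, end_token_id and max_target_sequence_length are illustrative placeholders, not necessarily the exact names used in the notebook:

    # extra placeholders required by the new APIs
    source_sequence_length = tf.placeholder(tf.int32, [None], name='source_sequence_length')
    target_sequence_length = tf.placeholder(tf.int32, [None], name='target_sequence_length')

    # training decoder: fed the ground-truth target tokens at each step
    # (initial_state is e.g. the final encoder state)
    training_helper = tf.contrib.seq2seq.TrainingHelper(
                        inputs=dec_embed_input,
                        sequence_length=target_sequence_length)
    training_decoder = tf.contrib.seq2seq.BasicDecoder(
                        dec_cell, training_helper, initial_state)
    training_output = tf.contrib.seq2seq.dynamic_decode(
                        training_decoder, impute_finished=True,
                        maximum_iterations=max_target_sequence_length)[0]

    # inference decoder: feeds back its own greedy predictions
    # (in the full model the two decoders share weights, e.g. via a reused variable scope)
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
                        dec_embeddings,
                        start_tokens=tf.fill([batch_size], start_token_id),
                        end_token=end_token_id)
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(
                        dec_cell, inference_helper, initial_state)
    inference_output = tf.contrib.seq2seq.dynamic_decode(
                        inference_decoder, impute_finished=True,
                        maximum_iterations=max_target_sequence_length)[0]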

I also added a few features to bring this model closer to the one in the paper Neural Machine Translation by Jointly Learning to Align and Translate by introducing:

  • Bidirectional LSTMs using tf.nn.bidirectional_dynamic_rnn. Bidirectional LSTMs make sense for translation because we do not only look backward but also forward: a word near the end of a sentence may influence how a word at the beginning is used (think grammar and vocabulary rules). How the two directions' outputs feed into the attention step is sketched after this list.
    # encoder: run the embedded source sentence through a bidirectional RNN;
    # the outputs come back as a (forward, backward) pair
    # (note: TF 1.2 may complain if one cell instance is reused for both directions)
    enc_output, enc_state = tf.nn.bidirectional_dynamic_rnn(
                        cell_fw=enc_cell,
                        cell_bw=enc_cell,
                        sequence_length=source_sequence_length,
                        inputs=enc_embed_input,
                        dtype=tf.float32)
  • Bahdanau (additive) attention using a few functions, as shown below. Unlike the original seq2seq model by Sutskever et al., where the encoder-decoder context vector is fixed, the attention mechanism gives every position in the output sequence its own context vector. This lets the decoder peek at every position in the encoder and capture longer-range dependencies than LSTMs alone.
    # additive (Bahdanau) attention over the (concatenated) encoder outputs
    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
                        num_units=rnn_size,
                        memory=encoder_output,
                        memory_sequence_length=source_sequence_length)

    # wrap the decoder cell so it attends over the encoder memory at every step
    attn_cell = tf.contrib.seq2seq.AttentionWrapper(
                        dec_cell, attention_mechanism,
                        attention_layer_size=rnn_size)

    # project the decoder outputs to vocabulary-sized logits
    out_cell = tf.contrib.rnn.OutputProjectionWrapper(
                        attn_cell, vocab_size, reuse=False)
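
Since the bidirectional encoder returns one output tensor per direction, the two are typically concatenated along the feature axis to form the single encoder_output tensor that the attention mechanism consumes. A minimal sketch, assuming the enc_output pair returned by the bidirectional call above:

    # concatenate the forward and backward encoder outputs:
    # two [batch, max_time, rnn_size] tensors become [batch, max_time, 2 * rnn_size]
    encoder_output = tf.concat(enc_output, -1)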

To do (if I have time):

  • add a beam search decoder instead of the current greedy decoding (where the output word at each step is simply the one with the highest probability). With beam search we keep the top K candidates at each position instead of only the top one, which makes more sense because the best word for position 1 may not lead to the best overall sequence; the second- or third-best word might. See this link for more details. A rough sketch follows this list.
  • train on a bigger corpus (needs GPU hours though)
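
For the beam search item, a rough sketch of how it could be wired up with the contrib API. This assumes tf.contrib.seq2seq.BeamSearchDecoder is available (it is in later TF 1.x releases), and beam_width, dec_cell, dec_embeddings, dec_initial_state, batch_size, start_token_id, end_token_id and max_target_sequence_length are illustrative names; combining beam search with the AttentionWrapper would additionally require tiling the attention memory with tf.contrib.seq2seq.tile_batch:

    # replicate the decoder's initial state beam_width times so that
    # every beam keeps its own copy of the state
    tiled_initial_state = tf.contrib.seq2seq.tile_batch(
                        dec_initial_state, multiplier=beam_width)

    beam_decoder = tf.contrib.seq2seq.BeamSearchDecoder(
                        cell=dec_cell,
                        embedding=dec_embeddings,
                        start_tokens=tf.fill([batch_size], start_token_id),
                        end_token=end_token_id,
                        initial_state=tiled_initial_state,
                        beam_width=beam_width)

    # predicted_ids holds the top-K candidate output sequences per example
    beam_output = tf.contrib.seq2seq.dynamic_decode(
                        beam_decoder,
                        maximum_iterations=max_target_sequence_length)[0]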