
# Transformer

## Introduction

The Transformer was introduced in *Attention Is All You Need* and has since become the most popular model architecture in NLP.

Let's first recall its application in NLP.
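
As a concrete starting point, here is a minimal sketch assuming PyTorch and its built-in `nn.Transformer`; the shapes and hyperparameters are illustrative defaults, not the paper's training setup:

```python
import torch
import torch.nn as nn

d_model = 512
# Defaults match the base model in the paper: 6 encoder and 6 decoder layers.
model = nn.Transformer(d_model=d_model, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Already-embedded token sequences: (sequence length, batch, d_model).
src = torch.rand(10, 32, d_model)  # source sentence
tgt = torch.rand(20, 32, d_model)  # target sentence (shifted right in training)

out = model(src, tgt)              # (target length, batch, d_model)
```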

## Encoder

## Decoder

The decoder uses the memory produced by the encoder. In the Transformer, every decoder layer attends to the same memory, taken from the encoder's top layer. In WaveNet, by contrast, each layer's intermediate result is used, somewhat like an FPN structure.
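
To make that memory reuse explicit, here is a sketch assuming PyTorch's `nn.TransformerEncoder`/`nn.TransformerDecoder`: the encoder's top-layer output is the single `memory` tensor handed to the decoder, and every decoder layer cross-attends to it:

```python
import torch
import torch.nn as nn

d_model = 512
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8), num_layers=6)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=8), num_layers=6)

src = torch.rand(10, 32, d_model)  # (source length, batch, d_model)
tgt = torch.rand(20, 32, d_model)  # (target length, batch, d_model)

memory = encoder(src)       # only the top encoder layer's output is kept
out = decoder(tgt, memory)  # the same memory feeds all 6 decoder layers
# (A causal tgt_mask would be passed during training; omitted for brevity.)
```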

## Training

## Performance

## Further reading

- [The Annotated Transformer](https://nlp.seas.harvard.edu/2018/04/03/attention.html)