The Transformer was introduced in "Attention Is All You Need", and it has since become the most popular NLP model.
Let's first recall its application in NLP.
The decoder uses the memory produced by the encoder. In the Transformer, every decoder layer attends to the same memory: the output of the encoder's top layer. In WaveNet, by contrast, each layer's intermediate output is used, a bit like the FPN structure.
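To make this concrete, here is a minimal sketch (assuming PyTorch; `d_model`, the layer counts, and the tensor shapes are illustrative, not taken from the paper) of how every decoder layer cross-attends to the same top-layer encoder memory:

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6

# Stack of encoder layers; only the TOP layer's output is kept as memory.
encoder_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

# Independent decoder layers so the loop below is explicit.
decoder_layers = nn.ModuleList(
    nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
    for _ in range(n_layers)
)

src = torch.randn(2, 10, d_model)  # (batch, src_len, d_model)
tgt = torch.randn(2, 7, d_model)   # (batch, tgt_len, d_model)

memory = encoder(src)  # top-layer encoder output only

out = tgt
for layer in decoder_layers:
    # Every decoder layer cross-attends to the SAME `memory`
    # (causal masking omitted for brevity).
    out = layer(out, memory)
```

A WaveNet/FPN-like variant would instead collect each encoder layer's intermediate output and feed layer `i`'s result to the matching decoder layer, rather than reusing one shared memory.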