The following repository contains:

- A working Transformer implementation that uses `nn.Transformer` for language translation
- Papers I've analyzed during my NLP research, in separate Jupyter Notebooks
- A "Test" notebook that demonstrates the mini-framework in action
Originally, I planned to analyze 5 papers on the Transformer architecture, but found 2 to be sufficient. This may not hold for everyone, since I was already familiar with other NLP approaches and with some of the techniques the Transformer uses. For each paper, I followed these steps to make sure I grokked how Transformers work:
- Identify main ideas
- Try to replicate in PyTorch
- Find alternatives online
- Improve my solution and iterate
- Hypothesize causes of inadequate results
This repository also serves as documentation for the more obscure parts of `nn.Transformer`. Some of its functions and parameters aren't explained well in the official documentation. For example, I found the following unintuitive:
- Feeding data - how Transformers allow parallelization
- Feeding data - when to cut off the `<sos>` and `<eos>` tokens
- Data transformations and their dimensions during the forward pass
- Padding masks, attention masks, and the layers in which they are applied
Consequently, I've documented these issues where they arise; the sketches below illustrate two of them.
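
First, a minimal sketch of how a target batch is typically fed to the decoder, assuming toy token ids; this is an illustration, not the repository's actual code. Teacher forcing is what enables parallel training: the decoder sees the whole shifted target at once instead of generating token by token, and the `<sos>`/`<eos>` cutoff falls out of that shift.

```python
import torch

PAD, SOS, EOS = 0, 1, 2  # assumed special-token ids

# Two padded target sentences, shape (batch, tgt_len):
# <sos> w w w <eos>   and   <sos> w w <eos> <pad>
tgt = torch.tensor([[SOS, 5, 6, 7, EOS],
                    [SOS, 8, 9, EOS, PAD]])

# Cut <eos> from the decoder input and <sos> from the labels, so that
# position i of the input is trained to predict position i of the labels.
decoder_input = tgt[:, :-1]   # <sos> w w w ...   shape (batch, tgt_len - 1)
labels        = tgt[:, 1:]    # ... w w w <eos>   shape (batch, tgt_len - 1)

print(decoder_input)
print(labels)
```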
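Second, a minimal sketch of the masks `nn.Transformer` expects, assuming `batch_first=True` (available in recent PyTorch) and the same toy ids; again, not the repository's code. Padding masks are boolean with shape `(batch, seq_len)`, where `True` means "ignore this position", and they are applied in every attention layer that attends over the padded sequence; the causal mask blocks decoder self-attention from looking at future positions.

```python
import torch
import torch.nn as nn

PAD = 0
src    = torch.tensor([[4, 5, 6, PAD],
                       [7, 8, PAD, PAD]])   # (batch, src_len)
tgt_in = torch.tensor([[1, 5, 6],
                       [1, 8, PAD]])        # (batch, tgt_len), already shifted

d_model = 16
emb = nn.Embedding(10, d_model, padding_idx=PAD)
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=1, num_decoder_layers=1,
                       dim_feedforward=32, batch_first=True)

# Boolean padding masks: True marks <pad> positions to be ignored.
src_key_padding_mask = src.eq(PAD)          # (batch, src_len)
tgt_key_padding_mask = tgt_in.eq(PAD)       # (batch, tgt_len)

# Causal ("subsequent") mask, shape (tgt_len, tgt_len), for decoder
# self-attention only.
tgt_mask = model.generate_square_subsequent_mask(tgt_in.size(1))

out = model(emb(src), emb(tgt_in),
            tgt_mask=tgt_mask,
            src_key_padding_mask=src_key_padding_mask,
            tgt_key_padding_mask=tgt_key_padding_mask,
            memory_key_padding_mask=src_key_padding_mask)
print(out.shape)  # (batch, tgt_len, d_model)
```

Note that the source padding mask is reused as `memory_key_padding_mask` in the decoder's cross-attention, since the encoder output keeps the source sequence's padding layout.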