# Extra materials for ml-mipt course

## Prerequisites

1. [en] Stanford lectures on Probability Theory:
   [link](https://web.stanford.edu/~montanar/TEACHING/Stat310A/lnotes.pdf)
1. [en] Matrix calculus notes from Stanford (two worked identities follow
   this list): [link](http://cs231n.stanford.edu/vecDerivs.pdf)
1. [en] Derivatives notes from Stanford:
   [link](http://cs231n.stanford.edu/handouts/derivatives.pdf)
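
A quick self-check for the matrix-calculus notes above: two standard
identities of the kind these notes derive (restated here from general
knowledge, not quoted from the notes).

```latex
% Gradient of a linear form and of a quadratic form,
% for x, a \in \mathbb{R}^n and A \in \mathbb{R}^{n \times n}:
\frac{\partial}{\partial x}\, a^\top x = a,
\qquad
\frac{\partial}{\partial x}\, x^\top A x = (A + A^\top)\, x .
```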

## Basic Machine Learning

1. [en] The Hundred-Page Machine Learning Book: [link](http://themlbook.com)
   (available online, e.g. on
   [GitHub](https://github.com/ZakiaSalod/The-Hundred-Page-Machine-Learning-Book))
1. [ru] Excellent lectures by Evgeny Sokolov; read the PDFs, ideally from the
   most recent year: [link](https://github.com/esokolov/ml-course-hse)
1. [en] Naive Bayes classifier explained (a minimal sketch follows this
   list):
   [link](https://machinelearningmastery.com/classification-as-conditional-probability-and-the-naive-bayes-algorithm/)
1. [en] Stanford notes on linear models:
   [link](http://cs229.stanford.edu/notes/cs229-notes1.pdf)
1. [ru] An informal "handwritten textbook" written by students of our course
   at FIVT (ФИВТ):
   [link](https://github.com/ml-mipt/ml-mipt/blob/master/ML_informal_notes.pdf)
1. [ru] Vorontsov's lecture notes:
   [link](http://www.machinelearning.ru/wiki/images/6/6d/Voron-ML-1.pdf)
1. [ru] A wonderful book by V.G. Spokoiny on linear estimation:
   [link](http://strlearn.ru/wp-content/uploads/2017/01/script2018-5.pdf)
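
To make the naive Bayes entry concrete, here is a minimal Gaussian naive
Bayes sketch in plain numpy (helper names and toy data are our own, not code
from the linked tutorial):

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means and standard deviations."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    stds = np.array([X[y == c].std(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, stds

def predict_gaussian_nb(X, classes, priors, means, stds):
    # log p(c | x) ∝ log p(c) + Σ_j log N(x_j | μ_cj, σ_cj); the shared
    # -d/2 · log(2π) constant is dropped since it does not affect the argmax.
    log_lik = (-0.5 * (((X[:, None, :] - means) / stds) ** 2).sum(-1)
               - np.log(stds).sum(-1))
    return classes[np.argmax(np.log(priors) + log_lik, axis=1)]

X = np.array([[1.0, 2.0], [1.1, 1.9], [5.0, 6.0], [5.2, 5.8]])
y = np.array([0, 0, 1, 1])
print(predict_gaussian_nb(X, *fit_gaussian_nb(X, y)))  # -> [0 0 1 1]
```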

## Bootstrap and bias-variance decomposition

1. [en] Detailed description of the bootstrap procedure (a minimal sketch
   follows this list):
   [link](http://www.math.ntu.edu.tw/~hchen/teaching/LargeSample/notes/notebootstrap.pdf)
1. [en] The bias-variance tradeoff in a more general setting: "A Unified
   Bias-Variance Decomposition and its Applications":
   [link](https://homes.cs.washington.edu/~pedrod/papers/mlc00a.pdf)
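
A minimal nonparametric bootstrap sketch (toy data; the sample size and
resample count are arbitrary choices, not prescribed by the notes above):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=200)   # toy observed sample

# Resample with replacement and recompute the statistic on each resample.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(5000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])   # percentile 95% CI
print(f"mean = {data.mean():.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```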

## Gradient Boosting and Feature importances

1. [en] Great interactive blog post by Alex Rogozhnikov on gradient boosting:
   [link](http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html)
1. [en] And a great gradient boosted trees playground, also by Alex
   Rogozhnikov:
   [link](http://arogozhnikov.github.io/2016/07/05/gradient_boosting_playground.html)
1. [en] SHAP values repo and explanation:
   [link](https://github.com/slundberg/shap)
1. [en] Kaggle tutorial on feature importances (see the sketch after this
   list): [link](https://www.kaggle.com/learn/machine-learning-explainability)
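
A short sklearn sketch of the ideas above: fit gradient boosted trees, then
read off impurity-based feature importances (the synthetic data and all
hyperparameters are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
print("impurity-based importances:", model.feature_importances_.round(3))
```

For model-agnostic attributions that go beyond these impurity-based scores,
see the SHAP repo linked above.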

## Deep Learning

1. [en] The Deep Learning book. A classic; it delivers a comprehensive
   overview of almost all vital topics in ML and DL. Available online:
   [link](https://www.deeplearningbook.org)
1. [en] Notes on vector and matrix derivatives:
   [link](http://cs231n.stanford.edu/vecDerivs.pdf)
1. [en] More notes on matrix derivatives from Stanford:
   [link](http://cs231n.stanford.edu/handouts/derivatives.pdf)
1. [en] Stanford notes on backpropagation (a worked sketch follows this
   list): [link](http://cs231n.github.io/optimization-2/)
1. [en] Stanford notes on different activation functions (and general
   intuition): [link](http://cs231n.github.io/neural-networks-1/)
1. [en] Great post on Medium by Andrej Karpathy:
   [link](https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b)
1. [en] CS231n notes on data preparation (batch normalization is covered
   there): [link](http://cs231n.github.io/neural-networks-2/)
1. [en] CS231n notes on gradient methods:
   [link](http://cs231n.github.io/neural-networks-3/)
1. [en] Original paper introducing Batch Normalization:
   [link](https://arxiv.org/pdf/1502.03167.pdf)
1. [en] What Every Computer Scientist Should Know About Floating-Point
   Arithmetic:
   [link](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html)
1. [en] The Unreasonable Effectiveness of Recurrent Neural Networks, a blog
   post by Andrej Karpathy:
   [link](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
1. [en] Understanding LSTM Networks:
   [link](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
1. [en] Convolutional Neural Networks: Architectures, Convolution / Pooling
   Layers: [link](http://cs231n.github.io/convolutional-networks/)
1. [en] Understanding and Visualizing Convolutional Neural Networks:
   [link](http://cs231n.github.io/understanding-cnn/)
1. [en] LR warm-up and other useful training tricks:
   [article](https://arxiv.org/abs/1812.01187)
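
In the spirit of the backpropagation notes and Karpathy's post above, a tiny
two-layer network trained with hand-written backprop in numpy (shapes, data
and learning rate are made-up toy choices, not code from those resources):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                  # batch of 64, 3 features
y = rng.normal(size=(64, 1))                  # regression targets
W1, b1 = 0.1 * rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.normal(size=(16, 1)), np.zeros(1)

for step in range(500):
    # forward pass
    h = np.maximum(0, X @ W1 + b1)            # ReLU hidden layer
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)           # MSE loss
    # backward pass: chain rule, layer by layer
    dpred = 2 * (pred - y) / len(X)
    dW2, db2 = h.T @ dpred, dpred.sum(0)
    dh = dpred @ W2.T
    dh[h <= 0] = 0                            # gradient through ReLU
    dW1, db1 = X.T @ dh, dh.sum(0)
    # vanilla gradient descent update
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.1 * g
```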

## Natural Language Processing

1. [en] Great resource by Lena Voita (direct link to the word embeddings
   explanation):
   [link](https://lena-voita.github.io/nlp_course/word_embeddings.html)
1. [en] Word2vec tutorial:
   [link](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/)
1. [en] Beautiful post by Jay Alammar on word2vec:
   [link](http://jalammar.github.io/illustrated-word2vec/)
1. [en] Blog post about text classification with RNNs and CNNs:
   [link](https://medium.com/jatana/report-on-text-classification-using-cnn-rnn-han-f0e887214d5f)
1. [en] Convolutional Neural Networks for Sentence Classification:
   [link](https://arxiv.org/abs/1408.5882)
1. [en] Great blog post by Jay Alammar on the Transformer:
   [link](https://jalammar.github.io/illustrated-transformer/)
1. Notebook on positional encoding (a minimal sketch also follows this list):
   [link](https://github.com/ml-mipt/ml-mipt/blob/advanced/week04_Transformer/week04_positional_encoding_carriers.ipynb)
1. [en] The Annotated Transformer: a full PyTorch implementation with code
   and comments by the Harvard NLP group:
   [link](https://nlp.seas.harvard.edu/2018/04/03/attention.html)
1. [en] OpenAI blog post
   [Better Language Models and Their Implications (GPT-2)](https://openai.com/blog/better-language-models/)
1. [en] Paper describing the positional encoding:
   ["Convolutional Sequence to Sequence Learning"](https://arxiv.org/pdf/1705.03122)
1. [en] Paper presenting [Layer Normalization](https://arxiv.org/abs/1607.06450)
1. [en] The Illustrated BERT
   [blog post](http://jalammar.github.io/illustrated-bert/)
1. [en] DistilBERT overview (distillation will be covered later in our
   course): [blog post](https://medium.com/huggingface/distilbert-8cf3380435b5)
1. [en] Google AI Blog
   [post about open-sourcing BERT](https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html)
1. [en] One more
   [blog post explaining BERT](https://yashuseth.blog/2019/06/12/bert-explained-faqs-understand-bert-working/)
1. [en]
   [OpenAI blog post on fine-tuning GPT-2 (as of 04.10.2019)](https://openai.com/blog/fine-tuning-gpt-2/)
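
As a companion to the positional encoding materials above, a minimal numpy
sketch of the sinusoidal encoding from "Attention Is All You Need" (the
function name and toy sizes are our own):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(...)
    pos = np.arange(max_len)[:, None]            # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model / 2)
    angles = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                 # odd dimensions: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=32)
print(pe.shape)  # (50, 32); added to token embeddings before the encoder
```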

## Graph Neural Networks

1. [en]
   [Introduction to Graph Neural Networks](https://www.morganclaypool.com/doi/10.2200/S00980ED1V01Y202001AIM045)
1. [en] Great [repo](https://github.com/thunlp/GNNPapers) with must-read
   papers on GNNs (a one-layer GCN sketch follows this list)
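
A minimal sketch of a single graph convolution (GCN) layer, the building
block many of the papers in the repo above start from (toy graph and random
weights, chosen for illustration):

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)       # adjacency of a 3-node path
X = np.eye(3)                                # one-hot node features
W = np.random.default_rng(0).normal(size=(3, 4))  # "learnable" weights

A_hat = A + np.eye(3)                        # add self-loops
D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # D^{-1/2} (A + I) D^{-1/2}
H = np.maximum(0, A_norm @ X @ W)            # one GCN layer with ReLU
print(H.shape)                               # (3, 4): embedding per node
```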

## Reinforcement Learning

1. [en] Reinforcement Learning: An Introduction by Richard S. Sutton and
   Andrew G. Barto (a tiny bandit sketch follows):
   [link](http://incompleteideas.net/book/the-book-2nd.html)
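
The book opens with the multi-armed bandit problem; here is a tiny
epsilon-greedy sketch of it in numpy (arm count, epsilon and horizon are
arbitrary toy values, not taken from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = rng.normal(size=10)      # 10 arms with unknown mean rewards
Q = np.zeros(10)                      # action-value estimates
N = np.zeros(10)                      # pull counts
eps = 0.1                             # exploration probability

for t in range(10_000):
    a = rng.integers(10) if rng.random() < eps else int(np.argmax(Q))
    r = rng.normal(true_means[a])     # noisy reward for the pulled arm
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]         # incremental sample-average update

print("best arm:", int(np.argmax(true_means)),
      "| greedy choice:", int(np.argmax(Q)))
```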