"<h1>Finetune GPT-2 with <a href=\"index.html\">LoRA</a></h1>\n<p>Here's a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>Finetune GPT-2 with <a href=\"index.html\">LoRA</a></h1>\n<p>Here's a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"<h1>Finetune <a href=\"gpt2.html\">GPT-2</a> with <a href=\"index.html\">LoRA</a></h1>\n<p>Here's a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>Finetune <a href=\"gpt2.html\">GPT-2</a> with <a href=\"index.html\">LoRA</a></h1>\n<p>Here's a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be over-ridden when we start the experiment</p>\n": "<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be over-ridden when we start the experiment</p>\n",
"<h3>Initialize the model, optimizer and dataloader</h3>\n": "<h3>Initialize the model, optimizer and dataloader</h3>\n",
"<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from huggingface</a></h3>\n": "<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from huggingface</a></h3>\n",
"<h3>Tiny Shakespeare dataset</h3>\n<p>It will download from the url if not present</p>\n": "<h3>Tiny Shakespeare dataset</h3>\n<p>It will download from the url if not present</p>\n",
"<p>GPT-2 hugging face uses 1D Convolution layers. We need to transpose those weights since we use linear layers </p>\n": "<p>GPT-2 hugging face uses 1D Convolution layers. We need to transpose those weights since we use linear layers </p>\n",
"<p>Get cross entropy loss </p>\n": "<p>Get cross entropy loss </p>\n",
"<p>Initialize the <a href=\"gpt2.html\">GPT2 model</a> </p>\n": "<p>Initialize the <a href=\"gpt2.html\">GPT2 model</a> </p>\n",
"<p>Initialize the data loader </p>\n": "<p>Initialize the data loader </p>\n",
"<p>Initialize the model </p>\n": "<p>Initialize the model </p>\n",
"<p>Initialize the optimizer </p>\n": "<p>Initialize the optimizer </p>\n",
"<p>LoRA rank </p>\n": "<p>LoRA rank </p>\n",
"<p>Load out model </p>\n": "<p>Load out model </p>\n",
"<p>Load out model. We use <span translate=no>_^_0_^_</span> because the state does not have LoRA weights </p>\n": "<p>Load out model. We use <span translate=no>_^_0_^_</span> because the state does not have LoRA weights </p>\n",
"<p>Load pre-trained model weights </p>\n": "<p>Load pre-trained model weights </p>\n",
"<p>Load the huggingface model and get the parameters </p>\n": "<p>Load the huggingface model and get the parameters </p>\n",
"<p>Log the loss </p>\n": "<p>Log the loss </p>\n",
"<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n": "<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n",
"<p>make sure that only lora weights are not loaded </p>\n": "<p>make sure that only lora weights are not loaded </p>\n",
"Finetune GPT-2 with LoRA": "Finetune GPT-2 with LoRA",
"This is training code with notes for fine-tuning pre-trained GPT-2 model with LoRA.": "This is training code with notes for fine-tuning pre-trained GPT-2 model with LoRA."
"<p> Splits hidden_size dim into attn_head_size and num_heads</p>\n": "<p> Splits hidden_size dim into attn_head_size and num_heads</p>\n",
"<h1>GPT-2 with <a href=\"index.html\">LoRA modules</a></h1>\n<p>Here's <a href=\"experiment.html\">the training code</a> for training a GPT2 model with LoRA on Tiny Shakespeare dataset.</p>\n": "<h1>GPT-2 with <a href=\"index.html\">LoRA modules</a></h1>\n<p>Here's <a href=\"experiment.html\">the training code</a> for training a GPT2 model with LoRA on Tiny Shakespeare dataset.</p>\n",
"<p>Projection layer to logit space </p>\n": "<p>Projection layer to logit space </p>\n",
"<p>Reorder to <span translate=no>_^_0_^_</span> </p>\n": "<p>Reorder to <span translate=no>_^_0_^_</span> </p>\n",
"<p>Run through transformer blocks </p>\n": "<p>Run through transformer blocks </p>\n",
"<p>lin1 </p>\n": "<p>lin1 </p>\n",
"<p>lin2 </p>\n": "<p>lin2 </p>\n",
"<p>out </p>\n": "<p>out </p>\n",
"<p>qkv </p>\n": "<p>qkv </p>\n",
"<p>Split last dimension to <span translate=no>_^_0_^_</span> </p>\n": "<p>Split last dimension to <span translate=no>_^_0_^_</span> </p>\n",
"<p>The linear layers and the activation </p>\n": "<p>The linear layers and the activation </p>\n",
"<p>Token and absolute positional embeddings </p>\n": "<p>Token and absolute positional embeddings </p>\n",
"<p>Transform them from shape <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p>Transform them from shape <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n",
"<ul><li><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span></li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span></li></ul>\n",
"gpt2.py": "gpt2.py"
"<ul><li><span translate=no>_^_0_^_</span> is the embeddings tensor with shape <span translate=no>_^_1_^_</span></li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> is the embeddings tensor with shape <span translate=no>_^_1_^_</span></li></ul>\n",
"<ul><li><span translate=no>_^_0_^_</span> is the number of dimensions </li>\n<li><span translate=no>_^_1_^_</span> is the size of the hidden dimension </li>\n<li><span translate=no>_^_2_^_</span> is the lora rank</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> is the number of dimensions </li>\n<li><span translate=no>_^_1_^_</span> is the size of the hidden dimension </li>\n<li><span translate=no>_^_2_^_</span> is the lora rank</li></ul>\n",
"<ul><li><span translate=no>_^_0_^_</span> is the number of dimensions in the embeddings </li>\n<li><span translate=no>_^_1_^_</span> is the number of attention heads </li>\n<li><span translate=no>_^_2_^_</span> is the number of decoder layers </li>\n<li><span translate=no>_^_3_^_</span> is the number of positional embeddings </li>\n<li><span translate=no>_^_4_^_</span> is the layer norm epsilon </li>\n<li><span translate=no>_^_5_^_</span> is the vocabulary size </li>\n<li><span translate=no>_^_6_^_</span> is the lora rank</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> is the number of dimensions in the embeddings </li>\n<li><span translate=no>_^_1_^_</span> is the number of attention heads </li>\n<li><span translate=no>_^_2_^_</span> is the number of decoder layers </li>\n<li><span translate=no>_^_3_^_</span> is the number of positional embeddings </li>\n<li><span translate=no>_^_4_^_</span> is the layer norm epsilon </li>\n<li><span translate=no>_^_5_^_</span> is the vocabulary size </li>\n<li><span translate=no>_^_6_^_</span> is the lora rank</li></ul>\n",
"<ul><li><span translate=no>_^_0_^_</span> is the number of dimensions in the embeddings </li>\n<li><span translate=no>_^_1_^_</span> is the number of heads </li>\n<li><span translate=no>_^_2_^_</span> is the layer norm epsilon </li>\n<li><span translate=no>_^_3_^_</span> is the lora rank</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> is the number of dimensions in the embeddings </li>\n<li><span translate=no>_^_1_^_</span> is the number of heads </li>\n<li><span translate=no>_^_2_^_</span> is the layer norm epsilon </li>\n<li><span translate=no>_^_3_^_</span> is the lora rank</li></ul>\n",
"<ul><li><span translate=no>_^_0_^_</span> is the number of dimensions in the embeddings </li>\n<li><span translate=no>_^_1_^_</span> is the number of heads </li>\n<li><span translate=no>_^_2_^_</span> is the lora rank</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> is the number of dimensions in the embeddings </li>\n<li><span translate=no>_^_1_^_</span> is the number of heads </li>\n<li><span translate=no>_^_2_^_</span> is the lora rank</li></ul>\n",
"<ul><li><span translate=no>_^_0_^_</span> is the tensor with shape <span translate=no>_^_1_^_</span></li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span> is the tensor with shape <span translate=no>_^_1_^_</span></li></ul>\n",
"GPT-2 implementation with LoRA modules": "GPT-2 implementation with LoRA modules",