From 7bc134b84d83cd9a2a824d2415dd6032f5a05bd5 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 29 Jan 2025 14:41:16 +0100 Subject: [PATCH 01/30] initial copy from smol-course --- chapters/en/chapter11/README.md | 30 + chapters/en/chapter11/chat_templates.md | 114 + .../en/chapter11/chat_templates_example.ipynb | 5741 +++++++++++++++++ .../en/chapter11/sft_finetuning_example.ipynb | 273 + .../en/chapter11/supervised_fine_tuning.md | 41 + 5 files changed, 6199 insertions(+) create mode 100644 chapters/en/chapter11/README.md create mode 100644 chapters/en/chapter11/chat_templates.md create mode 100644 chapters/en/chapter11/chat_templates_example.ipynb create mode 100644 chapters/en/chapter11/sft_finetuning_example.ipynb create mode 100644 chapters/en/chapter11/supervised_fine_tuning.md diff --git a/chapters/en/chapter11/README.md b/chapters/en/chapter11/README.md new file mode 100644 index 000000000..a7fae79c6 --- /dev/null +++ b/chapters/en/chapter11/README.md @@ -0,0 +1,30 @@ +# Instruction Tuning + +This module will guide you through instruction tuning language models. Instruction tuning involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. + +In this module, we will explore two topics: 1) Chat Templates and 2) Supervised Fine-Tuning. + +## 1️⃣ Chat Templates + +Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages. For more detailed information, refer to the [Chat Templates](./chat_templates.md) section. + +## 2️⃣ Supervised Fine-Tuning + +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. For a detailed guide on SFT, including key steps and best practices, see the [Supervised Fine-Tuning](./supervised_fine_tuning.md) page. + +## Exercise Notebooks + +| Title | Description | Exercise | Link | Colab | +|-------|-------------|----------|------|-------| +| Chat Templates | Learn how to use chat templates with SmolLM2 and process datasets into chatml format | 🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format
🐕 Convert the `openai/gsm8k` dataset into chatml format | [Notebook](./chat_templates_example.ipynb) | Open In Colab |
+| Supervised Fine-Tuning | Learn how to fine-tune SmolLM2 using the SFTTrainer | 🐢 Use the `HuggingFaceTB/smoltalk` dataset
🐕 Try out the `bigcode/the-stack-smol` dataset
🦁 Select a dataset for a real world use case | [Notebook](./sft_finetuning_example.ipynb) | Open In Colab |
+
+## References
+
+- [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating)
+- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)
+- [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer)
+- [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290)
+- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning)
+- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma)
+- [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format)
diff --git a/chapters/en/chapter11/chat_templates.md b/chapters/en/chapter11/chat_templates.md
new file mode 100644
index 000000000..61ff65e6f
--- /dev/null
+++ b/chapters/en/chapter11/chat_templates.md
@@ -0,0 +1,114 @@
+# Chat Templates
+
+Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns.
+
+## Base Models vs Instruct Models
+
+A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, `SmolLM2-135M` is a base model, while `SmolLM2-135M-Instruct` is its instruction-tuned variant.
+
+To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant).
+
+Note that a base model may have been fine-tuned with any of several chat templates, so when using an instruct model we need to make sure we are using the correct one.
+
+## Understanding Chat Templates
+
+At their core, chat templates define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template:
+
+```sh
+<|im_start|>user
+Hi there!<|im_end|>
+<|im_start|>assistant
+Nice to meet you!<|im_end|>
+<|im_start|>user
+Can I ask a question?<|im_end|>
+<|im_start|>assistant
+```
+
+The `transformers` library applies chat templates for you via the model's tokenizer. Read more about how transformers builds chat templates [here](https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates). All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest.
+
Here's a basic example of a conversation: + +```python +messages = [ + {"role": "system", "content": "You are a helpful assistant focused on technical topics."}, + {"role": "user", "content": "Can you explain what a chat template is?"}, + {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."} +] +``` + +Let's break down the above example, and see how it maps to the chat template format. + +## System Messages + +System messages set the foundation for how the model should behave. They act as persistent instructions that influence all subsequent interactions. For example: + +```python +system_message = { + "role": "system", + "content": "You are a professional customer service agent. Always be polite, clear, and helpful." +} +``` + +## Conversations + +Chat templates maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: + +```python +conversation = [ + {"role": "user", "content": "I need help with my order"}, + {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"}, + {"role": "user", "content": "It's ORDER-123"}, +] +``` + +## Implementation with Transformers + +The transformers library provides built-in support for chat templates. Here's how to use them: + +```python +from transformers import AutoTokenizer + +tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") + +messages = [ + {"role": "system", "content": "You are a helpful coding assistant."}, + {"role": "user", "content": "Write a Python function to sort a list"}, +] + +# Apply the chat template +formatted_chat = tokenizer.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True +) +``` + +## Custom Formatting +You can customize how different message types are formatted. For example, adding special tokens or formatting for different roles: + +```python +template = """ +<|system|>{system_message} +<|user|>{user_message} +<|assistant|>{assistant_message} +""".lstrip() +``` + +## Multi-Turn Support + +Templates can handle complex multi-turn conversations while maintaining context: + +```python +messages = [ + {"role": "system", "content": "You are a math tutor."}, + {"role": "user", "content": "What is calculus?"}, + {"role": "assistant", "content": "Calculus is a branch of mathematics..."}, + {"role": "user", "content": "Can you give me an example?"}, +] +``` + +⏭️ [Next: Supervised Fine-Tuning](./supervised_fine_tuning.md) + +## Resources + +- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) +- [Transformers Documentation](https://huggingface.co/docs/transformers) +- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates) diff --git a/chapters/en/chapter11/chat_templates_example.ipynb b/chapters/en/chapter11/chat_templates_example.ipynb new file mode 100644 index 000000000..88f60c1e4 --- /dev/null +++ b/chapters/en/chapter11/chat_templates_example.ipynb @@ -0,0 +1,5741 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "vZAvFVIAtFlq" + }, + "source": [ + "# Exploring Chat Templates with SmolLM2\n", + "\n", + "This notebook demonstrates how to use chat templates with the `SmolLM2` model. Chat templates help structure interactions between users and AI models, ensuring consistent and contextually appropriate responses." 
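,
+    "\n",
+    "\n",
+    "If you are running in a fresh environment such as Colab, you may first need to install the libraries used below; the next cell is an optional setup step."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional setup: uncomment if transformers/datasets are not installed\n",
+    "# !pip install -q transformers datasets"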
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "K-lZu8JvtwUN", + "outputId": "c3871418-15bc-4265-ae8d-6d6036036d0e" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c15d320002504d95bb86e87f50d43b08", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "VBox(children=(HTML(value='
user\n",
+      "Hello, how are you?<|im_end|>\n",
+      "<|im_start|>assistant\n",
+      "I'm doing well, thank you! How can I assist you today?<|im_end|>\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "input_text = tokenizer.apply_chat_template(messages, tokenize=False)\n",
+    "\n",
+    "print(\"Conversation with template:\", input_text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "sfvdglOqtFls"
+   },
+   "source": [
+    "# Decode the conversation\n",
+    "\n",
+    "Note that the decoded conversation matches the one above, except for one addition: an open `<|im_start|>assistant` turn at the end. This is the generation prompt that `add_generation_prompt=True` appends to cue the model to respond.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "mXUVdPeytFls",
+    "outputId": "80870e53-7bc1-426e-ac33-ba6748e030fc"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Conversation decoded: <|im_start|>user\n",
+      "Hello, how are you?<|im_end|>\n",
+      "<|im_start|>assistant\n",
+      "I'm doing well, thank you! How can I assist you today?<|im_end|>\n",
+      "<|im_start|>assistant\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "input_text = tokenizer.apply_chat_template(\n",
+    "    messages, tokenize=True, add_generation_prompt=True\n",
+    ")\n",
+    "\n",
+    "print(\"Conversation decoded:\", tokenizer.decode(token_ids=input_text))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "UcZQpspEtFlt"
+   },
+   "source": [
+    "# Tokenize the conversation\n",
+    "\n",
+    "Of course, the tokenizer can also return the conversation, special tokens included, as ids from the model's vocabulary.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "jc2PLxAMtFlt",
+    "outputId": "d2098780-b3f4-41ec-a1f3-b6da2b593c62"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Conversation tokenized: [1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198, 1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17, 1073, 416, 339, 4237, 346, 1834, 47, 2, 198, 1, 520, 9531, 198]\n"
+     ]
+    }
+   ],
+   "source": [
+    "input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)\n",
+    "\n",
+    "print(\"Conversation tokenized:\", input_text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "m3eNp9a0tFlt"
+   },
+   "source": [
+    "
\n", + "

Exercise: Process a dataset for SFT

\n", + "

Take a dataset from the Hugging Face hub and process it for SFT.

\n", + "

Difficulty Levels

\n", + "

🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format.

\n", + "

🐕 Convert the `openai/gsm8k` dataset into chatml format.

\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 381 + }, + "id": "qbkXV2_ItFlt", + "outputId": "06deadc3-2c63-4660-d2bd-05096ef07c9f" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from IPython.core.display import display, HTML\n", + "\n", + "display(\n", + " HTML(\n", + " \"\"\"\n", + "\"\"\"\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 241, + "referenced_widgets": [ + "c2d74a42fb574b8892d0a288fd92f0a6", + "056b9ef5706843b19cd62fce75743afb", + "17b4d81e40564a53bb79be9fbef4918e", + "951f60cddcb84dfdbbdf2058369f0541", + "646484cf7a36444daebe1dfe4a0e4150", + "e2f0c39ce1c046e8acb150dfbfaf5aa8", + "7eb12d70d2b542a7b651c7680f590279", + "ea1f9cb22abf4e7d9f6e76fc86c03387", + "00c9f5ca71b84df4b26acee72c97fefb", + "505f96bc0c7843bcb1498ba1c1ba5f06", + "635cc2881a1e4b8788bb26c356740e04", + "a6ee323c13904525a99c6f092ba96e18", + "67fffe7d6f8c4963972b408529e05532", + "0055b6b628934affaf88bc58a1572bb6", + "aafbbb9fc5164fa3a88193bfd33d2f79", + "606e39d53ed64967a60337418c71c595", + "26b15fa18b1b4963a1ba76a76675e7ee", + "db09ab1f79db4f3a8de77f0348eca0f7", + "de04f344a8d4428e8ba1836a563d8aa1", + "03c09673186046d799d6f487d6623e6b", + "1cc682330b24431b8812c73041e987d0", + "dafe748a452148038779f6a62a22a4ec", + "addad1c100024c44a0959978153da9a8", + "9bea2a23db644ad19b708d10e35d54ee", + "d1174b127571420593971166fbb1966b", + "add90ed3746d4293a1b71198137a892c", + "8def25e6389f4e6192b517b6e80aa05e", + "c9747e7a810f413ba1ea108307e3ad1d", + "d0ea49d1d90f4d34bf2ae70efa96946e", + "59d0997b85614384bbfebeee928340b6", + "269920491c134501873e0110367bc984", + "384d26051c04460e8870a3ffe9406c48", + "8e8a0e89a50646c897e546c4077db79e", + "ff60308921f9432683acbcd6d29fb78f", + "3bc8f6339f4e4a3b961d810255c5573e", + "4780ad263ec04b1a97525d985e102049", + "488feef55878426bbf1c753c6d58735b", + "560ba45d70ca431dadeb327d234c330a", + "04d0a6f74af346f7bc696951949063c8", + "2a18ce941b0f4cef8307988ef898b47f", + "194e3fda3635466b998f96e3dc22746a", + "e2ab3cb38b5a41f68d18ed5f0e6ae22c", + "f0b271bcac6c43a9aaddac54259bb514", + "0dc93d50a283472f9ca64fd0a4c6ff15", + "dd1a50d4497144388a1809b78bb38f58", + "6b72a856e5bd4812a5e0dd0c3bfb8455", + "4e21a567d1f6461985727823b37166e1", + "ec1efb7598fd496bb170673ae1b8a1df", + "84f393468aa74baa903243d238b2d387", + "a54ce365be104d27aaa15cf8c63b5ebe", + "1791220377d141ac9b307246177d0712", + "fa330d4f0fb241aebd065f6ef4a6892c", + "cfa1cc6eed8a4f7791a7959308456b6b", + "b50c9c4433854cf7a6b2593e946b7faa", + "7557cd24ba9b4aa3955866d59db94519", + "cc608dfb880c49d4bc5acf2d691b8ec6", + "cb838c5bed994a9a8e6fcf5c98b76d17", + "76bbe8c2beba4c0594085d32a68d2ee7", + "c9836c952b07472880649b82e2347e8d", + "383db57f997140d482b82b123080837a", + "182abc7ec4d944d9bb2ec1281c98b4c8", + "6934c6d1cbac44dbb08f3fffe3056edb", + "05fa0f6eb78b4c56b219b0e57521bd2e", + "012aa94e3cf24e32833c6bbca23c52f7", + "76c1a1cdc9054bbe90d0d3b662cf0ed1", + "e453f1672772400a851735ba64f42c8b", + "d1358f6b16644cb3a2328ca639a4a77a", + "c19f60d4028045399c62004027eaafd9", + "8055588a1fa940239c801ef66f3ecf3b", + "7468a9bc8bda44e5b44574c64fdc6803", + "a13a8f8b702e44ed88c7d358a0a8b4b4", + "13367fbb763747fa8de94cde40ffae32", + "b1fcf477db664ccdade4096fb79de327", + "9d1c06ac6b774d82adca58773f389161", + 
"31910159cf30463b8246ec47ffd8ab5b", + "72220420f9d340eabec13a01caebc92c", + "55b14c03a41c495aacf8ac2d0f96ba0b" + ] + }, + "id": "4p3atw4_tFlu", + "outputId": "62ee9812-3819-4a9c-9e24-5687368ffcd8" + }, + "outputs": [], + "source": [ + "from datasets import load_dataset\n", + "\n", + "ds = load_dataset(\"HuggingFaceTB/smoltalk\", \"everyday-conversations\")\n", + "\n", + "\n", + "def process_dataset(sample):\n", + " # TODO: 🐢 Convert the sample into a chat format\n", + " # use the tokenizer's method to apply the chat template\n", + " return sample\n", + "\n", + "\n", + "ds = ds.map(process_dataset)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 381 + }, + "id": "81fQeazltFlu", + "outputId": "36cf7148-9881-4f13-d0ce-76c82c4ab219" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(\n", + " HTML(\n", + " \"\"\"\n", + "\"\"\"\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "id": "bWUSv7NMtFlu" + }, + "outputs": [], + "source": [ + "ds = load_dataset(\"openai/gsm8k\", \"main\")\n", + "\n", + "\n", + "def process_dataset(sample):\n", + " # TODO: 🐕 Convert the sample into a chat format\n", + "\n", + " # 1. create a message format with the role and content\n", + "\n", + " # 2. apply the chat template to the samples using the tokenizer's method\n", + "\n", + " return sample\n", + "\n", + "\n", + "ds = ds.map(process_dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qlXCuRKotFlu" + }, + "source": [ + "## Conclusion\n", + "\n", + "This notebook demonstrated how to apply chat templates to different models, `SmolLM2`. By structuring interactions with chat templates, we can ensure that AI models provide consistent and contextually relevant responses.\n", + "\n", + "In the exercise you tried out converting a dataset into chatml format. Luckily, TRL will do this for you, but it's useful to understand what's going on under the hood." 
+ ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.10" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "0055b6b628934affaf88bc58a1572bb6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_de04f344a8d4428e8ba1836a563d8aa1", + "max": 946449, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_03c09673186046d799d6f487d6623e6b", + "value": 946449 + } + }, + "00c9f5ca71b84df4b26acee72c97fefb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "012aa94e3cf24e32833c6bbca23c52f7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "016d5e929f1240cea067372b2191d107": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "01e0f8a799ad479eb95eef3e5a09bd70": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + 
"model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_8fe2df9a14a0436c9124a856ac7419e4", + "IPY_MODEL_d108e029e743419989e30f64f0c82b90", + "IPY_MODEL_bfd11f21f197459b8f27ef364bc9b264" + ], + "layout": "IPY_MODEL_76a0341ebe9f4c3face32460d7023be9" + } + }, + "0206fb9662a349c1aa8a6d87ce01c388": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "03c09673186046d799d6f487d6623e6b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0479fd3fc1ba476ab46f8c0a98f89468": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "04ae3f7b640c42f3a8eb1977cd1a585d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "04d0a6f74af346f7bc696951949063c8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "056b9ef5706843b19cd62fce75743afb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e2f0c39ce1c046e8acb150dfbfaf5aa8", + "placeholder": "​", + "style": "IPY_MODEL_7eb12d70d2b542a7b651c7680f590279", + "value": "README.md: 100%" + } + }, + "05fa0f6eb78b4c56b219b0e57521bd2e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + 
"visibility": null, + "width": null + } + }, + "0942430d36de4677b4c2fa771d7bcd2a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0bab42beb845475684e9e71dd1591e1d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0c336ea5c653434da49e2f0e949f83d0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0dc93d50a283472f9ca64fd0a4c6ff15": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "10a0f37020d44156a11e9750778892e0": { + "model_module": 
"@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "13367fbb763747fa8de94cde40ffae32": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1513792bad534a0c9c381a131395c519": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_76d306c21214412ab44e542d82e547aa", + "placeholder": "​", + "style": "IPY_MODEL_b9e41ef9e9c54fa7b71bc333604af74e", + "value": " 831/831 [00:00<00:00, 42.7kB/s]" + } + }, + "17023310de9b4c3ebd8cc03758d59ef9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": 
null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1791220377d141ac9b307246177d0712": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "17b4d81e40564a53bb79be9fbef4918e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ea1f9cb22abf4e7d9f6e76fc86c03387", + "max": 9251, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_00c9f5ca71b84df4b26acee72c97fefb", + "value": 9251 + } + }, + "182abc7ec4d944d9bb2ec1281c98b4c8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "194e3fda3635466b998f96e3dc22746a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1cc682330b24431b8812c73041e987d0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "269920491c134501873e0110367bc984": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "26b15fa18b1b4963a1ba76a76675e7ee": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + 
"left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "26ed0f1bae204d74a313d101d9355e90": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7612cc9b8908471b90c9118151d6e447", + "placeholder": "​", + "style": "IPY_MODEL_b687aca79e6e470b96254c5e309d6d63", + "value": "generation_config.json: 100%" + } + }, + "2a18ce941b0f4cef8307988ef898b47f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "31910159cf30463b8246ec47ffd8ab5b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "383db57f997140d482b82b123080837a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "384d26051c04460e8870a3ffe9406c48": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3b881514716c47308061fe85b810a6a4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_26ed0f1bae204d74a313d101d9355e90", + "IPY_MODEL_4ff5af1784904bc9b85515105885e2d8", + "IPY_MODEL_b3c42d7e25d6494993029531adc3866d" + ], + "layout": "IPY_MODEL_6227b40396ea4024b3c8710c5e65601f" + } + }, + "3bc8f6339f4e4a3b961d810255c5573e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_04d0a6f74af346f7bc696951949063c8", + "placeholder": "​", + "style": "IPY_MODEL_2a18ce941b0f4cef8307988ef898b47f", + "value": "Generating train split: 100%" + } + }, + "3cc519fd92fe4b328943ec839115b63e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e15fc503bb73476980cedb5f06b51ced", + "IPY_MODEL_d8c5dc8df3be4e65b2bbba020d29150f", + "IPY_MODEL_c0177c4ad18740d88acfc603ce4735f8" + ], + "layout": "IPY_MODEL_eb570fd159124e2cbd2df9335b3f9cd6" + } + }, + "3fa18e3b50104af796bd0887f556224a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": 
"IPY_MODEL_76bbe8c2beba4c0594085d32a68d2ee7", + "IPY_MODEL_c9836c952b07472880649b82e2347e8d" + ], + "layout": "IPY_MODEL_383db57f997140d482b82b123080837a" + } + }, + "cfa1cc6eed8a4f7791a7959308456b6b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d0ea49d1d90f4d34bf2ae70efa96946e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d108e029e743419989e30f64f0c82b90": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6ceb292f2b8544f2a9a005d16d3e8978", + "max": 800662, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_41a27cf0a91246599d4d1b7dae7c7863", + "value": 800662 + } + }, + "d1174b127571420593971166fbb1966b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_59d0997b85614384bbfebeee928340b6", + "max": 52603, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_269920491c134501873e0110367bc984", + "value": 52603 + } + }, + "d1358f6b16644cb3a2328ca639a4a77a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c19f60d4028045399c62004027eaafd9", + "IPY_MODEL_8055588a1fa940239c801ef66f3ecf3b", + "IPY_MODEL_7468a9bc8bda44e5b44574c64fdc6803" + ], + "layout": "IPY_MODEL_a13a8f8b702e44ed88c7d358a0a8b4b4" + } + }, + "d17c62b889754b5d88cfced5b18ff7a7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": 
null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d32017fa83aa44f6b2e3443a602654be": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d43410dfcc8c4bebb8672f10ed2aeb66": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d54fb2da9f1f4a89ae962b8816314f43": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_77d3d81687e6417ab988b04984fc68f4", + "IPY_MODEL_fbce0a69847e4099a55d1e39d4118c91", + "IPY_MODEL_1513792bad534a0c9c381a131395c519" + ], + "layout": "IPY_MODEL_69f38fecf8ad403898634cfdfadf8925" + } + }, + "d64d50101891491f96ff80162dc6d26c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d65ec0f0dc0b44e0869c6159e6e82ad6", + "IPY_MODEL_76febcd912404a58add3a39f80a8218d", + "IPY_MODEL_f4ea276bdc0d4da2a04b46e3f1ed95b5" + ], + "layout": "IPY_MODEL_0942430d36de4677b4c2fa771d7bcd2a" + } + }, + "d65ec0f0dc0b44e0869c6159e6e82ad6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_10a0f37020d44156a11e9750778892e0", + "placeholder": "​", + "style": "IPY_MODEL_58fb913274b54a60a832513c09608a2f", + "value": "tokenizer_config.json: 100%" + } + }, + "d8c5dc8df3be4e65b2bbba020d29150f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7a0c705334694da6b750104b28db6dba", + "max": 466391, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0c336ea5c653434da49e2f0e949f83d0", + "value": 466391 + } + }, + "da1a999fb5af4eae9f6a9d1086cbb4cf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dafe748a452148038779f6a62a22a4ec": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "StyleView", + "description_width": "" + } + }, + "db09ab1f79db4f3a8de77f0348eca0f7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "db3bd55d779947028f36a8b24a2621b6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "dd1a50d4497144388a1809b78bb38f58": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6b72a856e5bd4812a5e0dd0c3bfb8455", + "IPY_MODEL_4e21a567d1f6461985727823b37166e1", + "IPY_MODEL_ec1efb7598fd496bb170673ae1b8a1df" + ], + "layout": "IPY_MODEL_84f393468aa74baa903243d238b2d387" + } + }, + "de04f344a8d4428e8ba1836a563d8aa1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e0a40f83ae2e4ab29376a1d48b53aa6e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a026a32dd6d646bea82c1ebb06147d89", + "placeholder": "​", + "style": 
"IPY_MODEL_0479fd3fc1ba476ab46f8c0a98f89468", + "value": "config.json: 100%" + } + }, + "e15fc503bb73476980cedb5f06b51ced": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5de5dab3d92f4f41838a8f302d27f0c3", + "placeholder": "​", + "style": "IPY_MODEL_471b481a3e5b4d439ab31fdc49fc99c7", + "value": "merges.txt: 100%" + } + }, + "e2ab3cb38b5a41f68d18ed5f0e6ae22c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "e2f0c39ce1c046e8acb150dfbfaf5aa8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e453f1672772400a851735ba64f42c8b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e7f5d507d9564941bb7db742b4bf01c7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e90b58981bd34d0e8f975fc1a9658c4c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": 
"FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ed577dea3ac54884a637ad775b42bc68", + "max": 2104556, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d43410dfcc8c4bebb8672f10ed2aeb66", + "value": 2104556 + } + }, + "ea1f9cb22abf4e7d9f6e76fc86c03387": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "eb570fd159124e2cbd2df9335b3f9cd6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ec15d99b3a604405a2b4707931d4bf44": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + 
"align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ec1efb7598fd496bb170673ae1b8a1df": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b50c9c4433854cf7a6b2593e946b7faa", + "placeholder": "​", + "style": "IPY_MODEL_7557cd24ba9b4aa3955866d59db94519", + "value": " 119/119 [00:00<00:00, 3547.77 examples/s]" + } + }, + "ed577dea3ac54884a637ad775b42bc68": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "efe9a9fcebfe441b80075fbfe9c32674": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_04ae3f7b640c42f3a8eb1977cd1a585d", + "max": 269060552, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_db3bd55d779947028f36a8b24a2621b6", + "value": 269060552 + } + }, + "f0b271bcac6c43a9aaddac54259bb514": { + "model_module": "@jupyter-widgets/base", + 
"model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f3e23f781bce4429954d76bfea97aff4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f4ea276bdc0d4da2a04b46e3f1ed95b5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_be4e145938054f13a510fe4d04a7a60d", + "placeholder": "​", + "style": "IPY_MODEL_648c3c820b39493daf0cce5f57a55467", + "value": " 3.66k/3.66k [00:00<00:00, 197kB/s]" + } + }, + "f600aa1fe4094133888ec9a2504a60eb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5291041c86db4933816088c047d659d8", + "placeholder": "​", + "style": "IPY_MODEL_48724ba7ba4e4f00923445245640739f", + "value": "model.safetensors: 100%" + } + }, + "f70401b6dba74380b19bd1ef887b3bf7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "fa330d4f0fb241aebd065f6ef4a6892c": { + "model_module": "@jupyter-widgets/base", + 
"model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fbce0a69847e4099a55d1e39d4118c91": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_530fc4c2bf1244628af7dea3e4b35cdf", + "max": 831, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_96c2aae9198441569362135ad4bcbc98", + "value": 831 + } + }, + "ff60308921f9432683acbcd6d29fb78f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_3bc8f6339f4e4a3b961d810255c5573e", + "IPY_MODEL_4780ad263ec04b1a97525d985e102049", + "IPY_MODEL_488feef55878426bbf1c753c6d58735b" + ], + "layout": "IPY_MODEL_560ba45d70ca431dadeb327d234c330a" + } + }, + "ff8debfb713f4b88be6b9b3bf33bfca2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/chapters/en/chapter11/sft_finetuning_example.ipynb b/chapters/en/chapter11/sft_finetuning_example.ipynb new file mode 100644 index 000000000..d18479a91 --- /dev/null +++ b/chapters/en/chapter11/sft_finetuning_example.ipynb @@ -0,0 +1,273 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Supervised Fine-Tuning with SFTTrainer\n", + "\n", + "This notebook demonstrates how to fine-tune the 
`HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer` from the `trl` library. Running the notebook cells will fine-tune the model. You can select your difficulty by trying out different datasets.\n", + "
\n", + "

Exercise: Fine-Tuning SmolLM2 with SFTTrainer

\n", + "

Take a dataset from the Hugging Face hub and finetune a model on it.

\n", + "

Difficulty Levels

\n", + "

🐢 Use the `HuggingFaceTB/smoltalk` dataset

\n", + "

🐕 Try out the `bigcode/the-stack-smol` dataset and finetune a code generation model on a specific subset `data/python`.

\n", + "

🦁 Select a dataset that relates to a real world use case your interested in

\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Install the requirements in Google Colab\n", + "# !pip install transformers datasets trl huggingface_hub\n", + "\n", + "# Authenticate to Hugging Face\n", + "\n", + "from huggingface_hub import login\n", + "login()\n", + "\n", + "# for convenience you can create an environment variable containing your hub token as HF_TOKEN" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import necessary libraries\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer\n", + "from datasets import load_dataset\n", + "from trl import SFTConfig, SFTTrainer, setup_chat_format\n", + "import torch\n", + "\n", + "device = (\n", + " \"cuda\"\n", + " if torch.cuda.is_available()\n", + " else \"mps\" if torch.backends.mps.is_available() else \"cpu\"\n", + ")\n", + "\n", + "# Load the model and tokenizer\n", + "model_name = \"HuggingFaceTB/SmolLM2-135M\"\n", + "model = AutoModelForCausalLM.from_pretrained(\n", + " pretrained_model_name_or_path=model_name\n", + ").to(device)\n", + "tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)\n", + "\n", + "# Set up the chat format\n", + "model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)\n", + "\n", + "# Set our name for the finetune to be saved &/ uploaded to\n", + "finetune_name = \"SmolLM2-FT-MyDataset\"\n", + "finetune_tags = [\"smol-course\", \"module_1\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Generate with the base model\n", + "\n", + "Here we will try out the base model which does not have a chat template. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Let's test the base model before training\n", + "prompt = \"Write a haiku about programming\"\n", + "\n", + "# Format with template\n", + "messages = [{\"role\": \"user\", \"content\": prompt}]\n", + "formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)\n", + "\n", + "# Generate response\n", + "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", + "outputs = model.generate(**inputs, max_new_tokens=100)\n", + "print(\"Before training:\")\n", + "print(tokenizer.decode(outputs[0], skip_special_tokens=True))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset Preparation\n", + "\n", + "We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.\n", + "\n", + "**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load a sample dataset\n", + "from datasets import load_dataset\n", + "\n", + "# TODO: define your dataset and config using the path and name parameters\n", + "ds = load_dataset(path=\"HuggingFaceTB/smoltalk\", name=\"everyday-conversations\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: 🦁 If your dataset is not in a format that TRL can convert to the chat template, you will need to process it. 
Refer to the [module](../chat_templates.md)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configuring the SFTTrainer\n", + "\n", + "The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Configure the SFTTrainer\n", + "sft_config = SFTConfig(\n", + " output_dir=\"./sft_output\",\n", + " max_steps=1000, # Adjust based on dataset size and desired training duration\n", + " per_device_train_batch_size=4, # Set according to your GPU memory capacity\n", + " learning_rate=5e-5, # Common starting point for fine-tuning\n", + " logging_steps=10, # Frequency of logging training metrics\n", + " save_steps=100, # Frequency of saving model checkpoints\n", + " evaluation_strategy=\"steps\", # Evaluate the model at regular intervals\n", + " eval_steps=50, # Frequency of evaluation\n", + " use_mps_device=(\n", + " True if device == \"mps\" else False\n", + " ), # Use MPS when training on Apple Silicon devices\n", + " hub_model_id=finetune_name, # Set a unique name for your model\n", + ")\n", + "\n", + "# Initialize the SFTTrainer\n", + "trainer = SFTTrainer(\n", + " model=model,\n", + " args=sft_config,\n", + " train_dataset=ds[\"train\"],\n", + " tokenizer=tokenizer,\n", + " eval_dataset=ds[\"test\"],\n", + ")\n", + "\n", + "# TODO: 🦁 🐕 Align the SFTTrainer params with your chosen dataset. For example, if you are using the `bigcode/the-stack-smol` dataset, you will need to choose the `content` column." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training the Model\n", + "\n", + "With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Train the model\n", + "trainer.train()\n", + "\n", + "# Save the model\n", + "trainer.save_model(f\"./{finetune_name}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "trainer.push_to_hub(tags=finetune_tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

Bonus Exercise: Generate with fine-tuned model

\n", + "

🐕 Use the fine-tuned to model generate a response, just like with the base example..

\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Test the fine-tuned model on the same prompt\n", + "\n", + "# Let's test the base model before training\n", + "prompt = \"Write a haiku about programming\"\n", + "\n", + "# Format with template\n", + "messages = [{\"role\": \"user\", \"content\": prompt}]\n", + "formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)\n", + "\n", + "# Generate response\n", + "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", + "\n", + "# TODO: use the fine-tuned to model generate a response, just like with the base example." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 💐 You're done!\n", + "\n", + "This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:\n", + "\n", + "- Try this notebook on a harder difficulty\n", + "- Review a colleagues PR\n", + "- Improve the course material via an Issue or PR." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "py310", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.15" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/chapters/en/chapter11/supervised_fine_tuning.md b/chapters/en/chapter11/supervised_fine_tuning.md new file mode 100644 index 000000000..dc236962e --- /dev/null +++ b/chapters/en/chapter11/supervised_fine_tuning.md @@ -0,0 +1,41 @@ +# Supervised Fine-Tuning + +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on carefully curated datasets with human-validated examples. + +## Understanding Supervised Fine-Tuning + +At its core, supervised fine-tuning is about teaching a pre-trained model to perform specific tasks through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. + +SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. + +## When to Use Supervised Fine-Tuning + +The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. + +For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. Similarly, in medical or legal applications, accuracy and adherence to domain-specific terminology becomes crucial. In these cases, SFT can help align the model's responses with professional standards and domain expertise. 
+ +## The Fine-Tuning Process + +The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. + +First, you'll need to prepare or select a dataset that represents your target task. This dataset should include diverse examples that cover the range of scenarios your model will encounter. The quality of this data is important: each example should demonstrate the kind of output you want your model to produce. Next comes the actual fine-tuning phase, where you'll use frameworks like Hugging Face's `transformers` and `trl` to train the model on your dataset. + +Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. In [module 4](../4_evaluation), we'll cover how to evaluate your model. + +## The Role of SFT in Preference Alignment + +SFT plays a fundamental role in aligning language models with human preferences. Techniques such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) rely on SFT to form a base level of task understanding before further aligning the model's responses with desired outcomes. Pre-trained models, despite their general language proficiency, may not always generate outputs that match human preferences. SFT bridges this gap by introducing domain-specific data and guidance, which improves the model's ability to generate responses that align more closely with human expectations. + +## Supervised Fine-Tuning With Transformer Reinforcement Learning + +A key software package for Supervised Fine-Tuning is Transformer Reinforcement Learning (TRL). TRL is a toolkit used to train transformer language models using reinforcement learning (RL). + +Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates the major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). We will use TRL in a number of modules throughout this repo.
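To ground this, here is a minimal sketch of what SFT with TRL's `SFTTrainer` can look like. It mirrors the notebook that accompanies this chapter; the model, dataset, and hyperparameters are illustrative defaults rather than recommendations:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, setup_chat_format

# Load a small base model and attach a chat format to it
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

# A small conversational dataset with messages in role/content format
ds = load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations")

# Configure and run supervised fine-tuning
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="./sft_output", max_steps=1000),
    train_dataset=ds["train"],
    tokenizer=tokenizer,
)
trainer.train()
```

The same pattern scales to larger models and datasets; in practice you would also pass an evaluation set and tune the `SFTConfig` arguments, as the accompanying notebook does.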
+ +# Next Steps + +Try out the following tutorials to get hands-on experience with SFT using TRL: + +⏭️ [Chat Templates Tutorial](./notebooks/chat_templates_example.ipynb) + +⏭️ [Supervised Fine-Tuning Tutorial](./notebooks/sft_finetuning_example.ipynb) From 995493bac10a8d7356747d12be1b7543b042ca1f Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Thu, 30 Jan 2025 14:42:00 +0100 Subject: [PATCH 02/30] convert smol course material into nlp course style --- chapters/en/chapter11/1.mdx | 34 + chapters/en/chapter11/10.mdx | 56 + chapters/en/chapter11/11.mdx | 13 + .../en/chapter11/{chat_templates.md => 2.mdx} | 58 +- chapters/en/chapter11/3.mdx | 83 + chapters/en/chapter11/4.mdx | 125 + chapters/en/chapter11/5.mdx | 84 + chapters/en/chapter11/6.mdx | 109 + chapters/en/chapter11/7.mdx | 120 + chapters/en/chapter11/8.mdx | 47 + chapters/en/chapter11/9.mdx | 185 + chapters/en/chapter11/README.md | 30 - .../en/chapter11/chat_templates_example.ipynb | 5741 ----------------- .../en/chapter11/sft_finetuning_example.ipynb | 273 - .../en/chapter11/supervised_fine_tuning.md | 41 - 15 files changed, 861 insertions(+), 6138 deletions(-) create mode 100644 chapters/en/chapter11/1.mdx create mode 100644 chapters/en/chapter11/10.mdx create mode 100644 chapters/en/chapter11/11.mdx rename chapters/en/chapter11/{chat_templates.md => 2.mdx} (55%) create mode 100644 chapters/en/chapter11/3.mdx create mode 100644 chapters/en/chapter11/4.mdx create mode 100644 chapters/en/chapter11/5.mdx create mode 100644 chapters/en/chapter11/6.mdx create mode 100644 chapters/en/chapter11/7.mdx create mode 100644 chapters/en/chapter11/8.mdx create mode 100644 chapters/en/chapter11/9.mdx delete mode 100644 chapters/en/chapter11/README.md delete mode 100644 chapters/en/chapter11/chat_templates_example.ipynb delete mode 100644 chapters/en/chapter11/sft_finetuning_example.ipynb delete mode 100644 chapters/en/chapter11/supervised_fine_tuning.md diff --git a/chapters/en/chapter11/1.mdx b/chapters/en/chapter11/1.mdx new file mode 100644 index 000000000..5acfb97fd --- /dev/null +++ b/chapters/en/chapter11/1.mdx @@ -0,0 +1,34 @@ +# Supervised Fine-Tuning + +This chapter will introduce fine-tuning generative language models with supervised fine-tuning (SFT). SFT involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. We will separate this chapter into four sections: + +## 1️⃣ Chat Templates + +Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages. + +## 2️⃣ Supervised Fine-Tuning + +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples; we will cover the key steps and best practices for SFT in this chapter. + +## 3️⃣ Low Rank Adaptation (LoRA) + +Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge. + + +## 4️⃣ Evaluation + +Evaluation is a crucial step in the fine-tuning process. It allows us to measure the performance of the model on a task-specific dataset. + + +⚠️ In order to benefit from all features available with the Model Hub and 🤗 Transformers, we recommend creating an account.
+ + +## References + +- [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating) +- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py) +- [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer) +- [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290) +- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning) +- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma) +- [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) diff --git a/chapters/en/chapter11/10.mdx b/chapters/en/chapter11/10.mdx new file mode 100644 index 000000000..bd307e7e9 --- /dev/null +++ b/chapters/en/chapter11/10.mdx @@ -0,0 +1,56 @@ +# Implementing Evaluation + +In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate it on standard benchmarks; the library has a wide range of tasks built in. We just need to define the tasks we want to evaluate and the parameters for the evaluation. + +LightEval tasks are defined using a specific format: + +``` +{suite}|{task}|{num_few_shot}|{auto_reduce} +``` + +| Parameter | Description | +|-----------|-------------| +| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') | +| `task` | Specific task within the suite (e.g., 'abstract_algebra') | +| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) | +| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) | + +Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. + +## Example Evaluation Pipeline + +Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on a set of subtasks that relate to the domain of medicine. + +Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend: + +```bash +lighteval vllm \ + "pretrained=your-model-name" \ + "mmlu|anatomy|0|0" \ + "mmlu|high_school_biology|0|0" \ + "mmlu|high_school_chemistry|0|0" \ + "mmlu|professional_medicine|0|0" \ + --max_samples 40 \ + --batch_size 1 \ + --output_path "./results" \ + --save_generations true +``` + +Results are displayed in a tabular format showing: + +``` +| Task |Version|Metric|Value | |Stderr| +|----------------------------------------|------:|------|-----:|---|-----:| +|all | |acc |0.3333|± |0.1169| +|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121| +|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141| +|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| +``` + +Lighteval also includes a Python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information. + + + +✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval.
+ + \ No newline at end of file diff --git a/chapters/en/chapter11/11.mdx b/chapters/en/chapter11/11.mdx new file mode 100644 index 000000000..093de47d6 --- /dev/null +++ b/chapters/en/chapter11/11.mdx @@ -0,0 +1,13 @@ +# Conclusion + +In this chapter, we explored the essential components of fine-tuning language models: + +1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting. + +2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge. + +3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. + +4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. + +These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. diff --git a/chapters/en/chapter11/chat_templates.md b/chapters/en/chapter11/2.mdx similarity index 55% rename from chapters/en/chapter11/chat_templates.md rename to chapters/en/chapter11/2.mdx index 61ff65e6f..87a82f026 100644 --- a/chapters/en/chapter11/chat_templates.md +++ b/chapters/en/chapter11/2.mdx @@ -12,7 +12,7 @@ It's important to note that a base model could be fine-tuned on different chat t ## Understanding Chat Templates -At their core, chat templates define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template: +At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template: ```sh <|im_start|>user @@ -49,7 +49,7 @@ system_message = { ## Conversations -Chat templates maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: +Chat templates can maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: ```python conversation = [ @@ -59,56 +59,8 @@ conversation = [ ] ``` -## Implementation with Transformers + -The transformers library provides built-in support for chat templates. Here's how to use them: +✏️ **Try it out!** Create a chat template for a conversation between a user and an assistant. Then, use the `transformers` library to tokenize the conversation and see how the model responds. You won't need to download the model to do this, as the tokenizer will handle the formatting. 
-```python
-from transformers import AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
-
-messages = [
-    {"role": "system", "content": "You are a helpful coding assistant."},
-    {"role": "user", "content": "Write a Python function to sort a list"},
-]
-
-# Apply the chat template
-formatted_chat = tokenizer.apply_chat_template(
-    messages,
-    tokenize=False,
-    add_generation_prompt=True
-)
-```
-
-## Custom Formatting
-You can customize how different message types are formatted. For example, adding special tokens or formatting for different roles:
-
-```python
-template = """
-<|system|>{system_message}
-<|user|>{user_message}
-<|assistant|>{assistant_message}
-""".lstrip()
-```
-
-## Multi-Turn Support
-
-Templates can handle complex multi-turn conversations while maintaining context:
-
-```python
-messages = [
-    {"role": "system", "content": "You are a math tutor."},
-    {"role": "user", "content": "What is calculus?"},
-    {"role": "assistant", "content": "Calculus is a branch of mathematics..."},
-    {"role": "user", "content": "Can you give me an example?"},
-]
-```
-
-⏭️ [Next: Supervised Fine-Tuning](./supervised_fine_tuning.md)
-
-## Resources
-
-- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating)
-- [Transformers Documentation](https://huggingface.co/docs/transformers)
-- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates)
diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx
new file mode 100644
index 000000000..baa3972c1
--- /dev/null
+++ b/chapters/en/chapter11/3.mdx
@@ -0,0 +1,83 @@
+# Implementation with Transformers

Now that we understand how chat templates work, let's see how we can implement them using the `transformers` library. The library provides built-in support for chat templates; we just need to use the `apply_chat_template()` method to format our messages.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to sort a list"},
]

# Apply the chat template
formatted_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
```

This will return a formatted string that can be passed to the model. For the SmolLM2-135M-Instruct model specified above, it looks like this (note the trailing assistant header added by `add_generation_prompt=True`, which cues the model to respond):

```sh
<|im_start|>system
You are a helpful coding assistant.<|im_end|>
<|im_start|>user
Write a Python function to sort a list<|im_end|>
<|im_start|>assistant
```

Note that the `im_start` and `im_end` tokens are used to indicate the start and end of a message. The tokenizer will also have corresponding special tokens for the start and end of messages. For a refresher on how these tokens work, see the [Tokenizers](../chapter2/5.mdx) section.

Chat templates can handle multi-turn conversations while maintaining context:

```python
messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is calculus?"},
    {"role": "assistant", "content": "Calculus is a branch of mathematics..."},
    {"role": "user", "content": "Can you give me an example?"},
]
```

## Working with Chat Templates

When working with chat templates, you have several options for processing the conversation:

1. 
Apply the template without tokenization to return the raw formatted string
2. Apply the template with tokenization to return the token IDs
3. Add a generation prompt to prepare for model inference

The tokenizer's `apply_chat_template()` method handles all these cases through its parameters:

- `tokenize`: Whether to return token IDs (True) or the formatted string (False)
- `add_generation_prompt`: Whether to add a prompt for the model to generate a response

✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file.

For this exercise, you'll need to:
1. Load the dataset using the Hugging Face datasets library
2. Create a processing function that converts the samples into the correct chat format
3. Apply the chat template using the tokenizer's methods

## Conclusion

Chat templates are a crucial component for working with language models, especially when fine-tuning or deploying models for chat applications. They provide structure and consistency to conversations, making it easier for models to understand context and generate appropriate responses.

Understanding how to work with chat templates is essential for:
- Converting datasets for fine-tuning
- Preparing inputs for model inference
- Maintaining conversation context
- Ensuring consistent model behavior

## Resources

- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating)
- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates)
diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx
new file mode 100644
index 000000000..5ae7b7605
--- /dev/null
+++ b/chapters/en/chapter11/4.mdx
@@ -0,0 +1,125 @@
+# Supervised Fine-Tuning

Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples.

Because of the supervised structure of the task, the model can learn to generate structured outputs, such as the chat templates we created in the previous sections.

## Understanding Supervised Fine-Tuning

Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and to produce specific output structures, by training it on labeled examples. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case.

SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs.

## When to Use Supervised Fine-Tuning

The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains.

Two core reasons to use SFT are:

1. **Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. 
For example, you might need the model's responses to always follow a specific chat template or a required output structure.

2. **Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise.

## Quiz

### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)?

### 2. Which of the following are valid reasons to use SFT?

### 3. What is required for effective Supervised Fine-Tuning?

### 4. How does SFT relate to chat templates?

### 5. What distinguishes SFT from pre-training?
\ No newline at end of file
diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx
new file mode 100644
index 000000000..d19d3f5c5
--- /dev/null
+++ b/chapters/en/chapter11/5.mdx
@@ -0,0 +1,84 @@
+# Fine-Tuning Process with SFTTrainer in Transformer Reinforcement Learning

The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Let's work through the process step by step.

## Dataset Preparation

Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning.

## Training Configuration

We will configure the SFT trainer with the following parameters (see the sketch at the end of this section for how they come together in code):

| Parameter | Description |
|-----------|-------------|
| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) |
| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) |
| gradient_accumulation_steps | Number of batches to accumulate gradients over before performing an optimizer step, effectively increasing the batch size |
| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) |
| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations |
| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) |
| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) |
| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) |

In general, start with a small number of epochs and a small amount of data, using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance.

## Training and Evaluation

Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes:

- Iterating over the dataset
- Computing the loss
- Updating the model's parameters
- Regular evaluation on a validation set

Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities.
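Putting the parameters from the table above together, here is a minimal sketch of a training configuration using `trl`'s `SFTConfig`. The values shown are illustrative starting points taken from the table, not tuned recommendations:

```python
from trl import SFTConfig

# A sketch of the training configuration described above; every value is an
# illustrative starting point rather than a tuned recommendation.
training_args = SFTConfig(
    output_dir="./sft_output",
    num_train_epochs=1,  # start small: 1-3 epochs
    per_device_train_batch_size=4,  # examples per GPU per forward/backward pass
    gradient_accumulation_steps=4,  # effective batch size = 4 * 4 = 16 per GPU
    learning_rate=2e-4,  # typical starting point for fine-tuning
    gradient_checkpointing=True,  # trade extra compute for lower memory use
    warmup_ratio=0.03,  # 3% of steps used for learning rate warmup
    logging_steps=10,  # log metrics every 10 steps
    save_strategy="epoch",  # checkpoint at the end of each epoch
)
```

Because `SFTConfig` extends 🤗 Transformers' `TrainingArguments`, all of the usual training options are available here as well.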
## `SFTTrainer` from Transformer Reinforcement Learning

Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates the major post-training processes used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO).

Here's a simplified example of how to use `SFTTrainer` to fine-tune a model.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")

training_args = SFTConfig(
    max_seq_length=512,
    output_dir="/tmp",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",  # pass a model id and SFTTrainer loads it for us
    train_dataset=dataset["train"],
    args=training_args,
)
trainer.train()
```

✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.

For this exercise, you'll need to:
1. Load and prepare your chosen dataset
2. Configure the SFTTrainer with appropriate parameters
3. Train the model and monitor its progress
4. Save and evaluate the fine-tuned model

## Resources

- [TRL Documentation](https://huggingface.co/docs/trl)
- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft)
\ No newline at end of file
diff --git a/chapters/en/chapter11/6.mdx b/chapters/en/chapter11/6.mdx
new file mode 100644
index 000000000..a81bfe762
--- /dev/null
+++ b/chapters/en/chapter11/6.mdx
@@ -0,0 +1,109 @@
+# Supervised Fine-Tuning with SFTTrainer

This page demonstrates how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer` from the `trl` library. The notebook cells below run and will finetune the model. You can select your difficulty level by trying out different datasets.

## Load the base model

Here we'll load the base model and tokenizer. We'll also set up the chat format for the model.

```python
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, setup_chat_format
import torch

# Set the device to use for training
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Load the model and tokenizer
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)

# Set up the chat format
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

# Set a name for the finetune to be saved and/or uploaded under
finetune_name = "DeepSeek-R1-FT-MyDataset"
finetune_tags = ["smol-course", "module_1"]
```

## Generate with the base model

First, let's see how the base model responds before any fine-tuning, using the chat format we just set up.
```python
# Let's test the base model before training
prompt = "Write a haiku about programming"

# Format with template
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Generate response
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode and inspect the response
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Dataset Preparation

We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.

**TRL will format input messages based on the model's chat template.** They need to be represented as a list of dictionaries with the keys `role` and `content`.

```python
dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
```

## Configuring the SFTTrainer

The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.

```python
# Configure the SFTTrainer
sft_config = SFTConfig(
    output_dir="./sft_output",
    max_steps=1000,  # Adjust based on dataset size and desired training duration
    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
    learning_rate=5e-5,  # Common starting point for fine-tuning
    logging_steps=10,  # Frequency of logging training metrics
    save_steps=100,  # Frequency of saving model checkpoints
    evaluation_strategy="steps",  # Evaluate the model at regular intervals
    eval_steps=50,  # Frequency of evaluation
    use_mps_device=(
        True if device == "mps" else False
    ),  # Use MPS (Apple Silicon) when it is the selected device
    hub_model_id=finetune_name,  # Set a unique name for your model
)

# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    eval_dataset=dataset["test"],
)
```

## Training the model

With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.

```python
trainer.train()
```

## 💐 Nice work!

This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:

- Try this notebook on a harder difficulty
- Review a colleague's PR
- Improve the course material via an Issue or PR
diff --git a/chapters/en/chapter11/7.mdx b/chapters/en/chapter11/7.mdx
new file mode 100644
index 000000000..a47bd1f42
--- /dev/null
+++ b/chapters/en/chapter11/7.mdx
@@ -0,0 +1,120 @@
+# LoRA (Low-Rank Adaptation)

Fine-tuning large language models is a resource-intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of trainable parameters. It works by adding small low-rank matrices alongside the attention weights and optimizing only those, typically reducing trainable parameters by about 90%.
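To see where a saving of that scale comes from, here is a quick back-of-the-envelope calculation. The layer shape and rank below are illustrative assumptions; the overall reduction depends on which modules you adapt and the rank you choose:

```python
# Parameters in a full weight update vs. a LoRA update for one layer.
# Illustrative shapes: a single 4096 x 4096 attention projection, rank 8.
d, k, r = 4096, 4096, 8

full_update = d * k  # updating W directly: 16,777,216 parameters
lora_update = d * r + r * k  # low-rank factors A (d x r) and B (r x k): 65,536

print(f"full:      {full_update:,}")
print(f"lora:      {lora_update:,}")
print(f"reduction: {1 - lora_update / full_update:.1%}")  # ~99.6% for this layer
```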
## Understanding LoRA

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685).

LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable.

## Key advantages of LoRA

1. **Memory Efficiency**:
   - Only adapter parameters are stored in GPU memory
   - Base model weights remain frozen and can be loaded in lower precision
   - Enables fine-tuning of large models on consumer GPUs

2. **Training Features**:
   - Native PEFT/LoRA integration with minimal setup
   - Support for QLoRA (Quantized LoRA) for even better memory efficiency

3. **Adapter Management**:
   - Adapter weight saving during checkpoints
   - Features to merge adapters back into the base model

## Loading LoRA Adapters with PEFT

PEFT (Parameter-Efficient Fine-Tuning) is a library that provides a unified interface for loading and managing parameter-efficient fine-tuning methods, including LoRA. It allows you to easily load and switch between different methods, making it easier to experiment with different fine-tuning techniques.

Adapters can be loaded onto a pretrained model with `load_adapter()`, which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the `set_adapter()` function. To return to the base model, you can use `unload()` to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# "base_model_name" and "peft_adapter_id" are placeholders; replace them with
# the ids of your base model and trained adapter.
base_model = AutoModelForCausalLM.from_pretrained("base_model_name")
peft_model_id = "peft_adapter_id"
model = PeftModel.from_pretrained(base_model, peft_model_id)
```

![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png)

## Fine-tune an LLM using `trl` and the `SFTTrainer` with LoRA

The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train.

LoRA can also be combined with 4-bit quantization (an approach known as QLoRA) to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps:

1. Define the LoRA configuration (rank, alpha, dropout)
2. Create the SFTTrainer with the PEFT config
3. Train and save the adapter weights

## LoRA Configuration

Let's walk through the LoRA configuration and key parameters.
| Parameter | Description |
|-----------|-------------|
| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. |
| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. Higher values result in stronger adaptation effects. |
| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. |
| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. |
| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. |

When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key.

## Using TRL with PEFT

PEFT methods can be combined with TRL (Transformer Reinforcement Learning) for fine-tuning to reduce memory requirements. We can define a `LoraConfig` and pass it to the `SFTTrainer`, which applies it to the model for us.

```python
from peft import LoraConfig

# Configure LoRA parameters
# r: rank dimension for LoRA update matrices (smaller = more compression)
rank_dimension = 6
# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
lora_alpha = 8
# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
lora_dropout = 0.05

peft_config = LoraConfig(
    r=rank_dimension,  # Rank dimension - typically between 4-32
    lora_alpha=lora_alpha,  # LoRA scaling factor - typically 2x rank
    lora_dropout=lora_dropout,  # Dropout probability for LoRA layers
    bias="none",  # Bias handling: with "none", no bias terms are trained
    target_modules="all-linear",  # Which modules to apply LoRA to
    task_type="CAUSAL_LM",  # Task type for model architecture
)
```

We will also need to define the `SFTTrainer` with the LoRA configuration.

```python
# Create SFTTrainer with LoRA configuration
trainer = SFTTrainer(
    model=model,
    args=training_args,  # an SFTConfig like the one we defined earlier
    train_dataset=dataset["train"],
    peft_config=peft_config,  # LoRA configuration
    max_seq_length=512,  # Maximum sequence length
    tokenizer=tokenizer,
)
```

✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above.
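As a hint for the exercise: before handing the configuration to the trainer, you can sanity-check it by wrapping the model with PEFT yourself and inspecting the trainable-parameter count. A minimal sketch, assuming the `model` and `peft_config` from above:

```python
from peft import get_peft_model

# Wrap the base model with the LoRA configuration defined above.
# SFTTrainer does this internally when you pass `peft_config`, so this is
# only an inspection step, not part of the training setup.
lora_model = get_peft_model(model, peft_config)

# Prints a line like: trainable params: ... || all params: ... || trainable%: ...
lora_model.print_trainable_parameters()
```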
## Resources

- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685)
- [PEFT Documentation](https://huggingface.co/docs/peft)
- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft)
\ No newline at end of file
diff --git a/chapters/en/chapter11/8.mdx b/chapters/en/chapter11/8.mdx
new file mode 100644
index 000000000..acc3fdabf
--- /dev/null
+++ b/chapters/en/chapter11/8.mdx
@@ -0,0 +1,47 @@
+# Merging LoRA Adapters

After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference.

The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will help with automatic memory management. Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training, and save the merged model in the same format for deployment. Before deploying, always validate the merged model by comparing its outputs and performance metrics with the adapter-based version.

## Merging Implementation

After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# 1. Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "base_model_name",
    torch_dtype=torch.float16,
    device_map="auto"
)

# 2. Load the PEFT model with adapter
peft_model = PeftModel.from_pretrained(
    base_model,
    "path/to/adapter",
    torch_dtype=torch.float16
)

# 3. Merge adapter weights with base model
merged_model = peft_model.merge_and_unload()
```

If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer:

```python
from transformers import AutoTokenizer

# Save both model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("base_model_name")
merged_model.save_pretrained("path/to/save/merged_model")
tokenizer.save_pretrained("path/to/save/merged_model")
```

✏️ **Try it out!** Merge the adapter weights from the previous exercise (the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model fine-tuned with LoRA on `HuggingFaceTB/smoltalk`) back into the base model, then compare the merged model's outputs with the adapter-based version.

diff --git a/chapters/en/chapter11/9.mdx b/chapters/en/chapter11/9.mdx
new file mode 100644
index 000000000..df0b28765
--- /dev/null
+++ b/chapters/en/chapter11/9.mdx
@@ -0,0 +1,185 @@
+# Evaluation

Now that we have a model finetuned through either SFT or LoRA SFT, we should evaluate it on standard benchmarks.

## Automatic Benchmarks

Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy.

## Understanding Automatic Benchmarks

Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. 
The key advantage of using automatic benchmarks is their standardization: they allow for consistent comparison across different models and provide reproducible results.

However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases.

## General Knowledge Benchmarks

MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation.

## Reasoning Benchmarks

BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios.

## Language Understanding

HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology.

## Alternative Evaluation Approaches

Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks:

### LLM-as-Judge

Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations.

### Evaluation Arenas

Platforms like Chatbot Arena let models face off in head-to-head comparisons, with human voters deciding which response is better. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks.

### Custom Benchmark Suites

Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions.

## Creating Your Own Evaluation Strategy

While LightEval makes it easy to run standard benchmarks, these shouldn't be your only evaluation method; you should also invest time in developing evaluation methods specific to your use case. Standard benchmarks provide a useful baseline, and here's how to develop a more comprehensive approach on top of them:

1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models.

2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic?

3. Develop custom evaluation datasets that reflect your actual use case. This might include:
   - Real user queries from your domain
   - Common edge cases you've encountered
   - Examples of particularly challenging scenarios

4. Consider implementing a multi-layered evaluation strategy:
   - Automated metrics for quick feedback
   - Human evaluation for nuanced understanding
   - Domain expert review for specialized applications
   - A/B testing in controlled environments

# End-of-chapter quiz[[end-of-chapter-quiz]]

### 1. 
What are the main advantages of using automatic benchmarks for model evaluation? + + + +### 2. Which benchmark specifically tests knowledge across 57 different subjects? + + + +### 3. What is LLM-as-Judge? + + + +### 4. What should be included in a comprehensive evaluation strategy? + + + +### 5. What is a limitation of automatic benchmarks? + + + +### 6. What is the purpose of creating custom evaluation datasets? + + + diff --git a/chapters/en/chapter11/README.md b/chapters/en/chapter11/README.md deleted file mode 100644 index a7fae79c6..000000000 --- a/chapters/en/chapter11/README.md +++ /dev/null @@ -1,30 +0,0 @@ -# Instruction Tuning - -This module will guide you through instruction tuning language models. Instruction tuning involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. - -In this module, we will explore two topics: 1) Chat Templates and 2) Supervised Fine-Tuning. - -## 1️⃣ Chat Templates - -Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages. For more detailed information, refer to the [Chat Templates](./chat_templates.md) section. - -## 2️⃣ Supervised Fine-Tuning - -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. For a detailed guide on SFT, including key steps and best practices, see the [Supervised Fine-Tuning](./supervised_fine_tuning.md) page. - -## Exercise Notebooks - -| Title | Description | Exercise | Link | Colab | -|-------|-------------|----------|------|-------| -| Chat Templates | Learn how to use chat templates with SmolLM2 and process datasets into chatml format | 🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format
🐕 Convert the `openai/gsm8k` dataset into chatml format | [Notebook](./notebooks/chat_templates_example.ipynb) | Open In Colab | -| Supervised Fine-Tuning | Learn how to fine-tune SmolLM2 using the SFTTrainer | 🐢 Use the `HuggingFaceTB/smoltalk` dataset
🐕 Try out the `bigcode/the-stack-smol` dataset
🦁 Select a dataset for a real world use case | [Notebook](./notebooks/sft_finetuning_example.ipynb) | Open In Colab | - -## References - -- [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating) -- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py) -- [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer) -- [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290) -- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning) -- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma) -- [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) diff --git a/chapters/en/chapter11/chat_templates_example.ipynb b/chapters/en/chapter11/chat_templates_example.ipynb deleted file mode 100644 index 88f60c1e4..000000000 --- a/chapters/en/chapter11/chat_templates_example.ipynb +++ /dev/null @@ -1,5741 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "vZAvFVIAtFlq" - }, - "source": [ - "# Exploring Chat Templates with SmolLM2\n", - "\n", - "This notebook demonstrates how to use chat templates with the `SmolLM2` model. Chat templates help structure interactions between users and AI models, ensuring consistent and contextually appropriate responses." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "K-lZu8JvtwUN", - "outputId": "c3871418-15bc-4265-ae8d-6d6036036d0e" - }, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c15d320002504d95bb86e87f50d43b08", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "VBox(children=(HTML(value='
user\n", - "Hello, how are you?<|im_end|>\n", - "<|im_start|>assistant\n", - "I'm doing well, thank you! How can I assist you today?<|im_end|>\n", - "\n" - ] - } - ], - "source": [ - "input_text = tokenizer.apply_chat_template(messages, tokenize=False)\n", - "\n", - "print(\"Conversation with template:\", input_text)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sfvdglOqtFls" - }, - "source": [ - "# Decode the conversation\n", - "\n", - "Note that the conversation is represented as above but with a further assistant message.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "mXUVdPeytFls", - "outputId": "80870e53-7bc1-426e-ac33-ba6748e030fc" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Conversation decoded: <|im_start|>user\n", - "Hello, how are you?<|im_end|>\n", - "<|im_start|>assistant\n", - "I'm doing well, thank you! How can I assist you today?<|im_end|>\n", - "<|im_start|>assistant\n", - "\n" - ] - } - ], - "source": [ - "input_text = tokenizer.apply_chat_template(\n", - " messages, tokenize=True, add_generation_prompt=True\n", - ")\n", - "\n", - "print(\"Conversation decoded:\", tokenizer.decode(token_ids=input_text))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "UcZQpspEtFlt" - }, - "source": [ - "# Tokenize the conversation\n", - "\n", - "Of course, the tokenizer also tokenizes the conversation and special token as ids that relate to the model's vocabulary.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "jc2PLxAMtFlt", - "outputId": "d2098780-b3f4-41ec-a1f3-b6da2b593c62" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Conversation tokenized: [1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198, 1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17, 1073, 416, 339, 4237, 346, 1834, 47, 2, 198, 1, 520, 9531, 198]\n" - ] - } - ], - "source": [ - "input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)\n", - "\n", - "print(\"Conversation tokenized:\", input_text)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "m3eNp9a0tFlt" - }, - "source": [ - "
\n", - "

Exercise: Process a dataset for SFT

\n", - "

Take a dataset from the Hugging Face hub and process it for SFT.

\n", - "

Difficulty Levels

\n", - "

🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format.

\n", - "

🐕 Convert the `openai/gsm8k` dataset into chatml format.

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 381 - }, - "id": "qbkXV2_ItFlt", - "outputId": "06deadc3-2c63-4660-d2bd-05096ef07c9f" - }, - "outputs": [ - { - "data": { - "text/html": [ - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from IPython.core.display import display, HTML\n", - "\n", - "display(\n", - " HTML(\n", - " \"\"\"\n", - "\"\"\"\n", - " )\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 241, - "referenced_widgets": [ - "c2d74a42fb574b8892d0a288fd92f0a6", - "056b9ef5706843b19cd62fce75743afb", - "17b4d81e40564a53bb79be9fbef4918e", - "951f60cddcb84dfdbbdf2058369f0541", - "646484cf7a36444daebe1dfe4a0e4150", - "e2f0c39ce1c046e8acb150dfbfaf5aa8", - "7eb12d70d2b542a7b651c7680f590279", - "ea1f9cb22abf4e7d9f6e76fc86c03387", - "00c9f5ca71b84df4b26acee72c97fefb", - "505f96bc0c7843bcb1498ba1c1ba5f06", - "635cc2881a1e4b8788bb26c356740e04", - "a6ee323c13904525a99c6f092ba96e18", - "67fffe7d6f8c4963972b408529e05532", - "0055b6b628934affaf88bc58a1572bb6", - "aafbbb9fc5164fa3a88193bfd33d2f79", - "606e39d53ed64967a60337418c71c595", - "26b15fa18b1b4963a1ba76a76675e7ee", - "db09ab1f79db4f3a8de77f0348eca0f7", - "de04f344a8d4428e8ba1836a563d8aa1", - "03c09673186046d799d6f487d6623e6b", - "1cc682330b24431b8812c73041e987d0", - "dafe748a452148038779f6a62a22a4ec", - "addad1c100024c44a0959978153da9a8", - "9bea2a23db644ad19b708d10e35d54ee", - "d1174b127571420593971166fbb1966b", - "add90ed3746d4293a1b71198137a892c", - "8def25e6389f4e6192b517b6e80aa05e", - "c9747e7a810f413ba1ea108307e3ad1d", - "d0ea49d1d90f4d34bf2ae70efa96946e", - "59d0997b85614384bbfebeee928340b6", - "269920491c134501873e0110367bc984", - "384d26051c04460e8870a3ffe9406c48", - "8e8a0e89a50646c897e546c4077db79e", - "ff60308921f9432683acbcd6d29fb78f", - "3bc8f6339f4e4a3b961d810255c5573e", - "4780ad263ec04b1a97525d985e102049", - "488feef55878426bbf1c753c6d58735b", - "560ba45d70ca431dadeb327d234c330a", - "04d0a6f74af346f7bc696951949063c8", - "2a18ce941b0f4cef8307988ef898b47f", - "194e3fda3635466b998f96e3dc22746a", - "e2ab3cb38b5a41f68d18ed5f0e6ae22c", - "f0b271bcac6c43a9aaddac54259bb514", - "0dc93d50a283472f9ca64fd0a4c6ff15", - "dd1a50d4497144388a1809b78bb38f58", - "6b72a856e5bd4812a5e0dd0c3bfb8455", - "4e21a567d1f6461985727823b37166e1", - "ec1efb7598fd496bb170673ae1b8a1df", - "84f393468aa74baa903243d238b2d387", - "a54ce365be104d27aaa15cf8c63b5ebe", - "1791220377d141ac9b307246177d0712", - "fa330d4f0fb241aebd065f6ef4a6892c", - "cfa1cc6eed8a4f7791a7959308456b6b", - "b50c9c4433854cf7a6b2593e946b7faa", - "7557cd24ba9b4aa3955866d59db94519", - "cc608dfb880c49d4bc5acf2d691b8ec6", - "cb838c5bed994a9a8e6fcf5c98b76d17", - "76bbe8c2beba4c0594085d32a68d2ee7", - "c9836c952b07472880649b82e2347e8d", - "383db57f997140d482b82b123080837a", - "182abc7ec4d944d9bb2ec1281c98b4c8", - "6934c6d1cbac44dbb08f3fffe3056edb", - "05fa0f6eb78b4c56b219b0e57521bd2e", - "012aa94e3cf24e32833c6bbca23c52f7", - "76c1a1cdc9054bbe90d0d3b662cf0ed1", - "e453f1672772400a851735ba64f42c8b", - "d1358f6b16644cb3a2328ca639a4a77a", - "c19f60d4028045399c62004027eaafd9", - "8055588a1fa940239c801ef66f3ecf3b", - "7468a9bc8bda44e5b44574c64fdc6803", - "a13a8f8b702e44ed88c7d358a0a8b4b4", - "13367fbb763747fa8de94cde40ffae32", - "b1fcf477db664ccdade4096fb79de327", - "9d1c06ac6b774d82adca58773f389161", - 
"31910159cf30463b8246ec47ffd8ab5b", - "72220420f9d340eabec13a01caebc92c", - "55b14c03a41c495aacf8ac2d0f96ba0b" - ] - }, - "id": "4p3atw4_tFlu", - "outputId": "62ee9812-3819-4a9c-9e24-5687368ffcd8" - }, - "outputs": [], - "source": [ - "from datasets import load_dataset\n", - "\n", - "ds = load_dataset(\"HuggingFaceTB/smoltalk\", \"everyday-conversations\")\n", - "\n", - "\n", - "def process_dataset(sample):\n", - " # TODO: 🐢 Convert the sample into a chat format\n", - " # use the tokenizer's method to apply the chat template\n", - " return sample\n", - "\n", - "\n", - "ds = ds.map(process_dataset)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 381 - }, - "id": "81fQeazltFlu", - "outputId": "36cf7148-9881-4f13-d0ce-76c82c4ab219" - }, - "outputs": [ - { - "data": { - "text/html": [ - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "display(\n", - " HTML(\n", - " \"\"\"\n", - "\"\"\"\n", - " )\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true, - "id": "bWUSv7NMtFlu" - }, - "outputs": [], - "source": [ - "ds = load_dataset(\"openai/gsm8k\", \"main\")\n", - "\n", - "\n", - "def process_dataset(sample):\n", - " # TODO: 🐕 Convert the sample into a chat format\n", - "\n", - " # 1. create a message format with the role and content\n", - "\n", - " # 2. apply the chat template to the samples using the tokenizer's method\n", - "\n", - " return sample\n", - "\n", - "\n", - "ds = ds.map(process_dataset)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qlXCuRKotFlu" - }, - "source": [ - "## Conclusion\n", - "\n", - "This notebook demonstrated how to apply chat templates to different models, `SmolLM2`. By structuring interactions with chat templates, we can ensure that AI models provide consistent and contextually relevant responses.\n", - "\n", - "In the exercise you tried out converting a dataset into chatml format. Luckily, TRL will do this for you, but it's useful to understand what's going on under the hood." 
- ] - } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": ".venv", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.10" - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "0055b6b628934affaf88bc58a1572bb6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_de04f344a8d4428e8ba1836a563d8aa1", - "max": 946449, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_03c09673186046d799d6f487d6623e6b", - "value": 946449 - } - }, - "00c9f5ca71b84df4b26acee72c97fefb": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "012aa94e3cf24e32833c6bbca23c52f7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "016d5e929f1240cea067372b2191d107": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "01e0f8a799ad479eb95eef3e5a09bd70": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - 
"model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_8fe2df9a14a0436c9124a856ac7419e4", - "IPY_MODEL_d108e029e743419989e30f64f0c82b90", - "IPY_MODEL_bfd11f21f197459b8f27ef364bc9b264" - ], - "layout": "IPY_MODEL_76a0341ebe9f4c3face32460d7023be9" - } - }, - "0206fb9662a349c1aa8a6d87ce01c388": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "03c09673186046d799d6f487d6623e6b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "0479fd3fc1ba476ab46f8c0a98f89468": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "04ae3f7b640c42f3a8eb1977cd1a585d": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - 
"grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "04d0a6f74af346f7bc696951949063c8": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "056b9ef5706843b19cd62fce75743afb": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_e2f0c39ce1c046e8acb150dfbfaf5aa8", - "placeholder": "​", - "style": "IPY_MODEL_7eb12d70d2b542a7b651c7680f590279", - "value": "README.md: 100%" - } - }, - "05fa0f6eb78b4c56b219b0e57521bd2e": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - 
"visibility": null, - "width": null - } - }, - "0942430d36de4677b4c2fa771d7bcd2a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "0bab42beb845475684e9e71dd1591e1d": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "0c336ea5c653434da49e2f0e949f83d0": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "0dc93d50a283472f9ca64fd0a4c6ff15": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "10a0f37020d44156a11e9750778892e0": { - "model_module": 
"@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "13367fbb763747fa8de94cde40ffae32": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "1513792bad534a0c9c381a131395c519": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_76d306c21214412ab44e542d82e547aa", - "placeholder": "​", - "style": "IPY_MODEL_b9e41ef9e9c54fa7b71bc333604af74e", - "value": " 831/831 [00:00<00:00, 42.7kB/s]" - } - }, - "17023310de9b4c3ebd8cc03758d59ef9": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": 
null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "1791220377d141ac9b307246177d0712": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "17b4d81e40564a53bb79be9fbef4918e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_ea1f9cb22abf4e7d9f6e76fc86c03387", - "max": 9251, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_00c9f5ca71b84df4b26acee72c97fefb", - "value": 9251 - } - }, - "182abc7ec4d944d9bb2ec1281c98b4c8": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "194e3fda3635466b998f96e3dc22746a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": 
"@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "1cc682330b24431b8812c73041e987d0": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "269920491c134501873e0110367bc984": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "26b15fa18b1b4963a1ba76a76675e7ee": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - 
"left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "26ed0f1bae204d74a313d101d9355e90": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_7612cc9b8908471b90c9118151d6e447", - "placeholder": "​", - "style": "IPY_MODEL_b687aca79e6e470b96254c5e309d6d63", - "value": "generation_config.json: 100%" - } - }, - "2a18ce941b0f4cef8307988ef898b47f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "31910159cf30463b8246ec47ffd8ab5b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "383db57f997140d482b82b123080837a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "384d26051c04460e8870a3ffe9406c48": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - 
"_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "3b881514716c47308061fe85b810a6a4": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_26ed0f1bae204d74a313d101d9355e90", - "IPY_MODEL_4ff5af1784904bc9b85515105885e2d8", - "IPY_MODEL_b3c42d7e25d6494993029531adc3866d" - ], - "layout": "IPY_MODEL_6227b40396ea4024b3c8710c5e65601f" - } - }, - "3bc8f6339f4e4a3b961d810255c5573e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_04d0a6f74af346f7bc696951949063c8", - "placeholder": "​", - "style": "IPY_MODEL_2a18ce941b0f4cef8307988ef898b47f", - "value": "Generating train split: 100%" - } - }, - "3cc519fd92fe4b328943ec839115b63e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_e15fc503bb73476980cedb5f06b51ced", - "IPY_MODEL_d8c5dc8df3be4e65b2bbba020d29150f", - "IPY_MODEL_c0177c4ad18740d88acfc603ce4735f8" - ], - "layout": "IPY_MODEL_eb570fd159124e2cbd2df9335b3f9cd6" - } - }, - "3fa18e3b50104af796bd0887f556224a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": 
null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "41a27cf0a91246599d4d1b7dae7c7863": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "45675fb5f5c94f8cae575582f7ae41a7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_7eb91920e4384194a008902d6c4a09c7", - "placeholder": "​", - "style": "IPY_MODEL_b379da78cb34463aa5a72eedc3d176cd", - "value": " 704/704 [00:00<00:00, 36.5kB/s]" - } - }, - "471b481a3e5b4d439ab31fdc49fc99c7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "4780ad263ec04b1a97525d985e102049": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_194e3fda3635466b998f96e3dc22746a", - "max": 2260, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_e2ab3cb38b5a41f68d18ed5f0e6ae22c", - "value": 2260 - } - }, - "48724ba7ba4e4f00923445245640739f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "488feef55878426bbf1c753c6d58735b": { - "model_module": "@jupyter-widgets/controls", - 
"model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_f0b271bcac6c43a9aaddac54259bb514", - "placeholder": "​", - "style": "IPY_MODEL_0dc93d50a283472f9ca64fd0a4c6ff15", - "value": " 2260/2260 [00:00<00:00, 21556.99 examples/s]" - } - }, - "4bfa3103048a47989a09a0d90ac6b9bf": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "4e21a567d1f6461985727823b37166e1": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_fa330d4f0fb241aebd065f6ef4a6892c", - "max": 119, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_cfa1cc6eed8a4f7791a7959308456b6b", - "value": 119 - } - }, - "4ff5af1784904bc9b85515105885e2d8": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_3fa18e3b50104af796bd0887f556224a", - "max": 111, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_4bfa3103048a47989a09a0d90ac6b9bf", - "value": 111 - } - }, - "505f96bc0c7843bcb1498ba1c1ba5f06": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": 
null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "50dbf8861ca94b0ba1f4a7e2f0d8aead": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_d17c62b889754b5d88cfced5b18ff7a7", - "placeholder": "​", - "style": "IPY_MODEL_990f706db474450ba0997d1dbcd53cb7", - "value": " 269M/269M [00:06<00:00, 43.2MB/s]" - } - }, - "5291041c86db4933816088c047d659d8": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "530fc4c2bf1244628af7dea3e4b35cdf": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "547151540399460fb9a946bbe67afbd9": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - 
"state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "547eeb64ffd34e509c0b8b8ba6d657e2": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_cbc312cb858b48a5a0f8dbcf60b7e684", - "max": 704, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_f70401b6dba74380b19bd1ef887b3bf7", - "value": 704 - } - }, - "55b14c03a41c495aacf8ac2d0f96ba0b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "560ba45d70ca431dadeb327d234c330a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - 
"58fb913274b54a60a832513c09608a2f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "59d0997b85614384bbfebeee928340b6": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "5b7b09d983844f7893bdda411f9a0076": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_0206fb9662a349c1aa8a6d87ce01c388", - "placeholder": "​", - "style": "IPY_MODEL_881b6196dfa0446e8c55a2420e484b6b", - "value": " 2.10M/2.10M [00:00<00:00, 20.7MB/s]" - } - }, - "5de5dab3d92f4f41838a8f302d27f0c3": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, 
- "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "606e39d53ed64967a60337418c71c595": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "6227b40396ea4024b3c8710c5e65601f": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "635cc2881a1e4b8788bb26c356740e04": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "646484cf7a36444daebe1dfe4a0e4150": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - 
"display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "648c3c820b39493daf0cce5f57a55467": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "67fffe7d6f8c4963972b408529e05532": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_26b15fa18b1b4963a1ba76a76675e7ee", - "placeholder": "​", - "style": "IPY_MODEL_db09ab1f79db4f3a8de77f0348eca0f7", - "value": "train-00000-of-00001.parquet: 100%" - } - }, - "6934c6d1cbac44dbb08f3fffe3056edb": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "69f38fecf8ad403898634cfdfadf8925": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - 
"visibility": null, - "width": null - } - }, - "6b72a856e5bd4812a5e0dd0c3bfb8455": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_a54ce365be104d27aaa15cf8c63b5ebe", - "placeholder": "​", - "style": "IPY_MODEL_1791220377d141ac9b307246177d0712", - "value": "Generating test split: 100%" - } - }, - "6ceb292f2b8544f2a9a005d16d3e8978": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "70f0eaed6ef14c2db8aecb592edeb1ad": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "72220420f9d340eabec13a01caebc92c": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - 
"_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "745fb1db425e44e5b3a23b36ae7675d1": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "7468a9bc8bda44e5b44574c64fdc6803": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_72220420f9d340eabec13a01caebc92c", - "placeholder": "​", - "style": "IPY_MODEL_55b14c03a41c495aacf8ac2d0f96ba0b", - "value": " 119/119 [00:00<00:00, 2302.28 examples/s]" - } - }, - "7557cd24ba9b4aa3955866d59db94519": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "7612cc9b8908471b90c9118151d6e447": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - 
"_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "76a0341ebe9f4c3face32460d7023be9": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "76bbe8c2beba4c0594085d32a68d2ee7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_05fa0f6eb78b4c56b219b0e57521bd2e", - "max": 2260, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_012aa94e3cf24e32833c6bbca23c52f7", - "value": 2260 - } - }, - "76c1a1cdc9054bbe90d0d3b662cf0ed1": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - 
"grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "76d306c21214412ab44e542d82e547aa": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "76febcd912404a58add3a39f80a8218d": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_0bab42beb845475684e9e71dd1591e1d", - "max": 3658, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_89ecd1b28ab64c90afe3b9736fd48306", - "value": 3658 - } - }, - "77d3d81687e6417ab988b04984fc68f4": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_17023310de9b4c3ebd8cc03758d59ef9", - "placeholder": "​", - "style": "IPY_MODEL_f3e23f781bce4429954d76bfea97aff4", - "value": "special_tokens_map.json: 100%" - } - }, - "77f6c27c3c854138b4aa9789637141a1": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - 
"_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "7a0c705334694da6b750104b28db6dba": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "7b20c7c8f6be40c6815b8531ecb9c936": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_d32017fa83aa44f6b2e3443a602654be", - "placeholder": "​", - "style": "IPY_MODEL_ff8debfb713f4b88be6b9b3bf33bfca2", - "value": "tokenizer.json: 100%" - } - }, - "7eb12d70d2b542a7b651c7680f590279": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "7eb91920e4384194a008902d6c4a09c7": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - 
"max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "8055588a1fa940239c801ef66f3ecf3b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_9d1c06ac6b774d82adca58773f389161", - "max": 119, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_31910159cf30463b8246ec47ffd8ab5b", - "value": 119 - } - }, - "84f393468aa74baa903243d238b2d387": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "85de66e1ee3140cf85eadebe5fea1e9f": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - 
"881b6196dfa0446e8c55a2420e484b6b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "89ecd1b28ab64c90afe3b9736fd48306": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "8def25e6389f4e6192b517b6e80aa05e": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "8e8a0e89a50646c897e546c4077db79e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "8fe2df9a14a0436c9124a856ac7419e4": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_da1a999fb5af4eae9f6a9d1086cbb4cf", - "placeholder": "​", - "style": "IPY_MODEL_77f6c27c3c854138b4aa9789637141a1", - "value": "vocab.json: 100%" - } - }, - "951f60cddcb84dfdbbdf2058369f0541": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - 
"_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_505f96bc0c7843bcb1498ba1c1ba5f06", - "placeholder": "​", - "style": "IPY_MODEL_635cc2881a1e4b8788bb26c356740e04", - "value": " 9.25k/9.25k [00:00<00:00, 428kB/s]" - } - }, - "96c2aae9198441569362135ad4bcbc98": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "990f706db474450ba0997d1dbcd53cb7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "9bea2a23db644ad19b708d10e35d54ee": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_c9747e7a810f413ba1ea108307e3ad1d", - "placeholder": "​", - "style": "IPY_MODEL_d0ea49d1d90f4d34bf2ae70efa96946e", - "value": "test-00000-of-00001.parquet: 100%" - } - }, - "9d1c06ac6b774d82adca58773f389161": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "a026a32dd6d646bea82c1ebb06147d89": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": 
"1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "a13a8f8b702e44ed88c7d358a0a8b4b4": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "a54ce365be104d27aaa15cf8c63b5ebe": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "a6ee323c13904525a99c6f092ba96e18": { - 
"model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_67fffe7d6f8c4963972b408529e05532", - "IPY_MODEL_0055b6b628934affaf88bc58a1572bb6", - "IPY_MODEL_aafbbb9fc5164fa3a88193bfd33d2f79" - ], - "layout": "IPY_MODEL_606e39d53ed64967a60337418c71c595" - } - }, - "aa2d32cb76ba47ebaa5ea391efbf58a7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_7b20c7c8f6be40c6815b8531ecb9c936", - "IPY_MODEL_e90b58981bd34d0e8f975fc1a9658c4c", - "IPY_MODEL_5b7b09d983844f7893bdda411f9a0076" - ], - "layout": "IPY_MODEL_70f0eaed6ef14c2db8aecb592edeb1ad" - } - }, - "aafbbb9fc5164fa3a88193bfd33d2f79": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_1cc682330b24431b8812c73041e987d0", - "placeholder": "​", - "style": "IPY_MODEL_dafe748a452148038779f6a62a22a4ec", - "value": " 946k/946k [00:00<00:00, 28.7MB/s]" - } - }, - "add90ed3746d4293a1b71198137a892c": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_384d26051c04460e8870a3ffe9406c48", - "placeholder": "​", - "style": "IPY_MODEL_8e8a0e89a50646c897e546c4077db79e", - "value": " 52.6k/52.6k [00:00<00:00, 2.34MB/s]" - } - }, - "addad1c100024c44a0959978153da9a8": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_9bea2a23db644ad19b708d10e35d54ee", - "IPY_MODEL_d1174b127571420593971166fbb1966b", - "IPY_MODEL_add90ed3746d4293a1b71198137a892c" - ], - "layout": "IPY_MODEL_8def25e6389f4e6192b517b6e80aa05e" - } - }, - "ae2690497e024095adb3879643cffd33": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - 
"_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_f600aa1fe4094133888ec9a2504a60eb", - "IPY_MODEL_efe9a9fcebfe441b80075fbfe9c32674", - "IPY_MODEL_50dbf8861ca94b0ba1f4a7e2f0d8aead" - ], - "layout": "IPY_MODEL_547151540399460fb9a946bbe67afbd9" - } - }, - "b1fcf477db664ccdade4096fb79de327": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "b31de9bcf83e4070be09c7d663361232": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "b379da78cb34463aa5a72eedc3d176cd": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "b3c42d7e25d6494993029531adc3866d": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_85de66e1ee3140cf85eadebe5fea1e9f", - "placeholder": "​", - "style": "IPY_MODEL_b31de9bcf83e4070be09c7d663361232", - "value": " 111/111 [00:00<00:00, 3.57kB/s]" - } - }, - "b50c9c4433854cf7a6b2593e946b7faa": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - 
"min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "b687aca79e6e470b96254c5e309d6d63": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "b922b90106414644bc0e933f28dea1bf": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_e0a40f83ae2e4ab29376a1d48b53aa6e", - "IPY_MODEL_547eeb64ffd34e509c0b8b8ba6d657e2", - "IPY_MODEL_45675fb5f5c94f8cae575582f7ae41a7" - ], - "layout": "IPY_MODEL_016d5e929f1240cea067372b2191d107" - } - }, - "b9e41ef9e9c54fa7b71bc333604af74e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "bde95b39561145548fc81fb4cc94a1bf": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "be4e145938054f13a510fe4d04a7a60d": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - 
"visibility": null, - "width": null - } - }, - "bfd11f21f197459b8f27ef364bc9b264": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_745fb1db425e44e5b3a23b36ae7675d1", - "placeholder": "​", - "style": "IPY_MODEL_bde95b39561145548fc81fb4cc94a1bf", - "value": " 801k/801k [00:00<00:00, 5.92MB/s]" - } - }, - "c0177c4ad18740d88acfc603ce4735f8": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_ec15d99b3a604405a2b4707931d4bf44", - "placeholder": "​", - "style": "IPY_MODEL_e7f5d507d9564941bb7db742b4bf01c7", - "value": " 466k/466k [00:00<00:00, 3.56MB/s]" - } - }, - "c19f60d4028045399c62004027eaafd9": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_13367fbb763747fa8de94cde40ffae32", - "placeholder": "​", - "style": "IPY_MODEL_b1fcf477db664ccdade4096fb79de327", - "value": "Map: 100%" - } - }, - "c2d74a42fb574b8892d0a288fd92f0a6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_056b9ef5706843b19cd62fce75743afb", - "IPY_MODEL_17b4d81e40564a53bb79be9fbef4918e", - "IPY_MODEL_951f60cddcb84dfdbbdf2058369f0541" - ], - "layout": "IPY_MODEL_646484cf7a36444daebe1dfe4a0e4150" - } - }, - "c9747e7a810f413ba1ea108307e3ad1d": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - 
"grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "c9836c952b07472880649b82e2347e8d": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_76c1a1cdc9054bbe90d0d3b662cf0ed1", - "placeholder": "​", - "style": "IPY_MODEL_e453f1672772400a851735ba64f42c8b", - "value": " 2260/2260 [00:00<00:00, 10845.53 examples/s]" - } - }, - "cb838c5bed994a9a8e6fcf5c98b76d17": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_182abc7ec4d944d9bb2ec1281c98b4c8", - "placeholder": "​", - "style": "IPY_MODEL_6934c6d1cbac44dbb08f3fffe3056edb", - "value": "Map: 100%" - } - }, - "cbc312cb858b48a5a0f8dbcf60b7e684": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "cc608dfb880c49d4bc5acf2d691b8ec6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_cb838c5bed994a9a8e6fcf5c98b76d17", - 
"IPY_MODEL_76bbe8c2beba4c0594085d32a68d2ee7", - "IPY_MODEL_c9836c952b07472880649b82e2347e8d" - ], - "layout": "IPY_MODEL_383db57f997140d482b82b123080837a" - } - }, - "cfa1cc6eed8a4f7791a7959308456b6b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "d0ea49d1d90f4d34bf2ae70efa96946e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "d108e029e743419989e30f64f0c82b90": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_6ceb292f2b8544f2a9a005d16d3e8978", - "max": 800662, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_41a27cf0a91246599d4d1b7dae7c7863", - "value": 800662 - } - }, - "d1174b127571420593971166fbb1966b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_59d0997b85614384bbfebeee928340b6", - "max": 52603, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_269920491c134501873e0110367bc984", - "value": 52603 - } - }, - "d1358f6b16644cb3a2328ca639a4a77a": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_c19f60d4028045399c62004027eaafd9", - "IPY_MODEL_8055588a1fa940239c801ef66f3ecf3b", - "IPY_MODEL_7468a9bc8bda44e5b44574c64fdc6803" - ], - "layout": "IPY_MODEL_a13a8f8b702e44ed88c7d358a0a8b4b4" - } - }, - "d17c62b889754b5d88cfced5b18ff7a7": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": 
null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "d32017fa83aa44f6b2e3443a602654be": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "d43410dfcc8c4bebb8672f10ed2aeb66": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "d54fb2da9f1f4a89ae962b8816314f43": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_77d3d81687e6417ab988b04984fc68f4", - "IPY_MODEL_fbce0a69847e4099a55d1e39d4118c91", - "IPY_MODEL_1513792bad534a0c9c381a131395c519" - ], - "layout": "IPY_MODEL_69f38fecf8ad403898634cfdfadf8925" - } - }, - "d64d50101891491f96ff80162dc6d26c": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - 
"_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_d65ec0f0dc0b44e0869c6159e6e82ad6", - "IPY_MODEL_76febcd912404a58add3a39f80a8218d", - "IPY_MODEL_f4ea276bdc0d4da2a04b46e3f1ed95b5" - ], - "layout": "IPY_MODEL_0942430d36de4677b4c2fa771d7bcd2a" - } - }, - "d65ec0f0dc0b44e0869c6159e6e82ad6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_10a0f37020d44156a11e9750778892e0", - "placeholder": "​", - "style": "IPY_MODEL_58fb913274b54a60a832513c09608a2f", - "value": "tokenizer_config.json: 100%" - } - }, - "d8c5dc8df3be4e65b2bbba020d29150f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_7a0c705334694da6b750104b28db6dba", - "max": 466391, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_0c336ea5c653434da49e2f0e949f83d0", - "value": 466391 - } - }, - "da1a999fb5af4eae9f6a9d1086cbb4cf": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "dafe748a452148038779f6a62a22a4ec": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - 
"_view_name": "StyleView", - "description_width": "" - } - }, - "db09ab1f79db4f3a8de77f0348eca0f7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "db3bd55d779947028f36a8b24a2621b6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "dd1a50d4497144388a1809b78bb38f58": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_6b72a856e5bd4812a5e0dd0c3bfb8455", - "IPY_MODEL_4e21a567d1f6461985727823b37166e1", - "IPY_MODEL_ec1efb7598fd496bb170673ae1b8a1df" - ], - "layout": "IPY_MODEL_84f393468aa74baa903243d238b2d387" - } - }, - "de04f344a8d4428e8ba1836a563d8aa1": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "e0a40f83ae2e4ab29376a1d48b53aa6e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_a026a32dd6d646bea82c1ebb06147d89", - "placeholder": "​", - "style": 
"IPY_MODEL_0479fd3fc1ba476ab46f8c0a98f89468", - "value": "config.json: 100%" - } - }, - "e15fc503bb73476980cedb5f06b51ced": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_5de5dab3d92f4f41838a8f302d27f0c3", - "placeholder": "​", - "style": "IPY_MODEL_471b481a3e5b4d439ab31fdc49fc99c7", - "value": "merges.txt: 100%" - } - }, - "e2ab3cb38b5a41f68d18ed5f0e6ae22c": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "e2f0c39ce1c046e8acb150dfbfaf5aa8": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "e453f1672772400a851735ba64f42c8b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "e7f5d507d9564941bb7db742b4bf01c7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "e90b58981bd34d0e8f975fc1a9658c4c": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": 
"FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_ed577dea3ac54884a637ad775b42bc68", - "max": 2104556, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_d43410dfcc8c4bebb8672f10ed2aeb66", - "value": 2104556 - } - }, - "ea1f9cb22abf4e7d9f6e76fc86c03387": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "eb570fd159124e2cbd2df9335b3f9cd6": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "ec15d99b3a604405a2b4707931d4bf44": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - 
"align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "ec1efb7598fd496bb170673ae1b8a1df": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_b50c9c4433854cf7a6b2593e946b7faa", - "placeholder": "​", - "style": "IPY_MODEL_7557cd24ba9b4aa3955866d59db94519", - "value": " 119/119 [00:00<00:00, 3547.77 examples/s]" - } - }, - "ed577dea3ac54884a637ad775b42bc68": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "efe9a9fcebfe441b80075fbfe9c32674": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_04ae3f7b640c42f3a8eb1977cd1a585d", - "max": 269060552, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_db3bd55d779947028f36a8b24a2621b6", - "value": 269060552 - } - }, - "f0b271bcac6c43a9aaddac54259bb514": { - "model_module": "@jupyter-widgets/base", - 
"model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "f3e23f781bce4429954d76bfea97aff4": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "f4ea276bdc0d4da2a04b46e3f1ed95b5": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_be4e145938054f13a510fe4d04a7a60d", - "placeholder": "​", - "style": "IPY_MODEL_648c3c820b39493daf0cce5f57a55467", - "value": " 3.66k/3.66k [00:00<00:00, 197kB/s]" - } - }, - "f600aa1fe4094133888ec9a2504a60eb": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_5291041c86db4933816088c047d659d8", - "placeholder": "​", - "style": "IPY_MODEL_48724ba7ba4e4f00923445245640739f", - "value": "model.safetensors: 100%" - } - }, - "f70401b6dba74380b19bd1ef887b3bf7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "fa330d4f0fb241aebd065f6ef4a6892c": { - "model_module": "@jupyter-widgets/base", - 
"model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "fbce0a69847e4099a55d1e39d4118c91": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_530fc4c2bf1244628af7dea3e4b35cdf", - "max": 831, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_96c2aae9198441569362135ad4bcbc98", - "value": 831 - } - }, - "ff60308921f9432683acbcd6d29fb78f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_3bc8f6339f4e4a3b961d810255c5573e", - "IPY_MODEL_4780ad263ec04b1a97525d985e102049", - "IPY_MODEL_488feef55878426bbf1c753c6d58735b" - ], - "layout": "IPY_MODEL_560ba45d70ca431dadeb327d234c330a" - } - }, - "ff8debfb713f4b88be6b9b3bf33bfca2": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - } - } - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/chapters/en/chapter11/sft_finetuning_example.ipynb b/chapters/en/chapter11/sft_finetuning_example.ipynb deleted file mode 100644 index d18479a91..000000000 --- a/chapters/en/chapter11/sft_finetuning_example.ipynb +++ /dev/null @@ -1,273 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Supervised Fine-Tuning with SFTTrainer\n", - "\n", - "This notebook demonstrates how to fine-tune the 
`HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer` from the `trl` library. Running the notebook cells in order will fine-tune the model. You can select your difficulty by trying out different datasets.\n",
\n", - "

Exercise: Fine-Tuning SmolLM2 with SFTTrainer

\n", - "

Take a dataset from the Hugging Face hub and finetune a model on it.

\n", - "

Difficulty Levels

\n", - "

🐢 Use the `HuggingFaceTB/smoltalk` dataset

\n", - "

🐕 Try out the `bigcode/the-stack-smol` dataset and finetune a code generation model on a specific subset `data/python`.

\n", - "

🦁 Select a dataset that relates to a real world use case your interested in

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Install the requirements in Google Colab\n", - "# !pip install transformers datasets trl huggingface_hub\n", - "\n", - "# Authenticate to Hugging Face\n", - "\n", - "from huggingface_hub import login\n", - "login()\n", - "\n", - "# for convenience you can create an environment variable containing your hub token as HF_TOKEN" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Import necessary libraries\n", - "from transformers import AutoModelForCausalLM, AutoTokenizer\n", - "from datasets import load_dataset\n", - "from trl import SFTConfig, SFTTrainer, setup_chat_format\n", - "import torch\n", - "\n", - "device = (\n", - " \"cuda\"\n", - " if torch.cuda.is_available()\n", - " else \"mps\" if torch.backends.mps.is_available() else \"cpu\"\n", - ")\n", - "\n", - "# Load the model and tokenizer\n", - "model_name = \"HuggingFaceTB/SmolLM2-135M\"\n", - "model = AutoModelForCausalLM.from_pretrained(\n", - " pretrained_model_name_or_path=model_name\n", - ").to(device)\n", - "tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)\n", - "\n", - "# Set up the chat format\n", - "model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)\n", - "\n", - "# Set our name for the finetune to be saved &/ uploaded to\n", - "finetune_name = \"SmolLM2-FT-MyDataset\"\n", - "finetune_tags = [\"smol-course\", \"module_1\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Generate with the base model\n", - "\n", - "Here we will try out the base model which does not have a chat template. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Let's test the base model before training\n", - "prompt = \"Write a haiku about programming\"\n", - "\n", - "# Format with template\n", - "messages = [{\"role\": \"user\", \"content\": prompt}]\n", - "formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)\n", - "\n", - "# Generate response\n", - "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", - "outputs = model.generate(**inputs, max_new_tokens=100)\n", - "print(\"Before training:\")\n", - "print(tokenizer.decode(outputs[0], skip_special_tokens=True))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Dataset Preparation\n", - "\n", - "We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.\n", - "\n", - "**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load a sample dataset\n", - "from datasets import load_dataset\n", - "\n", - "# TODO: define your dataset and config using the path and name parameters\n", - "ds = load_dataset(path=\"HuggingFaceTB/smoltalk\", name=\"everyday-conversations\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: 🦁 If your dataset is not in a format that TRL can convert to the chat template, you will need to process it. 
Refer to the [module](../chat_templates.md)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configuring the SFTTrainer\n", - "\n", - "The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Configure the SFTTrainer\n", - "sft_config = SFTConfig(\n", - " output_dir=\"./sft_output\",\n", - " max_steps=1000, # Adjust based on dataset size and desired training duration\n", - " per_device_train_batch_size=4, # Set according to your GPU memory capacity\n", - " learning_rate=5e-5, # Common starting point for fine-tuning\n", - " logging_steps=10, # Frequency of logging training metrics\n", - " save_steps=100, # Frequency of saving model checkpoints\n", - " evaluation_strategy=\"steps\", # Evaluate the model at regular intervals\n", - " eval_steps=50, # Frequency of evaluation\n", - " use_mps_device=(\n", - " True if device == \"mps\" else False\n", - " ), # Use MPS for mixed precision training\n", - " hub_model_id=finetune_name, # Set a unique name for your model\n", - ")\n", - "\n", - "# Initialize the SFTTrainer\n", - "trainer = SFTTrainer(\n", - " model=model,\n", - " args=sft_config,\n", - " train_dataset=ds[\"train\"],\n", - " tokenizer=tokenizer,\n", - " eval_dataset=ds[\"test\"],\n", - ")\n", - "\n", - "# TODO: 🦁 🐕 align the SFTTrainer params with your chosen dataset. For example, if you are using the `bigcode/the-stack-smol` dataset, you will need to choose the `content` column`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model\n", - "\n", - "With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Train the model\n", - "trainer.train()\n", - "\n", - "# Save the model\n", - "trainer.save_model(f\"./{finetune_name}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "trainer.push_to_hub(tags=finetune_tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

Bonus Exercise: Generate with fine-tuned model

\n", - "

🐕 Use the fine-tuned to model generate a response, just like with the base example..

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Test the fine-tuned model on the same prompt\n", - "\n", - "# Let's test the base model before training\n", - "prompt = \"Write a haiku about programming\"\n", - "\n", - "# Format with template\n", - "messages = [{\"role\": \"user\", \"content\": prompt}]\n", - "formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)\n", - "\n", - "# Generate response\n", - "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", - "\n", - "# TODO: use the fine-tuned to model generate a response, just like with the base example." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 💐 You're done!\n", - "\n", - "This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:\n", - "\n", - "- Try this notebook on a harder difficulty\n", - "- Review a colleagues PR\n", - "- Improve the course material via an Issue or PR." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "py310", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.15" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/chapters/en/chapter11/supervised_fine_tuning.md b/chapters/en/chapter11/supervised_fine_tuning.md deleted file mode 100644 index dc236962e..000000000 --- a/chapters/en/chapter11/supervised_fine_tuning.md +++ /dev/null @@ -1,41 +0,0 @@ -# Supervised Fine-Tuning - -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on carefully curated datasets with human-validated examples. - -## Understanding Supervised Fine-Tuning - -At its core, supervised fine-tuning is about teaching a pre-trained model to perform specific tasks through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. - -SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. - -## When to Use Supervised Fine-Tuning - -The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. - -For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. Similarly, in medical or legal applications, accuracy and adherence to domain-specific terminology becomes crucial. In these cases, SFT can help align the model's responses with professional standards and domain expertise. 
- -## The Fine-Tuning Process - -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. - -First, you'll need to prepare or select a dataset that represents your target task. This dataset should include diverse examples that cover the range of scenarios your model will encounter. The quality of this data is important - each example should demonstrate the kind of output you want your model to produce. Next comes the actual fine-tuning phase, where you'll use frameworks like Hugging Face's `transformers` and `trl` to train the model on your dataset. - -Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. In [module 4](../4_evaluation), we'll cover how to evaluate your model. - -## The Role of SFT in Preference Alignment - -SFT plays a fundamental role in aligning language models with human preferences. Techniques such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) rely on SFT to form a base level of task understanding before further aligning the model’s responses with desired outcomes. Pre-trained models, despite their general language proficiency, may not always generate outputs that match human preferences. SFT bridges this gap by introducing domain-specific data and guidance, which improves the model’s ability to generate responses that align more closely with human expectations. - -## Supervised Fine-Tuning With Transformer Reinforcement Learning - -A key software package for Supervised Fine-Tuning is Transformer Reinforcement Learning (TRL). TRL is a toolkit used to train transformer language models models using reinforcement learning (RL). - -Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). We will use TRL in a number of modules throughout this repo. - -# Next Steps - -Try out the following tutorials to get hands on experience with SFT using TRL: - -⏭️ [Chat Templates Tutorial](./notebooks/chat_templates_example.ipynb) - -⏭️ [Supervised Fine-Tuning Tutorial](./notebooks/sft_finetuning_example.ipynb) From beec8b5be758e5d6702aeb03f23345119815ecf7 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Thu, 30 Jan 2025 15:30:15 +0100 Subject: [PATCH 03/30] review text and read through --- chapters/en/chapter11/1.mdx | 1 - chapters/en/chapter11/5.mdx | 15 +++++++++++---- chapters/en/chapter11/6.mdx | 8 ++------ chapters/en/chapter11/7.mdx | 2 +- chapters/en/chapter11/8.mdx | 4 +++- chapters/en/chapter11/9.mdx | 5 ++++- 6 files changed, 21 insertions(+), 14 deletions(-) diff --git a/chapters/en/chapter11/1.mdx b/chapters/en/chapter11/1.mdx index 5acfb97fd..b5e75efc6 100644 --- a/chapters/en/chapter11/1.mdx +++ b/chapters/en/chapter11/1.mdx @@ -14,7 +14,6 @@ Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained lang Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge. 
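To make the idea concrete, here is a minimal sketch of attaching LoRA adapters with the `peft` library; the rank, scaling factor, and target modules are illustrative choices rather than recommendations:

```python
# Minimal LoRA sketch with the peft library: only the small low-rank adapter
# matrices are trained, while the base model weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")

lora_config = LoraConfig(
    r=8,  # rank of the low-rank decomposition matrices
    lora_alpha=16,  # scaling factor applied to the adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the (small) trainable fraction
```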
- ## 4️⃣ Evaluation Evaluation is a crucial step in the fine-tuning process. It allows us to measure the performance of the model on a task-specific dataset. diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index d19d3f5c5..b63c2c38d 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -1,10 +1,10 @@ # Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Let's work through the process step by step. +In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. ## Dataset Preparation -Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. +The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. From 564d9ecfcd7dc3670258572289bb077938cca5c4 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 20:24:06 +0100 Subject: [PATCH 06/30] add toc --- chapters/en/_toctree.yml | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml index 12b6c3726..cefeda057 100644 --- a/chapters/en/_toctree.yml +++ b/chapters/en/_toctree.yml @@ -210,6 +210,34 @@ title: End-of-chapter quiz quiz: 10 +- title: 11. Supervised fine-tuning + sections: + - local: chapter11/1 + title: Introduction + - local: chapter11/2 + title: Chat templates + - local: chapter11/3 + title: Implementing Chat Templates with Transformers + - local: chapter11/4 + title: Introduction to Supervised Fine-Tuning + - local: chapter11/5 + title: Introduction to SFTTrainer in TRL + - local: chapter11/6 + title: Fine-Tuning a Model with SFTTrainer + - local: chapter11/7 + title: LoRA (Low-Rank Adaptation) + - local: chapter11/8 + title: Merging LoRA Adapters + - local: chapter11/9 + title: Evaluating Fine-Tuned Models + - local: chapter11/10 + title: Implementing Evaluation + - local: chapter11/11 + title: Conclusion + - local: chapter11/12 + title: Exam Time! 
+ quiz: 11 + - title: Course Events sections: - local: events/1 From edcf0490dd0a9bbcb053ffac389327ec66b78de0 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 20:28:38 +0100 Subject: [PATCH 07/30] format code blocks --- chapters/en/chapter11/2.mdx | 17 +++++++++++++---- chapters/en/chapter11/3.mdx | 4 +--- chapters/en/chapter11/7.mdx | 1 - chapters/en/chapter11/8.mdx | 8 ++------ 4 files changed, 16 insertions(+), 14 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 87a82f026..d4582348a 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -28,9 +28,15 @@ The `transformers` library will take care of chat templates for you in relation ```python messages = [ - {"role": "system", "content": "You are a helpful assistant focused on technical topics."}, + { + "role": "system", + "content": "You are a helpful assistant focused on technical topics.", + }, {"role": "user", "content": "Can you explain what a chat template is?"}, - {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."} + { + "role": "assistant", + "content": "A chat template structures conversations between users and AI models...", + }, ] ``` @@ -43,7 +49,7 @@ System messages set the foundation for how the model should behave. They act as ```python system_message = { "role": "system", - "content": "You are a professional customer service agent. Always be polite, clear, and helpful." + "content": "You are a professional customer service agent. Always be polite, clear, and helpful.", } ``` @@ -54,7 +60,10 @@ Chat templates can maintain context through conversation history, storing previo ```python conversation = [ {"role": "user", "content": "I need help with my order"}, - {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"}, + { + "role": "assistant", + "content": "I'd be happy to help. Could you provide your order number?", + }, {"role": "user", "content": "It's ORDER-123"}, ] ``` diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index cf8cc90b1..748bd74a1 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -20,9 +20,7 @@ messages = [ # Apply the chat template formatted_chat = tokenizer.apply_chat_template( - messages, - tokenize=False, - add_generation_prompt=True + messages, tokenize=False, add_generation_prompt=True ) ``` diff --git a/chapters/en/chapter11/7.mdx b/chapters/en/chapter11/7.mdx index 820d689a6..561acf61e 100644 --- a/chapters/en/chapter11/7.mdx +++ b/chapters/en/chapter11/7.mdx @@ -103,7 +103,6 @@ trainer = SFTTrainer( peft_config=lora_config, # LoRA configuration max_seq_length=max_seq_length, # Maximum sequence length tokenizer=tokenizer, - ) ``` diff --git a/chapters/en/chapter11/8.mdx b/chapters/en/chapter11/8.mdx index 0f2b0cef5..323a3b646 100644 --- a/chapters/en/chapter11/8.mdx +++ b/chapters/en/chapter11/8.mdx @@ -17,16 +17,12 @@ from peft import PeftModel # 1. Load the base model base_model = AutoModelForCausalLM.from_pretrained( - "base_model_name", - torch_dtype=torch.float16, - device_map="auto" + "base_model_name", torch_dtype=torch.float16, device_map="auto" ) # 2. Load the PEFT model with adapter peft_model = PeftModel.from_pretrained( - base_model, - "path/to/adapter", - torch_dtype=torch.float16 + base_model, "path/to/adapter", torch_dtype=torch.float16 ) # 3. 
Merge adapter weights with base model From 267c1719fb4a6de87de1e0a29e2c2b3b358590c7 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 21:39:59 +0100 Subject: [PATCH 08/30] combine pages together and add extra guidance --- chapters/en/_toctree.yml | 2 - chapters/en/chapter11/10.mdx | 56 ------ chapters/en/chapter11/11.mdx | 13 -- chapters/en/chapter11/12.mdx | 16 -- chapters/en/chapter11/2.mdx | 216 +++++++++++++++++++- chapters/en/chapter11/3.mdx | 212 ++++++++++++-------- chapters/en/chapter11/4.mdx | 379 +++++++++++++++++++++++------------ chapters/en/chapter11/5.mdx | 191 ++++++++++++------ chapters/en/chapter11/6.mdx | 320 ++++++++++++++++++++--------- chapters/en/chapter11/7.mdx | 120 +---------- chapters/en/chapter11/8.mdx | 53 ++--- chapters/en/chapter11/9.mdx | 188 ----------------- 12 files changed, 967 insertions(+), 799 deletions(-) delete mode 100644 chapters/en/chapter11/10.mdx delete mode 100644 chapters/en/chapter11/11.mdx delete mode 100644 chapters/en/chapter11/12.mdx delete mode 100644 chapters/en/chapter11/9.mdx diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml index cefeda057..8cd568bc1 100644 --- a/chapters/en/_toctree.yml +++ b/chapters/en/_toctree.yml @@ -216,8 +216,6 @@ title: Introduction - local: chapter11/2 title: Chat templates - - local: chapter11/3 - title: Implementing Chat Templates with Transformers - local: chapter11/4 title: Introduction to Supervised Fine-Tuning - local: chapter11/5 diff --git a/chapters/en/chapter11/10.mdx b/chapters/en/chapter11/10.mdx deleted file mode 100644 index bd307e7e9..000000000 --- a/chapters/en/chapter11/10.mdx +++ /dev/null @@ -1,56 +0,0 @@ -# Implementing Evaluation - -In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. - -LightEval tasks are defined using a specific format: - -``` -{suite}|{task}|{num_few_shot}|{auto_reduce} -``` - -| Parameter | Description | -|-----------|-------------| -| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') | -| `task` | Specific task within the suite (e.g., 'abstract_algebra') | -| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) | -| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) | - -Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. - -## Example Evaluation Pipeline - -Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on set of sub tasks that relate to the domain of medicine. 
- -Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend: - -```bash -lighteval vllm \ - "pretrained=your-model-name" \ - "mmlu|anatomy|0|0" \ - "mmlu|high_school_biology|0|0" \ - "mmlu|high_school_chemistry|0|0" \ - "mmlu|professional_medicine|0|0" \ - --max_samples 40 \ - --batch_size 1 \ - --output_path "./results" \ - --save_generations true -``` - -Results are displayed in a tabular format showing: - -``` -| Task |Version|Metric|Value | |Stderr| -|----------------------------------------|------:|------|-----:|---|-----:| -|all | |acc |0.3333|± |0.1169| -|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121| -|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141| -|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| -``` - -Lighteval also include a python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information. - - - -✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval. - - \ No newline at end of file diff --git a/chapters/en/chapter11/11.mdx b/chapters/en/chapter11/11.mdx deleted file mode 100644 index 093de47d6..000000000 --- a/chapters/en/chapter11/11.mdx +++ /dev/null @@ -1,13 +0,0 @@ -# Conclusion - -In this chapter, we explored the essential components of fine-tuning language models: - -1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting. - -2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge. - -3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. - -4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. - -These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. diff --git a/chapters/en/chapter11/12.mdx b/chapters/en/chapter11/12.mdx deleted file mode 100644 index 690547c5f..000000000 --- a/chapters/en/chapter11/12.mdx +++ /dev/null @@ -1,16 +0,0 @@ -# Exam Time! - -It's time to put your knowledge to the test! We've prepared a short quiz for you to test your understanding of the concepts covered in this chapter. - -To take the quiz, you will need to follow these steps: - -1. Sign in to your Hugging Face account. -2. Answer the questions in the quiz. -3. Submit your answers. - - diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index d4582348a..5b59f4b11 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -1,3 +1,9 @@ + + # Chat Templates Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns. 
@@ -10,9 +16,91 @@ To make a base model behave like an instruct model, we need to format our prompt It's important to note that a base model could be fine-tuned on different chat templates, so when we're using an instruct model we need to make sure we're using the correct chat template. +## Common Chat Template Formats + +Different models use different chat template formats. To illustrate this, let's look at a few chat templates. Here's how the same conversation would be formatted for different models: + +We'll use the following conversation structure for all examples: + +```python +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Hello!"}, + {"role": "assistant", "content": "Hi! How can I help you today?"}, + {"role": "user", "content": "What's the weather?"} +] +``` + +This is using the `mistral` template format: + +```sh +[INST] You are a helpful assistant. [/INST] +Hi! How can I help you today? +[INST] Hello! [/INST] +``` + +This is the chat template for a Qwen 2 model: + +```sh +<|im_start|>system +You are a helpful assistant.<|im_end|> +<|im_start|>user +Hello!<|im_end|> +<|im_start|>assistant +Hi! How can I help you today?<|im_end|> +<|im_start|>user +What's the weather?<|im_end|> +<|im_start|>assistant +``` + +Key differences between these formats include: +1. **System Message Handling**: + - Llama 2 wraps system messages in `<>` tags + - Llama 3 uses `<|system|>` tags with `` endings + - Mistral includes system message in the first instruction + - Qwen uses explicit `system` role with `<|im_start|>` tags + - ChatGPT uses `SYSTEM:` prefix + +2. **Message Boundaries**: + - Llama 2 uses `[INST]` and `[/INST]` tags + - Llama 3 uses role-specific tags (`<|system|>`, `<|user|>`, `<|assistant|>`) with `` endings + - Mistral uses `[INST]` and `[/INST]` with `` and `` + - Qwen uses role-specific start/end tokens + +3. **Special Tokens**: + - Llama 2 uses `` and `` for conversation boundaries + - Llama 3 uses `` to end each message + - Mistral uses `` and `` for turn boundaries + - Qwen uses role-specific start/end tokens + +The transformers library handles these differences through model-specific chat templates. When you load a tokenizer, it automatically uses the correct template for that model: + +```python +from transformers import AutoTokenizer + +# These will use different templates automatically +llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf") +mistral_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") +qwen_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat") + +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Hello!"} +] + +# Each will format according to its model's template +llama_chat = llama_tokenizer.apply_chat_template(messages, tokenize=False) +mistral_chat = mistral_tokenizer.apply_chat_template(messages, tokenize=False) +qwen_chat = qwen_tokenizer.apply_chat_template(messages, tokenize=False) +``` + ## Understanding Chat Templates -At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. 
Below is an example of a chat template: +At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. + +### Basic Chat Template Example + +Here's a basic example of a chat template: ```sh <|im_start|>user @@ -24,23 +112,116 @@ Can I ask a question?<|im_end|> <|im_start|>assistant ``` -The `transformers` library will take care of chat templates for you in relation to the model's tokenizer. Read more about how transformers builds chat templates [here](https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates). All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest. Here's a basic example of a conversation: +### Implementation with Transformers + +The transformers library provides built-in support for chat templates through the `apply_chat_template()` method: + +```python +from transformers import AutoTokenizer + +tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") + +messages = [ + {"role": "system", "content": "You are a helpful coding assistant."}, + {"role": "user", "content": "Write a Python function to sort a list"}, +] + +# Apply the chat template +formatted_chat = tokenizer.apply_chat_template( + messages, tokenize=False, add_generation_prompt=True +) +``` + +This will return a formatted string that looks like: + +```sh +<|im_start|>system +You are a helpful coding assistant.<|im_end|> +<|im_start|>user +Write a Python function to sort a list<|im_end|> +``` + +### Advanced Chat Templates + +Chat templates can handle more complex scenarios, including: + +1. **Tool Use**: When models need to interact with external tools or APIs +2. **Multimodal Inputs**: For handling images, audio, or other media types +3. **Function Calling**: For structured function execution +4. **Multi-turn Context**: For maintaining conversation history + +Here's an example of a chat template with tool use: ```python messages = [ { "role": "system", - "content": "You are a helpful assistant focused on technical topics.", + "content": "You are an AI assistant that can use tools. Available tools: calculator, weather_api", }, - {"role": "user", "content": "Can you explain what a chat template is?"}, + {"role": "user", "content": "What's 123 * 456 and is it raining in Paris?"}, { "role": "assistant", - "content": "A chat template structures conversations between users and AI models...", + "content": "Let me help you with that.", + "tool_calls": [ + { + "tool": "calculator", + "parameters": {"operation": "multiply", "x": 123, "y": 456} + }, + { + "tool": "weather_api", + "parameters": {"city": "Paris", "country": "France"} + } + ] + }, + { + "role": "tool", + "tool_name": "calculator", + "content": "56088" + }, + { + "role": "tool", + "tool_name": "weather_api", + "content": "{'condition': 'rain', 'temperature': 15}" + } +] +``` + +For multimodal conversations, chat templates can include image references or base64-encoded images: + +```python +messages = [ + { + "role": "system", + "content": "You are a helpful vision assistant that can analyze images." 
}, + { + "role": "user", + "content": [ + { + "type": "text", + "text": "What's in this image?" + }, + { + "type": "image", + "image_url": "https://example.com/image.jpg" + } + ] + } ] ``` -Let's break down the above example, and see how it maps to the chat template format. +## Working with Chat Templates + +When working with chat templates, you have several options for processing the conversation: + +1. Apply the template without tokenization to return the raw formatted string +2. Apply the template with tokenization to return the token IDs +3. Add a generation prompt to prepare for model inference + +The tokenizer's `apply_chat_template()` method handles all these cases through its parameters: + +- `tokenize`: Whether to return token IDs (True) or the formatted string (False) +- `add_generation_prompt`: Whether to add a prompt for the model to generate a response ## System Messages @@ -68,8 +249,29 @@ conversation = [ ] ``` +## Best Practices + +When working with chat templates, consider these best practices: + +1. **Consistent Formatting**: Always use the same template format throughout your application +2. **Clear Role Definition**: Clearly specify roles (system, user, assistant, tool) for each message +3. **Context Management**: Be mindful of token limits when maintaining conversation history +4. **Error Handling**: Include proper error handling for tool calls and multimodal inputs +5. **Validation**: Validate message structure before sending to the model + -✏️ **Try it out!** Create a chat template for a conversation between a user and an assistant. Then, use the `transformers` library to tokenize the conversation and see how the model responds. You won't need to download the model to do this, as the tokenizer will handle the formatting. +✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file. + +For this exercise, you'll need to: +1. Load the dataset using the Hugging Face datasets library +2. Create a processing function that converts the samples into the correct chat format +3. Apply the chat template using the tokenizer's methods + +## Resources + +- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) +- [Transformers Documentation](https://huggingface.co/docs/transformers) +- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates) \ No newline at end of file diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 748bd74a1..5ae7b7605 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -1,87 +1,125 @@ -# Implementation with Transformers - - - -Now that we understand how chat templates work, let's see how we can implement them using the `transformers` library. The transformers library provides built-in support for chat templates, we just need to use the `apply_chat_template()` method to format our messages. - -```python -from transformers import AutoTokenizer - -tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") - -messages = [ - {"role": "system", "content": "You are a helpful coding assistant."}, - {"role": "user", "content": "Write a Python function to sort a list"}, -] - -# Apply the chat template -formatted_chat = tokenizer.apply_chat_template( - messages, tokenize=False, add_generation_prompt=True -) -``` - -This will return a formatted string that can be passed to the model. 
It would look like this for the SmolLM2-135M-Instruct model specified: - -```sh -<|im_start|>system -You are a helpful coding assistant.<|im_end|> -<|im_start|>user -Write a Python function to sort a list<|im_end|> -``` - -Note that the `im_start` and `im_end` tokens are used to indicate the start and end of a message. The tokenizer will also have corresponding special tokens for the start and end of messages. For a refresher on how these tokens work, see the [Tokenizers](../chapter2/5.mdx) section. - -Chat templates can handle multi-turn conversations while maintaining context: - -```python -messages = [ - {"role": "system", "content": "You are a math tutor."}, - {"role": "user", "content": "What is calculus?"}, - {"role": "assistant", "content": "Calculus is a branch of mathematics..."}, - {"role": "user", "content": "Can you give me an example?"}, -] -``` - -## Working with Chat Templates - -When working with chat templates, you have several options for processing the conversation: - -1. Apply the template without tokenization to return the raw formatted string -2. Apply the template with tokenization to return the token IDs -3. Add a generation prompt to prepare for model inference - -The tokenizer's `apply_chat_template()` method handles all these cases through its parameters: - -- `tokenize`: Whether to return token IDs (True) or the formatted string (False) -- `add_generation_prompt`: Whether to add a prompt for the model to generate a response - - - -✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file. - -For this exercise, you'll need to: -1. Load the dataset using the Hugging Face datasets library -2. Create a processing function that converts the samples into the correct chat format -3. Apply the chat template using the tokenizer's methods - - - -## Conclusion - -Chat templates are a crucial component for working with language models, especially when fine-tuning or deploying models for chat applications. They provide structure and consistency to conversations, making it easier for models to understand context and generate appropriate responses. - -Understanding how to work with chat templates is essential for: -- Converting datasets for fine-tuning -- Preparing inputs for model inference -- Maintaining conversation context -- Ensuring consistent model behavior - -## Resources - -- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) -- [Transformers Documentation](https://huggingface.co/docs/transformers) -- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates) +# Supervised Fine-Tuning + +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. + +Because of the supervised structure of the task, the model can learn to generate structured outputs. For example, the chat templates we created in the previous sections. + +## Understanding Supervised Fine-Tuning + +Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and use specific output structures, through examples of labeled tokens. 
The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. + +SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. + +## When to Use Supervised Fine-Tuning + +The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. + +Two core reasons to use SFT are: + +1. **Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. For example, you need a specific chat template to generate structured outputs. + +2. **Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise. + +## Quiz + +### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)? + + + +### 2. Which of the following are valid reasons to use SFT? + + + +### 3. What is required for effective Supervised Fine-Tuning? + + + +### 4. How does SFT relate to chat templates? + + + +### 5. What distinguishes SFT from pre-training? + + \ No newline at end of file diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx index 5ae7b7605..183a9fa06 100644 --- a/chapters/en/chapter11/4.mdx +++ b/chapters/en/chapter11/4.mdx @@ -1,125 +1,254 @@ -# Supervised Fine-Tuning - -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. - -Because of the supervised structure of the task, the model can learn to generate structured outputs. For example, the chat templates we created in the previous sections. - -## Understanding Supervised Fine-Tuning - -Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and use specific output structures, through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. - -SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. - -## When to Use Supervised Fine-Tuning - -The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. - -Two core reasons to use SFT are: - -1. **Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. For example, you need a specific chat template to generate structured outputs. - -2. 
**Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise. - -## Quiz - -### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)? - - - -### 2. Which of the following are valid reasons to use SFT? - - - -### 3. What is required for effective Supervised Fine-Tuning? - - - -### 4. How does SFT relate to chat templates? - - - -### 5. What distinguishes SFT from pre-training? - - \ No newline at end of file + + +# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning + +In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. + +## Dataset Preparation + +The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. + + + +## Understanding Training Dynamics + +When fine-tuning language models, understanding the training dynamics is crucial for monitoring progress and ensuring successful adaptation. Let's look at how to interpret the training process through loss curves. + +### Loss Patterns + +The training loss curve typically follows a characteristic pattern. Initially, you'll observe a sharp drop in loss as the model begins adapting to the new data distribution, task objectives, and chat template. This early phase is crucial as it indicates whether the model is successfully learning from the training data. + +### The Path to Convergence + +As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting the model is learning generalizable patterns rather than memorizing specific examples. The absolute loss values will vary depending on your task and dataset. + +### Monitoring Training Progress + +
*Training and validation loss curves showing healthy convergence*
+ +The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability. + +### Warning Signs to Watch For + +Several patterns in the loss curves can indicate potential issues: + +1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider: + - Reducing the model size or training time + - Adding regularization + - Increasing the dataset size + - Using techniques like early stopping + +2. If the loss doesn't show significant improvement, the model might be: + - Learning too slowly (try increasing the learning rate) + - Struggling with the task (check data quality and task complexity) + - Hitting architecture limitations (consider a different model) + +3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: + - The model performs poorly on new, similar examples + - The outputs lack diversity + - The responses are too similar to training examples + + + +Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss. + + + +## Training Configuration + +We will configure SFT trainer with the following parameters: + +| Parameter | Description | +|-----------|-------------| +| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) | +| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) | +| gradient_accumulation_steps | Number of updates to accumulate before performing a backward pass, effectively increasing batch size | +| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) | +| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations | +| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) | +| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) | +| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) | + +In general, start with a small number of epochs and data using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance. + +## Training and Evaluation + +Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes. + +- Iterating over the dataset +- Computing the loss +- Updating the model's parameters +- Regular evaluation on a validation set + +Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. 
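One way to automate that monitoring is early stopping, which halts training once validation loss stops improving. Below is a rough sketch using the standard `transformers` callback; the dataset names and patience value are placeholders, not prescriptions:

```python
# Rough sketch: stop fine-tuning automatically once validation loss stops
# improving, rather than always running a fixed number of steps.
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="./sft_output",  # illustrative path
    evaluation_strategy="steps",  # evaluate every eval_steps
    eval_steps=50,
    load_best_model_at_end=True,  # required by the early-stopping callback
    metric_for_best_model="eval_loss",
    greater_is_better=False,  # lower validation loss is better
)

trainer = SFTTrainer(
    model=model,  # assumes the model is defined as in this section
    args=config,
    train_dataset=train_ds,  # placeholder names for your splits
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```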
+ +## `SFTTrainer` from Transformer Reinforcement Learning + +Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). But we'll focus on SFT in this chapter. + +Here's a basic simplified example of how to use `SFTTrainer` to fine-tune a model. We'll expand on this example in the next few sections, but for now let's just focus on the basics. + +```python +from datasets import load_dataset +from trl import SFTConfig, SFTTrainer + +dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") + +training_args = SFTConfig( + max_seq_length=512, + output_dir="/tmp", +) + +trainer = SFTTrainer( + model_name="HuggingFaceTB/SmolLM2-135M", + train_dataset=dataset, + args=training_args, +) +trainer.train() +``` + +Just like in `transformers`, we work through the following steps: + +1. Load the dataset +2. Configure the SFTTrainer with appropriate parameters +3. Train the model and monitor its progress +4. Save and evaluate the fine-tuned model + + + +✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. + +For this exercise, you'll need to: +1. Load and prepare your chosen dataset +2. Configure the SFTTrainer with appropriate parameters +3. Train the model and monitor its progress +4. Save and evaluate the fine-tuned model + + + +# Supervised Fine-Tuning with SFTTrainer + +In this section we will unpack the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. + +## Load the base model + +Here we'll load the base model and tokenizer. We'll also set up the chat format for the model. + +```python +# Import necessary libraries +from transformers import AutoModelForCausalLM, AutoTokenizer +from datasets import load_dataset +from trl import SFTConfig, SFTTrainer, setup_chat_format +import torch + +# Set the device to use for training +device = ( + "cuda" + if torch.cuda.is_available() + else "mps" if torch.backends.mps.is_available() else "cpu" +) + +# Load the model and tokenizer +model = AutoModelForCausalLM.from_pretrained( + pretrained_model_name_or_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" +).to(device) +tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) + +# Set up the chat format +model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) +``` + +## Generate with the base model + +First we will try out the base model which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model. 
+

```python
# Let's test the base model before training
prompt = "Write a haiku about programming"

# Format with template
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Generate and decode the response
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Dataset Preparation

We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.

**TRL will format input messages based on the model's chat template.** They need to be represented as a list of dictionaries with the keys `role` and `content`.

```python
dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
```

## Configuring the SFTTrainer

The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.

```python
# Example name for your fine-tuned model; choose your own
finetune_name = "DeepSeek-R1-Distill-Qwen-1.5B-SFT"

# Configure the SFTTrainer
sft_config = SFTConfig(
    output_dir="./sft_output",
    max_steps=1000,  # Adjust based on dataset size and desired training duration
    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
    learning_rate=5e-5,  # Common starting point for fine-tuning
    logging_steps=10,  # Frequency of logging training metrics
    save_steps=100,  # Frequency of saving model checkpoints
    evaluation_strategy="steps",  # Evaluate the model at regular intervals
    eval_steps=50,  # Frequency of evaluation
    use_mps_device=(
        True if device == "mps" else False
    ),  # Use the MPS backend on Apple silicon
    hub_model_id=finetune_name,  # Set a unique name for your model
)

# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    eval_dataset=dataset["test"],
)
```

## Training the model

With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.

```python
trainer.train()
```



SFTTrainer Training

## 💐 Nice work!

This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:

- Try this notebook on a harder difficulty
- Review a colleague's PR
- Improve the course material via an Issue or PR.
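Before moving on, you may also want to persist the fine-tuned model (step 4 of the workflow above). Here is a minimal sketch of that saving step, assuming the `trainer` and `tokenizer` objects defined earlier; the output path is a placeholder:

```python
# Save the fine-tuned model and tokenizer together so they can be reloaded later
trainer.save_model("./sft_output/final")  # writes the model weights and config
tokenizer.save_pretrained("./sft_output/final")  # keep the tokenizer alongside

# Optionally, share the model on the Hugging Face Hub (requires a logged-in account)
trainer.push_to_hub()
```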
+ + +## Resources + +- [TRL Documentation](https://huggingface.co/docs/trl) +- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) \ No newline at end of file diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index b63c2c38d..28583bb58 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -1,91 +1,166 @@ -# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning +# LoRA (Low-Rank Adaptation) -In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. +Fine-tuning large language models is a resource intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of parameters. It works by adding and optimizing smaller matrices to the attention weights, typically reducing trainable parameters by about 90%. -## Dataset Preparation +## Understanding LoRA -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. +LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685). - +LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable. -## Training Configuration +## Key advantages of LoRA -We will configure SFT trainer with the following parameters: +1. **Memory Efficiency**: + - Only adapter parameters are stored in GPU memory + - Base model weights remain frozen and can be loaded in lower precision + - Enables fine-tuning of large models on consumer GPUs + +2. **Training Features**: + - Native PEFT/LoRA integration with minimal setup + - Support for QLoRA (Quantized LoRA) for even better memory efficiency + +3. **Adapter Management**: + - Adapter weight saving during checkpoints + - Features to merge adapters back into base model + +## Loading LoRA Adapters with PEFT + +PEFT is a library that provides a unified interface for loading and managing PEFT methods, including LoRA. It allows you to easily load and switch between different PEFT methods, making it easier to experiment with different fine-tuning techniques. 
+ +Adapters can be loaded onto a pretrained model with load_adapter(), which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the set_adapter() function. To return the base model, you could use unload() to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights. + +```python +from transformers import AutoModelForCausalLM +from peft import PeftModel + +base_model = AutoModelForCausalLM.from_pretrained("") +peft_model_id = "" +model = PeftModel.from_pretrained(base_model, peft_model_id) +``` + + +![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png) + +## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA + +The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train. + +We'll use LoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps: + +1. Define the LoRA configuration (rank, alpha, dropout) +2. Create the SFTTrainer with PEFT config +3. Train and save the adapter weights + +## LoRA Configuration + +Let's walk through the LoRA configuration and key parameters. | Parameter | Description | |-----------|-------------| -| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) | -| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) | -| gradient_accumulation_steps | Number of updates to accumulate before performing a backward pass, effectively increasing batch size | -| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) | -| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations | -| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) | -| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) | -| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) | +| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. | +| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. Higher values result in stronger adaptation effects. | +| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. | +| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | +| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | + +When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. 
The effectiveness of different methods can vary by task, so experimentation is key. + +## Using TRL with PEFT + +PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for fine-tuning to reduce memory requirements. We can pass the `LoraConfig` to the model when loading it. + +```python +from peft import LoraConfig + +# TODO: Configure LoRA parameters +# r: rank dimension for LoRA update matrices (smaller = more compression) +rank_dimension = 6 +# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation) +lora_alpha = 8 +# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting) +lora_dropout = 0.05 + +peft_config = LoraConfig( + r=rank_dimension, # Rank dimension - typically between 4-32 + lora_alpha=lora_alpha, # LoRA scaling factor - typically 2x rank + lora_dropout=lora_dropout, # Dropout probability for LoRA layers + bias="none", # Bias type for LoRA. the corresponding biases will be updated during training. + target_modules="all-linear", # Which modules to apply LoRA to + task_type="CAUSAL_LM", # Task type for model architecture +) +``` -In general, start with a small number of epochs and data using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance. +Above, we used `device_map="auto"` to automatically assign the model to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`. -## Training and Evaluation +We will also need to define the `SFTTrainer` with the LoRA configuration. -Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes. +```python +# Create SFTTrainer with LoRA configuration +trainer = SFTTrainer( + model=model, + args=args, + train_dataset=dataset["train"], + peft_config=lora_config, # LoRA configuration + max_seq_length=max_seq_length, # Maximum sequence length + tokenizer=tokenizer, +) +``` -- Iterating over the dataset -- Computing the loss -- Updating the model's parameters -- Regular evaluation on a validation set + -Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. +✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. -## `SFTTrainer` from Transformer Reinforcement Learning + -Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). But we'll focus on SFT in this chapter. +## Merging LoRA Adapters -Here's a basic simplified example of how to use `SFTTrainer` to fine-tune a model. 
We'll expand on this example in the next few sections, but for now let's just focus on the basics. +After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. -```python -from datasets import load_dataset -from trl import SFTConfig, SFTTrainer +The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will find the correct device for the model based on your hardware. -dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") +Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training and saving the merged model in the same format for deployment. -training_args = SFTConfig( - max_seq_length=512, - output_dir="/tmp", +## Merging Implementation + +After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it: + +```python +import torch +from transformers import AutoModelForCausalLM +from peft import PeftModel + +# 1. Load the base model +base_model = AutoModelForCausalLM.from_pretrained( + "base_model_name", torch_dtype=torch.float16, device_map="auto" ) -trainer = SFTTrainer( - model_name="HuggingFaceTB/SmolLM2-135M", - train_dataset=dataset, - args=training_args, +# 2. Load the PEFT model with adapter +peft_model = PeftModel.from_pretrained( + base_model, "path/to/adapter", torch_dtype=torch.float16 ) -trainer.train() + +# 3. Merge adapter weights with base model +merged_model = peft_model.merge_and_unload() ``` -Just like in `transformers`, we work through the following steps: +If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer: -1. Load the dataset -2. Configure the SFTTrainer with appropriate parameters -3. Train the model and monitor its progress -4. Save and evaluate the fine-tuned model +```python +# Save both model and tokenizer +tokenizer = AutoTokenizer.from_pretrained("base_model_name") +merged_model.save_pretrained("path/to/save/merged_model") +tokenizer.save_pretrained("path/to/save/merged_model") +``` -✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. - -For this exercise, you'll need to: -1. Load and prepare your chosen dataset -2. Configure the SFTTrainer with appropriate parameters -3. Train the model and monitor its progress -4. Save and evaluate the fine-tuned model +✏️ **Try it out!** Merge the adapter weights back into the base model. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. 
-## Resources -- [TRL Documentation](https://huggingface.co/docs/trl) -- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) \ No newline at end of file +# Resources + +- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685) +- [PEFT Documentation](https://huggingface.co/docs/peft) +- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft) \ No newline at end of file diff --git a/chapters/en/chapter11/6.mdx b/chapters/en/chapter11/6.mdx index cb103fd6a..d02183d40 100644 --- a/chapters/en/chapter11/6.mdx +++ b/chapters/en/chapter11/6.mdx @@ -1,111 +1,245 @@ -# Supervised Fine-Tuning with SFTTrainer - - - -In this section we will unpack the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. - -## Load the base model - -Here we'll load the base model and tokenizer. We'll also set up the chat format for the model. - -```python -# Import necessary libraries -from transformers import AutoModelForCausalLM, AutoTokenizer -from datasets import load_dataset -from trl import SFTConfig, SFTTrainer, setup_chat_format -import torch - -# Set the device to use for training -device = ( - "cuda" - if torch.cuda.is_available() - else "mps" if torch.backends.mps.is_available() else "cpu" -) - -# Load the model and tokenizer -model = AutoModelForCausalLM.from_pretrained( - pretrained_model_name_or_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" -).to(device) -tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) - -# Set up the chat format -model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) -``` +# Evaluation -## Generate with the base model +With a finetuned model through either SFT or LoRA SFT, we should evaluate it on standard benchmarks. -First we will try out the base model which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model. +## Automatic Benchmarks -```python -# Let's test the base model before training -prompt = "Write a haiku about programming" +Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy. -# Format with template -messages = [{"role": "user", "content": prompt}] -formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False) +## Understanding Automatic Benchmarks -# Generate response -inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device) -outputs = model.generate(**inputs, max_new_tokens=100) -``` +Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization - they allow for consistent comparison across different models and provide reproducible results. -## Dataset Preparation +However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases. 
-We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. +## General Knowledge Benchmarks -**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,. +MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. -```python -dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") -``` +## Reasoning Benchmarks +BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. + +## Language Understanding + +HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology. + +## Alternative Evaluation Approaches + +Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks: + +### LLM-as-Judge + +Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations. + +### Evaluation Arenas + +Platforms like Anthropic's Constitutional AI Arena allow models to interact and evaluate each other in controlled environments. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks. + +### Custom Benchmark Suites + +Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. + +## Creating Your Own Evaluation Strategy + +Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. -## Configuring the SFTTrainer - -The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources. 
- -```python -# Configure the SFTTrainer -sft_config = SFTConfig( - output_dir="./sft_output", - max_steps=1000, # Adjust based on dataset size and desired training duration - per_device_train_batch_size=4, # Set according to your GPU memory capacity - learning_rate=5e-5, # Common starting point for fine-tuning - logging_steps=10, # Frequency of logging training metrics - save_steps=100, # Frequency of saving model checkpoints - evaluation_strategy="steps", # Evaluate the model at regular intervals - eval_steps=50, # Frequency of evaluation - use_mps_device=( - True if device == "mps" else False - ), # Use MPS for mixed precision training - hub_model_id=finetune_name, # Set a unique name for your model -) - -# Initialize the SFTTrainer -trainer = SFTTrainer( - model=model, - args=sft_config, - train_dataset=ds["train"], - tokenizer=tokenizer, - eval_dataset=ds["test"], -) +While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: + +1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. + +2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? + +3. Develop custom evaluation datasets that reflect your actual use case. This might include: + - Real user queries from your domain + - Common edge cases you've encountered + - Examples of particularly challenging scenarios + +4. Consider implementing a multi-layered evaluation strategy: + - Automated metrics for quick feedback + - Human evaluation for nuanced understanding + - Domain expert review for specialized applications + - A/B testing in controlled environments + +# Implementing Evaluation + +In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. + +LightEval tasks are defined using a specific format: + +``` +{suite}|{task}|{num_few_shot}|{auto_reduce} ``` -## Training the model +| Parameter | Description | +|-----------|-------------| +| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') | +| `task` | Specific task within the suite (e.g., 'abstract_algebra') | +| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) | +| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) | + +Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. + +## Example Evaluation Pipeline -With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss. +Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on set of sub tasks that relate to the domain of medicine. 
-```python -trainer.train() +Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend: + +```bash +lighteval vllm \ + "pretrained=your-model-name" \ + "mmlu|anatomy|0|0" \ + "mmlu|high_school_biology|0|0" \ + "mmlu|high_school_chemistry|0|0" \ + "mmlu|professional_medicine|0|0" \ + --max_samples 40 \ + --batch_size 1 \ + --output_path "./results" \ + --save_generations true ``` -## 💐 Nice work! +Results are displayed in a tabular format showing: -This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out: +``` +| Task |Version|Metric|Value | |Stderr| +|----------------------------------------|------:|------|-----:|---|-----:| +|all | |acc |0.3333|± |0.1169| +|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121| +|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141| +|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| +``` -- Try this notebook on a harder difficulty -- Review a colleagues PR -- Improve the course material via an Issue or PR. +Lighteval also include a python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information. + + + +✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval. + + + +# End-of-chapter quiz[[end-of-chapter-quiz]] + + + +### 1. What are the main advantages of using automatic benchmarks for model evaluation? + + + +### 2. Which benchmark specifically tests knowledge across 57 different subjects? + + + +### 3. What is LLM-as-Judge? + + + +### 4. What should be included in a comprehensive evaluation strategy? + + + +### 5. What is a limitation of automatic benchmarks? + + + +### 6. What is the purpose of creating custom evaluation datasets? + + diff --git a/chapters/en/chapter11/7.mdx b/chapters/en/chapter11/7.mdx index 561acf61e..093de47d6 100644 --- a/chapters/en/chapter11/7.mdx +++ b/chapters/en/chapter11/7.mdx @@ -1,119 +1,13 @@ -# LoRA (Low-Rank Adaptation) +# Conclusion -Fine-tuning large language models is a resource intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of parameters. It works by adding and optimizing smaller matrices to the attention weights, typically reducing trainable parameters by about 90%. +In this chapter, we explored the essential components of fine-tuning language models: -## Understanding LoRA +1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting. -LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. 
You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685). +2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge. -LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable. +3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. -## Key advantages of LoRA +4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. -1. **Memory Efficiency**: - - Only adapter parameters are stored in GPU memory - - Base model weights remain frozen and can be loaded in lower precision - - Enables fine-tuning of large models on consumer GPUs - -2. **Training Features**: - - Native PEFT/LoRA integration with minimal setup - - Support for QLoRA (Quantized LoRA) for even better memory efficiency - -3. **Adapter Management**: - - Adapter weight saving during checkpoints - - Features to merge adapters back into base model - -## Loading LoRA Adapters with PEFT - -PEFT is a library that provides a unified interface for loading and managing PEFT methods, including LoRA. It allows you to easily load and switch between different PEFT methods, making it easier to experiment with different fine-tuning techniques. - -Adapters can be loaded onto a pretrained model with load_adapter(), which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the set_adapter() function. To return the base model, you could use unload() to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights. - -```python -from transformers import AutoModelForCausalLM -from peft import PeftModel - -base_model = AutoModelForCausalLM.from_pretrained("") -peft_model_id = "" -model = PeftModel.from_pretrained(base_model, peft_model_id) -``` - - -![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png) - -## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA - -The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train. - -We'll use LoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps: - -1. Define the LoRA configuration (rank, alpha, dropout) -2. Create the SFTTrainer with PEFT config -3. Train and save the adapter weights - -## LoRA Configuration - -Let's walk through the LoRA configuration and key parameters. - -| Parameter | Description | -|-----------|-------------| -| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. | -| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. 
Higher values result in stronger adaptation effects. | -| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. | -| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | -| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | - -When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. - -## Using TRL with PEFT - -PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for fine-tuning to reduce memory requirements. We can pass the `LoraConfig` to the model when loading it. - -```python -from peft import LoraConfig - -# TODO: Configure LoRA parameters -# r: rank dimension for LoRA update matrices (smaller = more compression) -rank_dimension = 6 -# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation) -lora_alpha = 8 -# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting) -lora_dropout = 0.05 - -peft_config = LoraConfig( - r=rank_dimension, # Rank dimension - typically between 4-32 - lora_alpha=lora_alpha, # LoRA scaling factor - typically 2x rank - lora_dropout=lora_dropout, # Dropout probability for LoRA layers - bias="none", # Bias type for LoRA. the corresponding biases will be updated during training. - target_modules="all-linear", # Which modules to apply LoRA to - task_type="CAUSAL_LM", # Task type for model architecture -) -``` - -Above, we used `device_map="auto"` to automatically assign the model to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`. - -We will also need to define the `SFTTrainer` with the LoRA configuration. - -```python -# Create SFTTrainer with LoRA configuration -trainer = SFTTrainer( - model=model, - args=args, - train_dataset=dataset["train"], - peft_config=lora_config, # LoRA configuration - max_seq_length=max_seq_length, # Maximum sequence length - tokenizer=tokenizer, -) -``` - - - -✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. - - - -# Resources - -- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685) -- [PEFT Documentation](https://huggingface.co/docs/peft) -- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft) \ No newline at end of file +These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. diff --git a/chapters/en/chapter11/8.mdx b/chapters/en/chapter11/8.mdx index 323a3b646..690547c5f 100644 --- a/chapters/en/chapter11/8.mdx +++ b/chapters/en/chapter11/8.mdx @@ -1,45 +1,16 @@ -## Merging LoRA Adapters +# Exam Time! 
-After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. +It's time to put your knowledge to the test! We've prepared a short quiz for you to test your understanding of the concepts covered in this chapter. -The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will find the correct device for the model based on your hardware. +To take the quiz, you will need to follow these steps: -Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training and saving the merged model in the same format for deployment. +1. Sign in to your Hugging Face account. +2. Answer the questions in the quiz. +3. Submit your answers. -## Merging Implementation - -After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it: - -```python -import torch -from transformers import AutoModelForCausalLM -from peft import PeftModel - -# 1. Load the base model -base_model = AutoModelForCausalLM.from_pretrained( - "base_model_name", torch_dtype=torch.float16, device_map="auto" -) - -# 2. Load the PEFT model with adapter -peft_model = PeftModel.from_pretrained( - base_model, "path/to/adapter", torch_dtype=torch.float16 -) - -# 3. Merge adapter weights with base model -merged_model = peft_model.merge_and_unload() -``` - -If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer: - -```python -# Save both model and tokenizer -tokenizer = AutoTokenizer.from_pretrained("base_model_name") -merged_model.save_pretrained("path/to/save/merged_model") -tokenizer.save_pretrained("path/to/save/merged_model") -``` - - - -✏️ **Try it out!** Merge the adapter weights back into the base model. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. - - + diff --git a/chapters/en/chapter11/9.mdx b/chapters/en/chapter11/9.mdx deleted file mode 100644 index ab88e9d71..000000000 --- a/chapters/en/chapter11/9.mdx +++ /dev/null @@ -1,188 +0,0 @@ -# Evaluation - -With a finetuned model through either SFT or LoRA SFT, we should evaluate it on standard benchmarks. - -## Automatic Benchmarks - -Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy. - -## Understanding Automatic Benchmarks - -Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization - they allow for consistent comparison across different models and provide reproducible results. - -However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. 
A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases. - -## General Knowledge Benchmarks - -MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. - -## Reasoning Benchmarks -BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. - -## Language Understanding - -HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology. - -## Alternative Evaluation Approaches - -Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks: - -### LLM-as-Judge - -Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations. - -### Evaluation Arenas - -Platforms like Anthropic's Constitutional AI Arena allow models to interact and evaluate each other in controlled environments. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks. - -### Custom Benchmark Suites - -Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. - -## Creating Your Own Evaluation Strategy - -Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. - -While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: - -1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. - -2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? - -3. Develop custom evaluation datasets that reflect your actual use case. This might include: - - Real user queries from your domain - - Common edge cases you've encountered - - Examples of particularly challenging scenarios - -4. Consider implementing a multi-layered evaluation strategy: - - Automated metrics for quick feedback - - Human evaluation for nuanced understanding - - Domain expert review for specialized applications - - A/B testing in controlled environments - -# End-of-chapter quiz[[end-of-chapter-quiz]] - - - -### 1. What are the main advantages of using automatic benchmarks for model evaluation? - - - -### 2. Which benchmark specifically tests knowledge across 57 different subjects? - - - -### 3. What is LLM-as-Judge? - - - -### 4. What should be included in a comprehensive evaluation strategy? - - - -### 5. 
What is a limitation of automatic benchmarks? - - - -### 6. What is the purpose of creating custom evaluation datasets? - - From a9847d08f681f324f83b9fdeb3635af43d85cd93 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 21:44:16 +0100 Subject: [PATCH 09/30] update toc and format snippets --- chapters/en/_toctree.yml | 18 +++++------------ chapters/en/chapter11/2.mdx | 39 +++++++++++++------------------------ 2 files changed, 18 insertions(+), 39 deletions(-) diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml index 8cd568bc1..a0b6a61fe 100644 --- a/chapters/en/_toctree.yml +++ b/chapters/en/_toctree.yml @@ -215,24 +215,16 @@ - local: chapter11/1 title: Introduction - local: chapter11/2 - title: Chat templates + title: Chat Templates - local: chapter11/4 - title: Introduction to Supervised Fine-Tuning + title: Fine-Tuning with SFTTrainer - local: chapter11/5 - title: Introduction to SFTTrainer in TRL + title: LoRA (Low-Rank Adaptation) - local: chapter11/6 - title: Fine-Tuning a Model with SFTTrainer + title: Evaluation - local: chapter11/7 - title: LoRA (Low-Rank Adaptation) - - local: chapter11/8 - title: Merging LoRA Adapters - - local: chapter11/9 - title: Evaluating Fine-Tuned Models - - local: chapter11/10 - title: Implementing Evaluation - - local: chapter11/11 title: Conclusion - - local: chapter11/12 + - local: chapter11/8 title: Exam Time! quiz: 11 diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 5b59f4b11..8798e1d05 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -27,7 +27,7 @@ messages = [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "Hi! How can I help you today?"}, - {"role": "user", "content": "What's the weather?"} + {"role": "user", "content": "What's the weather?"}, ] ``` @@ -85,7 +85,7 @@ qwen_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat") messages = [ {"role": "system", "content": "You are a helpful assistant."}, - {"role": "user", "content": "Hello!"} + {"role": "user", "content": "Hello!"}, ] # Each will format according to its model's template @@ -165,24 +165,17 @@ messages = [ "tool_calls": [ { "tool": "calculator", - "parameters": {"operation": "multiply", "x": 123, "y": 456} + "parameters": {"operation": "multiply", "x": 123, "y": 456}, }, - { - "tool": "weather_api", - "parameters": {"city": "Paris", "country": "France"} - } - ] - }, - { - "role": "tool", - "tool_name": "calculator", - "content": "56088" + {"tool": "weather_api", "parameters": {"city": "Paris", "country": "France"}}, + ], }, + {"role": "tool", "tool_name": "calculator", "content": "56088"}, { "role": "tool", "tool_name": "weather_api", - "content": "{'condition': 'rain', 'temperature': 15}" - } + "content": "{'condition': 'rain', 'temperature': 15}", + }, ] ``` @@ -192,21 +185,15 @@ For multimodal conversations, chat templates can include image references or bas messages = [ { "role": "system", - "content": "You are a helpful vision assistant that can analyze images." + "content": "You are a helpful vision assistant that can analyze images.", }, { "role": "user", "content": [ - { - "type": "text", - "text": "What's in this image?" 
- }, - { - "type": "image", - "image_url": "https://example.com/image.jpg" - } - ] - } + {"type": "text", "text": "What's in this image?"}, + {"type": "image", "image_url": "https://example.com/image.jpg"}, + ], + }, ] ``` From 82b1d4ae67fe1618a58f817d12e732ffdb3a9d16 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 21:50:12 +0100 Subject: [PATCH 10/30] update structure --- chapters/en/_toctree.yml | 10 +- chapters/en/chapter11/3.mdx | 257 +++++++++++++++++++++++++++- chapters/en/chapter11/4.mdx | 296 ++++++++++++-------------------- chapters/en/chapter11/5.mdx | 329 ++++++++++++++++++++++-------------- chapters/en/chapter11/6.mdx | 246 +-------------------------- chapters/en/chapter11/7.mdx | 21 ++- chapters/en/chapter11/8.mdx | 16 -- 7 files changed, 588 insertions(+), 587 deletions(-) delete mode 100644 chapters/en/chapter11/8.mdx diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml index a0b6a61fe..73e2a069b 100644 --- a/chapters/en/_toctree.yml +++ b/chapters/en/_toctree.yml @@ -216,15 +216,15 @@ title: Introduction - local: chapter11/2 title: Chat Templates - - local: chapter11/4 + - local: chapter11/3 title: Fine-Tuning with SFTTrainer - - local: chapter11/5 + - local: chapter11/4 title: LoRA (Low-Rank Adaptation) - - local: chapter11/6 + - local: chapter11/5 title: Evaluation - - local: chapter11/7 + - local: chapter11/6 title: Conclusion - - local: chapter11/8 + - local: chapter11/7 title: Exam Time! quiz: 11 diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 5ae7b7605..f84f77557 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -1,3 +1,9 @@ + + # Supervised Fine-Tuning Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. @@ -122,4 +128,253 @@ Two core reasons to use SFT are: explain: "Actually, SFT typically uses less data than pre-training, focusing on task-specific examples." } ]} -/> \ No newline at end of file +/> + +# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning + +In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. + +## Dataset Preparation + +The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. + + + +## Understanding Training Dynamics + +When fine-tuning language models, understanding the training dynamics is crucial for monitoring progress and ensuring successful adaptation. Let's look at how to interpret the training process through loss curves. + +### Loss Patterns + +The training loss curve typically follows a characteristic pattern. Initially, you'll observe a sharp drop in loss as the model begins adapting to the new data distribution, task objectives, and chat template. 
This early phase is crucial as it indicates whether the model is successfully learning from the training data. + +### The Path to Convergence + +As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting the model is learning generalizable patterns rather than memorizing specific examples. The absolute loss values will vary depending on your task and dataset. + +### Monitoring Training Progress + +
+ Training and validation loss curves showing healthy convergence +
+

The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability.

### Warning Signs to Watch For

Several patterns in the loss curves can indicate potential issues:

1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider:
   - Reducing the model size or training time
   - Adding regularization
   - Increasing the dataset size
   - Using techniques like early stopping

2. If the loss doesn't show significant improvement, the model might be:
   - Learning too slowly (try increasing the learning rate)
   - Struggling with the task (check data quality and task complexity)
   - Hitting architecture limitations (consider a different model)

3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if:
   - The model performs poorly on new, similar examples
   - The outputs lack diversity
   - The responses are too similar to training examples



Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss.



## Training Configuration

We will configure the SFT trainer with the following parameters:

| Parameter | Description |
|-----------|-------------|
| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) |
| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) |
| gradient_accumulation_steps | Number of batches to accumulate gradients over before performing an optimizer update, effectively increasing the batch size |
| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) |
| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations |
| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) |
| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) |
| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) |

In general, start with a small number of epochs and a small amount of data, using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance.

## Training and Evaluation

Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of understanding, let's break down what happens behind the scenes:

- Iterating over the dataset
- Computing the loss
- Updating the model's parameters
- Regularly evaluating on a validation set

Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities.
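One way to act on the early-stopping advice from the warning signs above is to evaluate periodically and stop when validation loss stops improving. Below is a sketch using the `EarlyStoppingCallback` from `transformers`; the `model`, `tokenizer`, and split `dataset` are assumed to be defined as in the examples that follow on this page, and the output directory is a placeholder:

```python
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(
    output_dir="./sft_output",  # placeholder output directory
    evaluation_strategy="steps",  # evaluate at regular step intervals
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    load_best_model_at_end=True,  # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,  # lower validation loss is better
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    # stop if validation loss fails to improve for 3 consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```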
+

## `SFTTrainer` from Transformer Reinforcement Learning

Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). We'll focus on SFT in this chapter.

Here's a simplified example of how to use `SFTTrainer` to fine-tune a model. We'll expand on this example in the next few sections, but for now let's just focus on the basics.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")

training_args = SFTConfig(
    max_seq_length=512,
    output_dir="/tmp",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",  # the model to fine-tune, loaded from the Hub
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```

Just like in `transformers`, we work through the following steps:

1. Load the dataset
2. Configure the SFTTrainer with appropriate parameters
3. Train the model and monitor its progress
4. Save and evaluate the fine-tuned model



✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.

For this exercise, you'll need to:
1. Load and prepare your chosen dataset
2. Configure the SFTTrainer with appropriate parameters
3. Train the model and monitor its progress
4. Save and evaluate the fine-tuned model



# Supervised Fine-Tuning with SFTTrainer

Let's dive into the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.

## Load the base model

Here we'll load the base model and tokenizer. We'll also set up the chat format for the model.

```python
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, setup_chat_format
import torch

# Set the device to use for training
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Load the model and tokenizer
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)

# Set up the chat format
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)
```

## Generate with the base model

First, we will try out the base model, which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model.
+
+```python
+# Let's test the base model before training
+prompt = "Write a haiku about programming"
+
+# Format with template
+messages = [{"role": "user", "content": prompt}]
+formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)
+
+# Generate response
+inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
+outputs = model.generate(**inputs, max_new_tokens=100)
+
+# Decode and inspect the generated text
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+## Dataset Preparation
+
+We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.
+
+**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`.
+
+```python
+dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
+```
+
+## Configuring the SFTTrainer
+
+The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.
+
+```python
+# Choose a name for your fine-tuned model (placeholder -- pick your own)
+finetune_name = "DeepSeek-R1-Distill-Qwen-1.5B-FT-MyDataset"
+
+# Configure the SFTTrainer
+sft_config = SFTConfig(
+    output_dir="./sft_output",
+    max_steps=1000,  # Adjust based on dataset size and desired training duration
+    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
+    learning_rate=5e-5,  # Common starting point for fine-tuning
+    logging_steps=10,  # Frequency of logging training metrics
+    save_steps=100,  # Frequency of saving model checkpoints
+    evaluation_strategy="steps",  # Evaluate the model at regular intervals
+    eval_steps=50,  # Frequency of evaluation
+    use_mps_device=(
+        True if device == "mps" else False
+    ),  # Use the MPS device on Apple Silicon, if available
+    hub_model_id=finetune_name,  # Set a unique name for your model
+)
+
+# Initialize the SFTTrainer
+trainer = SFTTrainer(
+    model=model,
+    args=sft_config,
+    train_dataset=dataset["train"],
+    tokenizer=tokenizer,
+    eval_dataset=dataset["test"],
+)
+```
+
+## Training the model
+
+With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.
+
+```python
+trainer.train()
+```
+
+
+
+SFTTrainer Training
+
+## 💐 Nice work!
+
+This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:
+
+- Try this notebook on a harder difficulty
+- Review a colleague's PR
+- Improve the course material via an Issue or PR.
+
+
+## Resources
+
+- [TRL Documentation](https://huggingface.co/docs/trl)
+- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft)
\ No newline at end of file
diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx
index 183a9fa06..28583bb58 100644
--- a/chapters/en/chapter11/4.mdx
+++ b/chapters/en/chapter11/4.mdx
@@ -1,254 +1,166 @@
- 
+# LoRA (Low-Rank Adaptation)
 
-# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning
+Fine-tuning large language models is a resource-intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of parameters. 
It works by adding and optimizing smaller matrices to the attention weights, typically reducing trainable parameters by about 90%. -In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. +## Understanding LoRA -## Dataset Preparation +LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685). -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. +LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable. - +## Key advantages of LoRA -## Understanding Training Dynamics +1. **Memory Efficiency**: + - Only adapter parameters are stored in GPU memory + - Base model weights remain frozen and can be loaded in lower precision + - Enables fine-tuning of large models on consumer GPUs -When fine-tuning language models, understanding the training dynamics is crucial for monitoring progress and ensuring successful adaptation. Let's look at how to interpret the training process through loss curves. +2. **Training Features**: + - Native PEFT/LoRA integration with minimal setup + - Support for QLoRA (Quantized LoRA) for even better memory efficiency -### Loss Patterns +3. **Adapter Management**: + - Adapter weight saving during checkpoints + - Features to merge adapters back into base model -The training loss curve typically follows a characteristic pattern. Initially, you'll observe a sharp drop in loss as the model begins adapting to the new data distribution, task objectives, and chat template. This early phase is crucial as it indicates whether the model is successfully learning from the training data. +## Loading LoRA Adapters with PEFT -### The Path to Convergence +PEFT is a library that provides a unified interface for loading and managing PEFT methods, including LoRA. It allows you to easily load and switch between different PEFT methods, making it easier to experiment with different fine-tuning techniques. -As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting the model is learning generalizable patterns rather than memorizing specific examples. 
The absolute loss values will vary depending on your task and dataset.
 
+Adapters can be loaded onto a pretrained model with `load_adapter()`, which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the `set_adapter()` function. To return the base model, you can use `unload()` to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights.
 
-### Monitoring Training Progress
-
-
- Training and validation loss curves showing healthy convergence -
-
-The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability.
-
-### Warning Signs to Watch For
-
-Several patterns in the loss curves can indicate potential issues:
+```python
+from transformers import AutoModelForCausalLM
+from peft import PeftModel
 
-1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider:
-   - Reducing the model size or training time
-   - Adding regularization
-   - Increasing the dataset size
-   - Using techniques like early stopping
+base_model = AutoModelForCausalLM.from_pretrained("base-model-name")  # placeholder model id
+peft_model_id = "peft-adapter-id"  # placeholder adapter id
+model = PeftModel.from_pretrained(base_model, peft_model_id)
+```
 
-2. If the loss doesn't show significant improvement, the model might be:
-   - Learning too slowly (try increasing the learning rate)
-   - Struggling with the task (check data quality and task complexity)
-   - Hitting architecture limitations (consider a different model)
+ 
+![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png)
 
-3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if:
-   - The model performs poorly on new, similar examples
-   - The outputs lack diversity
-   - The responses are too similar to training examples
+## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA
 
- 
+The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train.
 
-Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss.
+We'll use QLoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps:
 
- 
+1. Define the LoRA configuration (rank, alpha, dropout)
+2. Create the SFTTrainer with PEFT config
+3. Train and save the adapter weights
 
-## Training Configuration
+## LoRA Configuration
 
-We will configure SFT trainer with the following parameters:
+Let's walk through the LoRA configuration and key parameters. 
| Parameter | Description | |-----------|-------------| -| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) | -| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) | -| gradient_accumulation_steps | Number of updates to accumulate before performing a backward pass, effectively increasing batch size | -| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) | -| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations | -| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) | -| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) | -| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) | +| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. | +| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. Higher values result in stronger adaptation effects. | +| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. | +| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | +| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | -In general, start with a small number of epochs and data using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance. +When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. -## Training and Evaluation +## Using TRL with PEFT -Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes. - -- Iterating over the dataset -- Computing the loss -- Updating the model's parameters -- Regular evaluation on a validation set - -Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. - -## `SFTTrainer` from Transformer Reinforcement Learning - -Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. 
The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). But we'll focus on SFT in this chapter.
 
-Here's a basic simplified example of how to use `SFTTrainer` to fine-tune a model. We'll expand on this example in the next few sections, but for now let's just focus on the basics.
+PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for fine-tuning to reduce memory requirements. We define a `LoraConfig` and pass it to the `SFTTrainer` below.
 
 ```python
-from datasets import load_dataset
-from trl import SFTConfig, SFTTrainer
+from peft import LoraConfig
+
+# Configure LoRA parameters
+# r: rank dimension for LoRA update matrices (smaller = more compression)
+rank_dimension = 6
+# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
+lora_alpha = 8
+# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
+lora_dropout = 0.05
+
+peft_config = LoraConfig(
+    r=rank_dimension,  # Rank dimension - typically between 4-32
+    lora_alpha=lora_alpha,  # LoRA scaling factor - typically 2x rank
+    lora_dropout=lora_dropout,  # Dropout probability for LoRA layers
+    bias="none",  # Bias training mode; "none" leaves bias terms frozen
+    target_modules="all-linear",  # Which modules to apply LoRA to
+    task_type="CAUSAL_LM",  # Task type for model architecture
+)
+```
 
-dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
+When loading the model, you can use `device_map="auto"` to automatically assign it to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`.
 
-training_args = SFTConfig(
-    max_seq_length=512,
-    output_dir="/tmp",
-)
+We will also need to define the `SFTTrainer` with the LoRA configuration.
 
+```python
+# Create SFTTrainer with LoRA configuration
 trainer = SFTTrainer(
-    model_name="HuggingFaceTB/SmolLM2-135M",
-    train_dataset=dataset,
-    args=training_args,
+    model=model,
+    args=args,
+    train_dataset=dataset["train"],
+    peft_config=peft_config,  # LoRA configuration
+    max_seq_length=512,  # Maximum sequence length
+    tokenizer=tokenizer,
 )
-trainer.train()
 ```
 
-Just like in `transformers`, we work through the following steps:
-
-1. Load the dataset
-2. Configure the SFTTrainer with appropriate parameters
-3. Train the model and monitor its progress
-4. Save and evaluate the fine-tuned model
-
 
 
-✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.
-
-For this exercise, you'll need to:
-1. Load and prepare your chosen dataset
-2. Configure the SFTTrainer with appropriate parameters
-3. Train the model and monitor its progress
-4. Save and evaluate the fine-tuned model
+✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above.
 
 
 
-# Supervised Fine-Tuning with SFTTrainer
-
-In this section we will unpack the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.
-
-## Load the base model
-
-Here we'll load the base model and tokenizer. 
We'll also set up the chat format for the model. - -```python -# Import necessary libraries -from transformers import AutoModelForCausalLM, AutoTokenizer -from datasets import load_dataset -from trl import SFTConfig, SFTTrainer, setup_chat_format -import torch - -# Set the device to use for training -device = ( - "cuda" - if torch.cuda.is_available() - else "mps" if torch.backends.mps.is_available() else "cpu" -) +## Merging LoRA Adapters -# Load the model and tokenizer -model = AutoModelForCausalLM.from_pretrained( - pretrained_model_name_or_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" -).to(device) -tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) +After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. -# Set up the chat format -model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) -``` +The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will find the correct device for the model based on your hardware. -## Generate with the base model +Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training and saving the merged model in the same format for deployment. -First we will try out the base model which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model. +## Merging Implementation -```python -# Let's test the base model before training -prompt = "Write a haiku about programming" - -# Format with template -messages = [{"role": "user", "content": prompt}] -formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False) - -# Generate response -inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device) -outputs = model.generate(**inputs, max_new_tokens=100) -``` - -## Dataset Preparation - -We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. - -**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,. +After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it: ```python -dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") -``` - -## Configuring the SFTTrainer - -The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources. 
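+# Note: merging loads the base model and the adapter weights at the same time,
+# so ensure sufficient GPU/CPU memory is available and keep the dtype
+# consistent with the precision used during training (see the guidance above).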
+import torch
+from transformers import AutoModelForCausalLM
+from peft import PeftModel
 
-```python
-# Configure the SFTTrainer
-sft_config = SFTConfig(
-    output_dir="./sft_output",
-    max_steps=1000,  # Adjust based on dataset size and desired training duration
-    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
-    learning_rate=5e-5,  # Common starting point for fine-tuning
-    logging_steps=10,  # Frequency of logging training metrics
-    save_steps=100,  # Frequency of saving model checkpoints
-    evaluation_strategy="steps",  # Evaluate the model at regular intervals
-    eval_steps=50,  # Frequency of evaluation
-    use_mps_device=(
-        True if device == "mps" else False
-    ),  # Use MPS for mixed precision training
-    hub_model_id=finetune_name,  # Set a unique name for your model
+# 1. Load the base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    "base_model_name", torch_dtype=torch.float16, device_map="auto"
 )
 
-# Initialize the SFTTrainer
-trainer = SFTTrainer(
-    model=model,
-    args=sft_config,
-    train_dataset=ds["train"],
-    tokenizer=tokenizer,
-    eval_dataset=ds["test"],
+# 2. Load the PEFT model with adapter
+peft_model = PeftModel.from_pretrained(
+    base_model, "path/to/adapter", torch_dtype=torch.float16
+)
 
-## Training the model
+# 3. Merge adapter weights with base model
+merged_model = peft_model.merge_and_unload()
+```
 
-With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.
+If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer:
 
-```python
-trainer.train()
+```python
+from transformers import AutoTokenizer
+
+# Save both model and tokenizer
+tokenizer = AutoTokenizer.from_pretrained("base_model_name")
+merged_model.save_pretrained("path/to/save/merged_model")
+tokenizer.save_pretrained("path/to/save/merged_model")
 ```
+
+✏️ **Try it out!** Take the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model you fine-tuned with LoRA on the `HuggingFaceTB/smoltalk` dataset in the previous exercise, and merge its adapter weights back into the base model.
 
 
 
-SFTTrainer Training
-
-## 💐 Nice work!
-
-This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:
-
-- Try this notebook on a harder difficulty
-- Review a colleagues PR
-- Improve the course material via an Issue or PR.
+
 
-## Resources
+# Resources
 
-- [TRL Documentation](https://huggingface.co/docs/trl)
-- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft)
\ No newline at end of file
+- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685)
+- [PEFT Documentation](https://huggingface.co/docs/peft)
+- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft)
\ No newline at end of file
diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx
index 28583bb58..d02183d40 100644
--- a/chapters/en/chapter11/5.mdx
+++ b/chapters/en/chapter11/5.mdx
@@ -1,166 +1,245 @@
-# LoRA (Low-Rank Adaptation)
+# Evaluation
 
-Fine-tuning large language models is a resource intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of parameters. 
It works by adding and optimizing smaller matrices to the attention weights, typically reducing trainable parameters by about 90%.
 
-## Understanding LoRA
+Once we have a fine-tuned model, whether trained with full SFT or with LoRA, we should evaluate it on standard benchmarks.
 
-LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685).
+## Automatic Benchmarks
 
-LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable.
+Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy.
 
-## Key advantages of LoRA
+## Understanding Automatic Benchmarks
 
-1. **Memory Efficiency**:
-   - Only adapter parameters are stored in GPU memory
-   - Base model weights remain frozen and can be loaded in lower precision
-   - Enables fine-tuning of large models on consumer GPUs
+Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization: they allow for consistent comparison across different models and provide reproducible results.
 
-2. **Training Features**:
-   - Native PEFT/LoRA integration with minimal setup
-   - Support for QLoRA (Quantized LoRA) for even better memory efficiency
+However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases.
 
-3. **Adapter Management**:
-   - Adapter weight saving during checkpoints
-   - Features to merge adapters back into base model
+## General Knowledge Benchmarks
 
-## Loading LoRA Adapters with PEFT
+MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation.
 
-PEFT is a library that provides a unified interface for loading and managing PEFT methods, including LoRA. It allows you to easily load and switch between different PEFT methods, making it easier to experiment with different fine-tuning techniques.
+## Reasoning Benchmarks
+
+BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. 
These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios.
 
-Adapters can be loaded onto a pretrained model with load_adapter(), which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the set_adapter() function. To return the base model, you could use unload() to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights.
+## Language Understanding
 
-```python
-from transformers import AutoModelForCausalLM
-from peft import PeftModel
+HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology.
 
-base_model = AutoModelForCausalLM.from_pretrained("")
-peft_model_id = ""
-model = PeftModel.from_pretrained(base_model, peft_model_id)
-```
+## Alternative Evaluation Approaches
 
- 
-![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png)
+Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks:
 
-## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA
+### LLM-as-Judge
 
-The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train.
+Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations.
 
-We'll use LoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps:
+### Evaluation Arenas
 
-1. Define the LoRA configuration (rank, alpha, dropout)
-2. Create the SFTTrainer with PEFT config
-3. Train and save the adapter weights
+Platforms like the LMSYS Chatbot Arena pit models against each other in head-to-head comparisons judged by human voters. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks.
 
-## LoRA Configuration
+### Custom Benchmark Suites
 
-Let's walk through the LoRA configuration and key parameters.
+Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions.
 
-| Parameter | Description |
-|-----------|-------------|
+## Creating Your Own Evaluation Strategy
 
-| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. |
-| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. Higher values result in stronger adaptation effects. 
| -| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. | -| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | -| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | - -When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. - -## Using TRL with PEFT - -PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for fine-tuning to reduce memory requirements. We can pass the `LoraConfig` to the model when loading it. - -```python -from peft import LoraConfig - -# TODO: Configure LoRA parameters -# r: rank dimension for LoRA update matrices (smaller = more compression) -rank_dimension = 6 -# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation) -lora_alpha = 8 -# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting) -lora_dropout = 0.05 - -peft_config = LoraConfig( - r=rank_dimension, # Rank dimension - typically between 4-32 - lora_alpha=lora_alpha, # LoRA scaling factor - typically 2x rank - lora_dropout=lora_dropout, # Dropout probability for LoRA layers - bias="none", # Bias type for LoRA. the corresponding biases will be updated during training. - target_modules="all-linear", # Which modules to apply LoRA to - task_type="CAUSAL_LM", # Task type for model architecture -) -``` +## Creating Your Own Evaluation Strategy -Above, we used `device_map="auto"` to automatically assign the model to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`. +Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. -We will also need to define the `SFTTrainer` with the LoRA configuration. +While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: -```python -# Create SFTTrainer with LoRA configuration -trainer = SFTTrainer( - model=model, - args=args, - train_dataset=dataset["train"], - peft_config=lora_config, # LoRA configuration - max_seq_length=max_seq_length, # Maximum sequence length - tokenizer=tokenizer, -) -``` - - - -✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. - - +1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. -## Merging LoRA Adapters +2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? -After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. +3. 
Develop custom evaluation datasets that reflect your actual use case. This might include:
+   - Real user queries from your domain
+   - Common edge cases you've encountered
+   - Examples of particularly challenging scenarios
 
+4. Consider implementing a multi-layered evaluation strategy:
+   - Automated metrics for quick feedback
+   - Human evaluation for nuanced understanding
+   - Domain expert review for specialized applications
+   - A/B testing in controlled environments
 
+# Implementing Evaluation
 
+In this section, we will implement evaluation for our fine-tuned model. We can use `lighteval`, which has a wide range of tasks built into the library, to evaluate our fine-tuned model on standard benchmarks. We just need to define the tasks we want to evaluate and the parameters for the evaluation.
 
+LightEval tasks are defined using a specific format:
 
+```
+{suite}|{task}|{num_few_shot}|{auto_reduce}
+```
 
+| Parameter | Description |
+|-----------|-------------|
+| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') |
+| `task` | Specific task within the suite (e.g., 'abstract_algebra') |
+| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) |
+| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) |
 
+Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference.
 
+## Example Evaluation Pipeline
 
+Let's set up an evaluation pipeline for our fine-tuned model. We will evaluate the model on a set of subtasks that relate to the domain of medicine. 
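+
+Using the task format described above, we can first collect the medicine-related subtasks in one place; `domain_tasks` is just an illustrative name for this example:
+
+```python
+# The four medicine-related MMLU subtasks used in the command below, written
+# in LightEval's "{suite}|{task}|{num_few_shot}|{auto_reduce}" task format
+domain_tasks = [
+    "mmlu|anatomy|0|0",
+    "mmlu|high_school_biology|0|0",
+    "mmlu|high_school_chemistry|0|0",
+    "mmlu|professional_medicine|0|0",
+]
+```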
+
+Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend:
+
+```bash
+lighteval vllm \
+    "pretrained=your-model-name" \
+    "mmlu|anatomy|0|0" \
+    "mmlu|high_school_biology|0|0" \
+    "mmlu|high_school_chemistry|0|0" \
+    "mmlu|professional_medicine|0|0" \
+    --max_samples 40 \
+    --batch_size 1 \
+    --output_path "./results" \
+    --save_generations true
+```
+
+Results are displayed in a tabular format showing:
+
+```
+| Task |Version|Metric|Value | |Stderr|
+|----------------------------------------|------:|------|-----:|---|-----:|
+|all | |acc |0.3333|± |0.1169|
+|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121|
+|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141|
+|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819|
+```
+
+Lighteval also includes a Python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information.
 
+✏️ **Try it out!** Evaluate your fine-tuned model on a specific task with lighteval.
 
+# End-of-chapter quiz[[end-of-chapter-quiz]]
+
+
+
+### 1. What are the main advantages of using automatic benchmarks for model evaluation?
+
+
+
+### 2. Which benchmark specifically tests knowledge across 57 different subjects?
+
+
+
+### 3. What is LLM-as-Judge?
+
+
+
+### 4. What should be included in a comprehensive evaluation strategy?
+
+
+
+### 5. What is a limitation of automatic benchmarks?
+
+
+
+### 6. What is the purpose of creating custom evaluation datasets?
+
+
diff --git a/chapters/en/chapter11/6.mdx b/chapters/en/chapter11/6.mdx
index d02183d40..093de47d6 100644
--- a/chapters/en/chapter11/6.mdx
+++ b/chapters/en/chapter11/6.mdx
@@ -1,245 +1,13 @@
-# Evaluation
+# Conclusion
 
-With a finetuned model through either SFT or LoRA SFT, we should evaluate it on standard benchmarks.
+In this chapter, we explored the essential components of fine-tuning language models:
 
-## Automatic Benchmarks
+1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting.
 
-Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy.
+2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge.
 
-## Understanding Automatic Benchmarks
+3. 
**LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. -Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization - they allow for consistent comparison across different models and provide reproducible results. +4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. -However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases. - -## General Knowledge Benchmarks - -MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. - -## Reasoning Benchmarks -BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. - -## Language Understanding - -HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology. - -## Alternative Evaluation Approaches - -Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks: - -### LLM-as-Judge - -Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations. - -### Evaluation Arenas - -Platforms like Anthropic's Constitutional AI Arena allow models to interact and evaluate each other in controlled environments. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks. - -### Custom Benchmark Suites - -Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. - -## Creating Your Own Evaluation Strategy - -Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. - -While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: - -1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. - -2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? - -3. Develop custom evaluation datasets that reflect your actual use case. 
This might include: - - Real user queries from your domain - - Common edge cases you've encountered - - Examples of particularly challenging scenarios - -4. Consider implementing a multi-layered evaluation strategy: - - Automated metrics for quick feedback - - Human evaluation for nuanced understanding - - Domain expert review for specialized applications - - A/B testing in controlled environments - -# Implementing Evaluation - -In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. - -LightEval tasks are defined using a specific format: - -``` -{suite}|{task}|{num_few_shot}|{auto_reduce} -``` - -| Parameter | Description | -|-----------|-------------| -| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') | -| `task` | Specific task within the suite (e.g., 'abstract_algebra') | -| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) | -| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) | - -Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. - -## Example Evaluation Pipeline - -Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on set of sub tasks that relate to the domain of medicine. - -Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend: - -```bash -lighteval vllm \ - "pretrained=your-model-name" \ - "mmlu|anatomy|0|0" \ - "mmlu|high_school_biology|0|0" \ - "mmlu|high_school_chemistry|0|0" \ - "mmlu|professional_medicine|0|0" \ - --max_samples 40 \ - --batch_size 1 \ - --output_path "./results" \ - --save_generations true -``` - -Results are displayed in a tabular format showing: - -``` -| Task |Version|Metric|Value | |Stderr| -|----------------------------------------|------:|------|-----:|---|-----:| -|all | |acc |0.3333|± |0.1169| -|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121| -|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141| -|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| -``` - -Lighteval also include a python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information. - - - -✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval. - - - -# End-of-chapter quiz[[end-of-chapter-quiz]] - - - -### 1. What are the main advantages of using automatic benchmarks for model evaluation? - - - -### 2. Which benchmark specifically tests knowledge across 57 different subjects? - - - -### 3. What is LLM-as-Judge? - - - -### 4. What should be included in a comprehensive evaluation strategy? - - - -### 5. What is a limitation of automatic benchmarks? - - - -### 6. What is the purpose of creating custom evaluation datasets? - - +These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. 
diff --git a/chapters/en/chapter11/7.mdx b/chapters/en/chapter11/7.mdx index 093de47d6..690547c5f 100644 --- a/chapters/en/chapter11/7.mdx +++ b/chapters/en/chapter11/7.mdx @@ -1,13 +1,16 @@ -# Conclusion +# Exam Time! -In this chapter, we explored the essential components of fine-tuning language models: +It's time to put your knowledge to the test! We've prepared a short quiz for you to test your understanding of the concepts covered in this chapter. -1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting. +To take the quiz, you will need to follow these steps: -2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge. +1. Sign in to your Hugging Face account. +2. Answer the questions in the quiz. +3. Submit your answers. -3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. - -4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. - -These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. + diff --git a/chapters/en/chapter11/8.mdx b/chapters/en/chapter11/8.mdx deleted file mode 100644 index 690547c5f..000000000 --- a/chapters/en/chapter11/8.mdx +++ /dev/null @@ -1,16 +0,0 @@ -# Exam Time! - -It's time to put your knowledge to the test! We've prepared a short quiz for you to test your understanding of the concepts covered in this chapter. - -To take the quiz, you will need to follow these steps: - -1. Sign in to your Hugging Face account. -2. Answer the questions in the quiz. -3. Submit your answers. - - From 549612b518471e069969b579dee072b745c5d4fa Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Thu, 6 Feb 2025 13:14:00 +0100 Subject: [PATCH 11/30] followinf readthrough: simplify and add more tips --- chapters/en/chapter11/2.mdx | 164 ++++++----- chapters/en/chapter11/3.mdx | 530 +++++++++++++++++------------------- chapters/en/chapter11/4.mdx | 2 + chapters/en/chapter11/5.mdx | 6 +- 4 files changed, 333 insertions(+), 369 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 8798e1d05..560db674f 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -6,17 +6,29 @@ # Chat Templates +## Introduction Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns. -## Base Models vs Instruct Models + +Chat templates are crucial for: +- Maintaining consistent conversation structure +- Ensuring proper role identification +- Managing context across multiple turns +- Supporting advanced features like tool use + + +## Model Types and Templates +### Base Models vs Instruct Models A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, `SmolLM2-135M` is a base model, while `SmolLM2-135M-Instruct` is its instruction-tuned variant. 
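+
+One quick way to see this difference in code is to check whether the tokenizer defines a chat template; for base models the `chat_template` attribute is usually unset:
+
+```python
+from transformers import AutoTokenizer
+
+# Instruct models ship with a chat template; base models usually do not
+base_tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
+instruct_tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
+
+print(base_tokenizer.chat_template is None)  # usually True for a base model
+print(instruct_tokenizer.chat_template is None)  # False: the instruct variant defines one
+```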
To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). -It's important to note that a base model could be fine-tuned on different chat templates, so when we're using an instruct model we need to make sure we're using the correct chat template. + +When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. + -## Common Chat Template Formats +### Common Template Formats Different models use different chat template formats. To illustrate this, let's look at a few chat templates. Here's how the same conversation would be formatted for different models: @@ -94,25 +106,9 @@ mistral_chat = mistral_tokenizer.apply_chat_template(messages, tokenize=False) qwen_chat = qwen_tokenizer.apply_chat_template(messages, tokenize=False) ``` -## Understanding Chat Templates - -At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. - -### Basic Chat Template Example - -Here's a basic example of a chat template: - -```sh -<|im_start|>user -Hi there!<|im_end|> -<|im_start|>assistant -Nice to meet you!<|im_end|> -<|im_start|>user -Can I ask a question?<|im_end|> -<|im_start|>assistant -``` +## Working with Templates -### Implementation with Transformers +### Basic Implementation The transformers library provides built-in support for chat templates through the `apply_chat_template()` method: @@ -141,8 +137,7 @@ You are a helpful coding assistant.<|im_end|> Write a Python function to sort a list<|im_end|> ``` -### Advanced Chat Templates - +### Advanced Features Chat templates can handle more complex scenarios, including: 1. **Tool Use**: When models need to interact with external tools or APIs @@ -150,6 +145,32 @@ Chat templates can handle more complex scenarios, including: 3. **Function Calling**: For structured function execution 4. 
**Multi-turn Context**: For maintaining conversation history + +When implementing advanced features: +- Test thoroughly with your specific model +- Handle errors gracefully +- Monitor token usage carefully +- Document the expected format for each feature + + +For multimodal conversations, chat templates can include image references or base64-encoded images: + +```python +messages = [ + { + "role": "system", + "content": "You are a helpful vision assistant that can analyze images.", + }, + { + "role": "user", + "content": [ + {"type": "text", "text": "What's in this image?"}, + {"type": "image", "image_url": "https://example.com/image.jpg"}, + ], + }, +] +``` + Here's an example of a chat template with tool use: ```python @@ -179,85 +200,56 @@ messages = [ ] ``` -For multimodal conversations, chat templates can include image references or base64-encoded images: - -```python -messages = [ - { - "role": "system", - "content": "You are a helpful vision assistant that can analyze images.", - }, - { - "role": "user", - "content": [ - {"type": "text", "text": "What's in this image?"}, - {"type": "image", "image_url": "https://example.com/image.jpg"}, - ], - }, -] -``` - -## Working with Chat Templates +## Best Practices -When working with chat templates, you have several options for processing the conversation: +### General Guidelines +When working with chat templates, follow these key practices: -1. Apply the template without tokenization to return the raw formatted string -2. Apply the template with tokenization to return the token IDs -3. Add a generation prompt to prepare for model inference +1. **Consistent Formatting**: Always use the same template format throughout your application +2. **Clear Role Definition**: Clearly specify roles (system, user, assistant, tool) for each message +3. **Context Management**: Be mindful of token limits when maintaining conversation history +4. **Error Handling**: Include proper error handling for tool calls and multimodal inputs +5. **Validation**: Validate message structure before sending to the model -The tokenizer's `apply_chat_template()` method handles all these cases through its parameters: + +Common pitfalls to avoid: +- Mixing different template formats in the same application +- Exceeding token limits with long conversation histories +- Not properly escaping special characters in messages +- Forgetting to validate input message structure +- Ignoring model-specific template requirements + -- `tokenize`: Whether to return token IDs (True) or the formatted string (False) -- `add_generation_prompt`: Whether to add a prompt for the model to generate a response +## Hands-on Exercise -## System Messages +Let's practice implementing chat templates with a real-world example. -System messages set the foundation for how the model should behave. They act as persistent instructions that influence all subsequent interactions. For example: + +Follow these steps to convert the `HuggingFaceTB/smoltalk` dataset into chatml format: +1. Load the dataset: ```python -system_message = { - "role": "system", - "content": "You are a professional customer service agent. Always be polite, clear, and helpful.", -} +from datasets import load_dataset +dataset = load_dataset("HuggingFaceTB/smoltalk") ``` -## Conversations - -Chat templates can maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: - +2. 
Create a processing function: ```python -conversation = [ - {"role": "user", "content": "I need help with my order"}, - { - "role": "assistant", - "content": "I'd be happy to help. Could you provide your order number?", - }, - {"role": "user", "content": "It's ORDER-123"}, -] +def convert_to_chatml(example): + return { + "messages": [ + {"role": "user", "content": example["input"]}, + {"role": "assistant", "content": example["output"]} + ] + } ``` -## Best Practices - -When working with chat templates, consider these best practices: - -1. **Consistent Formatting**: Always use the same template format throughout your application -2. **Clear Role Definition**: Clearly specify roles (system, user, assistant, tool) for each message -3. **Context Management**: Be mindful of token limits when maintaining conversation history -4. **Error Handling**: Include proper error handling for tool calls and multimodal inputs -5. **Validation**: Validate message structure before sending to the model - - - -✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file. - -For this exercise, you'll need to: -1. Load the dataset using the Hugging Face datasets library -2. Create a processing function that converts the samples into the correct chat format -3. Apply the chat template using the tokenizer's methods +3. Apply the chat template using your chosen model's tokenizer +Remember to validate your output format matches your target model's requirements! -## Resources +## Additional Resources - [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) - [Transformers Documentation](https://huggingface.co/docs/transformers) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index f84f77557..fb401c0bb 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -6,137 +6,45 @@ # Supervised Fine-Tuning -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. - -Because of the supervised structure of the task, the model can learn to generate structured outputs. For example, the chat templates we created in the previous sections. - -## Understanding Supervised Fine-Tuning - -Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and use specific output structures, through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. - -SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. - -## When to Use Supervised Fine-Tuning - -The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. - -Two core reasons to use SFT are: - -1. 
**Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. For example, you need a specific chat template to generate structured outputs. - -2. **Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise. - -## Quiz - -### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)? - - - -### 2. Which of the following are valid reasons to use SFT? - - - -### 3. What is required for effective Supervised Fine-Tuning? - - +This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can +adapt the model to perform specific tasks more effectively. -### 4. How does SFT relate to chat templates? +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. - +## When to Use SFT +The supervised structure of the task enables models to learn specific output formats and behaviors. For example, SFT can teach a model to consistently use chat templates or follow domain-specific guidelines. The decision to use Supervised Fine-Tuning depends on two primary factors: -### 5. What distinguishes SFT from pre-training? +### Template Control +SFT allows precise control over the model's output structure. This is particularly valuable when you need the model to: +1. Generate responses in a specific chat template format +2. Follow strict output schemas +3. Maintain consistent styling across responses - +### Domain Adaptation +When working in specialized domains, SFT helps align the model with domain-specific requirements by: +1. Teaching domain terminology and concepts +2. Enforcing professional standards +3. Handling technical queries appropriately +4. Following industry-specific guidelines -# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning + +Before starting SFT, evaluate whether your use case requires: +- Precise output formatting +- Domain-specific knowledge +- Consistent response patterns +- Adherence to specific guidelines -In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. +This evaluation will help determine if SFT is the right approach for your needs. + ## Dataset Preparation -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. +The supervised fine-tuning process requires a task-specific dataset structured with input-output pairs. Each pair should consist of: +1. 
An input prompt +2. The expected model response +3. Any additional context or metadata + +The data format must be compatible with your model's chat template. Here's an example dataset suitable for supervised fine-tuning: -## Understanding Training Dynamics - -When fine-tuning language models, understanding the training dynamics is crucial for monitoring progress and ensuring successful adaptation. Let's look at how to interpret the training process through loss curves. - -### Loss Patterns - -The training loss curve typically follows a characteristic pattern. Initially, you'll observe a sharp drop in loss as the model begins adapting to the new data distribution, task objectives, and chat template. This early phase is crucial as it indicates whether the model is successfully learning from the training data. - -### The Path to Convergence - -As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting the model is learning generalizable patterns rather than memorizing specific examples. The absolute loss values will vary depending on your task and dataset. - -### Monitoring Training Progress - -
- Training and validation loss curves showing healthy convergence -
- -The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability. - -### Warning Signs to Watch For - -Several patterns in the loss curves can indicate potential issues: - -1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider: - - Reducing the model size or training time - - Adding regularization - - Increasing the dataset size - - Using techniques like early stopping - -2. If the loss doesn't show significant improvement, the model might be: - - Learning too slowly (try increasing the learning rate) - - Struggling with the task (check data quality and task complexity) - - Hitting architecture limitations (consider a different model) - -3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: - - The model performs poorly on new, similar examples - - The outputs lack diversity - - The responses are too similar to training examples - - - -Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss. - - - ## Training Configuration -We will configure SFT trainer with the following parameters: +### Parameters + +The SFTTrainer configuration requires consideration of several parameters that control the training process: | Parameter | Description | |-----------|-------------| @@ -206,175 +70,283 @@ We will configure SFT trainer with the following parameters: | logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) | | save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) | -In general, start with a small number of epochs and data using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance. +### Core Parameters Explained -## Training and Evaluation +1. **Training Duration Parameters**: + - `num_train_epochs`: Controls total training duration + - `max_steps`: Alternative to epochs, sets maximum number of training steps + - More epochs allow better learning but risk overfitting -Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes. +2. **Batch Size Parameters**: + - `per_device_train_batch_size`: Determines memory usage and training stability + - `gradient_accumulation_steps`: Enables larger effective batch sizes + - Larger batches provide more stable gradients but require more memory -- Iterating over the dataset -- Computing the loss -- Updating the model's parameters -- Regular evaluation on a validation set +3. **Learning Rate Parameters**: + - `learning_rate`: Controls size of weight updates + - `warmup_ratio`: Portion of training used for learning rate warmup + - Too high can cause instability, too low results in slow learning -Throughout the process, continuous evaluation is essential. 
You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities.

+4. **Monitoring Parameters**:
+   - `logging_steps`: Frequency of metric logging
+   - `eval_steps`: How often to evaluate on validation data
+   - `save_steps`: Frequency of model checkpoint saves

-## `SFTTrainer` from Transformer Reinforcement Learning

+
+Start with conservative values and adjust based on monitoring:
+- Begin with 1-3 epochs
+- Use smaller batch sizes initially
+- Monitor validation metrics closely
+- Adjust learning rate if training is unstable
+

-Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). But we'll focus on SFT in this chapter.
+## Implementation with TRL

-Here's a basic simplified example of how to use `SFTTrainer` to fine-tune a model. We'll expand on this example in the next few sections, but for now let's just focus on the basics.
+We will use the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. Here's a complete example using the TRL library:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+# Set device
+device = "cuda" if torch.cuda.is_available() else "cpu"

-dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
+# Load dataset (the "everyday-conversations" subset keeps this example small)
+dataset = load_dataset("HuggingFaceTB/smoltalk", name="everyday-conversations")
+
+# Load the model and tokenizer to fine-tune; SmolLM2-135M is a small example model
+model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M").to(device)
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")

+# Configure trainer
training_args = SFTConfig(
-    max_seq_length=512,
-    output_dir="/tmp",
+    output_dir="./sft_output",
+    max_steps=1000,
+    per_device_train_batch_size=4,
+    learning_rate=5e-5,
+    logging_steps=10,
+    save_steps=100,
+    evaluation_strategy="steps",
+    eval_steps=50
)

+# Initialize trainer
trainer = SFTTrainer(
-    model_name="HuggingFaceTB/SmolLM2-135M",
-    train_dataset=dataset,
+    model=model,
    args=training_args,
+    train_dataset=dataset["train"],
+    eval_dataset=dataset["test"],
+    tokenizer=tokenizer
)
+
+# Start training
trainer.train()
```

-Just like in `transformers`, we work through the following steps:
+## Monitoring Training Progress

-1. Load the dataset
-2. Configure the SFTTrainer with appropriate parameters
-3. Train the model and monitor its progress
-4. Save and evaluate the fine-tuned model
+
+### Understanding Loss Patterns
-
+Training loss typically follows three distinct phases:
+1. Initial Sharp Drop: Rapid adaptation to new data distribution
+2. Gradual Stabilization: Learning rate slows as model fine-tunes
+3. Convergence: Loss values stabilize, indicating training completion
+
+*[Image: SFTTrainer Training]*
+
+### Metrics to Monitor

-✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.
+Effective monitoring involves tracking quantitative metrics and qualitatively evaluating model outputs. The key metrics to track are:

-For this exercise, you'll need to:
-1. Load and prepare your chosen dataset
-2. Configure the SFTTrainer with appropriate parameters
-3. Train the model and monitor its progress
-4. 
Save and evaluate the fine-tuned model +- Training loss +- Validation loss +- Learning rate progression +- Gradient norms + +Watch for these warning signs during training: +1. Validation loss increasing while training loss decreases (overfitting) +2. No significant improvement in loss values (underfitting) +3. Extremely low loss values (potential memorization) +4. Inconsistent output formatting (template learning issues) -# Supervised Fine-Tuning with SFTTrainer +### The Path to Convergence -Let's dive into the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. +As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting +the model is learning generalizable patterns rather than memorizing specific examples. The absolute loss values will vary depending on your task and dataset. -## Load the base model +### Monitoring Training Progress -Here we'll load the base model and tokenizer. We'll also set up the chat format for the model. +
+ Training and validation loss curves 
+    showing healthy convergence +
-```python -# Import necessary libraries -from transformers import AutoModelForCausalLM, AutoTokenizer -from datasets import load_dataset -from trl import SFTConfig, SFTTrainer, setup_chat_format -import torch +The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern +indicates the model is learning effectively while maintaining generalization ability. -# Set the device to use for training -device = ( - "cuda" - if torch.cuda.is_available() - else "mps" if torch.backends.mps.is_available() else "cpu" -) +### Warning Signs to Watch For -# Load the model and tokenizer -model = AutoModelForCausalLM.from_pretrained( - pretrained_model_name_or_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" -).to(device) -tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) +Several patterns in the loss curves can indicate potential issues: -# Set up the chat format -model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) -``` +1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider: + - Reducing the model size or training time + - Adding regularization + - Increasing the dataset size + - Using techniques like early stopping -## Generate with the base model +2. If the loss doesn't show significant improvement, the model might be: + - Learning too slowly (try increasing the learning rate) + - Struggling with the task (check data quality and task complexity) + - Hitting architecture limitations (consider a different model) -First we will try out the base model which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model. +3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: + - The model performs poorly on new, similar examples + - The outputs lack diversity + - The responses are too similar to training examples -```python -# Let's test the base model before training -prompt = "Write a haiku about programming" + -# Format with template -messages = [{"role": "user", "content": prompt}] -formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False) +Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular +qualitative evaluation of the model's responses helps catch issues that metrics alone might miss. -# Generate response -inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device) -outputs = model.generate(**inputs, max_new_tokens=100) -``` + -## Dataset Preparation +## Evaluation after SFT -We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. +In section [11.4](/en/chapter11/4) we will learn how to evaluate the model using benchmark datasets. For now, we will focus on the qualitative evaluation of the model. -**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,. +After completing SFT, consider these follow-up actions: -```python -dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") -``` +1. 
Evaluate the model thoroughly on held-out test data +2. Validate template adherence across various inputs +3. Test domain-specific knowledge retention +4. Monitor real-world performance metrics -## Configuring the SFTTrainer + +Document your training process, including: +- Dataset characteristics +- Training parameters +- Performance metrics +- Known limitations +This documentation will be valuable for future model iterations. + -The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources. +## Additional Resources -```python -# Configure the SFTTrainer -sft_config = SFTConfig( - output_dir="./sft_output", - max_steps=1000, # Adjust based on dataset size and desired training duration - per_device_train_batch_size=4, # Set according to your GPU memory capacity - learning_rate=5e-5, # Common starting point for fine-tuning - logging_steps=10, # Frequency of logging training metrics - save_steps=100, # Frequency of saving model checkpoints - evaluation_strategy="steps", # Evaluate the model at regular intervals - eval_steps=50, # Frequency of evaluation - use_mps_device=( - True if device == "mps" else False - ), # Use MPS for mixed precision training - hub_model_id=finetune_name, # Set a unique name for your model -) +- [TRL Documentation](https://huggingface.co/docs/trl) +- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) +- [Fine-tuning Best Practices](https://huggingface.co/docs/transformers/training) -# Initialize the SFTTrainer -trainer = SFTTrainer( - model=model, - args=sft_config, - train_dataset=ds["train"], - tokenizer=tokenizer, - eval_dataset=ds["test"], -) -``` +## Quiz -## Training the model +### 1. What parameters control the training duration in SFT? -With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss. + -```python -trainer.train() -``` +### 2. Which pattern in the loss curves indicates potential overfitting? + + +### 3. What is gradient_accumulation_steps used for? + -SFTTrainer Training +### 4. What should you monitor during SFT training? -## 💐 Nice work! + + +### 5. What indicates healthy convergence during training? -This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out: + -- Try this notebook on a harder difficulty -- Review a colleagues PR -- Improve the course material via an Issue or PR. +## 💐 Nice work! +You've learned how to fine-tune models using SFT! To continue your learning: +1. Try the notebook with different parameters +2. Experiment with other datasets +3. 
Contribute improvements to the course material -## Resources +## Additional Resources - [TRL Documentation](https://huggingface.co/docs/trl) -- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) \ No newline at end of file +- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) +- [Fine-tuning Best Practices](https://huggingface.co/docs/transformers/training) \ No newline at end of file diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx index 28583bb58..48c3bebbf 100644 --- a/chapters/en/chapter11/4.mdx +++ b/chapters/en/chapter11/4.mdx @@ -63,7 +63,9 @@ Let's walk through the LoRA configuration and key parameters. | `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | | `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | + When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. + ## Using TRL with PEFT diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index d02183d40..471e48751 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -39,9 +39,7 @@ Platforms like Anthropic's Constitutional AI Arena allow models to interact and Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. -## Creating Your Own Evaluation Strategy - -Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. +## Custom Evaluation While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: @@ -60,7 +58,7 @@ While standard benchmarks provide a useful baseline, they shouldn't be your only - Domain expert review for specialized applications - A/B testing in controlled environments -# Implementing Evaluation +## Implementing Custom Evaluations In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. From 881865e67bf439cdce5ac5f714d822bcd06c4941 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Thu, 6 Feb 2025 16:07:02 +0100 Subject: [PATCH 12/30] format code blocks --- chapters/en/chapter11/2.mdx | 3 ++- chapters/en/chapter11/3.mdx | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 560db674f..509b33e7d 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -230,6 +230,7 @@ Follow these steps to convert the `HuggingFaceTB/smoltalk` dataset into chatml f 1. 
Load the dataset:
```python
from datasets import load_dataset
+
dataset = load_dataset("HuggingFaceTB/smoltalk")
```

@@ -239,7 +240,7 @@ def convert_to_chatml(example):
     return {
         "messages": [
             {"role": "user", "content": example["input"]},
-            {"role": "assistant", "content": example["output"]}
+            {"role": "assistant", "content": example["output"]},
         ]
     }
 ```
diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx
index fb401c0bb..351be5182 100644
--- a/chapters/en/chapter11/3.mdx
+++ b/chapters/en/chapter11/3.mdx
@@ -124,7 +124,7 @@ training_args = SFTConfig(
     logging_steps=10,
     save_steps=100,
     evaluation_strategy="steps",
-    eval_steps=50
+    eval_steps=50,
 )
 
 # Initialize trainer
@@ -133,7 +133,7 @@ trainer = SFTTrainer(
     args=training_args,
     train_dataset=dataset["train"],
     eval_dataset=dataset["test"],
-    tokenizer=tokenizer
+    tokenizer=tokenizer,
 )
 
 # Start training

From a386bbf400281e99a9b91891491354ebffe9d848 Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Tue, 11 Feb 2025 21:15:33 +0000
Subject: [PATCH 13/30] suggestions in intro page

---
 chapters/en/chapter11/1.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/chapters/en/chapter11/1.mdx b/chapters/en/chapter11/1.mdx
index b5e75efc6..99c8fe115 100644
--- a/chapters/en/chapter11/1.mdx
+++ b/chapters/en/chapter11/1.mdx
@@ -1,6 +1,6 @@
 # Supervised Fine-Tuning
 
-This chapter will introduce fine-tuning generative language models with supervised fine-tuning (SFT). SFT involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. We will separate this chapter into three sections:
+This chapter will introduce fine-tuning generative language models with supervised fine-tuning (SFT). SFT involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. The majority of the LLMs that people interact with on platforms like ChatGPT go through some form of SFT because it's a robust way to adapt models to common use cases. We will separate this chapter into four sections:
 
 ## 1️⃣ Chat Templates
 
@@ -12,7 +12,7 @@
 ## 3️⃣ Low Rank Adaptation (LoRA)
 
-Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge.
+Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge. One of the key benefits of LoRA is the significant memory savings it offers, making it possible to fine-tune large models on hardware with limited resources.
 
 ## 4️⃣ Evaluation
 
 Evaluation is a crucial step in the fine-tuning process. 
It allows us to measure - [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer) - [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290) - [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning) -- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma) +- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://github.com/huggingface/alignment-handbook) - [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) From 6cefbc91bd65c0840b1d58cdc112102827e0da6e Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:25:25 +0000 Subject: [PATCH 14/30] respond to suggestions on chat templates page --- chapters/en/chapter11/2.mdx | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 509b33e7d..d5114f5f1 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -20,12 +20,14 @@ Chat templates are crucial for: ## Model Types and Templates ### Base Models vs Instruct Models -A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, `SmolLM2-135M` is a base model, while `SmolLM2-135M-Instruct` is its instruction-tuned variant. +A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, [`SmolLM2-135M`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) is a base model, while [`SmolLM2-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) is its instruction-tuned variant. + +Instuction tuneds models are trained to follow a specific conversational structure, making them more suitable for chatbot applications. Moreover, instruct models can handle complex interactions, including tool use, multimodal inputs, and function calling. To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). -When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. +When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. The easiest way to ensure this is to check the model tokenizer configuration on the Hub. For example, the `SmolLM2-135M-Instruct` model uses [this configuration](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/main/tokenizer_config.json). ### Common Template Formats @@ -43,15 +45,7 @@ messages = [ ] ``` -This is using the `mistral` template format: - -```sh -[INST] You are a helpful assistant. [/INST] -Hi! How can I help you today? -[INST] Hello! 
[/INST] -``` - -This is the chat template for a Qwen 2 model: +This is the ChatML template used in models like SmolLM2 and Qwen 2: ```sh <|im_start|>system @@ -65,6 +59,14 @@ What's the weather?<|im_end|> <|im_start|>assistant ``` +This is using the `mistral` template format: + +```sh +[INST] You are a helpful assistant. [/INST] +Hi! How can I help you today? +[INST] Hello! [/INST] +``` + Key differences between these formats include: 1. **System Message Handling**: - Llama 2 wraps system messages in `<>` tags From 3b7cc5a6b011ed0d755f408843960c492e31db5e Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:27:35 +0000 Subject: [PATCH 15/30] Update chapters/en/chapter11/3.mdx Co-authored-by: vb --- chapters/en/chapter11/3.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 351be5182..71a667db2 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -6,7 +6,7 @@ # Supervised Fine-Tuning -This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can +This page provided a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using the [`SFTTrainer`](https://huggingface.co/docs/trl/en/sft_trainer). By following these steps, you can adapt the model to perform specific tasks more effectively. Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. From c6800f1f23d1add7be79e05fa0ff03bec01ec8d1 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:27:48 +0000 Subject: [PATCH 16/30] Update chapters/en/chapter11/5.mdx Co-authored-by: vb --- chapters/en/chapter11/5.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index 471e48751..b9f15ca2e 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -60,7 +60,7 @@ While standard benchmarks provide a useful baseline, they shouldn't be your only ## Implementing Custom Evaluations -In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. +In this section, we will implement evaluation for our finetuned model. We can use [`lighteval`](https://github.com/huggingface/lighteval) to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. 
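
Before defining tasks, make sure the library is installed. A minimal setup sketch (assuming a recent Python environment; pin a version if you need reproducible results):

```sh
pip install lighteval
```

With the package available, the next step is to specify which benchmarks to run.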
LightEval tasks are defined using a specific format: From 844d7159c14b71d58f47fa85fb36d6d3e1f1d019 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:27:57 +0000 Subject: [PATCH 17/30] Update chapters/en/chapter11/5.mdx Co-authored-by: vb --- chapters/en/chapter11/5.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index b9f15ca2e..4734ac720 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -17,7 +17,7 @@ However, it's crucial to understand that benchmark performance doesn't always tr MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. ## Reasoning Benchmarks -BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. +[BBH](https://huggingface.co/datasets/lukaemon/bbh) (Big Bench Hard) and [GSM8K](https://huggingface.co/datasets/openai/gsm8k) focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. ## Language Understanding From 3f9815c3e84a1264b2d4537114705f9e57ad7130 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:33:56 +0000 Subject: [PATCH 18/30] respond to suggestions in SFT page --- chapters/en/chapter11/3.mdx | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 71a667db2..96d98d7bf 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -6,10 +6,9 @@ # Supervised Fine-Tuning -This page provided a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using the [`SFTTrainer`](https://huggingface.co/docs/trl/en/sft_trainer). By following these steps, you can -adapt the model to perform specific tasks more effectively. +Supervised Fine-Tuning (SFT) is a process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. +This page provides a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using the [`SFTTrainer`](https://huggingface.co/docs/trl/en/sft_trainer). 
By following these steps, you can adapt the model to perform specific tasks more effectively. ## When to Use SFT The supervised structure of the task enables models to learn specific output formats and behaviors. For example, SFT can teach a model to consistently use chat templates or follow domain-specific guidelines. The decision to use Supervised Fine-Tuning depends on two primary factors: @@ -180,8 +179,7 @@ the model is learning generalizable patterns rather than memorizing specific exa showing healthy convergence" width="600"/> -The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern -indicates the model is learning effectively while maintaining generalization ability. +The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability. ### Warning Signs to Watch For From c47a5a50c50cfb194a71595cec9259977b1895a3 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:58:00 +0000 Subject: [PATCH 19/30] improve loss illustrations on sft page --- chapters/en/chapter11/3.mdx | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 96d98d7bf..0e8f78be9 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -174,29 +174,29 @@ the model is learning generalizable patterns rather than memorizing specific exa ### Monitoring Training Progress -
- Training and validation loss curves 
-    showing healthy convergence -
- The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability. ### Warning Signs to Watch For -Several patterns in the loss curves can indicate potential issues: +Several patterns in the loss curves can indicate potential issues. Below we illustrate common warning signs and solutions that we can consider. + +SFTTrainer Training -1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider: - - Reducing the model size or training time - - Adding regularization +If the validation loss decreases at a significantly slower rate than training loss, your model is likely overfitting to the training data. Consider: + - Reducing the training steps - Increasing the dataset size - - Using techniques like early stopping + - Validating dataset quality and diversity -2. If the loss doesn't show significant improvement, the model might be: +SFTTrainer Training + +If the loss doesn't show significant improvement, the model might be: - Learning too slowly (try increasing the learning rate) - Struggling with the task (check data quality and task complexity) - Hitting architecture limitations (consider a different model) -3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: +SFTTrainer Training + +Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: - The model performs poorly on new, similar examples - The outputs lack diversity - The responses are too similar to training examples From d66fa86892c272a9ed3a2a5cf816fa6f0fa06ee4 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 12 Feb 2025 11:21:54 +0000 Subject: [PATCH 20/30] respond to feedback in chat template --- chapters/en/chapter11/2.mdx | 67 +++++++++++++++++-------------------- 1 file changed, 31 insertions(+), 36 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index d5114f5f1..7b8d19dfb 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -1,13 +1,14 @@ # Chat Templates ## Introduction -Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns. + +Chat templates are essential for structuring interactions between language models and users. Whether you're building a simple chatbot or a complex AI agent, understanding how to properly format your conversations is crucial for getting the best results from your model. In this guide, we'll explore what chat templates are, why they matter, and how to use them effectively. Chat templates are crucial for: @@ -22,7 +23,7 @@ Chat templates are crucial for: ### Base Models vs Instruct Models A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, [`SmolLM2-135M`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) is a base model, while [`SmolLM2-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) is its instruction-tuned variant. 
-Instuction tuneds models are trained to follow a specific conversational structure, making them more suitable for chatbot applications. Moreover, instruct models can handle complex interactions, including tool use, multimodal inputs, and function calling. +Instuction tuned models are trained to follow a specific conversational structure, making them more suitable for chatbot applications. Moreover, instruct models can handle complex interactions, including tool use, multimodal inputs, and function calling. To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). @@ -32,7 +33,7 @@ When using an instruct model, always verify you're using the correct chat templa ### Common Template Formats -Different models use different chat template formats. To illustrate this, let's look at a few chat templates. Here's how the same conversation would be formatted for different models: +Before diving into specific implementations, it's important to understand how different models expect their conversations to be formatted. Let's explore some common template formats using a simple example conversation: We'll use the following conversation structure for all examples: @@ -55,8 +56,7 @@ Hello!<|im_end|> <|im_start|>assistant Hi! How can I help you today?<|im_end|> <|im_start|>user -What's the weather?<|im_end|> -<|im_start|>assistant +What's the weather?<|im_start|>assistant ``` This is using the `mistral` template format: @@ -87,15 +87,15 @@ Key differences between these formats include: - Mistral uses `` and `` for turn boundaries - Qwen uses role-specific start/end tokens -The transformers library handles these differences through model-specific chat templates. When you load a tokenizer, it automatically uses the correct template for that model: +Understanding these differences is key to working with various models. Let's look at how the transformers library helps us handle these variations automatically: ```python from transformers import AutoTokenizer # These will use different templates automatically -llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf") mistral_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") qwen_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat") +smol_tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") messages = [ {"role": "system", "content": "You are a helpful assistant."}, @@ -103,44 +103,40 @@ messages = [ ] # Each will format according to its model's template -llama_chat = llama_tokenizer.apply_chat_template(messages, tokenize=False) mistral_chat = mistral_tokenizer.apply_chat_template(messages, tokenize=False) qwen_chat = qwen_tokenizer.apply_chat_template(messages, tokenize=False) +smol_chat = smol_tokenizer.apply_chat_template(messages, tokenize=False) ``` -## Working with Templates - -### Basic Implementation - -The transformers library provides built-in support for chat templates through the `apply_chat_template()` method: - -```python -from transformers import AutoTokenizer +
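
To see what each tokenizer actually produced, you can print the formatted strings side by side. When you intend to generate a completion rather than just format history, pass `add_generation_prompt=True` so the output ends with the opening of an assistant turn. A small sketch building on the variables above (the exact strings you see will depend on your tokenizer versions):

```python
# Compare the three formats side by side
for name, chat in [("mistral", mistral_chat), ("qwen", qwen_chat), ("smol", smol_chat)]:
    print(f"--- {name} ---\n{chat}")

# For generation, end the prompt with the assistant turn opener
prompt = smol_tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```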
+Click to see template examples -tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") +Qwen 2 and SmolLM2 ChatML template: -messages = [ - {"role": "system", "content": "You are a helpful coding assistant."}, - {"role": "user", "content": "Write a Python function to sort a list"}, -] - -# Apply the chat template -formatted_chat = tokenizer.apply_chat_template( - messages, tokenize=False, add_generation_prompt=True -) +```sh +<|im_start|>system +You are a helpful assistant.<|im_end|> +<|im_start|>user +Hello!<|im_end|> +<|im_start|>assistant +Hi! How can I help you today?<|im_end|> +<|im_start|>user +What's the weather?<|im_start|>assistant ``` -This will return a formatted string that looks like: +Mistral template: ```sh -<|im_start|>system -You are a helpful coding assistant.<|im_end|> -<|im_start|>user -Write a Python function to sort a list<|im_end|> +[INST] You are a helpful assistant. [/INST] +Hi! How can I help you today? +[INST] Hello! [/INST] ``` +
+ + ### Advanced Features -Chat templates can handle more complex scenarios, including: +Chat templates can handle more complex scenarios beyond just conversational interactions, including: 1. **Tool Use**: When models need to interact with external tools or APIs 2. **Multimodal Inputs**: For handling images, audio, or other media types @@ -149,9 +145,8 @@ Chat templates can handle more complex scenarios, including: When implementing advanced features: -- Test thoroughly with your specific model -- Handle errors gracefully -- Monitor token usage carefully +- Test thoroughly with your specific model. Vision and tool use template are particularly diverse. +- Monitor token usage carefully between each feature and model. - Document the expected format for each feature From 21c8dd1958d1d1e117431f4caf108147a844d7ab Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 12 Feb 2025 11:36:07 +0000 Subject: [PATCH 21/30] respond to feedback on sft section --- chapters/en/chapter11/3.mdx | 55 ++++++++++++++----------------------- 1 file changed, 20 insertions(+), 35 deletions(-) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 0e8f78be9..16cea0fac 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -1,7 +1,7 @@ # Supervised Fine-Tuning @@ -11,7 +11,9 @@ Supervised Fine-Tuning (SFT) is a process for adapting pre-trained language mode This page provides a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using the [`SFTTrainer`](https://huggingface.co/docs/trl/en/sft_trainer). By following these steps, you can adapt the model to perform specific tasks more effectively. ## When to Use SFT -The supervised structure of the task enables models to learn specific output formats and behaviors. For example, SFT can teach a model to consistently use chat templates or follow domain-specific guidelines. The decision to use Supervised Fine-Tuning depends on two primary factors: + +Before diving into implementation, it's important to understand when SFT is the right choice for your project. The supervised structure of the task enables models to learn specific output formats and behaviors. For example, SFT can teach a model to consistently use chat templates or follow domain-specific guidelines. The decision to use Supervised Fine-Tuning depends on two primary factors: +factors: ### Template Control SFT allows precise control over the model's output structure. This is particularly valuable when you need the model to: @@ -43,7 +45,7 @@ The supervised fine-tuning process requires a task-specific dataset structured w 2. The expected model response 3. Any additional context or metadata -The data format must be compatible with your model's chat template. Here's an example dataset suitable for supervised fine-tuning: +The quality of your training data is crucial for successful fine-tuning. Let's look at how to prepare and validate your dataset: + + +## Code Quiz + +In this quiz, you will be asked to write code to complete a task. We'll test you on the code you've studied in the course from libraries like `datasets`, `transformers`, `peft`, and `TRL`. + + \ No newline at end of file