From 7bc134b84d83cd9a2a824d2415dd6032f5a05bd5 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 29 Jan 2025 14:41:16 +0100 Subject: [PATCH 01/30] initial copy from smol-course --- chapters/en/chapter11/README.md | 30 + chapters/en/chapter11/chat_templates.md | 114 + .../en/chapter11/chat_templates_example.ipynb | 5741 +++++++++++++++++ .../en/chapter11/sft_finetuning_example.ipynb | 273 + .../en/chapter11/supervised_fine_tuning.md | 41 + 5 files changed, 6199 insertions(+) create mode 100644 chapters/en/chapter11/README.md create mode 100644 chapters/en/chapter11/chat_templates.md create mode 100644 chapters/en/chapter11/chat_templates_example.ipynb create mode 100644 chapters/en/chapter11/sft_finetuning_example.ipynb create mode 100644 chapters/en/chapter11/supervised_fine_tuning.md diff --git a/chapters/en/chapter11/README.md b/chapters/en/chapter11/README.md new file mode 100644 index 000000000..a7fae79c6 --- /dev/null +++ b/chapters/en/chapter11/README.md @@ -0,0 +1,30 @@ +# Instruction Tuning + +This module will guide you through instruction tuning language models. Instruction tuning involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. + +In this module, we will explore two topics: 1) Chat Templates and 2) Supervised Fine-Tuning. + +## 1️⃣ Chat Templates + +Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages. For more detailed information, refer to the [Chat Templates](./chat_templates.md) section. + +## 2️⃣ Supervised Fine-Tuning + +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. For a detailed guide on SFT, including key steps and best practices, see the [Supervised Fine-Tuning](./supervised_fine_tuning.md) page. + +## Exercise Notebooks + +| Title | Description | Exercise | Link | Colab | +|-------|-------------|----------|------|-------| +| Chat Templates | Learn how to use chat templates with SmolLM2 and process datasets into chatml format | 🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format
🐕 Convert the `openai/gsm8k` dataset into chatml format | [Notebook](./chat_templates_example.ipynb) | Open In Colab |
+| Supervised Fine-Tuning | Learn how to fine-tune SmolLM2 using the SFTTrainer | 🐢 Use the `HuggingFaceTB/smoltalk` dataset
🐕 Try out the `bigcode/the-stack-smol` dataset
🦁 Select a dataset for a real world use case | [Notebook](./sft_finetuning_example.ipynb) | Open In Colab |
+
+## References
+
+- [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating)
+- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)
+- [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer)
+- [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290)
+- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning)
+- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma)
+- [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format)
diff --git a/chapters/en/chapter11/chat_templates.md b/chapters/en/chapter11/chat_templates.md
new file mode 100644
index 000000000..61ff65e6f
--- /dev/null
+++ b/chapters/en/chapter11/chat_templates.md
@@ -0,0 +1,114 @@
+# Chat Templates
+
+Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns.
+
+## Base Models vs Instruct Models
+
+A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, `SmolLM2-135M` is a base model, while `SmolLM2-135M-Instruct` is its instruction-tuned variant.
+
+To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant).
+
+Note that a base model may have been fine-tuned with any of several chat templates, so when using an instruct model we need to make sure we are using the correct one.
+
+## Understanding Chat Templates
+
+At their core, chat templates define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template:
+
+```sh
+<|im_start|>user
+Hi there!<|im_end|>
+<|im_start|>assistant
+Nice to meet you!<|im_end|>
+<|im_start|>user
+Can I ask a question?<|im_end|>
+<|im_start|>assistant
+```
+
+The `transformers` library applies chat templates for you via the model's tokenizer. Read more about how transformers builds chat templates [here](https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates). All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest.
+
Here's a basic example of a conversation: + +```python +messages = [ + {"role": "system", "content": "You are a helpful assistant focused on technical topics."}, + {"role": "user", "content": "Can you explain what a chat template is?"}, + {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."} +] +``` + +Let's break down the above example, and see how it maps to the chat template format. + +## System Messages + +System messages set the foundation for how the model should behave. They act as persistent instructions that influence all subsequent interactions. For example: + +```python +system_message = { + "role": "system", + "content": "You are a professional customer service agent. Always be polite, clear, and helpful." +} +``` + +## Conversations + +Chat templates maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: + +```python +conversation = [ + {"role": "user", "content": "I need help with my order"}, + {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"}, + {"role": "user", "content": "It's ORDER-123"}, +] +``` + +## Implementation with Transformers + +The transformers library provides built-in support for chat templates. Here's how to use them: + +```python +from transformers import AutoTokenizer + +tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") + +messages = [ + {"role": "system", "content": "You are a helpful coding assistant."}, + {"role": "user", "content": "Write a Python function to sort a list"}, +] + +# Apply the chat template +formatted_chat = tokenizer.apply_chat_template( + messages, + tokenize=False, + add_generation_prompt=True +) +``` + +## Custom Formatting +You can customize how different message types are formatted. For example, adding special tokens or formatting for different roles: + +```python +template = """ +<|system|>{system_message} +<|user|>{user_message} +<|assistant|>{assistant_message} +""".lstrip() +``` + +## Multi-Turn Support + +Templates can handle complex multi-turn conversations while maintaining context: + +```python +messages = [ + {"role": "system", "content": "You are a math tutor."}, + {"role": "user", "content": "What is calculus?"}, + {"role": "assistant", "content": "Calculus is a branch of mathematics..."}, + {"role": "user", "content": "Can you give me an example?"}, +] +``` + +⏭️ [Next: Supervised Fine-Tuning](./supervised_fine_tuning.md) + +## Resources + +- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) +- [Transformers Documentation](https://huggingface.co/docs/transformers) +- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates) diff --git a/chapters/en/chapter11/chat_templates_example.ipynb b/chapters/en/chapter11/chat_templates_example.ipynb new file mode 100644 index 000000000..88f60c1e4 --- /dev/null +++ b/chapters/en/chapter11/chat_templates_example.ipynb @@ -0,0 +1,5741 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "vZAvFVIAtFlq" + }, + "source": [ + "# Exploring Chat Templates with SmolLM2\n", + "\n", + "This notebook demonstrates how to use chat templates with the `SmolLM2` model. Chat templates help structure interactions between users and AI models, ensuring consistent and contextually appropriate responses." 
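,
+    "\n",
+    "\n",
+    "If you are running in a fresh environment such as Colab, you may first need to install the libraries used below; the next cell is an optional setup step."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional setup: uncomment if transformers/datasets are not installed\n",
+    "# !pip install -q transformers datasets"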
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "K-lZu8JvtwUN", + "outputId": "c3871418-15bc-4265-ae8d-6d6036036d0e" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "c15d320002504d95bb86e87f50d43b08", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "VBox(children=(HTML(value='
user\n",
+      "Hello, how are you?<|im_end|>\n",
+      "<|im_start|>assistant\n",
+      "I'm doing well, thank you! How can I assist you today?<|im_end|>\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "input_text = tokenizer.apply_chat_template(messages, tokenize=False)\n",
+    "\n",
+    "print(\"Conversation with template:\", input_text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "sfvdglOqtFls"
+   },
+   "source": [
+    "# Decode the conversation\n",
+    "\n",
+    "Note that the decoded conversation matches the one above, except for one addition: an open `<|im_start|>assistant` turn at the end. This is the generation prompt that `add_generation_prompt=True` appends to cue the model to respond.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "mXUVdPeytFls",
+    "outputId": "80870e53-7bc1-426e-ac33-ba6748e030fc"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Conversation decoded: <|im_start|>user\n",
+      "Hello, how are you?<|im_end|>\n",
+      "<|im_start|>assistant\n",
+      "I'm doing well, thank you! How can I assist you today?<|im_end|>\n",
+      "<|im_start|>assistant\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "input_text = tokenizer.apply_chat_template(\n",
+    "    messages, tokenize=True, add_generation_prompt=True\n",
+    ")\n",
+    "\n",
+    "print(\"Conversation decoded:\", tokenizer.decode(token_ids=input_text))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "UcZQpspEtFlt"
+   },
+   "source": [
+    "# Tokenize the conversation\n",
+    "\n",
+    "Of course, the tokenizer can also return the conversation, special tokens included, as ids from the model's vocabulary.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "jc2PLxAMtFlt",
+    "outputId": "d2098780-b3f4-41ec-a1f3-b6da2b593c62"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Conversation tokenized: [1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198, 1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17, 1073, 416, 339, 4237, 346, 1834, 47, 2, 198, 1, 520, 9531, 198]\n"
+     ]
+    }
+   ],
+   "source": [
+    "input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)\n",
+    "\n",
+    "print(\"Conversation tokenized:\", input_text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "m3eNp9a0tFlt"
+   },
+   "source": [
+    "
\n", + "

Exercise: Process a dataset for SFT

\n", + "

Take a dataset from the Hugging Face hub and process it for SFT.

\n", + "

Difficulty Levels

\n", + "

🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format.

\n", + "

🐕 Convert the `openai/gsm8k` dataset into chatml format.

\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 381 + }, + "id": "qbkXV2_ItFlt", + "outputId": "06deadc3-2c63-4660-d2bd-05096ef07c9f" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from IPython.core.display import display, HTML\n", + "\n", + "display(\n", + " HTML(\n", + " \"\"\"\n", + "\"\"\"\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 241, + "referenced_widgets": [ + "c2d74a42fb574b8892d0a288fd92f0a6", + "056b9ef5706843b19cd62fce75743afb", + "17b4d81e40564a53bb79be9fbef4918e", + "951f60cddcb84dfdbbdf2058369f0541", + "646484cf7a36444daebe1dfe4a0e4150", + "e2f0c39ce1c046e8acb150dfbfaf5aa8", + "7eb12d70d2b542a7b651c7680f590279", + "ea1f9cb22abf4e7d9f6e76fc86c03387", + "00c9f5ca71b84df4b26acee72c97fefb", + "505f96bc0c7843bcb1498ba1c1ba5f06", + "635cc2881a1e4b8788bb26c356740e04", + "a6ee323c13904525a99c6f092ba96e18", + "67fffe7d6f8c4963972b408529e05532", + "0055b6b628934affaf88bc58a1572bb6", + "aafbbb9fc5164fa3a88193bfd33d2f79", + "606e39d53ed64967a60337418c71c595", + "26b15fa18b1b4963a1ba76a76675e7ee", + "db09ab1f79db4f3a8de77f0348eca0f7", + "de04f344a8d4428e8ba1836a563d8aa1", + "03c09673186046d799d6f487d6623e6b", + "1cc682330b24431b8812c73041e987d0", + "dafe748a452148038779f6a62a22a4ec", + "addad1c100024c44a0959978153da9a8", + "9bea2a23db644ad19b708d10e35d54ee", + "d1174b127571420593971166fbb1966b", + "add90ed3746d4293a1b71198137a892c", + "8def25e6389f4e6192b517b6e80aa05e", + "c9747e7a810f413ba1ea108307e3ad1d", + "d0ea49d1d90f4d34bf2ae70efa96946e", + "59d0997b85614384bbfebeee928340b6", + "269920491c134501873e0110367bc984", + "384d26051c04460e8870a3ffe9406c48", + "8e8a0e89a50646c897e546c4077db79e", + "ff60308921f9432683acbcd6d29fb78f", + "3bc8f6339f4e4a3b961d810255c5573e", + "4780ad263ec04b1a97525d985e102049", + "488feef55878426bbf1c753c6d58735b", + "560ba45d70ca431dadeb327d234c330a", + "04d0a6f74af346f7bc696951949063c8", + "2a18ce941b0f4cef8307988ef898b47f", + "194e3fda3635466b998f96e3dc22746a", + "e2ab3cb38b5a41f68d18ed5f0e6ae22c", + "f0b271bcac6c43a9aaddac54259bb514", + "0dc93d50a283472f9ca64fd0a4c6ff15", + "dd1a50d4497144388a1809b78bb38f58", + "6b72a856e5bd4812a5e0dd0c3bfb8455", + "4e21a567d1f6461985727823b37166e1", + "ec1efb7598fd496bb170673ae1b8a1df", + "84f393468aa74baa903243d238b2d387", + "a54ce365be104d27aaa15cf8c63b5ebe", + "1791220377d141ac9b307246177d0712", + "fa330d4f0fb241aebd065f6ef4a6892c", + "cfa1cc6eed8a4f7791a7959308456b6b", + "b50c9c4433854cf7a6b2593e946b7faa", + "7557cd24ba9b4aa3955866d59db94519", + "cc608dfb880c49d4bc5acf2d691b8ec6", + "cb838c5bed994a9a8e6fcf5c98b76d17", + "76bbe8c2beba4c0594085d32a68d2ee7", + "c9836c952b07472880649b82e2347e8d", + "383db57f997140d482b82b123080837a", + "182abc7ec4d944d9bb2ec1281c98b4c8", + "6934c6d1cbac44dbb08f3fffe3056edb", + "05fa0f6eb78b4c56b219b0e57521bd2e", + "012aa94e3cf24e32833c6bbca23c52f7", + "76c1a1cdc9054bbe90d0d3b662cf0ed1", + "e453f1672772400a851735ba64f42c8b", + "d1358f6b16644cb3a2328ca639a4a77a", + "c19f60d4028045399c62004027eaafd9", + "8055588a1fa940239c801ef66f3ecf3b", + "7468a9bc8bda44e5b44574c64fdc6803", + "a13a8f8b702e44ed88c7d358a0a8b4b4", + "13367fbb763747fa8de94cde40ffae32", + "b1fcf477db664ccdade4096fb79de327", + "9d1c06ac6b774d82adca58773f389161", + 
"31910159cf30463b8246ec47ffd8ab5b", + "72220420f9d340eabec13a01caebc92c", + "55b14c03a41c495aacf8ac2d0f96ba0b" + ] + }, + "id": "4p3atw4_tFlu", + "outputId": "62ee9812-3819-4a9c-9e24-5687368ffcd8" + }, + "outputs": [], + "source": [ + "from datasets import load_dataset\n", + "\n", + "ds = load_dataset(\"HuggingFaceTB/smoltalk\", \"everyday-conversations\")\n", + "\n", + "\n", + "def process_dataset(sample):\n", + " # TODO: 🐢 Convert the sample into a chat format\n", + " # use the tokenizer's method to apply the chat template\n", + " return sample\n", + "\n", + "\n", + "ds = ds.map(process_dataset)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 381 + }, + "id": "81fQeazltFlu", + "outputId": "36cf7148-9881-4f13-d0ce-76c82c4ab219" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(\n", + " HTML(\n", + " \"\"\"\n", + "\"\"\"\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "id": "bWUSv7NMtFlu" + }, + "outputs": [], + "source": [ + "ds = load_dataset(\"openai/gsm8k\", \"main\")\n", + "\n", + "\n", + "def process_dataset(sample):\n", + " # TODO: 🐕 Convert the sample into a chat format\n", + "\n", + " # 1. create a message format with the role and content\n", + "\n", + " # 2. apply the chat template to the samples using the tokenizer's method\n", + "\n", + " return sample\n", + "\n", + "\n", + "ds = ds.map(process_dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qlXCuRKotFlu" + }, + "source": [ + "## Conclusion\n", + "\n", + "This notebook demonstrated how to apply chat templates to different models, `SmolLM2`. By structuring interactions with chat templates, we can ensure that AI models provide consistent and contextually relevant responses.\n", + "\n", + "In the exercise you tried out converting a dataset into chatml format. Luckily, TRL will do this for you, but it's useful to understand what's going on under the hood." 
+ ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.10" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "0055b6b628934affaf88bc58a1572bb6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_de04f344a8d4428e8ba1836a563d8aa1", + "max": 946449, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_03c09673186046d799d6f487d6623e6b", + "value": 946449 + } + }, + "00c9f5ca71b84df4b26acee72c97fefb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "012aa94e3cf24e32833c6bbca23c52f7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "016d5e929f1240cea067372b2191d107": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "01e0f8a799ad479eb95eef3e5a09bd70": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + 
"model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_8fe2df9a14a0436c9124a856ac7419e4", + "IPY_MODEL_d108e029e743419989e30f64f0c82b90", + "IPY_MODEL_bfd11f21f197459b8f27ef364bc9b264" + ], + "layout": "IPY_MODEL_76a0341ebe9f4c3face32460d7023be9" + } + }, + "0206fb9662a349c1aa8a6d87ce01c388": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "03c09673186046d799d6f487d6623e6b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0479fd3fc1ba476ab46f8c0a98f89468": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "04ae3f7b640c42f3a8eb1977cd1a585d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + 
"grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "04d0a6f74af346f7bc696951949063c8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "056b9ef5706843b19cd62fce75743afb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e2f0c39ce1c046e8acb150dfbfaf5aa8", + "placeholder": "​", + "style": "IPY_MODEL_7eb12d70d2b542a7b651c7680f590279", + "value": "README.md: 100%" + } + }, + "05fa0f6eb78b4c56b219b0e57521bd2e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + 
"visibility": null, + "width": null + } + }, + "0942430d36de4677b4c2fa771d7bcd2a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0bab42beb845475684e9e71dd1591e1d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0c336ea5c653434da49e2f0e949f83d0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "0dc93d50a283472f9ca64fd0a4c6ff15": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "10a0f37020d44156a11e9750778892e0": { + "model_module": 
"@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "13367fbb763747fa8de94cde40ffae32": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1513792bad534a0c9c381a131395c519": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_76d306c21214412ab44e542d82e547aa", + "placeholder": "​", + "style": "IPY_MODEL_b9e41ef9e9c54fa7b71bc333604af74e", + "value": " 831/831 [00:00<00:00, 42.7kB/s]" + } + }, + "17023310de9b4c3ebd8cc03758d59ef9": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": 
null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1791220377d141ac9b307246177d0712": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "17b4d81e40564a53bb79be9fbef4918e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ea1f9cb22abf4e7d9f6e76fc86c03387", + "max": 9251, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_00c9f5ca71b84df4b26acee72c97fefb", + "value": 9251 + } + }, + "182abc7ec4d944d9bb2ec1281c98b4c8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "194e3fda3635466b998f96e3dc22746a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1cc682330b24431b8812c73041e987d0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "269920491c134501873e0110367bc984": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "26b15fa18b1b4963a1ba76a76675e7ee": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + 
"left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "26ed0f1bae204d74a313d101d9355e90": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7612cc9b8908471b90c9118151d6e447", + "placeholder": "​", + "style": "IPY_MODEL_b687aca79e6e470b96254c5e309d6d63", + "value": "generation_config.json: 100%" + } + }, + "2a18ce941b0f4cef8307988ef898b47f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "31910159cf30463b8246ec47ffd8ab5b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "383db57f997140d482b82b123080837a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "384d26051c04460e8870a3ffe9406c48": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "3b881514716c47308061fe85b810a6a4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_26ed0f1bae204d74a313d101d9355e90", + "IPY_MODEL_4ff5af1784904bc9b85515105885e2d8", + "IPY_MODEL_b3c42d7e25d6494993029531adc3866d" + ], + "layout": "IPY_MODEL_6227b40396ea4024b3c8710c5e65601f" + } + }, + "3bc8f6339f4e4a3b961d810255c5573e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_04d0a6f74af346f7bc696951949063c8", + "placeholder": "​", + "style": "IPY_MODEL_2a18ce941b0f4cef8307988ef898b47f", + "value": "Generating train split: 100%" + } + }, + "3cc519fd92fe4b328943ec839115b63e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e15fc503bb73476980cedb5f06b51ced", + "IPY_MODEL_d8c5dc8df3be4e65b2bbba020d29150f", + "IPY_MODEL_c0177c4ad18740d88acfc603ce4735f8" + ], + "layout": "IPY_MODEL_eb570fd159124e2cbd2df9335b3f9cd6" + } + }, + "3fa18e3b50104af796bd0887f556224a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": 
"IPY_MODEL_76bbe8c2beba4c0594085d32a68d2ee7", + "IPY_MODEL_c9836c952b07472880649b82e2347e8d" + ], + "layout": "IPY_MODEL_383db57f997140d482b82b123080837a" + } + }, + "cfa1cc6eed8a4f7791a7959308456b6b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d0ea49d1d90f4d34bf2ae70efa96946e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d108e029e743419989e30f64f0c82b90": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6ceb292f2b8544f2a9a005d16d3e8978", + "max": 800662, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_41a27cf0a91246599d4d1b7dae7c7863", + "value": 800662 + } + }, + "d1174b127571420593971166fbb1966b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_59d0997b85614384bbfebeee928340b6", + "max": 52603, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_269920491c134501873e0110367bc984", + "value": 52603 + } + }, + "d1358f6b16644cb3a2328ca639a4a77a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_c19f60d4028045399c62004027eaafd9", + "IPY_MODEL_8055588a1fa940239c801ef66f3ecf3b", + "IPY_MODEL_7468a9bc8bda44e5b44574c64fdc6803" + ], + "layout": "IPY_MODEL_a13a8f8b702e44ed88c7d358a0a8b4b4" + } + }, + "d17c62b889754b5d88cfced5b18ff7a7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": 
null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d32017fa83aa44f6b2e3443a602654be": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d43410dfcc8c4bebb8672f10ed2aeb66": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "d54fb2da9f1f4a89ae962b8816314f43": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_77d3d81687e6417ab988b04984fc68f4", + "IPY_MODEL_fbce0a69847e4099a55d1e39d4118c91", + "IPY_MODEL_1513792bad534a0c9c381a131395c519" + ], + "layout": "IPY_MODEL_69f38fecf8ad403898634cfdfadf8925" + } + }, + "d64d50101891491f96ff80162dc6d26c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d65ec0f0dc0b44e0869c6159e6e82ad6", + "IPY_MODEL_76febcd912404a58add3a39f80a8218d", + "IPY_MODEL_f4ea276bdc0d4da2a04b46e3f1ed95b5" + ], + "layout": "IPY_MODEL_0942430d36de4677b4c2fa771d7bcd2a" + } + }, + "d65ec0f0dc0b44e0869c6159e6e82ad6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_10a0f37020d44156a11e9750778892e0", + "placeholder": "​", + "style": "IPY_MODEL_58fb913274b54a60a832513c09608a2f", + "value": "tokenizer_config.json: 100%" + } + }, + "d8c5dc8df3be4e65b2bbba020d29150f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7a0c705334694da6b750104b28db6dba", + "max": 466391, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_0c336ea5c653434da49e2f0e949f83d0", + "value": 466391 + } + }, + "da1a999fb5af4eae9f6a9d1086cbb4cf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "dafe748a452148038779f6a62a22a4ec": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "StyleView", + "description_width": "" + } + }, + "db09ab1f79db4f3a8de77f0348eca0f7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "db3bd55d779947028f36a8b24a2621b6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "dd1a50d4497144388a1809b78bb38f58": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_6b72a856e5bd4812a5e0dd0c3bfb8455", + "IPY_MODEL_4e21a567d1f6461985727823b37166e1", + "IPY_MODEL_ec1efb7598fd496bb170673ae1b8a1df" + ], + "layout": "IPY_MODEL_84f393468aa74baa903243d238b2d387" + } + }, + "de04f344a8d4428e8ba1836a563d8aa1": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e0a40f83ae2e4ab29376a1d48b53aa6e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a026a32dd6d646bea82c1ebb06147d89", + "placeholder": "​", + "style": 
"IPY_MODEL_0479fd3fc1ba476ab46f8c0a98f89468", + "value": "config.json: 100%" + } + }, + "e15fc503bb73476980cedb5f06b51ced": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5de5dab3d92f4f41838a8f302d27f0c3", + "placeholder": "​", + "style": "IPY_MODEL_471b481a3e5b4d439ab31fdc49fc99c7", + "value": "merges.txt: 100%" + } + }, + "e2ab3cb38b5a41f68d18ed5f0e6ae22c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "e2f0c39ce1c046e8acb150dfbfaf5aa8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e453f1672772400a851735ba64f42c8b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e7f5d507d9564941bb7db742b4bf01c7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e90b58981bd34d0e8f975fc1a9658c4c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": 
"FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ed577dea3ac54884a637ad775b42bc68", + "max": 2104556, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_d43410dfcc8c4bebb8672f10ed2aeb66", + "value": 2104556 + } + }, + "ea1f9cb22abf4e7d9f6e76fc86c03387": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "eb570fd159124e2cbd2df9335b3f9cd6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ec15d99b3a604405a2b4707931d4bf44": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + 
"align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ec1efb7598fd496bb170673ae1b8a1df": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b50c9c4433854cf7a6b2593e946b7faa", + "placeholder": "​", + "style": "IPY_MODEL_7557cd24ba9b4aa3955866d59db94519", + "value": " 119/119 [00:00<00:00, 3547.77 examples/s]" + } + }, + "ed577dea3ac54884a637ad775b42bc68": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "efe9a9fcebfe441b80075fbfe9c32674": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_04ae3f7b640c42f3a8eb1977cd1a585d", + "max": 269060552, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_db3bd55d779947028f36a8b24a2621b6", + "value": 269060552 + } + }, + "f0b271bcac6c43a9aaddac54259bb514": { + "model_module": "@jupyter-widgets/base", + 
"model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f3e23f781bce4429954d76bfea97aff4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f4ea276bdc0d4da2a04b46e3f1ed95b5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_be4e145938054f13a510fe4d04a7a60d", + "placeholder": "​", + "style": "IPY_MODEL_648c3c820b39493daf0cce5f57a55467", + "value": " 3.66k/3.66k [00:00<00:00, 197kB/s]" + } + }, + "f600aa1fe4094133888ec9a2504a60eb": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5291041c86db4933816088c047d659d8", + "placeholder": "​", + "style": "IPY_MODEL_48724ba7ba4e4f00923445245640739f", + "value": "model.safetensors: 100%" + } + }, + "f70401b6dba74380b19bd1ef887b3bf7": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "fa330d4f0fb241aebd065f6ef4a6892c": { + "model_module": "@jupyter-widgets/base", + 
"model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fbce0a69847e4099a55d1e39d4118c91": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_530fc4c2bf1244628af7dea3e4b35cdf", + "max": 831, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_96c2aae9198441569362135ad4bcbc98", + "value": 831 + } + }, + "ff60308921f9432683acbcd6d29fb78f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_3bc8f6339f4e4a3b961d810255c5573e", + "IPY_MODEL_4780ad263ec04b1a97525d985e102049", + "IPY_MODEL_488feef55878426bbf1c753c6d58735b" + ], + "layout": "IPY_MODEL_560ba45d70ca431dadeb327d234c330a" + } + }, + "ff8debfb713f4b88be6b9b3bf33bfca2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/chapters/en/chapter11/sft_finetuning_example.ipynb b/chapters/en/chapter11/sft_finetuning_example.ipynb new file mode 100644 index 000000000..d18479a91 --- /dev/null +++ b/chapters/en/chapter11/sft_finetuning_example.ipynb @@ -0,0 +1,273 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Supervised Fine-Tuning with SFTTrainer\n", + "\n", + "This notebook demonstrates how to fine-tune the 
`HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer` from the `trl` library. Running the notebook cells will fine-tune the model. You can select your difficulty by trying out different datasets.\n", + "
\n", + "

Exercise: Fine-Tuning SmolLM2 with SFTTrainer

\n", + "

Take a dataset from the Hugging Face hub and finetune a model on it.

\n", + "

Difficulty Levels

\n", + "

🐢 Use the `HuggingFaceTB/smoltalk` dataset

\n", + "

🐕 Try out the `bigcode/the-stack-smol` dataset and finetune a code generation model on a specific subset `data/python`.

\n", + "

🦁 Select a dataset that relates to a real world use case your interested in

\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Install the requirements in Google Colab\n", + "# !pip install transformers datasets trl huggingface_hub\n", + "\n", + "# Authenticate to Hugging Face\n", + "\n", + "from huggingface_hub import login\n", + "login()\n", + "\n", + "# for convenience you can create an environment variable containing your hub token as HF_TOKEN" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import necessary libraries\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer\n", + "from datasets import load_dataset\n", + "from trl import SFTConfig, SFTTrainer, setup_chat_format\n", + "import torch\n", + "\n", + "device = (\n", + " \"cuda\"\n", + " if torch.cuda.is_available()\n", + " else \"mps\" if torch.backends.mps.is_available() else \"cpu\"\n", + ")\n", + "\n", + "# Load the model and tokenizer\n", + "model_name = \"HuggingFaceTB/SmolLM2-135M\"\n", + "model = AutoModelForCausalLM.from_pretrained(\n", + " pretrained_model_name_or_path=model_name\n", + ").to(device)\n", + "tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)\n", + "\n", + "# Set up the chat format\n", + "model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)\n", + "\n", + "# Set our name for the finetune to be saved &/ uploaded to\n", + "finetune_name = \"SmolLM2-FT-MyDataset\"\n", + "finetune_tags = [\"smol-course\", \"module_1\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Generate with the base model\n", + "\n", + "Here we will try out the base model which does not have a chat template. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Let's test the base model before training\n", + "prompt = \"Write a haiku about programming\"\n", + "\n", + "# Format with template\n", + "messages = [{\"role\": \"user\", \"content\": prompt}]\n", + "formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)\n", + "\n", + "# Generate response\n", + "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", + "outputs = model.generate(**inputs, max_new_tokens=100)\n", + "print(\"Before training:\")\n", + "print(tokenizer.decode(outputs[0], skip_special_tokens=True))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset Preparation\n", + "\n", + "We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.\n", + "\n", + "**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load a sample dataset\n", + "from datasets import load_dataset\n", + "\n", + "# TODO: define your dataset and config using the path and name parameters\n", + "ds = load_dataset(path=\"HuggingFaceTB/smoltalk\", name=\"everyday-conversations\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: 🦁 If your dataset is not in a format that TRL can convert to the chat template, you will need to process it. 
Refer to the [module](../chat_templates.md)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configuring the SFTTrainer\n", + "\n", + "The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Configure the SFTTrainer\n", + "sft_config = SFTConfig(\n", + " output_dir=\"./sft_output\",\n", + " max_steps=1000, # Adjust based on dataset size and desired training duration\n", + " per_device_train_batch_size=4, # Set according to your GPU memory capacity\n", + " learning_rate=5e-5, # Common starting point for fine-tuning\n", + " logging_steps=10, # Frequency of logging training metrics\n", + " save_steps=100, # Frequency of saving model checkpoints\n", + " evaluation_strategy=\"steps\", # Evaluate the model at regular intervals\n", + " eval_steps=50, # Frequency of evaluation\n", + " use_mps_device=(\n", + " True if device == \"mps\" else False\n", + " ), # Use MPS when training on Apple Silicon devices\n", + " hub_model_id=finetune_name, # Set a unique name for your model\n", + ")\n", + "\n", + "# Initialize the SFTTrainer\n", + "trainer = SFTTrainer(\n", + " model=model,\n", + " args=sft_config,\n", + " train_dataset=ds[\"train\"],\n", + " tokenizer=tokenizer,\n", + " eval_dataset=ds[\"test\"],\n", + ")\n", + "\n", + "# TODO: 🦁 🐕 Align the SFTTrainer params with your chosen dataset. For example, if you are using the `bigcode/the-stack-smol` dataset, you will need to choose the `content` column." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training the Model\n", + "\n", + "With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Train the model\n", + "trainer.train()\n", + "\n", + "# Save the model\n", + "trainer.save_model(f\"./{finetune_name}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "trainer.push_to_hub(tags=finetune_tags)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

Bonus Exercise: Generate with fine-tuned model

\n", + "

🐕 Use the fine-tuned to model generate a response, just like with the base example..

\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Test the fine-tuned model on the same prompt\n", + "\n", + "# Let's test the base model before training\n", + "prompt = \"Write a haiku about programming\"\n", + "\n", + "# Format with template\n", + "messages = [{\"role\": \"user\", \"content\": prompt}]\n", + "formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)\n", + "\n", + "# Generate response\n", + "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", + "\n", + "# TODO: use the fine-tuned to model generate a response, just like with the base example." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 💐 You're done!\n", + "\n", + "This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:\n", + "\n", + "- Try this notebook on a harder difficulty\n", + "- Review a colleagues PR\n", + "- Improve the course material via an Issue or PR." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "py310", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.15" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/chapters/en/chapter11/supervised_fine_tuning.md b/chapters/en/chapter11/supervised_fine_tuning.md new file mode 100644 index 000000000..dc236962e --- /dev/null +++ b/chapters/en/chapter11/supervised_fine_tuning.md @@ -0,0 +1,41 @@ +# Supervised Fine-Tuning + +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on carefully curated datasets with human-validated examples. + +## Understanding Supervised Fine-Tuning + +At its core, supervised fine-tuning is about teaching a pre-trained model to perform specific tasks through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. + +SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. + +## When to Use Supervised Fine-Tuning + +The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. + +For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. Similarly, in medical or legal applications, accuracy and adherence to domain-specific terminology becomes crucial. In these cases, SFT can help align the model's responses with professional standards and domain expertise. 
+ +## The Fine-Tuning Process + +The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. + +First, you'll need to prepare or select a dataset that represents your target task. This dataset should include diverse examples that cover the range of scenarios your model will encounter. The quality of this data is important: each example should demonstrate the kind of output you want your model to produce. Next comes the actual fine-tuning phase, where you'll use frameworks like Hugging Face's `transformers` and `trl` to train the model on your dataset. + +Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. In [module 4](../4_evaluation), we'll cover how to evaluate your model. + +## The Role of SFT in Preference Alignment + +SFT plays a fundamental role in aligning language models with human preferences. Techniques such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) rely on SFT to form a base level of task understanding before further aligning the model's responses with desired outcomes. Pre-trained models, despite their general language proficiency, may not always generate outputs that match human preferences. SFT bridges this gap by introducing domain-specific data and guidance, which improves the model's ability to generate responses that align more closely with human expectations. + +## Supervised Fine-Tuning With Transformer Reinforcement Learning + +A key software package for Supervised Fine-Tuning is Transformer Reinforcement Learning (TRL). TRL is a toolkit used to train transformer language models using reinforcement learning (RL). + +Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates the major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). We will use TRL in a number of modules throughout this repo.
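To ground this, here is a minimal sketch of what SFT with TRL's `SFTTrainer` can look like. It mirrors the notebook that accompanies this chapter; the model, dataset, and hyperparameters are illustrative defaults rather than recommendations:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, setup_chat_format

# Load a small base model and attach a chat format to it
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

# A small conversational dataset with messages in role/content format
ds = load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations")

# Configure and run supervised fine-tuning
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="./sft_output", max_steps=1000),
    train_dataset=ds["train"],
    tokenizer=tokenizer,
)
trainer.train()
```

The same pattern scales to larger models and datasets; in practice you would also pass an evaluation set and tune the `SFTConfig` arguments, as the accompanying notebook does.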
+ +# Next Steps + +Try out the following tutorials to get hands-on experience with SFT using TRL: + +⏭️ [Chat Templates Tutorial](./notebooks/chat_templates_example.ipynb) + +⏭️ [Supervised Fine-Tuning Tutorial](./notebooks/sft_finetuning_example.ipynb) From 995493bac10a8d7356747d12be1b7543b042ca1f Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Thu, 30 Jan 2025 14:42:00 +0100 Subject: [PATCH 02/30] convert smol course material into nlp course style --- chapters/en/chapter11/1.mdx | 34 + chapters/en/chapter11/10.mdx | 56 + chapters/en/chapter11/11.mdx | 13 + .../en/chapter11/{chat_templates.md => 2.mdx} | 58 +- chapters/en/chapter11/3.mdx | 83 + chapters/en/chapter11/4.mdx | 125 + chapters/en/chapter11/5.mdx | 84 + chapters/en/chapter11/6.mdx | 109 + chapters/en/chapter11/7.mdx | 120 + chapters/en/chapter11/8.mdx | 47 + chapters/en/chapter11/9.mdx | 185 + chapters/en/chapter11/README.md | 30 - .../en/chapter11/chat_templates_example.ipynb | 5741 ----------------- .../en/chapter11/sft_finetuning_example.ipynb | 273 - .../en/chapter11/supervised_fine_tuning.md | 41 - 15 files changed, 861 insertions(+), 6138 deletions(-) create mode 100644 chapters/en/chapter11/1.mdx create mode 100644 chapters/en/chapter11/10.mdx create mode 100644 chapters/en/chapter11/11.mdx rename chapters/en/chapter11/{chat_templates.md => 2.mdx} (55%) create mode 100644 chapters/en/chapter11/3.mdx create mode 100644 chapters/en/chapter11/4.mdx create mode 100644 chapters/en/chapter11/5.mdx create mode 100644 chapters/en/chapter11/6.mdx create mode 100644 chapters/en/chapter11/7.mdx create mode 100644 chapters/en/chapter11/8.mdx create mode 100644 chapters/en/chapter11/9.mdx delete mode 100644 chapters/en/chapter11/README.md delete mode 100644 chapters/en/chapter11/chat_templates_example.ipynb delete mode 100644 chapters/en/chapter11/sft_finetuning_example.ipynb delete mode 100644 chapters/en/chapter11/supervised_fine_tuning.md diff --git a/chapters/en/chapter11/1.mdx b/chapters/en/chapter11/1.mdx new file mode 100644 index 000000000..5acfb97fd --- /dev/null +++ b/chapters/en/chapter11/1.mdx @@ -0,0 +1,34 @@ +# Supervised Fine-Tuning + +This chapter will introduce fine-tuning generative language models with supervised fine-tuning (SFT). SFT involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. We will separate this chapter into four sections: + +## 1️⃣ Chat Templates + +Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages. + +## 2️⃣ Supervised Fine-Tuning + +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples; we will cover the key steps and best practices for SFT in this chapter. + +## 3️⃣ Low Rank Adaptation (LoRA) + +Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge. + + +## 4️⃣ Evaluation + +Evaluation is a crucial step in the fine-tuning process. It allows us to measure the performance of the model on a task-specific dataset. + + +⚠️ In order to benefit from all features available with the Model Hub and 🤗 Transformers, we recommend creating an account.
+ + +## References + +- [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating) +- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py) +- [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer) +- [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290) +- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning) +- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma) +- [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) diff --git a/chapters/en/chapter11/10.mdx b/chapters/en/chapter11/10.mdx new file mode 100644 index 000000000..bd307e7e9 --- /dev/null +++ b/chapters/en/chapter11/10.mdx @@ -0,0 +1,56 @@ +# Implementing Evaluation + +In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate it on standard benchmarks; the library has a wide range of tasks built in. We just need to define the tasks we want to evaluate and the parameters for the evaluation. + +LightEval tasks are defined using a specific format: + +``` +{suite}|{task}|{num_few_shot}|{auto_reduce} +``` + +| Parameter | Description | +|-----------|-------------| +| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') | +| `task` | Specific task within the suite (e.g., 'abstract_algebra') | +| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) | +| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) | + +Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. + +## Example Evaluation Pipeline + +Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on a set of subtasks that relate to the domain of medicine. + +Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend: + +```bash +lighteval vllm \ + "pretrained=your-model-name" \ + "mmlu|anatomy|0|0" \ + "mmlu|high_school_biology|0|0" \ + "mmlu|high_school_chemistry|0|0" \ + "mmlu|professional_medicine|0|0" \ + --max_samples 40 \ + --batch_size 1 \ + --output_path "./results" \ + --save_generations true +``` + +Results are displayed in a tabular format showing: + +``` +| Task |Version|Metric|Value | |Stderr| +|----------------------------------------|------:|------|-----:|---|-----:| +|all | |acc |0.3333|± |0.1169| +|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121| +|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141| +|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| +``` + +Lighteval also includes a Python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information. + + + +✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval.
+ + \ No newline at end of file diff --git a/chapters/en/chapter11/11.mdx b/chapters/en/chapter11/11.mdx new file mode 100644 index 000000000..093de47d6 --- /dev/null +++ b/chapters/en/chapter11/11.mdx @@ -0,0 +1,13 @@ +# Conclusion + +In this chapter, we explored the essential components of fine-tuning language models: + +1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting. + +2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge. + +3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. + +4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. + +These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. diff --git a/chapters/en/chapter11/chat_templates.md b/chapters/en/chapter11/2.mdx similarity index 55% rename from chapters/en/chapter11/chat_templates.md rename to chapters/en/chapter11/2.mdx index 61ff65e6f..87a82f026 100644 --- a/chapters/en/chapter11/chat_templates.md +++ b/chapters/en/chapter11/2.mdx @@ -12,7 +12,7 @@ It's important to note that a base model could be fine-tuned on different chat t ## Understanding Chat Templates -At their core, chat templates define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template: +At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template: ```sh <|im_start|>user @@ -49,7 +49,7 @@ system_message = { ## Conversations -Chat templates maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: +Chat templates can maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: ```python conversation = [ @@ -59,56 +59,8 @@ conversation = [ ] ``` -## Implementation with Transformers + -The transformers library provides built-in support for chat templates. Here's how to use them: +✏️ **Try it out!** Create a chat template for a conversation between a user and an assistant. Then, use the `transformers` library to tokenize the conversation and see how the model responds. You won't need to download the model to do this, as the tokenizer will handle the formatting. 
-```python
-from transformers import AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
-
-messages = [
-    {"role": "system", "content": "You are a helpful coding assistant."},
-    {"role": "user", "content": "Write a Python function to sort a list"},
-]
-
-# Apply the chat template
-formatted_chat = tokenizer.apply_chat_template(
-    messages,
-    tokenize=False,
-    add_generation_prompt=True
-)
-```
-
-## Custom Formatting
-You can customize how different message types are formatted. For example, adding special tokens or formatting for different roles:
-
-```python
-template = """
-<|system|>{system_message}
-<|user|>{user_message}
-<|assistant|>{assistant_message}
-""".lstrip()
-```
-
-## Multi-Turn Support
-
-Templates can handle complex multi-turn conversations while maintaining context:
-
-```python
-messages = [
-    {"role": "system", "content": "You are a math tutor."},
-    {"role": "user", "content": "What is calculus?"},
-    {"role": "assistant", "content": "Calculus is a branch of mathematics..."},
-    {"role": "user", "content": "Can you give me an example?"},
-]
-```
-
-⏭️ [Next: Supervised Fine-Tuning](./supervised_fine_tuning.md)
-
-## Resources
-
-- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating)
-- [Transformers Documentation](https://huggingface.co/docs/transformers)
-- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates)
diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx
new file mode 100644
index 000000000..baa3972c1
--- /dev/null
+++ b/chapters/en/chapter11/3.mdx
@@ -0,0 +1,83 @@
+# Implementation with Transformers

Now that we understand how chat templates work, let's see how we can implement them using the `transformers` library. The library provides built-in support for chat templates; we just need to use the `apply_chat_template()` method to format our messages.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to sort a list"},
]

# Apply the chat template
formatted_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
```

This will return a formatted string that can be passed to the model. For the SmolLM2-135M-Instruct model specified above, it looks like this (note the trailing assistant header added by `add_generation_prompt=True`, which cues the model to respond):

```sh
<|im_start|>system
You are a helpful coding assistant.<|im_end|>
<|im_start|>user
Write a Python function to sort a list<|im_end|>
<|im_start|>assistant
```

Note that the `im_start` and `im_end` tokens are used to indicate the start and end of a message. The tokenizer will also have corresponding special tokens for the start and end of messages. For a refresher on how these tokens work, see the [Tokenizers](../chapter2/5.mdx) section.

Chat templates can handle multi-turn conversations while maintaining context:

```python
messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is calculus?"},
    {"role": "assistant", "content": "Calculus is a branch of mathematics..."},
    {"role": "user", "content": "Can you give me an example?"},
]
```

## Working with Chat Templates

When working with chat templates, you have several options for processing the conversation:

1. 
Apply the template without tokenization to return the raw formatted string
2. Apply the template with tokenization to return the token IDs
3. Add a generation prompt to prepare for model inference

The tokenizer's `apply_chat_template()` method handles all these cases through its parameters:

- `tokenize`: Whether to return token IDs (True) or the formatted string (False)
- `add_generation_prompt`: Whether to add a prompt for the model to generate a response

✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file.

For this exercise, you'll need to:
1. Load the dataset using the Hugging Face datasets library
2. Create a processing function that converts the samples into the correct chat format
3. Apply the chat template using the tokenizer's methods

## Conclusion

Chat templates are a crucial component for working with language models, especially when fine-tuning or deploying models for chat applications. They provide structure and consistency to conversations, making it easier for models to understand context and generate appropriate responses.

Understanding how to work with chat templates is essential for:
- Converting datasets for fine-tuning
- Preparing inputs for model inference
- Maintaining conversation context
- Ensuring consistent model behavior

## Resources

- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating)
- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates)
diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx
new file mode 100644
index 000000000..5ae7b7605
--- /dev/null
+++ b/chapters/en/chapter11/4.mdx
@@ -0,0 +1,125 @@
+# Supervised Fine-Tuning

Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples.

Because of the supervised structure of the task, the model can learn to generate structured outputs, such as the chat templates we created in the previous sections.

## Understanding Supervised Fine-Tuning

Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and to produce specific output structures, by training it on labeled examples. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case.

SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs.

## When to Use Supervised Fine-Tuning

The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains.

Two core reasons to use SFT are:

1. **Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. 
For example, you might need the model's responses to always follow a specific chat template or a required output structure.

2. **Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise.

## Quiz

### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)?

### 2. Which of the following are valid reasons to use SFT?

### 3. What is required for effective Supervised Fine-Tuning?

### 4. How does SFT relate to chat templates?

### 5. What distinguishes SFT from pre-training?
\ No newline at end of file
diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx
new file mode 100644
index 000000000..d19d3f5c5
--- /dev/null
+++ b/chapters/en/chapter11/5.mdx
@@ -0,0 +1,84 @@
+# Fine-Tuning Process with SFTTrainer in Transformer Reinforcement Learning

The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Let's work through the process step by step.

## Dataset Preparation

Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning.

## Training Configuration

We will configure the SFT trainer with the following parameters (see the sketch at the end of this section for how they come together in code):

| Parameter | Description |
|-----------|-------------|
| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) |
| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) |
| gradient_accumulation_steps | Number of batches to accumulate gradients over before performing an optimizer step, effectively increasing the batch size |
| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) |
| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations |
| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) |
| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) |
| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) |

In general, start with a small number of epochs and a small amount of data, using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance.

## Training and Evaluation

Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes:

- Iterating over the dataset
- Computing the loss
- Updating the model's parameters
- Regular evaluation on a validation set

Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities.
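Putting the parameters from the table above together, here is a minimal sketch of a training configuration using `trl`'s `SFTConfig`. The values shown are illustrative starting points taken from the table, not tuned recommendations:

```python
from trl import SFTConfig

# A sketch of the training configuration described above; every value is an
# illustrative starting point rather than a tuned recommendation.
training_args = SFTConfig(
    output_dir="./sft_output",
    num_train_epochs=1,  # start small: 1-3 epochs
    per_device_train_batch_size=4,  # examples per GPU per forward/backward pass
    gradient_accumulation_steps=4,  # effective batch size = 4 * 4 = 16 per GPU
    learning_rate=2e-4,  # typical starting point for fine-tuning
    gradient_checkpointing=True,  # trade extra compute for lower memory use
    warmup_ratio=0.03,  # 3% of steps used for learning rate warmup
    logging_steps=10,  # log metrics every 10 steps
    save_strategy="epoch",  # checkpoint at the end of each epoch
)
```

Because `SFTConfig` extends 🤗 Transformers' `TrainingArguments`, all of the usual training options are available here as well.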
## `SFTTrainer` from Transformer Reinforcement Learning

Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates the major post-training processes used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO).

Here's a simplified example of how to use `SFTTrainer` to fine-tune a model.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")

training_args = SFTConfig(
    max_seq_length=512,
    output_dir="/tmp",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",  # pass a model id and SFTTrainer loads it for us
    train_dataset=dataset["train"],
    args=training_args,
)
trainer.train()
```

✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.

For this exercise, you'll need to:
1. Load and prepare your chosen dataset
2. Configure the SFTTrainer with appropriate parameters
3. Train the model and monitor its progress
4. Save and evaluate the fine-tuned model

## Resources

- [TRL Documentation](https://huggingface.co/docs/trl)
- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft)
\ No newline at end of file
diff --git a/chapters/en/chapter11/6.mdx b/chapters/en/chapter11/6.mdx
new file mode 100644
index 000000000..a81bfe762
--- /dev/null
+++ b/chapters/en/chapter11/6.mdx
@@ -0,0 +1,109 @@
+# Supervised Fine-Tuning with SFTTrainer

This page demonstrates how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer` from the `trl` library. The notebook cells below run and will finetune the model. You can select your difficulty level by trying out different datasets.

## Load the base model

Here we'll load the base model and tokenizer. We'll also set up the chat format for the model.

```python
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, setup_chat_format
import torch

# Set the device to use for training
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Load the model and tokenizer
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)

# Set up the chat format
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

# Set a name for the finetune to be saved and/or uploaded under
finetune_name = "DeepSeek-R1-FT-MyDataset"
finetune_tags = ["smol-course", "module_1"]
```

## Generate with the base model

First, let's see how the base model responds before any fine-tuning, using the chat format we just set up.
```python
# Let's test the base model before training
prompt = "Write a haiku about programming"

# Format with template
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Generate response
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode and inspect the response
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Dataset Preparation

We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.

**TRL will format input messages based on the model's chat template.** They need to be represented as a list of dictionaries with the keys `role` and `content`.

```python
dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
```

## Configuring the SFTTrainer

The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.

```python
# Configure the SFTTrainer
sft_config = SFTConfig(
    output_dir="./sft_output",
    max_steps=1000,  # Adjust based on dataset size and desired training duration
    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
    learning_rate=5e-5,  # Common starting point for fine-tuning
    logging_steps=10,  # Frequency of logging training metrics
    save_steps=100,  # Frequency of saving model checkpoints
    evaluation_strategy="steps",  # Evaluate the model at regular intervals
    eval_steps=50,  # Frequency of evaluation
    use_mps_device=(
        True if device == "mps" else False
    ),  # Use MPS (Apple Silicon) when it is the selected device
    hub_model_id=finetune_name,  # Set a unique name for your model
)

# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    eval_dataset=dataset["test"],
)
```

## Training the model

With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.

```python
trainer.train()
```

## 💐 Nice work!

This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:

- Try this notebook on a harder difficulty
- Review a colleague's PR
- Improve the course material via an Issue or PR
diff --git a/chapters/en/chapter11/7.mdx b/chapters/en/chapter11/7.mdx
new file mode 100644
index 000000000..a47bd1f42
--- /dev/null
+++ b/chapters/en/chapter11/7.mdx
@@ -0,0 +1,120 @@
+# LoRA (Low-Rank Adaptation)

Fine-tuning large language models is a resource-intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of trainable parameters. It works by adding small low-rank matrices alongside the attention weights and optimizing only those, typically reducing trainable parameters by about 90%.
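To see where a saving of that scale comes from, here is a quick back-of-the-envelope calculation. The layer shape and rank below are illustrative assumptions; the overall reduction depends on which modules you adapt and the rank you choose:

```python
# Parameters in a full weight update vs. a LoRA update for one layer.
# Illustrative shapes: a single 4096 x 4096 attention projection, rank 8.
d, k, r = 4096, 4096, 8

full_update = d * k  # updating W directly: 16,777,216 parameters
lora_update = d * r + r * k  # low-rank factors A (d x r) and B (r x k): 65,536

print(f"full:      {full_update:,}")
print(f"lora:      {lora_update:,}")
print(f"reduction: {1 - lora_update / full_update:.1%}")  # ~99.6% for this layer
```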
## Understanding LoRA

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685).

LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable.

## Key advantages of LoRA

1. **Memory Efficiency**:
   - Only adapter parameters are stored in GPU memory
   - Base model weights remain frozen and can be loaded in lower precision
   - Enables fine-tuning of large models on consumer GPUs

2. **Training Features**:
   - Native PEFT/LoRA integration with minimal setup
   - Support for QLoRA (Quantized LoRA) for even better memory efficiency

3. **Adapter Management**:
   - Adapter weight saving during checkpoints
   - Features to merge adapters back into the base model

## Loading LoRA Adapters with PEFT

PEFT (Parameter-Efficient Fine-Tuning) is a library that provides a unified interface for loading and managing parameter-efficient fine-tuning methods, including LoRA. It allows you to easily load and switch between different methods, making it easier to experiment with different fine-tuning techniques.

Adapters can be loaded onto a pretrained model with `load_adapter()`, which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the `set_adapter()` function. To return to the base model, you can use `unload()` to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# "base_model_name" and "peft_adapter_id" are placeholders; replace them with
# the ids of your base model and trained adapter.
base_model = AutoModelForCausalLM.from_pretrained("base_model_name")
peft_model_id = "peft_adapter_id"
model = PeftModel.from_pretrained(base_model, peft_model_id)
```

![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png)

## Fine-tune an LLM using `trl` and the `SFTTrainer` with LoRA

The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train.

LoRA can also be combined with 4-bit quantization (an approach known as QLoRA) to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps:

1. Define the LoRA configuration (rank, alpha, dropout)
2. Create the SFTTrainer with the PEFT config
3. Train and save the adapter weights

## LoRA Configuration

Let's walk through the LoRA configuration and key parameters.
| Parameter | Description |
|-----------|-------------|
| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. |
| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. Higher values result in stronger adaptation effects. |
| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. |
| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. |
| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. |

When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key.

## Using TRL with PEFT

PEFT methods can be combined with TRL (Transformer Reinforcement Learning) for fine-tuning to reduce memory requirements. We can define a `LoraConfig` and pass it to the `SFTTrainer`, which applies it to the model for us.

```python
from peft import LoraConfig

# Configure LoRA parameters
# r: rank dimension for LoRA update matrices (smaller = more compression)
rank_dimension = 6
# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
lora_alpha = 8
# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
lora_dropout = 0.05

peft_config = LoraConfig(
    r=rank_dimension,  # Rank dimension - typically between 4-32
    lora_alpha=lora_alpha,  # LoRA scaling factor - typically 2x rank
    lora_dropout=lora_dropout,  # Dropout probability for LoRA layers
    bias="none",  # Bias handling: with "none", no bias terms are trained
    target_modules="all-linear",  # Which modules to apply LoRA to
    task_type="CAUSAL_LM",  # Task type for model architecture
)
```

We will also need to define the `SFTTrainer` with the LoRA configuration.

```python
# Create SFTTrainer with LoRA configuration
trainer = SFTTrainer(
    model=model,
    args=training_args,  # an SFTConfig like the one we defined earlier
    train_dataset=dataset["train"],
    peft_config=peft_config,  # LoRA configuration
    max_seq_length=512,  # Maximum sequence length
    tokenizer=tokenizer,
)
```

✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above.
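As a hint for the exercise: before handing the configuration to the trainer, you can sanity-check it by wrapping the model with PEFT yourself and inspecting the trainable-parameter count. A minimal sketch, assuming the `model` and `peft_config` from above:

```python
from peft import get_peft_model

# Wrap the base model with the LoRA configuration defined above.
# SFTTrainer does this internally when you pass `peft_config`, so this is
# only an inspection step, not part of the training setup.
lora_model = get_peft_model(model, peft_config)

# Prints a line like: trainable params: ... || all params: ... || trainable%: ...
lora_model.print_trainable_parameters()
```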
## Resources

- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685)
- [PEFT Documentation](https://huggingface.co/docs/peft)
- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft)
\ No newline at end of file
diff --git a/chapters/en/chapter11/8.mdx b/chapters/en/chapter11/8.mdx
new file mode 100644
index 000000000..acc3fdabf
--- /dev/null
+++ b/chapters/en/chapter11/8.mdx
@@ -0,0 +1,47 @@
+# Merging LoRA Adapters

After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference.

The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will help with automatic memory management. Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training, and save the merged model in the same format for deployment. Before deploying, always validate the merged model by comparing its outputs and performance metrics with the adapter-based version.

## Merging Implementation

After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# 1. Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "base_model_name",
    torch_dtype=torch.float16,
    device_map="auto"
)

# 2. Load the PEFT model with adapter
peft_model = PeftModel.from_pretrained(
    base_model,
    "path/to/adapter",
    torch_dtype=torch.float16
)

# 3. Merge adapter weights with base model
merged_model = peft_model.merge_and_unload()
```

If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer:

```python
from transformers import AutoTokenizer

# Save both model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("base_model_name")
merged_model.save_pretrained("path/to/save/merged_model")
tokenizer.save_pretrained("path/to/save/merged_model")
```

✏️ **Try it out!** Merge the adapter weights from the previous exercise (the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model fine-tuned with LoRA on `HuggingFaceTB/smoltalk`) back into the base model, then compare the merged model's outputs with the adapter-based version.

diff --git a/chapters/en/chapter11/9.mdx b/chapters/en/chapter11/9.mdx
new file mode 100644
index 000000000..df0b28765
--- /dev/null
+++ b/chapters/en/chapter11/9.mdx
@@ -0,0 +1,185 @@
+# Evaluation

Now that we have a model finetuned through either SFT or LoRA SFT, we should evaluate it on standard benchmarks.

## Automatic Benchmarks

Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy.

## Understanding Automatic Benchmarks

Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. 
The key advantage of using automatic benchmarks is their standardization: they allow for consistent comparison across different models and provide reproducible results.

However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases.

## General Knowledge Benchmarks

MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation.

## Reasoning Benchmarks

BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios.

## Language Understanding

HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology.

## Alternative Evaluation Approaches

Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks:

### LLM-as-Judge

Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations.

### Evaluation Arenas

Platforms like Chatbot Arena let models face off in head-to-head comparisons, with human voters deciding which response is better. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks.

### Custom Benchmark Suites

Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions.

## Creating Your Own Evaluation Strategy

While LightEval makes it easy to run standard benchmarks, these shouldn't be your only evaluation method; you should also invest time in developing evaluation methods specific to your use case. Standard benchmarks provide a useful baseline, and here's how to develop a more comprehensive approach on top of them:

1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models.

2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic?

3. Develop custom evaluation datasets that reflect your actual use case. This might include:
   - Real user queries from your domain
   - Common edge cases you've encountered
   - Examples of particularly challenging scenarios

4. Consider implementing a multi-layered evaluation strategy:
   - Automated metrics for quick feedback
   - Human evaluation for nuanced understanding
   - Domain expert review for specialized applications
   - A/B testing in controlled environments

# End-of-chapter quiz[[end-of-chapter-quiz]]

### 1. 
What are the main advantages of using automatic benchmarks for model evaluation? + + + +### 2. Which benchmark specifically tests knowledge across 57 different subjects? + + + +### 3. What is LLM-as-Judge? + + + +### 4. What should be included in a comprehensive evaluation strategy? + + + +### 5. What is a limitation of automatic benchmarks? + + + +### 6. What is the purpose of creating custom evaluation datasets? + + + diff --git a/chapters/en/chapter11/README.md b/chapters/en/chapter11/README.md deleted file mode 100644 index a7fae79c6..000000000 --- a/chapters/en/chapter11/README.md +++ /dev/null @@ -1,30 +0,0 @@ -# Instruction Tuning - -This module will guide you through instruction tuning language models. Instruction tuning involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. - -In this module, we will explore two topics: 1) Chat Templates and 2) Supervised Fine-Tuning. - -## 1️⃣ Chat Templates - -Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages. For more detailed information, refer to the [Chat Templates](./chat_templates.md) section. - -## 2️⃣ Supervised Fine-Tuning - -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. For a detailed guide on SFT, including key steps and best practices, see the [Supervised Fine-Tuning](./supervised_fine_tuning.md) page. - -## Exercise Notebooks - -| Title | Description | Exercise | Link | Colab | -|-------|-------------|----------|------|-------| -| Chat Templates | Learn how to use chat templates with SmolLM2 and process datasets into chatml format | 🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format
🐕 Convert the `openai/gsm8k` dataset into chatml format | [Notebook](./notebooks/chat_templates_example.ipynb) | Open In Colab | -| Supervised Fine-Tuning | Learn how to fine-tune SmolLM2 using the SFTTrainer | 🐢 Use the `HuggingFaceTB/smoltalk` dataset
🐕 Try out the `bigcode/the-stack-smol` dataset
🦁 Select a dataset for a real world use case | [Notebook](./notebooks/sft_finetuning_example.ipynb) | Open In Colab | - -## References - -- [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating) -- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py) -- [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer) -- [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290) -- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning) -- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma) -- [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) diff --git a/chapters/en/chapter11/chat_templates_example.ipynb b/chapters/en/chapter11/chat_templates_example.ipynb deleted file mode 100644 index 88f60c1e4..000000000 --- a/chapters/en/chapter11/chat_templates_example.ipynb +++ /dev/null @@ -1,5741 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "vZAvFVIAtFlq" - }, - "source": [ - "# Exploring Chat Templates with SmolLM2\n", - "\n", - "This notebook demonstrates how to use chat templates with the `SmolLM2` model. Chat templates help structure interactions between users and AI models, ensuring consistent and contextually appropriate responses." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "K-lZu8JvtwUN", - "outputId": "c3871418-15bc-4265-ae8d-6d6036036d0e" - }, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "c15d320002504d95bb86e87f50d43b08", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "VBox(children=(HTML(value='
user\n", - "Hello, how are you?<|im_end|>\n", - "<|im_start|>assistant\n", - "I'm doing well, thank you! How can I assist you today?<|im_end|>\n", - "\n" - ] - } - ], - "source": [ - "input_text = tokenizer.apply_chat_template(messages, tokenize=False)\n", - "\n", - "print(\"Conversation with template:\", input_text)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sfvdglOqtFls" - }, - "source": [ - "# Decode the conversation\n", - "\n", - "Note that the conversation is represented as above but with a further assistant message.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "mXUVdPeytFls", - "outputId": "80870e53-7bc1-426e-ac33-ba6748e030fc" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Conversation decoded: <|im_start|>user\n", - "Hello, how are you?<|im_end|>\n", - "<|im_start|>assistant\n", - "I'm doing well, thank you! How can I assist you today?<|im_end|>\n", - "<|im_start|>assistant\n", - "\n" - ] - } - ], - "source": [ - "input_text = tokenizer.apply_chat_template(\n", - " messages, tokenize=True, add_generation_prompt=True\n", - ")\n", - "\n", - "print(\"Conversation decoded:\", tokenizer.decode(token_ids=input_text))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "UcZQpspEtFlt" - }, - "source": [ - "# Tokenize the conversation\n", - "\n", - "Of course, the tokenizer also tokenizes the conversation and special token as ids that relate to the model's vocabulary.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "jc2PLxAMtFlt", - "outputId": "d2098780-b3f4-41ec-a1f3-b6da2b593c62" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Conversation tokenized: [1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198, 1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17, 1073, 416, 339, 4237, 346, 1834, 47, 2, 198, 1, 520, 9531, 198]\n" - ] - } - ], - "source": [ - "input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)\n", - "\n", - "print(\"Conversation tokenized:\", input_text)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "m3eNp9a0tFlt" - }, - "source": [ - "
\n", - "

Exercise: Process a dataset for SFT

\n", - "

Take a dataset from the Hugging Face hub and process it for SFT.

\n", - "

Difficulty Levels

\n", - "

🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format.

\n", - "

🐕 Convert the `openai/gsm8k` dataset into chatml format.

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 381 - }, - "id": "qbkXV2_ItFlt", - "outputId": "06deadc3-2c63-4660-d2bd-05096ef07c9f" - }, - "outputs": [ - { - "data": { - "text/html": [ - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from IPython.core.display import display, HTML\n", - "\n", - "display(\n", - " HTML(\n", - " \"\"\"\n", - "\"\"\"\n", - " )\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 241, - "referenced_widgets": [ - "c2d74a42fb574b8892d0a288fd92f0a6", - "056b9ef5706843b19cd62fce75743afb", - "17b4d81e40564a53bb79be9fbef4918e", - "951f60cddcb84dfdbbdf2058369f0541", - "646484cf7a36444daebe1dfe4a0e4150", - "e2f0c39ce1c046e8acb150dfbfaf5aa8", - "7eb12d70d2b542a7b651c7680f590279", - "ea1f9cb22abf4e7d9f6e76fc86c03387", - "00c9f5ca71b84df4b26acee72c97fefb", - "505f96bc0c7843bcb1498ba1c1ba5f06", - "635cc2881a1e4b8788bb26c356740e04", - "a6ee323c13904525a99c6f092ba96e18", - "67fffe7d6f8c4963972b408529e05532", - "0055b6b628934affaf88bc58a1572bb6", - "aafbbb9fc5164fa3a88193bfd33d2f79", - "606e39d53ed64967a60337418c71c595", - "26b15fa18b1b4963a1ba76a76675e7ee", - "db09ab1f79db4f3a8de77f0348eca0f7", - "de04f344a8d4428e8ba1836a563d8aa1", - "03c09673186046d799d6f487d6623e6b", - "1cc682330b24431b8812c73041e987d0", - "dafe748a452148038779f6a62a22a4ec", - "addad1c100024c44a0959978153da9a8", - "9bea2a23db644ad19b708d10e35d54ee", - "d1174b127571420593971166fbb1966b", - "add90ed3746d4293a1b71198137a892c", - "8def25e6389f4e6192b517b6e80aa05e", - "c9747e7a810f413ba1ea108307e3ad1d", - "d0ea49d1d90f4d34bf2ae70efa96946e", - "59d0997b85614384bbfebeee928340b6", - "269920491c134501873e0110367bc984", - "384d26051c04460e8870a3ffe9406c48", - "8e8a0e89a50646c897e546c4077db79e", - "ff60308921f9432683acbcd6d29fb78f", - "3bc8f6339f4e4a3b961d810255c5573e", - "4780ad263ec04b1a97525d985e102049", - "488feef55878426bbf1c753c6d58735b", - "560ba45d70ca431dadeb327d234c330a", - "04d0a6f74af346f7bc696951949063c8", - "2a18ce941b0f4cef8307988ef898b47f", - "194e3fda3635466b998f96e3dc22746a", - "e2ab3cb38b5a41f68d18ed5f0e6ae22c", - "f0b271bcac6c43a9aaddac54259bb514", - "0dc93d50a283472f9ca64fd0a4c6ff15", - "dd1a50d4497144388a1809b78bb38f58", - "6b72a856e5bd4812a5e0dd0c3bfb8455", - "4e21a567d1f6461985727823b37166e1", - "ec1efb7598fd496bb170673ae1b8a1df", - "84f393468aa74baa903243d238b2d387", - "a54ce365be104d27aaa15cf8c63b5ebe", - "1791220377d141ac9b307246177d0712", - "fa330d4f0fb241aebd065f6ef4a6892c", - "cfa1cc6eed8a4f7791a7959308456b6b", - "b50c9c4433854cf7a6b2593e946b7faa", - "7557cd24ba9b4aa3955866d59db94519", - "cc608dfb880c49d4bc5acf2d691b8ec6", - "cb838c5bed994a9a8e6fcf5c98b76d17", - "76bbe8c2beba4c0594085d32a68d2ee7", - "c9836c952b07472880649b82e2347e8d", - "383db57f997140d482b82b123080837a", - "182abc7ec4d944d9bb2ec1281c98b4c8", - "6934c6d1cbac44dbb08f3fffe3056edb", - "05fa0f6eb78b4c56b219b0e57521bd2e", - "012aa94e3cf24e32833c6bbca23c52f7", - "76c1a1cdc9054bbe90d0d3b662cf0ed1", - "e453f1672772400a851735ba64f42c8b", - "d1358f6b16644cb3a2328ca639a4a77a", - "c19f60d4028045399c62004027eaafd9", - "8055588a1fa940239c801ef66f3ecf3b", - "7468a9bc8bda44e5b44574c64fdc6803", - "a13a8f8b702e44ed88c7d358a0a8b4b4", - "13367fbb763747fa8de94cde40ffae32", - "b1fcf477db664ccdade4096fb79de327", - "9d1c06ac6b774d82adca58773f389161", - 
"31910159cf30463b8246ec47ffd8ab5b", - "72220420f9d340eabec13a01caebc92c", - "55b14c03a41c495aacf8ac2d0f96ba0b" - ] - }, - "id": "4p3atw4_tFlu", - "outputId": "62ee9812-3819-4a9c-9e24-5687368ffcd8" - }, - "outputs": [], - "source": [ - "from datasets import load_dataset\n", - "\n", - "ds = load_dataset(\"HuggingFaceTB/smoltalk\", \"everyday-conversations\")\n", - "\n", - "\n", - "def process_dataset(sample):\n", - " # TODO: 🐢 Convert the sample into a chat format\n", - " # use the tokenizer's method to apply the chat template\n", - " return sample\n", - "\n", - "\n", - "ds = ds.map(process_dataset)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 381 - }, - "id": "81fQeazltFlu", - "outputId": "36cf7148-9881-4f13-d0ce-76c82c4ab219" - }, - "outputs": [ - { - "data": { - "text/html": [ - "\n" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "display(\n", - " HTML(\n", - " \"\"\"\n", - "\"\"\"\n", - " )\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true, - "id": "bWUSv7NMtFlu" - }, - "outputs": [], - "source": [ - "ds = load_dataset(\"openai/gsm8k\", \"main\")\n", - "\n", - "\n", - "def process_dataset(sample):\n", - " # TODO: 🐕 Convert the sample into a chat format\n", - "\n", - " # 1. create a message format with the role and content\n", - "\n", - " # 2. apply the chat template to the samples using the tokenizer's method\n", - "\n", - " return sample\n", - "\n", - "\n", - "ds = ds.map(process_dataset)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qlXCuRKotFlu" - }, - "source": [ - "## Conclusion\n", - "\n", - "This notebook demonstrated how to apply chat templates to different models, `SmolLM2`. By structuring interactions with chat templates, we can ensure that AI models provide consistent and contextually relevant responses.\n", - "\n", - "In the exercise you tried out converting a dataset into chatml format. Luckily, TRL will do this for you, but it's useful to understand what's going on under the hood." 
- ] - } - ], - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "display_name": ".venv", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.10" - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "0055b6b628934affaf88bc58a1572bb6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_de04f344a8d4428e8ba1836a563d8aa1", - "max": 946449, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_03c09673186046d799d6f487d6623e6b", - "value": 946449 - } - }, - "00c9f5ca71b84df4b26acee72c97fefb": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "012aa94e3cf24e32833c6bbca23c52f7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "016d5e929f1240cea067372b2191d107": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "01e0f8a799ad479eb95eef3e5a09bd70": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - 
"model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_8fe2df9a14a0436c9124a856ac7419e4", - "IPY_MODEL_d108e029e743419989e30f64f0c82b90", - "IPY_MODEL_bfd11f21f197459b8f27ef364bc9b264" - ], - "layout": "IPY_MODEL_76a0341ebe9f4c3face32460d7023be9" - } - }, - "0206fb9662a349c1aa8a6d87ce01c388": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "03c09673186046d799d6f487d6623e6b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "0479fd3fc1ba476ab46f8c0a98f89468": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "04ae3f7b640c42f3a8eb1977cd1a585d": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - 
"grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "04d0a6f74af346f7bc696951949063c8": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "056b9ef5706843b19cd62fce75743afb": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_e2f0c39ce1c046e8acb150dfbfaf5aa8", - "placeholder": "​", - "style": "IPY_MODEL_7eb12d70d2b542a7b651c7680f590279", - "value": "README.md: 100%" - } - }, - "05fa0f6eb78b4c56b219b0e57521bd2e": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - 
"visibility": null, - "width": null - } - }, - "0942430d36de4677b4c2fa771d7bcd2a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "0bab42beb845475684e9e71dd1591e1d": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "0c336ea5c653434da49e2f0e949f83d0": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "0dc93d50a283472f9ca64fd0a4c6ff15": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "10a0f37020d44156a11e9750778892e0": { - "model_module": 
"@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "13367fbb763747fa8de94cde40ffae32": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "1513792bad534a0c9c381a131395c519": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_76d306c21214412ab44e542d82e547aa", - "placeholder": "​", - "style": "IPY_MODEL_b9e41ef9e9c54fa7b71bc333604af74e", - "value": " 831/831 [00:00<00:00, 42.7kB/s]" - } - }, - "17023310de9b4c3ebd8cc03758d59ef9": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": 
null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "1791220377d141ac9b307246177d0712": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "17b4d81e40564a53bb79be9fbef4918e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_ea1f9cb22abf4e7d9f6e76fc86c03387", - "max": 9251, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_00c9f5ca71b84df4b26acee72c97fefb", - "value": 9251 - } - }, - "182abc7ec4d944d9bb2ec1281c98b4c8": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "194e3fda3635466b998f96e3dc22746a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": 
"@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "1cc682330b24431b8812c73041e987d0": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "269920491c134501873e0110367bc984": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "26b15fa18b1b4963a1ba76a76675e7ee": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - 
"left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "26ed0f1bae204d74a313d101d9355e90": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_7612cc9b8908471b90c9118151d6e447", - "placeholder": "​", - "style": "IPY_MODEL_b687aca79e6e470b96254c5e309d6d63", - "value": "generation_config.json: 100%" - } - }, - "2a18ce941b0f4cef8307988ef898b47f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "31910159cf30463b8246ec47ffd8ab5b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "383db57f997140d482b82b123080837a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "384d26051c04460e8870a3ffe9406c48": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - 
"_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "3b881514716c47308061fe85b810a6a4": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_26ed0f1bae204d74a313d101d9355e90", - "IPY_MODEL_4ff5af1784904bc9b85515105885e2d8", - "IPY_MODEL_b3c42d7e25d6494993029531adc3866d" - ], - "layout": "IPY_MODEL_6227b40396ea4024b3c8710c5e65601f" - } - }, - "3bc8f6339f4e4a3b961d810255c5573e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_04d0a6f74af346f7bc696951949063c8", - "placeholder": "​", - "style": "IPY_MODEL_2a18ce941b0f4cef8307988ef898b47f", - "value": "Generating train split: 100%" - } - }, - "3cc519fd92fe4b328943ec839115b63e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_e15fc503bb73476980cedb5f06b51ced", - "IPY_MODEL_d8c5dc8df3be4e65b2bbba020d29150f", - "IPY_MODEL_c0177c4ad18740d88acfc603ce4735f8" - ], - "layout": "IPY_MODEL_eb570fd159124e2cbd2df9335b3f9cd6" - } - }, - "3fa18e3b50104af796bd0887f556224a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": 
null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "41a27cf0a91246599d4d1b7dae7c7863": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "45675fb5f5c94f8cae575582f7ae41a7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_7eb91920e4384194a008902d6c4a09c7", - "placeholder": "​", - "style": "IPY_MODEL_b379da78cb34463aa5a72eedc3d176cd", - "value": " 704/704 [00:00<00:00, 36.5kB/s]" - } - }, - "471b481a3e5b4d439ab31fdc49fc99c7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "4780ad263ec04b1a97525d985e102049": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_194e3fda3635466b998f96e3dc22746a", - "max": 2260, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_e2ab3cb38b5a41f68d18ed5f0e6ae22c", - "value": 2260 - } - }, - "48724ba7ba4e4f00923445245640739f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "488feef55878426bbf1c753c6d58735b": { - "model_module": "@jupyter-widgets/controls", - 
"model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_f0b271bcac6c43a9aaddac54259bb514", - "placeholder": "​", - "style": "IPY_MODEL_0dc93d50a283472f9ca64fd0a4c6ff15", - "value": " 2260/2260 [00:00<00:00, 21556.99 examples/s]" - } - }, - "4bfa3103048a47989a09a0d90ac6b9bf": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "4e21a567d1f6461985727823b37166e1": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_fa330d4f0fb241aebd065f6ef4a6892c", - "max": 119, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_cfa1cc6eed8a4f7791a7959308456b6b", - "value": 119 - } - }, - "4ff5af1784904bc9b85515105885e2d8": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_3fa18e3b50104af796bd0887f556224a", - "max": 111, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_4bfa3103048a47989a09a0d90ac6b9bf", - "value": 111 - } - }, - "505f96bc0c7843bcb1498ba1c1ba5f06": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": 
null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "50dbf8861ca94b0ba1f4a7e2f0d8aead": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_d17c62b889754b5d88cfced5b18ff7a7", - "placeholder": "​", - "style": "IPY_MODEL_990f706db474450ba0997d1dbcd53cb7", - "value": " 269M/269M [00:06<00:00, 43.2MB/s]" - } - }, - "5291041c86db4933816088c047d659d8": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "530fc4c2bf1244628af7dea3e4b35cdf": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "547151540399460fb9a946bbe67afbd9": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - 
"state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "547eeb64ffd34e509c0b8b8ba6d657e2": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_cbc312cb858b48a5a0f8dbcf60b7e684", - "max": 704, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_f70401b6dba74380b19bd1ef887b3bf7", - "value": 704 - } - }, - "55b14c03a41c495aacf8ac2d0f96ba0b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "560ba45d70ca431dadeb327d234c330a": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - 
"58fb913274b54a60a832513c09608a2f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "59d0997b85614384bbfebeee928340b6": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "5b7b09d983844f7893bdda411f9a0076": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_0206fb9662a349c1aa8a6d87ce01c388", - "placeholder": "​", - "style": "IPY_MODEL_881b6196dfa0446e8c55a2420e484b6b", - "value": " 2.10M/2.10M [00:00<00:00, 20.7MB/s]" - } - }, - "5de5dab3d92f4f41838a8f302d27f0c3": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, 
- "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "606e39d53ed64967a60337418c71c595": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "6227b40396ea4024b3c8710c5e65601f": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "635cc2881a1e4b8788bb26c356740e04": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "646484cf7a36444daebe1dfe4a0e4150": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - 
"display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "648c3c820b39493daf0cce5f57a55467": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "67fffe7d6f8c4963972b408529e05532": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_26b15fa18b1b4963a1ba76a76675e7ee", - "placeholder": "​", - "style": "IPY_MODEL_db09ab1f79db4f3a8de77f0348eca0f7", - "value": "train-00000-of-00001.parquet: 100%" - } - }, - "6934c6d1cbac44dbb08f3fffe3056edb": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "69f38fecf8ad403898634cfdfadf8925": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - 
"visibility": null, - "width": null - } - }, - "6b72a856e5bd4812a5e0dd0c3bfb8455": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_a54ce365be104d27aaa15cf8c63b5ebe", - "placeholder": "​", - "style": "IPY_MODEL_1791220377d141ac9b307246177d0712", - "value": "Generating test split: 100%" - } - }, - "6ceb292f2b8544f2a9a005d16d3e8978": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "70f0eaed6ef14c2db8aecb592edeb1ad": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "72220420f9d340eabec13a01caebc92c": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - 
"_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "745fb1db425e44e5b3a23b36ae7675d1": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "7468a9bc8bda44e5b44574c64fdc6803": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_72220420f9d340eabec13a01caebc92c", - "placeholder": "​", - "style": "IPY_MODEL_55b14c03a41c495aacf8ac2d0f96ba0b", - "value": " 119/119 [00:00<00:00, 2302.28 examples/s]" - } - }, - "7557cd24ba9b4aa3955866d59db94519": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "7612cc9b8908471b90c9118151d6e447": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - 
"_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "76a0341ebe9f4c3face32460d7023be9": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "76bbe8c2beba4c0594085d32a68d2ee7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_05fa0f6eb78b4c56b219b0e57521bd2e", - "max": 2260, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_012aa94e3cf24e32833c6bbca23c52f7", - "value": 2260 - } - }, - "76c1a1cdc9054bbe90d0d3b662cf0ed1": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - 
"grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "76d306c21214412ab44e542d82e547aa": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "76febcd912404a58add3a39f80a8218d": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_0bab42beb845475684e9e71dd1591e1d", - "max": 3658, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_89ecd1b28ab64c90afe3b9736fd48306", - "value": 3658 - } - }, - "77d3d81687e6417ab988b04984fc68f4": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_17023310de9b4c3ebd8cc03758d59ef9", - "placeholder": "​", - "style": "IPY_MODEL_f3e23f781bce4429954d76bfea97aff4", - "value": "special_tokens_map.json: 100%" - } - }, - "77f6c27c3c854138b4aa9789637141a1": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - 
"_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "7a0c705334694da6b750104b28db6dba": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "7b20c7c8f6be40c6815b8531ecb9c936": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_d32017fa83aa44f6b2e3443a602654be", - "placeholder": "​", - "style": "IPY_MODEL_ff8debfb713f4b88be6b9b3bf33bfca2", - "value": "tokenizer.json: 100%" - } - }, - "7eb12d70d2b542a7b651c7680f590279": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "7eb91920e4384194a008902d6c4a09c7": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - 
"max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "8055588a1fa940239c801ef66f3ecf3b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_9d1c06ac6b774d82adca58773f389161", - "max": 119, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_31910159cf30463b8246ec47ffd8ab5b", - "value": 119 - } - }, - "84f393468aa74baa903243d238b2d387": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "85de66e1ee3140cf85eadebe5fea1e9f": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - 
"881b6196dfa0446e8c55a2420e484b6b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "89ecd1b28ab64c90afe3b9736fd48306": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "8def25e6389f4e6192b517b6e80aa05e": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "8e8a0e89a50646c897e546c4077db79e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "8fe2df9a14a0436c9124a856ac7419e4": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_da1a999fb5af4eae9f6a9d1086cbb4cf", - "placeholder": "​", - "style": "IPY_MODEL_77f6c27c3c854138b4aa9789637141a1", - "value": "vocab.json: 100%" - } - }, - "951f60cddcb84dfdbbdf2058369f0541": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - 
"_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_505f96bc0c7843bcb1498ba1c1ba5f06", - "placeholder": "​", - "style": "IPY_MODEL_635cc2881a1e4b8788bb26c356740e04", - "value": " 9.25k/9.25k [00:00<00:00, 428kB/s]" - } - }, - "96c2aae9198441569362135ad4bcbc98": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "990f706db474450ba0997d1dbcd53cb7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "9bea2a23db644ad19b708d10e35d54ee": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_c9747e7a810f413ba1ea108307e3ad1d", - "placeholder": "​", - "style": "IPY_MODEL_d0ea49d1d90f4d34bf2ae70efa96946e", - "value": "test-00000-of-00001.parquet: 100%" - } - }, - "9d1c06ac6b774d82adca58773f389161": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "a026a32dd6d646bea82c1ebb06147d89": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": 
"1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "a13a8f8b702e44ed88c7d358a0a8b4b4": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "a54ce365be104d27aaa15cf8c63b5ebe": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "a6ee323c13904525a99c6f092ba96e18": { - 
"model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_67fffe7d6f8c4963972b408529e05532", - "IPY_MODEL_0055b6b628934affaf88bc58a1572bb6", - "IPY_MODEL_aafbbb9fc5164fa3a88193bfd33d2f79" - ], - "layout": "IPY_MODEL_606e39d53ed64967a60337418c71c595" - } - }, - "aa2d32cb76ba47ebaa5ea391efbf58a7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_7b20c7c8f6be40c6815b8531ecb9c936", - "IPY_MODEL_e90b58981bd34d0e8f975fc1a9658c4c", - "IPY_MODEL_5b7b09d983844f7893bdda411f9a0076" - ], - "layout": "IPY_MODEL_70f0eaed6ef14c2db8aecb592edeb1ad" - } - }, - "aafbbb9fc5164fa3a88193bfd33d2f79": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_1cc682330b24431b8812c73041e987d0", - "placeholder": "​", - "style": "IPY_MODEL_dafe748a452148038779f6a62a22a4ec", - "value": " 946k/946k [00:00<00:00, 28.7MB/s]" - } - }, - "add90ed3746d4293a1b71198137a892c": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_384d26051c04460e8870a3ffe9406c48", - "placeholder": "​", - "style": "IPY_MODEL_8e8a0e89a50646c897e546c4077db79e", - "value": " 52.6k/52.6k [00:00<00:00, 2.34MB/s]" - } - }, - "addad1c100024c44a0959978153da9a8": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_9bea2a23db644ad19b708d10e35d54ee", - "IPY_MODEL_d1174b127571420593971166fbb1966b", - "IPY_MODEL_add90ed3746d4293a1b71198137a892c" - ], - "layout": "IPY_MODEL_8def25e6389f4e6192b517b6e80aa05e" - } - }, - "ae2690497e024095adb3879643cffd33": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - 
"_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_f600aa1fe4094133888ec9a2504a60eb", - "IPY_MODEL_efe9a9fcebfe441b80075fbfe9c32674", - "IPY_MODEL_50dbf8861ca94b0ba1f4a7e2f0d8aead" - ], - "layout": "IPY_MODEL_547151540399460fb9a946bbe67afbd9" - } - }, - "b1fcf477db664ccdade4096fb79de327": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "b31de9bcf83e4070be09c7d663361232": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "b379da78cb34463aa5a72eedc3d176cd": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "b3c42d7e25d6494993029531adc3866d": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_85de66e1ee3140cf85eadebe5fea1e9f", - "placeholder": "​", - "style": "IPY_MODEL_b31de9bcf83e4070be09c7d663361232", - "value": " 111/111 [00:00<00:00, 3.57kB/s]" - } - }, - "b50c9c4433854cf7a6b2593e946b7faa": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - 
"min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "b687aca79e6e470b96254c5e309d6d63": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "b922b90106414644bc0e933f28dea1bf": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_e0a40f83ae2e4ab29376a1d48b53aa6e", - "IPY_MODEL_547eeb64ffd34e509c0b8b8ba6d657e2", - "IPY_MODEL_45675fb5f5c94f8cae575582f7ae41a7" - ], - "layout": "IPY_MODEL_016d5e929f1240cea067372b2191d107" - } - }, - "b9e41ef9e9c54fa7b71bc333604af74e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "bde95b39561145548fc81fb4cc94a1bf": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "be4e145938054f13a510fe4d04a7a60d": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - 
"visibility": null, - "width": null - } - }, - "bfd11f21f197459b8f27ef364bc9b264": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_745fb1db425e44e5b3a23b36ae7675d1", - "placeholder": "​", - "style": "IPY_MODEL_bde95b39561145548fc81fb4cc94a1bf", - "value": " 801k/801k [00:00<00:00, 5.92MB/s]" - } - }, - "c0177c4ad18740d88acfc603ce4735f8": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_ec15d99b3a604405a2b4707931d4bf44", - "placeholder": "​", - "style": "IPY_MODEL_e7f5d507d9564941bb7db742b4bf01c7", - "value": " 466k/466k [00:00<00:00, 3.56MB/s]" - } - }, - "c19f60d4028045399c62004027eaafd9": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_13367fbb763747fa8de94cde40ffae32", - "placeholder": "​", - "style": "IPY_MODEL_b1fcf477db664ccdade4096fb79de327", - "value": "Map: 100%" - } - }, - "c2d74a42fb574b8892d0a288fd92f0a6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_056b9ef5706843b19cd62fce75743afb", - "IPY_MODEL_17b4d81e40564a53bb79be9fbef4918e", - "IPY_MODEL_951f60cddcb84dfdbbdf2058369f0541" - ], - "layout": "IPY_MODEL_646484cf7a36444daebe1dfe4a0e4150" - } - }, - "c9747e7a810f413ba1ea108307e3ad1d": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - 
"grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "c9836c952b07472880649b82e2347e8d": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_76c1a1cdc9054bbe90d0d3b662cf0ed1", - "placeholder": "​", - "style": "IPY_MODEL_e453f1672772400a851735ba64f42c8b", - "value": " 2260/2260 [00:00<00:00, 10845.53 examples/s]" - } - }, - "cb838c5bed994a9a8e6fcf5c98b76d17": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_182abc7ec4d944d9bb2ec1281c98b4c8", - "placeholder": "​", - "style": "IPY_MODEL_6934c6d1cbac44dbb08f3fffe3056edb", - "value": "Map: 100%" - } - }, - "cbc312cb858b48a5a0f8dbcf60b7e684": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "cc608dfb880c49d4bc5acf2d691b8ec6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_cb838c5bed994a9a8e6fcf5c98b76d17", - 
"IPY_MODEL_76bbe8c2beba4c0594085d32a68d2ee7", - "IPY_MODEL_c9836c952b07472880649b82e2347e8d" - ], - "layout": "IPY_MODEL_383db57f997140d482b82b123080837a" - } - }, - "cfa1cc6eed8a4f7791a7959308456b6b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "d0ea49d1d90f4d34bf2ae70efa96946e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "d108e029e743419989e30f64f0c82b90": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_6ceb292f2b8544f2a9a005d16d3e8978", - "max": 800662, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_41a27cf0a91246599d4d1b7dae7c7863", - "value": 800662 - } - }, - "d1174b127571420593971166fbb1966b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_59d0997b85614384bbfebeee928340b6", - "max": 52603, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_269920491c134501873e0110367bc984", - "value": 52603 - } - }, - "d1358f6b16644cb3a2328ca639a4a77a": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_c19f60d4028045399c62004027eaafd9", - "IPY_MODEL_8055588a1fa940239c801ef66f3ecf3b", - "IPY_MODEL_7468a9bc8bda44e5b44574c64fdc6803" - ], - "layout": "IPY_MODEL_a13a8f8b702e44ed88c7d358a0a8b4b4" - } - }, - "d17c62b889754b5d88cfced5b18ff7a7": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": 
null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "d32017fa83aa44f6b2e3443a602654be": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "d43410dfcc8c4bebb8672f10ed2aeb66": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "d54fb2da9f1f4a89ae962b8816314f43": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_77d3d81687e6417ab988b04984fc68f4", - "IPY_MODEL_fbce0a69847e4099a55d1e39d4118c91", - "IPY_MODEL_1513792bad534a0c9c381a131395c519" - ], - "layout": "IPY_MODEL_69f38fecf8ad403898634cfdfadf8925" - } - }, - "d64d50101891491f96ff80162dc6d26c": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - 
"_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_d65ec0f0dc0b44e0869c6159e6e82ad6", - "IPY_MODEL_76febcd912404a58add3a39f80a8218d", - "IPY_MODEL_f4ea276bdc0d4da2a04b46e3f1ed95b5" - ], - "layout": "IPY_MODEL_0942430d36de4677b4c2fa771d7bcd2a" - } - }, - "d65ec0f0dc0b44e0869c6159e6e82ad6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_10a0f37020d44156a11e9750778892e0", - "placeholder": "​", - "style": "IPY_MODEL_58fb913274b54a60a832513c09608a2f", - "value": "tokenizer_config.json: 100%" - } - }, - "d8c5dc8df3be4e65b2bbba020d29150f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_7a0c705334694da6b750104b28db6dba", - "max": 466391, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_0c336ea5c653434da49e2f0e949f83d0", - "value": 466391 - } - }, - "da1a999fb5af4eae9f6a9d1086cbb4cf": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "dafe748a452148038779f6a62a22a4ec": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - 
"_view_name": "StyleView", - "description_width": "" - } - }, - "db09ab1f79db4f3a8de77f0348eca0f7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "db3bd55d779947028f36a8b24a2621b6": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "dd1a50d4497144388a1809b78bb38f58": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_6b72a856e5bd4812a5e0dd0c3bfb8455", - "IPY_MODEL_4e21a567d1f6461985727823b37166e1", - "IPY_MODEL_ec1efb7598fd496bb170673ae1b8a1df" - ], - "layout": "IPY_MODEL_84f393468aa74baa903243d238b2d387" - } - }, - "de04f344a8d4428e8ba1836a563d8aa1": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "e0a40f83ae2e4ab29376a1d48b53aa6e": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_a026a32dd6d646bea82c1ebb06147d89", - "placeholder": "​", - "style": 
"IPY_MODEL_0479fd3fc1ba476ab46f8c0a98f89468", - "value": "config.json: 100%" - } - }, - "e15fc503bb73476980cedb5f06b51ced": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_5de5dab3d92f4f41838a8f302d27f0c3", - "placeholder": "​", - "style": "IPY_MODEL_471b481a3e5b4d439ab31fdc49fc99c7", - "value": "merges.txt: 100%" - } - }, - "e2ab3cb38b5a41f68d18ed5f0e6ae22c": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "e2f0c39ce1c046e8acb150dfbfaf5aa8": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "e453f1672772400a851735ba64f42c8b": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "e7f5d507d9564941bb7db742b4bf01c7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "e90b58981bd34d0e8f975fc1a9658c4c": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": 
"FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_ed577dea3ac54884a637ad775b42bc68", - "max": 2104556, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_d43410dfcc8c4bebb8672f10ed2aeb66", - "value": 2104556 - } - }, - "ea1f9cb22abf4e7d9f6e76fc86c03387": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "eb570fd159124e2cbd2df9335b3f9cd6": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "ec15d99b3a604405a2b4707931d4bf44": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - 
"align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "ec1efb7598fd496bb170673ae1b8a1df": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_b50c9c4433854cf7a6b2593e946b7faa", - "placeholder": "​", - "style": "IPY_MODEL_7557cd24ba9b4aa3955866d59db94519", - "value": " 119/119 [00:00<00:00, 3547.77 examples/s]" - } - }, - "ed577dea3ac54884a637ad775b42bc68": { - "model_module": "@jupyter-widgets/base", - "model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "efe9a9fcebfe441b80075fbfe9c32674": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_04ae3f7b640c42f3a8eb1977cd1a585d", - "max": 269060552, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_db3bd55d779947028f36a8b24a2621b6", - "value": 269060552 - } - }, - "f0b271bcac6c43a9aaddac54259bb514": { - "model_module": "@jupyter-widgets/base", - 
"model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "f3e23f781bce4429954d76bfea97aff4": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - }, - "f4ea276bdc0d4da2a04b46e3f1ed95b5": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_be4e145938054f13a510fe4d04a7a60d", - "placeholder": "​", - "style": "IPY_MODEL_648c3c820b39493daf0cce5f57a55467", - "value": " 3.66k/3.66k [00:00<00:00, 197kB/s]" - } - }, - "f600aa1fe4094133888ec9a2504a60eb": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HTMLModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HTMLModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HTMLView", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_5291041c86db4933816088c047d659d8", - "placeholder": "​", - "style": "IPY_MODEL_48724ba7ba4e4f00923445245640739f", - "value": "model.safetensors: 100%" - } - }, - "f70401b6dba74380b19bd1ef887b3bf7": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "ProgressStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "ProgressStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "bar_color": null, - "description_width": "" - } - }, - "fa330d4f0fb241aebd065f6ef4a6892c": { - "model_module": "@jupyter-widgets/base", - 
"model_module_version": "1.2.0", - "model_name": "LayoutModel", - "state": { - "_model_module": "@jupyter-widgets/base", - "_model_module_version": "1.2.0", - "_model_name": "LayoutModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "LayoutView", - "align_content": null, - "align_items": null, - "align_self": null, - "border": null, - "bottom": null, - "display": null, - "flex": null, - "flex_flow": null, - "grid_area": null, - "grid_auto_columns": null, - "grid_auto_flow": null, - "grid_auto_rows": null, - "grid_column": null, - "grid_gap": null, - "grid_row": null, - "grid_template_areas": null, - "grid_template_columns": null, - "grid_template_rows": null, - "height": null, - "justify_content": null, - "justify_items": null, - "left": null, - "margin": null, - "max_height": null, - "max_width": null, - "min_height": null, - "min_width": null, - "object_fit": null, - "object_position": null, - "order": null, - "overflow": null, - "overflow_x": null, - "overflow_y": null, - "padding": null, - "right": null, - "top": null, - "visibility": null, - "width": null - } - }, - "fbce0a69847e4099a55d1e39d4118c91": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "FloatProgressModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "FloatProgressModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "ProgressView", - "bar_style": "success", - "description": "", - "description_tooltip": null, - "layout": "IPY_MODEL_530fc4c2bf1244628af7dea3e4b35cdf", - "max": 831, - "min": 0, - "orientation": "horizontal", - "style": "IPY_MODEL_96c2aae9198441569362135ad4bcbc98", - "value": 831 - } - }, - "ff60308921f9432683acbcd6d29fb78f": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "HBoxModel", - "state": { - "_dom_classes": [], - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "HBoxModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/controls", - "_view_module_version": "1.5.0", - "_view_name": "HBoxView", - "box_style": "", - "children": [ - "IPY_MODEL_3bc8f6339f4e4a3b961d810255c5573e", - "IPY_MODEL_4780ad263ec04b1a97525d985e102049", - "IPY_MODEL_488feef55878426bbf1c753c6d58735b" - ], - "layout": "IPY_MODEL_560ba45d70ca431dadeb327d234c330a" - } - }, - "ff8debfb713f4b88be6b9b3bf33bfca2": { - "model_module": "@jupyter-widgets/controls", - "model_module_version": "1.5.0", - "model_name": "DescriptionStyleModel", - "state": { - "_model_module": "@jupyter-widgets/controls", - "_model_module_version": "1.5.0", - "_model_name": "DescriptionStyleModel", - "_view_count": null, - "_view_module": "@jupyter-widgets/base", - "_view_module_version": "1.2.0", - "_view_name": "StyleView", - "description_width": "" - } - } - } - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/chapters/en/chapter11/sft_finetuning_example.ipynb b/chapters/en/chapter11/sft_finetuning_example.ipynb deleted file mode 100644 index d18479a91..000000000 --- a/chapters/en/chapter11/sft_finetuning_example.ipynb +++ /dev/null @@ -1,273 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Supervised Fine-Tuning with SFTTrainer\n", - "\n", - "This notebook demonstrates how to fine-tune the 
`HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer` from the `trl` library. Running the notebook cells in order will fine-tune the model. You can select your difficulty by trying out different datasets.\n",
\n", - "

Exercise: Fine-Tuning SmolLM2 with SFTTrainer

\n", - "

Take a dataset from the Hugging Face hub and finetune a model on it.

\n", - "

Difficulty Levels

\n", - "

🐢 Use the `HuggingFaceTB/smoltalk` dataset

\n", - "

🐕 Try out the `bigcode/the-stack-smol` dataset and finetune a code generation model on a specific subset `data/python`.

\n", - "

🦁 Select a dataset that relates to a real world use case your interested in

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Install the requirements in Google Colab\n", - "# !pip install transformers datasets trl huggingface_hub\n", - "\n", - "# Authenticate to Hugging Face\n", - "\n", - "from huggingface_hub import login\n", - "login()\n", - "\n", - "# for convenience you can create an environment variable containing your hub token as HF_TOKEN" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Import necessary libraries\n", - "from transformers import AutoModelForCausalLM, AutoTokenizer\n", - "from datasets import load_dataset\n", - "from trl import SFTConfig, SFTTrainer, setup_chat_format\n", - "import torch\n", - "\n", - "device = (\n", - " \"cuda\"\n", - " if torch.cuda.is_available()\n", - " else \"mps\" if torch.backends.mps.is_available() else \"cpu\"\n", - ")\n", - "\n", - "# Load the model and tokenizer\n", - "model_name = \"HuggingFaceTB/SmolLM2-135M\"\n", - "model = AutoModelForCausalLM.from_pretrained(\n", - " pretrained_model_name_or_path=model_name\n", - ").to(device)\n", - "tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)\n", - "\n", - "# Set up the chat format\n", - "model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)\n", - "\n", - "# Set our name for the finetune to be saved &/ uploaded to\n", - "finetune_name = \"SmolLM2-FT-MyDataset\"\n", - "finetune_tags = [\"smol-course\", \"module_1\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Generate with the base model\n", - "\n", - "Here we will try out the base model which does not have a chat template. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Let's test the base model before training\n", - "prompt = \"Write a haiku about programming\"\n", - "\n", - "# Format with template\n", - "messages = [{\"role\": \"user\", \"content\": prompt}]\n", - "formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)\n", - "\n", - "# Generate response\n", - "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", - "outputs = model.generate(**inputs, max_new_tokens=100)\n", - "print(\"Before training:\")\n", - "print(tokenizer.decode(outputs[0], skip_special_tokens=True))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Dataset Preparation\n", - "\n", - "We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.\n", - "\n", - "**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Load a sample dataset\n", - "from datasets import load_dataset\n", - "\n", - "# TODO: define your dataset and config using the path and name parameters\n", - "ds = load_dataset(path=\"HuggingFaceTB/smoltalk\", name=\"everyday-conversations\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# TODO: 🦁 If your dataset is not in a format that TRL can convert to the chat template, you will need to process it. 
Refer to the [module](../chat_templates.md)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configuring the SFTTrainer\n", - "\n", - "The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Configure the SFTTrainer\n", - "sft_config = SFTConfig(\n", - " output_dir=\"./sft_output\",\n", - " max_steps=1000, # Adjust based on dataset size and desired training duration\n", - " per_device_train_batch_size=4, # Set according to your GPU memory capacity\n", - " learning_rate=5e-5, # Common starting point for fine-tuning\n", - " logging_steps=10, # Frequency of logging training metrics\n", - " save_steps=100, # Frequency of saving model checkpoints\n", - " evaluation_strategy=\"steps\", # Evaluate the model at regular intervals\n", - " eval_steps=50, # Frequency of evaluation\n", - " use_mps_device=(\n", - " True if device == \"mps\" else False\n", - " ), # Use MPS for mixed precision training\n", - " hub_model_id=finetune_name, # Set a unique name for your model\n", - ")\n", - "\n", - "# Initialize the SFTTrainer\n", - "trainer = SFTTrainer(\n", - " model=model,\n", - " args=sft_config,\n", - " train_dataset=ds[\"train\"],\n", - " tokenizer=tokenizer,\n", - " eval_dataset=ds[\"test\"],\n", - ")\n", - "\n", - "# TODO: 🦁 🐕 align the SFTTrainer params with your chosen dataset. For example, if you are using the `bigcode/the-stack-smol` dataset, you will need to choose the `content` column`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Model\n", - "\n", - "With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Train the model\n", - "trainer.train()\n", - "\n", - "# Save the model\n", - "trainer.save_model(f\"./{finetune_name}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "trainer.push_to_hub(tags=finetune_tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

Bonus Exercise: Generate with fine-tuned model

\n", - "

🐕 Use the fine-tuned to model generate a response, just like with the base example..

\n", - "
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Test the fine-tuned model on the same prompt\n", - "\n", - "# Let's test the base model before training\n", - "prompt = \"Write a haiku about programming\"\n", - "\n", - "# Format with template\n", - "messages = [{\"role\": \"user\", \"content\": prompt}]\n", - "formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)\n", - "\n", - "# Generate response\n", - "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", - "\n", - "# TODO: use the fine-tuned to model generate a response, just like with the base example." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 💐 You're done!\n", - "\n", - "This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:\n", - "\n", - "- Try this notebook on a harder difficulty\n", - "- Review a colleagues PR\n", - "- Improve the course material via an Issue or PR." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "py310", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.15" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/chapters/en/chapter11/supervised_fine_tuning.md b/chapters/en/chapter11/supervised_fine_tuning.md deleted file mode 100644 index dc236962e..000000000 --- a/chapters/en/chapter11/supervised_fine_tuning.md +++ /dev/null @@ -1,41 +0,0 @@ -# Supervised Fine-Tuning - -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on carefully curated datasets with human-validated examples. - -## Understanding Supervised Fine-Tuning - -At its core, supervised fine-tuning is about teaching a pre-trained model to perform specific tasks through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. - -SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. - -## When to Use Supervised Fine-Tuning - -The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. - -For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. Similarly, in medical or legal applications, accuracy and adherence to domain-specific terminology becomes crucial. In these cases, SFT can help align the model's responses with professional standards and domain expertise. 
- -## The Fine-Tuning Process - -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. - -First, you'll need to prepare or select a dataset that represents your target task. This dataset should include diverse examples that cover the range of scenarios your model will encounter. The quality of this data is important - each example should demonstrate the kind of output you want your model to produce. Next comes the actual fine-tuning phase, where you'll use frameworks like Hugging Face's `transformers` and `trl` to train the model on your dataset. - -Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. In [module 4](../4_evaluation), we'll cover how to evaluate your model. - -## The Role of SFT in Preference Alignment - -SFT plays a fundamental role in aligning language models with human preferences. Techniques such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) rely on SFT to form a base level of task understanding before further aligning the model’s responses with desired outcomes. Pre-trained models, despite their general language proficiency, may not always generate outputs that match human preferences. SFT bridges this gap by introducing domain-specific data and guidance, which improves the model’s ability to generate responses that align more closely with human expectations. - -## Supervised Fine-Tuning With Transformer Reinforcement Learning - -A key software package for Supervised Fine-Tuning is Transformer Reinforcement Learning (TRL). TRL is a toolkit used to train transformer language models models using reinforcement learning (RL). - -Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). We will use TRL in a number of modules throughout this repo. - -# Next Steps - -Try out the following tutorials to get hands on experience with SFT using TRL: - -⏭️ [Chat Templates Tutorial](./notebooks/chat_templates_example.ipynb) - -⏭️ [Supervised Fine-Tuning Tutorial](./notebooks/sft_finetuning_example.ipynb) From beec8b5be758e5d6702aeb03f23345119815ecf7 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Thu, 30 Jan 2025 15:30:15 +0100 Subject: [PATCH 03/30] review text and read through --- chapters/en/chapter11/1.mdx | 1 - chapters/en/chapter11/5.mdx | 15 +++++++++++---- chapters/en/chapter11/6.mdx | 8 ++------ chapters/en/chapter11/7.mdx | 2 +- chapters/en/chapter11/8.mdx | 4 +++- chapters/en/chapter11/9.mdx | 5 ++++- 6 files changed, 21 insertions(+), 14 deletions(-) diff --git a/chapters/en/chapter11/1.mdx b/chapters/en/chapter11/1.mdx index 5acfb97fd..b5e75efc6 100644 --- a/chapters/en/chapter11/1.mdx +++ b/chapters/en/chapter11/1.mdx @@ -14,7 +14,6 @@ Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained lang Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge. 
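To make the idea concrete, here is a minimal sketch of attaching LoRA adapters with the `peft` library; the rank, scaling factor, and target modules are illustrative choices rather than recommendations:

```python
# Minimal LoRA sketch with the peft library: only the small low-rank adapter
# matrices are trained, while the base model weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")

lora_config = LoraConfig(
    r=8,  # rank of the low-rank decomposition matrices
    lora_alpha=16,  # scaling factor applied to the adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the (small) trainable fraction
```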
- ## 4️⃣ Evaluation Evaluation is a crucial step in the fine-tuning process. It allows us to measure the performance of the model on a task-specific dataset. diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index d19d3f5c5..b63c2c38d 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -1,10 +1,10 @@ # Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Let's work through the process step by step. +In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. ## Dataset Preparation -Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. +The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. From 564d9ecfcd7dc3670258572289bb077938cca5c4 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 20:24:06 +0100 Subject: [PATCH 06/30] add toc --- chapters/en/_toctree.yml | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml index 12b6c3726..cefeda057 100644 --- a/chapters/en/_toctree.yml +++ b/chapters/en/_toctree.yml @@ -210,6 +210,34 @@ title: End-of-chapter quiz quiz: 10 +- title: 11. Supervised fine-tuning + sections: + - local: chapter11/1 + title: Introduction + - local: chapter11/2 + title: Chat templates + - local: chapter11/3 + title: Implementing Chat Templates with Transformers + - local: chapter11/4 + title: Introduction to Supervised Fine-Tuning + - local: chapter11/5 + title: Introduction to SFTTrainer in TRL + - local: chapter11/6 + title: Fine-Tuning a Model with SFTTrainer + - local: chapter11/7 + title: LoRA (Low-Rank Adaptation) + - local: chapter11/8 + title: Merging LoRA Adapters + - local: chapter11/9 + title: Evaluating Fine-Tuned Models + - local: chapter11/10 + title: Implementing Evaluation + - local: chapter11/11 + title: Conclusion + - local: chapter11/12 + title: Exam Time! 
+ quiz: 11 + - title: Course Events sections: - local: events/1 From edcf0490dd0a9bbcb053ffac389327ec66b78de0 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 20:28:38 +0100 Subject: [PATCH 07/30] format code blocks --- chapters/en/chapter11/2.mdx | 17 +++++++++++++---- chapters/en/chapter11/3.mdx | 4 +--- chapters/en/chapter11/7.mdx | 1 - chapters/en/chapter11/8.mdx | 8 ++------ 4 files changed, 16 insertions(+), 14 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 87a82f026..d4582348a 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -28,9 +28,15 @@ The `transformers` library will take care of chat templates for you in relation ```python messages = [ - {"role": "system", "content": "You are a helpful assistant focused on technical topics."}, + { + "role": "system", + "content": "You are a helpful assistant focused on technical topics.", + }, {"role": "user", "content": "Can you explain what a chat template is?"}, - {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."} + { + "role": "assistant", + "content": "A chat template structures conversations between users and AI models...", + }, ] ``` @@ -43,7 +49,7 @@ System messages set the foundation for how the model should behave. They act as ```python system_message = { "role": "system", - "content": "You are a professional customer service agent. Always be polite, clear, and helpful." + "content": "You are a professional customer service agent. Always be polite, clear, and helpful.", } ``` @@ -54,7 +60,10 @@ Chat templates can maintain context through conversation history, storing previo ```python conversation = [ {"role": "user", "content": "I need help with my order"}, - {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"}, + { + "role": "assistant", + "content": "I'd be happy to help. Could you provide your order number?", + }, {"role": "user", "content": "It's ORDER-123"}, ] ``` diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index cf8cc90b1..748bd74a1 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -20,9 +20,7 @@ messages = [ # Apply the chat template formatted_chat = tokenizer.apply_chat_template( - messages, - tokenize=False, - add_generation_prompt=True + messages, tokenize=False, add_generation_prompt=True ) ``` diff --git a/chapters/en/chapter11/7.mdx b/chapters/en/chapter11/7.mdx index 820d689a6..561acf61e 100644 --- a/chapters/en/chapter11/7.mdx +++ b/chapters/en/chapter11/7.mdx @@ -103,7 +103,6 @@ trainer = SFTTrainer( peft_config=lora_config, # LoRA configuration max_seq_length=max_seq_length, # Maximum sequence length tokenizer=tokenizer, - ) ``` diff --git a/chapters/en/chapter11/8.mdx b/chapters/en/chapter11/8.mdx index 0f2b0cef5..323a3b646 100644 --- a/chapters/en/chapter11/8.mdx +++ b/chapters/en/chapter11/8.mdx @@ -17,16 +17,12 @@ from peft import PeftModel # 1. Load the base model base_model = AutoModelForCausalLM.from_pretrained( - "base_model_name", - torch_dtype=torch.float16, - device_map="auto" + "base_model_name", torch_dtype=torch.float16, device_map="auto" ) # 2. Load the PEFT model with adapter peft_model = PeftModel.from_pretrained( - base_model, - "path/to/adapter", - torch_dtype=torch.float16 + base_model, "path/to/adapter", torch_dtype=torch.float16 ) # 3. 
Merge adapter weights with base model From 267c1719fb4a6de87de1e0a29e2c2b3b358590c7 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 21:39:59 +0100 Subject: [PATCH 08/30] combine pages together and add extra guidance --- chapters/en/_toctree.yml | 2 - chapters/en/chapter11/10.mdx | 56 ------ chapters/en/chapter11/11.mdx | 13 -- chapters/en/chapter11/12.mdx | 16 -- chapters/en/chapter11/2.mdx | 216 +++++++++++++++++++- chapters/en/chapter11/3.mdx | 212 ++++++++++++-------- chapters/en/chapter11/4.mdx | 379 +++++++++++++++++++++++------------ chapters/en/chapter11/5.mdx | 191 ++++++++++++------ chapters/en/chapter11/6.mdx | 320 ++++++++++++++++++++--------- chapters/en/chapter11/7.mdx | 120 +---------- chapters/en/chapter11/8.mdx | 53 ++--- chapters/en/chapter11/9.mdx | 188 ----------------- 12 files changed, 967 insertions(+), 799 deletions(-) delete mode 100644 chapters/en/chapter11/10.mdx delete mode 100644 chapters/en/chapter11/11.mdx delete mode 100644 chapters/en/chapter11/12.mdx delete mode 100644 chapters/en/chapter11/9.mdx diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml index cefeda057..8cd568bc1 100644 --- a/chapters/en/_toctree.yml +++ b/chapters/en/_toctree.yml @@ -216,8 +216,6 @@ title: Introduction - local: chapter11/2 title: Chat templates - - local: chapter11/3 - title: Implementing Chat Templates with Transformers - local: chapter11/4 title: Introduction to Supervised Fine-Tuning - local: chapter11/5 diff --git a/chapters/en/chapter11/10.mdx b/chapters/en/chapter11/10.mdx deleted file mode 100644 index bd307e7e9..000000000 --- a/chapters/en/chapter11/10.mdx +++ /dev/null @@ -1,56 +0,0 @@ -# Implementing Evaluation - -In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. - -LightEval tasks are defined using a specific format: - -``` -{suite}|{task}|{num_few_shot}|{auto_reduce} -``` - -| Parameter | Description | -|-----------|-------------| -| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') | -| `task` | Specific task within the suite (e.g., 'abstract_algebra') | -| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) | -| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) | - -Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. - -## Example Evaluation Pipeline - -Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on set of sub tasks that relate to the domain of medicine. 
- -Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend: - -```bash -lighteval vllm \ - "pretrained=your-model-name" \ - "mmlu|anatomy|0|0" \ - "mmlu|high_school_biology|0|0" \ - "mmlu|high_school_chemistry|0|0" \ - "mmlu|professional_medicine|0|0" \ - --max_samples 40 \ - --batch_size 1 \ - --output_path "./results" \ - --save_generations true -``` - -Results are displayed in a tabular format showing: - -``` -| Task |Version|Metric|Value | |Stderr| -|----------------------------------------|------:|------|-----:|---|-----:| -|all | |acc |0.3333|± |0.1169| -|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121| -|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141| -|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| -``` - -Lighteval also include a python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information. - - - -✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval. - - \ No newline at end of file diff --git a/chapters/en/chapter11/11.mdx b/chapters/en/chapter11/11.mdx deleted file mode 100644 index 093de47d6..000000000 --- a/chapters/en/chapter11/11.mdx +++ /dev/null @@ -1,13 +0,0 @@ -# Conclusion - -In this chapter, we explored the essential components of fine-tuning language models: - -1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting. - -2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge. - -3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. - -4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. - -These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. diff --git a/chapters/en/chapter11/12.mdx b/chapters/en/chapter11/12.mdx deleted file mode 100644 index 690547c5f..000000000 --- a/chapters/en/chapter11/12.mdx +++ /dev/null @@ -1,16 +0,0 @@ -# Exam Time! - -It's time to put your knowledge to the test! We've prepared a short quiz for you to test your understanding of the concepts covered in this chapter. - -To take the quiz, you will need to follow these steps: - -1. Sign in to your Hugging Face account. -2. Answer the questions in the quiz. -3. Submit your answers. - - diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index d4582348a..5b59f4b11 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -1,3 +1,9 @@ + + # Chat Templates Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns. 
@@ -10,9 +16,91 @@ To make a base model behave like an instruct model, we need to format our prompt It's important to note that a base model could be fine-tuned on different chat templates, so when we're using an instruct model we need to make sure we're using the correct chat template. +## Common Chat Template Formats + +Different models use different chat template formats. To illustrate this, let's look at a few chat templates. Here's how the same conversation would be formatted for different models: + +We'll use the following conversation structure for all examples: + +```python +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Hello!"}, + {"role": "assistant", "content": "Hi! How can I help you today?"}, + {"role": "user", "content": "What's the weather?"} +] +``` + +This is using the `mistral` template format: + +```sh +[INST] You are a helpful assistant. [/INST] +Hi! How can I help you today? +[INST] Hello! [/INST] +``` + +This is the chat template for a Qwen 2 model: + +```sh +<|im_start|>system +You are a helpful assistant.<|im_end|> +<|im_start|>user +Hello!<|im_end|> +<|im_start|>assistant +Hi! How can I help you today?<|im_end|> +<|im_start|>user +What's the weather?<|im_end|> +<|im_start|>assistant +``` + +Key differences between these formats include: +1. **System Message Handling**: + - Llama 2 wraps system messages in `<>` tags + - Llama 3 uses `<|system|>` tags with `` endings + - Mistral includes system message in the first instruction + - Qwen uses explicit `system` role with `<|im_start|>` tags + - ChatGPT uses `SYSTEM:` prefix + +2. **Message Boundaries**: + - Llama 2 uses `[INST]` and `[/INST]` tags + - Llama 3 uses role-specific tags (`<|system|>`, `<|user|>`, `<|assistant|>`) with `` endings + - Mistral uses `[INST]` and `[/INST]` with `` and `` + - Qwen uses role-specific start/end tokens + +3. **Special Tokens**: + - Llama 2 uses `` and `` for conversation boundaries + - Llama 3 uses `` to end each message + - Mistral uses `` and `` for turn boundaries + - Qwen uses role-specific start/end tokens + +The transformers library handles these differences through model-specific chat templates. When you load a tokenizer, it automatically uses the correct template for that model: + +```python +from transformers import AutoTokenizer + +# These will use different templates automatically +llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf") +mistral_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") +qwen_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat") + +messages = [ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Hello!"} +] + +# Each will format according to its model's template +llama_chat = llama_tokenizer.apply_chat_template(messages, tokenize=False) +mistral_chat = mistral_tokenizer.apply_chat_template(messages, tokenize=False) +qwen_chat = qwen_tokenizer.apply_chat_template(messages, tokenize=False) +``` + ## Understanding Chat Templates -At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. 
Below is an example of a chat template: +At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. + +### Basic Chat Template Example + +Here's a basic example of a chat template: ```sh <|im_start|>user @@ -24,23 +112,116 @@ Can I ask a question?<|im_end|> <|im_start|>assistant ``` -The `transformers` library will take care of chat templates for you in relation to the model's tokenizer. Read more about how transformers builds chat templates [here](https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates). All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest. Here's a basic example of a conversation: +### Implementation with Transformers + +The transformers library provides built-in support for chat templates through the `apply_chat_template()` method: + +```python +from transformers import AutoTokenizer + +tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") + +messages = [ + {"role": "system", "content": "You are a helpful coding assistant."}, + {"role": "user", "content": "Write a Python function to sort a list"}, +] + +# Apply the chat template +formatted_chat = tokenizer.apply_chat_template( + messages, tokenize=False, add_generation_prompt=True +) +``` + +This will return a formatted string that looks like: + +```sh +<|im_start|>system +You are a helpful coding assistant.<|im_end|> +<|im_start|>user +Write a Python function to sort a list<|im_end|> +``` + +### Advanced Chat Templates + +Chat templates can handle more complex scenarios, including: + +1. **Tool Use**: When models need to interact with external tools or APIs +2. **Multimodal Inputs**: For handling images, audio, or other media types +3. **Function Calling**: For structured function execution +4. **Multi-turn Context**: For maintaining conversation history + +Here's an example of a chat template with tool use: ```python messages = [ { "role": "system", - "content": "You are a helpful assistant focused on technical topics.", + "content": "You are an AI assistant that can use tools. Available tools: calculator, weather_api", }, - {"role": "user", "content": "Can you explain what a chat template is?"}, + {"role": "user", "content": "What's 123 * 456 and is it raining in Paris?"}, { "role": "assistant", - "content": "A chat template structures conversations between users and AI models...", + "content": "Let me help you with that.", + "tool_calls": [ + { + "tool": "calculator", + "parameters": {"operation": "multiply", "x": 123, "y": 456} + }, + { + "tool": "weather_api", + "parameters": {"city": "Paris", "country": "France"} + } + ] + }, + { + "role": "tool", + "tool_name": "calculator", + "content": "56088" + }, + { + "role": "tool", + "tool_name": "weather_api", + "content": "{'condition': 'rain', 'temperature': 15}" + } +] +``` + +For multimodal conversations, chat templates can include image references or base64-encoded images: + +```python +messages = [ + { + "role": "system", + "content": "You are a helpful vision assistant that can analyze images." 
}, + { + "role": "user", + "content": [ + { + "type": "text", + "text": "What's in this image?" + }, + { + "type": "image", + "image_url": "https://example.com/image.jpg" + } + ] + } ] ``` -Let's break down the above example, and see how it maps to the chat template format. +## Working with Chat Templates + +When working with chat templates, you have several options for processing the conversation: + +1. Apply the template without tokenization to return the raw formatted string +2. Apply the template with tokenization to return the token IDs +3. Add a generation prompt to prepare for model inference + +The tokenizer's `apply_chat_template()` method handles all these cases through its parameters: + +- `tokenize`: Whether to return token IDs (True) or the formatted string (False) +- `add_generation_prompt`: Whether to add a prompt for the model to generate a response ## System Messages @@ -68,8 +249,29 @@ conversation = [ ] ``` +## Best Practices + +When working with chat templates, consider these best practices: + +1. **Consistent Formatting**: Always use the same template format throughout your application +2. **Clear Role Definition**: Clearly specify roles (system, user, assistant, tool) for each message +3. **Context Management**: Be mindful of token limits when maintaining conversation history +4. **Error Handling**: Include proper error handling for tool calls and multimodal inputs +5. **Validation**: Validate message structure before sending to the model + -✏️ **Try it out!** Create a chat template for a conversation between a user and an assistant. Then, use the `transformers` library to tokenize the conversation and see how the model responds. You won't need to download the model to do this, as the tokenizer will handle the formatting. +✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file. + +For this exercise, you'll need to: +1. Load the dataset using the Hugging Face datasets library +2. Create a processing function that converts the samples into the correct chat format +3. Apply the chat template using the tokenizer's methods + +## Resources + +- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) +- [Transformers Documentation](https://huggingface.co/docs/transformers) +- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates) \ No newline at end of file diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 748bd74a1..5ae7b7605 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -1,87 +1,125 @@ -# Implementation with Transformers - - - -Now that we understand how chat templates work, let's see how we can implement them using the `transformers` library. The transformers library provides built-in support for chat templates, we just need to use the `apply_chat_template()` method to format our messages. - -```python -from transformers import AutoTokenizer - -tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") - -messages = [ - {"role": "system", "content": "You are a helpful coding assistant."}, - {"role": "user", "content": "Write a Python function to sort a list"}, -] - -# Apply the chat template -formatted_chat = tokenizer.apply_chat_template( - messages, tokenize=False, add_generation_prompt=True -) -``` - -This will return a formatted string that can be passed to the model. 
It would look like this for the SmolLM2-135M-Instruct model specified: - -```sh -<|im_start|>system -You are a helpful coding assistant.<|im_end|> -<|im_start|>user -Write a Python function to sort a list<|im_end|> -``` - -Note that the `im_start` and `im_end` tokens are used to indicate the start and end of a message. The tokenizer will also have corresponding special tokens for the start and end of messages. For a refresher on how these tokens work, see the [Tokenizers](../chapter2/5.mdx) section. - -Chat templates can handle multi-turn conversations while maintaining context: - -```python -messages = [ - {"role": "system", "content": "You are a math tutor."}, - {"role": "user", "content": "What is calculus?"}, - {"role": "assistant", "content": "Calculus is a branch of mathematics..."}, - {"role": "user", "content": "Can you give me an example?"}, -] -``` - -## Working with Chat Templates - -When working with chat templates, you have several options for processing the conversation: - -1. Apply the template without tokenization to return the raw formatted string -2. Apply the template with tokenization to return the token IDs -3. Add a generation prompt to prepare for model inference - -The tokenizer's `apply_chat_template()` method handles all these cases through its parameters: - -- `tokenize`: Whether to return token IDs (True) or the formatted string (False) -- `add_generation_prompt`: Whether to add a prompt for the model to generate a response - - - -✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file. - -For this exercise, you'll need to: -1. Load the dataset using the Hugging Face datasets library -2. Create a processing function that converts the samples into the correct chat format -3. Apply the chat template using the tokenizer's methods - - - -## Conclusion - -Chat templates are a crucial component for working with language models, especially when fine-tuning or deploying models for chat applications. They provide structure and consistency to conversations, making it easier for models to understand context and generate appropriate responses. - -Understanding how to work with chat templates is essential for: -- Converting datasets for fine-tuning -- Preparing inputs for model inference -- Maintaining conversation context -- Ensuring consistent model behavior - -## Resources - -- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) -- [Transformers Documentation](https://huggingface.co/docs/transformers) -- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates) +# Supervised Fine-Tuning + +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. + +Because of the supervised structure of the task, the model can learn to generate structured outputs. For example, the chat templates we created in the previous sections. + +## Understanding Supervised Fine-Tuning + +Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and use specific output structures, through examples of labeled tokens. 
The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. + +SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. + +## When to Use Supervised Fine-Tuning + +The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. + +Two core reasons to use SFT are: + +1. **Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. For example, you need a specific chat template to generate structured outputs. + +2. **Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise. + +## Quiz + +### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)? + + + +### 2. Which of the following are valid reasons to use SFT? + + + +### 3. What is required for effective Supervised Fine-Tuning? + + + +### 4. How does SFT relate to chat templates? + + + +### 5. What distinguishes SFT from pre-training? + + \ No newline at end of file diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx index 5ae7b7605..183a9fa06 100644 --- a/chapters/en/chapter11/4.mdx +++ b/chapters/en/chapter11/4.mdx @@ -1,125 +1,254 @@ -# Supervised Fine-Tuning - -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. - -Because of the supervised structure of the task, the model can learn to generate structured outputs. For example, the chat templates we created in the previous sections. - -## Understanding Supervised Fine-Tuning - -Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and use specific output structures, through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. - -SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. - -## When to Use Supervised Fine-Tuning - -The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. - -Two core reasons to use SFT are: - -1. **Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. For example, you need a specific chat template to generate structured outputs. - -2. 
**Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise. - -## Quiz - -### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)? - - - -### 2. Which of the following are valid reasons to use SFT? - - - -### 3. What is required for effective Supervised Fine-Tuning? - - - -### 4. How does SFT relate to chat templates? - - - -### 5. What distinguishes SFT from pre-training? - - \ No newline at end of file + + +# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning + +In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. + +## Dataset Preparation + +The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. + + + +## Understanding Training Dynamics + +When fine-tuning language models, understanding the training dynamics is crucial for monitoring progress and ensuring successful adaptation. Let's look at how to interpret the training process through loss curves. + +### Loss Patterns + +The training loss curve typically follows a characteristic pattern. Initially, you'll observe a sharp drop in loss as the model begins adapting to the new data distribution, task objectives, and chat template. This early phase is crucial as it indicates whether the model is successfully learning from the training data. + +### The Path to Convergence + +As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting the model is learning generalizable patterns rather than memorizing specific examples. The absolute loss values will vary depending on your task and dataset. + +### Monitoring Training Progress + +
*Training and validation loss curves showing healthy convergence*
+ +The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability. + +### Warning Signs to Watch For + +Several patterns in the loss curves can indicate potential issues: + +1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider: + - Reducing the model size or training time + - Adding regularization + - Increasing the dataset size + - Using techniques like early stopping + +2. If the loss doesn't show significant improvement, the model might be: + - Learning too slowly (try increasing the learning rate) + - Struggling with the task (check data quality and task complexity) + - Hitting architecture limitations (consider a different model) + +3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: + - The model performs poorly on new, similar examples + - The outputs lack diversity + - The responses are too similar to training examples + + + +Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss. + + + +## Training Configuration + +We will configure SFT trainer with the following parameters: + +| Parameter | Description | +|-----------|-------------| +| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) | +| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) | +| gradient_accumulation_steps | Number of updates to accumulate before performing a backward pass, effectively increasing batch size | +| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) | +| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations | +| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) | +| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) | +| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) | + +In general, start with a small number of epochs and data using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance. + +## Training and Evaluation + +Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes. + +- Iterating over the dataset +- Computing the loss +- Updating the model's parameters +- Regular evaluation on a validation set + +Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. 
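One way to automate that monitoring is early stopping, which halts training once validation loss stops improving. Below is a rough sketch using the standard `transformers` callback; the dataset names and patience value are placeholders, not prescriptions:

```python
# Rough sketch: stop fine-tuning automatically once validation loss stops
# improving, rather than always running a fixed number of steps.
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="./sft_output",  # illustrative path
    evaluation_strategy="steps",  # evaluate every eval_steps
    eval_steps=50,
    load_best_model_at_end=True,  # required by the early-stopping callback
    metric_for_best_model="eval_loss",
    greater_is_better=False,  # lower validation loss is better
)

trainer = SFTTrainer(
    model=model,  # assumes the model is defined as in this section
    args=config,
    train_dataset=train_ds,  # placeholder names for your splits
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```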
+ +## `SFTTrainer` from Transformer Reinforcement Learning + +Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). But we'll focus on SFT in this chapter. + +Here's a basic simplified example of how to use `SFTTrainer` to fine-tune a model. We'll expand on this example in the next few sections, but for now let's just focus on the basics. + +```python +from datasets import load_dataset +from trl import SFTConfig, SFTTrainer + +dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") + +training_args = SFTConfig( + max_seq_length=512, + output_dir="/tmp", +) + +trainer = SFTTrainer( + model_name="HuggingFaceTB/SmolLM2-135M", + train_dataset=dataset, + args=training_args, +) +trainer.train() +``` + +Just like in `transformers`, we work through the following steps: + +1. Load the dataset +2. Configure the SFTTrainer with appropriate parameters +3. Train the model and monitor its progress +4. Save and evaluate the fine-tuned model + + + +✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. + +For this exercise, you'll need to: +1. Load and prepare your chosen dataset +2. Configure the SFTTrainer with appropriate parameters +3. Train the model and monitor its progress +4. Save and evaluate the fine-tuned model + + + +# Supervised Fine-Tuning with SFTTrainer + +In this section we will unpack the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. + +## Load the base model + +Here we'll load the base model and tokenizer. We'll also set up the chat format for the model. + +```python +# Import necessary libraries +from transformers import AutoModelForCausalLM, AutoTokenizer +from datasets import load_dataset +from trl import SFTConfig, SFTTrainer, setup_chat_format +import torch + +# Set the device to use for training +device = ( + "cuda" + if torch.cuda.is_available() + else "mps" if torch.backends.mps.is_available() else "cpu" +) + +# Load the model and tokenizer +model = AutoModelForCausalLM.from_pretrained( + pretrained_model_name_or_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" +).to(device) +tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) + +# Set up the chat format +model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) +``` + +## Generate with the base model + +First we will try out the base model which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model. 
+

```python
# Let's test the base model before training
prompt = "Write a haiku about programming"

# Format with template
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Generate and decode the response
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Dataset Preparation

We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.

**TRL will format input messages based on the model's chat template.** They need to be represented as a list of dictionaries with the keys `role` and `content`.

```python
dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
```

## Configuring the SFTTrainer

The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.

```python
# Example name for your fine-tuned model; choose your own
finetune_name = "DeepSeek-R1-Distill-Qwen-1.5B-SFT"

# Configure the SFTTrainer
sft_config = SFTConfig(
    output_dir="./sft_output",
    max_steps=1000,  # Adjust based on dataset size and desired training duration
    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
    learning_rate=5e-5,  # Common starting point for fine-tuning
    logging_steps=10,  # Frequency of logging training metrics
    save_steps=100,  # Frequency of saving model checkpoints
    evaluation_strategy="steps",  # Evaluate the model at regular intervals
    eval_steps=50,  # Frequency of evaluation
    use_mps_device=(
        True if device == "mps" else False
    ),  # Use the MPS backend on Apple silicon
    hub_model_id=finetune_name,  # Set a unique name for your model
)

# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    eval_dataset=dataset["test"],
)
```

## Training the model

With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.

```python
trainer.train()
```



SFTTrainer Training

## 💐 Nice work!

This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:

- Try this notebook on a harder difficulty
- Review a colleague's PR
- Improve the course material via an Issue or PR.
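Before moving on, you may also want to persist the fine-tuned model (step 4 of the workflow above). Here is a minimal sketch of that saving step, assuming the `trainer` and `tokenizer` objects defined earlier; the output path is a placeholder:

```python
# Save the fine-tuned model and tokenizer together so they can be reloaded later
trainer.save_model("./sft_output/final")  # writes the model weights and config
tokenizer.save_pretrained("./sft_output/final")  # keep the tokenizer alongside

# Optionally, share the model on the Hugging Face Hub (requires a logged-in account)
trainer.push_to_hub()
```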
+ + +## Resources + +- [TRL Documentation](https://huggingface.co/docs/trl) +- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) \ No newline at end of file diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index b63c2c38d..28583bb58 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -1,91 +1,166 @@ -# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning +# LoRA (Low-Rank Adaptation) -In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. +Fine-tuning large language models is a resource intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of parameters. It works by adding and optimizing smaller matrices to the attention weights, typically reducing trainable parameters by about 90%. -## Dataset Preparation +## Understanding LoRA -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. +LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685). - +LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable. -## Training Configuration +## Key advantages of LoRA -We will configure SFT trainer with the following parameters: +1. **Memory Efficiency**: + - Only adapter parameters are stored in GPU memory + - Base model weights remain frozen and can be loaded in lower precision + - Enables fine-tuning of large models on consumer GPUs + +2. **Training Features**: + - Native PEFT/LoRA integration with minimal setup + - Support for QLoRA (Quantized LoRA) for even better memory efficiency + +3. **Adapter Management**: + - Adapter weight saving during checkpoints + - Features to merge adapters back into base model + +## Loading LoRA Adapters with PEFT + +PEFT is a library that provides a unified interface for loading and managing PEFT methods, including LoRA. It allows you to easily load and switch between different PEFT methods, making it easier to experiment with different fine-tuning techniques. 
+ +Adapters can be loaded onto a pretrained model with load_adapter(), which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the set_adapter() function. To return the base model, you could use unload() to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights. + +```python +from transformers import AutoModelForCausalLM +from peft import PeftModel + +base_model = AutoModelForCausalLM.from_pretrained("") +peft_model_id = "" +model = PeftModel.from_pretrained(base_model, peft_model_id) +``` + + +![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png) + +## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA + +The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train. + +We'll use LoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps: + +1. Define the LoRA configuration (rank, alpha, dropout) +2. Create the SFTTrainer with PEFT config +3. Train and save the adapter weights + +## LoRA Configuration + +Let's walk through the LoRA configuration and key parameters. | Parameter | Description | |-----------|-------------| -| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) | -| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) | -| gradient_accumulation_steps | Number of updates to accumulate before performing a backward pass, effectively increasing batch size | -| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) | -| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations | -| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) | -| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) | -| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) | +| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. | +| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. Higher values result in stronger adaptation effects. | +| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. | +| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | +| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | + +When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. 
The effectiveness of different methods can vary by task, so experimentation is key. + +## Using TRL with PEFT + +PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for fine-tuning to reduce memory requirements. We can pass the `LoraConfig` to the model when loading it. + +```python +from peft import LoraConfig + +# TODO: Configure LoRA parameters +# r: rank dimension for LoRA update matrices (smaller = more compression) +rank_dimension = 6 +# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation) +lora_alpha = 8 +# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting) +lora_dropout = 0.05 + +peft_config = LoraConfig( + r=rank_dimension, # Rank dimension - typically between 4-32 + lora_alpha=lora_alpha, # LoRA scaling factor - typically 2x rank + lora_dropout=lora_dropout, # Dropout probability for LoRA layers + bias="none", # Bias type for LoRA. the corresponding biases will be updated during training. + target_modules="all-linear", # Which modules to apply LoRA to + task_type="CAUSAL_LM", # Task type for model architecture +) +``` -In general, start with a small number of epochs and data using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance. +Above, we used `device_map="auto"` to automatically assign the model to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`. -## Training and Evaluation +We will also need to define the `SFTTrainer` with the LoRA configuration. -Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes. +```python +# Create SFTTrainer with LoRA configuration +trainer = SFTTrainer( + model=model, + args=args, + train_dataset=dataset["train"], + peft_config=lora_config, # LoRA configuration + max_seq_length=max_seq_length, # Maximum sequence length + tokenizer=tokenizer, +) +``` -- Iterating over the dataset -- Computing the loss -- Updating the model's parameters -- Regular evaluation on a validation set + -Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. +✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. -## `SFTTrainer` from Transformer Reinforcement Learning + -Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). But we'll focus on SFT in this chapter. +## Merging LoRA Adapters -Here's a basic simplified example of how to use `SFTTrainer` to fine-tune a model. 
We'll expand on this example in the next few sections, but for now let's just focus on the basics. +After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. -```python -from datasets import load_dataset -from trl import SFTConfig, SFTTrainer +The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will find the correct device for the model based on your hardware. -dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") +Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training and saving the merged model in the same format for deployment. -training_args = SFTConfig( - max_seq_length=512, - output_dir="/tmp", +## Merging Implementation + +After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it: + +```python +import torch +from transformers import AutoModelForCausalLM +from peft import PeftModel + +# 1. Load the base model +base_model = AutoModelForCausalLM.from_pretrained( + "base_model_name", torch_dtype=torch.float16, device_map="auto" ) -trainer = SFTTrainer( - model_name="HuggingFaceTB/SmolLM2-135M", - train_dataset=dataset, - args=training_args, +# 2. Load the PEFT model with adapter +peft_model = PeftModel.from_pretrained( + base_model, "path/to/adapter", torch_dtype=torch.float16 ) -trainer.train() + +# 3. Merge adapter weights with base model +merged_model = peft_model.merge_and_unload() ``` -Just like in `transformers`, we work through the following steps: +If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer: -1. Load the dataset -2. Configure the SFTTrainer with appropriate parameters -3. Train the model and monitor its progress -4. Save and evaluate the fine-tuned model +```python +# Save both model and tokenizer +tokenizer = AutoTokenizer.from_pretrained("base_model_name") +merged_model.save_pretrained("path/to/save/merged_model") +tokenizer.save_pretrained("path/to/save/merged_model") +``` -✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. - -For this exercise, you'll need to: -1. Load and prepare your chosen dataset -2. Configure the SFTTrainer with appropriate parameters -3. Train the model and monitor its progress -4. Save and evaluate the fine-tuned model +✏️ **Try it out!** Merge the adapter weights back into the base model. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. 
-## Resources -- [TRL Documentation](https://huggingface.co/docs/trl) -- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) \ No newline at end of file +# Resources + +- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685) +- [PEFT Documentation](https://huggingface.co/docs/peft) +- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft) \ No newline at end of file diff --git a/chapters/en/chapter11/6.mdx b/chapters/en/chapter11/6.mdx index cb103fd6a..d02183d40 100644 --- a/chapters/en/chapter11/6.mdx +++ b/chapters/en/chapter11/6.mdx @@ -1,111 +1,245 @@ -# Supervised Fine-Tuning with SFTTrainer - - - -In this section we will unpack the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. - -## Load the base model - -Here we'll load the base model and tokenizer. We'll also set up the chat format for the model. - -```python -# Import necessary libraries -from transformers import AutoModelForCausalLM, AutoTokenizer -from datasets import load_dataset -from trl import SFTConfig, SFTTrainer, setup_chat_format -import torch - -# Set the device to use for training -device = ( - "cuda" - if torch.cuda.is_available() - else "mps" if torch.backends.mps.is_available() else "cpu" -) - -# Load the model and tokenizer -model = AutoModelForCausalLM.from_pretrained( - pretrained_model_name_or_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" -).to(device) -tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) - -# Set up the chat format -model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) -``` +# Evaluation -## Generate with the base model +With a finetuned model through either SFT or LoRA SFT, we should evaluate it on standard benchmarks. -First we will try out the base model which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model. +## Automatic Benchmarks -```python -# Let's test the base model before training -prompt = "Write a haiku about programming" +Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy. -# Format with template -messages = [{"role": "user", "content": prompt}] -formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False) +## Understanding Automatic Benchmarks -# Generate response -inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device) -outputs = model.generate(**inputs, max_new_tokens=100) -``` +Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization - they allow for consistent comparison across different models and provide reproducible results. -## Dataset Preparation +However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases. 
-We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. +## General Knowledge Benchmarks -**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,. +MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. -```python -dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") -``` +## Reasoning Benchmarks +BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. + +## Language Understanding + +HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology. + +## Alternative Evaluation Approaches + +Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks: + +### LLM-as-Judge + +Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations. + +### Evaluation Arenas + +Platforms like Anthropic's Constitutional AI Arena allow models to interact and evaluate each other in controlled environments. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks. + +### Custom Benchmark Suites + +Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. + +## Creating Your Own Evaluation Strategy + +Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. -## Configuring the SFTTrainer - -The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources. 
- -```python -# Configure the SFTTrainer -sft_config = SFTConfig( - output_dir="./sft_output", - max_steps=1000, # Adjust based on dataset size and desired training duration - per_device_train_batch_size=4, # Set according to your GPU memory capacity - learning_rate=5e-5, # Common starting point for fine-tuning - logging_steps=10, # Frequency of logging training metrics - save_steps=100, # Frequency of saving model checkpoints - evaluation_strategy="steps", # Evaluate the model at regular intervals - eval_steps=50, # Frequency of evaluation - use_mps_device=( - True if device == "mps" else False - ), # Use MPS for mixed precision training - hub_model_id=finetune_name, # Set a unique name for your model -) - -# Initialize the SFTTrainer -trainer = SFTTrainer( - model=model, - args=sft_config, - train_dataset=ds["train"], - tokenizer=tokenizer, - eval_dataset=ds["test"], -) +While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: + +1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. + +2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? + +3. Develop custom evaluation datasets that reflect your actual use case. This might include: + - Real user queries from your domain + - Common edge cases you've encountered + - Examples of particularly challenging scenarios + +4. Consider implementing a multi-layered evaluation strategy: + - Automated metrics for quick feedback + - Human evaluation for nuanced understanding + - Domain expert review for specialized applications + - A/B testing in controlled environments + +# Implementing Evaluation + +In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. + +LightEval tasks are defined using a specific format: + +``` +{suite}|{task}|{num_few_shot}|{auto_reduce} ``` -## Training the model +| Parameter | Description | +|-----------|-------------| +| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') | +| `task` | Specific task within the suite (e.g., 'abstract_algebra') | +| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) | +| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) | + +Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. + +## Example Evaluation Pipeline -With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss. +Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on set of sub tasks that relate to the domain of medicine. 
-```python -trainer.train() +Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend: + +```bash +lighteval vllm \ + "pretrained=your-model-name" \ + "mmlu|anatomy|0|0" \ + "mmlu|high_school_biology|0|0" \ + "mmlu|high_school_chemistry|0|0" \ + "mmlu|professional_medicine|0|0" \ + --max_samples 40 \ + --batch_size 1 \ + --output_path "./results" \ + --save_generations true ``` -## 💐 Nice work! +Results are displayed in a tabular format showing: -This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out: +``` +| Task |Version|Metric|Value | |Stderr| +|----------------------------------------|------:|------|-----:|---|-----:| +|all | |acc |0.3333|± |0.1169| +|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121| +|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141| +|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| +``` -- Try this notebook on a harder difficulty -- Review a colleagues PR -- Improve the course material via an Issue or PR. +Lighteval also include a python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information. + + + +✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval. + + + +# End-of-chapter quiz[[end-of-chapter-quiz]] + + + +### 1. What are the main advantages of using automatic benchmarks for model evaluation? + + + +### 2. Which benchmark specifically tests knowledge across 57 different subjects? + + + +### 3. What is LLM-as-Judge? + + + +### 4. What should be included in a comprehensive evaluation strategy? + + + +### 5. What is a limitation of automatic benchmarks? + + + +### 6. What is the purpose of creating custom evaluation datasets? + + diff --git a/chapters/en/chapter11/7.mdx b/chapters/en/chapter11/7.mdx index 561acf61e..093de47d6 100644 --- a/chapters/en/chapter11/7.mdx +++ b/chapters/en/chapter11/7.mdx @@ -1,119 +1,13 @@ -# LoRA (Low-Rank Adaptation) +# Conclusion -Fine-tuning large language models is a resource intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of parameters. It works by adding and optimizing smaller matrices to the attention weights, typically reducing trainable parameters by about 90%. +In this chapter, we explored the essential components of fine-tuning language models: -## Understanding LoRA +1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting. -LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. 
You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685). +2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge. -LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable. +3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. -## Key advantages of LoRA +4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. -1. **Memory Efficiency**: - - Only adapter parameters are stored in GPU memory - - Base model weights remain frozen and can be loaded in lower precision - - Enables fine-tuning of large models on consumer GPUs - -2. **Training Features**: - - Native PEFT/LoRA integration with minimal setup - - Support for QLoRA (Quantized LoRA) for even better memory efficiency - -3. **Adapter Management**: - - Adapter weight saving during checkpoints - - Features to merge adapters back into base model - -## Loading LoRA Adapters with PEFT - -PEFT is a library that provides a unified interface for loading and managing PEFT methods, including LoRA. It allows you to easily load and switch between different PEFT methods, making it easier to experiment with different fine-tuning techniques. - -Adapters can be loaded onto a pretrained model with load_adapter(), which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the set_adapter() function. To return the base model, you could use unload() to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights. - -```python -from transformers import AutoModelForCausalLM -from peft import PeftModel - -base_model = AutoModelForCausalLM.from_pretrained("") -peft_model_id = "" -model = PeftModel.from_pretrained(base_model, peft_model_id) -``` - - -![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png) - -## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA - -The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train. - -We'll use LoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps: - -1. Define the LoRA configuration (rank, alpha, dropout) -2. Create the SFTTrainer with PEFT config -3. Train and save the adapter weights - -## LoRA Configuration - -Let's walk through the LoRA configuration and key parameters. - -| Parameter | Description | -|-----------|-------------| -| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. | -| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. 
Higher values result in stronger adaptation effects. | -| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. | -| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | -| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | - -When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. - -## Using TRL with PEFT - -PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for fine-tuning to reduce memory requirements. We can pass the `LoraConfig` to the model when loading it. - -```python -from peft import LoraConfig - -# TODO: Configure LoRA parameters -# r: rank dimension for LoRA update matrices (smaller = more compression) -rank_dimension = 6 -# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation) -lora_alpha = 8 -# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting) -lora_dropout = 0.05 - -peft_config = LoraConfig( - r=rank_dimension, # Rank dimension - typically between 4-32 - lora_alpha=lora_alpha, # LoRA scaling factor - typically 2x rank - lora_dropout=lora_dropout, # Dropout probability for LoRA layers - bias="none", # Bias type for LoRA. the corresponding biases will be updated during training. - target_modules="all-linear", # Which modules to apply LoRA to - task_type="CAUSAL_LM", # Task type for model architecture -) -``` - -Above, we used `device_map="auto"` to automatically assign the model to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`. - -We will also need to define the `SFTTrainer` with the LoRA configuration. - -```python -# Create SFTTrainer with LoRA configuration -trainer = SFTTrainer( - model=model, - args=args, - train_dataset=dataset["train"], - peft_config=lora_config, # LoRA configuration - max_seq_length=max_seq_length, # Maximum sequence length - tokenizer=tokenizer, -) -``` - - - -✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. - - - -# Resources - -- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685) -- [PEFT Documentation](https://huggingface.co/docs/peft) -- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft) \ No newline at end of file +These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. diff --git a/chapters/en/chapter11/8.mdx b/chapters/en/chapter11/8.mdx index 323a3b646..690547c5f 100644 --- a/chapters/en/chapter11/8.mdx +++ b/chapters/en/chapter11/8.mdx @@ -1,45 +1,16 @@ -## Merging LoRA Adapters +# Exam Time! 
-After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. +It's time to put your knowledge to the test! We've prepared a short quiz for you to test your understanding of the concepts covered in this chapter. -The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will find the correct device for the model based on your hardware. +To take the quiz, you will need to follow these steps: -Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training and saving the merged model in the same format for deployment. +1. Sign in to your Hugging Face account. +2. Answer the questions in the quiz. +3. Submit your answers. -## Merging Implementation - -After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it: - -```python -import torch -from transformers import AutoModelForCausalLM -from peft import PeftModel - -# 1. Load the base model -base_model = AutoModelForCausalLM.from_pretrained( - "base_model_name", torch_dtype=torch.float16, device_map="auto" -) - -# 2. Load the PEFT model with adapter -peft_model = PeftModel.from_pretrained( - base_model, "path/to/adapter", torch_dtype=torch.float16 -) - -# 3. Merge adapter weights with base model -merged_model = peft_model.merge_and_unload() -``` - -If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer: - -```python -# Save both model and tokenizer -tokenizer = AutoTokenizer.from_pretrained("base_model_name") -merged_model.save_pretrained("path/to/save/merged_model") -tokenizer.save_pretrained("path/to/save/merged_model") -``` - - - -✏️ **Try it out!** Merge the adapter weights back into the base model. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. - - + diff --git a/chapters/en/chapter11/9.mdx b/chapters/en/chapter11/9.mdx deleted file mode 100644 index ab88e9d71..000000000 --- a/chapters/en/chapter11/9.mdx +++ /dev/null @@ -1,188 +0,0 @@ -# Evaluation - -With a finetuned model through either SFT or LoRA SFT, we should evaluate it on standard benchmarks. - -## Automatic Benchmarks - -Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy. - -## Understanding Automatic Benchmarks - -Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization - they allow for consistent comparison across different models and provide reproducible results. - -However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. 
A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases. - -## General Knowledge Benchmarks - -MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. - -## Reasoning Benchmarks -BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. - -## Language Understanding - -HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology. - -## Alternative Evaluation Approaches - -Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks: - -### LLM-as-Judge - -Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations. - -### Evaluation Arenas - -Platforms like Anthropic's Constitutional AI Arena allow models to interact and evaluate each other in controlled environments. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks. - -### Custom Benchmark Suites - -Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. - -## Creating Your Own Evaluation Strategy - -Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. - -While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: - -1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. - -2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? - -3. Develop custom evaluation datasets that reflect your actual use case. This might include: - - Real user queries from your domain - - Common edge cases you've encountered - - Examples of particularly challenging scenarios - -4. Consider implementing a multi-layered evaluation strategy: - - Automated metrics for quick feedback - - Human evaluation for nuanced understanding - - Domain expert review for specialized applications - - A/B testing in controlled environments - -# End-of-chapter quiz[[end-of-chapter-quiz]] - - - -### 1. What are the main advantages of using automatic benchmarks for model evaluation? - - - -### 2. Which benchmark specifically tests knowledge across 57 different subjects? - - - -### 3. What is LLM-as-Judge? - - - -### 4. What should be included in a comprehensive evaluation strategy? - - - -### 5. 
What is a limitation of automatic benchmarks? - - - -### 6. What is the purpose of creating custom evaluation datasets? - - From a9847d08f681f324f83b9fdeb3635af43d85cd93 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 21:44:16 +0100 Subject: [PATCH 09/30] update toc and format snippets --- chapters/en/_toctree.yml | 18 +++++------------ chapters/en/chapter11/2.mdx | 39 +++++++++++++------------------------ 2 files changed, 18 insertions(+), 39 deletions(-) diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml index 8cd568bc1..a0b6a61fe 100644 --- a/chapters/en/_toctree.yml +++ b/chapters/en/_toctree.yml @@ -215,24 +215,16 @@ - local: chapter11/1 title: Introduction - local: chapter11/2 - title: Chat templates + title: Chat Templates - local: chapter11/4 - title: Introduction to Supervised Fine-Tuning + title: Fine-Tuning with SFTTrainer - local: chapter11/5 - title: Introduction to SFTTrainer in TRL + title: LoRA (Low-Rank Adaptation) - local: chapter11/6 - title: Fine-Tuning a Model with SFTTrainer + title: Evaluation - local: chapter11/7 - title: LoRA (Low-Rank Adaptation) - - local: chapter11/8 - title: Merging LoRA Adapters - - local: chapter11/9 - title: Evaluating Fine-Tuned Models - - local: chapter11/10 - title: Implementing Evaluation - - local: chapter11/11 title: Conclusion - - local: chapter11/12 + - local: chapter11/8 title: Exam Time! quiz: 11 diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 5b59f4b11..8798e1d05 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -27,7 +27,7 @@ messages = [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}, {"role": "assistant", "content": "Hi! How can I help you today?"}, - {"role": "user", "content": "What's the weather?"} + {"role": "user", "content": "What's the weather?"}, ] ``` @@ -85,7 +85,7 @@ qwen_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat") messages = [ {"role": "system", "content": "You are a helpful assistant."}, - {"role": "user", "content": "Hello!"} + {"role": "user", "content": "Hello!"}, ] # Each will format according to its model's template @@ -165,24 +165,17 @@ messages = [ "tool_calls": [ { "tool": "calculator", - "parameters": {"operation": "multiply", "x": 123, "y": 456} + "parameters": {"operation": "multiply", "x": 123, "y": 456}, }, - { - "tool": "weather_api", - "parameters": {"city": "Paris", "country": "France"} - } - ] - }, - { - "role": "tool", - "tool_name": "calculator", - "content": "56088" + {"tool": "weather_api", "parameters": {"city": "Paris", "country": "France"}}, + ], }, + {"role": "tool", "tool_name": "calculator", "content": "56088"}, { "role": "tool", "tool_name": "weather_api", - "content": "{'condition': 'rain', 'temperature': 15}" - } + "content": "{'condition': 'rain', 'temperature': 15}", + }, ] ``` @@ -192,21 +185,15 @@ For multimodal conversations, chat templates can include image references or bas messages = [ { "role": "system", - "content": "You are a helpful vision assistant that can analyze images." + "content": "You are a helpful vision assistant that can analyze images.", }, { "role": "user", "content": [ - { - "type": "text", - "text": "What's in this image?" 
- }, - { - "type": "image", - "image_url": "https://example.com/image.jpg" - } - ] - } + {"type": "text", "text": "What's in this image?"}, + {"type": "image", "image_url": "https://example.com/image.jpg"}, + ], + }, ] ``` From 82b1d4ae67fe1618a58f817d12e732ffdb3a9d16 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 5 Feb 2025 21:50:12 +0100 Subject: [PATCH 10/30] update structure --- chapters/en/_toctree.yml | 10 +- chapters/en/chapter11/3.mdx | 257 +++++++++++++++++++++++++++- chapters/en/chapter11/4.mdx | 296 ++++++++++++-------------------- chapters/en/chapter11/5.mdx | 329 ++++++++++++++++++++++-------------- chapters/en/chapter11/6.mdx | 246 +-------------------------- chapters/en/chapter11/7.mdx | 21 ++- chapters/en/chapter11/8.mdx | 16 -- 7 files changed, 588 insertions(+), 587 deletions(-) delete mode 100644 chapters/en/chapter11/8.mdx diff --git a/chapters/en/_toctree.yml b/chapters/en/_toctree.yml index a0b6a61fe..73e2a069b 100644 --- a/chapters/en/_toctree.yml +++ b/chapters/en/_toctree.yml @@ -216,15 +216,15 @@ title: Introduction - local: chapter11/2 title: Chat Templates - - local: chapter11/4 + - local: chapter11/3 title: Fine-Tuning with SFTTrainer - - local: chapter11/5 + - local: chapter11/4 title: LoRA (Low-Rank Adaptation) - - local: chapter11/6 + - local: chapter11/5 title: Evaluation - - local: chapter11/7 + - local: chapter11/6 title: Conclusion - - local: chapter11/8 + - local: chapter11/7 title: Exam Time! quiz: 11 diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 5ae7b7605..f84f77557 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -1,3 +1,9 @@ + + # Supervised Fine-Tuning Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. @@ -122,4 +128,253 @@ Two core reasons to use SFT are: explain: "Actually, SFT typically uses less data than pre-training, focusing on task-specific examples." } ]} -/> \ No newline at end of file +/> + +# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning + +In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. + +## Dataset Preparation + +The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. + + + +## Understanding Training Dynamics + +When fine-tuning language models, understanding the training dynamics is crucial for monitoring progress and ensuring successful adaptation. Let's look at how to interpret the training process through loss curves. + +### Loss Patterns + +The training loss curve typically follows a characteristic pattern. Initially, you'll observe a sharp drop in loss as the model begins adapting to the new data distribution, task objectives, and chat template. 
This early phase is crucial as it indicates whether the model is successfully learning from the training data. + +### The Path to Convergence + +As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting the model is learning generalizable patterns rather than memorizing specific examples. The absolute loss values will vary depending on your task and dataset. + +### Monitoring Training Progress + +
+ Training and validation loss curves showing healthy convergence +
+

The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability.

### Warning Signs to Watch For

Several patterns in the loss curves can indicate potential issues:

1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider:
   - Reducing the model size or training time
   - Adding regularization
   - Increasing the dataset size
   - Using techniques like early stopping

2. If the loss doesn't show significant improvement, the model might be:
   - Learning too slowly (try increasing the learning rate)
   - Struggling with the task (check data quality and task complexity)
   - Hitting architecture limitations (consider a different model)

3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if:
   - The model performs poorly on new, similar examples
   - The outputs lack diversity
   - The responses are too similar to training examples



Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss.



## Training Configuration

We will configure the SFT trainer with the following parameters:

| Parameter | Description |
|-----------|-------------|
| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) |
| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) |
| gradient_accumulation_steps | Number of batches to accumulate gradients over before performing an optimizer update, effectively increasing the batch size |
| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) |
| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations |
| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) |
| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) |
| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) |

In general, start with a small number of epochs and a small amount of data, using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance.

## Training and Evaluation

Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of understanding, let's break down what happens behind the scenes:

- Iterating over the dataset
- Computing the loss
- Updating the model's parameters
- Regularly evaluating on a validation set

Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities.
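One way to act on the early-stopping advice from the warning signs above is to evaluate periodically and stop when validation loss stops improving. Below is a sketch using the `EarlyStoppingCallback` from `transformers`; the `model`, `tokenizer`, and split `dataset` are assumed to be defined as in the examples that follow on this page, and the output directory is a placeholder:

```python
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(
    output_dir="./sft_output",  # placeholder output directory
    evaluation_strategy="steps",  # evaluate at regular step intervals
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    load_best_model_at_end=True,  # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,  # lower validation loss is better
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    # stop if validation loss fails to improve for 3 consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```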
+

## `SFTTrainer` from Transformer Reinforcement Learning

Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). We'll focus on SFT in this chapter.

Here's a simplified example of how to use `SFTTrainer` to fine-tune a model. We'll expand on this example in the next few sections, but for now let's just focus on the basics.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")

training_args = SFTConfig(
    max_seq_length=512,
    output_dir="/tmp",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",  # the model to fine-tune, loaded from the Hub
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```

Just like in `transformers`, we work through the following steps:

1. Load the dataset
2. Configure the SFTTrainer with appropriate parameters
3. Train the model and monitor its progress
4. Save and evaluate the fine-tuned model



✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.

For this exercise, you'll need to:
1. Load and prepare your chosen dataset
2. Configure the SFTTrainer with appropriate parameters
3. Train the model and monitor its progress
4. Save and evaluate the fine-tuned model



# Supervised Fine-Tuning with SFTTrainer

Let's dive into the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.

## Load the base model

Here we'll load the base model and tokenizer. We'll also set up the chat format for the model.

```python
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, setup_chat_format
import torch

# Set the device to use for training
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Load the model and tokenizer
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)

# Set up the chat format
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)
```

## Generate with the base model

First, we will try out the base model, which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model.
+
+```python
+# Let's test the base model before training
+prompt = "Write a haiku about programming"
+
+# Format with template
+messages = [{"role": "user", "content": prompt}]
+formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)
+
+# Generate response
+inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
+outputs = model.generate(**inputs, max_new_tokens=100)
+
+# Decode and inspect the generated text
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+## Dataset Preparation
+
+We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.
+
+**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`.
+
+```python
+dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
+```
+
+## Configuring the SFTTrainer
+
+The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.
+
+```python
+# Choose a name for your fine-tuned model (placeholder -- pick your own)
+finetune_name = "DeepSeek-R1-Distill-Qwen-1.5B-FT-MyDataset"
+
+# Configure the SFTTrainer
+sft_config = SFTConfig(
+    output_dir="./sft_output",
+    max_steps=1000,  # Adjust based on dataset size and desired training duration
+    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
+    learning_rate=5e-5,  # Common starting point for fine-tuning
+    logging_steps=10,  # Frequency of logging training metrics
+    save_steps=100,  # Frequency of saving model checkpoints
+    evaluation_strategy="steps",  # Evaluate the model at regular intervals
+    eval_steps=50,  # Frequency of evaluation
+    use_mps_device=(
+        True if device == "mps" else False
+    ),  # Use the MPS device on Apple Silicon, if available
+    hub_model_id=finetune_name,  # Set a unique name for your model
+)
+
+# Initialize the SFTTrainer
+trainer = SFTTrainer(
+    model=model,
+    args=sft_config,
+    train_dataset=dataset["train"],
+    tokenizer=tokenizer,
+    eval_dataset=dataset["test"],
+)
+```
+
+## Training the model
+
+With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.
+
+```python
+trainer.train()
+```
+
+
+
+SFTTrainer Training
+
+## 💐 Nice work!
+
+This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:
+
+- Try this notebook on a harder difficulty
+- Review a colleague's PR
+- Improve the course material via an Issue or PR.
+
+
+## Resources
+
+- [TRL Documentation](https://huggingface.co/docs/trl)
+- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft)
\ No newline at end of file
diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx
index 183a9fa06..28583bb58 100644
--- a/chapters/en/chapter11/4.mdx
+++ b/chapters/en/chapter11/4.mdx
@@ -1,254 +1,166 @@
- 
+# LoRA (Low-Rank Adaptation)
 
-# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning
+Fine-tuning large language models is a resource-intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of parameters. 
It works by adding and optimizing smaller matrices to the attention weights, typically reducing trainable parameters by about 90%. -In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. +## Understanding LoRA -## Dataset Preparation +LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685). -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. +LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable. - +## Key advantages of LoRA -## Understanding Training Dynamics +1. **Memory Efficiency**: + - Only adapter parameters are stored in GPU memory + - Base model weights remain frozen and can be loaded in lower precision + - Enables fine-tuning of large models on consumer GPUs -When fine-tuning language models, understanding the training dynamics is crucial for monitoring progress and ensuring successful adaptation. Let's look at how to interpret the training process through loss curves. +2. **Training Features**: + - Native PEFT/LoRA integration with minimal setup + - Support for QLoRA (Quantized LoRA) for even better memory efficiency -### Loss Patterns +3. **Adapter Management**: + - Adapter weight saving during checkpoints + - Features to merge adapters back into base model -The training loss curve typically follows a characteristic pattern. Initially, you'll observe a sharp drop in loss as the model begins adapting to the new data distribution, task objectives, and chat template. This early phase is crucial as it indicates whether the model is successfully learning from the training data. +## Loading LoRA Adapters with PEFT -### The Path to Convergence +PEFT is a library that provides a unified interface for loading and managing PEFT methods, including LoRA. It allows you to easily load and switch between different PEFT methods, making it easier to experiment with different fine-tuning techniques. -As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting the model is learning generalizable patterns rather than memorizing specific examples. 
The absolute loss values will vary depending on your task and dataset.
 
+Adapters can be loaded onto a pretrained model with `load_adapter()`, which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the `set_adapter()` function. To return the base model, you can use `unload()` to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights.
 
-### Monitoring Training Progress
-
-
- Training and validation loss curves showing healthy convergence -
-
-The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability.
-
-### Warning Signs to Watch For
-
-Several patterns in the loss curves can indicate potential issues:
+```python
+from transformers import AutoModelForCausalLM
+from peft import PeftModel
 
-1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider:
-   - Reducing the model size or training time
-   - Adding regularization
-   - Increasing the dataset size
-   - Using techniques like early stopping
+base_model = AutoModelForCausalLM.from_pretrained("base-model-name")  # placeholder model id
+peft_model_id = "peft-adapter-id"  # placeholder adapter id
+model = PeftModel.from_pretrained(base_model, peft_model_id)
+```
 
-2. If the loss doesn't show significant improvement, the model might be:
-   - Learning too slowly (try increasing the learning rate)
-   - Struggling with the task (check data quality and task complexity)
-   - Hitting architecture limitations (consider a different model)
+ 
+![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png)
 
-3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if:
-   - The model performs poorly on new, similar examples
-   - The outputs lack diversity
-   - The responses are too similar to training examples
+## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA
 
- 
+The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train.
 
-Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss.
+We'll use QLoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps:
 
- 
+1. Define the LoRA configuration (rank, alpha, dropout)
+2. Create the SFTTrainer with PEFT config
+3. Train and save the adapter weights
 
-## Training Configuration
+## LoRA Configuration
 
-We will configure SFT trainer with the following parameters:
+Let's walk through the LoRA configuration and key parameters. 
| Parameter | Description | |-----------|-------------| -| num_train_epochs | The total number of training epochs to run (e.g., 1-3 epochs) | -| per_device_train_batch_size | The number of training examples processed per GPU in one forward/backward pass (typically 2-8 for large models) | -| gradient_accumulation_steps | Number of updates to accumulate before performing a backward pass, effectively increasing batch size | -| learning_rate | The step size for model weight updates during training (typically 2e-4 for fine-tuning) | -| gradient_checkpointing | Memory optimization technique that trades computation for memory by recomputing intermediate activations | -| warmup_ratio | Portion of training steps used for learning rate warmup (e.g., 0.03 = 3% of steps) | -| logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) | -| save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) | +| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. | +| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. Higher values result in stronger adaptation effects. | +| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. | +| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | +| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | -In general, start with a small number of epochs and data using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance. +When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. -## Training and Evaluation +## Using TRL with PEFT -Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes. - -- Iterating over the dataset -- Computing the loss -- Updating the model's parameters -- Regular evaluation on a validation set - -Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. - -## `SFTTrainer` from Transformer Reinforcement Learning - -Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. 
The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). But we'll focus on SFT in this chapter.
 
-Here's a basic simplified example of how to use `SFTTrainer` to fine-tune a model. We'll expand on this example in the next few sections, but for now let's just focus on the basics.
+PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for fine-tuning to reduce memory requirements. We define a `LoraConfig` and pass it to the `SFTTrainer` below.
 
 ```python
-from datasets import load_dataset
-from trl import SFTConfig, SFTTrainer
+from peft import LoraConfig
+
+# Configure LoRA parameters
+# r: rank dimension for LoRA update matrices (smaller = more compression)
+rank_dimension = 6
+# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
+lora_alpha = 8
+# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
+lora_dropout = 0.05
+
+peft_config = LoraConfig(
+    r=rank_dimension,  # Rank dimension - typically between 4-32
+    lora_alpha=lora_alpha,  # LoRA scaling factor - typically 2x rank
+    lora_dropout=lora_dropout,  # Dropout probability for LoRA layers
+    bias="none",  # Bias training mode; "none" leaves bias terms frozen
+    target_modules="all-linear",  # Which modules to apply LoRA to
+    task_type="CAUSAL_LM",  # Task type for model architecture
+)
+```
 
-dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
+When loading the model, you can use `device_map="auto"` to automatically assign it to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`.
 
-training_args = SFTConfig(
-    max_seq_length=512,
-    output_dir="/tmp",
-)
+We will also need to define the `SFTTrainer` with the LoRA configuration.
 
+```python
+# Create SFTTrainer with LoRA configuration
 trainer = SFTTrainer(
-    model_name="HuggingFaceTB/SmolLM2-135M",
-    train_dataset=dataset,
-    args=training_args,
+    model=model,
+    args=args,
+    train_dataset=dataset["train"],
+    peft_config=peft_config,  # LoRA configuration
+    max_seq_length=512,  # Maximum sequence length
+    tokenizer=tokenizer,
 )
-trainer.train()
 ```
 
-Just like in `transformers`, we work through the following steps:
-
-1. Load the dataset
-2. Configure the SFTTrainer with appropriate parameters
-3. Train the model and monitor its progress
-4. Save and evaluate the fine-tuned model
-
 
 
-✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.
-
-For this exercise, you'll need to:
-1. Load and prepare your chosen dataset
-2. Configure the SFTTrainer with appropriate parameters
-3. Train the model and monitor its progress
-4. Save and evaluate the fine-tuned model
+✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above.
 
 
 
-# Supervised Fine-Tuning with SFTTrainer
-
-In this section we will unpack the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.
-
-## Load the base model
-
-Here we'll load the base model and tokenizer. 
We'll also set up the chat format for the model. - -```python -# Import necessary libraries -from transformers import AutoModelForCausalLM, AutoTokenizer -from datasets import load_dataset -from trl import SFTConfig, SFTTrainer, setup_chat_format -import torch - -# Set the device to use for training -device = ( - "cuda" - if torch.cuda.is_available() - else "mps" if torch.backends.mps.is_available() else "cpu" -) +## Merging LoRA Adapters -# Load the model and tokenizer -model = AutoModelForCausalLM.from_pretrained( - pretrained_model_name_or_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" -).to(device) -tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) +After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. -# Set up the chat format -model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) -``` +The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will find the correct device for the model based on your hardware. -## Generate with the base model +Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training and saving the merged model in the same format for deployment. -First we will try out the base model which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model. +## Merging Implementation -```python -# Let's test the base model before training -prompt = "Write a haiku about programming" - -# Format with template -messages = [{"role": "user", "content": prompt}] -formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False) - -# Generate response -inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device) -outputs = model.generate(**inputs, max_new_tokens=100) -``` - -## Dataset Preparation - -We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. - -**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,. +After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it: ```python -dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") -``` - -## Configuring the SFTTrainer - -The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources. 
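+# Note: merging loads the base model and the adapter weights at the same time,
+# so ensure sufficient GPU/CPU memory is available and keep the dtype
+# consistent with the precision used during training (see the guidance above).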
+import torch
+from transformers import AutoModelForCausalLM
+from peft import PeftModel
 
-```python
-# Configure the SFTTrainer
-sft_config = SFTConfig(
-    output_dir="./sft_output",
-    max_steps=1000,  # Adjust based on dataset size and desired training duration
-    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
-    learning_rate=5e-5,  # Common starting point for fine-tuning
-    logging_steps=10,  # Frequency of logging training metrics
-    save_steps=100,  # Frequency of saving model checkpoints
-    evaluation_strategy="steps",  # Evaluate the model at regular intervals
-    eval_steps=50,  # Frequency of evaluation
-    use_mps_device=(
-        True if device == "mps" else False
-    ),  # Use MPS for mixed precision training
-    hub_model_id=finetune_name,  # Set a unique name for your model
+# 1. Load the base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    "base_model_name", torch_dtype=torch.float16, device_map="auto"
 )
 
-# Initialize the SFTTrainer
-trainer = SFTTrainer(
-    model=model,
-    args=sft_config,
-    train_dataset=ds["train"],
-    tokenizer=tokenizer,
-    eval_dataset=ds["test"],
+# 2. Load the PEFT model with adapter
+peft_model = PeftModel.from_pretrained(
+    base_model, "path/to/adapter", torch_dtype=torch.float16
+)
 
-## Training the model
+# 3. Merge adapter weights with base model
+merged_model = peft_model.merge_and_unload()
+```
 
-With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.
+If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer:
 
-```python
-trainer.train()
+```python
+from transformers import AutoTokenizer
+
+# Save both model and tokenizer
+tokenizer = AutoTokenizer.from_pretrained("base_model_name")
+merged_model.save_pretrained("path/to/save/merged_model")
+tokenizer.save_pretrained("path/to/save/merged_model")
 ```
+
+✏️ **Try it out!** Take the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model you fine-tuned with LoRA on the `HuggingFaceTB/smoltalk` dataset in the previous exercise, and merge its adapter weights back into the base model.
 
 
 
-SFTTrainer Training
-
-## 💐 Nice work!
-
-This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:
-
-- Try this notebook on a harder difficulty
-- Review a colleagues PR
-- Improve the course material via an Issue or PR.
+
 
-## Resources
+# Resources
 
-- [TRL Documentation](https://huggingface.co/docs/trl)
-- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft)
\ No newline at end of file
+- [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685)
+- [PEFT Documentation](https://huggingface.co/docs/peft)
+- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft)
\ No newline at end of file
diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx
index 28583bb58..d02183d40 100644
--- a/chapters/en/chapter11/5.mdx
+++ b/chapters/en/chapter11/5.mdx
@@ -1,166 +1,245 @@
-# LoRA (Low-Rank Adaptation)
+# Evaluation
 
-Fine-tuning large language models is a resource intensive process. LoRA is a technique that allows us to fine-tune large language models with a small number of parameters. 
It works by adding and optimizing smaller matrices to the attention weights, typically reducing trainable parameters by about 90%.
 
-## Understanding LoRA
+Once we have a fine-tuned model, whether trained with full SFT or with LoRA, we should evaluate it on standard benchmarks.
 
-LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685).
+## Automatic Benchmarks
 
-LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable.
+Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy.
 
-## Key advantages of LoRA
+## Understanding Automatic Benchmarks
 
-1. **Memory Efficiency**:
-   - Only adapter parameters are stored in GPU memory
-   - Base model weights remain frozen and can be loaded in lower precision
-   - Enables fine-tuning of large models on consumer GPUs
+Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization: they allow for consistent comparison across different models and provide reproducible results.
 
-2. **Training Features**:
-   - Native PEFT/LoRA integration with minimal setup
-   - Support for QLoRA (Quantized LoRA) for even better memory efficiency
+However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases.
 
-3. **Adapter Management**:
-   - Adapter weight saving during checkpoints
-   - Features to merge adapters back into base model
+## General Knowledge Benchmarks
 
-## Loading LoRA Adapters with PEFT
+MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation.
 
-PEFT is a library that provides a unified interface for loading and managing PEFT methods, including LoRA. It allows you to easily load and switch between different PEFT methods, making it easier to experiment with different fine-tuning techniques.
+## Reasoning Benchmarks
+
+BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. 
These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios.
 
-Adapters can be loaded onto a pretrained model with load_adapter(), which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the set_adapter() function. To return the base model, you could use unload() to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights.
+## Language Understanding
 
-```python
-from transformers import AutoModelForCausalLM
-from peft import PeftModel
+HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology.
 
-base_model = AutoModelForCausalLM.from_pretrained("")
-peft_model_id = ""
-model = PeftModel.from_pretrained(base_model, peft_model_id)
-```
+## Alternative Evaluation Approaches
 
- 
-![lora_load_adapter](https://github.com/huggingface/smol-course/raw/main/3_parameter_efficient_finetuning/images/lora_adapter.png)
+Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks:
 
-## Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA
+### LLM-as-Judge
 
-The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. This means that we can fine-tune a model in the same way as we did with SFT, but use LoRA to reduce the number of parameters we need to train.
+Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations.
 
-We'll use LoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps:
+### Evaluation Arenas
 
-1. Define the LoRA configuration (rank, alpha, dropout)
-2. Create the SFTTrainer with PEFT config
-3. Train and save the adapter weights
+Platforms like the LMSYS Chatbot Arena pit models against each other in head-to-head comparisons judged by human voters. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks.
 
-## LoRA Configuration
+### Custom Benchmark Suites
 
-Let's walk through the LoRA configuration and key parameters.
+Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions.
 
-| Parameter | Description |
-|-----------|-------------|
+## Creating Your Own Evaluation Strategy
 
-| `r` (rank) | Dimension of the low-rank matrices used for weight updates. Typically between 4-32. Lower values provide more compression but potentially less expressiveness. |
-| `lora_alpha` | Scaling factor for LoRA layers, usually set to 2x the rank value. Higher values result in stronger adaptation effects. 
| -| `lora_dropout` | Dropout probability for LoRA layers, typically 0.05-0.1. Higher values help prevent overfitting during training. | -| `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | -| `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | - -When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. - -## Using TRL with PEFT - -PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for fine-tuning to reduce memory requirements. We can pass the `LoraConfig` to the model when loading it. - -```python -from peft import LoraConfig - -# TODO: Configure LoRA parameters -# r: rank dimension for LoRA update matrices (smaller = more compression) -rank_dimension = 6 -# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation) -lora_alpha = 8 -# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting) -lora_dropout = 0.05 - -peft_config = LoraConfig( - r=rank_dimension, # Rank dimension - typically between 4-32 - lora_alpha=lora_alpha, # LoRA scaling factor - typically 2x rank - lora_dropout=lora_dropout, # Dropout probability for LoRA layers - bias="none", # Bias type for LoRA. the corresponding biases will be updated during training. - target_modules="all-linear", # Which modules to apply LoRA to - task_type="CAUSAL_LM", # Task type for model architecture -) -``` +## Creating Your Own Evaluation Strategy -Above, we used `device_map="auto"` to automatically assign the model to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`. +Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. -We will also need to define the `SFTTrainer` with the LoRA configuration. +While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: -```python -# Create SFTTrainer with LoRA configuration -trainer = SFTTrainer( - model=model, - args=args, - train_dataset=dataset["train"], - peft_config=lora_config, # LoRA configuration - max_seq_length=max_seq_length, # Maximum sequence length - tokenizer=tokenizer, -) -``` - - - -✏️ **Try it out!** Build on your fine-tuned model from the previous section, but fine-tune it with LoRA. Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, using the LoRA configuration we defined above. - - +1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. -## Merging LoRA Adapters +2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? -After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. +3. 
Develop custom evaluation datasets that reflect your actual use case. This might include:
+   - Real user queries from your domain
+   - Common edge cases you've encountered
+   - Examples of particularly challenging scenarios
 
+4. Consider implementing a multi-layered evaluation strategy:
+   - Automated metrics for quick feedback
+   - Human evaluation for nuanced understanding
+   - Domain expert review for specialized applications
+   - A/B testing in controlled environments
 
+# Implementing Evaluation
 
+In this section, we will implement evaluation for our fine-tuned model. We can use `lighteval`, which has a wide range of tasks built into the library, to evaluate our fine-tuned model on standard benchmarks. We just need to define the tasks we want to evaluate and the parameters for the evaluation.
 
+LightEval tasks are defined using a specific format:
 
+```
+{suite}|{task}|{num_few_shot}|{auto_reduce}
+```
 
+| Parameter | Description |
+|-----------|-------------|
+| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') |
+| `task` | Specific task within the suite (e.g., 'abstract_algebra') |
+| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) |
+| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) |
 
+Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference.
 
+## Example Evaluation Pipeline
 
+Let's set up an evaluation pipeline for our fine-tuned model. We will evaluate the model on a set of subtasks that relate to the domain of medicine. 
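+
+Using the task format described above, we can first collect the medicine-related subtasks in one place; `domain_tasks` is just an illustrative name for this example:
+
+```python
+# The four medicine-related MMLU subtasks used in the command below, written
+# in LightEval's "{suite}|{task}|{num_few_shot}|{auto_reduce}" task format
+domain_tasks = [
+    "mmlu|anatomy|0|0",
+    "mmlu|high_school_biology|0|0",
+    "mmlu|high_school_chemistry|0|0",
+    "mmlu|professional_medicine|0|0",
+]
+```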
+
+Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend:
+
+```bash
+lighteval vllm \
+    "pretrained=your-model-name" \
+    "mmlu|anatomy|0|0" \
+    "mmlu|high_school_biology|0|0" \
+    "mmlu|high_school_chemistry|0|0" \
+    "mmlu|professional_medicine|0|0" \
+    --max_samples 40 \
+    --batch_size 1 \
+    --output_path "./results" \
+    --save_generations true
+```
+
+Results are displayed in a tabular format showing:
+
+```
+| Task |Version|Metric|Value | |Stderr|
+|----------------------------------------|------:|------|-----:|---|-----:|
+|all | |acc |0.3333|± |0.1169|
+|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121|
+|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141|
+|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819|
+```
+
+Lighteval also includes a Python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information.
 
+✏️ **Try it out!** Evaluate your fine-tuned model on a specific task with lighteval.
 
+# End-of-chapter quiz[[end-of-chapter-quiz]]
+
+
+
+### 1. What are the main advantages of using automatic benchmarks for model evaluation?
+
+
+
+### 2. Which benchmark specifically tests knowledge across 57 different subjects?
+
+
+
+### 3. What is LLM-as-Judge?
+
+
+
+### 4. What should be included in a comprehensive evaluation strategy?
+
+
+
+### 5. What is a limitation of automatic benchmarks?
+
+
+
+### 6. What is the purpose of creating custom evaluation datasets?
+
+
diff --git a/chapters/en/chapter11/6.mdx b/chapters/en/chapter11/6.mdx
index d02183d40..093de47d6 100644
--- a/chapters/en/chapter11/6.mdx
+++ b/chapters/en/chapter11/6.mdx
@@ -1,245 +1,13 @@
-# Evaluation
+# Conclusion
 
-With a finetuned model through either SFT or LoRA SFT, we should evaluate it on standard benchmarks.
+In this chapter, we explored the essential components of fine-tuning language models:
 
-## Automatic Benchmarks
+1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting.
 
-Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy.
+2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge.
 
-## Understanding Automatic Benchmarks
+3. 
**LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. -Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization - they allow for consistent comparison across different models and provide reproducible results. +4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. -However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases. - -## General Knowledge Benchmarks - -MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. - -## Reasoning Benchmarks -BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. - -## Language Understanding - -HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology. - -## Alternative Evaluation Approaches - -Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks: - -### LLM-as-Judge - -Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations. - -### Evaluation Arenas - -Platforms like Anthropic's Constitutional AI Arena allow models to interact and evaluate each other in controlled environments. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks. - -### Custom Benchmark Suites - -Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. - -## Creating Your Own Evaluation Strategy - -Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. - -While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: - -1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. - -2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? - -3. Develop custom evaluation datasets that reflect your actual use case. 
This might include: - - Real user queries from your domain - - Common edge cases you've encountered - - Examples of particularly challenging scenarios - -4. Consider implementing a multi-layered evaluation strategy: - - Automated metrics for quick feedback - - Human evaluation for nuanced understanding - - Domain expert review for specialized applications - - A/B testing in controlled environments - -# Implementing Evaluation - -In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. - -LightEval tasks are defined using a specific format: - -``` -{suite}|{task}|{num_few_shot}|{auto_reduce} -``` - -| Parameter | Description | -|-----------|-------------| -| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') | -| `task` | Specific task within the suite (e.g., 'abstract_algebra') | -| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) | -| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) | - -Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. - -## Example Evaluation Pipeline - -Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on set of sub tasks that relate to the domain of medicine. - -Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend: - -```bash -lighteval vllm \ - "pretrained=your-model-name" \ - "mmlu|anatomy|0|0" \ - "mmlu|high_school_biology|0|0" \ - "mmlu|high_school_chemistry|0|0" \ - "mmlu|professional_medicine|0|0" \ - --max_samples 40 \ - --batch_size 1 \ - --output_path "./results" \ - --save_generations true -``` - -Results are displayed in a tabular format showing: - -``` -| Task |Version|Metric|Value | |Stderr| -|----------------------------------------|------:|------|-----:|---|-----:| -|all | |acc |0.3333|± |0.1169| -|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121| -|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141| -|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| -``` - -Lighteval also include a python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information. - - - -✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval. - - - -# End-of-chapter quiz[[end-of-chapter-quiz]] - - - -### 1. What are the main advantages of using automatic benchmarks for model evaluation? - - - -### 2. Which benchmark specifically tests knowledge across 57 different subjects? - - - -### 3. What is LLM-as-Judge? - - - -### 4. What should be included in a comprehensive evaluation strategy? - - - -### 5. What is a limitation of automatic benchmarks? - - - -### 6. What is the purpose of creating custom evaluation datasets? - - +These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. 
diff --git a/chapters/en/chapter11/7.mdx b/chapters/en/chapter11/7.mdx index 093de47d6..690547c5f 100644 --- a/chapters/en/chapter11/7.mdx +++ b/chapters/en/chapter11/7.mdx @@ -1,13 +1,16 @@ -# Conclusion +# Exam Time! -In this chapter, we explored the essential components of fine-tuning language models: +It's time to put your knowledge to the test! We've prepared a short quiz for you to test your understanding of the concepts covered in this chapter. -1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting. +To take the quiz, you will need to follow these steps: -2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge. +1. Sign in to your Hugging Face account. +2. Answer the questions in the quiz. +3. Submit your answers. -3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance. - -4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks. - -These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation. + diff --git a/chapters/en/chapter11/8.mdx b/chapters/en/chapter11/8.mdx deleted file mode 100644 index 690547c5f..000000000 --- a/chapters/en/chapter11/8.mdx +++ /dev/null @@ -1,16 +0,0 @@ -# Exam Time! - -It's time to put your knowledge to the test! We've prepared a short quiz for you to test your understanding of the concepts covered in this chapter. - -To take the quiz, you will need to follow these steps: - -1. Sign in to your Hugging Face account. -2. Answer the questions in the quiz. -3. Submit your answers. - - From 549612b518471e069969b579dee072b745c5d4fa Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Thu, 6 Feb 2025 13:14:00 +0100 Subject: [PATCH 11/30] followinf readthrough: simplify and add more tips --- chapters/en/chapter11/2.mdx | 164 ++++++----- chapters/en/chapter11/3.mdx | 530 +++++++++++++++++------------------- chapters/en/chapter11/4.mdx | 2 + chapters/en/chapter11/5.mdx | 6 +- 4 files changed, 333 insertions(+), 369 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 8798e1d05..560db674f 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -6,17 +6,29 @@ # Chat Templates +## Introduction Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns. -## Base Models vs Instruct Models + +Chat templates are crucial for: +- Maintaining consistent conversation structure +- Ensuring proper role identification +- Managing context across multiple turns +- Supporting advanced features like tool use + + +## Model Types and Templates +### Base Models vs Instruct Models A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, `SmolLM2-135M` is a base model, while `SmolLM2-135M-Instruct` is its instruction-tuned variant. 
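+
+One quick way to see this difference in code is to check whether the tokenizer defines a chat template; for base models the `chat_template` attribute is usually unset:
+
+```python
+from transformers import AutoTokenizer
+
+# Instruct models ship with a chat template; base models usually do not
+base_tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
+instruct_tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
+
+print(base_tokenizer.chat_template is None)  # usually True for a base model
+print(instruct_tokenizer.chat_template is None)  # False: the instruct variant defines one
+```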
To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). -It's important to note that a base model could be fine-tuned on different chat templates, so when we're using an instruct model we need to make sure we're using the correct chat template. + +When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. + -## Common Chat Template Formats +### Common Template Formats Different models use different chat template formats. To illustrate this, let's look at a few chat templates. Here's how the same conversation would be formatted for different models: @@ -94,25 +106,9 @@ mistral_chat = mistral_tokenizer.apply_chat_template(messages, tokenize=False) qwen_chat = qwen_tokenizer.apply_chat_template(messages, tokenize=False) ``` -## Understanding Chat Templates - -At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. - -### Basic Chat Template Example - -Here's a basic example of a chat template: - -```sh -<|im_start|>user -Hi there!<|im_end|> -<|im_start|>assistant -Nice to meet you!<|im_end|> -<|im_start|>user -Can I ask a question?<|im_end|> -<|im_start|>assistant -``` +## Working with Templates -### Implementation with Transformers +### Basic Implementation The transformers library provides built-in support for chat templates through the `apply_chat_template()` method: @@ -141,8 +137,7 @@ You are a helpful coding assistant.<|im_end|> Write a Python function to sort a list<|im_end|> ``` -### Advanced Chat Templates - +### Advanced Features Chat templates can handle more complex scenarios, including: 1. **Tool Use**: When models need to interact with external tools or APIs @@ -150,6 +145,32 @@ Chat templates can handle more complex scenarios, including: 3. **Function Calling**: For structured function execution 4. 
**Multi-turn Context**: For maintaining conversation history + +When implementing advanced features: +- Test thoroughly with your specific model +- Handle errors gracefully +- Monitor token usage carefully +- Document the expected format for each feature + + +For multimodal conversations, chat templates can include image references or base64-encoded images: + +```python +messages = [ + { + "role": "system", + "content": "You are a helpful vision assistant that can analyze images.", + }, + { + "role": "user", + "content": [ + {"type": "text", "text": "What's in this image?"}, + {"type": "image", "image_url": "https://example.com/image.jpg"}, + ], + }, +] +``` + Here's an example of a chat template with tool use: ```python @@ -179,85 +200,56 @@ messages = [ ] ``` -For multimodal conversations, chat templates can include image references or base64-encoded images: - -```python -messages = [ - { - "role": "system", - "content": "You are a helpful vision assistant that can analyze images.", - }, - { - "role": "user", - "content": [ - {"type": "text", "text": "What's in this image?"}, - {"type": "image", "image_url": "https://example.com/image.jpg"}, - ], - }, -] -``` - -## Working with Chat Templates +## Best Practices -When working with chat templates, you have several options for processing the conversation: +### General Guidelines +When working with chat templates, follow these key practices: -1. Apply the template without tokenization to return the raw formatted string -2. Apply the template with tokenization to return the token IDs -3. Add a generation prompt to prepare for model inference +1. **Consistent Formatting**: Always use the same template format throughout your application +2. **Clear Role Definition**: Clearly specify roles (system, user, assistant, tool) for each message +3. **Context Management**: Be mindful of token limits when maintaining conversation history +4. **Error Handling**: Include proper error handling for tool calls and multimodal inputs +5. **Validation**: Validate message structure before sending to the model -The tokenizer's `apply_chat_template()` method handles all these cases through its parameters: + +Common pitfalls to avoid: +- Mixing different template formats in the same application +- Exceeding token limits with long conversation histories +- Not properly escaping special characters in messages +- Forgetting to validate input message structure +- Ignoring model-specific template requirements + -- `tokenize`: Whether to return token IDs (True) or the formatted string (False) -- `add_generation_prompt`: Whether to add a prompt for the model to generate a response +## Hands-on Exercise -## System Messages +Let's practice implementing chat templates with a real-world example. -System messages set the foundation for how the model should behave. They act as persistent instructions that influence all subsequent interactions. For example: + +Follow these steps to convert the `HuggingFaceTB/smoltalk` dataset into chatml format: +1. Load the dataset: ```python -system_message = { - "role": "system", - "content": "You are a professional customer service agent. Always be polite, clear, and helpful.", -} +from datasets import load_dataset +dataset = load_dataset("HuggingFaceTB/smoltalk") ``` -## Conversations - -Chat templates can maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: - +2. 
Create a processing function: ```python -conversation = [ - {"role": "user", "content": "I need help with my order"}, - { - "role": "assistant", - "content": "I'd be happy to help. Could you provide your order number?", - }, - {"role": "user", "content": "It's ORDER-123"}, -] +def convert_to_chatml(example): + return { + "messages": [ + {"role": "user", "content": example["input"]}, + {"role": "assistant", "content": example["output"]} + ] + } ``` -## Best Practices - -When working with chat templates, consider these best practices: - -1. **Consistent Formatting**: Always use the same template format throughout your application -2. **Clear Role Definition**: Clearly specify roles (system, user, assistant, tool) for each message -3. **Context Management**: Be mindful of token limits when maintaining conversation history -4. **Error Handling**: Include proper error handling for tool calls and multimodal inputs -5. **Validation**: Validate message structure before sending to the model - - - -✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file. - -For this exercise, you'll need to: -1. Load the dataset using the Hugging Face datasets library -2. Create a processing function that converts the samples into the correct chat format -3. Apply the chat template using the tokenizer's methods +3. Apply the chat template using your chosen model's tokenizer +Remember to validate your output format matches your target model's requirements! -## Resources +## Additional Resources - [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) - [Transformers Documentation](https://huggingface.co/docs/transformers) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index f84f77557..fb401c0bb 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -6,137 +6,45 @@ # Supervised Fine-Tuning -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. - -Because of the supervised structure of the task, the model can learn to generate structured outputs. For example, the chat templates we created in the previous sections. - -## Understanding Supervised Fine-Tuning - -Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and use specific output structures, through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. - -SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. - -## When to Use Supervised Fine-Tuning - -The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. - -Two core reasons to use SFT are: - -1. 
**Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. For example, you need a specific chat template to generate structured outputs. - -2. **Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise. - -## Quiz - -### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)? - - - -### 2. Which of the following are valid reasons to use SFT? - - - -### 3. What is required for effective Supervised Fine-Tuning? - - +This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can +adapt the model to perform specific tasks more effectively. -### 4. How does SFT relate to chat templates? +Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. - +## When to Use SFT +The supervised structure of the task enables models to learn specific output formats and behaviors. For example, SFT can teach a model to consistently use chat templates or follow domain-specific guidelines. The decision to use Supervised Fine-Tuning depends on two primary factors: -### 5. What distinguishes SFT from pre-training? +### Template Control +SFT allows precise control over the model's output structure. This is particularly valuable when you need the model to: +1. Generate responses in a specific chat template format +2. Follow strict output schemas +3. Maintain consistent styling across responses - +### Domain Adaptation +When working in specialized domains, SFT helps align the model with domain-specific requirements by: +1. Teaching domain terminology and concepts +2. Enforcing professional standards +3. Handling technical queries appropriately +4. Following industry-specific guidelines -# Fine-Tuning Process with SFTTrainer in Transformers Reinforcement Learning + +Before starting SFT, evaluate whether your use case requires: +- Precise output formatting +- Domain-specific knowledge +- Consistent response patterns +- Adherence to specific guidelines -In this section, we'll walk through the process of fine-tuning a model using the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. +This evaluation will help determine if SFT is the right approach for your needs. + ## Dataset Preparation -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. Your dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. The data format needs to be compatible with the model's chat template. Below is an example of a dataset that can be used for supervised fine-tuning. +The supervised fine-tuning process requires a task-specific dataset structured with input-output pairs. Each pair should consist of: +1. 
An input prompt +2. The expected model response +3. Any additional context or metadata + +The data format must be compatible with your model's chat template. Here's an example dataset suitable for supervised fine-tuning: -## Understanding Training Dynamics - -When fine-tuning language models, understanding the training dynamics is crucial for monitoring progress and ensuring successful adaptation. Let's look at how to interpret the training process through loss curves. - -### Loss Patterns - -The training loss curve typically follows a characteristic pattern. Initially, you'll observe a sharp drop in loss as the model begins adapting to the new data distribution, task objectives, and chat template. This early phase is crucial as it indicates whether the model is successfully learning from the training data. - -### The Path to Convergence - -As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting the model is learning generalizable patterns rather than memorizing specific examples. The absolute loss values will vary depending on your task and dataset. - -### Monitoring Training Progress - -
- Training and validation loss curves showing healthy convergence -
- -The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability. - -### Warning Signs to Watch For - -Several patterns in the loss curves can indicate potential issues: - -1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider: - - Reducing the model size or training time - - Adding regularization - - Increasing the dataset size - - Using techniques like early stopping - -2. If the loss doesn't show significant improvement, the model might be: - - Learning too slowly (try increasing the learning rate) - - Struggling with the task (check data quality and task complexity) - - Hitting architecture limitations (consider a different model) - -3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: - - The model performs poorly on new, similar examples - - The outputs lack diversity - - The responses are too similar to training examples - - - -Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular qualitative evaluation of the model's responses helps catch issues that metrics alone might miss. - - - ## Training Configuration -We will configure SFT trainer with the following parameters: +### Parameters + +The SFTTrainer configuration requires consideration of several parameters that control the training process: | Parameter | Description | |-----------|-------------| @@ -206,175 +70,283 @@ We will configure SFT trainer with the following parameters: | logging_steps | Frequency of logging training metrics and progress (e.g., every 10 steps) | | save_strategy | When to save model checkpoints (e.g., "epoch" saves after each epoch, "steps" saves every N steps) | -In general, start with a small number of epochs and data using the default parameters in `trl.SFTTrainer`. As you get more comfortable with the process, you can experiment with different configurations to see how they affect the model's performance. +### Core Parameters Explained -## Training and Evaluation +1. **Training Duration Parameters**: + - `num_train_epochs`: Controls total training duration + - `max_steps`: Alternative to epochs, sets maximum number of training steps + - More epochs allow better learning but risk overfitting -Fortunately, the `SFTTrainer` class handles the training and evaluation process for us. We just need to pass in the appropriate parameters and call the `train()` method. For the sake of education, let's break down what happens behind the scenes. +2. **Batch Size Parameters**: + - `per_device_train_batch_size`: Determines memory usage and training stability + - `gradient_accumulation_steps`: Enables larger effective batch sizes + - Larger batches provide more stable gradients but require more memory -- Iterating over the dataset -- Computing the loss -- Updating the model's parameters -- Regular evaluation on a validation set +3. **Learning Rate Parameters**: + - `learning_rate`: Controls size of weight updates + - `warmup_ratio`: Portion of training used for learning rate warmup + - Too high can cause instability, too low results in slow learning -Throughout the process, continuous evaluation is essential. 
You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities.

+4. **Monitoring Parameters**:
+   - `logging_steps`: Frequency of metric logging
+   - `eval_steps`: How often to evaluate on validation data
+   - `save_steps`: Frequency of model checkpoint saves

-## `SFTTrainer` from Transformer Reinforcement Learning

+
+Start with conservative values and adjust based on monitoring:
+- Begin with 1-3 epochs
+- Use smaller batch sizes initially
+- Monitor validation metrics closely
+- Adjust learning rate if training is unstable
+

-Transformer Reinforcement Learning (TRL) is a toolkit used to train transformer language models using reinforcement learning (RL) and post-training techniques. Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). But we'll focus on SFT in this chapter.
+## Implementation with TRL

-Here's a basic simplified example of how to use `SFTTrainer` to fine-tune a model. We'll expand on this example in the next few sections, but for now let's just focus on the basics.
+We will use the `SFTTrainer` class from the Transformers Reinforcement Learning (TRL) library, which is built on top of the `transformers` library. Here's a complete example using the TRL library:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+# Set device
+device = "cuda" if torch.cuda.is_available() else "cpu"

-dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
+# Load dataset (the "everyday-conversations" subset keeps this example small)
+dataset = load_dataset("HuggingFaceTB/smoltalk", name="everyday-conversations")
+
+# Load the model and tokenizer to fine-tune; SmolLM2-135M is a small example model
+model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M").to(device)
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")

+# Configure trainer
training_args = SFTConfig(
-    max_seq_length=512,
-    output_dir="/tmp",
+    output_dir="./sft_output",
+    max_steps=1000,
+    per_device_train_batch_size=4,
+    learning_rate=5e-5,
+    logging_steps=10,
+    save_steps=100,
+    evaluation_strategy="steps",
+    eval_steps=50
)

+# Initialize trainer
trainer = SFTTrainer(
-    model_name="HuggingFaceTB/SmolLM2-135M",
-    train_dataset=dataset,
+    model=model,
    args=training_args,
+    train_dataset=dataset["train"],
+    eval_dataset=dataset["test"],
+    tokenizer=tokenizer
)
+
+# Start training
trainer.train()
```

-Just like in `transformers`, we work through the following steps:
+## Monitoring Training Progress

-1. Load the dataset
-2. Configure the SFTTrainer with appropriate parameters
-3. Train the model and monitor its progress
-4. Save and evaluate the fine-tuned model
+
+### Understanding Loss Patterns
-
+Training loss typically follows three distinct phases:
+1. Initial Sharp Drop: Rapid adaptation to new data distribution
+2. Gradual Stabilization: Learning rate slows as model fine-tunes
+3. Convergence: Loss values stabilize, indicating training completion
+
+*[Image: SFTTrainer Training]*
+
+### Metrics to Monitor

-✏️ **Try it out!** Use the `HuggingFaceTB/smoltalk` dataset to fine-tune a `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model.
+Effective monitoring involves tracking quantitative metrics and qualitatively evaluating model outputs. The key metrics to track are:

-For this exercise, you'll need to:
-1. Load and prepare your chosen dataset
-2. Configure the SFTTrainer with appropriate parameters
-3. Train the model and monitor its progress
-4. 
Save and evaluate the fine-tuned model +- Training loss +- Validation loss +- Learning rate progression +- Gradient norms + +Watch for these warning signs during training: +1. Validation loss increasing while training loss decreases (overfitting) +2. No significant improvement in loss values (underfitting) +3. Extremely low loss values (potential memorization) +4. Inconsistent output formatting (template learning issues) -# Supervised Fine-Tuning with SFTTrainer +### The Path to Convergence -Let's dive into the `SFTTrainer` class and see how it works. We'll also see how to use it to fine-tune a model. We will demonstrate how to fine-tune the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model. +As training progresses, the loss curve should gradually stabilize. The key indicator of healthy training is a small gap between training and validation loss, suggesting +the model is learning generalizable patterns rather than memorizing specific examples. The absolute loss values will vary depending on your task and dataset. -## Load the base model +### Monitoring Training Progress -Here we'll load the base model and tokenizer. We'll also set up the chat format for the model. +
+ Training and validation loss curves 
+    showing healthy convergence +
-```python -# Import necessary libraries -from transformers import AutoModelForCausalLM, AutoTokenizer -from datasets import load_dataset -from trl import SFTConfig, SFTTrainer, setup_chat_format -import torch +The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern +indicates the model is learning effectively while maintaining generalization ability. -# Set the device to use for training -device = ( - "cuda" - if torch.cuda.is_available() - else "mps" if torch.backends.mps.is_available() else "cpu" -) +### Warning Signs to Watch For -# Load the model and tokenizer -model = AutoModelForCausalLM.from_pretrained( - pretrained_model_name_or_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" -).to(device) -tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name) +Several patterns in the loss curves can indicate potential issues: -# Set up the chat format -model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer) -``` +1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider: + - Reducing the model size or training time + - Adding regularization + - Increasing the dataset size + - Using techniques like early stopping -## Generate with the base model +2. If the loss doesn't show significant improvement, the model might be: + - Learning too slowly (try increasing the learning rate) + - Struggling with the task (check data quality and task complexity) + - Hitting architecture limitations (consider a different model) -First we will try out the base model which does not have a chat template. Later, we can compare the results of the base model with the fine-tuned model. +3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: + - The model performs poorly on new, similar examples + - The outputs lack diversity + - The responses are too similar to training examples -```python -# Let's test the base model before training -prompt = "Write a haiku about programming" + -# Format with template -messages = [{"role": "user", "content": prompt}] -formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False) +Monitor both the loss values and the model's actual outputs during training. Sometimes the loss can look good while the model develops unwanted behaviors. Regular +qualitative evaluation of the model's responses helps catch issues that metrics alone might miss. -# Generate response -inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device) -outputs = model.generate(**inputs, max_new_tokens=100) -``` + -## Dataset Preparation +## Evaluation after SFT -We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model. +In section [11.4](/en/chapter11/4) we will learn how to evaluate the model using benchmark datasets. For now, we will focus on the qualitative evaluation of the model. -**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,. +After completing SFT, consider these follow-up actions: -```python -dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations") -``` +1. 
Evaluate the model thoroughly on held-out test data +2. Validate template adherence across various inputs +3. Test domain-specific knowledge retention +4. Monitor real-world performance metrics -## Configuring the SFTTrainer + +Document your training process, including: +- Dataset characteristics +- Training parameters +- Performance metrics +- Known limitations +This documentation will be valuable for future model iterations. + -The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources. +## Additional Resources -```python -# Configure the SFTTrainer -sft_config = SFTConfig( - output_dir="./sft_output", - max_steps=1000, # Adjust based on dataset size and desired training duration - per_device_train_batch_size=4, # Set according to your GPU memory capacity - learning_rate=5e-5, # Common starting point for fine-tuning - logging_steps=10, # Frequency of logging training metrics - save_steps=100, # Frequency of saving model checkpoints - evaluation_strategy="steps", # Evaluate the model at regular intervals - eval_steps=50, # Frequency of evaluation - use_mps_device=( - True if device == "mps" else False - ), # Use MPS for mixed precision training - hub_model_id=finetune_name, # Set a unique name for your model -) +- [TRL Documentation](https://huggingface.co/docs/trl) +- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) +- [Fine-tuning Best Practices](https://huggingface.co/docs/transformers/training) -# Initialize the SFTTrainer -trainer = SFTTrainer( - model=model, - args=sft_config, - train_dataset=ds["train"], - tokenizer=tokenizer, - eval_dataset=ds["test"], -) -``` +## Quiz -## Training the model +### 1. What parameters control the training duration in SFT? -With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss. + -```python -trainer.train() -``` +### 2. Which pattern in the loss curves indicates potential overfitting? + + +### 3. What is gradient_accumulation_steps used for? + -SFTTrainer Training +### 4. What should you monitor during SFT training? -## 💐 Nice work! + + +### 5. What indicates healthy convergence during training? -This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out: + -- Try this notebook on a harder difficulty -- Review a colleagues PR -- Improve the course material via an Issue or PR. +## 💐 Nice work! +You've learned how to fine-tune models using SFT! To continue your learning: +1. Try the notebook with different parameters +2. Experiment with other datasets +3. 
Contribute improvements to the course material -## Resources +## Additional Resources - [TRL Documentation](https://huggingface.co/docs/trl) -- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) \ No newline at end of file +- [SFT Examples Repository](https://github.com/huggingface/trl/tree/main/examples/sft) +- [Fine-tuning Best Practices](https://huggingface.co/docs/transformers/training) \ No newline at end of file diff --git a/chapters/en/chapter11/4.mdx b/chapters/en/chapter11/4.mdx index 28583bb58..48c3bebbf 100644 --- a/chapters/en/chapter11/4.mdx +++ b/chapters/en/chapter11/4.mdx @@ -63,7 +63,9 @@ Let's walk through the LoRA configuration and key parameters. | `bias` | Controls training of bias terms. Options are "none", "all", or "lora_only". "none" is most common for memory efficiency. | | `target_modules` | Specifies which model modules to apply LoRA to. Can be "all-linear" or specific modules like "q_proj,v_proj". More modules enable greater adaptability but increase memory usage. | + When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. + ## Using TRL with PEFT diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index d02183d40..471e48751 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -39,9 +39,7 @@ Platforms like Anthropic's Constitutional AI Arena allow models to interact and Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. -## Creating Your Own Evaluation Strategy - -Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. +## Custom Evaluation While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: @@ -60,7 +58,7 @@ While standard benchmarks provide a useful baseline, they shouldn't be your only - Domain expert review for specialized applications - A/B testing in controlled environments -# Implementing Evaluation +## Implementing Custom Evaluations In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. From 881865e67bf439cdce5ac5f714d822bcd06c4941 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Thu, 6 Feb 2025 16:07:02 +0100 Subject: [PATCH 12/30] format code blocks --- chapters/en/chapter11/2.mdx | 3 ++- chapters/en/chapter11/3.mdx | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 560db674f..509b33e7d 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -230,6 +230,7 @@ Follow these steps to convert the `HuggingFaceTB/smoltalk` dataset into chatml f 1. 
Load the dataset:
```python
from datasets import load_dataset
+
dataset = load_dataset("HuggingFaceTB/smoltalk")
```

@@ -239,7 +240,7 @@ def convert_to_chatml(example):
     return {
         "messages": [
             {"role": "user", "content": example["input"]},
-            {"role": "assistant", "content": example["output"]}
+            {"role": "assistant", "content": example["output"]},
         ]
     }
 ```
diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx
index fb401c0bb..351be5182 100644
--- a/chapters/en/chapter11/3.mdx
+++ b/chapters/en/chapter11/3.mdx
@@ -124,7 +124,7 @@ training_args = SFTConfig(
     logging_steps=10,
     save_steps=100,
     evaluation_strategy="steps",
-    eval_steps=50
+    eval_steps=50,
 )
 
 # Initialize trainer
@@ -133,7 +133,7 @@ trainer = SFTTrainer(
     args=training_args,
     train_dataset=dataset["train"],
     eval_dataset=dataset["test"],
-    tokenizer=tokenizer
+    tokenizer=tokenizer,
 )
 
 # Start training

From a386bbf400281e99a9b91891491354ebffe9d848 Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Tue, 11 Feb 2025 21:15:33 +0000
Subject: [PATCH 13/30] suggestions in intro page

---
 chapters/en/chapter11/1.mdx | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/chapters/en/chapter11/1.mdx b/chapters/en/chapter11/1.mdx
index b5e75efc6..99c8fe115 100644
--- a/chapters/en/chapter11/1.mdx
+++ b/chapters/en/chapter11/1.mdx
@@ -1,6 +1,6 @@
 # Supervised Fine-Tuning
 
-This chapter will introduce fine-tuning generative language models with supervised fine-tuning (SFT). SFT involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. We will separate this chapter into three sections:
+This chapter will introduce fine-tuning generative language models with supervised fine-tuning (SFT). SFT involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. The majority of the LLMs that people interact with on platforms like ChatGPT go through some form of SFT because it's a robust way to adapt models to common use cases. We will separate this chapter into four sections:
 
 ## 1️⃣ Chat Templates
 
@@ -12,7 +12,7 @@
 ## 3️⃣ Low Rank Adaptation (LoRA)
 
-Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge.
+Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge. One of the key benefits of LoRA is the significant memory savings it offers, making it possible to fine-tune large models on hardware with limited resources.
 
 ## 4️⃣ Evaluation
 
 Evaluation is a crucial step in the fine-tuning process. 
It allows us to measure - [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer) - [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290) - [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning) -- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma) +- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://github.com/huggingface/alignment-handbook) - [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) From 6cefbc91bd65c0840b1d58cdc112102827e0da6e Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:25:25 +0000 Subject: [PATCH 14/30] respond to suggestions on chat templates page --- chapters/en/chapter11/2.mdx | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index 509b33e7d..d5114f5f1 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -20,12 +20,14 @@ Chat templates are crucial for: ## Model Types and Templates ### Base Models vs Instruct Models -A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, `SmolLM2-135M` is a base model, while `SmolLM2-135M-Instruct` is its instruction-tuned variant. +A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, [`SmolLM2-135M`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) is a base model, while [`SmolLM2-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) is its instruction-tuned variant. + +Instuction tuneds models are trained to follow a specific conversational structure, making them more suitable for chatbot applications. Moreover, instruct models can handle complex interactions, including tool use, multimodal inputs, and function calling. To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). -When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. +When using an instruct model, always verify you're using the correct chat template format. Using the wrong template can result in poor model performance or unexpected behavior. The easiest way to ensure this is to check the model tokenizer configuration on the Hub. For example, the `SmolLM2-135M-Instruct` model uses [this configuration](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/main/tokenizer_config.json). ### Common Template Formats @@ -43,15 +45,7 @@ messages = [ ] ``` -This is using the `mistral` template format: - -```sh -[INST] You are a helpful assistant. [/INST] -Hi! How can I help you today? -[INST] Hello! 
[/INST] -``` - -This is the chat template for a Qwen 2 model: +This is the ChatML template used in models like SmolLM2 and Qwen 2: ```sh <|im_start|>system @@ -65,6 +59,14 @@ What's the weather?<|im_end|> <|im_start|>assistant ``` +This is using the `mistral` template format: + +```sh +[INST] You are a helpful assistant. [/INST] +Hi! How can I help you today? +[INST] Hello! [/INST] +``` + Key differences between these formats include: 1. **System Message Handling**: - Llama 2 wraps system messages in `<>` tags From 3b7cc5a6b011ed0d755f408843960c492e31db5e Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:27:35 +0000 Subject: [PATCH 15/30] Update chapters/en/chapter11/3.mdx Co-authored-by: vb --- chapters/en/chapter11/3.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 351be5182..71a667db2 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -6,7 +6,7 @@ # Supervised Fine-Tuning -This page provided a step-by-step guide to fine-tuning the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model using the `SFTTrainer`. By following these steps, you can +This page provided a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using the [`SFTTrainer`](https://huggingface.co/docs/trl/en/sft_trainer). By following these steps, you can adapt the model to perform specific tasks more effectively. Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. From c6800f1f23d1add7be79e05fa0ff03bec01ec8d1 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:27:48 +0000 Subject: [PATCH 16/30] Update chapters/en/chapter11/5.mdx Co-authored-by: vb --- chapters/en/chapter11/5.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index 471e48751..b9f15ca2e 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -60,7 +60,7 @@ While standard benchmarks provide a useful baseline, they shouldn't be your only ## Implementing Custom Evaluations -In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. +In this section, we will implement evaluation for our finetuned model. We can use [`lighteval`](https://github.com/huggingface/lighteval) to evaluate our finetuned model on standard benchmarks, which contains a wide range of tasks built into the library. We just need to define the tasks we want to evaluate and the parameters for the evaluation. 
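
Before defining tasks, make sure the library is installed. A minimal setup sketch (assuming a recent Python environment; pin a version if you need reproducible results):

```sh
pip install lighteval
```

With the package available, the next step is to specify which benchmarks to run.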
LightEval tasks are defined using a specific format: From 844d7159c14b71d58f47fa85fb36d6d3e1f1d019 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:27:57 +0000 Subject: [PATCH 17/30] Update chapters/en/chapter11/5.mdx Co-authored-by: vb --- chapters/en/chapter11/5.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/en/chapter11/5.mdx b/chapters/en/chapter11/5.mdx index b9f15ca2e..4734ac720 100644 --- a/chapters/en/chapter11/5.mdx +++ b/chapters/en/chapter11/5.mdx @@ -17,7 +17,7 @@ However, it's crucial to understand that benchmark performance doesn't always tr MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. ## Reasoning Benchmarks -BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. +[BBH](https://huggingface.co/datasets/lukaemon/bbh) (Big Bench Hard) and [GSM8K](https://huggingface.co/datasets/openai/gsm8k) focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. ## Language Understanding From 3f9815c3e84a1264b2d4537114705f9e57ad7130 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:33:56 +0000 Subject: [PATCH 18/30] respond to suggestions in SFT page --- chapters/en/chapter11/3.mdx | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 71a667db2..96d98d7bf 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -6,10 +6,9 @@ # Supervised Fine-Tuning -This page provided a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using the [`SFTTrainer`](https://huggingface.co/docs/trl/en/sft_trainer). By following these steps, you can -adapt the model to perform specific tasks more effectively. +Supervised Fine-Tuning (SFT) is a process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples. +This page provides a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using the [`SFTTrainer`](https://huggingface.co/docs/trl/en/sft_trainer). 
By following these steps, you can adapt the model to perform specific tasks more effectively. ## When to Use SFT The supervised structure of the task enables models to learn specific output formats and behaviors. For example, SFT can teach a model to consistently use chat templates or follow domain-specific guidelines. The decision to use Supervised Fine-Tuning depends on two primary factors: @@ -180,8 +179,7 @@ the model is learning generalizable patterns rather than memorizing specific exa showing healthy convergence" width="600"/> -The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern -indicates the model is learning effectively while maintaining generalization ability. +The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability. ### Warning Signs to Watch For From c47a5a50c50cfb194a71595cec9259977b1895a3 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Tue, 11 Feb 2025 21:58:00 +0000 Subject: [PATCH 19/30] improve loss illustrations on sft page --- chapters/en/chapter11/3.mdx | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 96d98d7bf..0e8f78be9 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -174,29 +174,29 @@ the model is learning generalizable patterns rather than memorizing specific exa ### Monitoring Training Progress -
- Training and validation loss curves 
-    showing healthy convergence -
- The graph above shows a typical training progression. Notice how both training and validation loss decrease sharply at first, then gradually level off. This pattern indicates the model is learning effectively while maintaining generalization ability. ### Warning Signs to Watch For -Several patterns in the loss curves can indicate potential issues: +Several patterns in the loss curves can indicate potential issues. Below we illustrate common warning signs and solutions that we can consider. + +SFTTrainer Training -1. If the validation loss starts increasing while training loss continues to decrease, your model is likely overfitting to the training data. Consider: - - Reducing the model size or training time - - Adding regularization +If the validation loss decreases at a significantly slower rate than training loss, your model is likely overfitting to the training data. Consider: + - Reducing the training steps - Increasing the dataset size - - Using techniques like early stopping + - Validating dataset quality and diversity -2. If the loss doesn't show significant improvement, the model might be: +SFTTrainer Training + +If the loss doesn't show significant improvement, the model might be: - Learning too slowly (try increasing the learning rate) - Struggling with the task (check data quality and task complexity) - Hitting architecture limitations (consider a different model) -3. Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: +SFTTrainer Training + +Extremely low loss values could suggest memorization rather than learning. This is particularly concerning if: - The model performs poorly on new, similar examples - The outputs lack diversity - The responses are too similar to training examples From d66fa86892c272a9ed3a2a5cf816fa6f0fa06ee4 Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 12 Feb 2025 11:21:54 +0000 Subject: [PATCH 20/30] respond to feedback in chat template --- chapters/en/chapter11/2.mdx | 67 +++++++++++++++++-------------------- 1 file changed, 31 insertions(+), 36 deletions(-) diff --git a/chapters/en/chapter11/2.mdx b/chapters/en/chapter11/2.mdx index d5114f5f1..7b8d19dfb 100644 --- a/chapters/en/chapter11/2.mdx +++ b/chapters/en/chapter11/2.mdx @@ -1,13 +1,14 @@ # Chat Templates ## Introduction -Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns. + +Chat templates are essential for structuring interactions between language models and users. Whether you're building a simple chatbot or a complex AI agent, understanding how to properly format your conversations is crucial for getting the best results from your model. In this guide, we'll explore what chat templates are, why they matter, and how to use them effectively. Chat templates are crucial for: @@ -22,7 +23,7 @@ Chat templates are crucial for: ### Base Models vs Instruct Models A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, [`SmolLM2-135M`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) is a base model, while [`SmolLM2-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) is its instruction-tuned variant. 
-Instuction tuneds models are trained to follow a specific conversational structure, making them more suitable for chatbot applications. Moreover, instruct models can handle complex interactions, including tool use, multimodal inputs, and function calling. +Instuction tuned models are trained to follow a specific conversational structure, making them more suitable for chatbot applications. Moreover, instruct models can handle complex interactions, including tool use, multimodal inputs, and function calling. To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). @@ -32,7 +33,7 @@ When using an instruct model, always verify you're using the correct chat templa ### Common Template Formats -Different models use different chat template formats. To illustrate this, let's look at a few chat templates. Here's how the same conversation would be formatted for different models: +Before diving into specific implementations, it's important to understand how different models expect their conversations to be formatted. Let's explore some common template formats using a simple example conversation: We'll use the following conversation structure for all examples: @@ -55,8 +56,7 @@ Hello!<|im_end|> <|im_start|>assistant Hi! How can I help you today?<|im_end|> <|im_start|>user -What's the weather?<|im_end|> -<|im_start|>assistant +What's the weather?<|im_start|>assistant ``` This is using the `mistral` template format: @@ -87,15 +87,15 @@ Key differences between these formats include: - Mistral uses `` and `` for turn boundaries - Qwen uses role-specific start/end tokens -The transformers library handles these differences through model-specific chat templates. When you load a tokenizer, it automatically uses the correct template for that model: +Understanding these differences is key to working with various models. Let's look at how the transformers library helps us handle these variations automatically: ```python from transformers import AutoTokenizer # These will use different templates automatically -llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf") mistral_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") qwen_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat") +smol_tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") messages = [ {"role": "system", "content": "You are a helpful assistant."}, @@ -103,44 +103,40 @@ messages = [ ] # Each will format according to its model's template -llama_chat = llama_tokenizer.apply_chat_template(messages, tokenize=False) mistral_chat = mistral_tokenizer.apply_chat_template(messages, tokenize=False) qwen_chat = qwen_tokenizer.apply_chat_template(messages, tokenize=False) +smol_chat = smol_tokenizer.apply_chat_template(messages, tokenize=False) ``` -## Working with Templates - -### Basic Implementation - -The transformers library provides built-in support for chat templates through the `apply_chat_template()` method: - -```python -from transformers import AutoTokenizer +
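
To see what each tokenizer actually produced, you can print the formatted strings side by side. When you intend to generate a completion rather than just format history, pass `add_generation_prompt=True` so the output ends with the opening of an assistant turn. A small sketch building on the variables above (the exact strings you see will depend on your tokenizer versions):

```python
# Compare the three formats side by side
for name, chat in [("mistral", mistral_chat), ("qwen", qwen_chat), ("smol", smol_chat)]:
    print(f"--- {name} ---\n{chat}")

# For generation, end the prompt with the assistant turn opener
prompt = smol_tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```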
+Click to see template examples -tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct") +Qwen 2 and SmolLM2 ChatML template: -messages = [ - {"role": "system", "content": "You are a helpful coding assistant."}, - {"role": "user", "content": "Write a Python function to sort a list"}, -] - -# Apply the chat template -formatted_chat = tokenizer.apply_chat_template( - messages, tokenize=False, add_generation_prompt=True -) +```sh +<|im_start|>system +You are a helpful assistant.<|im_end|> +<|im_start|>user +Hello!<|im_end|> +<|im_start|>assistant +Hi! How can I help you today?<|im_end|> +<|im_start|>user +What's the weather?<|im_start|>assistant ``` -This will return a formatted string that looks like: +Mistral template: ```sh -<|im_start|>system -You are a helpful coding assistant.<|im_end|> -<|im_start|>user -Write a Python function to sort a list<|im_end|> +[INST] You are a helpful assistant. [/INST] +Hi! How can I help you today? +[INST] Hello! [/INST] ``` +
+ + ### Advanced Features -Chat templates can handle more complex scenarios, including: +Chat templates can handle more complex scenarios beyond just conversational interactions, including: 1. **Tool Use**: When models need to interact with external tools or APIs 2. **Multimodal Inputs**: For handling images, audio, or other media types @@ -149,9 +145,8 @@ Chat templates can handle more complex scenarios, including: When implementing advanced features: -- Test thoroughly with your specific model -- Handle errors gracefully -- Monitor token usage carefully +- Test thoroughly with your specific model. Vision and tool use template are particularly diverse. +- Monitor token usage carefully between each feature and model. - Document the expected format for each feature From 21c8dd1958d1d1e117431f4caf108147a844d7ab Mon Sep 17 00:00:00 2001 From: burtenshaw Date: Wed, 12 Feb 2025 11:36:07 +0000 Subject: [PATCH 21/30] respond to feedback on sft section --- chapters/en/chapter11/3.mdx | 55 ++++++++++++++----------------------- 1 file changed, 20 insertions(+), 35 deletions(-) diff --git a/chapters/en/chapter11/3.mdx b/chapters/en/chapter11/3.mdx index 0e8f78be9..16cea0fac 100644 --- a/chapters/en/chapter11/3.mdx +++ b/chapters/en/chapter11/3.mdx @@ -1,7 +1,7 @@ # Supervised Fine-Tuning @@ -11,7 +11,9 @@ Supervised Fine-Tuning (SFT) is a process for adapting pre-trained language mode This page provides a step-by-step guide to fine-tuning the [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using the [`SFTTrainer`](https://huggingface.co/docs/trl/en/sft_trainer). By following these steps, you can adapt the model to perform specific tasks more effectively. ## When to Use SFT -The supervised structure of the task enables models to learn specific output formats and behaviors. For example, SFT can teach a model to consistently use chat templates or follow domain-specific guidelines. The decision to use Supervised Fine-Tuning depends on two primary factors: + +Before diving into implementation, it's important to understand when SFT is the right choice for your project. The supervised structure of the task enables models to learn specific output formats and behaviors. For example, SFT can teach a model to consistently use chat templates or follow domain-specific guidelines. The decision to use Supervised Fine-Tuning depends on two primary factors: +factors: ### Template Control SFT allows precise control over the model's output structure. This is particularly valuable when you need the model to: @@ -43,7 +45,7 @@ The supervised fine-tuning process requires a task-specific dataset structured w 2. The expected model response 3. Any additional context or metadata -The data format must be compatible with your model's chat template. Here's an example dataset suitable for supervised fine-tuning: +The quality of your training data is crucial for successful fine-tuning. Let's look at how to prepare and validate your dataset: + + +## Code Quiz + +In this quiz, you will be asked to write code to complete a task. We'll test you on the code you've studied in the course from libraries like `datasets`, `transformers`, `peft`, and `TRL`. + + \ No newline at end of file