Wrote the state_dict tutorial up to the load_state_dict() part. Did not go into details of tensor serialization in C++ etc.

motus · motus · commit fabfeee99edf · 2018-11-27T19:34:32.000-08:00
diff --git a/load/state_dict.ipynb b/load/state_dict.ipynb
@@ -27,7 +27,7 @@
    "source": [
     "## Loading PyTorch model\n",
     "\n",
-    "Here we'll assume that we already have the file with the saved MNIST model with all default hyperparameters and trained for 10 epochs. Let's load it."
+    "Here we'll assume that we already have the file with the saved MNIST model with all default hyperparameters and trained for 10 epochs. Loading it is simple:"
    ]
   },
   {
@@ -59,7 +59,141 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "So the `torch.load()` function just reads back the dictionary that was passed to `torch.save()`, and for basic Python types it is not different from Python standard [pickle](https://docs.python.org/3.5/library/pickle.html) module (in fact, it *is* a pickle)."
+    "So the `torch.load()` function just reads back the dictionary that was passed to `torch.save()`, and for basic Python types it is not different from Python standard [`pickle`](https://docs.python.org/3.5/library/pickle.html) module (in fact, it *is* a pickle). The most interesting part here are the model's and optimizer's parameters, as returned from [`torch.nn.Module.state_dict()`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.state_dict) method. Let's take a closer look."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "model_params: <class 'collections.OrderedDict'> \n",
+      "\n",
+      "conv1.weight: <class 'torch.Tensor'> torch.Size([10, 1, 5, 5])\n",
+      "  conv1.bias: <class 'torch.Tensor'> torch.Size([10])\n",
+      "conv2.weight: <class 'torch.Tensor'> torch.Size([20, 10, 5, 5])\n",
+      "  conv2.bias: <class 'torch.Tensor'> torch.Size([20])\n",
+      "  fc1.weight: <class 'torch.Tensor'> torch.Size([50, 320])\n",
+      "    fc1.bias: <class 'torch.Tensor'> torch.Size([50])\n",
+      "  fc2.weight: <class 'torch.Tensor'> torch.Size([10, 50])\n",
+      "    fc2.bias: <class 'torch.Tensor'> torch.Size([10])\n",
+      "\n",
+      "  conv1.bias: tensor([ 0.0272, -0.0762, -0.0617,  0.0235,  0.1745,  0.0320,  0.0871,  0.0674,\n",
+      "        -0.0222, -0.0541])\n"
+     ]
+    }
+   ],
+   "source": [
+    "model_params = model_state['model_state_dict']\n",
+    "print(\"model_params:\", type(model_params), \"\\n\")\n",
+    "\n",
+    "for (key, val) in model_params.items():\n",
+    "    print(\"%12s: %s %s\" % (key, type(val), val.size()))\n",
+    "    \n",
+    "print(\"\\n%12s: %s\" % (\"conv1.bias\", model_params['conv1.bias']))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "That is, `.state_dict()` produces an `OrderedDict` of tensors, and uses for keys names of the variables and their parameters.\n",
+    "\n",
+    "Now we need to populate the actual model's parameters (on CUDA or CPU) with that data. For that, we have to use the method [`torch.nn.Module.load_state_dict()`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module.load_state_dict). Unfortunately, it won't recreate the model's topology for us. We have to use the code from [MNIST example](https://github.com/pytorch/examples/blob/master/mnist/main.py) to build it explicitly:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch.nn as nn\n",
+    "import torch.nn.functional as F\n",
+    "\n",
+    "class Net(nn.Module):\n",
+    "    def __init__(self):\n",
+    "        super(Net, self).__init__()\n",
+    "        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)\n",
+    "        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)\n",
+    "        self.conv2_drop = nn.Dropout2d()\n",
+    "        self.fc1 = nn.Linear(320, 50)\n",
+    "        self.fc2 = nn.Linear(50, 10)\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        x = F.relu(F.max_pool2d(self.conv1(x), 2))\n",
+    "        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))\n",
+    "        x = x.view(-1, 320)\n",
+    "        x = F.relu(self.fc1(x))\n",
+    "        x = F.dropout(x, training=self.training)\n",
+    "        x = self.fc2(x)\n",
+    "        return F.log_softmax(x, dim=1)\n",
+    "\n",
+    "model = Net()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Remember, that initially all parameters are initialized with random values, e.g."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Parameter containing:\n",
+       "tensor([ 0.1578,  0.1650, -0.1272, -0.1976,  0.0318, -0.1246, -0.0474, -0.0620,\n",
+       "         0.1829, -0.1198], requires_grad=True)"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "model.conv1.bias"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we can populate them with data from the file:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Parameter containing:\n",
+       "tensor([ 0.0272, -0.0762, -0.0617,  0.0235,  0.1745,  0.0320,  0.0871,  0.0674,\n",
+       "        -0.0222, -0.0541], requires_grad=True)"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "model.load_state_dict(model_params)\n",
+    "\n",
+    "model.conv1.bias"
    ]
   },
   {