
Commit 755b1e9

Explanation on loss function added

1 parent c21753d commit 755b1e9
File tree

1 file changed: +10 -2 lines changed


linear_regression.ipynb

Lines changed: 10 additions & 2 deletions
@@ -26,14 +26,22 @@
  "metadata": {},
  "source": [
  "### Training\n",
- "A linear regression model can be trained using either \n",
+ "A linear regression model is typically trained using the (mean) squared error (MSE) as its loss function, which yields a [least squares](https://en.wikipedia.org/wiki/Least_squares) solution. Minimizing the MSE minimizes the sum of squared residuals, i.e. the differences between the true labels $y$ and the model predictions $\\hat{y}$:\n",
+ "\n",
+ "$J(\\boldsymbol{w},b) = \\frac{1}{m} \\sum_{i=1}^m \\Big(\\hat{y}^{(i)} - y^{(i)} \\Big)^2$\n",
+ "\n",
+ "\n",
+ "Why do we use the squared error as a loss function? In short, minimizing the MSE corresponds to computing a maximum likelihood solution to our problem under the assumption of Gaussian noise. For a more detailed explanation, [look here](https://datascience.stackexchange.com/questions/10188/why-do-cost-functions-use-the-square-error).\n",
+ "\n",
+ "With the MSE at hand, a linear regression model can be trained using either \n",
  "a) gradient descent or \n",
  "b) the normal equation (closed-form solution): $\\boldsymbol{w} = (\\boldsymbol{X}^T \\boldsymbol{X})^{-1} \\boldsymbol{X}^T \\boldsymbol{y}$ \n",
  "\n",
  "where $\\boldsymbol{X}$ is a matrix of shape $(m, n_{features})$ that holds all training examples. \n",
  "The normal equation requires computing the inverse of $\\boldsymbol{X}^T \\boldsymbol{X}$. The computational complexity of this operation lies between $O(n_{features}^{2.4})$ and $O(n_{features}^3)$ (depending on the implementation).\n",
  "Therefore, if the number of features in the training set is large, the normal equation will be very slow. \n",
  "\n",
+ "\n",
  "* * *\n",
  "The training procedure of a linear regression model consists of several steps. In the beginning (step 0) the model parameters are initialized. The other steps (see below) are repeated for a specified number of training iterations or until the parameters have converged.\n",
  "\n",
@@ -471,7 +479,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.6.5"
+ "version": "3.8.3"
  },
  "toc": {
  "nav_menu": {},

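As a quick illustration of the loss added in the first hunk, here is a minimal NumPy sketch of the MSE formula above. The helper names `predict` and `mse_loss` are illustrative, not from the notebook:

```python
import numpy as np

def predict(X, w, b):
    # Linear model: y_hat = X w + b, with X of shape (m, n_features)
    return X @ w + b

def mse_loss(y_hat, y):
    # J(w, b) = (1/m) * sum_i (y_hat_i - y_i)^2
    return np.mean((y_hat - y) ** 2)
```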
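Option (b), the normal equation, can be sketched the same way. This version assumes the bias term is folded into $\boldsymbol{w}$ via an extra column of ones in $\boldsymbol{X}$, and it solves the linear system rather than forming the explicit inverse of $\boldsymbol{X}^T \boldsymbol{X}$, which is cheaper and numerically safer:

```python
import numpy as np

def fit_normal_equation(X, y):
    # Closed-form least squares: w = (X^T X)^{-1} X^T y,
    # computed by solving (X^T X) w = X^T y instead of inverting.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Example usage (hypothetical data): prepend a bias column of ones
# X = np.hstack([np.ones((len(X_raw), 1)), X_raw]); w = fit_normal_equation(X, y)
```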
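And a bare-bones sketch of option (a), gradient descent on the MSE, following the step-0-then-iterate procedure the cell describes. The learning rate and iteration count are arbitrary placeholders, and the notebook's own training loop may differ:

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.01, n_iters=1000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0          # step 0: initialize the parameters
    for _ in range(n_iters):
        error = X @ w + b - y        # residuals y_hat - y
        # Gradients of J: dJ/dw = (2/m) X^T error, dJ/db = (2/m) sum(error)
        w -= lr * (2 / m) * (X.T @ error)
        b -= lr * (2 / m) * error.sum()
    return w, b
```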