
Commit 755b1e9

Explanation on loss function added

1 parent c21753d commit 755b1e9
File tree

1 file changed: +10 -2 lines changed


linear_regression.ipynb

Lines changed: 10 additions & 2 deletions
@@ -26,14 +26,22 @@
  "metadata": {},
  "source": [
  "### Training\n",
- "A linear regression model can be trained using either \n",
+ "A linear regression model is typically trained using the (mean) squared error (MSE) as its loss function, which yields a [least squares](https://en.wikipedia.org/wiki/Least_squares) solution. Minimizing the MSE minimizes the sum of squared residuals, i.e. the differences between the true labels $y$ and the model predictions $\\hat{y}$:\n",
+ "\n",
+ "$J(\\boldsymbol{w},b) = \\frac{1}{m} \\sum_{i=1}^m \\Big(\\hat{y}^{(i)} - y^{(i)} \\Big)^2$\n",
+ "\n",
+ "\n",
+ "Why do we use the squared error as a loss function? In short, minimizing the MSE corresponds to computing a maximum likelihood solution to our problem under the assumption of Gaussian noise. For a more detailed explanation, [look here](https://datascience.stackexchange.com/questions/10188/why-do-cost-functions-use-the-square-error).\n",
+ "\n",
+ "With the MSE at hand, a linear regression model can be trained using either \n",
  "a) gradient descent or \n",
  "b) the normal equation (closed-form solution): $\\boldsymbol{w} = (\\boldsymbol{X}^T \\boldsymbol{X})^{-1} \\boldsymbol{X}^T \\boldsymbol{y}$ \n",
  "\n",
  "where $\\boldsymbol{X}$ is a matrix of shape $(m, n_{features})$ that holds all training examples. \n",
  "The normal equation requires computing the inverse of $\\boldsymbol{X}^T \\boldsymbol{X}$. The computational complexity of this operation lies between $O(n_{features}^{2.4})$ and $O(n_{features}^3)$ (depending on the implementation).\n",
  "Therefore, if the number of features in the training set is large, the normal equation will be very slow. \n",
  "\n",
+ "\n",
  "* * *\n",
  "The training procedure of a linear regression model consists of several steps. In the beginning (step 0) the model parameters are initialized. The other steps (see below) are repeated for a specified number of training iterations or until the parameters have converged.\n",
  "\n",
@@ -471,7 +479,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.6.5"
+ "version": "3.8.3"
  },
  "toc": {
  "nav_menu": {},

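As a quick illustration of the loss added in the first hunk, here is a minimal NumPy sketch of the MSE formula above. The helper names `predict` and `mse_loss` are illustrative, not from the notebook:

```python
import numpy as np

def predict(X, w, b):
    # Linear model: y_hat = X w + b, with X of shape (m, n_features)
    return X @ w + b

def mse_loss(y_hat, y):
    # J(w, b) = (1/m) * sum_i (y_hat_i - y_i)^2
    return np.mean((y_hat - y) ** 2)
```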
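Option (b), the normal equation, can be sketched the same way. This version assumes the bias term is folded into $\boldsymbol{w}$ via an extra column of ones in $\boldsymbol{X}$, and it solves the linear system rather than forming the explicit inverse of $\boldsymbol{X}^T \boldsymbol{X}$, which is cheaper and numerically safer:

```python
import numpy as np

def fit_normal_equation(X, y):
    # Closed-form least squares: w = (X^T X)^{-1} X^T y,
    # computed by solving (X^T X) w = X^T y instead of inverting.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Example usage (hypothetical data): prepend a bias column of ones
# X = np.hstack([np.ones((len(X_raw), 1)), X_raw]); w = fit_normal_equation(X, y)
```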
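And a bare-bones sketch of option (a), gradient descent on the MSE, following the step-0-then-iterate procedure the cell describes. The learning rate and iteration count are arbitrary placeholders, and the notebook's own training loop may differ:

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.01, n_iters=1000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0          # step 0: initialize the parameters
    for _ in range(n_iters):
        error = X @ w + b - y        # residuals y_hat - y
        # Gradients of J: dJ/dw = (2/m) X^T error, dJ/db = (2/m) sum(error)
        w -= lr * (2 / m) * (X.T @ error)
        b -= lr * (2 / m) * error.sum()
    return w, b
```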