|
41 | 41 | "\n",
|
42 | 42 | "In our two previous examples, we were considering classification problems, where the goal was to predict a single discrete label of an \n",
|
43 | 43 | "input data point. Another common type of machine learning problem is \"regression\", which consists of predicting a continuous value instead \n",
|
44 |
| - "instead of a discrete label. For instance, predicting the temperature tomorrow, given meteorological data, or predicting the time that a \n", |
| 44 | + "of a discrete label. For instance, predicting the temperature tomorrow, given meteorological data, or predicting the time that a \n", |
45 | 45 | "software project will take to complete, given its specifications.\n",
|
46 | 46 | "\n",
|
47 | 47 | "Do not mix up \"regression\" with the algorithm \"logistic regression\": confusingly, \"logistic regression\" is not a regression algorithm, \n",
|
|
282 | 282 | "\n",
|
283 | 283 | "def build_model():\n",
|
284 | 284 | " # Because we will need to instantiate\n",
|
285 |
| - " # the same model multiple time,\n", |
| 285 | + " # the same model multiple times,\n", |
286 | 286 | " # we use a function to construct it.\n",
|
287 | 287 | " model = models.Sequential()\n",
|
288 | 288 | " model.add(layers.Dense(64, activation='relu',\n",
|
|
304 | 304 | "we applied a `sigmoid` activation function to our last layer, the network could only learn to predict values between 0 and 1. Here, because \n",
|
305 | 305 | "the last layer is purely linear, the network is free to learn to predict values in any range.\n",
|
306 | 306 | "\n",
|
307 |
| - "Note that we are compiling the network with the `mse` loss function -- Mean Squared Error, the square of the different between the \n", |
| 307 | + "Note that we are compiling the network with the `mse` loss function -- Mean Squared Error, the square of the difference between the \n", |
308 | 308 | "predictions and the targets, a widely used loss function for regression problems.\n",
|
309 | 309 | "\n",
|
310 | 310 | "We are also monitoring a new metric during training: `mae`. This stands for Mean Absolute Error. It is simply the absolute value of the \n",
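The two metrics mentioned above are simple enough to compute by hand. The sketch below illustrates their definitions with NumPy on a few fabricated prediction/target pairs (the numbers are made up for illustration, not taken from the notebook):

```python
import numpy as np

# Hypothetical predictions and targets (in thousands of dollars),
# used only to illustrate the two metrics.
preds = np.array([22.5, 18.0, 31.2])
targets = np.array([24.0, 17.5, 29.0])

# MSE: square the differences, then average them.
mse = np.mean((preds - targets) ** 2)
# MAE: take the absolute differences, then average them.
mae = np.mean(np.abs(preds - targets))
```

An MAE of 0.5 on this problem would mean predictions are off by $500 on average, which is why it is a more directly interpretable metric than MSE.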
|
|
431 | 431 | "metadata": {},
|
432 | 432 | "source": [
|
433 | 433 | "\n",
|
434 |
| - "As you can notice, the different runs do indeed show rather different validation scores, from 2.1 to 2.29. Their average (2.4) is a much more \n", |
| 434 | + "As you can notice, the different runs do indeed show rather different validation scores, from 2.1 to 2.9. Their average (2.4) is a much more \n", |
435 | 435 | "reliable metric than any single of these scores -- that's the entire point of K-fold cross-validation. In this case, we are off by \\$2,400 on \n",
|
436 | 436 | "average, which is still significant considering that the prices range from \\$10,000 to \\$50,000. \n",
|
437 | 437 | "\n",
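The partitioning logic behind those k runs can be sketched as follows. This is a minimal illustration, assuming 404 training samples (the size of the Boston housing training set) and k = 4; the actual model training and evaluation are elided as a comment:

```python
import numpy as np

# A minimal sketch of K-fold partitioning, assuming 404 training
# samples and k = 4 folds; training/evaluation is elided.
k = 4
num_samples = 404
num_val = num_samples // k
indices = np.arange(num_samples)

val_folds = []
for i in range(k):
    # Fold i serves as the validation set...
    val_idx = indices[i * num_val:(i + 1) * num_val]
    # ...and the remaining samples form the training set.
    train_idx = np.concatenate(
        [indices[:i * num_val], indices[(i + 1) * num_val:]])
    # ... build a fresh model, fit on train_idx, record val MAE on val_idx ...
    val_folds.append(val_idx)

# Every sample serves as validation data exactly once across the k folds,
# so averaging the k scores uses all of the training data for validation.
all_val = np.concatenate(val_folds)
```

Note that a fresh model must be built for each fold (hence the `build_model()` function earlier): reusing a model already trained on one fold would leak information into the next fold's validation score.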
|
|
600 | 600 | "metadata": {},
|
601 | 601 | "source": [
|
602 | 602 | "\n",
|
603 |
| - "According to this plot, it seems that validation MAE stops improving significantly after after 80 epochs. Past that point, we start overfitting.\n", |
| 603 | + "According to this plot, it seems that validation MAE stops improving significantly after 80 epochs. Past that point, we start overfitting.\n", |
604 | 604 | "\n",
|
605 | 605 | "Once we are done tuning other parameters of our model (besides the number of epochs, we could also adjust the size of the hidden layers), we \n",
|
606 | 606 | "can train a final \"production\" model on all of the training data, with the best parameters, then look at its performance on the test data:"
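Reading the best epoch off the validation-MAE curve amounts to finding its minimum. A small sketch, with a fabricated history list standing in for the real per-epoch averages (the real curve bottoms out around epoch 80):

```python
import numpy as np

# Fabricated per-epoch validation MAE values, for illustration only.
val_mae_history = [3.0, 2.8, 2.6, 2.5, 2.45, 2.5, 2.6]

# The best epoch is where validation MAE is lowest; past it, the
# model starts overfitting. Epochs are conventionally 1-indexed.
best_epoch = int(np.argmin(val_mae_history)) + 1
```

The final "production" model is then trained on all of the training data for that many epochs, since the validation split is no longer needed once the epoch count is fixed.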
|
|