Commit f0ed447

Small updates to L12/W12
1 parent 3f8b946 commit f0ed447

2 files changed, +15 -10 lines changed

CMP9065 Data Programming in Python/lectures/Lecture 12 - Sklearn.ipynb

-2
@@ -6,8 +6,6 @@
 "source": [
 "# Lecture 12 - Sklearn\n",
 "\n",
-"# CODE 484983\n",
-"\n",
 "[Sklearn](https://scikit-learn.org/stable/) (package name `scikit-learn`) is a machine learning library, allowing you to run a large selection of machine learning algorithms on your data.\n",
 "\n",
 "In the lectures so far (especially with Scipy), we were mostly looking at relationships between **two features** (measurements) in a dataset (e.g. _How is the type of vegetation related to the surface area of the lake?_). With machine learning methods, we typically want to relate **all the features** to a single **ground truth** value (this could be a class, such as an animal species, or a value to predict like the price of an item).\n",

CMP9065 Data Programming in Python/workshops/Workshop 12 - Sklearn.ipynb

+15 -8
@@ -105,8 +105,8 @@
 "In the following task, you are expected to train a regression model predicting the value of `petal_width` from the values of `sepal_length`. You will be training a [`linear_model.LinearRegression()`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) model.\n",
 "\n",
 "You can do this in the following steps:\n",
-"- select the feature (`sepal_length`) into the variable `X_regression`. As each sample needs to be represented by an array, even if it has a single feature, you can call `.reshape(-1, 1)` on the selected feature\n",
-"- select the feature to predict (`petal_width`) into the variable `y_regression`\n",
+"- select the first feature (`sepal_length` - column 0) into the variable `X_regression`. As each sample needs to be represented by an array, even if it has a single feature, you can call `.reshape(-1, 1)` on the selected feature\n",
+"- select the feature to predict (`petal_width` - column 3) into the variable `y_regression`\n",
 "- separate these into training and testing sets (`X_train, X_test, y_train, y_test`) using [`sklearn.model_selection.train_test_split()`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), using $20\\%$ of the samples in the testing set\n",
 "- initialise the [`linear_model.LinearRegression()`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) model and fit it to the training data\n",
 "- predict the values for the testing set and store them in `y_predicted`\n",
@@ -209,8 +209,8 @@
 "This exercise is similar to the previous one, except that you are expected to use all other features (`sepal_length`, `sepal_width` and `petal_length`) to predict `petal_width`.\n",
 "\n",
 "You should perform the following steps:\n",
-"- select the features (`sepal_length`, `sepal_width` and `petal_length`) into the variable `X_regression`\n",
-"- select the feature to predict (`petal_width`) into the variable `y_regression`\n",
+"- select the features (`sepal_length`, `sepal_width` and `petal_length` - columns 0, 1 and 2) into the variable `X_regression`\n",
+"- select the feature to predict (`petal_width` - column 3) into the variable `y_regression`\n",
 "- separate these into training and testing sets (`X_train, X_test, y_train, y_test`) using [`sklearn.model_selection.train_test_split()`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), using $20\\%$ of the samples in the testing set\n",
 "- initialise the [`linear_model.LinearRegression()`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) model and fit it to the training data\n",
 "- predict the values for the testing set and store them in `y_predicted`\n",
@@ -297,8 +297,8 @@
 "In the following task, you are expected to train a **classification model** predicting the class of the iris flower from the values of `sepal_length` and `sepal_width`. You will be training a [`tree.DecisionTreeClassifier()`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) model.\n",
 "\n",
 "You can do this in the following steps:\n",
-"- select the first two features (`sepal_length` and `sepal_width`) into the variable `X_classification`\n",
-"- store the labels to predict `y_classification`\n",
+"- select the first two features (`sepal_length` and `sepal_width` - columns 0 and 1) into the variable `X_classification`\n",
+"- store the labels to predict in `y_classification` (these are currently just stored in `y`)\n",
 "- separate these into training and testing sets (`X_train, X_test, y_train, y_test`) using [`sklearn.model_selection.train_test_split()`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), using $20\\%$ of the samples in the testing set\n",
 "- initialise the [`tree.DecisionTreeClassifier()`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) model with `max_depth=5` and fit it to the training data\n",
 "- predict the values for the testing set and store them in `y_predicted`\n",
@@ -412,11 +412,12 @@
 "\n",
 "\n",
 "You can do this in the following steps:\n",
-"- store all the dataset features into the variable `X_classification`\n",
-"- store the labels to predict into `y_classification`\n",
+"- store all the dataset features into the variable `X_classification` (these are currently just in `X`)\n",
+"- store the labels to predict into `y_classification` (these are currently just in `y`)\n",
 "- separate these into training and testing sets (`X_train, X_test, y_train, y_test`) using [`sklearn.model_selection.train_test_split()`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), using $20\\%$ of the samples in the testing set\n",
 "- initialise the [`tree.DecisionTreeClassifier()`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) model with `max_depth=5` and fit it to the training data\n",
 "- predict the values for the testing set and store them in `y_predicted`\n",
+"- **Note** that _again_ all the steps but the first (selecting the features) are **exactly the same as in [Exercise 3](#Exercise-3)**. You can copy-paste the rest of your solution for [Exercise 3](#Exercise-3) once you select the features.\n",
 "\n",
 "The code will then evaluate your model by calculating [`accuracy_score`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [`f1_score`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) metrics from your `y_test` and `y_predicted` classes.\n",
 "\n",
@@ -439,6 +440,12 @@
 }
 ],
 "source": [
+"###################################\n",
+"#### Insert your solution here ####\n",
+"###################################\n",
+"\n",
+"\n",
+"\n",
 "\n",
 "# Evaluation:\n",
 "print(\"accuracy on test set: {}\".format(accuracy_score(y_test, y_predicted)))\n",
