Commit 59fcd07

Created using Colaboratory
1 parent aa466bd commit 59fcd07

File tree

1 file changed

+283
-0
lines changed


11_Multi_Class_Classification.ipynb

@@ -0,0 +1,283 @@
1+
{
2+
"nbformat": 4,
3+
"nbformat_minor": 0,
4+
"metadata": {
5+
"colab": {
6+
"name": "11 Multi Class Classification.ipynb",
7+
"provenance": [],
8+
"authorship_tag": "ABX9TyPWNOFvn1SCWX1AdS9Jn8me",
9+
"include_colab_link": true
10+
},
11+
"kernelspec": {
12+
"name": "python3",
13+
"display_name": "Python 3"
14+
},
15+
"language_info": {
16+
"name": "python"
17+
}
18+
},
19+
"cells": [
20+
{
21+
"cell_type": "markdown",
22+
"metadata": {
23+
"id": "view-in-github",
24+
"colab_type": "text"
25+
},
26+
"source": [
27+
"<a href=\"https://colab.research.google.com/github/sandipanpaul21/Logistic-regression-in-python/blob/main/11_Multi_Class_Classification.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
28+
]
29+
},
30+
{
31+
"cell_type": "markdown",
32+
"source": [
33+
"**Binary Classifiers for Multi-Class Classification**\n",
34+
"\n",
35+
"- Classification is a predictive modeling problem that involves assigning a class label to an example.\n",
36+
"- **Binary classification** refers to tasks where each example is assigned exactly one of two classes.\n",
37+
"- **Binary Classification:** Classification tasks with two classes.\n",
38+
"- **Multi-class classification** refers to tasks where each example is assigned exactly one of more than two classes.\n",
39+
"- **Multi-class Classification:** Classification tasks with more than two classes.\n",
40+
"\n",
41+
"*Some algorithms are designed for binary classification problems. Examples include:*\n",
42+
"\n",
43+
"1. Logistic Regression\n",
44+
"2. Support Vector Machines\n",
45+
"\n",
46+
"\n",
47+
"Such algorithms cannot be applied to multi-class problems directly. Instead, heuristic methods can be used to split a multi-class classification problem into multiple binary classification datasets and train a binary classification model on each.\n",
48+
"\n",
49+
"Two examples of these heuristic methods include:\n",
50+
"\n",
51+
"1. One-vs-Rest (OvR)\n",
52+
"2. One-vs-One (OvO)"
53+
],
54+
"metadata": {
55+
"id": "VyVM3wn3FDmZ"
56+
}
57+
},
58+
{
59+
"cell_type": "markdown",
60+
"source": [
61+
"**One-Vs-Rest (OvR) for Multi-Class Classification**\n",
62+
"\n",
63+
"- **One-vs-Rest (OvR)**, also referred to as **One-vs-All (OvA)**, is a heuristic method for using binary classification algorithms for multi-class classification.\n",
64+
"\n",
65+
"- It involves splitting the multi-class dataset into multiple binary classification problems. \n",
66+
"- A binary classifier is then trained on each binary classification problem and predictions are made using the model that is the most confident.\n",
67+
"\n",
68+
"- For example, \n",
69+
"  - Given a multi-class classification problem with examples for each of the classes ‘red,’ ‘blue,’ and ‘green.’ \n",
70+
" - This could be divided into three binary classification datasets as follows:\n",
71+
" 1. Binary Classification Problem 1: red vs [blue, green]\n",
72+
" 2. Binary Classification Problem 2: blue vs [red, green]\n",
73+
" 3. Binary Classification Problem 3: green vs [red, blue]\n",
74+
"\n",
75+
"**Possible Issue**\n",
76+
"- A possible downside of this approach is that it requires one model to be created for each class. \n",
77+
"- For example, a problem with three classes requires three models. \n",
78+
"  - This could be an issue for very large datasets (e.g. millions of rows) \n",
79+
"  - or for very large numbers of classes (e.g. hundreds of classes).\n",
80+
"\n",
81+
"**Approach of OvA or OvR**\n",
82+
"- The obvious approach is to use a one-versus-the-rest approach (also called one-vs-all), in which we train C binary classifiers, fc(x), where the data from class c is treated as positive, and the data from all the other classes is treated as negative.\n",
83+
"- This approach requires that each model predicts a class membership probability or a probability-like score. The argmax of these scores (class index with the largest score) is then used to predict a class.\n",
84+
"- This approach is commonly used for algorithms that naturally predict numerical class membership probability or score, such as: Logistic Regression. \n",
85+
"As such, the scikit-learn implementations of these algorithms historically applied the OvR strategy by default for multi-class classification (newer LogisticRegression defaults use a multinomial formulation instead).\n",
86+
"\n",
87+
"**Python Example**\n",
88+
"- We can demonstrate this with an example on a 3-class classification problem using the LogisticRegression algorithm. \n",
89+
"- The strategy for handling multi-class classification can be set via the “multi_class” argument, using “ovr” for the one-vs-rest strategy (newer scikit-learn releases deprecate this argument in favour of the OneVsRestClassifier wrapper)."
90+
],
91+
"metadata": {
92+
"id": "LNYaXKSJFjUq"
93+
}
94+
},
95+
{
96+
"cell_type": "code",
97+
"source": [
98+
"# logistic regression for multi-class classification using built-in one-vs-rest\n",
99+
"from sklearn.datasets import make_classification\n",
100+
"from sklearn.linear_model import LogisticRegression\n",
101+
"\n",
102+
"# define dataset\n",
103+
"X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)\n",
104+
"\n",
105+
"# define model\n",
106+
"model = LogisticRegression(multi_class='ovr')\n",
107+
"\n",
108+
"# fit model\n",
109+
"model.fit(X, y)\n",
110+
"\n",
111+
"# make predictions\n",
112+
"yhat = model.predict(X)"
113+
],
114+
"metadata": {
115+
"id": "LoAiNMAzFjEX"
116+
},
117+
"execution_count": 1,
118+
"outputs": []
119+
},
120+
{
121+
"cell_type": "markdown",
122+
"source": [
123+
"- The scikit-learn library also provides a separate OneVsRestClassifier class that allows the one-vs-rest strategy to be used with any classifier.\n",
124+
"\n",
125+
"- This class can be used with a binary classifier like Logistic Regression or Perceptron for multi-class classification, or even with other classifiers that natively support multi-class classification."
126+
],
127+
"metadata": {
128+
"id": "VoM4ch_1FqKC"
129+
}
130+
},
131+
{
132+
"cell_type": "code",
133+
"execution_count": 2,
134+
"metadata": {
135+
"id": "AcL956kJBPes"
136+
},
137+
"outputs": [],
138+
"source": [
139+
"# logistic regression for multi-class classification using a one-vs-rest\n",
140+
"from sklearn.datasets import make_classification\n",
141+
"from sklearn.linear_model import LogisticRegression\n",
142+
"from sklearn.multiclass import OneVsRestClassifier\n",
143+
"\n",
144+
"# define dataset\n",
145+
"X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)\n",
146+
"\n",
147+
"# define model\n",
148+
"model = LogisticRegression()\n",
149+
"\n",
150+
"# define the ovr strategy\n",
151+
"ovr = OneVsRestClassifier(model)\n",
152+
"\n",
153+
"# fit model\n",
154+
"ovr.fit(X, y)\n",
155+
"\n",
156+
"# make predictions\n",
157+
"yhat = ovr.predict(X)"
158+
]
159+
},
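Neither cell above scores the fitted one-vs-rest strategy. A minimal sanity check (not part of the committed notebook) can score the wrapper with cross-validation and inspect its `estimators_` attribute, which holds one binary model per class:

```python
# Hedged sketch (not in the original notebook): evaluate the one-vs-rest
# strategy and confirm that one binary classifier is trained per class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

# same synthetic 3-class dataset as the cells above
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, n_classes=3, random_state=1)

ovr = OneVsRestClassifier(LogisticRegression())
scores = cross_val_score(ovr, X, y, cv=5)  # held-out accuracy per fold

ovr.fit(X, y)
n_models = len(ovr.estimators_)  # one binary classifier per class -> 3
print(scores.mean(), n_models)
```

Training on the same data that is later predicted, as the notebook cells do, measures only training fit; the cross-validated scores above give a less biased estimate.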
160+
{
161+
"cell_type": "markdown",
162+
"source": [
163+
"**One-Vs-One for Multi-Class Classification**\n",
164+
"\n",
165+
"- One-vs-One (OvO for short) is another heuristic method for using binary classification algorithms for multi-class classification.\n",
166+
"\n",
167+
"- Like one-vs-rest, one-vs-one splits a multi-class classification dataset into binary classification problems. \n",
168+
"\n",
169+
"- Unlike one-vs-rest, which creates one binary dataset per class, the one-vs-one approach creates one dataset for each pair of classes.\n",
170+
"- For example\n",
171+
"  - Consider a multi-class classification problem with four classes: ‘red,’ ‘blue,’ ‘green,’ and ‘yellow.’ \n",
172+
" - This could be divided into six binary classification datasets as follows:\n",
173+
" \n",
174+
" Binary Classification Problem 1: red vs. blue\n",
175+
"\n",
176+
" Binary Classification Problem 2: red vs. green\n",
177+
"\n",
178+
" Binary Classification Problem 3: red vs. yellow\n",
179+
"\n",
180+
" Binary Classification Problem 4: blue vs. green\n",
181+
"\n",
182+
" Binary Classification Problem 5: blue vs. yellow\n",
183+
"\n",
184+
" Binary Classification Problem 6: green vs. yellow\n",
185+
"\n",
186+
"- The formula for calculating the number of binary datasets, and in turn, models, is as follows:\n",
187+
"\n",
188+
" (NumClasses * (NumClasses – 1)) / 2\n",
189+
"\n",
190+
"- We can see that for four classes, this gives us the expected value of six binary classification problems:\n",
191+
"\n",
192+
" (NumClasses * (NumClasses – 1)) / 2\n",
193+
" \n",
194+
" (4 * (4 – 1)) / 2\n",
195+
" \n",
196+
" (4 * 3) / 2\n",
197+
" \n",
198+
" 12 / 2\n",
199+
" \n",
200+
" 6\n",
201+
"\n",
202+
"Each binary classification model predicts one class label, and the class with the most predictions or votes is chosen by the one-vs-one strategy.\n",
203+
"\n",
204+
"- An alternative is to introduce K(K − 1)/2 binary discriminant functions, one for every possible pair of classes. \n",
205+
"- This is known as a **one-versus-one classifier**. \n",
206+
"- Each point is then classified according to a *majority vote amongst* the discriminant functions.\n",
207+
"- Similarly, if the binary classification models predict a numerical class membership score, such as a probability, then the argmax of the sum of the scores (the class with the largest summed score) is predicted as the class label.\n",
208+
"\n",
209+
"The support vector machine implementation in scikit-learn, the SVC class, uses the one-vs-one method internally for multi-class classification problems. The “decision_function_shape” argument, set to ‘ovo’, controls the form of the decision function it exposes."
210+
],
211+
"metadata": {
212+
"id": "KrhI1hBuFqaP"
213+
}
214+
},
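The pair count worked out above can be checked in a few lines of Python (illustrative only, not in the committed notebook); `itertools.combinations` also enumerates the six binary problems directly:

```python
# Hedged sketch (not in the original notebook): verify the OvO pair count.
from itertools import combinations

classes = ['red', 'blue', 'green', 'yellow']
n = len(classes)

n_pairs = n * (n - 1) // 2              # (NumClasses * (NumClasses - 1)) / 2
pairs = list(combinations(classes, 2))  # the six binary problems

print(n_pairs)   # 6
print(pairs[0])  # ('red', 'blue')
```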
215+
{
216+
"cell_type": "code",
217+
"source": [
218+
"# SVM for multi-class classification using built-in one-vs-one\n",
219+
"from sklearn.datasets import make_classification\n",
220+
"from sklearn.svm import SVC\n",
221+
"\n",
222+
"# define dataset\n",
223+
"X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)\n",
224+
"\n",
225+
"# define model\n",
226+
"model = SVC(decision_function_shape='ovo')\n",
227+
"\n",
228+
"# fit model\n",
229+
"model.fit(X, y)\n",
230+
"\n",
231+
"# make predictions\n",
232+
"yhat = model.predict(X)"
233+
],
234+
"metadata": {
235+
"id": "Avulnt8zL071"
236+
},
237+
"execution_count": 3,
238+
"outputs": []
239+
},
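One concrete way to see what `decision_function_shape` changes (an illustrative check, not part of the committed notebook) is the width of the decision function on a 4-class version of the synthetic dataset: with ‘ovo’ it returns one column per class pair (six), while ‘ovr’ returns one column per class (four):

```python
# Hedged sketch (not in the original notebook): compare decision function
# widths under 'ovo' (one column per class pair) and 'ovr' (one per class).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_redundant=5, n_classes=4, random_state=1)

ovo = SVC(decision_function_shape='ovo').fit(X, y)
ovr = SVC(decision_function_shape='ovr').fit(X, y)

print(ovo.decision_function(X).shape[1])  # 6 = (4 * 3) / 2
print(ovr.decision_function(X).shape[1])  # 4
```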
240+
{
241+
"cell_type": "markdown",
242+
"source": [
243+
"- The scikit-learn library also provides a separate OneVsOneClassifier class that allows the one-vs-one strategy to be used with any classifier.\n",
244+
"\n",
245+
"- This class can be used with a binary classifier like SVM, Logistic Regression or Perceptron for multi-class classification, or even other classifiers that natively support multi-class classification.\n",
246+
"\n",
247+
"- It is straightforward to use: the binary classifier to be used is simply provided to the OneVsOneClassifier as an argument."
248+
],
249+
"metadata": {
250+
"id": "SraJXdoKL0h3"
251+
}
252+
},
253+
{
254+
"cell_type": "code",
255+
"source": [
256+
"# SVM for multi-class classification using one-vs-one\n",
257+
"from sklearn.datasets import make_classification\n",
258+
"from sklearn.svm import SVC\n",
259+
"from sklearn.multiclass import OneVsOneClassifier\n",
260+
"\n",
261+
"# define dataset\n",
262+
"X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)\n",
263+
"\n",
264+
"# define model\n",
265+
"model = SVC()\n",
266+
"\n",
267+
"# define ovo strategy\n",
268+
"ovo = OneVsOneClassifier(model)\n",
269+
"\n",
270+
"# fit model\n",
271+
"ovo.fit(X, y)\n",
272+
"\n",
273+
"# make predictions\n",
274+
"yhat = ovo.predict(X)"
275+
],
276+
"metadata": {
277+
"id": "DRGfJE3nL15l"
278+
},
279+
"execution_count": 4,
280+
"outputs": []
281+
}
282+
]
283+
}
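As a closing check (illustrative, not part of the committed notebook), the fitted OneVsOneClassifier exposes its underlying binary models via `estimators_`, matching the K(K − 1)/2 formula for the notebook's 3-class dataset:

```python
# Hedged sketch (not in the original notebook): count the binary models
# that OneVsOneClassifier trains for a 3-class problem.
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, n_classes=3, random_state=1)

ovo = OneVsOneClassifier(SVC()).fit(X, y)
print(len(ovo.estimators_))  # 3 = (3 * 2) / 2
```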
