Commit 08258ed

author
Ashwin Hegde
committed
feat(eg): feature selection + filter method
1 parent 90a4b1c commit 08258ed

3 files changed

+76429
-0
lines changed

Feature_Selection.ipynb

+85
@@ -0,0 +1,85 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Feature selection\n",
8+
"\n",
9+
"What is feature selection?\n",
10+
"\n",
11+
"* Feature (or variable) selection is the process of selecting a subset of relevant features from all the features available in a dataset, and using that subset to build ML models.\n",
12+
"\n",
13+
"Why should we select features?\n",
14+
"\n",
15+
"* A model that uses 10 variables is easier to understand than one that uses 100. Simpler models are easier to interpret.\n",
16+
"* Shorter training times.\n",
17+
"* Reduce the risk of data errors during model use.\n",
18+
"* Reduce variable redundancy by excluding correlated variables.\n",
19+
"* Avoid poor learning behaviour in high-dimensional spaces."
20+
]
21+
},
22+
{
23+
"cell_type": "markdown",
24+
"metadata": {},
25+
"source": [
26+
"## Feature selection methods\n",
27+
"\n",
28+
"1. Filter methods\n",
29+
"2. Wrapper methods\n",
30+
"3. Embedded methods"
31+
]
32+
},
33+
{
34+
"cell_type": "markdown",
35+
"metadata": {},
36+
"source": [
37+
"### Wrapper methods\n",
38+
"\n",
39+
"* Use a predictive ML model to score each feature subset.\n",
40+
"* Train a new model on each feature subset.\n",
41+
"* Tend to be very computationally expensive.\n",
42+
"* The selected subset is tuned to the model used for scoring, so it may not be the best feature combination for a different ML model.\n"
43+
]
44+
},
45+
{
46+
"cell_type": "markdown",
47+
"metadata": {},
48+
"source": [
49+
"### Embedded methods\n",
50+
"\n",
51+
"* Perform feature selection as part of the model construction process.\n",
52+
"* Consider the interaction between features and models.\n",
53+
"* They are less computationally expensive than wrapper methods, because they fit the ML model only once."
54+
]
55+
},
56+
{
57+
"cell_type": "code",
58+
"execution_count": null,
59+
"metadata": {},
60+
"outputs": [],
61+
"source": []
62+
}
63+
],
64+
"metadata": {
65+
"kernelspec": {
66+
"display_name": "Python 3",
67+
"language": "python",
68+
"name": "python3"
69+
},
70+
"language_info": {
71+
"codemirror_mode": {
72+
"name": "ipython",
73+
"version": 3
74+
},
75+
"file_extension": ".py",
76+
"mimetype": "text/x-python",
77+
"name": "python",
78+
"nbconvert_exporter": "python",
79+
"pygments_lexer": "ipython3",
80+
"version": "3.6.0"
81+
}
82+
},
83+
"nbformat": 4,
84+
"nbformat_minor": 2
85+
}
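
The wrapper-method bullets in the notebook (score each feature subset with a model, train a new model per subset) can be sketched in plain Python. This is a minimal illustration, not the author's code: it assumes greedy forward selection as the search strategy and a leave-one-out 1-nearest-neighbour classifier as the scoring model. The function names `loo_accuracy` and `forward_selection` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                 features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the columns listed in `features`."""
    correct = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        # Nearest neighbour by squared Euclidean distance on the chosen columns.
        nearest = min(
            others,
            key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_selection(X: Sequence[Sequence[float]],
                      y: Sequence[int]) -> Tuple[List[int], float]:
    """Greedy wrapper search: add the feature that most improves the
    model score, and stop once no candidate improves it."""
    selected: List[int] = []
    best_score = 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        # One model evaluation per candidate subset (the expensive part).
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data (assumption): column 0 separates the classes, columns 1 and 2 do not.
y = [0] * 6 + [1] * 6
X = [[label * 10 + i * 0.1, i % 3, i % 2] for i, label in enumerate(y)]

selected, score = forward_selection(X, y)
print(selected, score)
```

Each round fits and scores one model per remaining candidate, which is why wrapper methods become expensive as the number of features grows. The result also depends on the scoring model chosen here, so a different model could favour a different subset.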
