FeatureEngineering.md
+10-12
@@ -8,28 +8,26 @@ We can combine tables, transform features to create new features.
8
8
We can even do so with help. Let's see [featuretools](https://www.featuretools.com/)
9
9
10
10
After we created "meaningful" features, we are now ready to transform their format.
11
-
### OneHot Encoding: Dealing with discrete features
12
11
13
-
### Mathematical Operations: Creating New Features
14
12
15
-
#### Single Column
16
-
Apply functions such as log, sqrt, pow, or other functions that take 1 input.
17
-
18
-
#### Two Columns
19
-
Apply product, ratio, or other transformations that take 2 or more inputs.
20
-
21
-
### Bucketing: Dealing with continuous feature
13
+
## Dimension Reduction (Yin-side)
22
14
23
-
### NLP: dealing with text feature
15
+
There are many available tools for this.
24
16
25
-
## Dimension Reduction (Yin-side)
17
+
### Statistics: remove highly correlated data
18
+
We can do this automatically using [featuretools](https://www.featuretools.com/).
19
+
Or we can remove them by hand.
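A minimal sketch of removing highly correlated features by hand, assuming the features are numeric columns in a pandas DataFrame. The helper name `drop_highly_correlated`, the 0.95 threshold, and the demo data are illustrative assumptions, not part of the original notes.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one column from every pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Hypothetical demo data: "a_scaled" is almost a linear copy of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame(
    {
        "a": a,
        "a_scaled": 2 * a + 0.001 * rng.normal(size=200),
        "b": rng.normal(size=200),
    }
)
reduced = drop_highly_correlated(demo)
```

Which column of a correlated pair is kept depends on column order here; in practice one might prefer to keep the column that is cheaper to compute or easier to interpret.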
26
20
27
21
### Principal Component Analysis (PCA)
22
+
This is a classic and fast method, but it has its limitations.
23
+
Remember to standardize your data before you do this.
24
+
This is quick to do with sklearn.
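A short sketch of standardizing and then applying PCA with sklearn. The synthetic data, the choice of two components, and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 samples, 5 features, one of them on a much larger scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 3] = X[:, 0] * 1000.0

# Standardize first so the large-scale column does not dominate the components.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pca_pipeline.fit_transform(X)

# Fraction of the total variance captured by each kept component.
explained = pca_pipeline.named_steps["pca"].explained_variance_ratio_
```

Putting the scaler and PCA in one pipeline means the same scaling is reused when transforming new data.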
28
25
29
26
### auto-encoding using deep neural networks
27
+
We need sufficient data to do this well, and it is more complicated than PCA.
30
28
31
-
### Feature selection
32
29
30
+
### Feature selection
33
31
Before the final task, we could try to solve a representative task. Use feature importances from a model, usually tree-based, to select features, or use SelectKBest (e.g. [here](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html)).
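The two selection approaches mentioned here could be sketched as follows. The synthetic classification data, `k=3`, the `f_classif` scoring function, and the random forest settings are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical representative task: 10 features, only 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Option 1: univariate selection, keep the 3 features with the best ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=3)
X_kbest = selector.fit_transform(X, y)
kbest_idx = selector.get_support(indices=True)

# Option 2: rank features by the importances of a tree ensemble.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top_by_importance = forest.feature_importances_.argsort()[::-1][:3]
```

The two rankings need not agree: SelectKBest scores each feature on its own, while tree importances also reflect interactions between features.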
34
32
In addition, we can do selection during training, for example with LASSO regularization.
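As a rough illustration of selection during training, the sketch below fits an L1-regularized linear model and keeps the features with non-zero coefficients. The synthetic regression data and `alpha=1.0` are assumptions; in practice the regularization strength would be tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Hypothetical regression task: 10 features, only 3 of them informative.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

# Scale features so the L1 penalty treats every coefficient comparably.
X_scaled = StandardScaler().fit_transform(X)

# The L1 penalty pushes coefficients of unhelpful features to exactly zero.
lasso = Lasso(alpha=1.0).fit(X_scaled, y)
kept = np.flatnonzero(lasso.coef_)  # indices of features the model retained
```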