
Commit 5c705f8

author
peicheng
committed
updated
1 parent 105599c commit 5c705f8

2 files changed

+15
-6
lines changed

FeatureEngineering.md

+14-4
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,13 @@
11
# Feature Engineering
2-
Yin-Yang sides of feature engineering
2+
Yin-Yang sides of feature engineering.
3+
34

45
## Transforming or creating new features (Yang-side)
56

7+
We can combine tables and transform existing features to create new ones.
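
A minimal sketch of combining tables and deriving a new feature, assuming pandas is available. The two toy tables and all column names (`customer_id`, `amount`, `mean_amount`, and so on) are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical toy tables; the column names are illustrative only.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 51, 27]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.0, 10.0, 5.0, 15.0, 40.0],
})

# Aggregate the child table down to one row per customer.
order_stats = (
    orders.groupby("customer_id")["amount"]
    .agg(order_count="count", total_amount="sum")
    .reset_index()
)

# Combine the tables, then derive a new feature from two existing columns.
features = customers.merge(order_stats, on="customer_id", how="left")
features["mean_amount"] = features["total_amount"] / features["order_count"]
```

The left join keeps every customer even if they have no orders; in that case the aggregated columns would be missing and need separate handling.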
8+
We can even automate part of this with a library; see [featuretools](https://www.featuretools.com/).
9+
10+
After we have created "meaningful" features, we are ready to transform their format.
611
### OneHot Encoding: Dealing with discrete features
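
One possible way to one-hot encode a discrete column is `pandas.get_dummies`. The sketch below uses a made-up `color` column; the data and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with one discrete column ("color") and one numeric column.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": [1, 2, 3, 4],
})

# Replace "color" with one indicator column per category.
encoded = pd.get_dummies(df, columns=["color"], prefix="color")
```

Each category becomes its own indicator column (`color_blue`, `color_green`, `color_red`), and the numeric column is left unchanged.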
712

813
### Mathematical Operations: Creating New Features
@@ -17,10 +22,15 @@ Apply product,ratio, or other transformations that take 2 or more inputs.
1722

1823
### NLP: dealing with text features
1924

20-
## Feature selection (Yin-side)
25+
## Dimension Reduction (Yin-side)
26+
27+
### Principal Component Analysis (PCA)
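
As a rough illustration, PCA can be applied with scikit-learn's `PCA` class. The synthetic data below (five observed columns generated from two latent factors) is an assumption made up for this sketch, not data from this guide.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 5 observed features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

# Project the 5 features onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance retained by the 2 components.
retained = pca.explained_variance_ratio_.sum()
```

On real data the features would usually be standardized first, since PCA is sensitive to the scale of each column.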
28+
29+
### auto-encoding using deep neural networks
2130

22-
### Selection before traning
31+
### Feature selection
2332

24-
### Selection while training - LASSO regularization
33+
Before the final task, we could first try to solve a representative task. Use the feature importances of a model, usually a tree-based one, to select features, or use SelectKBest (e.g. [here](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html)).
34+
In addition, we can do selection while training, for example with LASSO regularization.
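
The two approaches can be sketched with scikit-learn as follows. The synthetic regression data, `k=3`, and `alpha=1.0` are assumptions chosen only to keep the example small; suitable values depend on the actual problem.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso

# Synthetic regression problem: 10 features, only 3 of them informative.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

# Selection before training: keep the k features with the best univariate scores.
selector = SelectKBest(score_func=f_regression, k=3)
X_selected = selector.fit_transform(X, y)
kept_by_filter = selector.get_support(indices=True)

# Selection while training: the L1 penalty can shrink some coefficients to exactly zero.
lasso = Lasso(alpha=1.0).fit(X, y)
kept_by_lasso = np.flatnonzero(lasso.coef_)
```

`kept_by_filter` and `kept_by_lasso` hold the column indices each method retains; the two sets need not agree, since one scores features individually and the other fits them jointly.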
2535

2636

README.md

+1-2
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@ There are many great sources to learn Data Science, and here are some advice to
3535
* Take a First Look
3636
* Data Cleaning
3737
* Transform DataFrames
38-
* Dimension Reduction
38+
* [Feature Engineering](FeatureEngineering.md)
3939

4040
4. [Exploring Data](ExploringData.md)
4141
* Simple Data Visualization
@@ -50,7 +50,6 @@ There are many great sources to learn Data Science, and here are some advice to
5050
* Hierarchical clustering
5151

5252
7. [Basic Supervised Learning](SupervisedLearningBasic.md)
53-
* [Feature Engineering](FeatureEngineering.md)
5453
* [Classification via Logistic Regression](https://www.kaggle.com/danielzou/tensorflow-multiclassification)
5554
* Ensemble Learning
5655
* [XGBoost](XGBoost.md)
