@@ -71,12 +71,12 @@ one of those reasons, it is safe to train your TF-DF on train+validation (unless
the validation split is also used for something else, like hyperparameter
tuning).

- ``` python {.bad}
- model.fit(train_ds, validation_data=val_ds)
+ ``` diff {.bad}
+ - model.fit(train_ds, validation_data=val_ds)
```

- ``` python {.good}
- model.fit(train_ds.concatenate(val_ds))
+ ``` diff {.good}
+ + model.fit(train_ds.concatenate(val_ds))

# Or just don't create a validation dataset
```
@@ -91,17 +91,17 @@ needed, it will be extracted automatically from the training dataset.

#### Train for exactly 1 epoch

- ``` python {.bad}
+ ``` diff {.bad}
# Number of epochs in Keras
- model.fit(train_ds, num_epochs=5)
+ - model.fit(train_ds, num_epochs=5)

# Number of epochs in the dataset
- train_ds = train_ds.repeat(5)
- model.fit(train_ds)
+ - train_ds = train_ds.repeat(5)
+ - model.fit(train_ds)
```

- ``` python {.good}
- model.fit(train_ds)
+ ``` diff {.good}
+ + model.fit(train_ds)
```

**Rationale:** Users of neural networks often train a model for N steps (which
@@ -116,13 +116,13 @@ unnecessary data I/O, as well as slower training.
Datasets do not need to be shuffled (unless the input_fn is reading only a
sample of the dataset).

- ``` python {.bad}
- train_ds = train_ds.shuffle(5)
- model.fit(train_ds)
+ ``` diff {.bad}
+ - train_ds = train_ds.shuffle(5)
+ - model.fit(train_ds)
```

- ``` python {.good}
- model.fit(train_ds)
+ ``` diff {.good}
+ + model.fit(train_ds)
```

**Rationale:** TF-DF shuffles access to the data internally after reading the
@@ -136,15 +136,15 @@ However, this will make the training procedure non-deterministic.

The batch size will not affect the model quality

- ``` python {.bad}
- train_ds = train_ds.batch(hyper_parameter_batch_size())
- model.fit(train_ds)
+ ``` diff {.bad}
+ - train_ds = train_ds.batch(hyper_parameter_batch_size())
+ - model.fit(train_ds)
```

- ``` python {.good}
+ ``` diff {.good}
# The batch size does not matter.
- train_ds = train_ds.batch(64)
- model.fit(train_ds)
+ + train_ds = train_ds.batch(64)
+ + model.fit(train_ds)
```

**Rationale:** Since TF-DF is always trained on the full dataset after it is
@@ -214,37 +214,37 @@ transformations. By default, all of the features in the dataset (other than the
label) will be detected and used by the model. The feature semantics will be
auto-detected, and can be overridden manually if needed.

- ``` python {.bad}
+ ``` diff {.bad}
# Estimator code
- feature_columns = [
- tf.feature_column.numeric_column(feature_1),
- tf.feature_column.categorical_column_with_vocabulary_list(feature_2, ['First', 'Second', 'Third'])
- ]
- model = tf.estimator.LinearClassifier(feature_columns=feature_columns)
+ - feature_columns = [
+ - tf.feature_column.numeric_column(feature_1),
+ - tf.feature_column.categorical_column_with_vocabulary_list(feature_2, ['First', 'Second', 'Third'])
+ - ]
+ - model = tf.estimator.LinearClassifier(feature_columns=feature_columns)
```

- ``` python {.good}
+ ``` diff {.good}
# Use all the available features. Detect the type automatically.
- model = tfdf.keras.GradientBoostedTreesModel()
+ + model = tfdf.keras.GradientBoostedTreesModel()
```

You can also specify a subset of input features:

- ``` python {.good}
- features = [
- tfdf.keras.FeatureUsage(name="feature_1"),
- tfdf.keras.FeatureUsage(name="feature_2")
- ]
- model = tfdf.keras.GradientBoostedTreesModel(features=features, exclude_non_specified_features=True)
+ ``` diff {.good}
+ + features = [
+ + tfdf.keras.FeatureUsage(name="feature_1"),
+ + tfdf.keras.FeatureUsage(name="feature_2")
+ + ]
+ + model = tfdf.keras.GradientBoostedTreesModel(features=features, exclude_non_specified_features=True)
```

If necessary, you can force the semantic of a feature.

- ``` python {.good}
- forced_features = [
- tfdf.keras.FeatureUsage(name="feature_1", semantic=tfdf.keras.FeatureSemantic.CATEGORICAL),
- ]
- model = tfdf.keras.GradientBoostedTreesModel(features=forced_features)
+ ``` diff {.good}
+ + forced_features = [
+ + tfdf.keras.FeatureUsage(name="feature_1", semantic=tfdf.keras.FeatureSemantic.CATEGORICAL),
+ + ]
+ + model = tfdf.keras.GradientBoostedTreesModel(features=forced_features)
```

**Rationale:** While certain models (like Neural Networks) require a
@@ -262,29 +262,29 @@ remove all pre-processing that was designed to help neural network training.

#### Do not normalize numerical features

- ``` python {.bad}
- def zscore(value):
- return (value-mean) / sd
+ ``` diff {.bad}
+ - def zscore(value):
+ - return (value-mean) / sd

- feature_columns = [tf.feature_column.numeric_column("feature_1",normalizer_fn=zscore)]
+ - feature_columns = [tf.feature_column.numeric_column("feature_1",normalizer_fn=zscore)]
```
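
For comparison, a minimal sketch of the good counterpart (assuming `train_ds` already carries the raw, untouched values of `feature_1`):

``` python {.good}
# Assumed: train_ds contains the raw numerical feature values.
# No z-score (or any other) normalization is applied before training.
model = tfdf.keras.GradientBoostedTreesModel()
model.fit(train_ds)
```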

- **Rationale :** Decision forest algorithms natively support non-normalized
+ **Rationale:** Decision forest algorithms natively support non-normalized
numerical features, since the splitting algorithms do not do any numerical
transformation of the input. Some types of normalization (e.g. zscore
normalization) will not help numerical stability of the training procedure, and
some (e.g. outlier clipping) may hurt the expressiveness of the final model.

#### Do not encode categorical features (e.g. hashing, one-hot, or embedding)

- ``` python {.bad}
- integerized_column = tf.feature_column.categorical_column_with_hash_bucket("feature_1",hash_bucket_size=100)
- feature_columns = [tf.feature_column.indicator_column(integerized_column)]
+ ``` diff {.bad}
+ - integerized_column = tf.feature_column.categorical_column_with_hash_bucket("feature_1",hash_bucket_size=100)
+ - feature_columns = [tf.feature_column.indicator_column(integerized_column)]
```

- ``` python {.bad}
- integerized_column = tf.feature_column.categorical_column_with_vocabulary_list('feature_1', ['bob', 'george', 'wanda'])
- feature_columns = [tf.feature_column.indicator_column(integerized_column)]
+ ``` diff {.bad}
+ - integerized_column = tf.feature_column.categorical_column_with_vocabulary_list('feature_1', ['bob', 'george', 'wanda'])
+ - feature_columns = [tf.feature_column.indicator_column(integerized_column)]
```
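
For comparison, a minimal sketch of the good counterpart (`train_df` is a hypothetical pandas DataFrame holding the raw string column):

``` python {.good}
import tensorflow_decision_forests as tfdf

# Assumed: train_df is a pandas DataFrame with the raw string column "feature_1"
# and a "label" column. The strings are consumed directly as a categorical
# feature; no hashing, one-hot encoding, or embedding is needed.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")
model = tfdf.keras.GradientBoostedTreesModel()
model.fit(train_ds)
```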

**Rationale:** TF-DF has native support for categorical features, and will treat
@@ -315,11 +315,11 @@ networks, which may propagate NaNs to the gradients if there are NaNs in the
input, TF-DF will train optimally if the algorithm sees the difference between
missing and a sentinel value.

- ``` python {.bad}
- feature_columns = [
- tf.feature_column.numeric_column("feature_1", default_value=0),
- tf.feature_column.numeric_column("feature_1_is_missing"),
- ]
+ ``` diff {.bad}
+ - feature_columns = [
+ - tf.feature_column.numeric_column("feature_1", default_value=0),
+ - tf.feature_column.numeric_column("feature_1_is_missing"),
+ - ]
```
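
For comparison, a minimal sketch of the good counterpart (assuming missing numerical values are simply kept as missing, e.g. represented as NaN, in `train_ds`):

``` python {.good}
# Assumed: missing values of feature_1 are left as NaN in train_ds.
# TF-DF handles missing values natively, so no sentinel value or extra
# "feature_1_is_missing" indicator column is needed.
model = tfdf.keras.GradientBoostedTreesModel()
model.fit(train_ds)
```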

#### Handling Images and Time series
@@ -381,22 +381,22 @@ dataset reads are deterministic as well.

#### Specify a task (e.g. classification, ranking) instead of a loss (e.g. binary cross-entropy)

- ``` python {.bad}
- model = tf.keras.Sequential()
- model.add(Dense(64, activation=relu))
- model.add(Dense(1)) # One output for binary classification
+ ``` diff {.bad}
+ - model = tf.keras.Sequential()
+ - model.add(Dense(64, activation=relu))
+ - model.add(Dense(1)) # One output for binary classification

- model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
- optimizer='adam',
- metrics=['accuracy'])
+ - model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
+ - optimizer='adam',
+ - metrics=['accuracy'])
```

- ``` python {.good}
+ ``` diff {.good}
# The loss is automatically determined from the task.
- model = tfdf.keras.GradientBoostedTreesModel(task=tfdf.keras.Task.CLASSIFICATION)
+ + model = tfdf.keras.GradientBoostedTreesModel(task=tfdf.keras.Task.CLASSIFICATION)

# Optional if you want to report the accuracy.
- model.compile(metrics=['accuracy'])
+ + model.compile(metrics=['accuracy'])
```

**Rationale:** Not all TF-DF learning algorithms use a loss. For those that do,
@@ -489,13 +489,13 @@ print(tree)
TF-DF does not yet support TF distribution strategies. Multi-worker setups will
be ignored, and the training will only happen on the manager.

- ``` python {.bad}
- with tf.distribute.MirroredStrategy():
- model = ...
+ ``` diff {.bad}
+ - with tf.distribute.MirroredStrategy():
+ - model = ...
```

- ``` python {.good}
- model = ....
+ ``` diff {.good}
+ + model = ....
```

#### Stacking Models