
Commit 857c457

Internal change

PiperOrigin-RevId: 373338217
1 parent 999372b commit 857c457

File tree

8 files changed: +518 -6337 lines changed

documentation/_book.yaml

Lines changed: 3 additions & 3 deletions
@@ -32,11 +32,11 @@ upper_tabs:
     - heading: Tutorials
     - title: Overview
       path: /decision_forests/tutorials/index
-    - title: Beginner colab tutorial
+    - title: Beginner tutorial
       path: /decision_forests/tutorials/beginner_colab
-    - title: Intermediate colab tutorial
+    - title: Intermediate tutorial
       path: /decision_forests/tutorials/intermediate_colab
-    - title: Advanced colab tutorial
+    - title: Advanced tutorial
       path: /decision_forests/tutorials/advanced_colab
     - heading: Developer
     - title: Developer manual

documentation/migration.md

Lines changed: 71 additions & 71 deletions
@@ -71,12 +71,12 @@ one of those reasons, it is safe to train your TF-DF on train+validation (unless
 the validation split is also used for something else, like hyperparameter
 tuning).
 
-```python {.bad}
-model.fit(train_ds, validation_data=val_ds)
+```diff {.bad}
+- model.fit(train_ds, validation_data=val_ds)
 ```
 
-```python {.good}
-model.fit(train_ds.concatenate(val_ds))
+```diff {.good}
++ model.fit(train_ds.concatenate(val_ds))
 
 # Or just don't create a validation dataset
 ```
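As a plain-Python illustration of the `{.good}` pattern in the hunk above, the sketch below trains on train+validation in one pass. It assumes TF-DF is installed (`pip install tensorflow_decision_forests`) and that `train_df` and `val_df` are hypothetical pandas DataFrames sharing a `"label"` column.

```python
import tensorflow_decision_forests as tfdf

# Hypothetical pandas DataFrames with identical columns, including "label".
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")
val_ds = tfdf.keras.pd_dataframe_to_tf_dataset(val_df, label="label")

model = tfdf.keras.GradientBoostedTreesModel()
# No validation_data argument: TF-DF extracts its own validation split from
# the training data when the algorithm needs one.
model.fit(train_ds.concatenate(val_ds))
```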
@@ -91,17 +91,17 @@ needed, it will be extracted automatically from the training dataset.
 
 #### Train for exactly 1 epoch
 
-```python {.bad}
+```diff {.bad}
 # Number of epochs in Keras
-model.fit(train_ds, num_epochs=5)
+- model.fit(train_ds, num_epochs=5)
 
 # Number of epochs in the dataset
-train_ds = train_ds.repeat(5)
-model.fit(train_ds)
+- train_ds = train_ds.repeat(5)
+- model.fit(train_ds)
 ```
 
-```python {.good}
-model.fit(train_ds)
+```diff {.good}
++ model.fit(train_ds)
 ```
 
 **Rationale:** Users of neural networks often train a model for N steps (which
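The one-epoch rule from this hunk, as a runnable sketch. `features` and `labels` are hypothetical in-memory NumPy arrays; the pipeline deliberately has no `.repeat()`.

```python
import tensorflow as tf
import tensorflow_decision_forests as tfdf

# Hypothetical in-memory training data, wrapped as (features, label) batches.
train_ds = tf.data.Dataset.from_tensor_slices(
    ({"f1": features}, labels)).batch(1000)

model = tfdf.keras.GradientBoostedTreesModel()
# A single pass over the data is a full training run: no `epochs=` argument
# on fit() and no `.repeat(N)` on the dataset.
model.fit(train_ds)
```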
@@ -116,13 +116,13 @@ unnecessary data I/O, as well as slower training.
 Datasets do not need to be shuffled (unless the input_fn is reading only a
 sample of the dataset).
 
-```python {.bad}
-train_ds = train_ds.shuffle(5)
-model.fit(train_ds)
+```diff {.bad}
+- train_ds = train_ds.shuffle(5)
+- model.fit(train_ds)
 ```
 
-```python {.good}
-model.fit(train_ds)
+```diff {.good}
++ model.fit(train_ds)
 ```
 
 **Rationale:** TF-DF shuffles access to the data internally after reading the
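Since the hunk above notes that input shuffling makes training non-deterministic, a sketch of the alternative: leave the dataset unshuffled and control randomness on the model side. `random_seed` is a TF-DF model constructor argument; `train_ds` is a hypothetical batched dataset.

```python
import tensorflow_decision_forests as tfdf

# No train_ds.shuffle(...) beforehand: TF-DF already randomizes access to
# the examples internally, using a deterministic pseudo-random seed.
model = tfdf.keras.RandomForestModel(random_seed=1234)
model.fit(train_ds)
```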
@@ -136,15 +136,15 @@ However, this will make the training procedure non-deterministic.
 
 The batch size will not affect the model quality
 
-```python {.bad}
-train_ds = train_ds.batch(hyper_parameter_batch_size())
-model.fit(train_ds)
+```diff {.bad}
+- train_ds = train_ds.batch(hyper_parameter_batch_size())
+- model.fit(train_ds)
 ```
 
-```python {.good}
+```diff {.good}
 # The batch size does not matter.
-train_ds = train_ds.batch(64)
-model.fit(train_ds)
++ train_ds = train_ds.batch(64)
++ model.fit(train_ds)
 ```
 
 ***Rationale:*** Since TF-DF is always trained on the full dataset after it is
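A sketch of why batch size is not a hyper-parameter here: since TF-DF re-assembles the full dataset before growing trees, the two pipelines below should produce models of the same quality, differing only in I/O. `raw_ds` is a hypothetical unbatched `(features, label)` dataset.

```python
import tensorflow_decision_forests as tfdf

# Only I/O behavior differs between these two runs; model quality does not.
model_a = tfdf.keras.RandomForestModel(random_seed=1)
model_a.fit(raw_ds.batch(64))

model_b = tfdf.keras.RandomForestModel(random_seed=1)
model_b.fit(raw_ds.batch(10_000))
```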
@@ -214,37 +214,37 @@ transformations. By default, all of the features in the dataset (other than the
 label) will be detected and used by the model. The feature semantics will be
 auto-detected, and can be overridden manually if needed.
 
-```python {.bad}
+```diff {.bad}
 # Estimator code
-feature_columns = [
-  tf.feature_column.numeric_column(feature_1),
-  tf.feature_column.categorical_column_with_vocabulary_list(feature_2, ['First', 'Second', 'Third'])
-]
-model = tf.estimator.LinearClassifier(feature_columns=feature_columnes)
+- feature_columns = [
+-   tf.feature_column.numeric_column(feature_1),
+-   tf.feature_column.categorical_column_with_vocabulary_list(feature_2, ['First', 'Second', 'Third'])
+- ]
+- model = tf.estimator.LinearClassifier(feature_columns=feature_columnes)
 ```
 
-```python {.good}
+```diff {.good}
 # Use all the available features. Detect the type automatically.
-model = tfdf.keras.GradientBoostedTreesModel()
++ model = tfdf.keras.GradientBoostedTreesModel()
 ```
 
 You can also specify a subset of input features:
 
-```python {.good}
-features = [
-  tfdf.keras.FeatureUsage(name="feature_1"),
-  tfdf.keras.FeatureUsage(name="feature_2")
-]
-model = tfdf.keras.GradientBoostedTreesModel(features=features, exclude_non_specified_features=True)
+```diff {.good}
++ features = [
++   tfdf.keras.FeatureUsage(name="feature_1"),
++   tfdf.keras.FeatureUsage(name="feature_2")
++ ]
++ model = tfdf.keras.GradientBoostedTreesModel(features=features, exclude_non_specified_features=True)
 ```
 
 If necessary, you can force the semantic of a feature.
 
-```python {.good}
-forced_features = [
-  tfdf.keras.FeatureUsage(name="feature_1", semantic=tfdf.keras.FeatureSemantic.CATEGORICAL),
-]
-model = tfdf.keras.GradientBoostedTreesModel(features=features)
+```diff {.good}
++ forced_features = [
++   tfdf.keras.FeatureUsage(name="feature_1", semantic=tfdf.keras.FeatureSemantic.CATEGORICAL),
++ ]
++ model = tfdf.keras.GradientBoostedTreesModel(features=features)
 ```
 
 **Rationale:** While certain models (like Neural Networks) require a
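Note that the last `{.good}` snippet in this hunk builds `forced_features` but then passes `features=features`. A self-contained sketch of forcing a feature semantic would look like the following; it assumes a hypothetical `train_ds` containing a column "feature_1" whose values should be treated as categories.

```python
import tensorflow_decision_forests as tfdf

forced_features = [
    tfdf.keras.FeatureUsage(
        name="feature_1",
        semantic=tfdf.keras.FeatureSemantic.CATEGORICAL),
]
# Pass the list that was just built.
model = tfdf.keras.GradientBoostedTreesModel(features=forced_features)
model.fit(train_ds)
```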
@@ -262,29 +262,29 @@ remove all pre-processing that was designed to help neural network training.
 
 #### Do not normalize numerical features
 
-```python {.bad}
-def zscore(value):
-  return (value-mean) / sd
+```diff {.bad}
+- def zscore(value):
+-   return (value-mean) / sd
 
-feature_columns = [tf.feature_column.numeric_column("feature_1",normalizer_fn=zscore)]
+- feature_columns = [tf.feature_column.numeric_column("feature_1",normalizer_fn=zscore)]
 ```
 
-**Rationale:** Decision forest algorithms natively support non-normalized
+**Rational:** Decision forest algorithms natively support non-normalized
 numerical features, since the splitting algorithms do not do any numerical
 transformation of the input. Some types of normalization (e.g. zscore
 normalization) will not help numerical stability of the training procedure, and
 some (e.g. outlier clipping) may hurt the expressiveness of the final model.
 
 #### Do not encode categorical features (e.g. hashing, one-hot, or embedding)
 
-```python {.bad}
-integerized_column = tf.feature_column.categorical_column_with_hash_bucket("feature_1",hash_bucket_size=100)
-feature_columns = [tf.feature_column.indicator_column(integerized_column)]
+```diff {.bad}
+- integerized_column = tf.feature_column.categorical_column_with_hash_bucket("feature_1",hash_bucket_size=100)
+- feature_columns = [tf.feature_column.indicator_column(integerized_column)]
 ```
 
-```python {.bad}
-integerized_column = tf.feature_column.categorical_column_with_vocabulary_list('feature_1', ['bob', 'george', 'wanda'])
-feature_columns = [tf.feature_column.indicator_column(integerized_column)]
+```diff {.bad}
+- integerized_column = tf.feature_column.categorical_column_with_vocabulary_list('feature_1', ['bob', 'george', 'wanda'])
+- feature_columns = [tf.feature_column.indicator_column(integerized_column)]
 ```
 
 **Rationale:** TF-DF has native support for categorical features, and will treat
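Putting the two rules in this hunk together: raw numerics and raw string categoricals can be handed to TF-DF untouched. A minimal end-to-end sketch with a hypothetical toy DataFrame:

```python
import pandas as pd
import tensorflow_decision_forests as tfdf

df = pd.DataFrame({
    "feature_1": [105.0, 1.2, 9000.0, 3.4],          # no z-score, no clipping
    "feature_2": ["bob", "george", "wanda", "bob"],  # no hashing, no one-hot
    "label": [0, 1, 1, 0],
})
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="label")

model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
```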
@@ -315,11 +315,11 @@ networks, which may propagate NaNs to the gradients if there are NaNs in the
 input, TF-DF will train optimally if the algorithm sees the difference between
 missing and a sentinel value.
 
-```python {.bad}
-feature_columns = [
-  tf.feature_column.numeric_column("feature_1", default_value=0),
-  tf.feature_column.numeric_column("feature_1_is_missing"),
-]
+```diff {.bad}
+- feature_columns = [
+-   tf.feature_column.numeric_column("feature_1", default_value=0),
+-   tf.feature_column.numeric_column("feature_1_is_missing"),
+- ]
 ```
 
 #### Handling Images and Time series
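The missing-value rule from this hunk as a sketch: leave the value missing rather than imputing a sentinel or adding an `is_missing` indicator column (toy, hypothetical data).

```python
import math
import pandas as pd
import tensorflow_decision_forests as tfdf

df = pd.DataFrame({
    # NaN stays NaN: TF-DF handles missing values natively, and training can
    # exploit the difference between "missing" and a sentinel value.
    "feature_1": [1.0, math.nan, 3.0, math.nan],
    "label": [0, 1, 0, 1],
})
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="label")
tfdf.keras.RandomForestModel().fit(train_ds)
```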
@@ -381,22 +381,22 @@ dataset reads are deterministic as well.
 
 #### Specify a task (e.g. classification, ranking) instead of a loss (e.g. binary cross-entropy)
 
-```python {.bad}
-model = tf.keras.Sequential()
-model.add(Dense(64, activation=relu))
-model.add(Dense(1)) # One output for binary classification
+```diff {.bad}
+- model = tf.keras.Sequential()
+- model.add(Dense(64, activation=relu))
+- model.add(Dense(1)) # One output for binary classification
 
-model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
-              optimizer='adam',
-              metrics=['accuracy'])
+- model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
+-               optimizer='adam',
+-               metrics=['accuracy'])
 ```
 
-```python {.good}
+```diff {.good}
 # The loss is automatically determined from the task.
-model = tfdf.keras.GradientBoostedTreesModel(task=tf.keras.Task.CLASSIFICATION)
++ model = tfdf.keras.GradientBoostedTreesModel(task=tf.keras.Task.CLASSIFICATION)
 
 # Optional if you want to report the accuracy.
-model.compile(metrics=['accuracy'])
++ model.compile(metrics=['accuracy'])
 ```
 
 **Rationale:** Not all TF-DF learning algorithms use a loss. For those that do,
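For reference, the task enum lives under `tfdf.keras` (the snippet in the hunk above writes `tf.keras.Task`). A sketch of task-based configuration, assuming a hypothetical `train_ds` with a classification label:

```python
import tensorflow_decision_forests as tfdf

# The task, not a compiled loss, selects the training objective.
model = tfdf.keras.GradientBoostedTreesModel(task=tfdf.keras.Task.CLASSIFICATION)

# compile() is only needed to report metrics; no loss is passed.
model.compile(metrics=["accuracy"])
model.fit(train_ds)
```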
@@ -489,13 +489,13 @@ print(tree)
 TF-DF does not yet support TF distribution strategies. Multi-worker setups will
 be ignored, and the training will only happen on the manager.
 
-```python {.bad}
-with tf.distribute.MirroredStrategy():
-  model = ...
+```diff {.bad}
+- with tf.distribute.MirroredStrategy():
+-   model = ...
 ```
 
-```python {.good}
-model = ....
+```diff {.good}
++ model = ....
 ```
 
 #### Stacking Models
