[ci skip] FIX Update explanation regarding number of trees in GBDT (#799
ArturoAmorQ committed Jan 29, 2025
1 parent be376ce commit be2e742
Showing 5 changed files with 35 additions and 31 deletions.
15 changes: 8 additions & 7 deletions _sources/python_scripts/ensemble_ex_03.py
@@ -64,20 +64,21 @@
# Write your code here.

# %% [markdown]
-# Both gradient boosting and random forest models improve when increasing the
-# number of trees in the ensemble. However, the scores reach a plateau where
-# adding new trees just makes fitting and scoring slower.
+# Random forest models improve when increasing the number of trees in the
+# ensemble. However, the scores reach a plateau where adding new trees just
+# makes fitting and scoring slower.
#
-# To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+# Gradient boosting models overfit when the number of trees is too large. To
+# avoid adding a new unnecessary tree, unlike random forests, gradient boosting
# offers an early-stopping option. Internally, the algorithm uses an
# out-of-sample set to compute the generalization performance of the model at
# each addition of a tree. Thus, if the generalization performance is not
# improving for several iterations, it stops adding trees.
#
# Now, create a gradient-boosting model with `n_estimators=1_000`. This number
-# of trees is certainly too large. Change the parameter `n_iter_no_change` such
-# that the gradient boosting fitting stops after adding 5 trees that do not
-# improve the overall generalization performance.
+# of trees is certainly too large. Change the parameter `n_iter_no_change`
+# such that the gradient boosting fitting stops after adding 5 trees to avoid
+# deterioration of the overall generalization performance.

# %%
# Write your code here.
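The early-stopping setup the exercise asks for can be sketched as follows. This is an illustration on a synthetic dataset (`make_regression`), not the notebook's own data or solution; all hyperparameters other than `n_estimators` and `n_iter_no_change` are left at their defaults.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data as a stand-in for the notebook's dataset.
X, y = make_regression(n_samples=1_000, n_features=5, noise=10.0, random_state=0)

# Setting n_iter_no_change activates early stopping: a fraction of the
# training data (validation_fraction, 10% by default) is held out internally,
# and fitting stops once 5 consecutive trees fail to improve the validation
# score, even though up to 1_000 trees are allowed.
gbdt = GradientBoostingRegressor(
    n_estimators=1_000, n_iter_no_change=5, random_state=0
)
gbdt.fit(X, y)

# n_estimators_ reports how many trees were actually fitted.
print(gbdt.n_estimators_)
```

With early stopping active, `n_estimators_` is typically far below the `n_estimators` budget.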
17 changes: 9 additions & 8 deletions _sources/python_scripts/ensemble_sol_03.py
@@ -86,20 +86,21 @@
)

# %% [markdown]
-# Both gradient boosting and random forest models improve when increasing the
-# number of trees in the ensemble. However, the scores reach a plateau where
-# adding new trees just makes fitting and scoring slower.
+# Random forest models improve when increasing the number of trees in the
+# ensemble. However, the scores reach a plateau where adding new trees just
+# makes fitting and scoring slower.
#
-# To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+# Gradient boosting models overfit when the number of trees is too large. To
+# avoid adding a new unnecessary tree, unlike random forests, gradient boosting
# offers an early-stopping option. Internally, the algorithm uses an
# out-of-sample set to compute the generalization performance of the model at
# each addition of a tree. Thus, if the generalization performance is not
# improving for several iterations, it stops adding trees.
#
# Now, create a gradient-boosting model with `n_estimators=1_000`. This number
-# of trees is certainly too large. Change the parameter `n_iter_no_change` such
-# that the gradient boosting fitting stops after adding 5 trees that do not
-# improve the overall generalization performance.
+# of trees is certainly too large. Change the parameter `n_iter_no_change`
+# such that the gradient boosting fitting stops after adding 5 trees to avoid
+# deterioration of the overall generalization performance.

# %%
# solution
@@ -110,7 +111,7 @@
# %% [markdown] tags=["solution"]
# We see that the number of trees used is far below 1000 with the current
# dataset. Training the gradient boosting model with the entire 1000 trees would
-# have been useless.
+# have been detrimental.

# %% [markdown]
# Estimate the generalization performance of this model again using the
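The solution's final step, estimating generalization performance with `mean_absolute_error` on a held-out test set, can be sketched like this. The dataset and split here are illustrative placeholders, not the notebook's own.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Placeholder data; the notebook uses its own train/test split held out earlier.
X, y = make_regression(n_samples=1_000, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Early-stopped gradient boosting: fitting halts once 5 consecutive trees fail
# to improve the score on the internal validation set.
gbdt = GradientBoostingRegressor(
    n_estimators=1_000, n_iter_no_change=5, random_state=0
)
gbdt.fit(X_train, y_train)

# Evaluate on the held-out test set, as the solution text suggests.
error = mean_absolute_error(y_test, gbdt.predict(X_test))
print(f"{gbdt.n_estimators_} trees, test MAE = {error:.2f}")
```

Comparing this test error against the internal validation score gives a sense of whether early stopping chose a reasonable number of trees.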
15 changes: 8 additions & 7 deletions python_scripts/ensemble_ex_03.html
@@ -745,18 +745,19 @@ <h1>📝 Exercise M6.03<a class="headerlink" href="#exercise-m6-03" title="Perma
</div>
</div>
</div>
-<p>Both gradient boosting and random forest models improve when increasing the
-number of trees in the ensemble. However, the scores reach a plateau where
-adding new trees just makes fitting and scoring slower.</p>
-<p>To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+<p>Random forest models improve when increasing the number of trees in the
+ensemble. However, the scores reach a plateau where adding new trees just
+makes fitting and scoring slower.</p>
+<p>Gradient boosting models overfit when the number of trees is too large. To
+avoid adding a new unnecessary tree, unlike random forests, gradient boosting
offers an early-stopping option. Internally, the algorithm uses an
out-of-sample set to compute the generalization performance of the model at
each addition of a tree. Thus, if the generalization performance is not
improving for several iterations, it stops adding trees.</p>
<p>Now, create a gradient-boosting model with <code class="docutils literal notranslate"><span class="pre">n_estimators=1_000</span></code>. This number
-of trees is certainly too large. Change the parameter <code class="docutils literal notranslate"><span class="pre">n_iter_no_change</span></code> such
-that the gradient boosting fitting stops after adding 5 trees that do not
-improve the overall generalization performance.</p>
+of trees is certainly too large. Change the parameter <code class="docutils literal notranslate"><span class="pre">n_iter_no_change</span></code>
+such that the gradient boosting fitting stops after adding 5 trees to avoid
+deterioration of the overall generalization performance.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Write your code here.</span>
17 changes: 9 additions & 8 deletions python_scripts/ensemble_sol_03.html
@@ -776,18 +776,19 @@ <h1>📃 Solution for Exercise M6.03<a class="headerlink" href="#solution-for-ex
<img alt="../_images/b23c1907df9086a091b6f5fcaa8f893e5de559e7265507be985a4c99a14e59c6.png" src="../_images/b23c1907df9086a091b6f5fcaa8f893e5de559e7265507be985a4c99a14e59c6.png" />
</div>
</div>
-<p>Both gradient boosting and random forest models improve when increasing the
-number of trees in the ensemble. However, the scores reach a plateau where
-adding new trees just makes fitting and scoring slower.</p>
-<p>To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+<p>Random forest models improve when increasing the number of trees in the
+ensemble. However, the scores reach a plateau where adding new trees just
+makes fitting and scoring slower.</p>
+<p>Gradient boosting models overfit when the number of trees is too large. To
+avoid adding a new unnecessary tree, unlike random forests, gradient boosting
offers an early-stopping option. Internally, the algorithm uses an
out-of-sample set to compute the generalization performance of the model at
each addition of a tree. Thus, if the generalization performance is not
improving for several iterations, it stops adding trees.</p>
<p>Now, create a gradient-boosting model with <code class="docutils literal notranslate"><span class="pre">n_estimators=1_000</span></code>. This number
-of trees is certainly too large. Change the parameter <code class="docutils literal notranslate"><span class="pre">n_iter_no_change</span></code> such
-that the gradient boosting fitting stops after adding 5 trees that do not
-improve the overall generalization performance.</p>
+of trees is certainly too large. Change the parameter <code class="docutils literal notranslate"><span class="pre">n_iter_no_change</span></code>
+such that the gradient boosting fitting stops after adding 5 trees to avoid
+deterioration of the overall generalization performance.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># solution</span>
@@ -805,7 +806,7 @@ <h1>📃 Solution for Exercise M6.03<a class="headerlink" href="#solution-for-ex
</div>
<p>We see that the number of trees used is far below 1000 with the current
dataset. Training the gradient boosting model with the entire 1000 trees would
-have been useless.</p>
+have been detrimental.</p>
<p>Estimate the generalization performance of this model again using the
<code class="docutils literal notranslate"><span class="pre">sklearn.metrics.mean_absolute_error</span></code> metric but this time using the test set
that we held out at the beginning of the notebook. Compare the resulting value
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
