[ci skip] FIX Update explanation regarding number of trees in GBDT (#799
ArturoAmorQ committed Jan 29, 2025
1 parent be376ce commit be2e742
Showing 5 changed files with 35 additions and 31 deletions.
15 changes: 8 additions & 7 deletions _sources/python_scripts/ensemble_ex_03.py
@@ -64,20 +64,21 @@
# Write your code here.

# %% [markdown]
-# Both gradient boosting and random forest models improve when increasing the
-# number of trees in the ensemble. However, the scores reach a plateau where
-# adding new trees just makes fitting and scoring slower.
+# Random forest models improve when increasing the number of trees in the
+# ensemble. However, the scores reach a plateau where adding new trees just
+# makes fitting and scoring slower.
#
-# To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+# Gradient boosting models overfit when the number of trees is too large. To
+# avoid adding a new unnecessary tree, unlike random forests, gradient boosting
# offers an early-stopping option. Internally, the algorithm uses an
# out-of-sample set to compute the generalization performance of the model at
# each addition of a tree. Thus, if the generalization performance is not
# improving for several iterations, it stops adding trees.
#
# Now, create a gradient-boosting model with `n_estimators=1_000`. This number
-# of trees is certainly too large. Change the parameter `n_iter_no_change` such
-# that the gradient boosting fitting stops after adding 5 trees that do not
-# improve the overall generalization performance.
+# of trees is certainly too large. Change the parameter `n_iter_no_change`
+# such that the gradient boosting fitting stops after adding 5 trees to avoid
+# deterioration of the overall generalization performance.

# %%
# Write your code here.
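The early-stopping setup the exercise asks for can be sketched as follows. This is an illustration on a synthetic dataset (`make_regression`), not the notebook's own data or solution; all hyperparameters other than `n_estimators` and `n_iter_no_change` are left at their defaults.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data as a stand-in for the notebook's dataset.
X, y = make_regression(n_samples=1_000, n_features=5, noise=10.0, random_state=0)

# Setting n_iter_no_change activates early stopping: a fraction of the
# training data (validation_fraction, 10% by default) is held out internally,
# and fitting stops once 5 consecutive trees fail to improve the validation
# score, even though up to 1_000 trees are allowed.
gbdt = GradientBoostingRegressor(
    n_estimators=1_000, n_iter_no_change=5, random_state=0
)
gbdt.fit(X, y)

# n_estimators_ reports how many trees were actually fitted.
print(gbdt.n_estimators_)
```

With early stopping active, `n_estimators_` is typically far below the `n_estimators` budget.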
17 changes: 9 additions & 8 deletions _sources/python_scripts/ensemble_sol_03.py
@@ -86,20 +86,21 @@
)

# %% [markdown]
-# Both gradient boosting and random forest models improve when increasing the
-# number of trees in the ensemble. However, the scores reach a plateau where
-# adding new trees just makes fitting and scoring slower.
+# Random forest models improve when increasing the number of trees in the
+# ensemble. However, the scores reach a plateau where adding new trees just
+# makes fitting and scoring slower.
#
-# To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+# Gradient boosting models overfit when the number of trees is too large. To
+# avoid adding a new unnecessary tree, unlike random forests, gradient boosting
# offers an early-stopping option. Internally, the algorithm uses an
# out-of-sample set to compute the generalization performance of the model at
# each addition of a tree. Thus, if the generalization performance is not
# improving for several iterations, it stops adding trees.
#
# Now, create a gradient-boosting model with `n_estimators=1_000`. This number
-# of trees is certainly too large. Change the parameter `n_iter_no_change` such
-# that the gradient boosting fitting stops after adding 5 trees that do not
-# improve the overall generalization performance.
+# of trees is certainly too large. Change the parameter `n_iter_no_change`
+# such that the gradient boosting fitting stops after adding 5 trees to avoid
+# deterioration of the overall generalization performance.

# %%
# solution
@@ -110,7 +111,7 @@
# %% [markdown] tags=["solution"]
# We see that the number of trees used is far below 1000 with the current
# dataset. Training the gradient boosting model with the entire 1000 trees would
-# have been useless.
+# have been detrimental.

# %% [markdown]
# Estimate the generalization performance of this model again using the
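The solution's final step, estimating generalization performance with `mean_absolute_error` on a held-out test set, can be sketched like this. The dataset and split here are illustrative placeholders, not the notebook's own.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Placeholder data; the notebook uses its own train/test split held out earlier.
X, y = make_regression(n_samples=1_000, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Early-stopped gradient boosting: fitting halts once 5 consecutive trees fail
# to improve the score on the internal validation set.
gbdt = GradientBoostingRegressor(
    n_estimators=1_000, n_iter_no_change=5, random_state=0
)
gbdt.fit(X_train, y_train)

# Evaluate on the held-out test set, as the solution text suggests.
error = mean_absolute_error(y_test, gbdt.predict(X_test))
print(f"{gbdt.n_estimators_} trees, test MAE = {error:.2f}")
```

Comparing this test error against the internal validation score gives a sense of whether early stopping chose a reasonable number of trees.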
15 changes: 8 additions & 7 deletions python_scripts/ensemble_ex_03.html
@@ -745,18 +745,19 @@ <h1>📝 Exercise M6.03<a class="headerlink" href="#exercise-m6-03" title="Perma
</div>
</div>
</div>
-<p>Both gradient boosting and random forest models improve when increasing the
-number of trees in the ensemble. However, the scores reach a plateau where
-adding new trees just makes fitting and scoring slower.</p>
-<p>To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+<p>Random forest models improve when increasing the number of trees in the
+ensemble. However, the scores reach a plateau where adding new trees just
+makes fitting and scoring slower.</p>
+<p>Gradient boosting models overfit when the number of trees is too large. To
+avoid adding a new unnecessary tree, unlike random forests, gradient boosting
offers an early-stopping option. Internally, the algorithm uses an
out-of-sample set to compute the generalization performance of the model at
each addition of a tree. Thus, if the generalization performance is not
improving for several iterations, it stops adding trees.</p>
<p>Now, create a gradient-boosting model with <code class="docutils literal notranslate"><span class="pre">n_estimators=1_000</span></code>. This number
-of trees is certainly too large. Change the parameter <code class="docutils literal notranslate"><span class="pre">n_iter_no_change</span></code> such
-that the gradient boosting fitting stops after adding 5 trees that do not
-improve the overall generalization performance.</p>
+of trees is certainly too large. Change the parameter <code class="docutils literal notranslate"><span class="pre">n_iter_no_change</span></code>
+such that the gradient boosting fitting stops after adding 5 trees to avoid
+deterioration of the overall generalization performance.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Write your code here.</span>
17 changes: 9 additions & 8 deletions python_scripts/ensemble_sol_03.html
@@ -776,18 +776,19 @@ <h1>📃 Solution for Exercise M6.03<a class="headerlink" href="#solution-for-ex
<img alt="../_images/b23c1907df9086a091b6f5fcaa8f893e5de559e7265507be985a4c99a14e59c6.png" src="../_images/b23c1907df9086a091b6f5fcaa8f893e5de559e7265507be985a4c99a14e59c6.png" />
</div>
</div>
-<p>Both gradient boosting and random forest models improve when increasing the
-number of trees in the ensemble. However, the scores reach a plateau where
-adding new trees just makes fitting and scoring slower.</p>
-<p>To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+<p>Random forest models improve when increasing the number of trees in the
+ensemble. However, the scores reach a plateau where adding new trees just
+makes fitting and scoring slower.</p>
+<p>Gradient boosting models overfit when the number of trees is too large. To
+avoid adding a new unnecessary tree, unlike random forests, gradient boosting
offers an early-stopping option. Internally, the algorithm uses an
out-of-sample set to compute the generalization performance of the model at
each addition of a tree. Thus, if the generalization performance is not
improving for several iterations, it stops adding trees.</p>
<p>Now, create a gradient-boosting model with <code class="docutils literal notranslate"><span class="pre">n_estimators=1_000</span></code>. This number
-of trees is certainly too large. Change the parameter <code class="docutils literal notranslate"><span class="pre">n_iter_no_change</span></code> such
-that the gradient boosting fitting stops after adding 5 trees that do not
-improve the overall generalization performance.</p>
+of trees is certainly too large. Change the parameter <code class="docutils literal notranslate"><span class="pre">n_iter_no_change</span></code>
+such that the gradient boosting fitting stops after adding 5 trees to avoid
+deterioration of the overall generalization performance.</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># solution</span>
@@ -805,7 +806,7 @@ <h1>📃 Solution for Exercise M6.03<a class="headerlink" href="#solution-for-ex
</div>
<p>We see that the number of trees used is far below 1000 with the current
dataset. Training the gradient boosting model with the entire 1000 trees would
-have been useless.</p>
+have been detrimental.</p>
<p>Estimate the generalization performance of this model again using the
<code class="docutils literal notranslate"><span class="pre">sklearn.metrics.mean_absolute_error</span></code> metric but this time using the test set
that we held out at the beginning of the notebook. Compare the resulting value
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
