|
87 | 87 | "<div class=\"admonition warning alert alert-danger\">\n", |
88 | 88 | "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Warning</p>\n", |
89 | 89 | "<p class=\"last\">At this time, it is not possible to use <tt class=\"docutils literal\"><span class=\"pre\">response_method=\"predict_proba\"</span></tt> for\n", |
90 | | - "multiclass problems. This is a planned feature for a future version of\n", |
91 | | - "scikit-learn. In the mean time, you can use <tt class=\"docutils literal\"><span class=\"pre\">response_method=\"predict\"</span></tt>\n", |
92 | | - "instead.</p>\n", |
| 90 | + "multiclass problems on a single plot. This is a planned feature for a future\n", |
|  | 91 | + "version of scikit-learn. In the meantime, you can use\n", |
| 92 | + "<tt class=\"docutils literal\"><span class=\"pre\">response_method=\"predict\"</span></tt> instead.</p>\n", |
93 | 93 | "</div>" |
94 | 94 | ] |
95 | 95 | }, |
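For reference, the workaround named in the warning looks as follows. This is a minimal sketch, assuming the fitted classifier `tree` and the two-feature DataFrame `data_test` defined earlier in the notebook, not the author's exact cell:

```python
# Sketch of the workaround from the warning above: hard class predictions
# (response_method="predict") can be drawn on a single plot even in a
# multiclass setting. Assumes `tree` (fitted classifier) and `data_test`
# (two-feature DataFrame) from earlier cells.
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay

display = DecisionBoundaryDisplay.from_estimator(
    tree,
    data_test,
    response_method="predict",
    cmap="tab10",
    alpha=0.5,
)
_ = display.ax_.set_title("Predicted class")
```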
|
212 | 212 | "except that for a K-class problem you have K probability outputs for each\n", |
213 | 213 | "data point. Visualizing all these on a single plot can quickly become tricky\n", |
214 | 214 | "to interpret. It is then common to instead produce K separate plots, one for\n", |
215 | | - "each class, in a one-vs-rest (or one-vs-all) fashion.\n", |
|  | 215 | + "each class, in a one-vs-rest (or one-vs-all) fashion. This can be achieved by\n", |
|  | 216 | + "calling `DecisionBoundaryDisplay.from_estimator` once per class, setting the\n", |
|  | 217 | + "`class_of_interest` parameter accordingly.\n", |
216 | 218 | "\n", |
217 | | - "For example, in the plot below, the first plot on the left shows in yellow the\n", |
218 | | - "certainty on classifying a data point as belonging to the \"Adelie\" class. In\n", |
219 | | - "the same plot, the spectre from green to purple represents the certainty of\n", |
220 | | - "**not** belonging to the \"Adelie\" class. The same logic applies to the other\n", |
221 | | - "plots in the figure." |
|  | 219 | + "For example, in the figure below, the leftmost plot shows the certainty of\n", |
|  | 220 | + "classifying a data point as belonging to the \"Adelie\" class. The darker the\n", |
|  | 221 | + "color, the more certain the model is that a given point in the feature space\n", |
|  | 222 | + "belongs to a given class. The same logic applies to the other plots in the\n", |
|  | 223 | + "figure." |
222 | 224 | ] |
223 | 225 | }, |
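Before the full three-panel figure below, a minimal sketch of a single one-vs-rest panel may help fix the API in mind. It assumes `tree` and `data_test` from earlier cells, and that `"Adelie"` is one of the labels in `tree.classes_`:

```python
# Minimal sketch of one one-vs-rest panel; the cell below assembles the
# full three-panel figure from the same ingredients. Assumes `tree` and
# `data_test` from earlier cells; "Adelie" is one of tree.classes_.
from sklearn.inspection import DecisionBoundaryDisplay

display = DecisionBoundaryDisplay.from_estimator(
    tree,
    data_test,
    response_method="predict_proba",
    class_of_interest="Adelie",
    vmin=0,
    vmax=1,
    cmap="Blues",
)
```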
224 | 226 | { |
|
231 | 233 | }, |
232 | 234 | "outputs": [], |
233 | 235 | "source": [ |
234 | | - "import numpy as np\n", |
235 | | - "\n", |
236 | | - "xx = np.linspace(30, 60, 100)\n", |
237 | | - "yy = np.linspace(10, 23, 100)\n", |
238 | | - "xx, yy = np.meshgrid(xx, yy)\n", |
239 | | - "Xfull = pd.DataFrame(\n", |
240 | | - " {\"Culmen Length (mm)\": xx.ravel(), \"Culmen Depth (mm)\": yy.ravel()}\n", |
241 | | - ")\n", |
242 | | - "\n", |
243 | | - "probas = tree.predict_proba(Xfull)\n", |
244 | | - "n_classes = len(np.unique(tree.classes_))\n", |
|  | 236 | + "# colormap utilities for the shared probability colorbar below\n", |
| 237 | + "from matplotlib import cm\n", |
245 | 238 | "\n", |
246 | 239 | "_, axs = plt.subplots(ncols=3, nrows=1, sharey=True, figsize=(12, 5))\n", |
247 | | - "plt.suptitle(\"Predicted probabilities for decision tree model\", y=0.8)\n", |
| 240 | + "plt.suptitle(\"Predicted probabilities for decision tree model\", y=1.05)\n", |
| 241 | + "plt.subplots_adjust(bottom=0.45)\n", |
248 | 242 | "\n", |
249 | | - "for class_of_interest in range(n_classes):\n", |
250 | | - " axs[class_of_interest].set_title(\n", |
251 | | - " f\"Class {tree.classes_[class_of_interest]}\"\n", |
252 | | - " )\n", |
253 | | - " imshow_handle = axs[class_of_interest].imshow(\n", |
254 | | - " probas[:, class_of_interest].reshape((100, 100)),\n", |
255 | | - " extent=(30, 60, 10, 23),\n", |
256 | | - " vmin=0.0,\n", |
257 | | - " vmax=1.0,\n", |
258 | | - " origin=\"lower\",\n", |
259 | | - " cmap=\"viridis\",\n", |
| 243 | + "for idx, (class_of_interest, ax) in enumerate(zip(tree.classes_, axs)):\n", |
| 244 | + " ax.set_title(f\"Class {class_of_interest}\")\n", |
| 245 | + " DecisionBoundaryDisplay.from_estimator(\n", |
| 246 | + " tree,\n", |
| 247 | + " data_test,\n", |
| 248 | + " response_method=\"predict_proba\",\n", |
| 249 | + " class_of_interest=class_of_interest,\n", |
| 250 | + " ax=ax,\n", |
| 251 | + " vmin=0,\n", |
| 252 | + " vmax=1,\n", |
| 253 | + " cmap=\"Blues\",\n", |
260 | 254 | " )\n", |
261 | | - " axs[class_of_interest].set_xlabel(\"Culmen Length (mm)\")\n", |
262 | | - " if class_of_interest == 0:\n", |
263 | | - " axs[class_of_interest].set_ylabel(\"Culmen Depth (mm)\")\n", |
264 | | - " idx = target_test == tree.classes_[class_of_interest]\n", |
265 | | - " axs[class_of_interest].scatter(\n", |
266 | | - " data_test[\"Culmen Length (mm)\"].loc[idx],\n", |
267 | | - " data_test[\"Culmen Depth (mm)\"].loc[idx],\n", |
| 255 | + " ax.scatter(\n", |
| 256 | + " data_test[\"Culmen Length (mm)\"].loc[target_test == class_of_interest],\n", |
| 257 | + " data_test[\"Culmen Depth (mm)\"].loc[target_test == class_of_interest],\n", |
268 | 258 | " marker=\"o\",\n", |
269 | 259 | " c=\"w\",\n", |
270 | 260 | " edgecolor=\"k\",\n", |
271 | 261 | " )\n", |
| 262 | + " ax.set_xlabel(\"Culmen Length (mm)\")\n", |
| 263 | + " if idx == 0:\n", |
| 264 | + " ax.set_ylabel(\"Culmen Depth (mm)\")\n", |
272 | 265 | "\n", |
273 | | - "ax = plt.axes([0.15, 0.04, 0.7, 0.05])\n", |
274 | | - "plt.colorbar(imshow_handle, cax=ax, orientation=\"horizontal\")\n", |
275 | | - "_ = plt.title(\"Probability\")" |
|  | 266 | + "cax = plt.axes([0.15, 0.14, 0.7, 0.05])\n", |
|  | 267 | + "plt.colorbar(cm.ScalarMappable(cmap=\"Blues\"), cax=cax, orientation=\"horizontal\")\n", |
|  | 268 | + "_ = cax.set_title(\"Predicted class membership probability\")" |
276 | 269 | ] |
277 | 270 | }, |
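Since the three panels are one-vs-rest views of the same `predict_proba` output, the per-class probabilities at any point sum to one. A quick sanity check, under the same assumptions (`tree` and `data_test` from the cells above):

```python
# Sanity check: the three probability maps are one-vs-rest views of the
# same predict_proba output, so the class probabilities for each point
# sum to one.
import numpy as np

probas = tree.predict_proba(data_test)
assert probas.shape[1] == len(tree.classes_)
np.testing.assert_allclose(probas.sum(axis=1), 1.0)
```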
278 | 271 | { |
|
283 | 276 | ] |
284 | 277 | }, |
285 | 278 | "source": [ |
| 279 | + "\n", |
286 | 280 | "<div class=\"admonition note alert alert-info\">\n", |
287 | 281 | "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Note</p>\n", |
288 | | - "<p class=\"last\">You may have noticed that we are no longer using a diverging colormap. Indeed,\n", |
289 | | - "the chance level for a one-vs-rest binarization of the multi-class\n", |
290 | | - "classification problem is almost never at predicted probability of 0.5. So\n", |
291 | | - "using a colormap with a neutral white at 0.5 might give a false impression on\n", |
292 | | - "the certainty.</p>\n", |
293 | | - "</div>\n", |
294 | | - "\n", |
295 | | - "In future versions of scikit-learn `DecisionBoundaryDisplay` will support a\n", |
296 | | - "`class_of_interest` parameter that will allow in particular for a\n", |
297 | | - "visualization of `predict_proba` in multi-class settings.\n", |
298 | | - "\n", |
299 | | - "We also plan to make it possible to visualize the `predict_proba` values for\n", |
300 | | - "the class with the maximum predicted probability (without having to pass a\n", |
301 | | - "given a fixed `class_of_interest` value)." |
|  | 282 | + "<p class=\"last\">You may notice that we no longer use a diverging colormap (two color\n", |
|  | 283 | + "gradients with white in the middle). Indeed, in a multiclass setting, a\n", |
|  | 284 | + "predicted probability of 0.5 is not a meaningful chance level, so centering\n", |
|  | 285 | + "the colormap on white would be misleading. Instead, we use a sequential\n", |
|  | 286 | + "colormap, where the color intensity reflects the certainty of the\n", |
|  | 287 | + "classification: the darker the color, the higher the predicted probability\n", |
|  | 288 | + "for the class of interest.</p>\n", |
| 289 | + "</div>" |
302 | 290 | ] |
303 | 291 | } |
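A side note on the shared colorbar in the plotting cell: `cm.ScalarMappable(cmap="Blues")` relies on matplotlib's default normalization, which here spans 0 to 1. A sketch making the probability range explicit, assuming `plt` and the colorbar axes `cax` from that cell:

```python
# Sketch: pin the shared colorbar explicitly to the [0, 1] probability
# range instead of relying on the default normalization. Assumes `plt`
# and the colorbar axes `cax` from the plotting cell above.
from matplotlib import cm
from matplotlib.colors import Normalize

mappable = cm.ScalarMappable(norm=Normalize(vmin=0.0, vmax=1.0), cmap="Blues")
plt.colorbar(mappable, cax=cax, orientation="horizontal")
```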
304 | 292 | ], |
|