Final ipynb

fawazsiddiqi · fawazsiddiqi · commit c71e940a033a · 2020-09-15T22:34:14.000+04:00
diff --git a/notebooks/01-clustering_with_scikit-learn-tutorial.ipynb b/notebooks/01-clustering_with_scikit-learn-tutorial.ipynb
@@ -174,13 +174,19 @@
     "Define some functions that will be used repeatedly for visualization."
    ]
   },
+  {
+   "source": [
+    "### 3D matplotlib (plus seaborn) charting with some data prep and optional center points"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# 3D matplotlib (plus seaborn) charting with some data prep and optional center points\n",
     "def show_scatter_3d(df, x_name, y_name, z_name, predicted=None, centers=None,\n",
     "                    marker='o', cmap=None, edgecolors=None, alpha=0.3,\n",
     "                    elev=25, azim=10, show_colorbar=True,\n",
@@ -249,27 +255,38 @@
     "        plt.scatter(center[0], center[1], marker=\"X\", s=300, color='red')        "
    ]
   },
+  {
+   "source": [
+    "### A Plotly 3D scatter chart is almost a one-liner, but we use this function to keep the params in one place"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Plotly 3D scatter chart is almost a one-liner, but use this function to keep the params in one place\n",
     "def plotly_scatter_3d(df, x, y, z, color=None):\n",
     "    fig = px.scatter_3d(df, x=x, y=y, z=z, color=color,\n",
     "                    opacity=0.2, template='plotly_dark', color_continuous_scale=px.colors.qualitative.Set1)\n",
     "    fig.show()   "
    ]
   },
+  {
+   "source": [
+    "### Use a stacked bar chart for an external evaluation of the churn cluster vs known churn risk"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Use a stacked bar chart for an external evaluation of the churn cluster vs known churn risk\n",
-    "\n",
     "# Map the risk values to sortables (and still OK for the legend)\n",
     "risk_map = {'High': '2: High', 'Medium': '1: Medium', 'Low': '0: Low'}\n",
     "    \n",
@@ -318,14 +335,20 @@
     "show_scatter_3d(blobs_df, 'X', 'Y', 'Z', predicted=blob_labels);"
    ]
   },
+  {
+   "source": [
+    "#### This is the same thing we just showed with matplotlib, but now we have tooltips and we can zoom and rotate.\n",
+    "#### Rotating the chart can be very helpful when clusters are overlapping in 3-dimensional space."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# This is the same thing we just showed with matplotlib, but now we have tooltips and we can zoom and rotate.\n",
-    "# Rotating the chart can be very helpful when clusters are overlapping in 3-dimensional space.\n",
     "plotly_scatter_3d(blobs_df, 'X', 'Y', 'Z', color='CLUSTER')"
    ]
   },
@@ -584,6 +607,13 @@
     "outliers_df = temp_df[temp_df['CLUSTER']==-1]"
    ]
   },
+  {
+   "source": [
+    "Here we use matplotlib to chart the regions the algorithm found, along with the points it recognized as outliers (labeled -1). This is based on our mean-shift algorithm."
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -723,7 +753,21 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Hierarchical\n"
+    "## Hierarchical Models\n",
+    "\n",
+    "- Builds a hierarchy of clusters\n",
+    "\n",
+    "- Starts with every data point assigned to a cluster of its own\n",
+    "\n",
+    "- At each step, the two nearest clusters are merged into one cluster\n",
+    "\n",
+    "- Terminates when only a single cluster is left\n",
+    "\n",
+    "#### Agglomerative\n",
+    "\n",
+    "- A bottom-up approach to clustering\n",
+    "\n",
+    "- Starts with many small clusters and merges them to create bigger clusters\n"
    ]
   },
   {
@@ -807,7 +851,7 @@
   "kernelspec": {
    "display_name": "Python 3.6.10 64-bit ('py36': conda)",
    "language": "python",
-   "name": "python_defaultSpec_1600174602311"
+   "name": "python_defaultSpec_1600186116323"
   },
   "language_info": {
    "codemirror_mode": {