Commit c71e940

Final ipynb
1 parent 615391d commit c71e940

1 file changed

+52
-8
lines changed

notebooks/01-clustering_with_scikit-learn-tutorial.ipynb

+52-8
@@ -174,13 +174,19 @@
 "Define some functions that will be used repeatedly for visualization."
 ]
 },
+{
+"source": [
+"### 3D matplotlib (plus seaborn) charting with some data prep and optional center points"
+],
+"cell_type": "markdown",
+"metadata": {}
+},
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
-"# 3D matplotlib (plus seaborn) charting with some data prep and optional center points\n",
 "def show_scatter_3d(df, x_name, y_name, z_name, predicted=None, centers=None,\n",
 " marker='o', cmap=None, edgecolors=None, alpha=0.3,\n",
 " elev=25, azim=10, show_colorbar=True,\n",
@@ -249,27 +255,38 @@
 " plt.scatter(center[0], center[1], marker=\"X\", s=300, color='red') "
 ]
 },
+{
+"source": [
+"### Plotly 3D scatter chart is almost a one-liner, but use this function to keep the params in one place"
+],
+"cell_type": "markdown",
+"metadata": {}
+},
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
-"# Plotly 3D scatter chart is almost a one-liner, but use this function to keep the params in one place\n",
 "def plotly_scatter_3d(df, x, y, z, color=None):\n",
 " fig = px.scatter_3d(df, x=x, y=y, z=z, color=color,\n",
 " opacity=0.2, template='plotly_dark', color_continuous_scale=px.colors.qualitative.Set1)\n",
 " fig.show() "
 ]
 },
+{
+"source": [
+"### Use a stacked bar chart for an external evaluation of the churn cluster vs known churn risk"
+],
+"cell_type": "markdown",
+"metadata": {}
+},
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
-"# Use a stacked bar chart for an external evaluation of the churn cluster vs known churn risk\n",
-"\n",
 "# Map the risk values to sortables (and still OK for the legend)\n",
 "risk_map = {'High': '2: High', 'Medium': '1: Medium', 'Low': '0: Low'}\n",
 " \n",
318335
"show_scatter_3d(blobs_df, 'X', 'Y', 'Z', predicted=blob_labels);"
319336
]
320337
},
338+
{
339+
"source": [
340+
"#### This is the same thing we just showed with matplotlib, but now we have tooltips and we can zoom and rotate.\n",
341+
"#### Rotating the chart can be very helpful when clusters are overlapping in 3-dimensional space."
342+
],
343+
"cell_type": "markdown",
344+
"metadata": {}
345+
},
321346
{
322347
"cell_type": "code",
323348
"execution_count": null,
324349
"metadata": {},
325350
"outputs": [],
326351
"source": [
327-
"# This is the same thing we just showed with matplotlib, but now we have tooltips and we can zoom and rotate.\n",
328-
"# Rotating the chart can be very helpful when clusters are overlapping in 3-dimensional space.\n",
329352
"plotly_scatter_3d(blobs_df, 'X', 'Y', 'Z', color='CLUSTER')"
330353
]
331354
},
@@ -584,6 +607,13 @@
 "outliers_df = temp_df[temp_df['CLUSTER']==-1]"
 ]
 },
+{
+"source": [
+"Here we use matplotlib to chart the outliers recognized by our mean-shift algorithm, along with the regions it identifies."
+],
+"cell_type": "markdown",
+"metadata": {}
+},
 {
 "cell_type": "code",
 "execution_count": null,
723753
"cell_type": "markdown",
724754
"metadata": {},
725755
"source": [
726-
"## Hierarchical\n"
756+
"## Hierarchical Models\n",
757+
"\n",
758+
"- Builds hierarchy of clusters\n",
759+
"\n",
760+
"- Starts with all the data points assigned to a cluster of their own\n",
761+
"\n",
762+
"- Two nearest clusters are merged into the same cluster \n",
763+
"\n",
764+
"- Terminates when there is only a single cluster left\n",
765+
"\n",
766+
"#### Agglomerative \n",
767+
"\n",
768+
"- Bottom up approach when it comes to clustering\n",
769+
"\n",
770+
"- Start with many small clusters and merge them together to create bigger clusters\n"
727771
]
728772
},
729773
{
@@ -807,7 +851,7 @@
 "kernelspec": {
 "display_name": "Python 3.6.10 64-bit ('py36': conda)",
 "language": "python",
-"name": "python_defaultSpec_1600174602311"
+"name": "python_defaultSpec_1600186116323"
 },
 "language_info": {
 "codemirror_mode": {
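
Note on the Hierarchical Models cell in this commit: the bottom-up procedure it lists (one cluster per point, merge the two nearest clusters, repeat) can be illustrated with a small plain-Python sketch. This is not code from the notebook. The function names, the sample points, and the choice of single linkage (cluster distance = distance between the closest pair of points) are all assumptions made for illustration, and the brute-force search is only suitable for tiny inputs.

```python
from itertools import combinations
from math import sqrt


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(cluster_a, cluster_b):
    """Cluster distance under single linkage: the closest cross-cluster pair."""
    return min(euclidean(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, n_clusters=1):
    """Hypothetical helper: merge the two nearest clusters until n_clusters remain.

    Returns the final clusters and the list of merge distances, in merge order.
    """
    # Start with every point assigned to a cluster of its own.
    clusters = [[tuple(p)] for p in points]
    merge_distances = []
    while len(clusters) > max(n_clusters, 1):
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merge_distances.append(single_linkage(clusters[i], clusters[j]))
        # Merge cluster j into cluster i; j > i, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merge_distances


# Illustrative data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0), (20.0, 0.0)]
clusters, merge_distances = agglomerative(points, n_clusters=3)
for cluster in clusters:
    print(cluster)
```

With the default `n_clusters=1` the loop runs until a single cluster is left, matching the termination rule in the cell. Stopping earlier, as in the usage above, is how a flat clustering is usually read off the hierarchy.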
