**how-to-use-azureml/azure-databricks/README.md**
Azure Databricks is a managed Spark offering on Azure, and customers already use it for advanced analytics. It provides a collaborative notebook-based environment with CPU- or GPU-based compute clusters.
In this section, you will find sample notebooks on how to use the Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use the Automated ML capability (**public preview**) of the Azure ML SDK with Azure Databricks.
- Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning.
- You can keep the data within the same cluster.
- You can leverage the local worker nodes with autoscale and auto-termination capabilities.
- You can use multiple cores of your Azure Databricks cluster to perform simultaneous training.
- You can further tune the model generated by automated machine learning if you choose to.
- Every run (including the best run) is available as a pipeline, which you can tune further if needed.
- The model trained using Azure Databricks can be registered in an Azure ML workspace and then deployed to Azure managed compute (ACI or AKS) using the Azure Machine Learning SDK.
**Create Azure Databricks Cluster:**
Select New Cluster and fill in the following details:
- Cluster name: _yourclustername_
- Databricks Runtime: Any **non-ML** runtime (non-ML 4.x or 5.x)
- Python version: **3**
- Workers: 2 or higher.
It will take a few minutes to create the cluster. Please ensure that the cluster state is running before proceeding further.
- Click Install Library
- Do not select _Attach automatically to all clusters_. If you selected this earlier, go to your Home folder and deselect it.
- Select the check box _Attach_ next to your cluster name
(More details on how to attach and detach libraries are here: [https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster](https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster).)
- Ensure that there are no errors until Status changes to _Attached_. It may take a couple of minutes.
**Note** - If you have an old SDK version installed, deselect it from the cluster's installed libraries and move it to trash. Install the new SDK version and restart the cluster. If there is still an issue, detach and reattach your cluster.
**Single file** -
The following archive contains all the sample notebooks. You can run the notebooks after importing the [DBC](Databricks_AMLSDK_1-4_6.dbc) archive into your Databricks workspace instead of downloading them individually.
Notebooks 1-4 have to be run sequentially. They are related to an income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to prepare data, train, and operationalize a Spark ML model with the Azure ML Python SDK from within Azure Databricks.
Notebook 6 is an Automated ML sample notebook for Classification.
Learn more about [how to use Azure Databricks as a development environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for Azure Machine Learning service.
You can also use Azure Databricks as a compute target for [training models with an Azure Machine Learning pipeline](https://docs.microsoft.com/machine-learning/service/how-to-set-up-training-targets#databricks).
For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).
**how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.ipynb**
# AutoML : Classification with Local Compute on Azure DataBricks

In this example we use scikit-learn's <a href="http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset" target="_blank">digit dataset</a> to showcase how you can use AutoML for a simple classification problem.

In this notebook you will learn how to:
1. Create an Azure Machine Learning `Workspace` object and initialize your notebook directory to easily reload this object from a configuration file.
2. Create an `Experiment` in an existing `Workspace`.
3. Configure Automated ML using `AutoMLConfig`.
4. Train the model using Azure Databricks.
5. Explore the results.
6. Test the best fitted model.

Before running this notebook, please follow the <a href="https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks" target="_blank">readme for using Automated ML on Azure Databricks</a> to install the necessary libraries on your cluster.
We support installing the AML SDK with Automated ML as a library from the GUI. When attaching a library, follow <a href="https://docs.databricks.com/user-guide/libraries.html" target="_blank">this link</a> and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.

As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments.
```python
from azureml.train.automl.run import AutoMLRun
```
```python
# Automated ML requires a dataflow, which is different from a dataframe.
# If your data is in a dataframe, use read_pandas_dataframe to convert it to a dataflow before using dprep.

import azureml.dataprep as dprep
# You can use `auto_read_file`, which intelligently figures out the delimiters and datatypes of a file.
# The data referenced here was pulled from `sklearn.datasets.load_digits()`.
```
```python
automl_config = AutoMLConfig(
    # ... other Automated ML settings ...
    spark_context=sc,  # Databricks/Spark related
    X=X_train,
    y=y_train,
    path=project_folder)
```
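For context, an `AutoMLConfig` for running Automated ML on Databricks typically combines the Spark-related arguments with task and metric settings. The sketch below is illustrative only: the `task`, `primary_metric`, and `iterations` values are assumptions, not the notebook's exact settings, and `sc`, `X_train`, `y_train`, and `project_folder` are assumed to be defined earlier in the notebook.

```python
from azureml.train.automl import AutoMLConfig

# Illustrative configuration sketch -- task, metric, and iteration budget
# are assumed values; the Spark-related arguments mirror the notebook.
automl_config = AutoMLConfig(task='classification',          # assumed task type
                             primary_metric='AUC_weighted',  # assumed metric
                             iterations=10,                  # assumed budget
                             spark_context=sc,               # Databricks/Spark related
                             X=X_train,
                             y=y_train,
                             path=project_folder)
```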
```python
displayHTML("<a href={} target='_blank'>Your experiment in Azure Portal: {}</a>".format(local_run.get_portal_url(), local_run.id))
```
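Since `displayHTML` and the `local_run` object only exist inside a live Databricks run, here is a plain-Python sketch of the same link construction using placeholder values (the URL and run id below are hypothetical, not real outputs):

```python
# Hypothetical values standing in for local_run.get_portal_url() and
# local_run.id, which are only available during a real experiment run.
portal_url = "https://ml.azure.com/experiments/example/runs/AutoML_0001"
run_id = "AutoML_0001"

# Same string construction the notebook passes to Databricks' displayHTML().
link_html = "<a href={} target='_blank'>Your experiment in Azure Portal: {}</a>".format(
    portal_url, run_id)
print(link_html)
```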
When deploying an Automated ML trained model, please specify _pip_packages=['azureml-sdk[automl]']_ in your CondaDependencies.

Please refer to only the **Deploy** section in this notebook - <a href="https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment" target="_blank">Deployment of Automated ML trained model</a>
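As a minimal sketch of that dependency specification, assuming the `azureml-core` package of this SDK era (the `myenv.yml` file name is illustrative):

```python
from azureml.core.conda_dependencies import CondaDependencies

# Environment spec for deploying an Automated ML model: the [automl] extra
# pulls in the packages the scoring script needs at inference time.
myenv = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'])

# The resulting conda environment file is typically written out and passed
# to the image/deployment configuration (file name is illustrative).
with open('myenv.yml', 'w') as f:
    f.write(myenv.serialize_to_string())
```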