
Commit d09942f — Merge pull request #165 from rastala/master: databricks update
2 parents e2640e5 + 0c9e527

File tree: 4 files changed, +742 −71 lines changed
Binary file not shown.

how-to-use-azureml/azure-databricks/README.md

Lines changed: 11 additions & 10 deletions
@@ -1,20 +1,21 @@
  Azure Databricks is a managed Spark offering on Azure and customers already use it for advanced analytics. It provides a collaborative Notebook-based environment with CPU- or GPU-based compute clusters.

- In this section, you will see sample notebooks on how to use Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use Automated ML capability (**public preview**) of Azure ML SDK with Azure Databricks.
+ In this section, you will find sample notebooks on how to use the Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use the Automated ML capability (**public preview**) of the Azure ML SDK with Azure Databricks.

  - Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning.
  - You can keep the data within the same cluster.
  - You can leverage the local worker nodes with autoscale and auto-termination capabilities.
  - You can use multiple cores of your Azure Databricks cluster to perform simultaneous training.
  - You can further tune the model generated by automated machine learning if you choose to.
- - Every run (including the best run) is available as a pipeline.
+ - Every run (including the best run) is available as a pipeline, which you can tune further if needed.
  - The model trained using Azure Databricks can be registered in the Azure ML workspace and then deployed to Azure managed compute (ACI or AKS) using the Azure Machine Learning SDK.
+
  **Create Azure Databricks Cluster:**

  Select New Cluster and fill in the following details:
  - Cluster name: _yourclustername_
- - Databricks Runtime: Any 4.x runtime.
+ - Databricks Runtime: Any **non-ML** runtime (non-ML 4.x or 5.x)
  - Python version: **3**
  - Workers: 2 or higher.
@@ -46,25 +47,25 @@ It will take few minutes to create the cluster. Please ensure that the cluster s
  - Click Install Library

- - Do not select _Attach automatically to all clusters_. In case you have selected earlier then you can go to your Home folder and deselect it.
+ - Do not select _Attach automatically to all clusters_. In case you selected this earlier, please go to your Home folder and deselect it.

  - Select the check box _Attach_ next to your cluster name

  (More details on how to attach and detach libs are here - [https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster](https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster) )

  - Ensure that there are no errors until Status changes to _Attached_. It may take a couple of minutes.

- **Note** - If you have the old build the please deselect it from cluster’s installed libs > move to trash. Install the new build and restart the cluster. And if still there is an issue then detach and reattach your cluster.
+ **Note** - If you have an old SDK version, please deselect it from the cluster’s installed libs > move to trash. Install the new SDK version and restart the cluster. If there is still an issue after this, please detach and reattach your cluster.

- iPython Notebooks 1-4 have to be run sequentially after making changes based on your subscription. The corresponding DBC archive contains all the notebooks and can be imported into your Databricks workspace. You can the run notebooks after importing [databricks_amlsdk](Databricks_AMLSDK_1-4_6.dbc) instead of downloading individually.
+ **Single file** -
+ The following archive contains all the sample notebooks. You can then run the notebooks after importing [DBC](Databricks_AMLSDK_1-4_6.dbc) in your Databricks workspace instead of downloading them individually.

- Notebooks 1-4 are related to Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to data prep, train and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks. Notebook 6 is an Automated ML sample notebook.
+ Notebooks 1-4 have to be run sequentially and are related to an Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult); they demonstrate how to prep data, train, and operationalize a Spark ML model with the Azure ML Python SDK from within Azure Databricks.

- For details on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).
+ Notebook 6 is an Automated ML sample notebook for Classification.

  Learn more about [how to use Azure Databricks as a development environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for Azure Machine Learning service.

- You can also use Azure Databricks as a compute target for [training models with an Azure Machine Learning pipeline](https://docs.microsoft.com/machine-learning/service/how-to-set-up-training-targets#databricks).
+ For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).

  **Please let us know your feedback.**
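After attaching the library, a quick way to confirm the SDK is actually available on the cluster is to check its version from a notebook cell. A minimal sketch (the `sdk_available` helper is ours, not part of the SDK; it just avoids an ImportError when the library is not yet attached):

```python
import importlib.util

def sdk_available(package: str = "azureml.core") -> bool:
    """Return True if the given package can be imported on this cluster."""
    try:
        return importlib.util.find_spec(package) is not None
    except ModuleNotFoundError:
        # Parent package missing entirely, e.g. azureml not installed.
        return False

if sdk_available():
    import azureml.core
    print("SDK version:", azureml.core.VERSION)
else:
    print("azureml-sdk not found: attach the library and restart the cluster.")
```

If this prints the "not found" message after the library status shows _Attached_, detaching and reattaching the cluster (per the Note above) is the usual fix.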

how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.ipynb

Lines changed: 27 additions & 61 deletions
@@ -13,45 +13,31 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "We support installing AML SDK as library from GUI. When attaching a library follow this https://docs.databricks.com/user-guide/libraries.html and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
+ "# Automated ML on Azure Databricks\n",
  "\n",
- "**install azureml-sdk with Automated ML**\n",
- "* Source: Upload Python Egg or PyPi\n",
- "* PyPi Name: `azureml-sdk[automl_databricks]`\n",
- "* Select Install Library"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# AutoML : Classification with Local Compute on Azure DataBricks\n",
- "\n",
- "In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
+ "In this example we use the scikit-learn's <a href=\"http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset\" target=\"_blank\">digit dataset</a> to showcase how you can use AutoML for a simple classification problem.\n",
  "\n",
  "In this notebook you will learn how to:\n",
  "1. Create Azure Machine Learning Workspace object and initialize your notebook directory to easily reload this object from a configuration file.\n",
  "2. Create an `Experiment` in an existing `Workspace`.\n",
- "3. Configure AutoML using `AutoMLConfig`.\n",
- "4. Train the model using AzureDataBricks.\n",
+ "3. Configure Automated ML using `AutoMLConfig`.\n",
+ "4. Train the model using Azure Databricks.\n",
  "5. Explore the results.\n",
  "6. Test the best fitted model.\n",
  "\n",
- "Prerequisites:\n",
- "Before running this notebook, please follow the readme for installing necessary libraries to your cluster."
+ "Before running this notebook, please follow the <a href=\"https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks\" target=\"_blank\">readme for using Automated ML on Azure Databricks</a> for installing necessary libraries to your cluster."
  ]
  },
  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Register Machine Learning Services Resource Provider\n",
- "Microsoft.MachineLearningServices only needs to be registed once in the subscription. To register it:\n",
- "Start the Azure portal.\n",
- "Select your All services and then Subscription.\n",
- "Select the subscription that you want to use.\n",
- "Click on Resource providers\n",
- "Click the Register link next to Microsoft.MachineLearningServices"
+ "We support installing AML SDK with Automated ML as library from GUI. When attaching a library follow <a href=\"https://docs.databricks.com/user-guide/libraries.html\" target=\"_blank\">this link</a> and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
+ "\n",
+ "**azureml-sdk with automated ml**\n",
+ "* Source: Upload Python Egg or PyPi\n",
+ "* PyPi Name: `azureml-sdk[automl_databricks]`\n",
+ "* Select Install Library"
  ]
  },
  {
@@ -97,10 +83,10 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "subscription_id = \"<Your SubscriptionId>\"\n",
- "resource_group = \"<Resource group - new or existing>\"\n",
- "workspace_name = \"<workspace to be created>\"\n",
- "workspace_region = \"<azureregion>\" #eg. eastus2, westcentralus, westeurope"
+ "subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
+ "resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
+ "workspace_name = \"<workspace to be created>\" #your workspace name\n",
+ "workspace_region = \"<azureregion>\" #your region"
  ]
  },
  {
@@ -132,8 +118,7 @@
  "ws = Workspace.create(name = workspace_name,\n",
  " subscription_id = subscription_id,\n",
  " resource_group = resource_group, \n",
- " location = workspace_region,\n",
- " auth = auth,\n",
+ " location = workspace_region, \n",
  " exist_ok=True)\n",
  "ws.get_details()"
  ]
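Reassembled from the cell above, the workspace bootstrap amounts to passing the four placeholder values to `Workspace.create` (a sketch; the placeholders must be replaced with your own values, and the SDK call is commented out because it requires an Azure login from the cluster):

```python
# Placeholder values from the notebook; per the comments in the cell above,
# you should be owner or contributor on the subscription and resource group.
subscription_id = "<Your SubscriptionId>"
resource_group = "<Resource group - new or existing>"
workspace_name = "<workspace to be created>"
workspace_region = "<azureregion>"  # e.g. eastus2, westcentralus, westeurope

config = dict(
    name=workspace_name,
    subscription_id=subscription_id,
    resource_group=resource_group,
    location=workspace_region,
    exist_ok=True,  # reuse the workspace if it already exists
)

# On the cluster, this creates (or fetches) the workspace:
# from azureml.core import Workspace
# ws = Workspace.create(**config)
# ws.get_details()
```

Note the diff removes the earlier `auth = auth` argument, so the call relies on interactive authentication by default.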
@@ -143,21 +128,7 @@
  "execution_count": null,
  "metadata": {},
  "outputs": [],
- "source": [
- "from azureml.core import Workspace\n",
- "import azureml.core\n",
- "\n",
- "# Check core SDK version number\n",
- "print(\"SDK version:\", azureml.core.VERSION)\n",
- "\n",
- "#'''\n",
- "ws = Workspace.from_config()\n",
- "print('Workspace name: ' + ws.name, \n",
- " 'Azure region: ' + ws.location, \n",
- " 'Subscription id: ' + ws.subscription_id, \n",
- " 'Resource group: ' + ws.resource_group, sep = '\\n')\n",
- "#'''"
- ]
+ "source": []
  },
  {
  "cell_type": "markdown",
@@ -213,7 +184,7 @@
  "source": [
  "## Create an Experiment\n",
  "\n",
- "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+ "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
  ]
  },
  {
@@ -239,15 +210,6 @@
  "from azureml.train.automl.run import AutoMLRun"
  ]
  },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "ws = Workspace.from_config(auth = auth)"
- ]
- },
  {
  "cell_type": "code",
  "execution_count": null,
@@ -304,6 +266,9 @@
  "metadata": {},
  "outputs": [],
  "source": [
+ "# Automated ML requires a dataflow, which is different from a dataframe.\n",
+ "# If your data is in a dataframe, please use read_pandas_dataframe to convert the dataframe to a dataflow before using dprep.\n",
+ "\n",
  "import azureml.dataprep as dprep\n",
  "# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
  "# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
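As the added comments note, Automated ML consumes a dataflow rather than a pandas dataframe. A minimal sketch of the conversion step (the `azureml.dataprep` calls are commented out since the package is only available on the configured cluster; `read_pandas_dataframe` is named in the cell above):

```python
# Stand-in training data as plain columns; in the notebook this would be a
# pandas dataframe built from sklearn.datasets.load_digits().
data = {"feature": [0.1, 0.2, 0.3], "label": [0, 1, 0]}

# On the cluster, convert a dataframe to a dataflow before handing it to
# Automated ML:
# import pandas as pd
# import azureml.dataprep as dprep
# dflow = dprep.read_pandas_dataframe(pd.DataFrame(data))
```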
@@ -375,7 +340,6 @@
  " spark_context=sc, #databricks/spark related\n",
  " X = X_train, \n",
  " y = y_train,\n",
- " enable_cache=False,\n",
  " path = project_folder)"
  ]
  },
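The cell above passes the Databricks `spark_context` into `AutoMLConfig` alongside the training data. A sketch of how such a configuration is assembled (the `task`, `iterations`, and `primary_metric` values here are illustrative assumptions, not taken from the diff; the SDK call is commented out since it needs `azureml-sdk[automl_databricks]` attached):

```python
# Illustrative Automated ML settings; only spark_context, X, y, and path
# appear in the notebook cell above, the rest are assumed for the sketch.
automl_settings = {
    "task": "classification",
    "iterations": 10,
    "primary_metric": "AUC_weighted",
}

# On the cluster:
# from azureml.train.automl import AutoMLConfig
# automl_config = AutoMLConfig(spark_context=sc,  # databricks/spark related
#                              X=X_train,
#                              y=y_train,
#                              path=project_folder,
#                              **automl_settings)
```

Passing `spark_context=sc` is what lets Automated ML fan iterations out across the cluster's worker cores, per the README bullet on simultaneous training.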
@@ -420,7 +384,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "print(local_run.get_portal_url())"
+ "displayHTML(\"<a href={} target='_blank'>Your experiment in Azure Portal: {}</a>\".format(local_run.get_portal_url(), local_run.id))"
  ]
  },
  {
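The replaced cell swaps a plain `print` for a clickable link rendered with Databricks' `displayHTML`. The string assembly can be sketched with stand-in values (the URL and run id below are placeholders for what `local_run.get_portal_url()` and `local_run.id` return):

```python
# Stand-ins for the values a real run provides.
portal_url = "https://ml.azure.com/runs/example"   # local_run.get_portal_url()
run_id = "AutoML_example-run-id"                   # local_run.id

# Same format string as the notebook cell.
link_html = "<a href={} target='_blank'>Your experiment in Azure Portal: {}</a>".format(
    portal_url, run_id)

# displayHTML(link_html)  # displayHTML is provided by the Databricks runtime
```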
@@ -548,7 +512,9 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "When deploying an automated ML trained model, please specify _pip_packages=['azureml-sdk[automl]']_ in your CondaDependencies."
+ "When deploying an automated ML trained model, please specify _pip_packages=['azureml-sdk[automl]']_ in your CondaDependencies.\n",
+ "\n",
+ "Please refer to only the **Deploy** section in this notebook - <a href=\"https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment\" target=\"_blank\">Deployment of Automated ML trained model</a>"
  ]
  },
  {
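The deployment note above can be sketched as follows: the pip package list must include `azureml-sdk[automl]` so the scoring image can load the automated-ML model (the `CondaDependencies` call and file name are the usual SDK pattern, commented out since the SDK is only on the cluster):

```python
# Pip dependencies for the deployment environment of an automated-ML model.
pip_packages = ["azureml-sdk[automl]"]

# On the cluster:
# from azureml.core.conda_dependencies import CondaDependencies
# myenv = CondaDependencies.create(pip_packages=pip_packages)
# with open("myenv.yml", "w") as f:
#     f.write(myenv.serialize_to_string())
```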
@@ -586,8 +552,8 @@
  "version": "3.7.0"
  },
  "name": "auto-ml-classification-local-adb",
- "notebookId": 3836944406456411
+ "notebookId": 817220787969977
  },
  "nbformat": 4,
- "nbformat_minor": 1
+ "nbformat_minor": 0
  }
