
Commit d09942f — Merge pull request #165 from rastala/master: databricks update
2 parents e2640e5 + 0c9e527

File tree: 4 files changed, +742 −71 lines changed
Binary file not shown.

how-to-use-azureml/azure-databricks/README.md

Lines changed: 11 additions & 10 deletions
@@ -1,20 +1,21 @@
  Azure Databricks is a managed Spark offering on Azure and customers already use it for advanced analytics. It provides a collaborative Notebook-based environment with CPU- or GPU-based compute clusters.

- In this section, you will see sample notebooks on how to use Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use Automated ML capability (**public preview**) of Azure ML SDK with Azure Databricks.
+ In this section, you will find sample notebooks on how to use the Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use the Automated ML capability (**public preview**) of the Azure ML SDK with Azure Databricks.

  - Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning.
  - You can keep the data within the same cluster.
  - You can leverage the local worker nodes with autoscale and auto-termination capabilities.
  - You can use multiple cores of your Azure Databricks cluster to perform simultaneous training.
  - You can further tune the model generated by automated machine learning if you choose to.
- - Every run (including the best run) is available as a pipeline.
+ - Every run (including the best run) is available as a pipeline, which you can tune further if needed.
  - The model trained using Azure Databricks can be registered in the Azure ML workspace and then deployed to Azure managed compute (ACI or AKS) using the Azure Machine Learning SDK.
+
  **Create Azure Databricks Cluster:**

  Select New Cluster and fill in the following details:
  - Cluster name: _yourclustername_
- - Databricks Runtime: Any 4.x runtime.
+ - Databricks Runtime: Any **non-ML** runtime (non-ML 4.x or 5.x)
  - Python version: **3**
  - Workers: 2 or higher.
@@ -46,25 +47,25 @@ It will take few minutes to create the cluster. Please ensure that the cluster s
  - Click Install Library

- - Do not select _Attach automatically to all clusters_. In case you have selected earlier then you can go to your Home folder and deselect it.
+ - Do not select _Attach automatically to all clusters_. In case you selected this earlier, please go to your Home folder and deselect it.

  - Select the check box _Attach_ next to your cluster name

  (More details on how to attach and detach libs are here - [https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster](https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster) )

  - Ensure that there are no errors until Status changes to _Attached_. It may take a couple of minutes.

- **Note** - If you have the old build the please deselect it from cluster’s installed libs > move to trash. Install the new build and restart the cluster. And if still there is an issue then detach and reattach your cluster.
+ **Note** - If you have an old SDK version, please deselect it from the cluster’s installed libs > move to trash. Install the new SDK version and restart the cluster. If there is still an issue after this, please detach and reattach your cluster.

- iPython Notebooks 1-4 have to be run sequentially after making changes based on your subscription. The corresponding DBC archive contains all the notebooks and can be imported into your Databricks workspace. You can the run notebooks after importing [databricks_amlsdk](Databricks_AMLSDK_1-4_6.dbc) instead of downloading individually.
+ **Single file** -
+ The following archive contains all the sample notebooks. You can then run the notebooks after importing [DBC](Databricks_AMLSDK_1-4_6.dbc) in your Databricks workspace instead of downloading them individually.

- Notebooks 1-4 are related to Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to data prep, train and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks. Notebook 6 is an Automated ML sample notebook.
+ Notebooks 1-4 have to be run sequentially and are related to an Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult); they demonstrate how to prep data, train, and operationalize a Spark ML model with the Azure ML Python SDK from within Azure Databricks.

- For details on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).
+ Notebook 6 is an Automated ML sample notebook for Classification.

  Learn more about [how to use Azure Databricks as a development environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for Azure Machine Learning service.

- You can also use Azure Databricks as a compute target for [training models with an Azure Machine Learning pipeline](https://docs.microsoft.com/machine-learning/service/how-to-set-up-training-targets#databricks).
+ For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).

  **Please let us know your feedback.**
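After attaching the library, a quick way to confirm the SDK is actually available on the cluster is to check its version from a notebook cell. A minimal sketch (the `sdk_available` helper is ours, not part of the SDK; it just avoids an ImportError when the library is not yet attached):

```python
import importlib.util

def sdk_available(package: str = "azureml.core") -> bool:
    """Return True if the given package can be imported on this cluster."""
    try:
        return importlib.util.find_spec(package) is not None
    except ModuleNotFoundError:
        # Parent package missing entirely, e.g. azureml not installed.
        return False

if sdk_available():
    import azureml.core
    print("SDK version:", azureml.core.VERSION)
else:
    print("azureml-sdk not found: attach the library and restart the cluster.")
```

If this prints the "not found" message after the library status shows _Attached_, detaching and reattaching the cluster (per the Note above) is the usual fix.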

how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.ipynb

Lines changed: 27 additions & 61 deletions
@@ -13,45 +13,31 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "We support installing AML SDK as library from GUI. When attaching a library follow this https://docs.databricks.com/user-guide/libraries.html and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
+ "# Automated ML on Azure Databricks\n",
  "\n",
- "**install azureml-sdk with Automated ML**\n",
- "* Source: Upload Python Egg or PyPi\n",
- "* PyPi Name: `azureml-sdk[automl_databricks]`\n",
- "* Select Install Library"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# AutoML : Classification with Local Compute on Azure DataBricks\n",
- "\n",
- "In this example we use the scikit-learn's [digit dataset](http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset) to showcase how you can use AutoML for a simple classification problem.\n",
+ "In this example we use the scikit-learn's <a href=\"http://scikit-learn.org/stable/datasets/index.html#optical-recognition-of-handwritten-digits-dataset\" target=\"_blank\">digit dataset</a> to showcase how you can use AutoML for a simple classification problem.\n",
  "\n",
  "In this notebook you will learn how to:\n",
  "1. Create Azure Machine Learning Workspace object and initialize your notebook directory to easily reload this object from a configuration file.\n",
  "2. Create an `Experiment` in an existing `Workspace`.\n",
- "3. Configure AutoML using `AutoMLConfig`.\n",
- "4. Train the model using AzureDataBricks.\n",
+ "3. Configure Automated ML using `AutoMLConfig`.\n",
+ "4. Train the model using Azure Databricks.\n",
  "5. Explore the results.\n",
  "6. Test the best fitted model.\n",
  "\n",
- "Prerequisites:\n",
- "Before running this notebook, please follow the readme for installing necessary libraries to your cluster."
+ "Before running this notebook, please follow the <a href=\"https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks\" target=\"_blank\">readme for using Automated ML on Azure Databricks</a> for installing necessary libraries to your cluster."
  ]
  },
  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Register Machine Learning Services Resource Provider\n",
- "Microsoft.MachineLearningServices only needs to be registed once in the subscription. To register it:\n",
- "Start the Azure portal.\n",
- "Select your All services and then Subscription.\n",
- "Select the subscription that you want to use.\n",
- "Click on Resource providers\n",
- "Click the Register link next to Microsoft.MachineLearningServices"
+ "We support installing AML SDK with Automated ML as library from GUI. When attaching a library follow <a href=\"https://docs.databricks.com/user-guide/libraries.html\" target=\"_blank\">this link</a> and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
+ "\n",
+ "**azureml-sdk with automated ml**\n",
+ "* Source: Upload Python Egg or PyPi\n",
+ "* PyPi Name: `azureml-sdk[automl_databricks]`\n",
+ "* Select Install Library"
  ]
  },
  {
@@ -97,10 +83,10 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "subscription_id = \"<Your SubscriptionId>\"\n",
- "resource_group = \"<Resource group - new or existing>\"\n",
- "workspace_name = \"<workspace to be created>\"\n",
- "workspace_region = \"<azureregion>\" #eg. eastus2, westcentralus, westeurope"
+ "subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
+ "resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
+ "workspace_name = \"<workspace to be created>\" #your workspace name\n",
+ "workspace_region = \"<azureregion>\" #your region"
  ]
  },
  {
@@ -132,8 +118,7 @@
  "ws = Workspace.create(name = workspace_name,\n",
  " subscription_id = subscription_id,\n",
  " resource_group = resource_group, \n",
- " location = workspace_region,\n",
- " auth = auth,\n",
+ " location = workspace_region, \n",
  " exist_ok=True)\n",
  "ws.get_details()"
  ]
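Reassembled from the cell above, the workspace bootstrap amounts to passing the four placeholder values to `Workspace.create` (a sketch; the placeholders must be replaced with your own values, and the SDK call is commented out because it requires an Azure login from the cluster):

```python
# Placeholder values from the notebook; per the comments in the cell above,
# you should be owner or contributor on the subscription and resource group.
subscription_id = "<Your SubscriptionId>"
resource_group = "<Resource group - new or existing>"
workspace_name = "<workspace to be created>"
workspace_region = "<azureregion>"  # e.g. eastus2, westcentralus, westeurope

config = dict(
    name=workspace_name,
    subscription_id=subscription_id,
    resource_group=resource_group,
    location=workspace_region,
    exist_ok=True,  # reuse the workspace if it already exists
)

# On the cluster, this creates (or fetches) the workspace:
# from azureml.core import Workspace
# ws = Workspace.create(**config)
# ws.get_details()
```

Note the diff removes the earlier `auth = auth` argument, so the call relies on interactive authentication by default.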
@@ -143,21 +128,7 @@
  "execution_count": null,
  "metadata": {},
  "outputs": [],
- "source": [
- "from azureml.core import Workspace\n",
- "import azureml.core\n",
- "\n",
- "# Check core SDK version number\n",
- "print(\"SDK version:\", azureml.core.VERSION)\n",
- "\n",
- "#'''\n",
- "ws = Workspace.from_config()\n",
- "print('Workspace name: ' + ws.name, \n",
- " 'Azure region: ' + ws.location, \n",
- " 'Subscription id: ' + ws.subscription_id, \n",
- " 'Resource group: ' + ws.resource_group, sep = '\\n')\n",
- "#'''"
- ]
+ "source": []
  },
  {
  "cell_type": "markdown",
@@ -213,7 +184,7 @@
  "source": [
  "## Create an Experiment\n",
  "\n",
- "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+ "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
  ]
  },
  {
@@ -239,15 +210,6 @@
  "from azureml.train.automl.run import AutoMLRun"
  ]
  },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "ws = Workspace.from_config(auth = auth)"
- ]
- },
  {
  "cell_type": "code",
  "execution_count": null,
@@ -304,6 +266,9 @@
  "metadata": {},
  "outputs": [],
  "source": [
+ "# Automated ML requires a dataflow, which is different from a dataframe.\n",
+ "# If your data is in a dataframe, please use read_pandas_dataframe to convert the dataframe to a dataflow before using dprep.\n",
+ "\n",
  "import azureml.dataprep as dprep\n",
  "# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
  "# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
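As the added comments note, Automated ML consumes a dataflow rather than a pandas dataframe. A minimal sketch of the conversion step (the `azureml.dataprep` calls are commented out since the package is only available on the configured cluster; `read_pandas_dataframe` is named in the cell above):

```python
# Stand-in training data as plain columns; in the notebook this would be a
# pandas dataframe built from sklearn.datasets.load_digits().
data = {"feature": [0.1, 0.2, 0.3], "label": [0, 1, 0]}

# On the cluster, convert a dataframe to a dataflow before handing it to
# Automated ML:
# import pandas as pd
# import azureml.dataprep as dprep
# dflow = dprep.read_pandas_dataframe(pd.DataFrame(data))
```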
@@ -375,7 +340,6 @@
  " spark_context=sc, #databricks/spark related\n",
  " X = X_train, \n",
  " y = y_train,\n",
- " enable_cache=False,\n",
  " path = project_folder)"
  ]
  },
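The cell above passes the Databricks `spark_context` into `AutoMLConfig` alongside the training data. A sketch of how such a configuration is assembled (the `task`, `iterations`, and `primary_metric` values here are illustrative assumptions, not taken from the diff; the SDK call is commented out since it needs `azureml-sdk[automl_databricks]` attached):

```python
# Illustrative Automated ML settings; only spark_context, X, y, and path
# appear in the notebook cell above, the rest are assumed for the sketch.
automl_settings = {
    "task": "classification",
    "iterations": 10,
    "primary_metric": "AUC_weighted",
}

# On the cluster:
# from azureml.train.automl import AutoMLConfig
# automl_config = AutoMLConfig(spark_context=sc,  # databricks/spark related
#                              X=X_train,
#                              y=y_train,
#                              path=project_folder,
#                              **automl_settings)
```

Passing `spark_context=sc` is what lets Automated ML fan iterations out across the cluster's worker cores, per the README bullet on simultaneous training.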
@@ -420,7 +384,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "print(local_run.get_portal_url())"
+ "displayHTML(\"<a href={} target='_blank'>Your experiment in Azure Portal: {}</a>\".format(local_run.get_portal_url(), local_run.id))"
  ]
  },
  {
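The replaced cell swaps a plain `print` for a clickable link rendered with Databricks' `displayHTML`. The string assembly can be sketched with stand-in values (the URL and run id below are placeholders for what `local_run.get_portal_url()` and `local_run.id` return):

```python
# Stand-ins for the values a real run provides.
portal_url = "https://ml.azure.com/runs/example"   # local_run.get_portal_url()
run_id = "AutoML_example-run-id"                   # local_run.id

# Same format string as the notebook cell.
link_html = "<a href={} target='_blank'>Your experiment in Azure Portal: {}</a>".format(
    portal_url, run_id)

# displayHTML(link_html)  # displayHTML is provided by the Databricks runtime
```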
@@ -548,7 +512,9 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "When deploying an automated ML trained model, please specify _pip_packages=['azureml-sdk[automl]']_ in your CondaDependencies."
+ "When deploying an automated ML trained model, please specify _pip_packages=['azureml-sdk[automl]']_ in your CondaDependencies.\n",
+ "\n",
+ "Please refer to only the **Deploy** section in this notebook - <a href=\"https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment\" target=\"_blank\">Deployment of Automated ML trained model</a>"
  ]
  },
  {
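The deployment note above can be sketched as follows: the pip package list must include `azureml-sdk[automl]` so the scoring image can load the automated-ML model (the `CondaDependencies` call and file name are the usual SDK pattern, commented out since the SDK is only on the cluster):

```python
# Pip dependencies for the deployment environment of an automated-ML model.
pip_packages = ["azureml-sdk[automl]"]

# On the cluster:
# from azureml.core.conda_dependencies import CondaDependencies
# myenv = CondaDependencies.create(pip_packages=pip_packages)
# with open("myenv.yml", "w") as f:
#     f.write(myenv.serialize_to_string())
```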
@@ -586,8 +552,8 @@
  "version": "3.7.0"
  },
  "name": "auto-ml-classification-local-adb",
- "notebookId": 3836944406456411
+ "notebookId": 817220787969977
  },
  "nbformat": 4,
- "nbformat_minor": 1
+ "nbformat_minor": 0
  }
