Commit df2e08e

Merge pull request Azure#622 from Azure/release_update/Release-25
update samples - test
2 parents: 1a373f1 + 828a976

File tree: 10 files changed (+27 −150 lines)

Binary file not shown.
how-to-use-azureml/azure-databricks/README.md (+14 −54)

````diff
@@ -1,73 +1,33 @@
 Azure Databricks is a managed Spark offering on Azure, and customers already use it for advanced analytics. It provides a collaborative notebook-based environment with CPU- or GPU-based compute clusters.

 In this section, you will find sample notebooks on how to use the Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use the Automated ML capability (**public preview**) of the Azure ML SDK with Azure Databricks.

 - Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning.
 - You can keep the data within the same cluster.
 - You can leverage the local worker nodes with autoscale and auto-termination capabilities.
 - You can use multiple cores of your Azure Databricks cluster to perform simultaneous training.
 - You can further tune the model generated by automated machine learning if you choose to.
 - Every run (including the best run) is available as a pipeline, which you can tune further if needed.
 - The model trained using Azure Databricks can be registered in an Azure ML workspace and then deployed to Azure-managed compute (ACI or AKS) using the Azure Machine Learning SDK.

 Please follow our [Azure doc](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#azure-databricks) to install the SDK in your Azure Databricks cluster before trying any of the sample notebooks.

 **Single file** -
 The following archive contains all the sample notebooks. Instead of downloading the notebooks individually, you can import the [DBC](Databricks_AMLSDK_1-4_6.dbc) into your Databricks workspace and run them from there.

 Notebooks 1-4 have to be run sequentially. They belong to an income-prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to prepare data, train, and operationalize a Spark ML model with the Azure ML Python SDK from within Azure Databricks.

 Notebook 6 is an Automated ML sample notebook for classification.

 Learn more about [how to use Azure Databricks as a development environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for Azure Machine Learning service.

-**Databricks as a Compute Target from Azure ML Pipelines**
+**Databricks as a Compute Target from AML Pipelines**
 You can use Azure Databricks as a compute target from [Azure Machine Learning Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). Take a look at this notebook for details: [aml-pipelines-use-databricks-as-compute-target.ipynb](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb).
-
-# Linked Azure Databricks and Azure Machine Learning Workspaces (Preview)
-Customers can now link Azure Databricks and Azure ML workspaces to better enable cross-Azure ML scenarios by [managing their tracking data in a single place when using the MLflow client](https://mlflow.org/docs/latest/tracking.html#mlflow-tracking) - the Azure ML workspace.
-
-## Linking the Workspaces (Admin operation)
-
-1. The Azure Databricks blade in the Azure portal now includes a button to link an Azure ML workspace.
-   ![New ADB Portal Link button](./img/adb-link-button.png)
-2. Either a new or an existing Azure ML workspace can be linked in the resulting prompt. Follow any instructions to set up the Azure ML workspace.
-   ![Link Prompt](./img/link-prompt.png)
-3. After a successful link operation, you should see the Azure Databricks overview reflect the linked status.
-   ![Linked Successfully](./img/adb-successful-link.png)
-
-## Configure MLflow to send data to Azure ML (All roles)
-
-1. Add azureml-mlflow as a library to any notebook or cluster that should send data to Azure ML. You can do this via:
-   1. [DBUtils](https://docs.azuredatabricks.net/user-guide/dev-tools/dbutils.html#dbutils-library)
-      ```
-      dbutils.library.installPyPI("azureml-mlflow")
-      dbutils.library.restartPython()  # Removes Python state
-      ```
-   2. [Cluster Libraries](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster)
-      ![Cluster Library](./img/cluster-library.png)
-2. [Set the MLflow tracking URI](https://mlflow.org/docs/latest/tracking.html#where-runs-are-recorded) to the following scheme:
-   ```
-   adbazureml://${azuremlRegion}.experiments.azureml.net/history/v1.0/subscriptions/${azuremlSubscriptionId}/resourceGroups/${azuremlResourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${azuremlWorkspaceName}
-   ```
-   Instead of setting the tracking URI manually in each notebook, you can configure it automatically on your clusters for all subsequent notebook sessions using this helper script:
-   * [AzureML Tracking Cluster Init Script](./linking/README.md)
-3. If configured correctly, you'll be able to see your MLflow tracking data in both Azure ML (via the REST API and all clients) and Azure Databricks (in the MLflow UI and using the MLflow client).
-
-## Known Preview Limitations
-While we roll this experience out to customers for feedback, there are some known limitations we'd love comments on, in addition to any other issues you see in your workflow.
-### 1-to-1 workspace linking
-Currently, an Azure ML workspace can only be linked to one Azure Databricks workspace at a time.
-### Data synchronization
-At the moment, data is only generated in the Azure Machine Learning workspace for tracking. Editing tags via the Azure Databricks MLflow UI won't be reflected in the Azure ML UI.
-### Java and R support
-The experience is currently only available from the Python MLflow client.

 For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).

 **Please let us know your feedback.**

 ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/README.png)
````
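The removed linking walkthrough boils down to a two-step notebook-side setup. For reference, here is a minimal sketch of those steps, assuming placeholder values for the region, subscription, resource group, and workspace; the `adbazureml://` scheme is the one quoted in the removed README text, and `dbutils` is the builtin available in Databricks notebooks.

```python
# Cell 1: install the azureml-mlflow bridge for this notebook session
# (or attach it as a cluster library instead).
dbutils.library.installPyPI("azureml-mlflow")
dbutils.library.restartPython()  # clears Python state, so keep this in its own cell
```

```python
# Cell 2: point the MLflow client at the linked Azure ML workspace.
import mlflow

tracking_uri = (
    "adbazureml://{region}.experiments.azureml.net/history/v1.0"
    "/subscriptions/{sub}/resourceGroups/{rg}"
    "/providers/Microsoft.MachineLearningServices/workspaces/{ws}"
).format(region="eastus2",          # placeholder region
         sub="<subscription-id>",   # placeholder subscription
         rg="<resource-group>",     # placeholder resource group
         ws="<workspace-name>")     # placeholder workspace
mlflow.set_tracking_uri(tracking_uri)

# Subsequent runs are now tracked in the linked Azure ML workspace.
with mlflow.start_run():
    mlflow.log_metric("sample_metric", 1.0)
```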
4 binary files not shown.

how-to-use-azureml/azure-databricks/linking/README.md (−56)

This file was deleted.

how-to-use-azureml/azure-databricks/linking/azureml-cluster-init.sh (−24)

This file was deleted.

how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb (+6 −5)

````diff
@@ -361,7 +361,7 @@
 "outputs": [],
 "source": [
 "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
-"                                 pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
+"                                 pip_packages=['azureml-monitoring', 'azureml-defaults'])\n",
 "\n",
 "with open(\"myenv.yml\",\"w\") as f:\n",
 "    f.write(myenv.serialize_to_string())"
@@ -626,7 +626,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"target_date = datetime.today()\n",
+"now = datetime.utcnow()\n",
+"target_date = datetime(now.year, now.month, now.day)\n",
 "run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
 ]
 },
@@ -655,7 +656,7 @@
 "source": [
 "child_run.wait_for_completion(wait_post_processing=True)\n",
 "\n",
-"drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
+"drift_metrics = datadrift.get_output(run_id=run.id)\n",
 "drift_metrics"
 ]
 },
@@ -668,7 +669,7 @@
 "# Show all drift figures, one per service.\n",
 "# If setting with_details is False (by default), only drift will be shown; if it's True, all details will be shown.\n",
 "\n",
-"drift_figures = datadrift.show(with_details=True)"
+"drift_figures = datadrift.show()"
 ]
 },
 {
@@ -691,7 +692,7 @@
 "metadata": {
 "authors": [
 {
-"name": "rafarmah"
+"name": "dmdatadrift"
 }
 ],
 "kernelspec": {
````
how-to-use-azureml/monitor-models/data-drift/score.py (+7 −11)

````diff
@@ -1,14 +1,10 @@
-import pickle
 import json
-import numpy
-import azureml.train.automl
-from sklearn.externals import joblib
-from sklearn.linear_model import Ridge
-from azureml.core.model import Model
-from azureml.core.run import Run
-from azureml.monitoring import ModelDataCollector
 import time
+
 import pandas as pd
+from azureml.core.model import Model
+from azureml.monitoring import ModelDataCollector
+from sklearn.externals import joblib


 def init():
@@ -25,11 +21,11 @@ def init():
     categorical_features = ["usaf", "wban", "p_k", "station_name"]

     inputs_dc = ModelDataCollector(model_name="driftmodel",
-                                   identifier="inputs",
+                                   designation="inputs",
                                    feature_names=feature_names)

-    prediction_dc = ModelDataCollector("driftmodel",
-                                       identifier="predictions",
+    prediction_dc = ModelDataCollector(model_name="driftmodel",
+                                       designation="predictions",
                                        feature_names=["temperature"])
````