Commit 60158bf

Merge pull request Azure#341 from rastala/master
version 1.0.33
2 parents 6e6b2b0 + 8dbbb01 commit 60158bf

36 files changed

+3211
-211
lines changed

NBSETUP.md

+3-14
@@ -1,6 +1,4 @@
1-
# Setting up environment
2-
3-
---
1+
# Set up your notebook environment for Azure Machine Learning
42

53
To run the notebooks in this repository, use one of the following options.
64

@@ -12,9 +10,7 @@ Azure Notebooks is a hosted Jupyter-based notebook service in the Azure cloud. A
1210
1. Follow the instructions in the [Configuration](configuration.ipynb) notebook to create and connect to a workspace
1311
1. Open one of the sample notebooks
1412

15-
**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook
16-
17-
![set kernel to Python 3.6](images/python36.png)
13+
**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook by choosing Kernel > Change Kernel > Python 3.6 from the menus.
1814

1915
## **Option 2: Use your own notebook server**
2016

@@ -31,9 +27,6 @@ git clone https://github.com/Azure/MachineLearningNotebooks.git
3127
# install the base SDK and a Jupyter notebook server
3228
pip install azureml-sdk[notebooks]
3329

34-
# install the data prep component
35-
pip install azureml-dataprep
36-
3730
# install model explainability component
3831
pip install azureml-sdk[explain]
3932

@@ -58,8 +51,7 @@ Please make sure you start with the [Configuration](configuration.ipynb) noteboo
5851

5952
### Video walkthrough:
6053

61-
[![Get Started video](images/yt_cover.png)](https://youtu.be/VIsXeTuW3FU)
62-
54+
[!VIDEO https://youtu.be/VIsXeTuW3FU]
6355

6456
## **Option 3: Use Docker**
6557

@@ -90,9 +82,6 @@ Now you can point your browser to http://localhost:8887. We recommend that you s
9082
If you need additional Azure ML SDK components, you can either modify the Docker files before you build the Docker images to add additional steps, or install them through the command line in the live container after you build the Docker image. For example:
9183

9284
```sh
93-
# install dataprep components
94-
pip install azureml-dataprep
95-
9685
# install the core SDK and automated ml components
9786
pip install azureml-sdk[automl]
9887

README.md

+1-3
@@ -53,6 +53,4 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
5353
Visit following repos to see projects contributed by Azure ML users:
5454

5555
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
56-
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
57-
58-
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)
56+
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)

how-to-use-azureml/automated-machine-learning/README.md

+33-31
@@ -1,8 +1,8 @@
11
# Table of Contents
22
1. [Automated ML Introduction](#introduction)
3-
1. [Running samples in Azure Notebooks](#jupyter)
4-
1. [Running samples in Azure Databricks](#databricks)
5-
1. [Running samples in a Local Conda environment](#localconda)
3+
1. [Setup using Azure Notebooks](#jupyter)
4+
1. [Setup using Azure Databricks](#databricks)
5+
1. [Setup using a Local Conda environment](#localconda)
66
1. [Automated ML SDK Sample Notebooks](#samples)
77
1. [Documentation](#documentation)
88
1. [Running using python command](#pythoncommand)
@@ -13,23 +13,23 @@
1313
Automated machine learning (automated ML) builds high quality machine learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, and automated ML will give you a high quality machine learning model that you can use for predictions.
1414

1515

16-
If you are new to Data Science, AutoML will help you get jumpstarted by simplifying machine learning model building. It abstracts you from needing to perform model selection, hyperparameter selection and in one step creates a high quality trained model for you to use.
16+
If you are new to Data Science, automated ML will help you get jumpstarted by simplifying machine learning model building. It abstracts you from needing to perform model selection, hyperparameter selection and in one step creates a high quality trained model for you to use.
1717

18-
If you are an experienced data scientist, AutoML will help increase your productivity by intelligently performing the model and hyperparameter selection for your training and generates high quality models much quicker than manually specifying several combinations of the parameters and running training jobs. AutoML provides visibility and access to all the training jobs and the performance characteristics of the models to help you further tune the pipeline if you desire.
18+
If you are an experienced data scientist, automated ML will help increase your productivity by intelligently performing the model and hyperparameter selection for your training and generates high quality models much quicker than manually specifying several combinations of the parameters and running training jobs. Automated ML provides visibility and access to all the training jobs and the performance characteristics of the models to help you further tune the pipeline if you desire.
1919

20-
Below are the three execution environments supported by AutoML.
20+
Below are the three execution environments supported by automated ML.
2121

2222

2323
<a name="jupyter"></a>
24-
## Running samples in Azure Notebooks - Jupyter based notebooks in the Azure cloud
24+
## Setup using Azure Notebooks - Jupyter based notebooks in the Azure cloud
2525

2626
1. [![Azure Notebooks](https://notebooks.azure.com/launch.png)](https://aka.ms/aml-clone-azure-notebooks)
2727
[Import sample notebooks ](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks.
2828
1. Follow the instructions in the [configuration](../../configuration.ipynb) notebook to create and connect to a workspace.
2929
1. Open one of the sample notebooks.
3030

3131
<a name="databricks"></a>
32-
## Running samples in Azure Databricks
32+
## Setup using Azure Databricks
3333

3434
**NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown).
3535
**NOTE**: You should have at least contributor access to your Azure subscription to run the notebook.
@@ -39,7 +39,7 @@ Below are the three execution environments supported by AutoML.
3939
- Attach the notebook to the cluster.
4040

4141
<a name="localconda"></a>
42-
## Running samples in a Local Conda environment
42+
## Setup using a Local Conda environment
4343

4444
To run these notebooks on your own notebook server, use these installation instructions.
4545
The instructions below will install everything you need and then start a Jupyter notebook.
@@ -49,11 +49,15 @@ The instructions below will install everything you need and then start a Jupyter
4949
There's no need to install mini-conda specifically.
5050

5151
### 2. Downloading the sample notebooks
52-
- Download the sample notebooks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) as zip and extract the contents to a local directory. The AutoML sample notebooks are in the "automl" folder.
52+
- Download the sample notebooks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) as zip and extract the contents to a local directory. The automated ML sample notebooks are in the "automated-machine-learning" folder.
5353

5454
### 3. Setup a new conda environment
55-
The **automl/automl_setup** script creates a new conda environment, installs the necessary packages, configures the widget and starts a jupyter notebook.
56-
It takes the conda environment name as an optional parameter. The default conda environment name is azure_automl. The exact command depends on the operating system. See the specific sections below for Windows, Mac and Linux. It can take about 10 minutes to execute.
55+
The **automl_setup** script creates a new conda environment, installs the necessary packages, configures the widget and starts a jupyter notebook. It takes the conda environment name as an optional parameter. The default conda environment name is azure_automl. The exact command depends on the operating system. See the specific sections below for Windows, Mac and Linux. It can take about 10 minutes to execute.
56+
57+
Packages installed by the **automl_setup** script:
58+
<ul><li>python</li><li>nb_conda</li><li>matplotlib</li><li>numpy</li><li>cython</li><li>urllib3</li><li>scipy</li><li>scikit-learn</li><li>pandas</li><li>tensorflow</li><li>py-xgboost</li><li>azureml-sdk</li><li>azureml-widgets</li><li>pandas-ml</li></ul>
59+
60+
For more details refer to the [automl_env.yml](./automl_env.yml)
5761
## Windows
5862
Start an **Anaconda Prompt** window, cd to the **how-to-use-azureml/automated-machine-learning** folder where the sample notebooks were extracted and then run:
5963
```
@@ -81,7 +85,7 @@ bash automl_setup_linux.sh
8185

8286
### 5. Running Samples
8387
- Please make sure you use the Python [conda env:azure_automl] kernel when trying the sample Notebooks.
84-
- Follow the instructions in the individual notebooks to explore various features in AutoML
88+
- Follow the instructions in the individual notebooks to explore various features in automated ML.
8589

8690
### 6. Starting jupyter notebook manually
8791
To start your Jupyter notebook manually, use:
@@ -103,22 +107,22 @@ jupyter notebook
103107

104108
- [auto-ml-classification.ipynb](classification/auto-ml-classification.ipynb)
105109
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
106-
- Simple example of using Auto ML for classification
110+
- Simple example of using automated ML for classification
107111
- Uses local compute for training
108112

109113
- [auto-ml-regression.ipynb](regression/auto-ml-regression.ipynb)
110114
- Dataset: scikit learn's [diabetes dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html)
111-
- Simple example of using Auto ML for regression
115+
- Simple example of using automated ML for regression
112116
- Uses local compute for training
113117

114118
- [auto-ml-remote-execution.ipynb](remote-execution/auto-ml-remote-execution.ipynb)
115119
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
116-
- Example of using Auto ML for classification using a remote linux DSVM for training
120+
- Example of using automated ML for classification using a remote linux DSVM for training
117121
- Parallel execution of iterations
118122
- Async tracking of progress
119123
- Cancelling individual iterations or entire run
120124
- Retrieving models for any iteration or logged metric
121-
- Specify automl settings as kwargs
125+
- Specify automated ML settings as kwargs
122126

123127
- [auto-ml-remote-amlcompute.ipynb](remote-batchai/auto-ml-remote-amlcompute.ipynb)
124128
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
@@ -127,7 +131,7 @@ jupyter notebook
127131
- Async tracking of progress
128132
- Cancelling individual iterations or entire run
129133
- Retrieving models for any iteration or logged metric
130-
- Specify automl settings as kwargs
134+
- Specify automated ML settings as kwargs
131135

132136
- [auto-ml-remote-attach.ipynb](remote-attach/auto-ml-remote-attach.ipynb)
133137
- Dataset: Scikit learn's [20newsgroup](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html)
@@ -148,8 +152,8 @@ jupyter notebook
148152

149153
- [auto-ml-exploring-previous-runs.ipynb](exploring-previous-runs/auto-ml-exploring-previous-runs.ipynb)
150154
- List all projects for the workspace
151-
- List all AutoML Runs for a given project
152-
- Get details for a AutoML Run. (Automl settings, run widget & all metrics)
155+
- List all automated ML Runs for a given project
156+
- Get details for an automated ML Run. (automated ML settings, run widget & all metrics)
153157
- Download fitted pipeline for any iteration
154158

155159
- [auto-ml-remote-execution-with-datastore.ipynb](remote-execution-with-datastore/auto-ml-remote-execution-with-datastore.ipynb)
@@ -158,7 +162,7 @@ jupyter notebook
158162

159163
- [auto-ml-classification-with-deployment.ipynb](classification-with-deployment/auto-ml-classification-with-deployment.ipynb)
160164
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
161-
- Simple example of using Auto ML for classification
165+
- Simple example of using automated ML for classification
162166
- Registering the model
163167
- Creating Image and creating aci service
164168
- Testing the aci service
@@ -178,20 +182,20 @@ jupyter notebook
178182

179183
- [auto-ml-classification-with-whitelisting.ipynb](classification-with-whitelisting/auto-ml-classification-with-whitelisting.ipynb)
180184
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
181-
- Simple example of using Auto ML for classification with whitelisting tensorflow models.
185+
- Simple example of using automated ML for classification with whitelisting tensorflow models.
182186
- Uses local compute for training
183187

184188
- [auto-ml-forecasting-energy-demand.ipynb](forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb)
185189
- Dataset: [NYC energy demand data](forecasting-a/nyc_energy.csv)
186-
- Example of using AutoML for training a forecasting model
190+
- Example of using automated ML for training a forecasting model
187191

188192
- [auto-ml-forecasting-orange-juice-sales.ipynb](forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb)
189193
- Dataset: [Dominick's grocery sales of orange juice](forecasting-b/dominicks_OJ.csv)
190-
- Example of training an AutoML forecasting model on multiple time-series
194+
- Example of training an automated ML forecasting model on multiple time-series
191195

192196
- [auto-ml-classification-with-onnx.ipynb](classification-with-onnx/auto-ml-classification-with-onnx.ipynb)
193197
- Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
194-
- Simple example of using Auto ML for classification with ONNX models
198+
- Simple example of using automated ML for classification with ONNX models
195199
- Uses local compute for training
196200

197201
<a name="documentation"></a>
@@ -259,7 +263,7 @@ There are several reasons why the DsvmCompute.create can fail. The reason is us
259263
2) `The requested VM size xxxxx is not available in the current region.` You can select a different region or vm_size.
260264

261265
## Remote run: Unable to establish SSH connection
262-
AutoML uses the SSH protocol to communicate with remote DSVMs. This defaults to port 22. Possible causes for this error are:
266+
Automated ML uses the SSH protocol to communicate with remote DSVMs. This defaults to port 22. Possible causes for this error are:
263267
1) The DSVM is not ready for SSH connections. When DSVM creation completes, the DSVM might still not be ready to accept SSH connections. The sample notebooks have a one minute delay to allow for this.
264268
2) Your Azure Subscription may restrict the IP address ranges that can access the DSVM on port 22. You can check this in the Azure Portal by selecting the Virtual Machine and then clicking Networking. The Virtual Machine name is the name that you provided in the notebook plus 10 alphanumeric characters to make the name unique. The Inbound Port Rules define what can access the VM on specific ports. Note that there is a priority order, so a Deny entry with a low priority number will override an Allow entry with a higher priority number.
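Before changing any port rules, it can help to confirm whether port 22 is reachable at all from the machine running the notebook. The following is a minimal sketch using only the Python standard library; the function name `can_reach_ssh` and the host value in the usage note are hypothetical and are not part of the Azure ML SDK. It only tests that a TCP connection can be opened, not that SSH authentication would succeed.

```python
import socket


def can_reach_ssh(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This checks network reachability only. It does not log in over SSH.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `can_reach_ssh("<your DSVM address>")` returning `False` right after DSVM creation may simply mean the VM is still starting. If it keeps returning `False`, the inbound port rules described above are a likely place to look.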
265269

@@ -270,18 +274,16 @@ This is often an issue with the `get_data` method.
270274
3) You can get to the error log for the setup iteration by clicking the `Click here to see the run in Azure portal` link, click `Back to Experiment`, click on the highest run number and then click on Logs.
271275

272276
## Remote run: disk full
273-
AutoML creates files under /tmp/azureml_runs for each iteration that it runs. It creates a folder with the iteration id. For example: AutoML_9a038a18-77cc-48f1-80fb-65abdbc33abe_93. Under this, there is a azureml-logs folder, which contains logs. If you run too many iterations on the same DSVM, these files can fill the disk.
277+
Automated ML creates files under /tmp/azureml_runs for each iteration that it runs. It creates a folder with the iteration id. For example: AutoML_9a038a18-77cc-48f1-80fb-65abdbc33abe_93. Under this, there is an azureml-logs folder, which contains logs. If you run too many iterations on the same DSVM, these files can fill the disk.
274278
You can delete the files under /tmp/azureml_runs or just delete the VM and create a new one.
275279
If your get_data downloads files, make sure to delete them or they can use disk space as well.
276280
When using DataStore, it is good to specify an absolute path for the files so that they are downloaded just once. If you specify a relative path, it will download a file for each iteration.
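If you prefer to inspect the folder before deleting anything, the cleanup can be sketched in Python as below. This is an illustration only, run on the DSVM itself; the helper names `folder_size_bytes` and `remove_run_folders` are hypothetical, and the sketch assumes each iteration has its own subfolder directly under /tmp/azureml_runs as described above. Deletion only happens when `dry_run=False` is passed.

```python
import shutil
from pathlib import Path


def folder_size_bytes(root: Path) -> int:
    """Total size of all regular files under `root` (0 if `root` does not exist)."""
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def remove_run_folders(root: Path, dry_run: bool = True) -> list:
    """Return the per-iteration folders under `root`; delete them unless `dry_run`."""
    if not root.is_dir():
        return []
    folders = sorted(p for p in root.iterdir() if p.is_dir())
    if not dry_run:
        for folder in folders:
            shutil.rmtree(folder)  # removes the folder and everything inside it
    return folders


if __name__ == "__main__":
    runs = Path("/tmp/azureml_runs")
    print(folder_size_bytes(runs), "bytes used under", runs)
    for folder in remove_run_folders(runs, dry_run=True):
        print("would delete:", folder)
```

Do not delete the folders while a run is still in progress on the same DSVM, because the active iterations write their logs there.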
277281

278282
## Remote run: Iterations fail and the log contains "MemoryError"
279-
This can be caused by insufficient memory on the DSVM. AutoML loads all training data into memory. So, the available memory should be more than the training data size.
283+
This can be caused by insufficient memory on the DSVM. Automated ML loads all training data into memory. So, the available memory should be more than the training data size.
280284
If you are using a remote DSVM, memory is needed for each concurrent iteration. The max_concurrent_iterations setting specifies the maximum number of concurrent iterations. For example, if the training data size is 8 GB and max_concurrent_iterations is set to 10, at least 80 GB of memory is required.
281285
To resolve this issue, allocate a DSVM with more memory or reduce the value specified for max_concurrent_iterations.
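The arithmetic above can be written as a small helper. This is a rough sketch, not an SDK function: the names `min_memory_gb` and `max_iterations_for_memory` are hypothetical, and the estimate assumes that each concurrent iteration holds one full copy of the training data. Real usage will be higher because the models and the operating system also need memory.

```python
def min_memory_gb(training_data_gb: float, max_concurrent_iterations: int) -> float:
    """Lower bound on DSVM memory: one copy of the data per concurrent iteration."""
    if training_data_gb <= 0 or max_concurrent_iterations < 1:
        raise ValueError("data size must be positive and iterations at least 1")
    return training_data_gb * max_concurrent_iterations


def max_iterations_for_memory(available_gb: float, training_data_gb: float) -> int:
    """Largest max_concurrent_iterations whose lower bound fits in `available_gb`."""
    if available_gb <= 0 or training_data_gb <= 0:
        raise ValueError("memory and data size must be positive")
    return int(available_gb // training_data_gb)
```

With the numbers from the example, `min_memory_gb(8, 10)` gives 80, and a DSVM with 64 GB would support at most `max_iterations_for_memory(64, 8)`, that is 8, concurrent iterations by this estimate.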
282286

283287
## Remote run: Iterations show as "Not Responding" in the RunDetails widget.
284288
This can be caused by too many concurrent iterations for a remote DSVM. Each concurrent iteration usually takes 100% of a core when it is running. Some iterations can use multiple cores. So, the max_concurrent_iterations setting should always be less than the number of cores of the DSVM.
285-
To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.
286-
287-
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/README.png)
289+
To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.
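One way to choose a safe value is to cap the requested concurrency just below the core count of the DSVM. The helper below is a hypothetical sketch, not part of the SDK; `cores` must be the number of cores on the remote DSVM, not on the machine running the notebook.

```python
def cap_concurrent_iterations(requested: int, cores: int) -> int:
    """Largest value not above `requested` that stays below `cores`.

    The result is never less than 1, so on a single-core VM the rule
    "fewer iterations than cores" cannot be satisfied and 1 is returned.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return max(1, min(requested, cores - 1))
```

For example, on an 8-core DSVM a requested value of 10 is reduced to 7, while a requested value of 4 is left unchanged.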

how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb

+8-1
@@ -162,7 +162,14 @@
162162
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
163163
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
164164
"|**n_cross_validations**|Number of cross validation splits.|\n",
165-
"|<i>Exit Criteria [optional]</i><br><br>iterations<br>experiment_timeout_minutes|An optional duration parameter that says how long AutoML should be run.<br>This could be either number of iterations or number of minutes AutoML is allowed to run. <br><br><i>iterations</i> number of algorithm iterations to run<br><i>experiment_timeout_minutes</i> is the number of minutes that AutoML should run<br><br>By default, this is set to stop whenever AutoML determines that progress in scores is not being made|"
165+
"|\n",
166+
"\n",
167+
"Automated machine learning trains multiple machine learning pipelines. Each pipeline's training run is known as an iteration.\n",
168+
"* You can specify a maximum number of iterations using the `iterations` parameter.\n",
169+
"* You can specify a maximum time for the run using the `experiment_timeout_minutes` parameter.\n",
170+
"* If you specify neither the `iterations` nor the `experiment_timeout_minutes`, automated ML keeps running iterations while it continues to see improvements in the scores.\n",
171+
"\n",
172+
"The following example doesn't specify `iterations` or `experiment_timeout_minutes` and so runs until the scores stop improving.\n"
166173
]
167174
},
168175
{
