Commit 644729e

Merge pull request Azure#333 from rastala/master
version 1.0.30
2 parents dc69258 + e2b1b3f commit 644729e

31 files changed (+2045 -901 lines)

NBSETUP.md

+9 -4
@@ -1,4 +1,6 @@
-# Set up your notebook environment for Azure Machine Learning
+# Setting up environment
+
+---

 To run the notebooks in this repository use one of the following options.

@@ -10,7 +12,9 @@ Azure Notebooks is a hosted Jupyter-based notebook service in the Azure cloud. A
 1. Follow the instructions in the [Configuration](configuration.ipynb) notebook to create and connect to a workspace
 1. Open one of the sample notebooks

-**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook by choosing Kernel > Change Kernel > Python 3.6 from the menus.
+**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook
+
+![set kernel to Python 3.6](images/python36.png)

 ## **Option 2: Use your own notebook server**

@@ -54,7 +58,8 @@ Please make sure you start with the [Configuration](configuration.ipynb) noteboo

 ### Video walkthrough:

-[!VIDEO https://youtu.be/VIsXeTuW3FU]
+[![Get Started video](images/yt_cover.png)](https://youtu.be/VIsXeTuW3FU)
+

 ## **Option 3: Use Docker**

@@ -98,4 +103,4 @@ pip install azureml-sdk[explain]
 pip install azureml-sdk[contrib]
 ```
 Drag and Drop
-The image will be downloaded by Fatkun
+The image will be downloaded by Fatkun

googled8147fb6c0788258.html

-1
This file was deleted.

how-to-use-azureml/automated-machine-learning/README.md

+10 -2
@@ -211,10 +211,18 @@ The main code of the file must be indented so that it is under this condition.
 <a name="troubleshooting"></a>
 # Troubleshooting
 ## automl_setup fails
-1. On windows, make sure that you are running automl_setup from an Anaconda Prompt window rather than a regular cmd window. You can launch the "Anaconda Prompt" window by hitting the Start button and typing "Anaconda Prompt". If you don't see the application "Anaconda Prompt", you might not have conda or Miniconda installed. In that case, you can install it [here](https://conda.io/miniconda.html)
+1. On Windows, make sure that you are running automl_setup from an Anaconda Prompt window rather than a regular cmd window. You can launch the "Anaconda Prompt" window by hitting the Start button and typing "Anaconda Prompt". If you don't see the application "Anaconda Prompt", you might not have conda or Miniconda installed. In that case, you can install it [here](https://conda.io/miniconda.html)
 2. Check that you have conda 64-bit installed rather than 32-bit. You can check this with the command `conda info`. The `platform` should be `win-64` for Windows or `osx-64` for Mac.
 3. Check that you have conda 4.4.10 or later. You can check the version with the command `conda -V`. If you have a previous version installed, you can update it using the command: `conda update conda`.
-4. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
+4. On Linux, if the error is `gcc: error trying to exec 'cc1plus': execvp: No such file or directory`, install build essentials using the command `sudo apt-get install build-essential`.
+5. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
+
+## automl_setup_linux.sh fails
+If automl_setup_linux.sh fails on Ubuntu Linux with the error: `unable to execute 'gcc': No such file or directory`
+1. Make sure that outbound ports 53 and 80 are enabled. On an Azure VM, you can do this from the Azure Portal by selecting the VM and clicking on Networking.
+2. Run the command: `sudo apt-get update`
+3. Run the command: `sudo apt-get install build-essential --fix-missing`
+4. Run `automl_setup_linux.sh` again.

 ## configuration.ipynb fails
 1) For local conda, make sure that you have successfully run automl_setup first.
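
Reviewer note: the conda checks in steps 2 and 3 of the "automl_setup fails" list above could be scripted roughly as follows. This is a minimal sketch, not part of the commit. The helper names (`parse_conda_version`, `conda_is_recent_enough`, `parse_platform`) are hypothetical, and the sketch assumes that `conda -V` prints text such as `conda 4.5.11` and that `conda info` prints a `platform : <value>` line.

```python
import re

# Minimum conda version named in troubleshooting step 3.
MIN_CONDA = (4, 4, 10)


def parse_conda_version(text):
    """Extract (major, minor, patch) from text such as 'conda 4.5.11'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def conda_is_recent_enough(version_output, minimum=MIN_CONDA):
    """Return True if the reported version is at least `minimum`."""
    return parse_conda_version(version_output) >= minimum


def parse_platform(info_output):
    """Return the value of the 'platform' line in `conda info` text, or None."""
    for line in info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "platform":
            return value.strip()
    return None
```

The functions take the command output as plain strings, so they can be fed from whatever captures `conda -V` and `conda info` (for example `subprocess.run` with captured output). Comparing versions as integer tuples avoids the string-comparison trap where `"4.10.0" < "4.4.10"`.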

how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb

+2 -1
@@ -302,7 +302,8 @@
 "source": [
 "from azureml.core.conda_dependencies import CondaDependencies\n",
 "\n",
-"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
+"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
+" pip_packages=['azureml-sdk[automl]'])\n",
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"

how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb

+29 -8
@@ -72,6 +72,32 @@
 "from azureml.train.automl import AutoMLConfig"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Accessing the Azure ML workspace requires authentication with Azure.\n",
+"\n",
+"The default authentication is interactive authentication using the default tenant. Executing the `ws = Workspace.from_config()` line in the cell below will prompt for authentication the first time that it is run.\n",
+"\n",
+"If you have multiple Azure tenants, you can specify the tenant by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
+"\n",
+"```\n",
+"from azureml.core.authentication import InteractiveLoginAuthentication\n",
+"auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')\n",
+"ws = Workspace.from_config(auth = auth)\n",
+"```\n",
+"\n",
+"If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
+"\n",
+"```\n",
+"from azureml.core.authentication import ServicePrincipalAuthentication\n",
+"auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n",
+"ws = Workspace.from_config(auth = auth)\n",
+"```\n",
+"For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -133,11 +159,10 @@
 "|-|-|\n",
 "|**task**|classification or regression|\n",
 "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
-"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
-"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
-"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
+"|**n_cross_validations**|Number of cross validation splits.|\n",
+"|<i>Exit Criteria [optional]</i><br><br>iterations<br>experiment_timeout_minutes|An optional duration parameter that says how long AutoML should be run.<br>This could be either number of iterations or number of minutes AutoML is allowed to run. <br><br><i>iterations</i> number of algorithm iterations to run<br><i>experiment_timeout_minutes</i> is the number of minutes that AutoML should run<br><br>By default, this is set to stop whenever AutoML determines that progress in scores is not being made|"
 ]
 },
 {
@@ -147,14 +172,10 @@
 "outputs": [],
 "source": [
 "automl_config = AutoMLConfig(task = 'classification',\n",
-" debug_log = 'automl_errors.log',\n",
 " primary_metric = 'AUC_weighted',\n",
-" iteration_timeout_minutes = 60,\n",
-" iterations = 25,\n",
-" verbosity = logging.INFO,\n",
 " X = X_train, \n",
 " y = y_train,\n",
-" path = project_folder)"
+" n_cross_validations = 3)"
 ]
 },
 {
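
Reviewer note: the new `n_cross_validations = 3` setting asks for three cross-validation splits. The diff does not show how AutoML builds those splits, so the following is only a generic k-fold illustration of the idea, with a hypothetical helper name (`k_fold_indices`) and no shuffling or stratification.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for each of n_splits folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        # Everything outside [start, stop) is used for training in this fold.
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


# Ten samples split into three folds, mirroring n_cross_validations = 3.
folds = list(k_fold_indices(n_samples=10, n_splits=3))
```

Each sample lands in exactly one validation fold, so every row is used for both training and validation across the run. That is the reason a separate `X_valid`/`y_valid` pair becomes unnecessary when cross-validation is requested.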

how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb

+46 -12
@@ -37,7 +37,8 @@
 "2. Instantiating AutoMLConfig with new task type \"forecasting\" for timeseries data training, and other timeseries related settings: for this dataset we use the basic one: \"time_column_name\" \n",
 "3. Training the Model using local compute\n",
 "4. Exploring the results\n",
-"5. Testing the fitted model"
+"5. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
+"6. Testing the fitted model"
 ]
 },
 {
@@ -126,7 +127,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Split the data to train and test\n",
+"### Get the train data\n",
 "\n"
 ]
 },
@@ -172,14 +173,10 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"X_train = train[train['timeStamp'] < '2017-01-01']\n",
-"X_valid = train[train['timeStamp'] >= '2017-01-01']\n",
+"X_train = train\n",
 "y_train = X_train.pop('demand').values\n",
-"y_valid = X_valid.pop('demand').values\n",
 "print(X_train.shape)\n",
-"print(y_train.shape)\n",
-"print(X_valid.shape)\n",
-"print(y_valid.shape)"
+"print(y_train.shape)"
 ]
 },
 {
@@ -198,8 +195,7 @@
 "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
 "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
 "|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
-"|**X_valid**|Data used to evaluate a model in an iteration. (sparse) array-like, shape = [n_samples, n_features]|\n",
-"|**y_valid**|Data used to evaluate a model in an iteration. (sparse) array-like, shape = [n_samples, ], targets values.|\n",
+"|**n_cross_validations**|Number of cross validation splits.|\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
 ]
 },
@@ -222,8 +218,7 @@
 " iteration_timeout_minutes = 5,\n",
 " X = X_train,\n",
 " y = y_train,\n",
-" X_valid = X_valid,\n",
-" y_valid = y_valid,\n",
+" n_cross_validations = 2,\n",
 " path=project_folder,\n",
 " verbosity = logging.INFO,\n",
 " **automl_settings)"
@@ -273,6 +268,45 @@
 "fitted_model.steps"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### View the engineered names for featurized data\n",
+"Below we display the engineered feature names generated for the featurized data using the time-series featurization."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### View the featurization summary\n",
+"Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:-\n",
+"- Raw feature name\n",
+"- Number of engineered features formed out of this raw feature\n",
+"- Type detected\n",
+"- If feature was dropped\n",
+"- List of feature transformations for the raw feature"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb

+41 -1
@@ -36,7 +36,8 @@
 "1. Create an Experiment in an existing Workspace\n",
 "2. Instantiate an AutoMLConfig \n",
 "3. Find and train a forecasting model using local compute\n",
-"4. Evaluate the performance of the model\n",
+"4. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
+"5. Evaluate the performance of the model\n",
 "\n",
 "The examples in the following code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
 ]
@@ -320,6 +321,45 @@
 "fitted_pipeline.steps"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### View the engineered names for featurized data\n",
+"Below we display the engineered feature names generated for the featurized data using the time-series featurization."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"fitted_pipeline.named_steps['timeseriestransformer'].get_engineered_feature_names()"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### View the featurization summary\n",
+"Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:-\n",
+"- Raw feature name\n",
+"- Number of engineered features formed out of this raw feature\n",
+"- Type detected\n",
+"- If feature was dropped\n",
+"- List of feature transformations for the raw feature"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"fitted_pipeline.named_steps['timeseriestransformer'].get_featurization_summary()"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

how-to-use-azureml/automated-machine-learning/missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.ipynb

+42 -2
@@ -37,8 +37,9 @@
 "In this notebook you will learn how to:\n",
 "1. Create an `Experiment` in an existing `Workspace`.\n",
 "2. Configure AutoML using `AutoMLConfig`.\n",
-"4. Train the model.\n",
-"5. Explore the results.\n",
+"3. Train the model.\n",
+"4. Explore the results.\n",
+"5. Viewing the engineered names for featurized data and featurization summary for all raw features.\n",
 "6. Test the best fitted model.\n",
 "\n",
 "In addition this notebook showcases the following features\n",
@@ -316,6 +317,45 @@
 "# best_run, fitted_model = local_run.get_output(iteration = iteration)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"#### View the engineered names for featurized data\n",
+"Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"#### View the featurization summary\n",
+"Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:-\n",
+"- Raw feature name\n",
+"- Number of engineered features formed out of this raw feature\n",
+"- Type detected\n",
+"- If feature was dropped\n",
+"- List of feature transformations for the raw feature"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"fitted_model.named_steps['datatransformer'].get_featurization_summary()"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},

how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb

+1 -1
@@ -305,7 +305,7 @@
 "from azureml.train.automl.automlexplainer import explain_model\n",
 "\n",
 "shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
-" explain_model(fitted_model, X_train, X_test)"
+" explain_model(fitted_model, X_train, X_test, features=features)"
 ]
 },
 {
