Commit cca3996

release 1.0.15
1 parent 5fd14ba commit cca3996

34 files changed, +2474 -335 lines changed

Diff for: NBSETUP.md

+3 -1
@@ -101,4 +101,6 @@ pip install azureml-sdk[explain]
 
 # install the core SDK and experimental components
 pip install azureml-sdk[contrib]
-```
+```
+Drag and Drop
+The image will be downloaded by Fatkun

Diff for: README.md

+1 -1
@@ -11,7 +11,7 @@ and maintaining the complete data science workflow from the cloud.
 ```sh
 pip install azureml-sdk
 ```
-Read more detailed instructions on [how to set up your environment](./NBSETUP.md).
+Read more detailed instructions on [how to set up your environment](./NBSETUP.md) using Azure Notebook service, your own Jupyter notebook server, or Docker.
 
 ## How to navigate and use the example notebooks?
 You should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.

Diff for: configuration.ipynb

+2 -2
@@ -96,7 +96,7 @@
 "source": [
 "import azureml.core\n",
 "\n",
-"print(\"This notebook was created using version 1.0.10 of the Azure ML SDK\")\n",
+"print(\"This notebook was created using version 1.0.6 of the Azure ML SDK\")\n",
 "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
 ]
 },
@@ -373,4 +373,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}
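
The cell touched by the first hunk prints a hard-coded "created using" version next to `azureml.core.VERSION` and leaves the comparison to the reader. A minimal sketch of comparing two such version strings numerically is shown here. The helper names `parse_version` and `is_older` are hypothetical and not part of the Azure ML SDK, and the sketch assumes purely numeric dotted versions such as the ones in this diff.

```python
def parse_version(version):
    """Turn a dotted version string such as '1.0.10' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_older(installed, created_with):
    """Return True when `installed` is a lower version than `created_with`."""
    return parse_version(installed) < parse_version(created_with)


# Plain string comparison orders versions as text, which is misleading:
# '1.0.10' sorts before '1.0.6' as a string, although 1.0.10 is the newer release.
print("1.0.10" < "1.0.6")            # True (text comparison)
print(is_older("1.0.10", "1.0.6"))   # False (numeric comparison)
print(is_older("1.0.6", "1.0.15"))   # True
```

Comparing tuples of integers avoids the text-ordering pitfall; pre-release suffixes would need a fuller parser than this sketch provides.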

Diff for: how-to-use-azureml/automated-machine-learning/README.md

+3
@@ -169,6 +169,9 @@ bash automl_setup_linux.sh
 - How to specify sample_weight
 - The difference that it makes to test results
 
+- [auto-ml-subsampling-local.ipynb](subsampling/auto-ml-subsampling-local.ipynb)
+- How to enable subsampling
+
 - [auto-ml-dataprep.ipynb](dataprep/auto-ml-dataprep.ipynb)
 - Using DataPrep for reading data

Diff for: how-to-use-azureml/automated-machine-learning/automl_env.yml

-12
@@ -13,19 +13,7 @@ dependencies:
 - pandas>=0.22.0,<0.23.0
 - tensorflow>=1.12.0
 
-# Required for azuremlftk
-- dill
-- pyodbc
-- statsmodels
-- numexpr
-- keras
-- distributed>=1.21.5,<1.24
-
 - pip:
-
-# Required for azuremlftk
-- https://azuremlpackages.blob.core.windows.net/forecasting/azuremlftk-0.1.18323.5a1-py3-none-any.whl
-
 # Required packages for AzureML execution, history, and data preparation.
 - azureml-sdk[automl,notebooks,explain]
 - pandas_ml

Diff for: how-to-use-azureml/automated-machine-learning/automl_env_mac.yml

-12
@@ -13,19 +13,7 @@ dependencies:
 - pandas>=0.22.0,<0.23.0
 - tensorflow>=1.12.0
 
-# Required for azuremlftk
-- dill
-- pyodbc
-- statsmodels
-- numexpr
-- keras
-- distributed>=1.21.5,<1.24
-
 - pip:
-
-# Required for azuremlftk
-- https://azuremlpackages.blob.core.windows.net/forecasting/azuremlftk-0.1.18323.5a1-py3-none-any.whl
-
 # Required packages for AzureML execution, history, and data preparation.
 - azureml-sdk[automl,notebooks,explain]
 - pandas_ml

Diff for: how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb

-24
@@ -47,30 +47,6 @@
 "## Setup\n"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"To use the *forecasting* task in AutoML, you need to have the **azuremlftk** package installed in your environment. The following cell tests whether this package is installed locally and, if not, gives you instructions for installing it. "
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"try:\n",
-" import ftk\n",
-" print('Using FTK version ' + ftk.__version__)\n",
-"except ImportError:\n",
-" print(\"Unable to import forecasting package. This notebook does not work without this package.\\n\"\n",
-" + \"Please open a command prompt and run `pip install azuremlftk` to install the package. \\n\"\n",
-" + \"Make sure you install the package into AutoML's Python environment.\\n\\n\"\n",
-" + \"For instance, if AutoML is installed in a conda environment called `python36`, run:\\n\"\n",
-" + \"> activate python36\\n> pip install azuremlftk\")"
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,

Diff for: how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb

+1 -25
@@ -38,7 +38,7 @@
 "3. Find and train a forecasting model using local compute\n",
 "4. Evaluate the performance of the model\n",
 "\n",
-"The examples in the following code samples use the [University of Chicago's Dominick's Finer Foods dataset](https://research.chicagobooth.edu/kilts/marketing-databases/dominicks) to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
+"The examples in the following code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
 ]
 },
 {
@@ -48,30 +48,6 @@
 "## Setup"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"To use the *forecasting* task in AutoML, you need to have the **azuremlftk** package installed in your environment. The following cell tests whether this package is installed locally and, if not, gives you instructions for installing it."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"try:\n",
-" import ftk\n",
-" print('Using FTK version ' + ftk.__version__)\n",
-"except ImportError:\n",
-" print(\"Unable to import forecasting package. This notebook does not work without this package.\\n\"\n",
-" + \"Please open a command prompt and run `pip install azuremlftk` to install the package. \\n\"\n",
-" + \"Make sure you install the package into AutoML's Python environment.\\n\\n\"\n",
-" + \"For instance, if AutoML is installed in a conda environment called `python36`, run:\\n\"\n",
-" + \"> activate python36\\n> pip install azuremlftk\")"
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,

Diff for: how-to-use-azureml/automated-machine-learning/subsampling/auto-ml-subsampling-local.ipynb

+218
@@ -0,0 +1,218 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Copyright (c) Microsoft Corporation. All rights reserved.\n",
+"\n",
+"Licensed under the MIT License."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Automated Machine Learning\n",
+"_**Classification with Local Compute**_\n",
+"\n",
+"## Contents\n",
+"1. [Introduction](#Introduction)\n",
+"1. [Setup](#Setup)\n",
+"1. [Data](#Data)\n",
+"1. [Train](#Train)\n",
+"\n"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Introduction\n",
+"\n",
+"In this example we will explore AutoML's subsampling feature. This is useful for training on large datasets to speed up the convergence.\n",
+"\n",
+"The setup is quite similar to a normal classification run, with the exception of the `enable_subsampling` option. Keep in mind that even with the `enable_subsampling` flag set, subsampling only runs for large datasets (>= 50k rows) when the iteration limit is either large (>= 85) or not set.\n"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Setup\n",
+"\n",
+"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"import logging\n",
+"\n",
+"import numpy as np\n",
+"import pandas as pd\n",
+"\n",
+"import azureml.core\n",
+"from azureml.core.experiment import Experiment\n",
+"from azureml.core.workspace import Workspace\n",
+"from azureml.train.automl import AutoMLConfig\n",
+"from azureml.train.automl.run import AutoMLRun"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"ws = Workspace.from_config()\n",
+"\n",
+"# Choose a name for the experiment and specify the project folder.\n",
+"experiment_name = 'automl-subsampling'\n",
+"project_folder = './sample_projects/automl-subsampling'\n",
+"\n",
+"experiment = Experiment(ws, experiment_name)\n",
+"\n",
+"output = {}\n",
+"output['SDK version'] = azureml.core.VERSION\n",
+"output['Subscription ID'] = ws.subscription_id\n",
+"output['Workspace Name'] = ws.name\n",
+"output['Resource Group'] = ws.resource_group\n",
+"output['Location'] = ws.location\n",
+"output['Project Directory'] = project_folder\n",
+"output['Experiment Name'] = experiment.name\n",
+"pd.set_option('display.max_colwidth', -1)\n",
+"pd.DataFrame(data = output, index = ['']).T"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Opt-in diagnostics for better experience, quality, and security of future releases."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"from azureml.telemetry import set_diagnostics_collection\n",
+"set_diagnostics_collection(send_diagnostics = True)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Data\n",
+"\n",
+"We will create a simple dataset using the numpy sin function just for this example. We need just over 50k rows."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"base = np.arange(60000)\n",
+"cos = np.cos(base)\n",
+"y = np.round(np.sin(base)).astype('int')\n",
+"\n",
+"# Use all 60,000 rows for training; the features are the index and its cosine.\n",
+"X_train = np.hstack((base.reshape(-1, 1), cos.reshape(-1, 1)))\n",
+"y_train = y"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Train\n",
+"\n",
+"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
+"\n",
+"|Property|Description|\n",
+"|-|-|\n",
+"|**enable_subsampling**|Enables subsampling as an option. It does not guarantee that subsampling is used; that also depends on the dataset size and the minimum number of iterations the run is expected to perform.|\n",
+"|**iterations**|Number of iterations. Subsampling runs many iterations on smaller percentages of the data, so `iterations` must be set to a high number for subsampling to be used.|\n",
+"|**experiment_timeout_minutes**|The experiment timeout in minutes. It is set to 5 here to shorten the demo; it should probably be higher if all the iterations are to finish.|\n",
+"\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"automl_config = AutoMLConfig(task = 'classification',\n",
+" debug_log = 'automl_errors.log',\n",
+" primary_metric = 'accuracy',\n",
+" iterations = 85,\n",
+" experiment_timeout_minutes = 5,\n",
+" n_cross_validations = 2,\n",
+" verbosity = logging.INFO,\n",
+" X = X_train, \n",
+" y = y_train,\n",
+" enable_subsampling=True,\n",
+" path = project_folder)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
+"In this example, we specify `show_output = True` to print currently running iterations to the console."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"local_run = experiment.submit(automl_config, show_output = True)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
+}
+],
+"metadata": {
+"authors": [
+{
+"name": "rogehe"
+}
+],
+"kernelspec": {
+"display_name": "Python 3.6",
+"language": "python",
+"name": "python36"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.6.6"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 2
+}
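
The new notebook's introduction says that `enable_subsampling` only takes effect for datasets with at least 50k rows and an iteration limit that is at least 85 or unset, and its data cell builds 60,000 labels from `np.round(np.sin(base))`. The sketch below restates both points in plain Python so they can be checked without the SDK. The function `subsampling_may_apply` and its constants are a hypothetical illustration of the notebook text, not AutoML's actual decision logic, and `math.sin` stands in for the NumPy calls.

```python
import math
from collections import Counter

MIN_ROWS = 50_000        # row threshold quoted in the notebook introduction
MIN_ITERATIONS = 85      # iteration threshold quoted in the notebook introduction


def subsampling_may_apply(n_rows, iterations, enable_subsampling=True):
    """Hypothetical restatement of the conditions described in the notebook."""
    if not enable_subsampling:
        return False
    enough_rows = n_rows >= MIN_ROWS
    # An unset iteration limit (None) counts as "no iteration restriction".
    enough_iterations = iterations is None or iterations >= MIN_ITERATIONS
    return enough_rows and enough_iterations


# Rebuild the notebook's labels: round(sin(i)) for i in 0..59999.
labels = [int(round(math.sin(i))) for i in range(60000)]
class_counts = Counter(labels)

print(sorted(class_counts))                      # the distinct label values
print(subsampling_may_apply(len(labels), 85))    # settings used in the notebook
print(subsampling_may_apply(len(labels), 10))    # too few iterations
print(subsampling_may_apply(100, None))          # too few rows
```

Rounding the sine yields the label values -1, 0 and 1, so the synthetic task is a three-class problem, and the notebook's settings (60,000 rows, `iterations = 85`) sit exactly at or above both stated thresholds.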
