
Commit 659fb7a

Merge pull request Azure#619 from Azure/release_update/Release-153
update samples from Release-153 as a part of 1.0.69 SDK release
2 parents 5fcf488 + 2e404cf commit 659fb7a

139 files changed: +1250 -25949 lines


README.md (-4)

@@ -12,10 +12,6 @@ pip install azureml-sdk
 Read more detailed instructions on [how to set up your environment](./NBSETUP.md) using Azure Notebook service, your own Jupyter notebook server, or Docker.
 
 ## How to navigate and use the example notebooks?
-
-This [index](https://github.com/Azure/MachineLearningNotebooks/blob/master/index.md) should assist in navigating the Azure Machine Learning notebook samples and encourage efficient retrieval of topics and content.
-
-
 If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
 
 If you want to...

configuration.ipynb (+1 -1)

@@ -103,7 +103,7 @@
 "source": [
 "import azureml.core\n",
 "\n",
-"print(\"This notebook was created using version 1.0.65 of the Azure ML SDK\")\n",
+"print(\"This notebook was created using version 1.0.69 of the Azure ML SDK\")\n",
 "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
 ]
 },
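For context, the cell this hunk touches is the notebook's SDK version check. Reconstructed from the diff above, the updated cell is plain Python (it assumes only that azureml-core is installed):

```python
# SDK version check cell from configuration.ipynb, after this commit
import azureml.core

print("This notebook was created using version 1.0.69 of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")
```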

contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb (+32 -37)
@@ -9,6 +9,13 @@
 "Licensed under the MIT License."
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids/azure-ml-with-nvidia-rapids.png)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -20,7 +27,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL\u00c3\u201a\u00c2\u00a0and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model\u00c2\u00a0in Azure.\n",
+"The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL\u00c2\u00a0and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model\u00c3\u201a\u00c2\u00a0in Azure.\n",
 " \n",
 "In this notebook, we will do the following:\n",
 " \n",
@@ -119,8 +126,10 @@
 "outputs": [],
 "source": [
 "ws = Workspace.from_config()\n",
+"\n",
 "# if a locally-saved configuration file for the workspace is not available, use the following to load workspace\n",
 "# ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)\n",
+"\n",
 "print('Workspace name: ' + ws.name, \n",
 " 'Azure region: ' + ws.location, \n",
 " 'Subscription id: ' + ws.subscription_id, \n",
@@ -161,7 +170,7 @@
 "if gpu_cluster_name in ws.compute_targets:\n",
 " gpu_cluster = ws.compute_targets[gpu_cluster_name]\n",
 " if gpu_cluster and type(gpu_cluster) is AmlCompute:\n",
-" print('found compute target. just use it. ' + gpu_cluster_name)\n",
+" print('Found compute target. Will use {0} '.format(gpu_cluster_name))\n",
 "else:\n",
 " print(\"creating new cluster\")\n",
 " # vm_size parameter below could be modified to one of the RAPIDS-supported VM types\n",
@@ -183,7 +192,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The _process_data.py_ script used in the step below is a slightly modified implementation of [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb)."
+"The _process_data.py_ script used in the step below is a slightly modified implementation of [RAPIDS Mortgage E2E example](https://github.com/rapidsai/notebooks-contrib/blob/master/intermediate_notebooks/E2E/mortgage/mortgage_e2e.ipynb)."
 ]
 },
 {
@@ -194,10 +203,7 @@
 "source": [
 "# copy process_data.py into the script folder\n",
 "import shutil\n",
-"shutil.copy('./process_data.py', os.path.join(scripts_folder, 'process_data.py'))\n",
-"\n",
-"with open(os.path.join(scripts_folder, './process_data.py'), 'r') as process_data_script:\n",
-" print(process_data_script.read())"
+"shutil.copy('./process_data.py', os.path.join(scripts_folder, 'process_data.py'))"
 ]
 },
 {
@@ -221,13 +227,6 @@
 "### Downloading Data"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"<font color='red'>Important</font>: Python package progressbar2 is necessary to run the following cell. If it is not available in your environment where this notebook is running, please install it."
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -237,7 +236,6 @@
 "import tarfile\n",
 "import hashlib\n",
 "from urllib.request import urlretrieve\n",
-"from progressbar import ProgressBar\n",
 "\n",
 "def validate_downloaded_data(path):\n",
 " if(os.path.isdir(path) and os.path.exists(path + '//names.csv')) :\n",
@@ -267,7 +265,7 @@
 " url_format = 'http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/{0}.tgz'\n",
 " url = url_format.format(fileroot)\n",
 " print(\"...Downloading file :{0}\".format(filename))\n",
-" urlretrieve(url, filename,show_progress)\n",
+" urlretrieve(url, filename)\n",
 " pbar.finish()\n",
 " print(\"...File :{0} finished downloading\".format(filename))\n",
 " else:\n",
@@ -282,9 +280,7 @@
 " so_far = 0\n",
 " for member_info in members:\n",
 " tar.extract(member_info,path=path)\n",
-" show_progress(so_far, 1, numFiles)\n",
 " so_far += 1\n",
-" pbar.finish()\n",
 " print(\"...All {0} files have been decompressed\".format(numFiles))\n",
 " tar.close()"
 ]
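With progressbar gone, urlretrieve now runs without progress output. If progress reporting is still wanted, the standard library's reporthook parameter is enough; this is a sketch independent of the commit (url and filename come from the surrounding cell):

```python
from urllib.request import urlretrieve

def report(block_num, block_size, total_size):
    # urlretrieve calls this hook with (blocks so far, block size, total bytes)
    if total_size > 0:
        pct = min(100.0, block_num * block_size * 100.0 / total_size)
        print('\r...{0:.1f}% downloaded'.format(pct), end='')

# urlretrieve(url, filename, reporthook=report)  # url/filename as in the cell above
```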
@@ -324,7 +320,9 @@
 "\n",
 "# download and uncompress data in a local directory before uploading to data store\n",
 "# directory specified in src_dir parameter below should have the acq, perf directories with data and names.csv file\n",
-"ds.upload(src_dir=path, target_path=fileroot, overwrite=True, show_progress=True)\n",
+"\n",
+"# ---->>>> UNCOMMENT THE BELOW LINE TO UPLOAD YOUR DATA IF NOT DONE SO ALREADY <<<<----\n",
+"# ds.upload(src_dir=path, target_path=fileroot, overwrite=True, show_progress=True)\n",
 "\n",
 "# data already uploaded to the datastore\n",
 "data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore=fileroot)"
@@ -360,7 +358,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The following code shows how to use an existing image from [Docker Hub](https://hub.docker.com/r/rapidsai/rapidsai/) that has a prebuilt conda environment named 'rapids' when creating a RunConfiguration. Note that this conda environment does not include azureml-defaults package that is required for using AML functionality like metrics tracking, model management etc. This package is automatically installed when you use 'Specify package dependencies' option and that is why it is the recommended option to create RunConfiguraiton in AML."
+"The following code shows how to install RAPIDS using conda. The `rapids.yml` file contains the list of packages necessary to run this tutorial. **NOTE:** Initial build of the image might take up to 20 minutes as the service needs to build and cache the new image; once the image is built the subequent runs use the cached image and the overhead is minimal."
 ]
 },
 {
369367
"metadata": {},
370368
"outputs": [],
371369
"source": [
372-
"run_config = RunConfiguration()\n",
370+
"cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n",
371+
"run_config = RunConfiguration(conda_dependencies=cd)\n",
373372
"run_config.framework = 'python'\n",
374-
"run_config.environment.python.user_managed_dependencies = True\n",
375-
"run_config.environment.python.interpreter_path = '/conda/envs/rapids/bin/python'\n",
376373
"run_config.target = gpu_cluster_name\n",
377374
"run_config.environment.docker.enabled = True\n",
378375
"run_config.environment.docker.gpu_support = True\n",
379-
"run_config.environment.docker.base_image = \"rapidsai/rapidsai:cuda9.2-runtime-ubuntu18.04\"\n",
380-
"# run_config.environment.docker.base_image_registry.address = '<registry_url>' # not required if the base_image is in Docker hub\n",
381-
"# run_config.environment.docker.base_image_registry.username = '<user_name>' # needed only for private images\n",
382-
"# run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
376+
"run_config.environment.docker.base_image = \"mcr.microsoft.com/azureml/base-gpu:intelmpi2018.3-cuda10.0-cudnn7-ubuntu16.04\"\n",
383377
"run_config.environment.spark.precache_packages = False\n",
384378
"run_config.data_references={'data':data_ref.to_config()}"
385379
]
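Put together, the conda-based run configuration after this change reads as follows. This is a straight reconstruction of the hunk above plus the two imports the cell relies on (assumed to come from earlier in the notebook):

```python
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

# Build the environment from the conda spec shipped with the notebook
cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')
run_config = RunConfiguration(conda_dependencies=cd)
run_config.framework = 'python'
run_config.target = gpu_cluster_name
run_config.environment.docker.enabled = True
run_config.environment.docker.gpu_support = True
run_config.environment.docker.base_image = "mcr.microsoft.com/azureml/base-gpu:intelmpi2018.3-cuda10.0-cudnn7-ubuntu16.04"
run_config.environment.spark.precache_packages = False
run_config.data_references = {'data': data_ref.to_config()}
```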
@@ -388,14 +382,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### Specify package dependencies"
+"#### Using Docker"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The following code shows how to list package dependencies in a conda environment definition file (rapids.yml) when creating a RunConfiguration"
+"Alternatively, you can specify RAPIDS Docker image."
 ]
 },
 {
@@ -404,16 +398,17 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n",
-"# run_config = RunConfiguration(conda_dependencies=cd)\n",
+"# run_config = RunConfiguration()\n",
 "# run_config.framework = 'python'\n",
+"# run_config.environment.python.user_managed_dependencies = True\n",
+"# run_config.environment.python.interpreter_path = '/conda/envs/rapids/bin/python'\n",
 "# run_config.target = gpu_cluster_name\n",
 "# run_config.environment.docker.enabled = True\n",
 "# run_config.environment.docker.gpu_support = True\n",
-"# run_config.environment.docker.base_image = \"<image>\"\n",
-"# run_config.environment.docker.base_image_registry.address = '<registry_url>' # not required if the base_image is in Docker hub\n",
-"# run_config.environment.docker.base_image_registry.username = '<user_name>' # needed only for private images\n",
-"# run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
+"# run_config.environment.docker.base_image = \"rapidsai/rapidsai:cuda9.2-runtime-ubuntu18.04\"\n",
+"# # run_config.environment.docker.base_image_registry.address = '<registry_url>' # not required if the base_image is in Docker hub\n",
+"# # run_config.environment.docker.base_image_registry.username = '<user_name>' # needed only for private images\n",
+"# # run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
 "# run_config.environment.spark.precache_packages = False\n",
 "# run_config.data_references={'data':data_ref.to_config()}"
 ]
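Uncommented, the Docker-image alternative from this cell would look like the sketch below; the fields mirror the commented cell exactly, with the registry settings left commented since the rapidsai image is public on Docker Hub:

```python
from azureml.core.runconfig import RunConfiguration

# Alternative: reuse the prebuilt RAPIDS image and its 'rapids' conda env
run_config = RunConfiguration()
run_config.framework = 'python'
run_config.environment.python.user_managed_dependencies = True
run_config.environment.python.interpreter_path = '/conda/envs/rapids/bin/python'
run_config.target = gpu_cluster_name
run_config.environment.docker.enabled = True
run_config.environment.docker.gpu_support = True
run_config.environment.docker.base_image = "rapidsai/rapidsai:cuda9.2-runtime-ubuntu18.04"
# run_config.environment.docker.base_image_registry.address = '<registry_url>'  # not needed for Docker Hub images
# run_config.environment.docker.base_image_registry.username = '<user_name>'    # private images only
# run_config.environment.docker.base_image_registry.password = '<password>'     # private images only
run_config.environment.spark.precache_packages = False
run_config.data_references = {'data': data_ref.to_config()}
```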
@@ -551,9 +546,9 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.6"
+"version": "3.6.8"
 }
 },
 "nbformat": 4,
-"nbformat_minor": 2
+"nbformat_minor": 4
 }
