69 | 69 |   "metadata": {},
70 | 70 |   "outputs": [],
71 | 71 |   "source": [
72 |    | - "import json\n",
73 | 72 |   "import logging\n",
74 | 73 |   "\n",
75 | 74 |   "from matplotlib import pyplot as plt\n",
76 |    | - "import numpy as np\n",
77 | 75 |   "import pandas as pd\n",
78 | 76 |   "import os\n",
79 |    | - "from sklearn import datasets\n",
80 |    | - "import azureml.dataprep as dprep\n",
81 |    | - "from sklearn.model_selection import train_test_split\n",
82 | 77 |   "\n",
83 | 78 |   "import azureml.core\n",
84 | 79 |   "from azureml.core.experiment import Experiment\n",
85 | 80 |   "from azureml.core.workspace import Workspace\n",
86 |    | - "from azureml.train.automl import AutoMLConfig\n",
87 |    | - "from azureml.train.automl.run import AutoMLRun"
   | 81 | + "from azureml.core.dataset import Dataset\n",
   | 82 | + "from azureml.train.automl import AutoMLConfig"
88 | 83 |   ]
89 | 84 | },
90 | 85 | {
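For context, a minimal sketch of how the post-change import block is typically exercised together; the config lookup and experiment name below are illustrative assumptions, not taken from this notebook:

    import logging

    import azureml.core
    from azureml.core.experiment import Experiment
    from azureml.core.workspace import Workspace
    from azureml.core.dataset import Dataset
    from azureml.train.automl import AutoMLConfig

    # Workspace.from_config() reads config.json from the working directory by default.
    ws = Workspace.from_config()
    experiment = Experiment(ws, 'automl-bankmarketing')  # hypothetical experiment name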
⋮
155 | 150 |   "    # Create the cluster.\n",
156 | 151 |   "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
157 | 152 |   " \n",
158 |     | - "    # Can poll for a minimum number of nodes and for a specific timeout.\n",
159 |     | - "    # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
160 |     | - "    compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
    | 153 | + "print('Checking cluster status...')\n",
    | 154 | + "# Can poll for a minimum number of nodes and for a specific timeout.\n",
    | 155 | + "# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
    | 156 | + "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
161 | 157 |   " \n",
162 |     | - "    # For a more detailed view of current AmlCompute status, use get_status()."
    | 158 | + "# For a more detailed view of current AmlCompute status, use get_status()."
163 | 159 |   ]
164 | 160 | },
165 | 161 | {
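A hedged sketch of the surrounding provisioning pattern this hunk plugs into, assuming the usual AmlCompute flow; the cluster name and VM size are illustrative:

    from azureml.core.compute import AmlCompute, ComputeTarget

    amlcompute_cluster_name = 'automlcl'  # illustrative name
    if amlcompute_cluster_name in ws.compute_targets:
        compute_target = ws.compute_targets[amlcompute_cluster_name]
    else:
        provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                                    max_nodes=4)
        # Create the cluster.
        compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)

    print('Checking cluster status...')
    # De-indented as in the hunk above, the wait now runs whether the cluster
    # was just created or already existed.
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    print(compute_target.get_status().serialize())  # detailed view of current state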
⋮
200 | 196 |   "# Set compute target to AmlCompute\n",
201 | 197 |   "conda_run_config.target = compute_target\n",
202 | 198 |   "conda_run_config.environment.docker.enabled = True\n",
203 |     | - "conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
204 |     | - "\n",
205 |     | - "dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
206 | 199 |   "\n",
207 |     | - "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
    | 200 | + "cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
208 | 201 |   "conda_run_config.environment.python.conda_dependencies = cd"
209 | 202 |   ]
210 | 203 | },
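For reference, a self-contained version of the trimmed run configuration; with TabularDataset replacing dataprep, the azureml-dataprep pip pin and the base-image override are no longer set here:

    from azureml.core.runconfig import RunConfiguration
    from azureml.core.conda_dependencies import CondaDependencies

    conda_run_config = RunConfiguration(framework='python')
    conda_run_config.target = compute_target
    conda_run_config.environment.docker.enabled = True

    # Conda packages only; no pip_packages argument is passed anymore.
    cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])
    conda_run_config.environment.python.conda_dependencies = cd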
⋮
224 | 217 |   "outputs": [],
225 | 218 |   "source": [
226 | 219 |   "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n",
227 |     | - "dflow = dprep.read_csv(data, infer_column_types=True)\n",
228 |     | - "dflow.get_profile()\n",
229 |     | - "X_train = dflow.drop_columns(columns=['y'])\n",
230 |     | - "y_train = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
231 |     | - "dflow.head()"
    | 220 | + "dataset = Dataset.Tabular.from_delimited_files(data)\n",
    | 221 | + "X_train = dataset.drop_columns(columns=['y'])\n",
    | 222 | + "y_train = dataset.keep_columns(columns=['y'], validate=True)\n",
    | 223 | + "dataset.take(5).to_pandas_dataframe()"
232 | 224 |   ]
233 | 225 | },
234 | 226 | {
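Downstream, these lazy TabularDatasets feed AutoMLConfig directly. A hedged sketch in the X/y style this SDK vintage accepted; the metric, iteration count, and submit call are illustrative, not lifted from the notebook:

    automl_config = AutoMLConfig(task='classification',
                                 primary_metric='AUC_weighted',
                                 X=X_train,
                                 y=y_train,
                                 iterations=10,
                                 run_configuration=conda_run_config)
    remote_run = experiment.submit(automl_config, show_output=True)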
⋮
406 | 398 |   "def run(rawdata):\n",
407 | 399 |   "    try:\n",
408 | 400 |   "        data = json.loads(rawdata)['data']\n",
409 |     | - "        data = numpy.array(data)\n",
    | 401 | + "        data = np.array(data)\n",
410 | 402 |   "        result = model.predict(data)\n",
411 | 403 |   "    except Exception as e:\n",
412 | 404 |   "        result = str(e)\n",
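This fix presupposes that the generated scoring script imports numpy under the np alias. A minimal sketch of the shape such a score script takes; the model name is a hypothetical placeholder:

    import json
    import numpy as np
    from azureml.core.model import Model
    from sklearn.externals import joblib  # joblib lived here in scikit-learn of this era

    def init():
        global model
        model_path = Model.get_model_path(model_name='automl_model')  # hypothetical name
        model = joblib.load(model_path)

    def run(rawdata):
        try:
            data = json.loads(rawdata)['data']
            data = np.array(data)  # np, matching the alias imported above
            result = model.predict(data)
        except Exception as e:
            return json.dumps({'error': str(e)})
        return json.dumps({'result': result.tolist()})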
⋮
443 | 435 |   "metadata": {},
444 | 436 |   "outputs": [],
445 | 437 |   "source": [
446 |     | - "for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
    | 438 | + "for p in ['azureml-train-automl', 'azureml-core']:\n",
447 | 439 |   "    print('{}\\t{}'.format(p, dependencies[p]))"
448 | 440 |   ]
449 | 441 | },
⋮
453 | 445 |   "metadata": {},
454 | 446 |   "outputs": [],
455 | 447 |   "source": [
456 |     | - "from azureml.core.conda_dependencies import CondaDependencies\n",
457 |     | - "\n",
458 | 448 |   "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
459 |     | - "                                 pip_packages=['azureml-sdk[automl]'])\n",
    | 449 | + "                                 pip_packages=['azureml-train-automl'])\n",
460 | 450 |   "\n",
461 | 451 |   "conda_env_file_name = 'myenv.yml'\n",
462 | 452 |   "myenv.save_to_file('.', conda_env_file_name)"
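Downstream, myenv.yml is what the container image build consumes; a sketch of the typical wiring, where the score.py file name is an assumption rather than something shown in this diff:

    from azureml.core.image import ContainerImage

    image_config = ContainerImage.image_configuration(execution_script='score.py',  # assumed name
                                                      runtime='python',
                                                      conda_file=conda_env_file_name)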
⋮
476 | 466 |   "    content = cefr.read()\n",
477 | 467 |   "\n",
478 | 468 |   "with open(conda_env_file_name, 'w') as cefw:\n",
479 |     | - "    cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
    | 469 | + "    cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
480 | 470 |   "\n",
481 | 471 |   "# Substitute the actual model id in the script file.\n",
482 | 472 |   "\n",
⋮
618 | 608 |   "outputs": [],
619 | 609 |   "source": [
620 | 610 |   "# Load the bank marketing datasets.\n",
621 |     | - "from sklearn.datasets import load_diabetes\n",
622 |     | - "from sklearn.model_selection import train_test_split\n",
623 | 611 |   "from numpy import array"
624 | 612 |   ]
625 | 613 | },
⋮
630 | 618 |   "outputs": [],
631 | 619 |   "source": [
632 | 620 |   "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv\"\n",
633 |     | - "dflow = dprep.read_csv(data, infer_column_types=True)\n",
634 |     | - "dflow.get_profile()\n",
635 |     | - "X_test = dflow.drop_columns(columns=['y'])\n",
636 |     | - "y_test = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
637 |     | - "dflow.head()"
    | 621 | + "dataset = Dataset.Tabular.from_delimited_files(data)\n",
    | 622 | + "X_test = dataset.drop_columns(columns=['y'])\n",
    | 623 | + "y_test = dataset.keep_columns(columns=['y'], validate=True)\n",
    | 624 | + "dataset.take(5).to_pandas_dataframe()"
638 | 625 |   ]
639 | 626 | },
640 | 627 | {