manashgoswami
diff --git a/NBSETUP.md b/NBSETUP.md
Lines changed: 9 additions & 4 deletions
diff --git a/googled8147fb6c0788258.html b/googled8147fb6c0788258.html
Lines changed: 0 additions & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/README.md b/how-to-use-azureml/automated-machine-learning/README.md
Lines changed: 10 additions & 2 deletions
diff --git a/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb b/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb
Lines changed: 2 additions & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb b/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb
Lines changed: 29 additions & 8 deletions
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb
Lines changed: 46 additions & 12 deletions
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb
Lines changed: 41 additions & 1 deletion
diff --git a/how-to-use-azureml/automated-machine-learning/missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.ipynb b/how-to-use-azureml/automated-machine-learning/missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.ipynb
Lines changed: 42 additions & 2 deletions
diff --git a/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb b/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,6 @@
-# Set up your notebook environment for Azure Machine Learning
+# Setting up environment
+
+---
 
 To run the notebooks in this repository, use one of the following options.
 
@@ -10,7 +12,9 @@ Azure Notebooks is a hosted Jupyter-based notebook service in the Azure cloud. A
 1. Follow the instructions in the [Configuration](configuration.ipynb) notebook to create and connect to a workspace
 1. Open one of the sample notebooks
 
-    **Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook by choosing Kernel > Change Kernel > Python 3.6 from the menus.
+    **Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook.
+
+    ![set kernel to Python 3.6](images/python36.png)
 
 ## **Option 2: Use your own notebook server**
 
@@ -54,7 +58,8 @@ Please make sure you start with the [Configuration](configuration.ipynb) noteboo
 
 ### Video walkthrough:
 
-[!VIDEO https://youtu.be/VIsXeTuW3FU]
+[![Get Started video](images/yt_cover.png)](https://youtu.be/VIsXeTuW3FU)
+
 
 ## **Option 3: Use Docker**
 
@@ -98,4 +103,4 @@ pip install azureml-sdk[explain]
 pip install azureml-sdk[contrib]
 ```
 Drag and Drop
-The image will be downloaded by Fatkun
+The image will be downloaded by Fatkun
@@ -211,10 +211,18 @@ The main code of the file must be indented so that it is under this condition.
 <a name="troubleshooting"></a>
 # Troubleshooting
 ## automl_setup fails
-1. On windows, make sure that you are running automl_setup from an Anconda Prompt window rather than a regular cmd window.  You can launch the "Anaconda Prompt" window by hitting the Start button and typing "Anaconda Prompt".  If you don't see the application "Anaconda Prompt", you might not have conda or mini conda installed.  In that case, you can install it [here](https://conda.io/miniconda.html)
+1. On Windows, make sure that you are running automl_setup from an Anaconda Prompt window rather than a regular cmd window.  You can launch the "Anaconda Prompt" window by hitting the Start button and typing "Anaconda Prompt".  If you don't see the application "Anaconda Prompt", you might not have conda or Miniconda installed.  In that case, you can install it [here](https://conda.io/miniconda.html).
 2. Check that you have conda 64-bit installed rather than 32-bit.  You can check this with the command `conda info`.  The `platform` should be `win-64` for Windows or `osx-64` for Mac.
 3. Check that you have conda 4.4.10 or later.  You can check the version with the command `conda -V`.  If you have a previous version installed, you can update it using the command: `conda update conda`.
-4. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`. 
+4. On Linux, if the error is `gcc: error trying to exec 'cc1plus': execvp: No such file or directory`, install build essentials using the command `sudo apt-get install build-essential`.
+5. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`. 
+
+## automl_setup_linux.sh fails
+If `automl_setup_linux.sh` fails on Ubuntu Linux with the error `unable to execute 'gcc': No such file or directory`:
+1. Make sure that outbound ports 53 and 80 are enabled.  On an Azure VM, you can do this from the Azure Portal by selecting the VM and clicking on Networking.
+2. Run the command: `sudo apt-get update`
+3. Run the command: `sudo apt-get install build-essential --fix-missing`
+4. Run `automl_setup_linux.sh` again.
 
 ## configuration.ipynb fails
 1) For local conda, make sure that you have successfully run automl_setup first.
 
@@ -302,7 +302,8 @@
       "source": [
         "from azureml.core.conda_dependencies import CondaDependencies\n",
         "\n",
-        "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
+        "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
+        "                                 pip_packages=['azureml-sdk[automl]'])\n",
         "\n",
         "conda_env_file_name = 'myenv.yml'\n",
         "myenv.save_to_file('.', conda_env_file_name)"
 
@@ -72,6 +72,32 @@
         "from azureml.train.automl import AutoMLConfig"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Accessing the Azure ML workspace requires authentication with Azure.\n",
+        "\n",
+        "The default authentication is interactive authentication using the default tenant.  Executing the `ws = Workspace.from_config()` line in the cell below will prompt for authentication the first time that it is run.\n",
+        "\n",
+        "If you have multiple Azure tenants, you can specify the tenant by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
+        "\n",
+        "```\n",
+        "from azureml.core.authentication import InteractiveLoginAuthentication\n",
+        "auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')\n",
+        "ws = Workspace.from_config(auth = auth)\n",
+        "```\n",
+        "\n",
+        "If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
+        "\n",
+        "```\n",
+        "from azureml.core.authentication import ServicePrincipalAuthentication\n",
+        "auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n",
+        "ws = Workspace.from_config(auth = auth)\n",
+        "```\n",
+        "For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)"
+      ]
+    },
     {
       "cell_type": "code",
       "execution_count": null,
@@ -133,11 +159,10 @@
         "|-|-|\n",
         "|**task**|classification or regression|\n",
         "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
-        "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
-        "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
         "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
         "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
-        "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
+        "|**n_cross_validations**|Number of cross validation splits.|\n",
+        "|<i>Exit Criteria [optional]</i><br><br>iterations<br>experiment_timeout_minutes|An optional parameter that limits how long AutoML runs.<br>This can be either a number of iterations or a number of minutes.<br><br><i>iterations</i> is the number of algorithm iterations to run.<br><i>experiment_timeout_minutes</i> is the number of minutes that AutoML should run.<br><br>By default, AutoML stops when it determines that scores are no longer improving.|"
       ]
     },
     {
@@ -147,14 +172,10 @@
       "outputs": [],
       "source": [
         "automl_config = AutoMLConfig(task = 'classification',\n",
-        "                             debug_log = 'automl_errors.log',\n",
         "                             primary_metric = 'AUC_weighted',\n",
-        "                             iteration_timeout_minutes = 60,\n",
-        "                             iterations = 25,\n",
-        "                             verbosity = logging.INFO,\n",
         "                             X = X_train, \n",
         "                             y = y_train,\n",
-        "                             path = project_folder)"
+        "                             n_cross_validations = 3)"
       ]
     },
     {
 
@@ -37,7 +37,8 @@
         "2. Instantiating AutoMLConfig with new task type \"forecasting\" for timeseries data training, and other timeseries related settings: for this dataset we use the basic one: \"time_column_name\" \n",
         "3. Training the Model using local compute\n",
         "4. Exploring the results\n",
-        "5. Testing the fitted model"
+        "5. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
+        "6. Testing the fitted model"
       ]
     },
     {
@@ -126,7 +127,7 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "### Split the data to train and test\n",
+        "### Get the train data\n",
         "\n"
       ]
     },
@@ -172,14 +173,10 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "X_train = train[train['timeStamp'] < '2017-01-01']\n",
-        "X_valid = train[train['timeStamp'] >= '2017-01-01']\n",
+        "X_train = train\n",
         "y_train = X_train.pop('demand').values\n",
-        "y_valid = X_valid.pop('demand').values\n",
         "print(X_train.shape)\n",
-        "print(y_train.shape)\n",
-        "print(X_valid.shape)\n",
-        "print(y_valid.shape)"
+        "print(y_train.shape)"
       ]
     },
     {
@@ -198,8 +195,7 @@
         "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
         "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
         "|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
-        "|**X_valid**|Data used to evaluate a model in a iteration. (sparse) array-like, shape = [n_samples, n_features]|\n",
-        "|**y_valid**|Data used to evaluate a model in a iteration. (sparse) array-like, shape = [n_samples, ], targets values.|\n",
+        "|**n_cross_validations**|Number of cross validation splits.|\n",
         "|**path**|Relative path to the project folder.  AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
       ]
     },
@@ -222,8 +218,7 @@
         "                             iteration_timeout_minutes = 5,\n",
         "                             X = X_train,\n",
         "                             y = y_train,\n",
-        "                             X_valid = X_valid,\n",
-        "                             y_valid = y_valid,\n",
+        "                             n_cross_validations = 2,\n",
         "                             path=project_folder,\n",
         "                             verbosity = logging.INFO,\n",
         "                            **automl_settings)"
@@ -273,6 +268,45 @@
         "fitted_model.steps"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### View the engineered names for featurized data\n",
+        "Below we display the engineered feature names generated for the featurized data using the time-series featurization."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### View the featurization summary\n",
+        "Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:\n",
+        "- Raw feature name\n",
+        "- Number of engineered features formed out of this raw feature\n",
+        "- Type detected\n",
+        "- If feature was dropped\n",
+        "- List of feature transformations for the raw feature"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
 
@@ -36,7 +36,8 @@
         "1. Create an Experiment in an existing Workspace\n",
         "2. Instantiate an AutoMLConfig \n",
         "3. Find and train a forecasting model using local compute\n",
-        "4. Evaluate the performance of the model\n",
+        "4. View the engineered feature names for the featurized data and the featurization summary for all raw features\n",
+        "5. Evaluate the performance of the model\n",
         "\n",
         "The examples in the follow code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
       ]
@@ -320,6 +321,45 @@
         "fitted_pipeline.steps"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### View the engineered names for featurized data\n",
+        "Below we display the engineered feature names generated for the featurized data using the time-series featurization."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "fitted_pipeline.named_steps['timeseriestransformer'].get_engineered_feature_names()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### View the featurization summary\n",
+        "Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:\n",
+        "- Raw feature name\n",
+        "- Number of engineered features formed out of this raw feature\n",
+        "- Type detected\n",
+        "- If feature was dropped\n",
+        "- List of feature transformations for the raw feature"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "fitted_pipeline.named_steps['timeseriestransformer'].get_featurization_summary()"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
 
@@ -37,8 +37,9 @@
         "In this notebook you will learn how to:\n",
         "1. Create an `Experiment` in an existing `Workspace`.\n",
         "2. Configure AutoML using `AutoMLConfig`.\n",
-        "4. Train the model.\n",
-        "5. Explore the results.\n",
+        "3. Train the model.\n",
+        "4. Explore the results.\n",
+        "5. View the engineered feature names for the featurized data and the featurization summary for all raw features.\n",
         "6. Test the best fitted model.\n",
         "\n",
         "In addition this notebook showcases the following features\n",
@@ -316,6 +317,45 @@
         "# best_run, fitted_model = local_run.get_output(iteration = iteration)"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "#### View the engineered names for featurized data\n",
+        "Below we display the engineered feature names generated for the featurized data using the preprocessing featurization."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "fitted_model.named_steps['datatransformer'].get_engineered_feature_names()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "#### View the featurization summary\n",
+        "Below we display the featurization that was performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:\n",
+        "- Raw feature name\n",
+        "- Number of engineered features formed out of this raw feature\n",
+        "- Type detected\n",
+        "- If feature was dropped\n",
+        "- List of feature transformations for the raw feature"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "fitted_model.named_steps['datatransformer'].get_featurization_summary()"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
 
@@ -305,7 +305,7 @@
         "from azureml.train.automl.automlexplainer import explain_model\n",
         "\n",
         "shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n",
-        "    explain_model(fitted_model, X_train, X_test)"
+        "    explain_model(fitted_model, X_train, X_test, features=features)"
       ]
     },
     {