diff --git a/.gitignore b/.gitignore
index 1c08d93..da30c8e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -9,5 +9,7 @@ docs
 *.pdf
 *_notes_files/
 .quarto
+.vdoc.py
+.vdoc.r
 !sidebars-toggle.html
diff --git a/08_main.qmd b/08_main.qmd
index 01e8618..831bed9 100644
--- a/08_main.qmd
+++ b/08_main.qmd
@@ -2,6 +2,10 @@
 ## Learning Objectives
 
-- item 1
-- item 2
-- item 3
+- Use **decision trees** to model relationships between predictors and an outcome
+- Compare and contrast tree-based models with other model types
+- Use **tree-based ensemble methods** to improve predictive models
+- Compare and contrast the various methods of building tree ensembles: bagging, boosting, random forests, and Bayesian Additive Regression Trees (BART)
+
+Sources:
+https://github.com/JauntyJJS/islr2-bookclub-cohort3-chapter8, https://hastie.su.domains/ISLR2/Slides/Ch8_Tree_Based_Methods.pdf
diff --git a/08_notes.qmd b/08_notes.qmd
index 7a77083..20dfe88 100644
--- a/08_notes.qmd
+++ b/08_notes.qmd
@@ -1,2 +1,417 @@
-## Notes {.unnumbered}
+# Notes {-}
+
+## Introduction: Tree-based methods
+
+- Involve **stratifying** or **segmenting** the predictor space into a number of simple regions
+- Are simple and useful for interpretation
+- However, basic decision trees are NOT competitive with the best supervised learning approaches in terms of prediction accuracy
+- Thus, we also discuss **bagging**, **random forests**, and **boosting** (i.e., tree-based ensemble methods) to grow multiple trees which are then combined to yield a single consensus prediction
+- These can result in dramatic improvements in prediction accuracy (but some loss of interpretability)
+- Can be applied to both regression and classification
+
+## Regression Trees
+
+First, let's take a look at the `Hitters` dataset. 
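+Before looking at the data, here is a minimal sketch of how the regression tree discussed in this section could be grown in R. The choice of the `rpart` package is an assumption (the `tree` package would also work), and the chunk repeats the preprocessing done below so that it is self-contained:
+
+```{r}
+#| label: 08-rpart-sketch
+#| eval: false
+library(rpart)
+library(dplyr)
+library(tidyr)
+library(readr)
+
+# Drop players with missing Salary and log-transform, as in the chunk below
+hitters <- read_csv('./data/Hitters.csv') %>%
+  drop_na(Salary) %>%
+  mutate(log_Salary = log(Salary))
+
+# Grow a regression tree by recursive binary splitting (RSS criterion)
+fit <- rpart(log_Salary ~ Years + Hits, data = hitters)
+fit
+```
+
+Printing `fit` lists each split and the mean response in each terminal node, i.e., the same information as the tree diagram shown below.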
+```{r}
+#| label: 08-hitters-dataset
+#| echo: false
+library(dplyr)
+library(tidyr)
+library(readr)
+
+df <- read_csv('./data/Hitters.csv') %>%
+  select(Names, Hits, Years, Salary) %>%
+  drop_na() %>%
+  mutate(log_Salary = log(Salary))
+
+df
+```
+
+```{r}
+#| label: 08-reg-trees-intro
+#| echo: false
+#| out-width: 100%
+knitr::include_graphics("images/08_1_salary_data.png")
+
+knitr::include_graphics("images/08_2_basic_tree.png")
+```
+
+- For the Hitters data, a regression tree for predicting the log salary of a baseball player based on:
+
+    1. number of years that he has played in the major leagues
+    2. number of hits that he made in the previous year
+
+## Terminology
+
+```{r}
+#| label: 08-decision-trees-terminology-1
+#| echo: false
+#| out-width: 100%
+knitr::include_graphics("images/08_3_basic_tree_term.png")
+```
+
+```{r}
+#| label: 08-decision-trees-terminology-2
+#| echo: false
+#| fig-cap: The three-region partition for the Hitters data set from the regression tree
+#| out-width: 100%
+knitr::include_graphics("images/08_4_hitters_predictor_space.png")
+```
+
+- Overall, the tree stratifies or segments the players into three regions of predictor space:
+    - R1 = {X \| Years\< 4.5}
+    - R2 = {X \| Years\>=4.5, Hits\<117.5}
+    - R3 = {X \| Years\>=4.5, Hits\>=117.5}
+
+    where R1, R2, and R3 are **terminal nodes** (leaves) and green lines (where the predictor space is split) are the **internal nodes**
+
+- The number in each leaf/terminal node is the mean of the response for the observations that fall there
+
+## Interpretation of results: regression tree (Hitters data)
+
+```{r}
+#| label: 08-reg-trees-interpretation
+#| echo: false
+#| out-width: 100%
+knitr::include_graphics("images/08_2_basic_tree.png")
+```
+
+1. `Years` is the most important factor in determining `Salary`: players with less experience earn lower salaries than more experienced players
+2. 
Given that a player is less experienced, the number of `Hits` that he made in the previous year seems to play little role in his `Salary`
+3. But among players who have been in the major leagues for 5 or more years, the number of Hits made in the previous year does affect Salary: players who made more Hits last year tend to have higher salaries
+4. This is surely an over-simplification, but compared to a regression model, it is easy to display, interpret, and explain
+
+## Tree-building process (regression)
+
+1. Divide the predictor space --- that is, the set of possible values for $X_1,X_2, . . . ,X_p$ --- into $J$ distinct and **non-overlapping** regions, $R_1,R_2, . . . ,R_J$
+    - Regions can have ANY shape - they don't have to be boxes
+2. For every observation that falls into the region $R_j$, we make the same prediction: the **mean** of the response values in $R_j$
+3. The goal is to find regions (here boxes) $R_1, . . . ,R_J$ that **minimize** the RSS, given by
+
+$$\mathrm{RSS}=\sum_{j=1}^{J}\sum_{i \in R_j}(y_i - \hat{y}_{R_j})^2$$
+
+where $\hat{y}_{R_j}$ is the **mean** response for the training observations within the $j$th box
+
+- Unfortunately, it is **computationally infeasible** to consider every possible partition of the feature space into $J$ boxes.
+
+## Recursive binary splitting
+
+So, we take a top-down, greedy approach known as recursive binary splitting:
+
+- **top-down** because it begins at the top of the tree and then successively splits the predictor space
+- **greedy** because at each step of the tree-building process, the best split is made at that particular step, rather than looking ahead and picking a split that will lead to a better tree in some future step
+
+1. 
First, select the predictor $X_j$ and the cutpoint $s$ such that splitting the predictor space into the regions $\{X|X_j < s\}$ and $\{X|X_j \geq s\}$ leads to the greatest possible reduction in RSS
+
+## Bayesian Additive Regression Trees (BART)
+
+- In the $b$th iteration, to update the $k$th tree, we subtract from each response value the predictions from all *but* the $k$th tree, obtaining a **partial residual**
+
+$r_i = y_i - \sum_{k'<k}\hat{f}^{b}_{k'}(x_i) - \sum_{k'>k}\hat{f}^{b-1}_{k'}(x_i)$
+
+for the $i$th observation, $i = 1, …, n$
+
+- Rather than fitting a new tree to this partial residual, BART chooses a perturbation to the tree from a previous iteration $\hat{f}^{b-1}_{k}$ favoring perturbations that improve the fit to the partial residual
+- To perturb trees:
+    - change the structure of the tree by adding/pruning branches
+    - change the prediction in each terminal node of the tree
+- The output of BART is a collection of prediction models:
+
+$\hat{f}^b(x) = \sum_{k=1}^{K}\hat{f}^b_k(x)$
+
+for $b = 1, 2,…, B$
+
+## BART algorithm: figure
+
+```{r}
+#| label: 08-bart-algo
+#| echo: false
+#| out-width: 100%
+knitr::include_graphics("images/08_12_bart_algorithm.png")
+```
+
+- **Comment**: the first few prediction models obtained in the earlier iterations (known as the *burn-in* period; denoted by $L$) are typically thrown away since they tend to not provide very good results, like you throw away the first pancake of the batch
+
+## BART: additional details
+
+- A key element of BART is that a fresh tree is NOT fit to the current partial residual: instead, we improve the fit to the current partial residual by slightly modifying the tree obtained in the previous iteration (Step 3(a)ii)
+- This guards against overfitting since it limits how "hard" the data is fit in each iteration
+- Additionally, the individual trees are typically pretty small
+- BART, as the name suggests, can be viewed as a *Bayesian* approach to fitting an ensemble of trees:
+    - each time a tree is randomly perturbed to fit the residuals = drawing a new tree from a *posterior* distribution
+
+## To apply BART:
+
+- We must select the number of trees $K$, the number of iterations $B$ and the number of burn-in iterations $L$
+- Typically, large values are chosen for $B$ and $K$ and a moderate value for $L$: e.g. 
$K$ = 200, $B$ = 1,000 and $L$ = 100 +- BART has been shown to have impressive out-of-box performance - i.e., it performs well with minimal tuning diff --git a/_freeze/03_notes/execute-results/html.json b/_freeze/03_notes/execute-results/html.json index 09e50a6..a753dd4 100644 --- a/_freeze/03_notes/execute-results/html.json +++ b/_freeze/03_notes/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "3e85649170309cf3a5e235e9d4c6c1b8", + "hash": "08bcf4f45674dfc276d50a821a693bbd", "result": { - "markdown": "# Notes {-}\n\n## Questions to Answer\n\nRecall the `Advertising` data from **Chapter 2**. Here are a few important questions that we might seek to address:\n\n1. **Is there a relationship between advertising budget and sales?**\n2. **How strong is the relationship between advertising budget and sales?** Does knowledge of the advertising budget provide a lot of information about product sales?\n3. **Which media are associated with sales?**\n4. **How large is the association between each medium and sales?** For every dollar spent on advertising in a particular medium, by what amount will sales increase? \n5. **How accurately can we predict future sales?**\n6. **Is the relationship linear?** If there is approximately a straight-line relationship between advertising expenditure in the various media and sales, then linear regression is an appropriate tool. If not, then it may still be possible to transform the predictor or the response so that linear regression can be used.\n7. 
**Is there synergy among the advertising media?** Or, in stats terms, is there an interaction effect?\n\n## Simple Linear Regression: Definition\n\n**Simple linear regression:** Very straightforward approach to predicting response $Y$ on predictor $X$.\n\n\n$$Y \\approx \\beta_{0} + \\beta_{1}X$$\n\n\n- Read \"$\\approx$\" as *\"is approximately modeled by.\"*\n- $\\beta_{0}$ = intercept\n- $\\beta_{1}$ = slope\n\n\n$$\\hat{y} = \\hat{\\beta}_{0} + \\hat{\\beta}_{1}x$$\n\n\n- $\\hat{\\beta}_{0}$ = our approximation of intercept\n- $\\hat{\\beta}_{1}$ = our approximation of slope\n- $x$ = sample of $X$\n- $\\hat{y}$ = our prediction of $Y$ from $x$\n- hat symbol denotes \"estimated value\" \n\n- Linear regression is a simple approach to supervised learning\n\n## Simple Linear Regression: Visualization\n\n\n::: {.cell}\n::: {.cell-output-display}\n![For the `Advertising` data, the least squares fit for the regression of `sales` onto `TV` is shown. The fit is found by minimizing the residual sum of squares. Each grey line segment represents a residual. 
In this case a linear fit captures the essence of the relationship, although it overestimates the trend in the left of the plot.](images/fig3_1.jpg){width=100%}\n:::\n:::\n\n\n## Simple Linear Regression: Math\n\n- **RSS** = *residual sum of squares*\n\n\n$$\\mathrm{RSS} = e^{2}_{1} + e^{2}_{2} + \\ldots + e^{2}_{n}$$\n\n$$\\mathrm{RSS} = (y_{1} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{1})^{2} + (y_{2} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{2})^{2} + \\ldots + (y_{n} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{n})^{2}$$\n\n$$\\mathrm{RSS} = (y_{1} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{1})^{2} + (y_{2} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{2})^{2} + \\ldots + (y_{n} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{n})^{2}$$\n\n$$\\hat{\\beta}_{0} = \\bar{y} - \\hat{\\beta}_{1}\\bar{x}$$\n\n\n- $\\bar{x}$, $\\bar{y}$ = sample means of $x$ and $y$\n\n### Visualization of Fit\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Contour and three-dimensional plots of the RSS on the `Advertising` data, using `sales` as the response and `TV` as the predictor. 
The red dots correspond to the least squares estimates $\\\\hat\\\\beta_0$ and $\\\\hat\\\\beta_1$, given by (3.4).](images/fig3_2.jpg){width=100%}\n:::\n:::\n\n\n**Learning Objectives:**\n\n- Perform linear regression with a **single predictor variable.**\n\n## Assessing Accuracy of Coefficient Estimates\n\n\n$$Y = \\beta_{0} + \\beta_{1}X + \\epsilon$$\n\n\n- **RSE** = *residual standard error*\n- Estimate of $\\sigma$\n\n\n$$\\mathrm{RSE} = \\sqrt{\\frac{\\mathrm{RSS}}{n - 2}}$$\n\n$$\\mathrm{SE}(\\hat\\beta_0)^2 = \\sigma^2 \\left[\\frac{1}{n} + \\frac{\\bar{x}^2}{\\sum_{i=1}^n (x_i - \\bar{x})^2}\\right],\\ \\ \\mathrm{SE}(\\hat\\beta_1)^2 = \\frac{\\sigma^2}{\\sum_{i=1}^n (x_i - \\bar{x})^2}$$\n\n\n- **95% confidence interval:** a range of values such that with 95% probability, the range will contain the true unknown value of the parameter\n - If we take repeated samples and construct the confidence interval for each sample, 95% of the intervals will contain the true unknown value of the parameter\n\n\n$$\\hat\\beta_1 \\pm 2\\ \\cdot \\ \\mathrm{SE}(\\hat\\beta_1)$$\n\n$$\\hat\\beta_0 \\pm 2\\ \\cdot \\ \\mathrm{SE}(\\hat\\beta_0)$$\n\n\n**Learning Objectives:**\n\n- Estimate the **standard error** of regression coefficients.\n\n## Assessing the Accuracy of the Model\n\n- **RSE** can be considered a measure of the *lack of fit* of the model. 
\na\n- *$R^2$* statistic (also called coefficient of determination) provides an alternative that is in the form of a *proportion of the variance explained*, ranges from 0 to 1, a *good value* depends on the application.\n\n\n$$R^2 = 1 - \\frac{RSS}{TSS}$$\n\n\nwhere TSS is the *total sum of squarse*:\n\n$$TSS = \\Sigma (y_i - \\bar{y})^2$$\n\n\nQuiz: Can *$R^2$* be negative?\n\n[Answer](https://www.graphpad.com/support/faq/how-can-rsup2sup-be-negative/)\n\n## Multiple Linear Regression\n\n**Multiple linear regression** extends simple linear regression for *p* predictors:\n\n\n$$Y = \\beta_{0} + \\beta_{1}X_1 + \\beta_{2}X_2 + ... +\\beta_{p}X_p + \\epsilon_i$$\n\n\n- $\\beta_{j}$ is the *average* effect on $Y$ from $X_{j}$ holding all other predictors fixed. \n\n- Fit is once again choosing the $\\beta_{j}$ that minimizes the RSS.\n\n- Example in book shows that although fitting *sales* against *newspaper* alone indicated a significant slope (0.055 +- 0.017), when you include *radio* in a multiple regression, *newspaper* no longer has any significant effect. (-0.001 +- 0.006) \n\n### Important Questions\n\n1. *Is at least one of the predictors $X_1$, $X_2$, ... , $X_p$ useful in predicting\nthe response?*\n\n F statistic close to 1 when there is no relationship, otherwise greater then 1.\n\n\n$$F = \\frac{(TSS-RSS)/p}{RSS/(n-p-1)}$$\n\n\n2. *Do all the predictors help to explain $Y$ , or is only a subset of the\npredictors useful?*\n\n p-values can help identify important predictors, but it is possible to be mislead by this especially with large number of predictors. Variable selection methods include Forward selection, backward selection and mixed. Topic is continued in Chapter 6.\n\n3. *How well does the model fit the data?*\n\n **$R^2$** still gives *proportion of the variance explained*, so look for values \"close\" to 1. Can also look at **RSE** which is generalized for multiple regression as:\n \n\n$$RSE = \\sqrt{\\frac{1}{n-p-1}RSS}$$\n\n\n4. 
*Given a set of predictor values, what response value should we predict,\nand how accurate is our prediction?* \n\n Three sets of uncertainty in predictions:\n \n * Uncertainty in the estimates of $\\beta_i$\n * Model bias\n * Irreducible error $\\epsilon$\n\n## Qualitative Predictors\n\n* Dummy variables: if there are $k$ levels, introduce $k-1$ dummy variables which are equal to one (\"one hot\") when the underlying qualitative predictor takes that value. For example if there are 3 levels, introduce two new dummy variables and fit the model:\n\n\n$$y_i = \\beta_0 + \\beta_1 x_{i1} + \\beta_2 x_{i2} + \\epsilon_i$$\n\n\n| Qualitative Predicitor | $x_{i1}$ | $x_{i2}$ |\n| ---------------------- |:--------:|:--------:|\n| level 0 (baseline) | 0 | 0 |\n| level 1 | 1 | 0 |\n| level 2 | 0 | 1 |\n\n* Coefficients are interpreted the average effect relative to the baseline.\n\n* Alternative is to use index variables, a different coefficient for each level:\n\n\n$$y_i = \\beta_{0 1} + \\beta_{0 2} +\\beta_{0 3} + \\epsilon_i$$\n\n\n## Extensions\n\n- Interaction / Synergy effects\n \n Include a product term to account for synergy where one changes in one variable changes the association of the Y with another:\n \n\n$$Y = \\beta_{0} + \\beta_{1}X_1 + \\beta_{2}X_2 + \\beta_{3}X_1 X_2 + \\epsilon_i$$\n\n\n- Non-linear relationships (e.g. polynomial fits)\n\n\n$$Y = \\beta_{0} + \\beta_{1}X + \\beta_{2}X^2 + ... \\beta_{n}X^n + \\epsilon_i$$\n\n\n## Potential Problems\n\n1. *Non-linear relationships* \n\n Residual plots are useful tool to see if any remaining trends exist. If so consider fitting transformation of the data. \n \n2. *Correlation of Error Terms*\n\n Linear regression assumes that the error terms $\\epsilon_i$ are uncorrelated. Residuals may indicate that this is not correct (obvious *tracking* in the data). One could also look at the autocorrelation of the residuals. What to do about it?\n \n3. 
*Non-constant variance of error terms*\n\n Again this can be revealed by examining the residuals. Consider transformation of the predictors to remove non-constant variance. The figure below shows residuals demonstrating non-constant variance, and shows this being mitigated to a great extent by log transforming the data.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Figure 3.11](images/fig3_11.png){width=100%}\n:::\n:::\n\n\n4. *Outliers*\n\n - Outliers are points with for which $y_i$ is far from value predicted by the model (including irreducible error). See point labeled '20' in figure 3.13.\n - Detect outliers by plotting studentized residuals (residual $e_i$ divided by the estimated error) and look for residuals larger then 3 standard deviations in absolute value.\n - An outlier may not effect the fit much but can have dramatic effect on the **RSE**. \n - Often outliers are mistakes in data collection and can be removed, but could also be an indicator of a deficient model. \n\n5. *High Leverage Points* \n\n - These are points with unusual values of $x_i$. Examples is point labeled '41' in figure 3.13.\n - These points can have large impact on the fit, as in the example, including point 41 pulls slope up significantly.\n - Use *leverage statistic* to identify high leverage points, which can be hard to identify in multiple regression.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Figure 3.13](images/fig3_13.png){width=100%}\n:::\n:::\n\n\n6. *Collinearity*\n\n - Two or more predictor variables are closely related to one another.\n - Simple collinearity can be identified by looking at correlations between predictors. \n - Causes the standard error to grow (and p-values to grow)\n - Often can be dealt with by removing one of the highly correlated predictors or combining them. \n - *Multicollinearity* (involving 3 or more predictors) is not so easy to identify. 
Use *Variance inflation factor*, which is the ratio of the variance of $\\hat{\\beta_j}$ when fitting the full model to fitting the parameter on its own. Can be computed using the formula:\n \n\n$$VIF(\\hat{\\beta_j}) = \\frac{1}{1-R^2_{X_j|X_{-j}}}$$\n\n\nwhere $R^2_{X_j|X_{-j}}$ is the $R^2$ from a regression of $X_j$ onto all the other predictors.\n\n## Answers to the Marketing Plan questions\n\n1. **Is there a relationship between advertising budget and sales?**\n\n Tool: Multiple regression, look at F-statistic.\n\n2. **How strong is the relationship between advertising budget and sales?** \n\n Tool: **$R^2$** and **RSE**\n \n3. **Which media are associated with sales?**\n \n Tool: p-values for each predictor's *t-statistic*. Explored further in chapter 6.\n\n4. **How large is the association between each medium and sales?**\n\n Tool: Confidence intervals on $\\hat{\\beta_j}$\n\n5. **How accurately can we predict future sales?**\n\n Tool:: Prediction intervals for individual response, confidence intervals for average response.\n \n \n6. **Is the relationship linear?** \n\n Tool: Residual Plots\n \n7. **Is there synergy among the advertising media?** \n\n Tool: Interaction terms and associated p-vales.\n\n## Comparison of Linear Regression with K-Nearest Neighbors\n\n- This section examines the K-nearest neighbor (KNN) method (a non-parameteric method).\n- This is essentially a k-point moving average.\n- This serves to illustrate the Bias-Variance trade-off nicely.\n\n", + "markdown": "# Notes {-}\n\n## Questions to Answer\n\nRecall the `Advertising` data from **Chapter 2**. Here are a few important questions that we might seek to address:\n\n1. **Is there a relationship between advertising budget and sales?**\n2. **How strong is the relationship between advertising budget and sales?** Does knowledge of the advertising budget provide a lot of information about product sales?\n3. **Which media are associated with sales?**\n4. 
**How large is the association between each medium and sales?** For every dollar spent on advertising in a particular medium, by what amount will sales increase? \n5. **How accurately can we predict future sales?**\n6. **Is the relationship linear?** If there is approximately a straight-line relationship between advertising expenditure in the various media and sales, then linear regression is an appropriate tool. If not, then it may still be possible to transform the predictor or the response so that linear regression can be used.\n7. **Is there synergy among the advertising media?** Or, in stats terms, is there an interaction effect?\n\n## Simple Linear Regression: Definition\n\n**Simple linear regression:** Very straightforward approach to predicting response $Y$ on predictor $X$.\n\n\n$$Y \\approx \\beta_{0} + \\beta_{1}X$$\n\n\n- Read \"$\\approx$\" as *\"is approximately modeled by.\"*\n- $\\beta_{0}$ = intercept\n- $\\beta_{1}$ = slope\n\n\n$$\\hat{y} = \\hat{\\beta}_{0} + \\hat{\\beta}_{1}x$$\n\n\n- $\\hat{\\beta}_{0}$ = our approximation of intercept\n- $\\hat{\\beta}_{1}$ = our approximation of slope\n- $x$ = sample of $X$\n- $\\hat{y}$ = our prediction of $Y$ from $x$\n- hat symbol denotes \"estimated value\" \n\n- Linear regression is a simple approach to supervised learning\n\n## Simple Linear Regression: Visualization\n\n\n::: {.cell}\n::: {.cell-output-display}\n![For the `Advertising` data, the least squares fit for the regression of `sales` onto `TV` is shown. The fit is found by minimizing the residual sum of squares. Each grey line segment represents a residual. 
In this case a linear fit captures the essence of the relationship, although it overestimates the trend in the left of the plot.](images/fig3_1.jpg){width=100%}\n:::\n:::\n\n\n## Simple Linear Regression: Math\n\n- **RSS** = *residual sum of squares*\n\n\n$$\\mathrm{RSS} = e^{2}_{1} + e^{2}_{2} + \\ldots + e^{2}_{n}$$\n\n$$\\mathrm{RSS} = (y_{1} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{1})^{2} + (y_{2} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{2})^{2} + \\ldots + (y_{n} - \\hat{\\beta}_{0} - \\hat{\\beta}_{1}x_{n})^{2}$$\n\n$$\\hat{\\beta}_{1} = \\frac{\\sum_{i=1}^{n}{(x_{i}-\\bar{x})(y_{i}-\\bar{y})}}{\\sum_{i=1}^{n}{(x_{i}-\\bar{x})^{2}}}$$\n\n$$\\hat{\\beta}_{0} = \\bar{y} - \\hat{\\beta}_{1}\\bar{x}$$\n\n\n- $\\bar{x}$, $\\bar{y}$ = sample means of $x$ and $y$\n\n### Visualization of Fit\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Contour and three-dimensional plots of the RSS on the `Advertising` data, using `sales` as the response and `TV` as the predictor. The red dots correspond to the least squares estimates $\\\\hat\\\\beta_0$ and $\\\\hat\\\\beta_1$, given by (3.4).](images/fig3_2.jpg){width=100%}\n:::\n:::\n\n\n**Learning Objectives:**\n\n- Perform linear regression with a **single predictor variable.**\n\n## Assessing Accuracy of Coefficient Estimates\n\n\n$$Y = \\beta_{0} + \\beta_{1}X + \\epsilon$$\n\n\n- **RSE** = *residual standard error*\n- Estimate of $\\sigma$\n\n\n$$\\mathrm{RSE} = \\sqrt{\\frac{\\mathrm{RSS}}{n - 2}}$$\n\n$$\\mathrm{SE}(\\hat\\beta_0)^2 = \\sigma^2 \\left[\\frac{1}{n} + \\frac{\\bar{x}^2}{\\sum_{i=1}^n (x_i - \\bar{x})^2}\\right],\\ \\ \\mathrm{SE}(\\hat\\beta_1)^2 = \\frac{\\sigma^2}{\\sum_{i=1}^n (x_i - \\bar{x})^2}$$\n\n\n- **95% confidence interval:** a range of values such that with 95% probability, the range will contain the true unknown value of the parameter\n - If we take repeated samples and construct the confidence interval for each sample, 95% of the intervals will contain the true unknown value of the 
parameter\n\n\n$$\\hat\\beta_1 \\pm 2\\ \\cdot \\ \\mathrm{SE}(\\hat\\beta_1)$$\n\n$$\\hat\\beta_0 \\pm 2\\ \\cdot \\ \\mathrm{SE}(\\hat\\beta_0)$$\n\n\n**Learning Objectives:**\n\n- Estimate the **standard error** of regression coefficients.\n\n## Assessing the Accuracy of the Model\n\n- **RSE** can be considered a measure of the *lack of fit* of the model. \na\n- *$R^2$* statistic (also called coefficient of determination) provides an alternative that is in the form of a *proportion of the variance explained*, ranges from 0 to 1, a *good value* depends on the application.\n\n\n$$R^2 = 1 - \\frac{RSS}{TSS}$$\n\n\nwhere TSS is the *total sum of squarse*:\n\n$$TSS = \\Sigma (y_i - \\bar{y})^2$$\n\n\nQuiz: Can *$R^2$* be negative?\n\n[Answer](https://www.graphpad.com/support/faq/how-can-rsup2sup-be-negative/)\n\n## Multiple Linear Regression\n\n**Multiple linear regression** extends simple linear regression for *p* predictors:\n\n\n$$Y = \\beta_{0} + \\beta_{1}X_1 + \\beta_{2}X_2 + ... +\\beta_{p}X_p + \\epsilon_i$$\n\n\n- $\\beta_{j}$ is the *average* effect on $Y$ from $X_{j}$ holding all other predictors fixed. \n\n- Fit is once again choosing the $\\beta_{j}$ that minimizes the RSS.\n\n- Example in book shows that although fitting *sales* against *newspaper* alone indicated a significant slope (0.055 +- 0.017), when you include *radio* in a multiple regression, *newspaper* no longer has any significant effect. (-0.001 +- 0.006) \n\n### Important Questions\n\n1. *Is at least one of the predictors $X_1$, $X_2$, ... , $X_p$ useful in predicting\nthe response?*\n\n F statistic close to 1 when there is no relationship, otherwise greater then 1.\n\n\n$$F = \\frac{(TSS-RSS)/p}{RSS/(n-p-1)}$$\n\n\n2. *Do all the predictors help to explain $Y$ , or is only a subset of the\npredictors useful?*\n\n p-values can help identify important predictors, but it is possible to be mislead by this especially with large number of predictors. 
Variable selection methods include Forward selection, backward selection and mixed. Topic is continued in Chapter 6.\n\n3. *How well does the model fit the data?*\n\n **$R^2$** still gives *proportion of the variance explained*, so look for values \"close\" to 1. Can also look at **RSE** which is generalized for multiple regression as:\n \n\n$$RSE = \\sqrt{\\frac{1}{n-p-1}RSS}$$\n\n\n4. *Given a set of predictor values, what response value should we predict,\nand how accurate is our prediction?* \n\n Three sets of uncertainty in predictions:\n \n * Uncertainty in the estimates of $\\beta_i$\n * Model bias\n * Irreducible error $\\epsilon$\n\n## Qualitative Predictors\n\n* Dummy variables: if there are $k$ levels, introduce $k-1$ dummy variables which are equal to one (\"one hot\") when the underlying qualitative predictor takes that value. For example if there are 3 levels, introduce two new dummy variables and fit the model:\n\n\n$$y_i = \\beta_0 + \\beta_1 x_{i1} + \\beta_2 x_{i2} + \\epsilon_i$$\n\n\n| Qualitative Predicitor | $x_{i1}$ | $x_{i2}$ |\n| ---------------------- |:--------:|:--------:|\n| level 0 (baseline) | 0 | 0 |\n| level 1 | 1 | 0 |\n| level 2 | 0 | 1 |\n\n* Coefficients are interpreted the average effect relative to the baseline.\n\n* Alternative is to use index variables, a different coefficient for each level:\n\n\n$$y_i = \\beta_{0 1} + \\beta_{0 2} +\\beta_{0 3} + \\epsilon_i$$\n\n\n## Extensions\n\n- Interaction / Synergy effects\n \n Include a product term to account for synergy where one changes in one variable changes the association of the Y with another:\n \n\n$$Y = \\beta_{0} + \\beta_{1}X_1 + \\beta_{2}X_2 + \\beta_{3}X_1 X_2 + \\epsilon_i$$\n\n\n- Non-linear relationships (e.g. polynomial fits)\n\n\n$$Y = \\beta_{0} + \\beta_{1}X + \\beta_{2}X^2 + ... \\beta_{n}X^n + \\epsilon_i$$\n\n\n## Potential Problems\n\n1. *Non-linear relationships* \n\n Residual plots are useful tool to see if any remaining trends exist. 
If so consider fitting transformation of the data. \n \n2. *Correlation of Error Terms*\n\n Linear regression assumes that the error terms $\\epsilon_i$ are uncorrelated. Residuals may indicate that this is not correct (obvious *tracking* in the data). One could also look at the autocorrelation of the residuals. What to do about it?\n \n3. *Non-constant variance of error terms*\n\n Again this can be revealed by examining the residuals. Consider transformation of the predictors to remove non-constant variance. The figure below shows residuals demonstrating non-constant variance, and shows this being mitigated to a great extent by log transforming the data.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Figure 3.11](images/fig3_11.png){width=100%}\n:::\n:::\n\n\n4. *Outliers*\n\n - Outliers are points with for which $y_i$ is far from value predicted by the model (including irreducible error). See point labeled '20' in figure 3.13.\n - Detect outliers by plotting studentized residuals (residual $e_i$ divided by the estimated error) and look for residuals larger then 3 standard deviations in absolute value.\n - An outlier may not effect the fit much but can have dramatic effect on the **RSE**. \n - Often outliers are mistakes in data collection and can be removed, but could also be an indicator of a deficient model. \n\n5. *High Leverage Points* \n\n - These are points with unusual values of $x_i$. Examples is point labeled '41' in figure 3.13.\n - These points can have large impact on the fit, as in the example, including point 41 pulls slope up significantly.\n - Use *leverage statistic* to identify high leverage points, which can be hard to identify in multiple regression.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Figure 3.13](images/fig3_13.png){width=100%}\n:::\n:::\n\n\n6. *Collinearity*\n\n - Two or more predictor variables are closely related to one another.\n - Simple collinearity can be identified by looking at correlations between predictors. 
\n - Causes the standard error to grow (and p-values to grow)\n - Often can be dealt with by removing one of the highly correlated predictors or combining them. \n - *Multicollinearity* (involving 3 or more predictors) is not so easy to identify. Use *Variance inflation factor*, which is the ratio of the variance of $\\hat{\\beta_j}$ when fitting the full model to fitting the parameter on its own. Can be computed using the formula:\n \n\n$$VIF(\\hat{\\beta_j}) = \\frac{1}{1-R^2_{X_j|X_{-j}}}$$\n\n\nwhere $R^2_{X_j|X_{-j}}$ is the $R^2$ from a regression of $X_j$ onto all the other predictors.\n\n## Answers to the Marketing Plan questions\n\n1. **Is there a relationship between advertising budget and sales?**\n\n Tool: Multiple regression, look at F-statistic.\n\n2. **How strong is the relationship between advertising budget and sales?** \n\n Tool: **$R^2$** and **RSE**\n \n3. **Which media are associated with sales?**\n \n Tool: p-values for each predictor's *t-statistic*. Explored further in chapter 6.\n\n4. **How large is the association between each medium and sales?**\n\n Tool: Confidence intervals on $\\hat{\\beta_j}$\n\n5. **How accurately can we predict future sales?**\n\n Tool:: Prediction intervals for individual response, confidence intervals for average response.\n \n \n6. **Is the relationship linear?** \n\n Tool: Residual Plots\n \n7. 
**Is there synergy among the advertising media?** \n\n Tool: Interaction terms and associated p-vales.\n\n## Comparison of Linear Regression with K-Nearest Neighbors\n\n- This section examines the K-nearest neighbor (KNN) method (a non-parameteric method).\n- This is essentially a k-point moving average.\n- This serves to illustrate the Bias-Variance trade-off nicely.\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/08_notes/execute-results/html.json b/_freeze/08_notes/execute-results/html.json new file mode 100644 index 0000000..17482ad --- /dev/null +++ b/_freeze/08_notes/execute-results/html.json @@ -0,0 +1,14 @@ +{ + "hash": "ee34acabde75dba55f5aa78d7199e10e", + "result": { + "markdown": "# Notes {-}\n\n## Introduction: Tree-based methods\n\n- Involve **stratifying** or **segmenting** the predictor space into a number of simple regions\n- Are simple and useful for interpretation\n- However, basic decision trees are NOT competitive with the best supervised learning approaches in terms of prediction accuracy\n- Thus, we also discuss **bagging**, **random forests**, and **boosting** (i.e., tree-based ensemble methods) to grow multiple trees which are then combined to yield a single consensus prediction\n- These can result in dramatic improvements in prediction accuracy (but some loss of interpretability)\n- Can be applied to both regression and classification\n\n## Regression Trees\n\nFirst, let's take a look at `Hitters` dataset.\n\n::: {.cell}\n::: {.cell-output .cell-output-stderr}\n```\n\nAttaching package: 'dplyr'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nThe following objects are masked from 'package:stats':\n\n filter, lag\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nThe following objects are masked from 'package:base':\n\n intersect, setdiff, setequal, union\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nRows: 322 Columns: 21\n── Column specification 
────────────────────────────────────────────────────────\nDelimiter: \",\"\nchr (4): Names, League, Division, NewLeague\ndbl (17): AtBat, Hits, HmRun, Runs, RBI, Walks, Years, CAtBat, CHits, CHmRun...\n\nℹ Use `spec()` to retrieve the full column specification for this data.\nℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 263 × 5\n Names Hits Years Salary log_Salary\n \n 1 -Alan Ashby 81 14 475 6.16\n 2 -Alvin Davis 130 3 480 6.17\n 3 -Andre Dawson 141 11 500 6.21\n 4 -Andres Galarraga 87 2 91.5 4.52\n 5 -Alfredo Griffin 169 11 750 6.62\n 6 -Al Newman 37 2 70 4.25\n 7 -Argenis Salazar 73 3 100 4.61\n 8 -Andres Thomas 81 2 75 4.32\n 9 -Andre Thornton 92 13 1100 7.00\n10 -Alan Trammell 159 10 517. 6.25\n# ℹ 253 more rows\n```\n:::\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n![](images/08_1_salary_data.png){width=100%}\n:::\n\n::: {.cell-output-display}\n![](images/08_2_basic_tree.png){width=100%}\n:::\n:::\n\n\n- For the Hitters data, a regression tree for predicting the log salary of a baseball player based on:\n\n 1. number of years that he has played in the major leagues\n 2. 
number of hits that he made in the previous year\n\n## Terminology\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](images/08_3_basic_tree_term.png){width=100%}\n:::\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n![The three-region partition for the Hitters data set from the regression tree](images/08_4_hitters_predictor_space.png){width=100%}\n:::\n:::\n\n\n- Overall, the tree stratifies or segments the players into three regions of predictor space:\n - R1 = {X \\| Years\\< 4.5}\n - R2 = {X \\| Years\\>=4.5, Hits\\<117.5}\n - R3 = {X \\| Years\\>=4.5, Hits\\>=117.5}\n \n where R1, R2, and R3 are **terminal nodes** (leaves) and green lines (where the predictor space is split) are the **internal nodes**\n\n- The number in each leaf/terminal node is the mean of the response for the observations that fall there\n\n## Interpretation of results: regression tree (Hitters data)\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](images/08_2_basic_tree.png){width=100%}\n:::\n:::\n\n\n1. `Years` is the most important factor in determining `Salary`: players with less experience earn lower salaries than more experienced players\n2. Given that a player is less experienced, the number of `Hits` that he made in the previous year seems to play little role in his `Salary`\n3. But among players who have been in the major leagues for 5 or more years, the number of Hits made in the previous year does affect Salary: players who made more Hits last year tend to have higher salaries\n4. This is surely an over-simplification, but compared to a regression model, it is easy to display, interpret and explain\n\n## Tree-building process (regression)\n\n1. Divide the predictor space --- that is, the set of possible values for $X_1,X_2, . . . ,X_p$ --- into $J$ distinct and **non-overlapping** regions, $R_1,R_2, . . . ,R_J$\n - Regions can have ANY shape - they don't have to be boxes\n2. 
For every observation that falls into the region $R_j$, we make the same prediction: the **mean** of the response values in $R_j$\n3. The goal is to find regions (here, boxes) $R_1, . . . ,R_J$ that **minimize** the $RSS$, given by\n\n\n$$\\mathrm{RSS}=\\sum_{j=1}^{J}\\sum_{i \\in R_j}(y_i - \\hat{y}_{R_j})^2$$\n\n\nwhere $\\hat{y}_{R_j}$ is the **mean** response for the training observations within the $j$th box\n\n- Unfortunately, it is **computationally infeasible** to consider every possible partition of the feature space into $J$ boxes.\n\n## Recursive binary splitting\n\nSo, we take a top-down, greedy approach known as recursive binary splitting:\n\n- **top-down** because it begins at the top of the tree and then successively splits the predictor space\n- **greedy** because at each step of the tree-building process, the best split is made at that particular step, rather than looking ahead and picking a split that will lead to a better tree in some future step\n\n1. First, select the predictor $X_j$ and the cutpoint $s$ such that splitting the predictor space into the regions $\\{X|X_j < s\\}$ and $\\{X|X_j \\geq s\\}$ leads to the greatest possible reduction in RSS\n\n## Bayesian Additive Regression Trees (BART)\n\n- BART fits an ensemble of $K$ regression trees that are updated one at a time across $B$ iterations: in the $b$th iteration, to update the $k$th tree, we subtract from each response value the predictions of all the other trees to obtain the partial residual\n\n$r_i = y_i - \\sum_{k'<k}\\hat{f}^{b}_{k'}(x_i) - \\sum_{k'>k}\\hat{f}^{b-1}_{k'}(x_i)$\n\nfor the $i$th observation, $i = 1, …, n$\n\n- Rather than fitting a new tree to this partial residual, BART chooses a perturbation to the tree from a previous iteration $\\hat{f}^{b-1}_{k}$ favoring perturbations that improve the fit to the partial residual\n- To perturb trees:\n - change the structure of the tree by adding/pruning branches\n - change the prediction in each terminal node of the tree\n- The output of BART is a collection of prediction models:\n\n$\\hat{f}^b(x) = \\sum_{k=1}^{K}\\hat{f}^b_k(x)$\n\nfor $b = 1, 2,…, B$\n\n## BART algorithm: figure\n\n::: {.cell}\n::: {.cell-output-display}\n![](images/08_12_bart_algorithm.png){width=100%}\n:::\n:::\n\n- **Comment**: the first few prediction models obtained in the earlier iterations (known as the *burn-in* period; denoted by $L$) are typically thrown away since they tend not to provide 
very good results, like you throw away the first pancake of the batch\n\n## BART: additional details\n\n- A key element of BART is that a fresh tree is NOT fit to the current partial residual: instead, we improve the fit to the current partial residual by slightly modifying the tree obtained in the previous iteration (Step 3(a)ii)\n- This guards against overfitting since it limits how \"hard\" the data is fit in each iteration\n- Additionally, the individual trees are typically pretty small\n- BART, as the name suggests, can be viewed as a *Bayesian* approach to fitting an ensemble of trees:\n - each time a tree is randomly perturbed to fit the residuals = drawing a new tree from a *posterior* distribution\n\n## To apply BART:\n\n- We must select the number of trees $K$, the number of iterations $B$ and the number of burn-in iterations $L$\n- Typically, large values are chosen for $B$ and $K$ and a moderate value for $L$: e.g. $K$ = 200, $B$ = 1,000 and $L$ = 100\n- BART has been shown to have impressive out-of-box performance - i.e., it performs well with minimal tuning\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/data/Hitters.csv b/data/Hitters.csv new file mode 100644 index 0000000..cf86793 --- /dev/null +++ b/data/Hitters.csv @@ -0,0 +1,323 @@ +"Names","AtBat","Hits","HmRun","Runs","RBI","Walks","Years","CAtBat","CHits","CHmRun","CRuns","CRBI","CWalks","League","Division","PutOuts","Assists","Errors","Salary","NewLeague" +"-Andy Allanson",293,66,1,30,29,14,1,293,66,1,30,29,14,"A","E",446,33,20,NA,"A" +"-Alan Ashby",315,81,7,24,38,39,14,3449,835,69,321,414,375,"N","W",632,43,10,475,"N" +"-Alvin Davis",479,130,18,66,72,76,3,1624,457,63,224,266,263,"A","W",880,82,14,480,"A" +"-Andre Dawson",496,141,20,65,78,37,11,5628,1575,225,828,838,354,"N","E",200,11,3,500,"N" +"-Andres 
Galarraga",321,87,10,39,42,30,2,396,101,12,48,46,33,"N","E",805,40,4,91.5,"N" +"-Alfredo Griffin",594,169,4,74,51,35,11,4408,1133,19,501,336,194,"A","W",282,421,25,750,"A" +"-Al Newman",185,37,1,23,8,21,2,214,42,1,30,9,24,"N","E",76,127,7,70,"A" +"-Argenis Salazar",298,73,0,24,24,7,3,509,108,0,41,37,12,"A","W",121,283,9,100,"A" +"-Andres Thomas",323,81,6,26,32,8,2,341,86,6,32,34,8,"N","W",143,290,19,75,"N" +"-Andre Thornton",401,92,17,49,66,65,13,5206,1332,253,784,890,866,"A","E",0,0,0,1100,"A" +"-Alan Trammell",574,159,21,107,75,59,10,4631,1300,90,702,504,488,"A","E",238,445,22,517.143,"A" +"-Alex Trevino",202,53,4,31,26,27,9,1876,467,15,192,186,161,"N","W",304,45,11,512.5,"N" +"-Andy VanSlyke",418,113,13,48,61,47,4,1512,392,41,205,204,203,"N","E",211,11,7,550,"N" +"-Alan Wiggins",239,60,0,30,11,22,6,1941,510,4,309,103,207,"A","E",121,151,6,700,"A" +"-Bill Almon",196,43,7,29,27,30,13,3231,825,36,376,290,238,"N","E",80,45,8,240,"N" +"-Billy Beane",183,39,3,20,15,11,3,201,42,3,20,16,11,"A","W",118,0,0,NA,"A" +"-Buddy Bell",568,158,20,89,75,73,15,8068,2273,177,1045,993,732,"N","W",105,290,10,775,"N" +"-Buddy Biancalana",190,46,2,24,8,15,5,479,102,5,65,23,39,"A","W",102,177,16,175,"A" +"-Bruce Bochte",407,104,6,57,43,65,12,5233,1478,100,643,658,653,"A","W",912,88,9,NA,"A" +"-Bruce Bochy",127,32,8,16,22,14,8,727,180,24,67,82,56,"N","W",202,22,2,135,"N" +"-Barry Bonds",413,92,16,72,48,65,1,413,92,16,72,48,65,"N","E",280,9,5,100,"N" +"-Bobby Bonilla",426,109,3,55,43,62,1,426,109,3,55,43,62,"A","W",361,22,2,115,"N" +"-Bob Boone",22,10,1,4,2,1,6,84,26,2,9,9,3,"A","W",812,84,11,NA,"A" +"-Bob Brenly",472,116,16,60,62,74,6,1924,489,67,242,251,240,"N","W",518,55,3,600,"N" +"-Bill Buckner",629,168,18,73,102,40,18,8424,2464,164,1008,1072,402,"A","E",1067,157,14,776.667,"A" +"-Brett Butler",587,163,4,92,51,70,6,2695,747,17,442,198,317,"A","E",434,9,3,765,"A" +"-Bob Dernier",324,73,4,32,18,22,7,1931,491,13,291,108,180,"N","E",222,3,3,708.333,"N" +"-Bo 
Diaz",474,129,10,50,56,40,10,2331,604,61,246,327,166,"N","W",732,83,13,750,"N" +"-Bill Doran",550,152,6,92,37,81,5,2308,633,32,349,182,308,"N","W",262,329,16,625,"N" +"-Brian Downing",513,137,20,90,95,90,14,5201,1382,166,763,734,784,"A","W",267,5,3,900,"A" +"-Bobby Grich",313,84,9,42,30,39,17,6890,1833,224,1033,864,1087,"A","W",127,221,7,NA,"A" +"-Billy Hatcher",419,108,6,55,36,22,3,591,149,8,80,46,31,"N","W",226,7,4,110,"N" +"-Bob Horner",517,141,27,70,87,52,9,3571,994,215,545,652,337,"N","W",1378,102,8,NA,"N" +"-Brook Jacoby",583,168,17,83,80,56,5,1646,452,44,219,208,136,"A","E",109,292,25,612.5,"A" +"-Bob Kearney",204,49,6,23,25,12,7,1309,308,27,126,132,66,"A","W",419,46,5,300,"A" +"-Bill Madlock",379,106,10,38,60,30,14,6207,1906,146,859,803,571,"N","W",72,170,24,850,"N" +"-Bobby Meacham",161,36,0,19,10,17,4,1053,244,3,156,86,107,"A","E",70,149,12,NA,"A" +"-Bob Melvin",268,60,5,24,25,15,2,350,78,5,34,29,18,"N","W",442,59,6,90,"N" +"-Ben Oglivie",346,98,5,31,53,30,16,5913,1615,235,784,901,560,"A","E",0,0,0,NA,"A" +"-Bip Roberts",241,61,1,34,12,14,1,241,61,1,34,12,14,"N","W",166,172,10,NA,"N" +"-BillyJo Robidoux",181,41,1,15,21,33,2,232,50,4,20,29,45,"A","E",326,29,5,67.5,"A" +"-Bill Russell",216,54,0,21,18,15,18,7318,1926,46,796,627,483,"N","W",103,84,5,NA,"N" +"-Billy Sample",200,57,6,23,14,14,9,2516,684,46,371,230,195,"N","W",69,1,1,NA,"N" +"-Bill Schroeder",217,46,7,32,19,9,4,694,160,32,86,76,32,"A","E",307,25,1,180,"A" +"-Butch Wynegar",194,40,7,19,29,30,11,4183,1069,64,486,493,608,"A","E",325,22,2,NA,"A" +"-Chris Bando",254,68,2,28,26,22,6,999,236,21,108,117,118,"A","E",359,30,4,305,"A" +"-Chris Brown",416,132,7,57,49,33,3,932,273,24,113,121,80,"N","W",73,177,18,215,"N" +"-Carmen Castillo",205,57,8,34,32,9,5,756,192,32,117,107,51,"A","E",58,4,4,247.5,"A" +"-Cecil Cooper",542,140,12,46,75,41,16,7099,2130,235,987,1089,431,"A","E",697,61,9,NA,"A" +"-Chili Davis",526,146,13,71,70,84,6,2648,715,77,352,342,289,"N","W",303,9,9,815,"N" +"-Carlton 
Fisk",457,101,14,42,63,22,17,6521,1767,281,1003,977,619,"A","W",389,39,4,875,"A" +"-Curt Ford",214,53,2,30,29,23,2,226,59,2,32,32,27,"N","E",109,7,3,70,"N" +"-Cliff Johnson",19,7,0,1,2,1,4,41,13,1,3,4,4,"A","E",0,0,0,NA,"A" +"-Carney Lansford",591,168,19,80,72,39,9,4478,1307,113,634,563,319,"A","W",67,147,4,1200,"A" +"-Chet Lemon",403,101,12,45,53,39,12,5150,1429,166,747,666,526,"A","E",316,6,5,675,"A" +"-Candy Maldonado",405,102,18,49,85,20,6,950,231,29,99,138,64,"N","W",161,10,3,415,"N" +"-Carmelo Martinez",244,58,9,28,25,35,4,1335,333,49,164,179,194,"N","W",142,14,2,340,"N" +"-Charlie Moore",235,61,3,24,39,21,14,3926,1029,35,441,401,333,"A","E",425,43,4,NA,"A" +"-Craig Reynolds",313,78,6,32,41,12,12,3742,968,35,409,321,170,"N","W",106,206,7,416.667,"N" +"-Cal Ripken",627,177,25,98,81,70,6,3210,927,133,529,472,313,"A","E",240,482,13,1350,"A" +"-Cory Snyder",416,113,24,58,69,16,1,416,113,24,58,69,16,"A","E",203,70,10,90,"A" +"-Chris Speier",155,44,6,21,23,15,16,6631,1634,98,698,661,777,"N","E",53,88,3,275,"N" +"-Curt Wilkerson",236,56,0,27,15,11,4,1115,270,1,116,64,57,"A","W",125,199,13,230,"A" +"-Dave Anderson",216,53,1,31,15,22,4,926,210,9,118,69,114,"N","W",73,152,11,225,"N" +"-Doug Baker",24,3,0,1,0,2,3,159,28,0,20,12,9,"A","W",80,4,0,NA,"A" +"-Don Baylor",585,139,31,93,94,62,17,7546,1982,315,1141,1179,727,"A","E",0,0,0,950,"A" +"-Dann Bilardello",191,37,4,12,17,14,4,773,163,16,61,74,52,"N","E",391,38,8,NA,"N" +"-Daryl Boston",199,53,5,29,22,21,3,514,120,8,57,40,39,"A","W",152,3,5,75,"A" +"-Darnell Coles",521,142,20,67,86,45,4,815,205,22,99,103,78,"A","E",107,242,23,105,"A" +"-Dave Collins",419,113,1,44,27,44,12,4484,1231,32,612,344,422,"A","E",211,2,1,NA,"A" +"-Dave Concepcion",311,81,3,42,30,26,17,8247,2198,100,950,909,690,"N","W",153,223,10,320,"N" +"-Darren Daulton",138,31,8,18,21,38,3,244,53,12,33,32,55,"N","E",244,21,4,NA,"N" +"-Doug DeCinces",512,131,26,69,96,52,14,5347,1397,221,712,815,548,"A","W",119,216,12,850,"A" +"-Darrell 
Evans",507,122,29,78,85,91,18,7761,1947,347,1175,1152,1380,"A","E",808,108,2,535,"A" +"-Dwight Evans",529,137,26,86,97,97,15,6661,1785,291,1082,949,989,"A","E",280,10,5,933.333,"A" +"-Damaso Garcia",424,119,6,57,46,13,9,3651,1046,32,461,301,112,"A","E",224,286,8,850,"N" +"-Dan Gladden",351,97,4,55,29,39,4,1258,353,16,196,110,117,"N","W",226,7,3,210,"A" +"-Danny Heep",195,55,5,24,33,30,8,1313,338,25,144,149,153,"N","E",83,2,1,NA,"N" +"-Dave Henderson",388,103,15,59,47,39,6,2174,555,80,285,274,186,"A","W",182,9,4,325,"A" +"-Donnie Hill",339,96,4,37,29,23,4,1064,290,11,123,108,55,"A","W",104,213,9,275,"A" +"-Dave Kingman",561,118,35,70,94,33,16,6677,1575,442,901,1210,608,"A","W",463,32,8,NA,"A" +"-Davey Lopes",255,70,7,49,35,43,15,6311,1661,154,1019,608,820,"N","E",51,54,8,450,"N" +"-Don Mattingly",677,238,31,117,113,53,5,2223,737,93,349,401,171,"A","E",1377,100,6,1975,"A" +"-Darryl Motley",227,46,7,23,20,12,5,1325,324,44,156,158,67,"A","W",92,2,2,NA,"A" +"-Dale Murphy",614,163,29,89,83,75,11,5017,1388,266,813,822,617,"N","W",303,6,6,1900,"N" +"-Dwayne Murphy",329,83,9,50,39,56,9,3828,948,145,575,528,635,"A","W",276,6,2,600,"A" +"-Dave Parker",637,174,31,89,116,56,14,6727,2024,247,978,1093,495,"N","W",278,9,9,1041.667,"N" +"-Dan Pasqua",280,82,16,44,45,47,2,428,113,25,61,70,63,"A","E",148,4,2,110,"A" +"-Darrell Porter",155,41,12,21,29,22,16,5409,1338,181,746,805,875,"A","W",165,9,1,260,"A" +"-Dick Schofield",458,114,13,67,57,48,4,1350,298,28,160,123,122,"A","W",246,389,18,475,"A" +"-Don Slaught",314,83,13,39,46,16,5,1457,405,28,156,159,76,"A","W",533,40,4,431.5,"A" +"-Darryl Strawberry",475,123,27,76,93,72,4,1810,471,108,292,343,267,"N","E",226,10,6,1220,"N" +"-Dale Sveum",317,78,7,35,35,32,1,317,78,7,35,35,32,"A","E",45,122,26,70,"A" +"-Danny Tartabull",511,138,25,76,96,61,3,592,164,28,87,110,71,"A","W",157,7,8,145,"A" +"-Dickie Thon",278,69,3,24,21,29,8,2079,565,32,258,192,162,"N","W",142,210,10,NA,"N" +"-Denny 
Walling",382,119,13,54,58,36,12,2133,594,41,287,294,227,"N","W",59,156,9,595,"N" +"-Dave Winfield",565,148,24,90,104,77,14,7287,2083,305,1135,1234,791,"A","E",292,9,5,1861.46,"A" +"-Enos Cabell",277,71,2,27,29,14,15,5952,1647,60,753,596,259,"N","W",360,32,5,NA,"N" +"-Eric Davis",415,115,27,97,71,68,3,711,184,45,156,119,99,"N","W",274,2,7,300,"N" +"-Eddie Milner",424,110,15,70,47,36,7,2130,544,38,335,174,258,"N","W",292,6,3,490,"N" +"-Eddie Murray",495,151,17,61,84,78,10,5624,1679,275,884,1015,709,"A","E",1045,88,13,2460,"A" +"-Ernest Riles",524,132,9,69,47,54,2,972,260,14,123,92,90,"A","E",212,327,20,NA,"A" +"-Ed Romero",233,49,2,41,23,18,8,1350,336,7,166,122,106,"A","E",102,132,10,375,"A" +"-Ernie Whitt",395,106,16,48,56,35,10,2303,571,86,266,323,248,"A","E",709,41,7,NA,"A" +"-Fred Lynn",397,114,23,67,67,53,13,5589,1632,241,906,926,716,"A","E",244,2,4,NA,"A" +"-Floyd Rayford",210,37,8,15,19,15,6,994,244,36,107,114,53,"A","E",40,115,15,NA,"A" +"-Franklin Stubbs",420,95,23,55,58,37,3,646,139,31,77,77,61,"N","W",206,10,7,NA,"N" +"-Frank White",566,154,22,76,84,43,14,6100,1583,131,743,693,300,"A","W",316,439,10,750,"A" +"-George Bell",641,198,31,101,108,41,5,2129,610,92,297,319,117,"A","E",269,17,10,1175,"A" +"-Glenn Braggs",215,51,4,19,18,11,1,215,51,4,19,18,11,"A","E",116,5,12,70,"A" +"-George Brett",441,128,16,70,73,80,14,6675,2095,209,1072,1050,695,"A","W",97,218,16,1500,"A" +"-Greg Brock",325,76,16,33,52,37,5,1506,351,71,195,219,214,"N","W",726,87,3,385,"A" +"-Gary Carter",490,125,24,81,105,62,13,6063,1646,271,847,999,680,"N","E",869,62,8,1925.571,"N" +"-Glenn Davis",574,152,31,91,101,64,3,985,260,53,148,173,95,"N","W",1253,111,11,215,"N" +"-George Foster",284,64,14,30,42,24,18,7023,1925,348,986,1239,666,"N","E",96,4,4,NA,"N" +"-Gary Gaetti",596,171,34,91,108,52,6,2862,728,107,361,401,224,"A","W",118,334,21,900,"A" +"-Greg Gagne",472,118,12,63,54,30,4,793,187,14,102,80,50,"A","W",228,377,26,155,"A" +"-George 
Hendrick",283,77,14,45,47,26,16,6840,1910,259,915,1067,546,"A","W",144,6,5,700,"A" +"-Glenn Hubbard",408,94,4,42,36,66,9,3573,866,59,429,365,410,"N","W",282,487,19,535,"N" +"-Garth Iorg",327,85,3,30,44,20,8,2140,568,16,216,208,93,"A","E",91,185,12,362.5,"A" +"-Gary Matthews",370,96,21,49,46,60,15,6986,1972,231,1070,955,921,"N","E",137,5,9,733.333,"N" +"-Graig Nettles",354,77,16,36,55,41,20,8716,2172,384,1172,1267,1057,"N","W",83,174,16,200,"N" +"-Gary Pettis",539,139,5,93,58,69,5,1469,369,12,247,126,198,"A","W",462,9,7,400,"A" +"-Gary Redus",340,84,11,62,33,47,5,1516,376,42,284,141,219,"N","E",185,8,4,400,"A" +"-Garry Templeton",510,126,2,42,44,35,11,5562,1578,44,703,519,256,"N","W",207,358,20,737.5,"N" +"-Gorman Thomas",315,59,16,45,36,58,13,4677,1051,268,681,782,697,"A","W",0,0,0,NA,"A" +"-Greg Walker",282,78,13,37,51,29,5,1649,453,73,211,280,138,"A","W",670,57,5,500,"A" +"-Gary Ward",380,120,5,54,51,31,8,3118,900,92,444,419,240,"A","W",237,8,1,600,"A" +"-Glenn Wilson",584,158,15,70,84,42,5,2358,636,58,265,316,134,"N","E",331,20,4,662.5,"N" +"-Harold Baines",570,169,21,72,88,38,7,3754,1077,140,492,589,263,"A","W",295,15,5,950,"A" +"-Hubie Brooks",306,104,14,50,58,25,7,2954,822,55,313,377,187,"N","E",116,222,15,750,"N" +"-Howard Johnson",220,54,10,30,39,31,5,1185,299,40,145,154,128,"N","E",50,136,20,297.5,"N" +"-Hal McRae",278,70,7,22,37,18,18,7186,2081,190,935,1088,643,"A","W",0,0,0,325,"A" +"-Harold Reynolds",445,99,1,46,24,29,4,618,129,1,72,31,48,"A","W",278,415,16,87.5,"A" +"-Harry Spilman",143,39,5,18,30,15,9,639,151,16,80,97,61,"N","W",138,15,1,175,"N" +"-Herm Winningham",185,40,4,23,11,18,3,524,125,7,58,37,47,"N","E",97,2,2,90,"N" +"-Jesse Barfield",589,170,40,107,108,69,6,2325,634,128,371,376,238,"A","E",368,20,3,1237.5,"A" +"-Juan Beniquez",343,103,6,48,36,40,15,4338,1193,70,581,421,325,"A","E",211,56,13,430,"A" +"-Juan Bonilla",284,69,1,33,18,25,5,1407,361,6,139,98,111,"A","E",122,140,5,NA,"N" +"-John 
Cangelosi",438,103,2,65,32,71,2,440,103,2,67,32,71,"A","W",276,7,9,100,"N" +"-Jose Canseco",600,144,33,85,117,65,2,696,173,38,101,130,69,"A","W",319,4,14,165,"A" +"-Joe Carter",663,200,29,108,121,32,4,1447,404,57,210,222,68,"A","E",241,8,6,250,"A" +"-Jack Clark",232,55,9,34,23,45,12,4405,1213,194,702,705,625,"N","E",623,35,3,1300,"N" +"-Jose Cruz",479,133,10,48,72,55,17,7472,2147,153,980,1032,854,"N","W",237,5,4,773.333,"N" +"-Julio Cruz",209,45,0,38,19,42,10,3859,916,23,557,279,478,"A","W",132,205,5,NA,"A" +"-Jody Davis",528,132,21,61,74,41,6,2641,671,97,273,383,226,"N","E",885,105,8,1008.333,"N" +"-Jim Dwyer",160,39,8,18,31,22,14,2128,543,56,304,268,298,"A","E",33,3,0,275,"A" +"-Julio Franco",599,183,10,80,74,32,5,2482,715,27,330,326,158,"A","E",231,374,18,775,"A" +"-Jim Gantner",497,136,7,58,38,26,11,3871,1066,40,450,367,241,"A","E",304,347,10,850,"A" +"-Johnny Grubb",210,70,13,32,51,28,15,4040,1130,97,544,462,551,"A","E",0,0,0,365,"A" +"-Jerry Hairston",225,61,5,32,26,26,11,1568,408,25,202,185,257,"A","W",132,9,0,NA,"A" +"-Jack Howell",151,41,4,26,21,19,2,288,68,9,45,39,35,"A","W",28,56,2,95,"A" +"-John Kruk",278,86,4,33,38,45,1,278,86,4,33,38,45,"N","W",102,4,2,110,"N" +"-Jeffrey Leonard",341,95,6,48,42,20,10,2964,808,81,379,428,221,"N","W",158,4,5,100,"N" +"-Jim Morrison",537,147,23,58,88,47,10,2744,730,97,302,351,174,"N","E",92,257,20,277.5,"N" +"-John Moses",399,102,3,56,34,34,5,670,167,4,89,48,54,"A","W",211,9,3,80,"A" +"-Jerry Mumphrey",309,94,5,37,32,26,13,4618,1330,57,616,522,436,"N","E",161,3,3,600,"N" +"-Joe Orsulak",401,100,2,60,19,28,4,876,238,2,126,44,55,"N","E",193,11,4,NA,"N" +"-Jorge Orta",336,93,9,35,46,23,15,5779,1610,128,730,741,497,"A","W",0,0,0,NA,"A" +"-Jim Presley",616,163,27,83,107,32,3,1437,377,65,181,227,82,"A","W",110,308,15,200,"A" +"-Jamie Quirk",219,47,8,24,26,17,12,1188,286,23,100,125,63,"A","W",260,58,4,NA,"A" +"-Johnny Ray",579,174,7,67,78,58,6,3053,880,32,366,337,218,"N","E",280,479,5,657,"N" +"-Jeff 
Reed",165,39,2,13,9,16,3,196,44,2,18,10,18,"A","W",332,19,2,75,"N" +"-Jim Rice",618,200,20,98,110,62,13,7127,2163,351,1104,1289,564,"A","E",330,16,8,2412.5,"A" +"-Jerry Royster",257,66,5,31,26,32,14,3910,979,33,518,324,382,"N","W",87,166,14,250,"A" +"-John Russell",315,76,13,35,60,25,3,630,151,24,68,94,55,"N","E",498,39,13,155,"N" +"-Juan Samuel",591,157,16,90,78,26,4,2020,541,52,310,226,91,"N","E",290,440,25,640,"N" +"-John Shelby",404,92,11,54,49,18,6,1354,325,30,188,135,63,"A","E",222,5,5,300,"A" +"-Joel Skinner",315,73,5,23,37,16,4,450,108,6,38,46,28,"A","W",227,15,3,110,"A" +"-Jeff Stone",249,69,6,32,19,20,4,702,209,10,97,48,44,"N","E",103,8,2,NA,"N" +"-Jim Sundberg",429,91,12,41,42,57,13,5590,1397,83,578,579,644,"A","W",686,46,4,825,"N" +"-Jim Traber",212,54,13,28,44,18,2,233,59,13,31,46,20,"A","E",243,23,5,NA,"A" +"-Jose Uribe",453,101,3,46,43,61,3,948,218,6,96,72,91,"N","W",249,444,16,195,"N" +"-Jerry Willard",161,43,4,17,26,22,3,707,179,21,77,99,76,"A","W",300,12,2,NA,"A" +"-Joel Youngblood",184,47,5,20,28,18,11,3327,890,74,419,382,304,"N","W",49,2,0,450,"N" +"-Kevin Bass",591,184,20,83,79,38,5,1689,462,40,219,195,82,"N","W",303,12,5,630,"N" +"-Kal Daniels",181,58,6,34,23,22,1,181,58,6,34,23,22,"N","W",88,0,3,86.5,"N" +"-Kirk Gibson",441,118,28,84,86,68,8,2723,750,126,433,420,309,"A","E",190,2,2,1300,"A" +"-Ken Griffey",490,150,21,69,58,35,14,6126,1839,121,983,707,600,"A","E",96,5,3,1000,"N" +"-Keith Hernandez",551,171,13,94,83,94,13,6090,1840,128,969,900,917,"N","E",1199,149,5,1800,"N" +"-Kent Hrbek",550,147,29,85,91,71,6,2816,815,117,405,474,319,"A","W",1218,104,10,1310,"A" +"-Ken Landreaux",283,74,4,34,29,22,10,3919,1062,85,505,456,283,"N","W",145,5,7,737.5,"N" +"-Kevin McReynolds",560,161,26,89,96,66,4,1789,470,65,233,260,155,"N","W",332,9,8,625,"N" +"-Kevin Mitchell",328,91,12,51,43,33,2,342,94,12,51,44,33,"N","E",145,59,8,125,"N" +"-Keith Moreland",586,159,12,72,79,53,9,3082,880,83,363,477,295,"N","E",181,13,4,1043.333,"N" +"-Ken 
Oberkfell",503,136,5,62,48,83,10,3423,970,20,408,303,414,"N","W",65,258,8,725,"N" +"-Ken Phelps",344,85,24,69,64,88,7,911,214,64,150,156,187,"A","W",0,0,0,300,"A" +"-Kirby Puckett",680,223,31,119,96,34,3,1928,587,35,262,201,91,"A","W",429,8,6,365,"A" +"-Kurt Stillwell",279,64,0,31,26,30,1,279,64,0,31,26,30,"N","W",107,205,16,75,"N" +"-Leon Durham",484,127,20,66,65,67,7,3006,844,116,436,458,377,"N","E",1231,80,7,1183.333,"N" +"-Len Dykstra",431,127,8,77,45,58,2,667,187,9,117,64,88,"N","E",283,8,3,202.5,"N" +"-Larry Herndon",283,70,8,33,37,27,12,4479,1222,94,557,483,307,"A","E",156,2,2,225,"A" +"-Lee Lacy",491,141,11,77,47,37,15,4291,1240,84,615,430,340,"A","E",239,8,2,525,"A" +"-Len Matuszek",199,52,9,26,28,21,6,805,191,30,113,119,87,"N","W",235,22,5,265,"N" +"-Lloyd Moseby",589,149,21,89,86,64,7,3558,928,102,513,471,351,"A","E",371,6,6,787.5,"A" +"-Lance Parrish",327,84,22,53,62,38,10,4273,1123,212,577,700,334,"A","E",483,48,6,800,"N" +"-Larry Parrish",464,128,28,67,94,52,13,5829,1552,210,740,840,452,"A","W",0,0,0,587.5,"A" +"-Luis Rivera",166,34,0,20,13,17,1,166,34,0,20,13,17,"N","E",64,119,9,NA,"N" +"-Larry Sheets",338,92,18,42,60,21,3,682,185,36,88,112,50,"A","E",0,0,0,145,"A" +"-Lonnie Smith",508,146,8,80,44,46,9,3148,915,41,571,289,326,"A","W",245,5,9,NA,"A" +"-Lou Whitaker",584,157,20,95,73,63,10,4704,1320,93,724,522,576,"A","E",276,421,11,420,"A" +"-Mike Aldrete",216,54,2,27,25,33,1,216,54,2,27,25,33,"N","W",317,36,1,75,"N" +"-Marty Barrett",625,179,4,94,60,65,5,1696,476,12,216,163,166,"A","E",303,450,14,575,"A" +"-Mike Brown",243,53,4,18,26,27,4,853,228,23,101,110,76,"N","E",107,3,3,NA,"N" +"-Mike Davis",489,131,19,77,55,34,7,2051,549,62,300,263,153,"A","W",310,9,9,780,"A" +"-Mike Diaz",209,56,12,22,36,19,2,216,58,12,24,37,19,"N","E",201,6,3,90,"N" +"-Mariano Duncan",407,93,8,47,30,30,2,969,230,14,121,69,68,"N","W",172,317,25,150,"N" +"-Mike Easler",490,148,14,64,78,49,13,3400,1000,113,445,491,301,"A","E",0,0,0,700,"N" +"-Mike 
Fitzgerald",209,59,6,20,37,27,4,884,209,14,66,106,92,"N","E",415,35,3,NA,"N" +"-Mel Hall",442,131,18,68,77,33,6,1416,398,47,210,203,136,"A","E",233,7,7,550,"A" +"-Mickey Hatcher",317,88,3,40,32,19,8,2543,715,28,269,270,118,"A","W",220,16,4,NA,"A" +"-Mike Heath",288,65,8,30,36,27,9,2815,698,55,315,325,189,"N","E",259,30,10,650,"A" +"-Mike Kingery",209,54,3,25,14,12,1,209,54,3,25,14,12,"A","W",102,6,3,68,"A" +"-Mike LaValliere",303,71,3,18,30,36,3,344,76,3,20,36,45,"N","E",468,47,6,100,"N" +"-Mike Marshall",330,77,19,47,53,27,6,1928,516,90,247,288,161,"N","W",149,8,6,670,"N" +"-Mike Pagliarulo",504,120,28,71,71,54,3,1085,259,54,150,167,114,"A","E",103,283,19,175,"A" +"-Mark Salas",258,60,8,28,33,18,3,638,170,17,80,75,36,"A","W",358,32,8,137,"A" +"-Mike Schmidt",20,1,0,0,0,0,2,41,9,2,6,7,4,"N","E",78,220,6,2127.333,"N" +"-Mike Scioscia",374,94,5,36,26,62,7,1968,519,26,181,199,288,"N","W",756,64,15,875,"N" +"-Mickey Tettleton",211,43,10,26,35,39,3,498,116,14,59,55,78,"A","W",463,32,8,120,"A" +"-Milt Thompson",299,75,6,38,23,26,3,580,160,8,71,33,44,"N","E",212,1,2,140,"N" +"-Mitch Webster",576,167,8,89,49,57,4,822,232,19,132,83,79,"N","E",325,12,8,210,"N" +"-Mookie Wilson",381,110,9,61,45,32,7,3015,834,40,451,249,168,"N","E",228,7,5,800,"N" +"-Marvell Wynne",288,76,7,34,37,15,4,1644,408,16,198,120,113,"N","W",203,3,3,240,"N" +"-Mike Young",369,93,9,43,42,49,5,1258,323,54,181,177,157,"A","E",149,1,6,350,"A" +"-Nick Esasky",330,76,12,35,41,47,4,1367,326,55,167,198,167,"N","W",512,30,5,NA,"N" +"-Ozzie Guillen",547,137,2,58,47,12,2,1038,271,3,129,80,24,"A","W",261,459,22,175,"A" +"-Oddibe McDowell",572,152,18,105,49,65,2,978,249,36,168,91,101,"A","W",325,13,3,200,"A" +"-Omar Moreno",359,84,4,46,27,21,12,4992,1257,37,699,386,387,"N","W",151,8,5,NA,"N" +"-Ozzie Smith",514,144,0,67,54,79,9,4739,1169,13,583,374,528,"N","E",229,453,15,1940,"N" +"-Ozzie Virgil",359,80,15,45,48,63,7,1493,359,61,176,202,175,"N","W",682,93,13,700,"N" +"-Phil 
Bradley",526,163,12,88,50,77,4,1556,470,38,245,167,174,"A","W",250,11,1,750,"A" +"-Phil Garner",313,83,9,43,41,30,14,5885,1543,104,751,714,535,"N","W",58,141,23,450,"N" +"-Pete Incaviglia",540,135,30,82,88,55,1,540,135,30,82,88,55,"A","W",157,6,14,172,"A" +"-Paul Molitor",437,123,9,62,55,40,9,4139,1203,79,676,390,364,"A","E",82,170,15,1260,"A" +"-Pete O'Brien",551,160,23,86,90,87,5,2235,602,75,278,328,273,"A","W",1224,115,11,NA,"A" +"-Pete Rose",237,52,0,15,25,30,24,14053,4256,160,2165,1314,1566,"N","W",523,43,6,750,"N" +"-Pat Sheridan",236,56,6,41,19,21,5,1257,329,24,166,125,105,"A","E",172,1,4,190,"A" +"-Pat Tabler",473,154,6,61,48,29,6,1966,566,29,250,252,178,"A","E",846,84,9,580,"A" +"-Rafael Belliard",309,72,0,33,31,26,5,354,82,0,41,32,26,"N","E",117,269,12,130,"N" +"-Rick Burleson",271,77,5,35,29,33,12,4933,1358,48,630,435,403,"A","W",62,90,3,450,"A" +"-Randy Bush",357,96,7,50,45,39,5,1394,344,43,178,192,136,"A","W",167,2,4,300,"A" +"-Rick Cerone",216,56,4,22,18,15,12,2796,665,43,266,304,198,"A","E",391,44,4,250,"A" +"-Ron Cey",256,70,13,42,36,44,16,7058,1845,312,965,1128,990,"N","E",41,118,8,1050,"A" +"-Rob Deer",466,108,33,75,86,72,3,652,142,44,102,109,102,"A","E",286,8,8,215,"A" +"-Rick Dempsey",327,68,13,42,29,45,18,3949,939,78,438,380,466,"A","E",659,53,7,400,"A" +"-Rich Gedman",462,119,16,49,65,37,7,2131,583,69,244,288,150,"A","E",866,65,6,NA,"A" +"-Ron Hassey",341,110,9,45,49,46,9,2331,658,50,249,322,274,"A","E",251,9,4,560,"A" +"-Rickey Henderson",608,160,28,130,74,89,8,4071,1182,103,862,417,708,"A","E",426,4,6,1670,"A" +"-Reggie Jackson",419,101,18,65,58,92,20,9528,2510,548,1509,1659,1342,"A","W",0,0,0,487.5,"A" +"-Ricky Jones",33,6,0,2,4,7,1,33,6,0,2,4,7,"A","W",205,5,4,NA,"A" +"-Ron Kittle",376,82,21,42,60,35,5,1770,408,115,238,299,157,"A","W",0,0,0,425,"A" +"-Ray Knight",486,145,11,51,76,40,11,3967,1102,67,410,497,284,"N","E",88,204,16,500,"A" +"-Randy Kutcher",186,44,7,28,16,11,1,186,44,7,28,16,11,"N","W",99,3,1,NA,"N" +"-Rudy 
Law",307,80,1,42,36,29,7,2421,656,18,379,198,184,"A","W",145,2,2,NA,"A" +"-Rick Leach",246,76,5,35,39,13,6,912,234,12,102,96,80,"A","E",44,0,1,250,"A" +"-Rick Manning",205,52,8,31,27,17,12,5134,1323,56,643,445,459,"A","E",155,3,2,400,"A" +"-Rance Mulliniks",348,90,11,50,45,43,10,2288,614,43,295,273,269,"A","E",60,176,6,450,"A" +"-Ron Oester",523,135,8,52,44,52,9,3368,895,39,377,284,296,"N","W",367,475,19,750,"N" +"-Rey Quinones",312,68,2,32,22,24,1,312,68,2,32,22,24,"A","E",86,150,15,70,"A" +"-Rafael Ramirez",496,119,8,57,33,21,7,3358,882,36,365,280,165,"N","W",155,371,29,875,"N" +"-Ronn Reynolds",126,27,3,8,10,5,4,239,49,3,16,13,14,"N","E",190,2,9,190,"N" +"-Ron Roenicke",275,68,5,42,42,61,6,961,238,16,128,104,172,"N","E",181,3,2,191,"N" +"-Ryne Sandberg",627,178,14,68,76,46,6,3146,902,74,494,345,242,"N","E",309,492,5,740,"N" +"-Rafael Santana",394,86,1,38,28,36,4,1089,267,3,94,71,76,"N","E",203,369,16,250,"N" +"-Rick Schu",208,57,8,32,25,18,3,653,170,17,98,54,62,"N","E",42,94,13,140,"N" +"-Ruben Sierra",382,101,16,50,55,22,1,382,101,16,50,55,22,"A","W",200,7,6,97.5,"A" +"-Roy Smalley",459,113,20,59,57,68,12,5348,1369,155,713,660,735,"A","W",0,0,0,740,"A" +"-Robby Thompson",549,149,7,73,47,42,1,549,149,7,73,47,42,"N","W",255,450,17,140,"N" +"-Rob Wilfong",288,63,3,25,33,16,10,2682,667,38,315,259,204,"A","W",135,257,7,341.667,"A" +"-Reggie Williams",303,84,4,35,32,23,2,312,87,4,39,32,23,"N","W",179,5,3,NA,"N" +"-Robin Yount",522,163,9,82,46,62,13,7037,2019,153,1043,827,535,"A","E",352,9,1,1000,"A" +"-Steve Balboni",512,117,29,54,88,43,6,1750,412,100,204,276,155,"A","W",1236,98,18,100,"A" +"-Scott Bradley",220,66,5,20,28,13,3,290,80,5,27,31,15,"A","W",281,21,3,90,"A" +"-Sid Bream",522,140,16,73,77,60,4,730,185,22,93,106,86,"N","E",1320,166,17,200,"N" +"-Steve Buechele",461,112,18,54,54,35,2,680,160,24,76,75,49,"A","W",111,226,11,135,"A" +"-Shawon Dunston",581,145,17,66,68,21,2,831,210,21,106,86,40,"N","E",320,465,32,155,"N" +"-Scott 
Fletcher",530,159,3,82,50,47,6,1619,426,11,218,149,163,"A","W",196,354,15,475,"A" +"-Steve Garvey",557,142,21,58,81,23,18,8759,2583,271,1138,1299,478,"N","W",1160,53,7,1450,"N" +"-Steve Jeltz",439,96,0,44,36,65,4,711,148,1,68,56,99,"N","E",229,406,22,150,"N" +"-Steve Lombardozzi",453,103,8,53,33,52,2,507,123,8,63,39,58,"A","W",289,407,6,105,"A" +"-Spike Owen",528,122,1,67,45,51,4,1716,403,12,211,146,155,"A","W",209,372,17,350,"A" +"-Steve Sax",633,210,6,91,56,59,6,3070,872,19,420,230,274,"N","W",367,432,16,90,"N" +"-Tony Armas",16,2,0,1,0,0,2,28,4,0,1,0,0,"A","E",247,4,8,NA,"A" +"-Tony Bernazard",562,169,17,88,73,53,8,3181,841,61,450,342,373,"A","E",351,442,17,530,"A" +"-Tom Brookens",281,76,3,42,25,20,8,2658,657,48,324,300,179,"A","E",106,144,7,341.667,"A" +"-Tom Brunansky",593,152,23,69,75,53,6,2765,686,133,369,384,321,"A","W",315,10,6,940,"A" +"-Tony Fernandez",687,213,10,91,65,27,4,1518,448,15,196,137,89,"A","E",294,445,13,350,"A" +"-Tim Flannery",368,103,3,48,28,54,8,1897,493,9,207,162,198,"N","W",209,246,3,326.667,"N" +"-Tom Foley",263,70,1,26,23,30,4,888,220,9,83,82,86,"N","E",81,147,4,250,"N" +"-Tony Gwynn",642,211,14,107,59,52,5,2364,770,27,352,230,193,"N","W",337,19,4,740,"N" +"-Terry Harper",265,68,8,26,30,29,7,1337,339,32,135,163,128,"N","W",92,5,3,425,"A" +"-Toby Harrah",289,63,7,36,41,44,17,7402,1954,195,1115,919,1153,"A","W",166,211,7,NA,"A" +"-Tommy Herr",559,141,2,48,61,73,8,3162,874,16,421,349,359,"N","E",352,414,9,925,"N" +"-Tim Hulett",520,120,17,53,44,21,4,927,227,22,106,80,52,"A","W",70,144,11,185,"A" +"-Terry Kennedy",19,4,1,2,3,1,1,19,4,1,2,3,1,"N","W",692,70,8,920,"A" +"-Tito Landrum",205,43,2,24,17,20,7,854,219,12,105,99,71,"N","E",131,6,1,286.667,"N" +"-Tim Laudner",193,47,10,21,29,24,6,1136,256,42,129,139,106,"A","W",299,13,5,245,"A" +"-Tom O'Malley",181,46,1,19,18,17,5,937,238,9,88,95,104,"A","E",37,98,9,NA,"A" +"-Tom Paciorek",213,61,4,17,22,3,17,4061,1145,83,488,491,244,"A","W",178,45,4,235,"A" +"-Tony 
Pena",510,147,10,56,52,53,7,2872,821,63,307,340,174,"N","E",810,99,18,1150,"N" +"-Terry Pendleton",578,138,1,56,59,34,3,1399,357,7,149,161,87,"N","E",133,371,20,160,"N" +"-Tony Perez",200,51,2,14,29,25,23,9778,2732,379,1272,1652,925,"N","W",398,29,7,NA,"N" +"-Tony Phillips",441,113,5,76,52,76,5,1546,397,17,226,149,191,"A","W",160,290,11,425,"A" +"-Terry Puhl",172,42,3,17,14,15,10,4086,1150,57,579,363,406,"N","W",65,0,0,900,"N" +"-Tim Raines",580,194,9,91,62,78,8,3372,1028,48,604,314,469,"N","E",270,13,6,NA,"N" +"-Ted Simmons",127,32,4,14,25,12,19,8396,2402,242,1048,1348,819,"N","W",167,18,6,500,"N" +"-Tim Teufel",279,69,4,35,31,32,4,1359,355,31,180,148,158,"N","E",133,173,9,277.5,"N" +"-Tim Wallach",480,112,18,50,71,44,7,3031,771,110,338,406,239,"N","E",94,270,16,750,"N" +"-Vince Coleman",600,139,0,94,29,60,2,1236,309,1,201,69,110,"N","E",300,12,9,160,"N" +"-Von Hayes",610,186,19,107,98,74,6,2728,753,69,399,366,286,"N","E",1182,96,13,1300,"N" +"-Vance Law",360,81,5,37,44,37,7,2268,566,41,279,257,246,"N","E",170,284,3,525,"N" +"-Wally Backman",387,124,1,67,27,36,7,1775,506,6,272,125,194,"N","E",186,290,17,550,"N" +"-Wade Boggs",580,207,8,107,71,105,5,2778,978,32,474,322,417,"A","E",121,267,19,1600,"A" +"-Will Clark",408,117,11,66,41,34,1,408,117,11,66,41,34,"N","W",942,72,11,120,"N" +"-Wally Joyner",593,172,22,82,100,57,1,593,172,22,82,100,57,"A","W",1222,139,15,165,"A" +"-Wayne Krenchicki",221,53,2,21,23,22,8,1063,283,15,107,124,106,"N","E",325,58,6,NA,"N" +"-Willie McGee",497,127,7,65,48,37,5,2703,806,32,379,311,138,"N","E",325,9,3,700,"N" +"-Willie Randolph",492,136,5,76,50,94,12,5511,1511,39,897,451,875,"A","E",313,381,20,875,"A" +"-Wayne Tolleson",475,126,3,61,43,52,6,1700,433,7,217,93,146,"A","W",37,113,7,385,"A" +"-Willie Upshaw",573,144,9,85,60,78,8,3198,857,97,470,420,332,"A","E",1314,131,12,960,"A" +"-Willie Wilson",631,170,9,77,44,31,11,4908,1457,30,775,357,249,"A","W",408,4,3,1000,"A" diff --git a/images/08_10_boosting_algorithm.png 
b/images/08_10_boosting_algorithm.png new file mode 100644 index 0000000..7340d7f Binary files /dev/null and b/images/08_10_boosting_algorithm.png differ diff --git a/images/08_11_boosting_gene_exp_data.png b/images/08_11_boosting_gene_exp_data.png new file mode 100644 index 0000000..599a7c8 Binary files /dev/null and b/images/08_11_boosting_gene_exp_data.png differ diff --git a/images/08_12_bart_algorithm.png b/images/08_12_bart_algorithm.png new file mode 100644 index 0000000..7d6934f Binary files /dev/null and b/images/08_12_bart_algorithm.png differ diff --git a/images/08_1_salary_data.png b/images/08_1_salary_data.png new file mode 100644 index 0000000..663c527 Binary files /dev/null and b/images/08_1_salary_data.png differ diff --git a/images/08_2_basic_tree.png b/images/08_2_basic_tree.png new file mode 100644 index 0000000..73b6226 Binary files /dev/null and b/images/08_2_basic_tree.png differ diff --git a/images/08_3_basic_tree_term.png b/images/08_3_basic_tree_term.png new file mode 100644 index 0000000..564ada1 Binary files /dev/null and b/images/08_3_basic_tree_term.png differ diff --git a/images/08_4_hitters_predictor_space.png b/images/08_4_hitters_predictor_space.png new file mode 100644 index 0000000..c671ff0 Binary files /dev/null and b/images/08_4_hitters_predictor_space.png differ diff --git a/images/08_5_hitters_unpruned_tree.png b/images/08_5_hitters_unpruned_tree.png new file mode 100644 index 0000000..d5e5cc4 Binary files /dev/null and b/images/08_5_hitters_unpruned_tree.png differ diff --git a/images/08_6_hitters_mse.png b/images/08_6_hitters_mse.png new file mode 100644 index 0000000..71d738c Binary files /dev/null and b/images/08_6_hitters_mse.png differ diff --git a/images/08_7_classif_tree_heart.png b/images/08_7_classif_tree_heart.png new file mode 100644 index 0000000..c828ab7 Binary files /dev/null and b/images/08_7_classif_tree_heart.png differ diff --git a/images/08_8_var_importance.png b/images/08_8_var_importance.png new file mode 
100644 index 0000000..6b74a26 Binary files /dev/null and b/images/08_8_var_importance.png differ diff --git a/images/08_9_rand_forest_gene_exp.png b/images/08_9_rand_forest_gene_exp.png new file mode 100644 index 0000000..4681f30 Binary files /dev/null and b/images/08_9_rand_forest_gene_exp.png differ