---
title: "Intro to R"
subtitle: "Bioinformatics Coffee Hour"
date: "August 4, 2020"
author: "Danielle Khost"
output: html_document
---

## Setting up our working directory and saving objects vs files

R is a functional programming language, which means that most of what one does is apply functions to objects.

We will begin with a very brief introduction to R objects and how functions work, and then focus on getting data into R, manipulating that data in R, plotting, and analysis.

First, we need to make sure we are working out of the correct directory. You can see the working directory above the console in RStudio, or use the `getwd()` function to see the current working directory. If you are not currently working in a good location to read and save files, you can use `setwd()` to change the working directory to the location of your choice.

```{r}
getwd()
```
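
As a minimal sketch of the `getwd()` / `setwd()` round trip described above, the code below moves into R's per-session temporary directory and then back again. The temporary directory (from `tempdir()`) is only an illustrative destination, and the names `old_dir` and `new_dir` are placeholders.

```r
old_dir <- getwd()        # remember the current working directory
setwd(tempdir())          # switch to a scratch directory (illustrative choice)
new_dir <- getwd()        # the working directory is now the temp directory
setwd(old_dir)            # switch back so later code is unaffected
```

Saving the original location first makes it easy to return to it, which matters when later chunks read or write files by relative name.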


## Vectors

Let's start by creating one of the simplest R objects, a vector of numbers.
```{r}

```

v1 is an *object* which we created by using the ```<-``` operator (a less-than sign followed by a dash).

v1 now contains the output of the *function* ```c(1,2,3,4,5)``` (c, for **c**ombine, combines elements into a vector).
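
One way to write the assignment described above, using the values quoted in the text:

```r
v1 <- c(1, 2, 3, 4, 5)   # combine five numbers and store them under the name v1
length(v1)               # number of elements in the vector
```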

Let's display the contents of v1:
```{r}

```

We can also just type the name of an object (in this case ```v1```) to display it.

Let's make a few more objects:
```{r}

```
With this last variable, we had to use ```""``` to specify that we want to create a character vector. Otherwise R thinks we are looking for variables called a, b, c, and d to combine.

```{r, eval=FALSE}

```
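
The difference between quoted and unquoted names can be sketched as follows. The names `chars`, `undefined_obj_1`, and `undefined_obj_2` are hypothetical. The last two are deliberately left undefined so that the lookup fails, and `tryCatch()` is used only so the error message can be captured instead of stopping the script.

```r
chars <- c("a", "b", "c", "d")   # quotes make these character strings
class(chars)

# Unquoted names are treated as objects to look up. The names below do not
# exist, so c() fails and tryCatch() captures the error message as text.
msg <- tryCatch(
  c(undefined_obj_1, undefined_obj_2),
  error = function(e) conditionMessage(e)
)
msg
```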


We might want to get a list of all the objects we've created at this point. In R, the function ls() returns a character vector with the names of all the objects in a specified environment (by default, the set of user-defined objects and functions).

```{r}
ls()
```


## Getting Help

R has very good built-in documentation that describes what functions do.

To get help about a particular function, use ? followed by the function name, like so:

?ls

If you are not exactly sure of the function name, you can perform a search:

??mean

We can manipulate objects like so:
```{r}

```

Note that the value of ```x``` is not modified here. We did not save the output to a new object, so it is printed to the screen.

If we want to update the value of ```x```, we need to use our assignment operator:
```{r}

```
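
A small sketch of the difference between printing a result and storing it. The starting value of 10 and the name `x_unchanged` are assumptions made for illustration.

```r
x <- 10            # hypothetical starting value
x + 5              # the result is printed; x itself is still 10
x_unchanged <- x   # copy kept only to show that x was not modified
x <- x + 5         # assignment stores the result back into x
```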

R handles vector math for us automatically:
```{r}

```
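
To illustrate the vector math mentioned above, here is a sketch with a hypothetical vector `v`. Arithmetic is applied element by element, and a single number is reused for every element.

```r
v <- c(1, 2, 3, 4, 5)   # hypothetical example vector
doubled <- v * 2        # each element multiplied by 2
summed  <- v + v        # element-by-element addition of two vectors
shifted <- v + 10       # the single value 10 is reused for every element
```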

> **Vector exercises:**
>
>1. Create a new vector, called nums, that contains any 5 numbers you like
>
>2. There are a number of functions that operate on vectors, including length(), max(), min(), and mean(), all of which do exactly what they sound like they do. So for example length(nums) should return 5, because nums is a 5-element vector. Use these functions to get the minimum, maximum, and mean value for the nums vector you created.


```{r}


```


## Object Types

All objects have a type. Object types are a complex topic and we are only going to scratch the surface today. To slightly simplify, all data objects in R are either atomic vectors (contain only a single type of data), or data structures that combine atomic vectors in various ways. We'll walk through a few examples. First, let's consider a couple of the vectors that we've already made.

```{r}

```
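
As a sketch of how to check an object's type, `class()` reports the kind of data a vector holds. The vectors below are hypothetical examples, not the ones created earlier in the session.

```r
num_vec  <- c(1.5, 2, 3)        # hypothetical numeric vector
char_vec <- c("a", "b", "c")    # hypothetical character vector
num_class  <- class(num_vec)
char_class <- class(char_vec)

# An atomic vector holds one type only, so mixing types in c()
# converts every element to a common type (here, character).
mixed_class <- class(c(1, "a"))
```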

*Numeric* and *character* data types are two of the most common we'll encounter, and are just what they sound like. Another useful type is *logical* data, as in TRUE/FALSE. We can create a logical vector directly like so:
```{r}

```

Or we can use logical tests.
```{r}

```

Note that the logical test here (>2) is applied independently to each element in the vector. We'll come back to this when we talk about data subsets.
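
A brief sketch of both ways of making a logical vector, using hypothetical names (`flags`, `v`, `is_big`):

```r
flags <- c(TRUE, FALSE, TRUE)    # a logical vector written out by hand
class(flags)

v <- c(1, 2, 3, 4, 5)            # hypothetical numeric vector
is_big <- v > 2                  # the test runs once per element
is_big
sum(is_big)                      # TRUE counts as 1, so this counts the TRUE values
```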

> **Object type exercises:**
>
>1. Construct a logical vector with the same number of elements as your nums vector, that is TRUE if the corresponding element in the nums vector is less than the mean of the nums vector, and FALSE otherwise. Call this vector big_nums


```{r}


```


## Subsetting and Logical Vectors

We can also extract portions of vectors that we've created using ```[]``` following the vector, with the positions we wish to extract. For example, we can extract portions of the nums vector:

```{r}

```
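
Positional subsetting might look like the sketch below. The vector `vals` is a hypothetical stand-in for your own nums vector.

```r
vals <- c(10, 20, 30, 40, 50)  # hypothetical stand-in for nums
first   <- vals[1]             # R counts positions starting from 1
middle  <- vals[2:4]           # a range of positions
picked  <- vals[c(1, 5)]       # any set of positions, given as a vector
dropped <- vals[-1]            # a negative index removes that position
```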

Now it can be useful to specify which positions you would like to include, but often it is more useful to create new objects based on some sort of rule. That is where logical vectors come in, and really how you will mostly be using logical vectors going forward.

For example, in the last exercise, we constructed a logical vector that was the same length as our nums vector that was ```TRUE``` if the corresponding element in the nums vector is less than the mean of the nums vector, and ```FALSE``` otherwise.

Let's take a look at both of these vectors again.

```{r}

```

Let's say we actually want to extract all of the numbers that are less than the mean (that is, the elements where the logical vector is ```TRUE```). All we have to do is put the logical vector in brackets.

```{r}

```

We don't necessarily have to define this vector ahead of time, but can put the rule in the brackets as well.

```{r}

```
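
The two styles of logical subsetting can be sketched with a hypothetical vector and an arbitrary threshold of 8. Neither comes from the exercise data.

```r
vals <- c(4, 12, 7, 25, 9)      # hypothetical example values
keep <- vals > 8                # logical vector, same length as vals
by_vector <- vals[keep]         # keeps only the positions where keep is TRUE
by_rule   <- vals[vals > 8]     # same result with the rule inside the brackets
```

Both forms return the same elements. Writing the rule inside the brackets simply skips the intermediate object.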

> **Vector Subsetting Data exercises:**
>
> 1. Subset nums for all numbers that are larger than the minimum.
>
> Hints -- logical tests in R - greater than, less than, equal to, not equal to
>
>  > '> , < , == , !='

```{r}


```


## Data Frames

So far we've been talking about atomic vectors, which only contain a single data type (every element is logical, or character, or numeric). However, data sets will usually have multiple different data types: numeric for continuous data, character for categorical data and sample labels. Depending on how underlying types are combined, we can have four different "higher-level" data types in R:

**Dimensions** | **Homogeneous** | **Heterogeneous**
---------------|-----------------|-------------------
1-D            | atomic vector   | list
2-D            | matrix          | data frame / tibble

We'll focus on data frames for today, but lists and matrices can also be very powerful. A data frame is a collection of vectors, which can be (but don't have to be) different types, but all have to have the same length. Let's make a couple of toy data frames.

One way to do this is with the data.frame() function.

```{r}

```
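
A toy data frame could be built as in the sketch below. The column names and values are invented for illustration, and `stringsAsFactors = FALSE` is set explicitly so that text columns stay character regardless of R version.

```r
# Hypothetical columns: each is a vector of length 3, of differing types
df <- data.frame(
  sample = c("s1", "s2", "s3"),
  count  = c(12, 7, 30),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE      # keep text as character in all R versions
)
dim(df)      # number of rows, then number of columns
names(df)    # column names
```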

str() gives lots of information about the data type of each column of a data frame.
```{r}

```

We can use the function head() to look at the first part of a data frame (or any R object). This can be very useful if you have a very large or long data frame. You can also control how many rows of the data frame you view with the ```n``` argument (for example ```n=10```); the default is 6.

```{r}

```
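
As a sketch of how the `n` argument changes what `head()` returns, here is a hypothetical 100-row data frame:

```r
big_df <- data.frame(id = 1:100, value = (1:100)^2)  # hypothetical 100-row table
default_rows <- head(big_df)          # first 6 rows by default
ten_rows     <- head(big_df, n = 10)  # first 10 rows
```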

The summary() function can also be very useful to get a snapshot of the data in your dataframe.

```{r}

```
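
For a numeric column, `summary()` reports the minimum, quartiles, median, mean, and maximum. A minimal sketch with invented values:

```r
df <- data.frame(count = c(2, 4, 6, 8, 100))  # hypothetical data with one large value
s <- summary(df$count)                        # named summary statistics
s
col_median <- s[["Median"]]                   # pull out single statistics by name
col_mean   <- s[["Mean"]]
```

Comparing the median and the mean is a quick way to spot skew: here the single large value pulls the mean well above the median.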


### Reading files into R

So far we have been working with small objects we created by hand. A more common way to create dataframes is by reading from a file. There are a few functions to do this in R: read.table() and read.csv() make this easy.

```{r}

```
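
The sketch below shows both functions reading the same file. Because the workshop's data file is not reproduced here, the example first writes a small invented CSV to a temporary file. The column names `route`, `type`, and `riders` are assumptions for illustration only.

```r
# Write a small hypothetical CSV to a temporary file so there is something to read
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("route,type,riders", "1,Key,13000", "4,Local,450", "7,Local,4200"),
  csv_path
)

# read.csv() assumes a header row and comma separators
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.table() is the general form; header and separator are stated explicitly
from_table <- read.table(csv_path, header = TRUE, sep = ",",
                         stringsAsFactors = FALSE)
```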


> **File reading exercise**
>
>1. Try using head() to see the first 20 rows and summary() to see some information about the data.


```{r}

```

Again, summary() is a good way to see some basic information about a data frame.

```{r}

```

The View() function (note the capital V) can be useful for viewing data frames in a spreadsheet-like format.

```{r, eval=FALSE}

```

### Subsetting and Manipulating Dataframes

We learned about how to subset vectors earlier, now let's look at ways to get subsets of data from two-dimensional data. R uses ```[]``` to index rows and columns, like so:

```{r, eval=FALSE}

```

For 2-dimensional objects, the first number refers to rows and the second to columns. Leaving either position blank selects all rows or all columns.

Notice that when you are extracting data from a single column, the data are returned as a vector of the same type as the column. However, if you extract more than one column, the data are returned as a data frame.

```{r, eval=FALSE}

```
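
Row and column indexing can be sketched on a toy data frame. The columns `route` and `riders` are hypothetical and stand in for whatever data you have loaded.

```r
df <- data.frame(
  route  = c("1", "4", "7"),
  riders = c(13000, 450, 4200),
  stringsAsFactors = FALSE
)
one_cell <- df[2, 2]                     # row 2, column 2
one_row  <- df[1, ]                      # blank after the comma means all columns
one_col  <- df[, 2]                      # a single column comes back as a vector
two_cols <- df[, c("route", "riders")]   # several columns stay a data frame
```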


Often we want to select a set of rows from a data frame matching some criteria. We can use the subset() function for this. Let's start by just getting the data for key bus routes.

```{r}

```

Sometimes we might want something more complicated. Let's say we want to get data for all "local" bus routes with at least 1000 daily riders.

```{r}


```
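
The pattern for `subset()` is sketched below on an invented table. The data frame `toy_bus` and its columns `route`, `type`, and `riders` are assumptions for illustration, and the column names in the real data may differ.

```r
toy_bus <- data.frame(
  route  = c("1", "4", "7", "28"),
  type   = c("Key", "Local", "Local", "Key"),
  riders = c(13000, 450, 4200, 12500),
  stringsAsFactors = FALSE
)

# One condition: rows where type is "Key"
key_routes <- subset(toy_bus, type == "Key")

# Two conditions joined with &: both must be TRUE for a row to be kept
busy_local <- subset(toy_bus, type == "Local" & riders >= 1000)
```

Inside `subset()`, column names can be used directly without the `toy_bus$` prefix, which keeps the conditions short.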

> **Subsetting Data exercises:**
>
> 1. Subset bus when ridership is greater than 2500.
>
> 2. Subset bus when bus routes are NOT classified as Key.
>
> 3. Subset bus using the conditions of both #1 and #2, save as bus_2500_notKey
>
> Hints -- logical tests in R
>
>  > '> , < , == , !='
>
>  > '& is and -->> z <- x & y TRUE only when both x and y are true'
>
>  > '| is or -->> z <- x | y TRUE if either x or y is true'
>

```{r}

```


We can also add to and manipulate the data frame, but we will save that for the next session, when we will use some packages built specifically for that purpose.

## Writing Data

We've added several variables now, and we might want to write our updated dataset to a file. We do this with the write.table() function, which writes out a data.frame.

```{r}

```
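
A minimal sketch of writing a data frame with `write.table()`, assuming a tab-separated file is wanted. The data frame `out_df` and the file name `toy_routes.tsv` are hypothetical, and the file goes to the session's temporary directory so nothing in your working directory is overwritten.

```r
out_df <- data.frame(
  route  = c("1", "7"),
  riders = c(13000, 4200),
  stringsAsFactors = FALSE
)
out_path <- file.path(tempdir(), "toy_routes.tsv")

# sep = "\t" gives tab-separated output; quote and row.names are switched off
# so the file contains only the header and the data values
write.table(out_df, file = out_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm what was written
check <- read.table(out_path, header = TRUE, sep = "\t")
```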

But where did R put the file on our computer? When a file name is given without a full path, R reads and writes it relative to the working directory. RStudio has a preference to set the default working directory, which is typically something like /Users/Danielle/R.

To see the current working directory, use:
```{r}
getwd()
```

We can change the working directory with ```setwd()```.


>**File operations exercises:**
>
>1. Write a copy of bus_2500_notKey, but **be sure to give it a different name than the original!** Otherwise you will overwrite your original data.
>

```{r}

```