Commit 46d1774

Merge pull request #269 from naupaka/main
Address #120 by adding short description of `$` when it is first used
2 parents e9c57ca + e7d52ec commit 46d1774

1 file changed: +19 -18 lines changed

episodes/03-basics-factors-dataframes.Rmd

Lines changed: 19 additions & 18 deletions
@@ -151,7 +151,7 @@ for `read.table("file.csv", sep = ",")`. You can see in the help
 documentation that there are several additional variations of
 `read.table`, such as `read.csv2` to read tables separated by `;`
 and `read.delim` to read in tables separated by `\t` (tabs). If you know how your table is separated, you can use one of the provided short cuts,
-but case you run into an unconventional separator you are now equipt with the knowledge to define it in the `sep = ` arugument of `read.table`!
+but case you run into an unconventional separator you are now equipped with the knowledge to define it in the `sep = ` argument of `read.table`!
 
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -214,7 +214,7 @@ new data frame using the `data.frame()` function.
 ```{r, purl=FALSE}
 ## put the first three columns of variants into a new data frame called subset
 
-subset<-data.frame(variants[,c(1:3,6)])
+subset <- data.frame(variants[, c(1:3, 6)])
 ```
 
 Now, let's use the `str()` (structure) function to look a little more closely
@@ -239,12 +239,13 @@ Ok, thats a lot up unpack! Some things to notice.
 Factors are the final major data structure we will introduce in our R genomics
 lessons. Factors can be thought of as vectors which are specialized for
 categorical data. Given R's specialization for statistics, this make sense since
-categorial and continuous variables are usually treated differently. Sometimes
+categorical and continuous variables are usually treated differently. Sometimes
 you may want to have data treated as a factor, but in other cases, this may be
 undesirable.
 
-Let's see the value of treating some of which are categorical in nature as
-factors. Let's take a look at just the alternate alleles
+Let's explore the value of treating some vectors that are categorical in nature as
+factors. To do this we'll take a look at just the alternate alleles. We can use the `$` operator
+to access or extract a column by its name in data frames (or to extract objects within named lists).
 
 ```{r, purl=FALSE}
 ## extract the "ALT" column to a new object
@@ -259,11 +260,11 @@ head(alt_alleles)
 ```
 
 There are 801 alleles (one for each row). To simplify, lets look at just the
-single-nuleotide alleles (SNPs). We can use some of the vector indexing skills
+single-nucleotide alleles (SNPs). We can use some of the vector indexing skills
 from the last episode.
 
 ```{r, purl=FALSE}
-snps <- c(alt_alleles[alt_alleles=="A"],
+snps <- c(alt_alleles[alt_alleles == "A"],
 alt_alleles[alt_alleles=="T"],
 alt_alleles[alt_alleles=="G"],
 alt_alleles[alt_alleles=="C"])
@@ -442,19 +443,19 @@ l. `variants[variants$REF == "A",]`
 a.
 
 ```{r}
-variants[1,1]
+variants[1, 1]
 ```
 
 b.
 
 ```{r}
-variants[2,4]
+variants[2, 4]
 ```
 
 c.
 
 ```{r}
-variants[801,29]
+variants[801, 29]
 ```
 
 d.
@@ -476,23 +477,23 @@ head(variants[-1, ])
 f.
 
 ```{r}
-variants[1:4,1]
+variants[1:4, 1]
 ```
 
 g.
 
 ```{r}
-variants[1:10,c("REF","ALT")]
+variants[1:10, c("REF", "ALT")]
 ```
 
 h.
 
 ```{r, echo=TRUE, eval=FALSE}
-variants[,c("sample_id")]
+variants[, c("sample_id")]
 ```
 
 ```{r, echo=FALSE, eval=TRUE}
-head(variants[,c("sample_id")])
+head(variants[, c("sample_id")])
 ```
 
 i.
@@ -520,11 +521,11 @@ head(variants$sample_id)
 l.
 
 ```{r, echo=TRUE, eval=FALSE}
-variants[variants$REF == "A",]
+variants[variants$REF == "A", ]
 ```
 
 ```{r, echo=FALSE, eval=TRUE}
-head(variants[variants$REF == "A",])
+head(variants[variants$REF == "A", ])
 ```
 
 :::::::::::::::::::::::::
@@ -547,7 +548,7 @@ them to a new object name:
 ```{r, purl=FALSE}
 # create a new data frame containing only observations from SRR2584863
 
-SRR2584863_variants <- variants[variants$sample_id == "SRR2584863",]
+SRR2584863_variants <- variants[variants$sample_id == "SRR2584863", ]
 
 # check the dimension of the data frame
 
@@ -842,7 +843,7 @@ H) Save the edited Ecoli\_metadata data frame as "exercise\_solution.csv" in you
 dim(Ecoli_metadata)
 levels(as.factor(Ecoli_metadata$cit))
 table(as.factor(Ecoli_metadata$cit))
-Ecoli_metadata[7,7]
+Ecoli_metadata[7, 7]
 median(Ecoli_metadata$genome_size)
 colnames(Ecoli_metadata)[colnames(Ecoli_metadata) == "sample"] <- "sample_id"
 Ecoli_metadata$genome_size_bp <- Ecoli_metadata$genome_size * 1000000
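
Note on the `$` description added in this change: the behavior it describes can be illustrated with a minimal sketch. The `toy_variants` data frame and `toy_list` below are hypothetical stand-ins with made-up values, not the lesson's `variants` data; only the column names `sample_id`, `REF`, and `ALT` are borrowed from the diff.

```r
# Hypothetical toy data frame standing in for `variants` (values are made up)
toy_variants <- data.frame(
  sample_id = c("s1", "s1", "s2"),
  REF = c("A", "G", "A"),
  ALT = c("T", "C", "G"),
  stringsAsFactors = FALSE
)

# `$` extracts a data frame column by name, returning a vector
alt_alleles <- toy_variants$ALT

# `$` also extracts a named element from a list
toy_list <- list(organism = "E. coli", n_samples = 3)
n_samples <- toy_list$n_samples

# `$` combined with row indexing keeps only rows where REF is "A"
ref_a <- toy_variants[toy_variants$REF == "A", ]
```

Here `alt_alleles` is a character vector rather than a factor because `stringsAsFactors = FALSE` is set explicitly, which keeps the sketch consistent across R versions.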
