@@ -151,7 +151,7 @@ for `read.table("file.csv", sep = ",")`. You can see in the help
documentation that there are several additional variations of
`read.table`, such as `read.csv2` to read tables separated by `;`
and `read.delim` to read in tables separated by `\t` (tabs). If you know how your table is separated, you can use one of the provided short cuts,
- but case you run into an unconventional separator you are now equipt with the knowledge to define it in the `sep =` arugument of `read.table`!
+ but case you run into an unconventional separator you are now equipped with the knowledge to define it in the `sep =` argument of `read.table`!

::::::::::::::::::::::::::::::::::::::::::::::::::
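
Reviewer's illustration of the passage touched by this hunk: a minimal sketch of passing an unconventional separator to `read.table` through `sep =`. The temporary file, the pipe separator, and the column values are assumptions made up for this example and are not part of the lesson data.

```r
# Hypothetical example data: a small table separated by a pipe character
tmp <- tempfile(fileext = ".txt")
writeLines(c("sample_id|POS|REF", "S1|100|A", "S2|250|G"), tmp)

# read.csv(), read.csv2() and read.delim() are shortcuts with preset separators;
# for anything else, name the separator explicitly in read.table()
piped <- read.table(tmp, header = TRUE, sep = "|", stringsAsFactors = FALSE)

# Inspect the structure of the imported data frame
str(piped)
```

The same pattern should apply to any single-character separator; only the value given to `sep` changes.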
@@ -214,7 +214,7 @@ new data frame using the `data.frame()` function.
```{r, purl=FALSE}
## put the first three columns of variants into a new data frame called subset

- subset<- data.frame(variants[,c(1:3,6)])
+ subset <- data.frame(variants[, c(1:3, 6)])
```

Now, let's use the `str()` (structure) function to look a little more closely
@@ -239,12 +239,13 @@ Ok, thats a lot up unpack! Some things to notice.
Factors are the final major data structure we will introduce in our R genomics
lessons. Factors can be thought of as vectors which are specialized for
categorical data. Given R's specialization for statistics, this make sense since
- categorial and continuous variables are usually treated differently. Sometimes
+ categorical and continuous variables are usually treated differently. Sometimes
you may want to have data treated as a factor, but in other cases, this may be
undesirable.

- Let's see the value of treating some of which are categorical in nature as
- factors. Let's take a look at just the alternate alleles
+ Let's explore the value of treating some vectors that are categorical in nature as
+ factors. To do this we'll take a look at just the alternate alleles. We can use the `$` operator
+ to access or extract a column by its name in data frames (or to extract objects within named lists).

```{r, purl=FALSE}
## extract the "ALT" column to a new object
@@ -259,11 +260,11 @@ head(alt_alleles)
```

There are 801 alleles (one for each row). To simplify, lets look at just the
- single-nuleotide alleles (SNPs). We can use some of the vector indexing skills
+ single-nucleotide alleles (SNPs). We can use some of the vector indexing skills
from the last episode.

```{r, purl=FALSE}
- snps <- c(alt_alleles[alt_alleles== "A"],
+ snps <- c(alt_alleles[alt_alleles == "A"],
alt_alleles[alt_alleles=="T"],
alt_alleles[alt_alleles=="G"],
alt_alleles[alt_alleles=="C"])
@@ -442,19 +443,19 @@ l. `variants[variants$REF == "A",]`
a.

```{r}
- variants[1,1]
+ variants[1, 1]
```

b.

```{r}
- variants[2,4]
+ variants[2, 4]
```

c.

```{r}
- variants[801,29]
+ variants[801, 29]
```

d.
@@ -476,23 +477,23 @@ head(variants[-1, ])
f.

```{r}
- variants[1:4,1]
+ variants[1:4, 1]
```

g.

```{r}
- variants[1:10,c("REF","ALT")]
+ variants[1:10, c("REF", "ALT")]
```

h.

```{r, echo=TRUE, eval=FALSE}
- variants[,c("sample_id")]
+ variants[, c("sample_id")]
```

```{r, echo=FALSE, eval=TRUE}
- head(variants[,c("sample_id")])
+ head(variants[, c("sample_id")])
```

i.
@@ -520,11 +521,11 @@ head(variants$sample_id)
l.

```{r, echo=TRUE, eval=FALSE}
- variants[variants$REF == "A",]
+ variants[variants$REF == "A", ]
```

```{r, echo=FALSE, eval=TRUE}
- head(variants[variants$REF == "A",])
+ head(variants[variants$REF == "A", ])
```

:::::::::::::::::::::::::
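
Reviewer's illustration of the indexing forms reformatted in the hunks above: a minimal sketch that runs the same `[row, column]` patterns on a small stand-in data frame. The `toy` data frame and its values are hypothetical; only the column names mirror the lesson's `variants` object.

```r
# Hypothetical stand-in for the lesson's `variants` data frame
toy <- data.frame(
  sample_id = c("S1", "S1", "S2", "S2"),
  POS = c(100L, 250L, 100L, 400L),
  REF = c("A", "G", "A", "C"),
  ALT = c("G", "T", "C", "T"),
  stringsAsFactors = FALSE
)

toy[1, 1]                  # a single cell: row 1, column 1
toy[1:2, c("REF", "ALT")]  # rows 1 to 2 of two named columns (a data frame)
toy[, c("sample_id")]      # one column selected by name (a plain vector)

# A logical test on a column filters rows; the trailing comma keeps all columns
ref_a <- toy[toy$REF == "A", ]
dim(ref_a)
```

The space after each comma is purely stylistic, so `toy[1,1]` and `toy[1, 1]` return the same value.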
@@ -547,7 +548,7 @@ them to a new object name:
```{r, purl=FALSE}
# create a new data frame containing only observations from SRR2584863

- SRR2584863_variants <- variants[variants$sample_id == "SRR2584863",]
+ SRR2584863_variants <- variants[variants$sample_id == "SRR2584863", ]

# check the dimension of the data frame
@@ -842,7 +843,7 @@ H) Save the edited Ecoli\_metadata data frame as "exercise\_solution.csv" in you
dim(Ecoli_metadata)
levels(as.factor(Ecoli_metadata$cit))
table(as.factor(Ecoli_metadata$cit))
- Ecoli_metadata[7,7]
+ Ecoli_metadata[7, 7]
median(Ecoli_metadata$genome_size)
colnames(Ecoli_metadata)[colnames(Ecoli_metadata) == "sample"] <- "sample_id"
Ecoli_metadata$genome_size_bp <- Ecoli_metadata$genome_size * 1000000