
Commit ce4638f

committed
differences for PR #290
1 parent b3f2e6d commit ce4638f

4 files changed, +133 -13 lines changed

03-basics-factors-dataframes.md

+132 -12
@@ -294,12 +294,13 @@ str(subset)
294294

295295
OK, that's a lot to unpack! Some things to notice:
296296

297-
- the object type `data.frame` is displayed in the first row along with its
297+
- The object type `data.frame` is displayed in the first row along with its
298298
dimensions, in this case 801 observations (rows) and 4 variables (columns)
299-
- Each variable (column) has a name (e.g. `sample_id`). This is followed
300-
by the object mode (e.g. chr, int, etc.). Notice that before each
299+
- Each variable (column) has a name (e.g. `sample_id`). Notice that before each
301300
variable name there is a `$` - this will be important later.
302-
301+
- Each variable name is followed by the data type it contains (e.g. chr, int, etc.).
302+
The `int` type indicates integers, a type of numeric data that can only
303+
store whole numbers (i.e. no decimal points).
303304

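To see how `str()` reports column types and how the `$` operator pulls out a single column, here is a minimal sketch on a small, made-up data frame. The `toy` object and its `sample_id`, `DP` and `QUAL` columns are hypothetical stand-ins, not the lesson's actual data:

```r
# A small, hypothetical data frame with a character, an integer and a
# decimal (numeric) column; stringsAsFactors = FALSE keeps text as chr
toy <- data.frame(
  sample_id = c("SRR001", "SRR002", "SRR003"),
  DP = c(10L, 4L, 7L),         # the L suffix stores whole numbers as integers
  QUAL = c(31.5, 12.25, 40),   # values with decimals are stored as numeric
  stringsAsFactors = FALSE
)

# str() lists each column after a `$`, followed by its type (chr, int, num)
str(toy)

# `$` extracts a single column as a vector
toy$DP
class(toy$DP)     # "integer"
class(toy$QUAL)   # "numeric"

# Numbers typed without the L suffix are numeric (double), not integer
is.integer(5)     # FALSE
is.integer(5L)    # TRUE
```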
304305

305306
::::::::::::::::::::::::::::::::::::::: challenge
@@ -379,11 +380,109 @@ head(alt_alleles)
379380
```
380381

381382
There are 801 alleles (one for each row). To simplify, let's look at just the
382-
single-nucleotide alleles (SNPs). We can use some of the vector indexing skills
383-
from the last episode.
383+
single-nucleotide alleles (SNPs).
384+
385+
Let's review some of the vector indexing skills from the last episode that can help:
386+
387+
388+
``` r
389+
# Compare each allele to the single nucleotide "A"; this returns a TRUE/FALSE (logical) vector
390+
alt_alleles == "A"
391+
```
392+
393+
``` output
394+
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
395+
[13] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
396+
[25] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
397+
[37] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
398+
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
399+
[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
400+
[73] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
401+
[85] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
402+
[97] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
403+
[109] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
404+
[121] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
405+
[133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
406+
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
407+
[157] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
408+
[169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
409+
[181] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
410+
[193] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
411+
[205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
412+
[217] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
413+
[229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
414+
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
415+
[253] FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE
416+
[265] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
417+
[277] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
418+
[289] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE
419+
[301] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
420+
[313] FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE
421+
[325] FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE
422+
[337] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
423+
[349] FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
424+
[361] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
425+
[373] FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
426+
[385] FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
427+
[397] TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE
428+
[409] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE
429+
[421] FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
430+
[433] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
431+
[445] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
432+
[457] TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
433+
[469] TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
434+
[481] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
435+
[493] TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
436+
[505] TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
437+
[517] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
438+
[529] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE FALSE
439+
[541] TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
440+
[553] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE
441+
[565] FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE
442+
[577] FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
443+
[589] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
444+
[601] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
445+
[613] FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE
446+
[625] TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
447+
[637] FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
448+
[649] TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
449+
[661] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE
450+
[673] TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
451+
[685] FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE
452+
[697] TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
453+
[709] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
454+
[721] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
455+
[733] TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE
456+
[745] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
457+
[757] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
458+
[769] FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
459+
[781] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
460+
[793] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
461+
```
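Because a comparison like `alt_alleles == "A"` returns one `TRUE` or `FALSE` per element, it can be summarized directly. A minimal sketch on a short, made-up vector (`toy_alleles` is hypothetical, standing in for `alt_alleles`): `sum()` counts the `TRUE` values and `which()` returns their positions.

```r
# A short, hypothetical vector of alternate alleles
toy_alleles <- c("G", "A", "TA", "A", "C", "A")

is_a <- toy_alleles == "A"
is_a          # FALSE TRUE FALSE TRUE FALSE TRUE

sum(is_a)     # TRUE counts as 1 and FALSE as 0, so this gives 3
which(is_a)   # positions of the matches: 2 4 6
```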
384462

463+
``` r
464+
# Then, use that logical vector as an index to pull out all the alleles that match.
465+
alt_alleles[alt_alleles == "A"]
466+
```
467+
468+
``` output
469+
[1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
470+
[19] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
471+
[37] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
472+
[55] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
473+
[73] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
474+
[91] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
475+
[109] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
476+
[127] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
477+
[145] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
478+
[163] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
479+
[181] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
480+
[199] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
481+
```
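Rather than writing one comparison per nucleotide, base R's `%in%` operator tests membership in a set of values in a single step. A minimal sketch, again on a hypothetical vector; note that `%in%` keeps the alleles in their original order, whereas combining four separate subsets with `c()` groups them by nucleotide:

```r
# A short, hypothetical vector of alternate alleles
toy_alleles <- c("G", "A", "TA", "A", "C", "GTC", "T")

# TRUE wherever the allele is exactly one of the four single nucleotides
is_snp <- toy_alleles %in% c("A", "T", "G", "C")
toy_snps <- toy_alleles[is_snp]
toy_snps   # "G" "A" "A" "C" "T" (original order is kept)

# One comparison per nucleotide finds the same alleles,
# but groups them by nucleotide
grouped <- c(toy_alleles[toy_alleles == "A"],
             toy_alleles[toy_alleles == "T"],
             toy_alleles[toy_alleles == "G"],
             toy_alleles[toy_alleles == "C"])
grouped    # "A" "A" "T" "G" "C"
```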
385482

386483
``` r
484+
# If we repeat this for each nucleotide A, T, G, and C, and connect them using `c()`,
485+
# we can index all the single nucleotide changes.
387486
snps <- c(alt_alleles[alt_alleles == "A"],
388487
alt_alleles[alt_alleles == "T"],
389488
alt_alleles[alt_alleles == "G"],
@@ -418,7 +517,18 @@ Error in plot.window(...): need finite 'ylim' values
418517
```
419518

420519
Whoops! Though the `plot()` function will do its best to give us a quick plot,
421-
it is unable to do so here. One way to fix this it to tell R to treat the SNPs
520+
it is unable to do so here. Let's use `str()` to see why:
521+
522+
523+
``` r
524+
str(snps)
525+
```
526+
527+
``` output
528+
chr [1:707] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" ...
529+
```
530+
531+
`snps` is a character vector, which `plot()` cannot draw directly. One way to fix this is to tell R to treat the SNPs
422532
as categories (i.e. a factor vector); we will create a new object to avoid
423533
confusion using the `factor()` function:
424534

@@ -463,7 +573,17 @@ summary(factor_snps)
463573
211 139 154 203
464574
```
465575

466-
As you can imagine, this is already useful when you want to generate a tally.
576+
``` r
577+
# Compare with summary() of the original character vector
578+
summary(snps)
579+
```
580+
581+
``` output
582+
Length Class Mode
583+
707 character character
584+
```
585+
586+
As you can imagine, factors are already useful when you want to generate a tally.
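A minimal sketch of that difference, using a short hypothetical vector in place of `snps`: `summary()` of a factor returns one count per level, `summary()` of the character vector only describes the vector itself, and base R's `table()` gives the same per-category counts straight from the character vector.

```r
# A short, hypothetical vector of SNPs
toy_snps <- c("A", "T", "A", "G", "C", "A")
toy_factor <- factor(toy_snps)

levels(toy_factor)    # "A" "C" "G" "T" (sorted by default)
summary(toy_factor)   # one count per level: A = 3, C = 1, G = 1, T = 1
summary(toy_snps)     # a character vector only reports Length, Class and Mode

# table() gives the same per-category counts without converting first
table(toy_snps)
```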
467587

468588
::::::::::::::::::::::::::::::::::::::::: callout
469589

@@ -489,7 +609,7 @@ possible SNP we could generate a plot:
489609
plot(factor_snps)
490610
```
491611

492-
<img src="fig/03-basics-factors-dataframes-rendered-unnamed-chunk-16-1.png" style="display: block; margin: auto;" />
612+
<img src="fig/03-basics-factors-dataframes-rendered-unnamed-chunk-17-1.png" style="display: block; margin: auto;" />
493613

494614
This isn't a particularly pretty example of a plot, but it works. We'll be
495615
learning much more about creating nice, publication-quality graphics later in
@@ -526,7 +646,7 @@ Now we see our plot has been reordered:
526646
plot(ordered_factor_snps)
527647
```
528648

529-
<img src="fig/03-basics-factors-dataframes-rendered-unnamed-chunk-18-1.png" style="display: block; margin: auto;" />
649+
<img src="fig/03-basics-factors-dataframes-rendered-unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
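The order of the bars follows the order of the factor's levels. The code that builds `ordered_factor_snps` isn't shown in this diff, so the following is only a general sketch on a hypothetical vector, not necessarily the lesson's approach: the `levels` argument of `factor()` sets the level order explicitly, and `ordered = TRUE` additionally marks the categories as ranked.

```r
# A short, hypothetical vector of SNPs
toy_snps <- c("A", "T", "A", "G", "C", "A", "T")

default_order <- factor(toy_snps)
levels(default_order)   # "A" "C" "G" "T"

# Supply the levels in the order you want them displayed
custom_order <- factor(toy_snps, levels = c("T", "G", "C", "A"))
levels(custom_order)    # "T" "G" "C" "A"

# ordered = TRUE creates an ordered factor, so comparisons like > work
ranked <- factor(toy_snps, levels = c("T", "G", "C", "A"), ordered = TRUE)
is.ordered(ranked)      # TRUE
ranked[1] > ranked[2]   # "A" vs "T" in this ordering: TRUE

# Reordering levels does not change the underlying values
as.character(custom_order)
```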
530650

531651
Factors come in handy in many places when using R. Even using more
532652
sophisticated plotting packages such as ggplot2 will sometimes require you
@@ -555,7 +675,7 @@ These packages will be installed into "~/work/genomics-r-intro/genomics-r-intro/
555675
556676
# Installing packages --------------------------------------------------------
557677
- Installing ggplot2 ... OK [linked from cache]
558-
Successfully installed 1 package in 7.2 milliseconds.
678+
Successfully installed 1 package in 9.9 milliseconds.
559679
```
560680

561681
``` r
@@ -569,7 +689,7 @@ These packages will be installed into "~/work/genomics-r-intro/genomics-r-intro/
569689
570690
# Installing packages --------------------------------------------------------
571691
- Installing dplyr ... OK [linked from cache]
572-
Successfully installed 1 package in 6.4 milliseconds.
692+
Successfully installed 1 package in 5.5 milliseconds.
573693
```
574694

575695
These two packages are among the most popular add-on packages used in R, and they are part of a large set of very useful packages called the [tidyverse](https://www.tidyverse.org). Packages in the tidyverse are designed to work well together and are made to work with tidy data (which we described earlier in this lesson).

md5sum.txt

+1 -1
@@ -6,7 +6,7 @@
66
"episodes/00-introduction.Rmd" "e1354ed92fb458179c8c00b00ee1cf55" "site/built/00-introduction.md" "2024-10-02"
77
"episodes/01-r-basics.Rmd" "c9e52db6d25e0b716fce903bdf9d3ee8" "site/built/01-r-basics.md" "2024-10-02"
88
"episodes/02-data-prelude.Rmd" "ab2b1fd3cdaae919f9e409f713a0a8ad" "site/built/02-data-prelude.md" "2024-10-02"
9-
"episodes/03-basics-factors-dataframes.Rmd" "aba49258815322842a6abc14422e68b5" "site/built/03-basics-factors-dataframes.md" "2024-10-02"
9+
"episodes/03-basics-factors-dataframes.Rmd" "12a92926086599da792d6ec59b2df56e" "site/built/03-basics-factors-dataframes.md" "2024-10-02"
1010
"episodes/04-bioconductor-vcfr.Rmd" "10eb69b4697d7ecb9695d36c0d974208" "site/built/04-bioconductor-vcfr.md" "2024-10-02"
1111
"episodes/05-dplyr.Rmd" "f74055bd8677338a213e0a0c6c430119" "site/built/05-dplyr.md" "2024-10-02"
1212
"episodes/06-data-visualization.Rmd" "0b45534421bad05f040b24c40b6da71b" "site/built/06-data-visualization.md" "2024-10-02"
