Commit 275be3b

Few edits
1 parent 59636ef commit 275be3b

1 file changed

+41
-15
lines changed

vignettes/tidytranscriptomics_case_study.Rmd

@@ -17,13 +17,29 @@ knitr::opts_chunk$set(echo = TRUE)
1717

1818
# Workshop introduction
1919

20+
<p float="left">
21+
<img style="height:100px;" alt="BioC2022" src="https://bioc2022.bioconductor.org/img/carousel/BioC2022.png"/>
22+
<img style="height:100px;" alt="tidybulk" src="https://github.com/Bioconductor/BiocStickers/blob/master/tidybulk/tidybulk.png?raw=true"/>
23+
</p>
24+
2025
## Instructors
2126

2227
*Dr. Stefano Mangiola* is currently a Postdoctoral researcher in the laboratory of Prof. Tony Papenfuss at the Walter and Eliza Hall Institute in Melbourne, Australia. His background spans from biotechnology to bioinformatics and biostatistics. His research focuses on prostate and breast tumour microenvironment, the development of statistical models for the analysis of RNA sequencing data, and data analysis and visualisation interfaces.
2328

2429
*Dr. Maria Doyle* is the Application and Training Specialist for Research Computing at the Peter MacCallum Cancer Centre in Melbourne, Australia. She has a PhD in Molecular Biology and currently works in bioinformatics and data science education and training. She is passionate about supporting researchers, reproducible research, open source and tidy data.
2530

26-
## Workshop goals and objectives
31+
## Description
32+
33+
This tutorial shows how to analyse single-cell RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions.
34+
35+
This can be achieved with the integration of packages present in the R CRAN and Bioconductor ecosystem, including [tidySingleCellExperiment](https://stemangiola.github.io/tidySingleCellExperiment/) and [tidyverse](https://www.tidyverse.org/). These packages are part of the tidytranscriptomics suite that introduces a tidy approach to RNA sequencing data representation and analysis. For more information see the [tidy transcriptomics blog](https://stemangiola.github.io/tidytranscriptomics/).
36+
37+
### Pre-requisites
38+
39+
* Basic familiarity with single-cell transcriptomic analyses
40+
* Basic familiarity with tidyverse
41+
42+
## Goals and objectives
2743

2844
* To approach single-cell data representation and analysis through a tidy data paradigm, integrating tidyverse with tidySingleCellExperiment.
2945
* Compare SingleCellExperiment and tidy representation
@@ -51,8 +67,12 @@ This workshop will demonstrate a real-world example of using tidy transcriptomic
5167

5268
This is the easiest way to run this material. We will use the Orchestra Cloud platform during the BioC2022 workshop.
5369

54-
- Using the URL provided launch the workshop called "BioC2022: Tidy Transcriptomics For Single-Cell RNA Sequencing Analyses" **There are several tidy transcriptomics workshops. Be sure to select the BioC2022 one**.
55-
- Open `tidytranscriptomics_case_study.Rmd` in `bioc2022_tidytranscriptomcs/vignettes` folder
70+
1. Go to [Orchestra](http://app.orchestra.cancerdatasci.org/).
71+
2. Log in.
72+
3. Search for the workshop called "BioC2022: Tidy Transcriptomics For Single-Cell RNA Sequencing Analyses". **There are several tidy transcriptomics workshops. Be sure to select the BioC2022 one**.
73+
4. Click "Launch" (may take a minute or two).
74+
5. Follow the instructions.
75+
6. Open `tidytranscriptomics_case_study.Rmd` in `bioc2022_tidytranscriptomcs/vignettes` folder
5676

5777
### Local
5878

@@ -62,7 +82,7 @@ Alternatively, you can view the material at the workshop webpage [here](https://
6282

6383
## Slides
6484

65-
*The embedded slides below may take a minute to appear. You can also download from [here](https://github.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/blob/master/inst/bioc2022_tidytranscriptomics.pdf)*
85+
*The embedded slides below may take a minute to appear. You can also view or download [here](https://github.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/blob/master/inst/bioc2022_tidytranscriptomics.pdf)*
6686

6787
<iframe
6888
src="https://docs.google.com/gview?url=https://raw.githubusercontent.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/master/inst/bioc2022_tidytranscriptomics.pdf&embedded=true"
@@ -250,7 +270,7 @@ sce_obj |>
250270
scales::rescale(CD8A + CD8B, to = c(0, 1))
251271
) |>
252272
253-
# plot cells with high score last
273+
# plot cells with high score last so they're not obscured by other cells
254274
arrange(signature_score) |>
255275
256276
ggplot(aes(UMAP_1, UMAP_2, color = signature_score)) +
@@ -301,7 +321,7 @@ sce_obj |>
301321
subset(signature_score > 0.7)
302322
```
303323

304-
It is then possible to focus in and analyse just these gamma delta T cells. We can chain Bioconductor and tidyverse commands to do this.
324+
We can then focus on just these gamma delta T cells and chain Bioconductor and tidyverse commands together to analyse them.
305325

306326
```{r eval = FALSE}
307327
library(batchelor)
@@ -339,7 +359,7 @@ sce_obj_gamma_delta =
339359
sce_obj_gamma_delta |> select(batch, cluster, everything())
340360
```
341361

342-
It was also possible to visualise the cells as a 3D plot using plotly.
362+
It is also possible to visualise the cells as a 3D plot using plotly.
343363
The example data used here only contains a few genes, for the sake of time and size in this demonstration, but below is how you could generate the 3 dimensions needed for a 3D plot with a full dataset.
344364

345365
```{r eval = FALSE}
@@ -365,20 +385,22 @@ pbmc |>
365385

366386
# Exercises
367387

388+
Using the `sce_obj` object:
389+
368390
1. What proportion of all cells are gamma-delta T cells? Use signature_score > 0.7 to identify gamma-delta T cells.
369391

370392
2. There is a cluster of cells characterised by a low RNA output (nCount_RNA < 100). Identify the cell composition (cell_type) of that cluster.
371393

372394
# Pseudobulk analyses
373395

374-
Now we want to identify genes whose transcription is associated with treatment, pseudo bulk analysis is how we can do this. It aggregates cell-wise transcript abundance into pseudobulk samples and enables us to perform hypothesis testing with tools and data-source that we are more familiar with. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
396+
Next we want to identify genes whose transcription is affected by treatment in this dataset, comparing treated and untreated patients. We can do this with pseudobulk analysis. We aggregate cell-wise transcript abundance into pseudobulk samples and can then perform hypothesis testing using well-established bulk RNA sequencing tools. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
375397

376-
We want to do it for each cell type and the tidy transcriptomic ecosystem makes this very easy.
398+
We want to do it for each cell type and the tidy transcriptomics ecosystem makes this very easy.
377399

378400

379-
## Data exploration using pseudobulk samples
401+
## Create pseudobulk samples
380402

381-
To do this, we will use a helper function called `aggregate_cells`, available in this workshop package, to combine the single cells into groups for each cell type for each sample.
403+
To create pseudobulk samples from the single cell data, we will use a helper function called `aggregate_cells`, available in this workshop package. This function will combine the single cells into groups for each cell type for each sample.
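
To make the idea of aggregation concrete, here is a minimal sketch on a toy long-format table. It is not the implementation of `aggregate_cells`; it only assumes that pseudobulk abundance is the sum of counts over the cells of each sample and cell type. The table `cell_counts`, its column names and all values are hypothetical.

```r
library(dplyr)

# Toy long-format counts: one row per cell and gene (hypothetical values)
cell_counts <- tibble::tribble(
  ~cell, ~sample, ~cell_type, ~gene,   ~count,
  "c1",  "s1",    "T",        "CD3E",  5,
  "c2",  "s1",    "T",        "CD3E",  3,
  "c3",  "s1",    "B",        "CD3E",  0,
  "c4",  "s2",    "T",        "CD3E",  7,
  "c1",  "s1",    "T",        "MS4A1", 0,
  "c2",  "s1",    "T",        "MS4A1", 1,
  "c3",  "s1",    "B",        "MS4A1", 9,
  "c4",  "s2",    "T",        "MS4A1", 0
)

# Sum the counts of all cells that share a sample and cell type,
# giving one pseudobulk value per sample, cell type and gene
pseudo_bulk_toy <-
  cell_counts |>
  group_by(sample, cell_type, gene) |>
  summarise(
    abundance = sum(count),
    n_cells = n(),
    .groups = "drop"
  )

pseudo_bulk_toy
```

Each row of `pseudo_bulk_toy` now behaves like a bulk RNA sequencing measurement, which is why bulk tools can be applied afterwards.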
382404

383405
```{r warning=FALSE, message=FALSE, echo=FALSE}
384406
library(glue)
@@ -401,12 +423,13 @@ pseudo_bulk
401423

402424
## Tidybulk and tidySummarizedExperiment
403425

404-
With `tidySummarizedExperiment` and `tidybulk` is easy to stratify our dataset for iterative self-contained analyses.
426+
With `tidySummarizedExperiment` and `tidybulk` it is easy to split the data into groups and perform analyses on each without needing to create separate objects.
405427

406428
```{r, echo=FALSE, out.width = "800px"}
407429
knitr::include_graphics("../inst/vignettes/new_SE_usage-01.png")
408430
```
409431

432+
We use tidyverse `nest` to group the data. The command below will create a tibble containing a column with a SummarizedExperiment object for each cell type. `nest` is similar to tidyverse `group_by`, except with `nest` each group is stored in a single row, and can be a complex object such as a plot or SummarizedExperiment.
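
The difference between `group_by` and `nest` can be seen on a small tibble. This is a sketch on made-up data (the object `toy` and its values are hypothetical), using plain tibbles rather than a SummarizedExperiment.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per pseudobulk sample
toy <- tibble(
  cell_type = c("T", "T", "B", "B", "B"),
  sample    = c("s1", "s2", "s1", "s2", "s3"),
  abundance = c(10, 12, 3, 4, 5)
)

# group_by only records the grouping; the rows stay as they are
toy_grouped <- toy |> group_by(cell_type)
nrow(toy_grouped)

# nest stores each group in a single row, inside a list-column
toy_nested <- toy |> nest(data = -cell_type)
nrow(toy_nested)

# Each element of the list-column is a complete table for one cell type
toy_nested |>
  slice(1) |>
  pull(data)
```

With `nest(data = -cell_type)`, every column except `cell_type` is packed into the `data` list-column, so the nested tibble has one row per cell type.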
410433

411434
```{r}
412435
pseudo_bulk |>
@@ -421,7 +444,7 @@ pseudo_bulk |>
421444
pull(data)
422445
```
423446

424-
We can then identify differentially expressed genes for each cell type for our condition of interest, treated versus untreated patients.
447+
We can then identify differentially expressed genes for each cell type for our condition of interest, treated versus untreated patients. We use tidyverse `map` to apply differential expression functions to each cell type group in the nested data.
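
The `mutate` plus `map` pattern can be sketched on a toy nested tibble. Here a simple per-group scaling stands in for the differential expression step; the data and the `scaled` column are hypothetical and only illustrate how a function is applied to each element of the nested column.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data, nested by cell type
toy_nested <-
  tibble(
    cell_type = c("T", "T", "B", "B"),
    treatment = c("treated", "untreated", "treated", "untreated"),
    abundance = c(20, 10, 6, 6)
  ) |>
  nest(data = -cell_type)

# map visits each element of the data column (.x) in turn;
# the inner mutate therefore runs separately for each cell type
toy_results <-
  toy_nested |>
  mutate(data = map(
    data,
    ~ .x |>
      mutate(scaled = abundance / sum(abundance))
  ))

toy_results |>
  slice(1) |>
  pull(data)
```

Because the result is written back to the same `data` column, the output keeps the one-row-per-cell-type shape of the input.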
425448

426449
```{r message=FALSE, warning=FALSE}
427450
# Differential transcription abundance
@@ -430,7 +453,7 @@ pseudo_bulk <-
430453
431454
nest(data = -cell_type) |>
432455
433-
# map inputs a data column (.x)
456+
# map accepts a data column (.x) and applies functions to each element
434457
mutate(data = map(
435458
data,
436459
~ .x |>
@@ -442,17 +465,20 @@ pseudo_bulk <-
442465
))
443466
```
444467

468+
The output is again a tibble containing a SummarizedExperiment object for each cell type.
469+
445470
```{r}
446471
pseudo_bulk
447472
```
473+
If we pull out the SummarizedExperiment object for the first cell type, as before, we can see it now has columns containing the differential expression results (e.g. logFC, PValue).
448474

449475
```{r}
450476
pseudo_bulk |>
451477
slice(1) |>
452478
pull(data)
453479
```
454480

455-
Now we can create plots for significant genes for each cell type, visualising their transcriptional abundance, without needing to create multiple objects.
481+
Now we can create plots for significant genes for each cell type, visualising their transcriptional abundance, also without needing to create multiple objects.
456482

457483
```{r message = FALSE}
458484
pseudo_bulk <-
