*Dr. Stefano Mangiola* is currently a Postdoctoral researcher in the laboratory of Prof. Tony Papenfuss at the Walter and Eliza Hall Institute in Melbourne, Australia. His background spans from biotechnology to bioinformatics and biostatistics. His research focuses on prostate and breast tumour microenvironment, the development of statistical models for the analysis of RNA sequencing data, and data analysis and visualisation interfaces.
23
28
24
29
*Dr. Maria Doyle* is the Application and Training Specialist for Research Computing at the Peter MacCallum Cancer Centre in Melbourne, Australia. She has a PhD in Molecular Biology and currently works in bioinformatics and data science education and training. She is passionate about supporting researchers, reproducible research, open source and tidy data.
25
30
26
-
## Workshop goals and objectives
31
+
## Description
32
+
33
+
This tutorial will present how to perform analysis of single-cell RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions.
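As a minimal sketch of the tidy layout described above, the toy tibble below stores one cell per row and one variable per column, then manipulates it with dplyr verbs. The cell names, cell types and counts are made up for illustration and are not workshop data. The sketch assumes the dplyr and tibble packages are installed.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: one row per cell (observation), one column per variable
cells <- tibble(
  cell      = c("c1", "c2", "c3", "c4"),
  cell_type = c("T cell", "T cell", "B cell", "B cell"),
  n_counts  = c(1200, 80, 950, 1500)
)

# The same small vocabulary of verbs is used at every step,
# and the result of each step is still a tibble
cell_summary <- cells |>
  filter(n_counts >= 100) |>
  group_by(cell_type) |>
  summarise(n_cells = n(), mean_counts = mean(n_counts))

cell_summary
```

Because every step returns a tibble, the steps can be reordered or extended without converting between data structures.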
34
+
35
+
This can be achieved by combining packages from the CRAN and Bioconductor ecosystems, including [tidySingleCellExperiment](https://stemangiola.github.io/tidySingleCellExperiment/) and [tidyverse](https://www.tidyverse.org/). These packages are part of the tidytranscriptomics suite, which introduces a tidy approach to RNA sequencing data representation and analysis. For more information see the [tidy transcriptomics blog](https://stemangiola.github.io/tidytranscriptomics/).
36
+
37
+
### Pre-requisites
38
+
39
+
* Basic familiarity with single-cell transcriptomic analyses
40
+
* Basic familiarity with tidyverse
41
+
42
+
## Goals and objectives
27
43
28
44
* To approach single-cell data representation and analysis through a tidy data paradigm, integrating tidyverse with tidySingleCellExperiment.
29
45
* Compare SingleCellExperiment and tidy representation
@@ -51,8 +67,12 @@ This workshop will demonstrate a real-world example of using tidy transcriptomic
51
67
52
68
The easiest way to run this material is on the Orchestra Cloud platform, which we will use during the BioC2022 workshop.
53
69
54
-
- Using the URL provided launch the workshop called "BioC2022: Tidy Transcriptomics For Single-Cell RNA Sequencing Analyses" **There are several tidy transcriptomics workshops. Be sure to select the BioC2022 one**.
55
-
- Open `tidytranscriptomics_case_study.Rmd` in `bioc2022_tidytranscriptomcs/vignettes` folder
70
+
1. Go to [Orchestra](http://app.orchestra.cancerdatasci.org/).
71
+
2. Log in.
72
+
3. Search for the workshop called "BioC2022: Tidy Transcriptomics For Single-Cell RNA Sequencing Analyses". **There are several tidy transcriptomics workshops. Be sure to select the BioC2022 one.**
73
+
4. Click "Launch" (may take a minute or two).
74
+
5. Follow the instructions.
75
+
6. Open `tidytranscriptomics_case_study.Rmd` in the `bioc2022_tidytranscriptomcs/vignettes` folder.
56
76
57
77
### Local
58
78
@@ -62,7 +82,7 @@ Alternatively, you can view the material at the workshop webpage [here](https://
62
82
63
83
## Slides
64
84
65
-
*The embedded slides below may take a minute to appear. You can also download from[here](https://github.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/blob/master/inst/bioc2022_tidytranscriptomics.pdf)*
85
+
*The embedded slides below may take a minute to appear. You can also view or download them [here](https://github.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/blob/master/inst/bioc2022_tidytranscriptomics.pdf).*
It was also possible to visualise the cells as a 3D plot using plotly.
362
+
It is also possible to visualise the cells as a 3D plot using plotly.
343
363
The example data used here only contains a few genes, for the sake of time and size in this demonstration, but below is how you could generate the 3 dimensions needed for 3D plot with a full dataset.
344
364
345
365
```{r eval = FALSE}
@@ -365,20 +385,22 @@ pbmc |>
365
385
366
386
# Exercises
367
387
388
+
Using the `sce_obj` object:
389
+
368
390
1. What proportion of all cells are gamma-delta T cells? Use signature_score > 0.7 to identify gamma-delta T cells.
369
391
370
392
2. There is a cluster of cells characterised by a low RNA output (nCount_RNA < 100). Identify the cell composition (cell_type) of that cluster.
371
393
372
394
# Pseudobulk analyses
373
395
374
-
Now we want to identify genes whose transcription is associated with treatment, pseudo bulk analysis is how we can do this. It aggregates cell-wise transcript abundance into pseudobulk samples and enables us to perform hypothesis testing with tools and data-source that we are more familiar with. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
396
+
Next we want to identify genes whose transcription is affected by treatment in this dataset, comparing treated and untreated patients. We can do this with pseudobulk analysis: we aggregate cell-wise transcript abundance into pseudobulk samples and then perform hypothesis testing using well-established bulk RNA sequencing tools. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
375
397
376
-
We want to do it for each cell type and the tidy transcriptomic ecosystem makes this very easy.
398
+
We want to do it for each cell type and the tidy transcriptomics ecosystem makes this very easy.
377
399
378
400
379
-
## Data exploration using pseudobulk samples
401
+
## Create pseudobulk samples
380
402
381
-
To do this, we will use a helper function called `aggregate_cells`, available in this workshop package, to combine the single cells into groups for each cell type for each sample.
403
+
To create pseudobulk samples from the single-cell data, we will use a helper function called `aggregate_cells`, available in this workshop package. This function combines the single cells into groups for each cell type for each sample.
382
404
383
405
```{r warning=FALSE, message=FALSE, echo=FALSE}
384
406
library(glue)
@@ -401,12 +423,13 @@ pseudo_bulk
401
423
402
424
## Tidybulk and tidySummarizedExperiment
403
425
404
-
With `tidySummarizedExperiment` and `tidybulk` is easy to stratify our dataset for iterative self-contained analyses.
426
+
With `tidySummarizedExperiment` and `tidybulk` it is easy to split the data into groups and perform analyses on each without needing to create separate objects.
We use tidyverse `nest` to group the data. The command below will create a tibble containing a column with a SummarizedExperiment object for each cell type. `nest` is similar to tidyverse `group_by`, except with `nest` each group is stored in a single row, and can be a complex object such as a plot or SummarizedExperiment.
410
433
411
434
```{r}
412
435
pseudo_bulk |>
@@ -421,7 +444,7 @@ pseudo_bulk |>
421
444
pull(data)
422
445
```
423
446
424
-
We can then identify differentially expressed genes for each cell type for our condition of interest, treated versus untreated patients.
447
+
We can then identify differentially expressed genes for each cell type for our condition of interest, treated versus untreated patients. We use tidyverse `map` to apply differential expression functions to each cell type group in the nested data.
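The nest-then-map pattern can be tried on a plain tibble before applying it to SummarizedExperiment objects. The sketch below uses hypothetical toy values, and a simple treated-minus-untreated difference stands in for a real differential expression call. It assumes dplyr, tidyr, purrr and tibble are installed.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Hypothetical pseudobulk-like table: one row per cell type per treatment group
toy <- tibble(
  cell_type = c("T cell", "T cell", "B cell", "B cell"),
  treatment = c("treated", "untreated", "treated", "untreated"),
  abundance = c(10, 4, 7, 7)
)

nested <- toy |>
  # One row per cell type; the remaining columns go into the list-column `data`
  nest(data = -cell_type) |>
  # map_dbl applies the function to each element of `data`
  # (here, each per-cell-type tibble) and returns one number per group
  mutate(difference = map_dbl(
    data,
    ~ .x$abundance[.x$treatment == "treated"] -
      .x$abundance[.x$treatment == "untreated"]
  ))

nested
```

Here `map_dbl` is used because the toy function returns a single number per group. When the function returns a complex object per group, such as a SummarizedExperiment, plain `map` stores the results in a list-column instead.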
425
448
426
449
```{r message=FALSE, warning=FALSE}
427
450
# Differential transcription abundance
@@ -430,7 +453,7 @@ pseudo_bulk <-
430
453
431
454
nest(data = -cell_type) |>
432
455
433
-
# map inputs a data column (.x)
456
+
# map accepts a data column (.x) and applies functions to each element
434
457
mutate(data = map(
435
458
data,
436
459
~ .x |>
@@ -442,17 +465,20 @@ pseudo_bulk <-
442
465
))
443
466
```
444
467
468
+
The output is again a tibble containing a SummarizedExperiment object for each cell type.
469
+
445
470
```{r}
446
471
pseudo_bulk
447
472
```
473
+
If we pull out the SummarizedExperiment object for the first cell type, as before, we can see it now has columns containing the differential expression results (e.g. logFC, PValue).
448
474
449
475
```{r}
450
476
pseudo_bulk |>
451
477
slice(1) |>
452
478
pull(data)
453
479
```
454
480
455
-
Now we can create plots for significant genes for each cell type, visualising their transcriptional abundance, without needing to create multiple objects.
481
+
Now we can create plots for significant genes for each cell type, visualising their transcriptional abundance, also without needing to create multiple objects.