-Description: This workshop will present how to perform analysis of RNA sequencing data following the tidy data paradigm, using the tidySingleCellExperiment and tidyverse packages.
+Description: This workshop will showcase analysis of single-cell RNA sequencing data following the tidy data paradigm, using the tidySingleCellExperiment, tidySummarizedExperiment, tidybulk and tidyverse packages.
-This tutorial will present how to perform analysis of single-cell RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions.
+This tutorial will showcase analysis of single-cell RNA sequencing data following the tidy data paradigm. The tidy data paradigm provides a standard way to organise data values within a dataset, where each variable is a column, each observation is a row, and data is manipulated using an easy-to-understand vocabulary. Most importantly, the data structure remains consistent across manipulation and analysis functions.

-This can be achieved with the integration of packages present in the R CRAN and Bioconductor ecosystem, including [tidySingleCellExperiment](https://stemangiola.github.io/tidySingleCellExperiment/) and [tidyverse](https://www.tidyverse.org/). These packages are part of the tidytranscriptomics suite that introduces a tidy approach to RNA sequencing data representation and analysis. For more information see the [tidy transcriptomics blog](https://stemangiola.github.io/tidytranscriptomics/).
+This can be achieved with the integration of packages present in the R CRAN and Bioconductor ecosystem, including [tidySingleCellExperiment](https://stemangiola.github.io/tidySingleCellExperiment/), [tidySummarizedExperiment](https://stemangiola.github.io/tidySummarizedExperiment/), [tidybulk](https://stemangiola.github.io/tidybulk/) and [tidyverse](https://www.tidyverse.org/). These packages are part of the tidytranscriptomics suite that introduces a tidy approach to RNA sequencing data representation and analysis. For more information see the [tidy transcriptomics blog](https://stemangiola.github.io/tidytranscriptomics/).

 ### Pre-requisites

@@ -59,7 +59,7 @@ This can be achieved with the integration of packages present in the R CRAN and
 - The fundamentals of single-cell data analysis
 - The fundamentals of tidy data analysis

-This workshop will demonstrate a real-world example of using tidy transcriptomics packages, such as tidySingleCellExperiment and tidybulk, to perform a single cell analysis. This workshop is not a step-by-step introduction in how to perform single-cell analysis. For an overview of single-cell analysis steps performed in a tidy way please see the [ISMB2021 workshop](https://tidytranscriptomics-workshops.github.io/ismb2021_tidytranscriptomics/articles/tidytranscriptomics.html).
+This workshop will demonstrate a real-world example of using tidy transcriptomics packages to analyse single-cell data. This workshop is not a step-by-step introduction in how to perform single-cell analysis. For an overview of single-cell analysis steps performed in a tidy way please see the [ISMB2021 workshop](https://tidytranscriptomics-workshops.github.io/ismb2021_tidytranscriptomics/articles/tidytranscriptomics.html).

 ## Getting started

@@ -82,7 +82,7 @@ Alternatively, you can view the material at the workshop webpage [here](https://

 ## Slides

-*The embedded slides below may take a minute to appear. You can also view or download [here](https://github.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/blob/master/inst/bioc2022_tidytranscriptomics.pdf)*
+*The embedded slides below may take a minute to appear. You can also view or download [here](https://github.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/blob/master/inst/bioc2022_tidytranscriptomics.pdf).*
@@ -145,12 +145,20 @@ We can use `filter` to choose rows, for example, to see just the rows for the ce
 sce_obj |> filter(Phase == "G1")
 ```

-We can use `select` to choose columns, for example, to see the sample, cell, total cellular RNA
+We can use `select` to view columns, for example, to see the filename, total cellular RNA abundance and cell phase.

 ```{r}
-sce_obj |> select(.cell, nCount_RNA, Phase)
+sce_obj |> select(file, nCount_RNA, Phase)
 ```

+> As we did not output the .cell column we get a tibble instead of a SingleCellExperiment object and a message to let us know: "tidySingleCellExperiment says: Key columns are missing. A data frame is returned for independent data analysis." This is ok as it's what we want here when exploring the data.
+
+> If we use select to output the .cell (key) column we will also get any view-only columns returned, such as the UMAP columns generated during the preprocessing.
+
+>```{r}
+> sce_obj |> select(.cell, nCount_RNA, Phase)
+>```
+
 We can use `mutate` to create a column. For example, we could create a new `Phase_l` column that contains a lower-case version of `Phase`.

 ```{r}
@@ -211,20 +219,18 @@ The object `sce_obj` we've been using was created as part of a study on breast c

 ## Analyse custom signature

-The researcher analysing this dataset wanted to to identify gamma delta T cells using a gene signature from a published paper [@Pizzolato2019].
+The researcher analysing this dataset wanted to identify gamma delta T cells using a gene signature from a published paper [@Pizzolato2019]. We'll show how that can be done here.

-With tidySingleCellExperiment's `join_features` the counts for the genes could be viewed as columns.
+With tidySingleCellExperiment's `join_features` we can view the counts for genes in the signature as columns joined to our single cell tibble.
-They were able to use tidySingleCellExperiment's `join_features` to select the counts for the genes in the signature, followed by tidyverse `mutate` to easily create a column containing the signature score.
+We can use tidyverse `mutate` to create a column containing the signature score. To generate the score, we scale the sum of the 4 genes, CD3D, TRDC, TRGC1, TRGC2, and subtract the scaled sum of the 2 genes, CD8A and CD8B. `mutate` is powerful in enabling us to perform complex arithmetic operations easily.
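
As a rough illustration of the score arithmetic described above, the sketch below computes such a score on a small made-up table with plain dplyr. The toy counts, the `rescale01` helper and the choice of min-max rescaling to the range 0 to 1 are assumptions for illustration only; the workshop's own code and the real `sce_obj` data are not shown in this diff.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: min-max rescale a numeric vector to the range [0, 1].
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Made-up counts for three cells, standing in for the per-gene columns
# that join_features would add for the six signature genes.
toy_cells <- tibble(
  .cell = c("cell_a", "cell_b", "cell_c"),
  CD3D  = c(4, 0, 2),
  TRDC  = c(3, 0, 1),
  TRGC1 = c(2, 0, 1),
  TRGC2 = c(1, 0, 0),
  CD8A  = c(0, 4, 1),
  CD8B  = c(0, 2, 2)
)

# Rescaled sum of the four marker genes minus the rescaled sum of the
# two CD8 genes, stored in a new column.
toy_scored <- toy_cells |>
  mutate(
    signature_score =
      rescale01(CD3D + TRDC + TRGC1 + TRGC2) -
      rescale01(CD8A + CD8B)
  )

toy_scored |> select(.cell, signature_score)
```

With this rescaling the score always falls between -1 and 1, so a cell with high marker counts and no CD8 counts sits near the top of the range.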
It is also possible to visualise the cells as a 3D plot using plotly.
-The example data used here only contains a few genes, for the sake of time and size in this demonstration, but below is how you could generate the 3 dimensions needed for 3D plot with a full dataset.
+The example data used here only contains a few genes, for the sake of time and size in this demonstration, but below is how you could generate the 3 dimensions needed for 3D plot with a full dataset.
-We'll demonstrate creating a 3D plot using some data that has 3 UMAP dimensions.
+We'll demonstrate creating a 3D plot using some data that has 3 UMAP dimensions. This is a fantastic way to visualise both reduced dimensions and metadata in the same representation.
-1. What proportion of all cells are gamma-delta T cells? Use signature_score > 0.7 to identify gamma-delta T cells.
+1. What proportion of all cells are gamma-delta T cells? Use signature_score > 0.7 to identify gamma-delta T cells.

-2. There is a cluster of cells characterised by a low RNA output (nCount_RNA < 100). Identify the cell composition (cell_type) of that cluster.
+2. There is a cluster of cells characterised by a low RNA output (nCount_RNA < 100). Identify the cell composition (cell_type) of that cluster.

 # Pseudobulk analyses

-Next we want to identify genes whose transcription is affected by treatment in this dataset, comparing treated and untreated patients. We can do this with pseudobulk analysis. We aggregate cell-wise transcript abundance into pseudobulk samples and can then perform hypothesis testing using very well established bulk RNA sequencing tools. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
+Next we want to identify genes whose transcription is affected by treatment in this dataset, comparing treated and untreated patients. We can do this with pseudobulk analysis. We aggregate cell-wise transcript abundance into pseudobulk samples and can then perform hypothesis testing using the very well established bulk RNA sequencing tools. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).

 We want to do it for each cell type and the tidy transcriptomics ecosystem makes this very easy.


 ## Create pseudobulk samples

-To create pseudobulk samples from the single cell data, we will use a helper function called `aggregate_cells`, available in this workshop package. This function will combine the single cells into groups for each cell type for each sample.
+To create pseudobulk samples from the single cell data, we will use a helper function called `aggregate_cells`, available in this workshop package. This function will combine the single cells into a group for each cell type for each sample.
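
The grouping idea can be sketched with plain dplyr on a made-up long-format table. This is not the workshop's `aggregate_cells` helper, whose implementation is not shown here. The column names and counts below are hypothetical, and summing counts within each sample and cell type is an assumed (though common) way to build pseudobulk samples.

```r
library(dplyr)
library(tibble)

# Made-up per-cell counts in long format: one row per cell and gene.
toy_counts <- tibble(
  sample    = c("s1", "s1", "s1", "s1", "s2", "s2"),
  cell_type = c("T", "T", "B", "T", "T", "B"),
  .cell     = c("c1", "c2", "c3", "c1", "c4", "c5"),
  gene      = c("CD3D", "CD3D", "CD3D", "CD8A", "CD3D", "CD3D"),
  count     = c(5, 3, 0, 2, 4, 1)
)

# One pseudobulk value per sample, cell type and gene: the sum over the
# cells in that group, plus the number of cells that contributed.
toy_pseudobulk <- toy_counts |>
  group_by(sample, cell_type, gene) |>
  summarise(
    n_cells = n_distinct(.cell),
    count   = sum(count),
    .groups = "drop"
  )

toy_pseudobulk
```

Each row of the result plays the role of one gene measurement in one pseudobulk sample, which is the shape that bulk differential expression tools expect.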