Finding marker genes in each group #136

Closed
emkahuda opened this issue Feb 16, 2022 · 9 comments
@emkahuda

I have been using splatter to generate synthetic scRNA-seq data. However, when I loaded the generated data into a Seurat object, I found that some clusters do not have a single marker gene (or sometimes they have only a few). Is there any way to find the marker genes for each group or cluster and then add them to the metadata?

@lazappi
Collaborator

lazappi commented Feb 17, 2022

Hi @emkahuda

Thanks for giving {splatter} a go! Please have a look at issue #57 for information on how to find the ground truth simulated DE genes. I will also point out that not all of the simulated DE genes may be detected by a marker gene test, depending on the fold changes etc.

@emkahuda
Author

Hi, thank you for the reply. What I am actually looking for is:

  1. How can we know which cells belong to which cell type (which cluster or group)?
  2. Once we have the list of cells in each group, can we then find the list of marker genes for each group?

I cannot find a clustering algorithm in splatter. As I understand it, we can set what kind of synthetic data we want, such as the number of groups or clusters and their sizes, as well as the probability of a gene being DE. Could you please help? I might have missed something.

@lazappi
Collaborator

lazappi commented Feb 18, 2022

As {splatter} is a simulation package it doesn't perform clustering but it can produce datasets with clusters which we call "groups". This information is stored in the column data of the produced SingleCellExperiment object (colData(sce)$Group). The number and size of these groups are controlled by various parameters as described in the main splatter vignette and the parameters vignette.

The issue I linked to previously describes how to find the simulated DE genes for each of these groups (some of these can be considered "markers" but probably not all, it depends how you define that term).
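
A minimal sketch of both steps (cells per group, then simulated DE genes per group), assuming the {splatter} and {SingleCellExperiment} packages are installed. The parameter values are arbitrary illustrations, and the `DEFacGroup*` row data columns are an assumption based on the splatter documentation rather than on anything stated in this thread:

```r
library(splatter)
library(SingleCellExperiment)

# Illustrative values only: 3 groups of unequal size, 10% DE probability
params <- newSplatParams(
    nGenes     = 1000,
    batchCells = 300,
    group.prob = c(0.5, 0.3, 0.2),
    de.prob    = 0.1,
    seed       = 1
)
sim <- splatSimulate(params, method = "groups", verbose = FALSE)

# 1. Ground truth group membership: cell names split by simulated group
cells_by_group <- split(colnames(sim), colData(sim)$Group)

# 2. Simulated DE genes per group: a DE factor different from 1 means the
#    gene's mean was changed for that group
de_cols <- grep("^DEFacGroup", colnames(rowData(sim)), value = TRUE)
de_genes_by_group <- lapply(de_cols, function(col) {
    rownames(sim)[rowData(sim)[[col]] != 1]
})
names(de_genes_by_group) <- sub("^DEFac", "", de_cols)
```

Genes with a DE factor close to 1 are technically DE but may be too subtle to show up as markers, which is consistent with the caveat above about fold changes.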

@emkahuda
Author

emkahuda commented Feb 18, 2022 via email

@emkahuda
Author

emkahuda commented Mar 3, 2022 via email

@lazappi
Collaborator

lazappi commented Mar 4, 2022

During the simulation process cells are randomly assigned to groups (with the expected group sizes set by the group.prob parameter). You can consider these "Group" labels as the ground truth identity for each cell, which you could compare to the clusters from whatever clustering method you use, such as SC3. There is no direct relationship between {splatter} and any downstream analysis method.
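
As a rough illustration of that comparison, here is a base R sketch that cross-tabulates ground truth labels against cluster labels. The vectors `truth` and `clusters` are hypothetical toy data standing in for colData(sce)$Group and the output of a clustering method:

```r
# Toy labels: `truth` stands in for colData(sce)$Group,
# `clusters` for the labels returned by a clustering method
truth    <- factor(c("Group1", "Group1", "Group2", "Group2", "Group3", "Group3"))
clusters <- factor(c("A", "A", "B", "B", "B", "C"))

# Rows are simulated groups, columns are clusters
confusion <- table(truth = truth, cluster = clusters)
print(confusion)

# Share of cells that sit in the most common cluster for their group
agreement <- sum(apply(confusion, 1, max)) / length(truth)
```

A table with one dominant entry per row suggests the clustering recovered the simulated groups; a dedicated measure such as the adjusted Rand index could be used instead of this simple agreement score.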

@emkahuda
Author

emkahuda commented Mar 6, 2022 via email

@lazappi
Collaborator

lazappi commented Mar 7, 2022

I can't see the uploaded files (I think you need to attach them on GitHub, not by email) but this is not unexpected. The parameters for creating groups are not estimated directly from a real dataset and it would take some adjusting to get them to match. For example, I think you have randomly set group.prob but this would need to match the proportions in the real dataset (it's a bit hard to tell without any code). You may be able to get something that looks very close to your example dataset but it's non-trivial.
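
For the group.prob part specifically, one possible starting point is to compute the proportions from the real dataset's labels. This is a base R sketch in which `real_labels` is a hypothetical vector of cell type annotations; matching proportions alone will not make the simulation match the real data:

```r
# Hypothetical cell type annotations from a real dataset
real_labels <- c("T cell", "T cell", "B cell", "T cell", "NK",
                 "B cell", "T cell", "NK", "T cell", "B cell")

# Proportion of cells per type, in alphabetical order of the labels
group_prob <- as.numeric(prop.table(table(real_labels)))

# With {splatter} loaded and an existing `params` object, these values
# could then be passed on, for example:
# params <- setParam(params, "group.prob", group_prob)
```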

I would consider why you want an exact replica of a real dataset. The strength of simulations is being able to test scenarios for which there is no real data. If you already have this data, I would suggest using it directly rather than trying to create a simulated copy of it. If there is a reason that won't work, think about what scenario you want to simulate rather than just recreating a real dataset.

@emkahuda
Author

emkahuda commented Mar 7, 2022 via email

@lazappi lazappi closed this as completed Apr 8, 2022