Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulation of Pseudotime with splatSimulatePaths() #156

Closed
spriyansh opened this issue Nov 21, 2022 · 7 comments
Closed

Simulation of Pseudotime with splatSimulatePaths() #156

spriyansh opened this issue Nov 21, 2022 · 7 comments
Labels

Comments

@spriyansh
Copy link

Hi @lazappi, great tool.

I was wondering whether there is a way to simulate or elucidate the pseudotime-like ordering of the simulated cells using splatSimulatePaths().

From what I see in the colData(sce), there is a column Step. Can this be used as pseudotime-like ordering? Where the min(sce$Step) could be considered as pseudotime 0 (starting points) and max(sce$Step) would be considered as most differentiated cell states.

The idea is to plot gene expression against known pseudotime-like ordering—for example, the following plot but with known ground truth.

Is there a functionality for this in splatter or a workaround? Also, it would be really nice if you give an insight into how paths are computed, something like this

@lazappi
Copy link
Collaborator

lazappi commented Nov 21, 2022

Hi @spriyansh

Thanks for giving {splatter} a go! Yes, you should be able to use the Step value in that way. What this tells you is how far along each gene is between the two endpoints of the path. Note that the step value is relative to the that path. So if you have a simulation like this:

  graph TD;
      A-->|Path 1|B;
      B-->|Path 2|C;
Loading

And there is a cell with Path==2, Step==50 then the "pseudotime" (total steps) for that cell would be 150 (100 to the end of Path 1, then 50 along Path 2). That's using the default path.nSteps of 100.

You can find more information about how paths are calculated in the paper, the Splat parameters vignette and this issue #124. If you have a specific question after looking at those I'm happy to try and answer.

@spriyansh
Copy link
Author

spriyansh commented Nov 22, 2022

Hi @lazappi; thanks for such a quick response.

I want to simulate the branching trajectory as follows,

graph TD;
    A-->|Path-1|B;
    B-->|Path-2|C;
    B-->|Path-3|D;
Loading

I am taking a simple case where all the paths have the same number of steps, i.e., 100. Also, the paths after the branching point of both the diverging paths have the same time range.

Path 1: 0-100
Path 2: 100-200
Path 3: 300-400
Path 4: 300-400

Here is what I have tried using the Splat parameters vignette

# Load library
library(splatter)
library(scatter)
library(SingleCellExperiment)
library(scran)
library(ggplot2)
library(splines)
set.seed(007)

I simulated the topology as follows, as you suggested,

# Setting parameters
params.groups <- newSplatParams(batchCells = 500, nGenes = 1000)

# Simulating Path
sim.sce <- splatSimulatePaths(params.groups,
                           group.prob = c(0.25, 0.25, 0.25, 0.25),
                           de.prob = 0.5, de.facLoc = 0.2,
                           path.from = c(0, 1, 1, 3),
                           verbose = FALSE)

# Plot
sim.sce <- logNormCounts(sim.sce)
sim.sce <- runPCA(sim.sce)
plotPCA(sim.sce, colour_by = "Group") + ggtitle("Branching path")

I made the following function to plot gene expression vs custom time. It can be reproduced using this file.

new.sce <- readRDS("github.RDS")

# Plot Gene vs customTime
plot_with_pTime <- function(sce_obj, 
                            gene_id, 
                            cell_col = "Cell", 
                            pTime_col,
                            path_col = "Group"){
    
    # Extract the colData
    cell.data <- as.data.frame(colData(sce_obj))[,c(cell_col, pTime_col, path_col)]
    
    # Extract the matrix
    t.counts.mtx <- as.matrix(sce_obj@assays@data$TrueCounts) 
    
    # Long format
    t.counts.tab <- reshape::melt(t.counts.mtx)
    colnames(t.counts.tab) <- c("gene_id", "cell_id", "true_expr")
    
    # Select the gene of interest
    plt.tab <- t.counts.tab[t.counts.tab$gene_id == gene_id, ]
    
    # Merge with time 
    plt.tab <- merge(plt.tab, cell.data, by.x = "cell_id", by.y = cell_col)
    
    #return(plt.tab)
    
    # Plot B-Spline
    p <- ggplot(plt.tab, aes(x = .data[[pTime_col]], y = log(true_expr), 
                             color = .data[[path_col]]
                             ))+
        geom_point(alpha = 0.5, fill = "#fdc659", size = rel(1))+
        
        #b-spline from Monocle3
        geom_smooth(method = lm, formula = y ~ bs(x), se=FALSE, color = "black") +
        
        theme_bw() + ggtitle(gene_id) 
    
    p
    
} 

# Plot the gene
plot_with_pTime(new.sce,  "Gene13", cell_col = "Cell", 
                pTime_col = "customTime",
                path_col = "Group")

Gene13

My question is, how can I get lineage-specific genes? How can I infer which gene is specific or regulate cells toward a specific path?

My goal is to perform a differential expression at the bifurcating branching point and find path-specific genes. I want to know the ground truth to check how accurately I can predict the differentially expressed genes at a branching point.

By ground truth, I mean,

Which gene goes up or down along time w.r.t. to a lineage? Because in the image above it is kind of hard to judge the expression in path3 and path4. I appreciate any suggestions on how to do this.

@lazappi
Copy link
Collaborator

lazappi commented Nov 23, 2022

Your diagram only has three paths in it but I think what you have simulated with those parameters is this:

graph TD;
    A-->|Path-1|B;
    B-->|Path-2|C;
    B-->|Path-3|D;
    D-->|Path-4|E;
Loading

To get the genes that are simulated as changing in each path you need to look at the DE factors as described in #57. Note that the difference here is that the factors tell you the change between the start and end of each path rather than between a group and some fictional origin cell.

@spriyansh
Copy link
Author

spriyansh commented Nov 23, 2022

Thanks for the information. Here is another topology that I have tried,

graph LR;
    A-->|Path-1|B;
    A-->|Path-2|C;
Loading

Simulation parameters

# Setting parameters
params.groups <- newSplatParams(
    batch.rmEffect = TRUE, # No Batch affect
    batchCells = 1000, # Number of Cells
    nGenes = 2000, # Number of Genes
    seed = 2022 # Set seed
    ) 

# Simulating Path
sim.sce <- splatSimulatePaths(
    params.groups, 
    path.nSteps = 1000,
    out.prob = 0, # No outliers
    group.prob = c(0.4, 0.3, 0.3), #
    de.prob = c(0.3, 0.4, 0.4), 
    de.downProb = c(0.1, 0.2, 0.2),
    de.facLoc = c(0.3, 0.5,0.5),
    de.facScale = c(0.3, 0.5, 0.5),
    path.from = c(0, 1, 1),
    path.skew = c(0.1, 0.2, 0.2),
    path.nonlinearProb = 0,
    path.sigmaFac = 0,
    verbose = FALSE
    )

Output
Different plots

Here in the last plot for gene-1244, there is a clear distinction of paths at time point 1000 (as it is the max allowed steps per path). However, we cannot expect such a clear separation in actual experiments. Is there a way to simulate such an effect? I expect overlapping cells around 900-1100 steps/custom time. I tried changing path.skew, but it didn't help.

@lazappi
Copy link
Collaborator

lazappi commented Nov 24, 2022

I think you may be slightly misunderstanding the path.from parameter. The value you have used path.from = c(0, 1, 1) would give you this topology:

graph LR;
    A-->|Path-1|B;
    B-->|Path-2|C;
    B-->|Path-3|D;
Loading

Which I think matches what you are seeing in your plot. To get this topology:

graph LR;
    A-->|Path-1|B;
    A-->|Path-2|C;
Loading

You would need to use path.from = c(0, 0).

@spriyansh
Copy link
Author

Okay, I got it now. One last thing I wanted to ask was about phenoSimulate(). I don't know if it is worth mentioning on this thread or a new one.

Can we simulate with a specific topology (bifurcating) combined with the phenoSimulate()?

Phenopath can simulate patterns like differential expression, pseudotime dependence, and pseudotime-covariate interactions. Is it possible to simulate such patterns along the steps of a path?

Differential Pattern

I tried using phenoSimulate(), but I think it only lets you assign Phenopath specific parameters. But what if one wants to add genes that are not differentially expressed? Or set path.from? Here's what I have tried,

# Setting parameters
params.groups <- newPhenoParams(
    nCells = 1000, # Number of Cells
    n.de =20,
    n.pst =30,
    n.pst.beta =20,
    n.de.pst.beta =30,
    seed = 2022 # Set seed
    )

# Simulating Path
sim.sce.pheno <- phenoSimulate(
    params.groups, 
    sparsify = TRUE,
    verbose = FALSE
    ) 

@lazappi
Copy link
Collaborator

lazappi commented Jan 2, 2023

I am going to close this now but please comment if you have further questions.

@lazappi lazappi closed this as completed Jan 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants