Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Troubles with estimateExpressionShiftMagnitudes() and estimateDiffCellDensity() functions #21

Open
anastasiiaNG opened this issue Mar 19, 2022 · 10 comments
Labels
bug Something isn't working

Comments

@anastasiiaNG
Copy link

Dear Cacoa team,

Thank you for creating such a perspective tool for analysis single cell RNAseq data.

I currently tried to apply it to our data but faced several issues:

  1. cao$estimateExpressionShiftMagnitudes() fails with the following message:
Filtering data... 
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : 
  'NA' indices are not (yet?) supported for sparse Matrices 

Could you please specify what exactly is wrong with the input data?

  1. cao$estimateDiffCellDensity(type='permutation', verbose=T) function also fails with the following message:
> cao$estimateCellDensity()
> cao$estimateDiffCellDensity(type='permutation', verbose=T)

Error in smoothSignalOnGraph(., filter = g.filt, graph = graph) : 
  The provided graph is not connected. It has to be connected to estimate l.max.

It starts working when argument smooth is specified to FALSE.

This message looks confusing because

> gg <- cao$data.object@graphs$integrated_nn %>% graph_from_adjacency_matrix()
> igraph::is_connected(gg)
[1] TRUE

I tried to fix it by changing the k.param and prune.SNN arguments to make the graph of the original Seurat object more connected in the following way...

whole.integrated <- FindNeighbors(whole.integrated,
                                  k.param = 100, # instead of default k = 20
                                  prune.SNN = 0, # instead of default prune.SNN = 1/15
                                  )

...but it didn't help.

Respectively, function plotDiffCellDensity() works only with permutation parameter, otherwise it returns the same error.

Could you please clarify how these issues may be fixed? Thank you!

cao object can be downloaded here.

@evanbiederstedt
Copy link
Collaborator

evanbiederstedt commented Mar 19, 2022

Hi @anastasiiaNG

Thanks for the issue. Cacoa is still in development of course. These issues provide useful user feedback.

I currently tried to apply it to our data but faced several issues:
cao object can be downloaded here.

Could you share how this data was created? Please share the code used. If we are able to reproduce this, we'll be able to provide better feedback.

Thanks

@anastasiiaNG
Copy link
Author

Hi, @evanbiederstedt,

This is how the cao object was created from the seurat object whole.integrated:

library(cacoa)
library(Seurat)
library(ggplot2)
library(tidyverse)

whole.integrated <- readRDS("whole.integrated.rds")

whole.integrated <- FindNeighbors(whole.integrated,
                                  k.param = 100, # instead of default k = 20
                                  prune.SNN = 0, # instead of default prune.SNN = 1/15
                                  )

whole.integrated <- FindClusters(whole.integrated, resolution = 3)

sample.groups <- factor(c("condition2", "condition2", "condition1", "condition1", "condition1", "condition2"))
names(sample.groups) <- c("R_1", "R_2", "R_3", "R_4", "R_5", "R_8")
  
cell.groups <- as.character([email protected]$integrated_snn_res.3)
names(cell.groups) <- rownames([email protected])
  
sample.per.cell <- [email protected]$sample
names(sample.per.cell) <- rownames([email protected])
  
ref.level <- "condition1"
target.level <- "condition2"
  
cao <- Cacoa$new(whole.integrated, 
                   sample.groups = sample.groups, 
                   cell.groups = cell.groups, 
                   sample.per.cell=sample.per.cell, 
                   ref.level=ref.level, 
                   target.level=target.level, 
                   graph.name="integrated_nn",
                   embedding = whole.integrated@[email protected])
                                      
bs <- if (length(levels(cao$cell.groups)) > 20) 10 else 16
cao$plot.theme <- theme_bw(base_size = bs) + 
    theme(plot.title=element_text(hjust = 0.5), 
          legend.background=element_rect(fill=alpha("white", 0.2)),
          legend.text=element_text(size=12), 
          legend.margin=margin(6, 6, 4, 1, 'pt'),
          plot.margin=margin())

cao$cell.groups.palette <- levels(cao$cell.groups) %>% 
    {setNames(sample(brewerPalette("Paired")(length(.))), .)}
  
cao$sample.groups.palette <- c("#d73027", "#4575b4") %>% 
    setNames(c(cao$target.level, cao$ref.level))

@ianphelps
Copy link

Hi,

Thank you very much for providing such great insight into case-control analysis in scRNA-seq and for creating an incredibly useful tool.

@evanbiederstedt I want to mention that I'm running into the same errors described by @anastasiiaNG. Also using a seurat object. Any help you can provide is greatly appreciated. I'm happy to provide further details as needed. I generated the coa object similarly to anastasiiaNG.

Thanks!

@VPetukhov
Copy link
Contributor

Hi @anastasiiaNG and @ianphelps ,
I recently released v0.3.0 together with an updated walkthrough. It has a lot of changes and bug fixes, could you please try it? I tested it on the object you provided, and the function calls work fine.
There was just something wrong with the ggplot theme in this object, so I added cao$plot.theme <- theme_bw().

@anastasiiaNG
Copy link
Author

Hi @anastasiiaNG and @ianphelps , I recently released v0.3.0 together with an updated walkthrough. It has a lot of changes and bug fixes, could you please try it? I tested it on the object you provided, and the function calls work fine. There was just something wrong with the ggplot theme in this object, so I added cao$plot.theme <- theme_bw().

Hi, @VPetukhov

Thank you for the updates.

Does the updated walkthrough work correct in your hands on my data?

I've tested it on the cao object that I obtain just like in the #21 (comment) (with cao$plot.theme <- theme_bw() as you recommend).

The updated walkthrough works correct until this step:

sample_meta <- [email protected]

pvals <- cao$estimateMetadataSeparation(sample.meta=sample_meta)$padjust
# Error in x[...] <- m : replacement has length zero

cao$plotSampleDistances(
  space="expression.shifts",
  sample.colors=sample_meta$condition, legend.position=c(0, 1), font.size=2
)
# Error: Must request at least one colour from a hue palette.

Functions cao$plotVolcano, cao$plotOntologyHeatmapCollapsed, cao$plotOntologyHeatmap also fail to plot the results even though cao$estimateDEPerCellType, cao$estimateDEPerCellType, cao$estimateOntology work without errors:

> cao$plotVolcano(xlim=c(-3, 3), ylim=c(0, 3.5), lf.cutoff=1)
Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto),  : 
  Viewport has zero dimension(s)
In addition: There were 15 warnings (use warnings() to see them)

> warnings()
Warning messages:
1: Removed 3 rows containing missing values (geom_point).
2: Removed 3 rows containing missing values (geom_point).
3: Removed 15 rows containing missing values (geom_point).
4: Removed 21 rows containing missing values (geom_point).
5: Removed 40 rows containing missing values (geom_point).
6: Removed 26 rows containing missing values (geom_point).
7: Removed 135 rows containing missing values (geom_point).
8: Removed 4 rows containing missing values (geom_point).
9: Removed 11 rows containing missing values (geom_point).
10: Removed 5 rows containing missing values (geom_point).
11: Removed 3 rows containing missing values (geom_point).
12: Removed 8 rows containing missing values (geom_point).
13: Removed 1 rows containing missing values (geom_point).
14: Removed 12 rows containing missing values (geom_point).
15: Removed 13 rows containing missing values (geom_point).

> cao$plotOntologyHeatmapCollapsed(
+   name="GSEA", genes="up", n=50, clust.method="ward.D", size.range=c(1, 4)
+ )
Error: Can't rename columns that don't exist.
x Column `core_enrichment` doesn't exist.

> cao$plotOntologyHeatmap(
+   name="GSEA", genes="up", description.regex="extracellular|matrix"
+ )
Error: Can't rename columns that don't exist.
x Column `core_enrichment` doesn't exist.

And starting from "Cluster-free. Compositional changes" none of the functions works on my data:

> cao$estimateCellDensity(method='graph')
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Please supply a symmetric matrix if you want to create a weighted graph with mode=UNDIRECTED.

> cao$estimateClusterFreeExpressionShifts(
+   gene.selection="expression", min.n.between=3, min.n.within=3
+ )
Error in graph.adjacency.sparse(adjmatrix, mode = mode, weighted = weighted,  : 
  Please supply a symmetric matrix if you want to create a weighted graph with mode=UNDIRECTED.

> cao$estimateClusterFreeDE(
+     n.top.genes=1000, min.expr.frac=0.01, adjust.pvalues=TRUE, smooth=TRUE, 
+     verbose=TRUE
+ )
Estimating cluster-free Z-scores for 1000 most expressed genes
Error in graph.adjacency.sparse(adjmatrix, mode = mode, weighted = weighted,  : 
  Please supply a symmetric matrix if you want to create a weighted graph with mode=UNDIRECTED.

@evanbiederstedt
Copy link
Collaborator

Hi @anastasiiaNG

I'm following your instructions to recreate the data above, using the data downloaded from your link.

Could you please uninstall cacoa and re-install it? i.e. remove.packages('cacoa') and then devtools::install_github("kharchenkolab/cacoa")

RE: 1.

cao$estimateExpressionShiftMagnitudes() fails with the following message:

This works for me now:

> cao$estimateExpressionShiftMagnitudes()
Filtering data... 
done!

Calculating pairwise distances using dist='cor'...

  |=======================================================| 100%, Elapsed 00:10
Done!

RE: 2

This works for me too. Please check again.

> cao$estimateCellDensity()
> cao$estimateDiffCellDensity(type='permutation', verbose=T)
  |=======================================================| 100%, Elapsed 00:00
  |=======================================================| 100%, Elapsed 00:02
>

I tried using your data and running through http://pklab.med.harvard.edu/viktor/cacoa/walkthrough_short.html

It's true there are some errors, which are surely associated with some extra functionality we need to incorporate for Seurat objects. I'll try to help out with this when I get some time.

In the meantime, please try using a Conos object instead of a Seurat object if you could: https://github.com/kharchenkolab/conos

Thanks, Evan

@VPetukhov
Copy link
Contributor

Thanks, @anastasiiaNG ! Right, I also checked other functions and indeed there are errors in them. It's a bit of a pain to debug, as the part with core_enrichment comes from the DOSE, which returns a data frame without this column on your data. I have no idea why, so will need to dig into their specific. I'll fix it in the next couple of weeks.

@VPetukhov VPetukhov added the bug Something isn't working label May 5, 2022
@VPetukhov
Copy link
Contributor

@anastasiiaNG , I released version 0.4.0, where all this should be fixed. Could you please check it?

@anastasiiaNG
Copy link
Author

@anastasiiaNG , I released version 0.4.0, where all this should be fixed. Could you please check it?

@VPetukhov , great, thanks a lot, almost all bugs are fixed now.

The only thing that still produces an error is here (probably due to the absence of factors that significantly affect sample differences):

sample_meta <- [email protected][, c(4,7)]
head(as.data.frame(sample_meta))

pvals <- cao$estimateMetadataSeparation(sample.meta=sample_meta)$padjust
# Error in x[...] <- m : replacement has length zero

I also noticed that in estimating gene programs you put by default z.adj=TRUE and smooth=FALSE, however warning says that Gene programs without smoothing often produce intractable results, especially with z.adj=FALSE. Are z.adj=TRUE and smooth=TRUE the best option here? Or how to decide when these parameters should be set to TRUE?

@Angel-Wei
Copy link

Hi, I ran into an error when I called cao$plotOntologyHeatmapCollapsed function. I got an error

Error in `chr_as_locations()`:
! Can't rename columns that don't exist.Column `qvalues` doesn't exist.

Before I come to this step, everything ran well and functions cao$estimateDEPerCellType and cao$estimateOntology (I used 'GSEA')also worked. Here's some output info I got from rlang::last_error():

> pdf(paste0(rep.name, '/', rep.name,"_plotOntologyHeatmapCollapsed_up_in_MUT.pdf"), height=15, width=10)
> print(cao$plotOntologyHeatmapCollapsed(
+   name="GSEA", genes="up", n=50, clust.method="ward.D", size.range=c(1, 4)) + ggtitle("Up in MUT"))
Error in (function (cond)  : 
  error in evaluating the argument 'x' in selecting a method for function 'print': Can't rename columns that don't exist.
✖ Column `qvalues` doesn't exist.
> rlang::last_error()
<error/vctrs_error_subscript_oob>
Error in `chr_as_locations()`:
! Can't rename columns that don't exist.
✖ Column `qvalues` doesn't exist.
---
Backtrace:
  1. cao$plotOntologyHeatmapCollapsed(...)
  6. dplyr:::rename.data.frame(., geneID = core_enrichment, qvalue = qvalues)
  7. tidyselect::eval_rename(expr(c(...)), .data)
  8. tidyselect:::rename_impl(...)
  9. tidyselect:::eval_select_impl(...)
 18. tidyselect:::vars_select_eval(...)
 19. tidyselect:::walk_data_tree(expr, data_mask, context_mask, error_call)
 20. tidyselect:::eval_c(expr, data_mask, context_mask)
 21. tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 22. tidyselect:::walk_data_tree(new, data_mask, context_mask)
 23. tidyselect:::as_indices_sel_impl(...)
 24. tidyselect:::as_indices_impl(x, vars, call = call, strict = strict)
 25. tidyselect:::chr_as_locations(x, vars, call = call)
Run `rlang::last_trace()` to see the full context.

I wonder is there anything I can do to fix this problem to generate GSEA results? Thank you so much for the help in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants