Fold Change Question #91

Open
ddemelfi opened this issue Feb 6, 2025 · 2 comments
Comments

@ddemelfi

ddemelfi commented Feb 6, 2025

Hello, I am currently using SCPA on data from samples with SARS-CoV-2. Other people have asked about the fold change output a few times, but I feel I still have to ask what is happening. I reread the function description and even looked at the code to see if I had missed something. In a few of our good-responder and non-responder populations (responder being sample1), I ran compare_pathways only to see that all inflammatory responses had a negative fold change with a qval of about 5 or 6. That doesn't mean they are very enriched, but I am working with a relatively small sample size for now.

This was odd, so I investigated the FC calculation in the code. It seems to take the average expression of each gene in the pathway in population 1 and subtract the average expression in population 2, so negative values would show that the pathway is upregulated in population 2. But the code shows that the calculation is a log FC. Would that imply that if the average expression of any gene in the pathway were less than 1, it would return a negative log fold change, and the average would then also be negative? Of course, it depends on what the other population is, and I am not trying to imply that it is always incorrect. I am wondering whether extremely low gene expression is a factor that has already been considered, or whether this case cannot occur.

In the output of compare_pathways, the column is also labelled FC, not log2FC, so I am unsure which one "FC" refers to.

I know we should specifically look at the qval and adjPval, but I also need to know up and down regulated pathways, and it would be odd if some inflammatory immune responses were more enriched in samples that had poor responses to covid.

@jackbibby1
Owner

Hi,

Sorry for the delay in getting back to you. Maybe we should be more explicit in the documentation here, but we're assuming that the data used as input into SCPA are log-normalised (though this doesn't necessarily have to be the case -- as long as there's some count-normalisation done). So in generalised format for pathway x, we calculate the log FC as:

sum(
  mean(pop_1 gene_1) - mean(pop_2 gene_1)
  ...
  mean(pop_1 gene_k) - mean(pop_2 gene_k)
)

# where gene expression units are log-transformed e.g. log1p in Seurat

So under this definition, a negative logFC for a pathway means that it's enriched in pop2. If you're unsure, I would take the pathways you're mentioning and just run the above calculation to see if things align. SCPA has a function, pathway_matrices(), that generates the pathway matrices if you want to test it out.
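As a rough check of the calculation above, here is a minimal sketch in plain Python (SCPA itself is an R package, so this is not its implementation). The function name `pathway_log_fc`, the gene names, and the expression values are all made up for illustration, and the input is assumed to be log1p-transformed normalised counts. It shows that expression values below 1 do not by themselves produce a negative logFC: log1p of a non-negative value is never negative, and the sign comes only from which population has the higher mean.

```python
import math
from statistics import mean


def pathway_log_fc(pop_1, pop_2):
    """Sum over genes of mean(pop_1 gene) - mean(pop_2 gene).

    pop_1 and pop_2 map a gene name to a list of log-transformed
    expression values, one value per cell. (Hypothetical helper.)
    """
    return sum(mean(pop_1[gene]) - mean(pop_2[gene]) for gene in pop_1)


def log1p_transform(counts):
    """Apply log1p to every value, mimicking a log-normalisation step."""
    return {gene: [math.log1p(v) for v in vals] for gene, vals in counts.items()}


# Made-up normalised counts; every value is below 1 (very low expression).
counts_1 = {"gene_a": [0.1, 0.2, 0.0], "gene_b": [0.3, 0.1, 0.2]}
counts_2 = {"gene_a": [0.4, 0.5, 0.3], "gene_b": [0.6, 0.4, 0.5]}

log_1 = log1p_transform(counts_1)
log_2 = log1p_transform(counts_2)

# Negative here only because pop_2 has the higher mean for each gene.
fc = pathway_log_fc(log_1, log_2)
print("pop_1 vs pop_2:", fc)

# Swapping the populations flips the sign.
fc_swapped = pathway_log_fc(log_2, log_1)
print("pop_2 vs pop_1:", fc_swapped)
```

To compare against real output, the same sum of per-gene mean differences could be computed on the matrices returned by pathway_matrices() and checked against the FC column.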

Hopefully that clears things up

Jack

@ddemelfi
Author

Oh, that's a great point, thank you for clearing that up, and I will take a look at the pathway matrices function. I suppose the previous normalization would make sense in that case.

On a semi-related note, since that is the case, I wanted to ask: does a higher fold change mean more enrichment for that population? Significance is measured by the qval, so would a high fold change with a low qval indicate that the pathway is much more enriched in one population than the other, but that the difference is not significant? I know you mention this in your SCPA paper and give an example, and I reread the documentation a few times, but now that I have a clearer understanding of the fold change calculation, I'm wondering why this would be the case. And how would the other combinations (i.e. high qval, low qval, high FC, low FC) play into this?

Again, I know you mention this in the paper and I don't mean to ask you something you technically already answered, but it would be a great help. I appreciate it so much!
