Comparison to pseudobulk and sanity check of high major copy number #194

Open
Nikki-Burdett opened this issue Jul 1, 2024 · 6 comments
Labels
documentation Improvements or additions to documentation

@Nikki-Burdett

Thanks very much for this great tool, just hoping to clarify a couple of points:

  1. We have run this with default parameters and have not (yet) used a custom reference. I would like to compare to our bulk WGS copy number, but the segs_consensus_{i}.tsv.gz file does not have a major copy number (MCN). Does Numbat calculate a consensus pseudobulk MCN, or is there another reasonable way to compare to bulk data besides visually comparing copy number plots (amp/del)?

  2. In the per-cell (allele or joint) files, some cells have what seems an outrageous MCN (e.g. up to 1000). It negatively correlates with Z, which I understand is the total log likelihood of all states, so it seems reasonable to think that a higher Z translates to a more reliable (generally lower) MCN. I can largely exclude these by filtering on Z to remove the very high and seemingly inaccurate values, but is it surprising/unusual/concerning that we see numbers this high?

@teng-gao
Collaborator

teng-gao commented Jul 1, 2024

Hi,

The segs_consensus and joint_post files (documentation here) report only CNV states (del, loh, amp, etc.), not absolute copy numbers. So if you would like to compare with WGS, you can use the cnv_state column instead. The Z column in joint_post isn't statistical evidence, if that's what you're looking for (explanations are in the doc page above). You can use LLR or p_cnv instead.
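A minimal sketch of comparing consensus CNV states with bulk WGS calls by base-pair overlap rather than by eye. It assumes the segs_consensus file has columns named CHROM, seg_start, seg_end and cnv_state (check against the output documentation for your Numbat version), and that you have already mapped your WGS calls onto the same state labels (e.g. amp, del, loh). The Segment class and both function names are hypothetical helpers, not part of Numbat.

```python
import csv
import gzip
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str   # must use the same naming in both inputs (e.g. "1" vs "chr1")
    start: int
    end: int     # treated as a half-open interval [start, end)
    state: str   # e.g. "amp", "del", "loh"


def load_numbat_segments(path):
    """Read a segs_consensus_{i}.tsv.gz file (column names are assumptions)."""
    with gzip.open(path, "rt", newline="") as fh:
        return [
            Segment(str(row["CHROM"]), int(row["seg_start"]),
                    int(row["seg_end"]), row["cnv_state"])
            for row in csv.DictReader(fh, delimiter="\t")
        ]


def state_concordance(numbat_segs, wgs_segs):
    """Fraction of overlapping base pairs on which both call sets agree on the state."""
    agree = total = 0
    for a in numbat_segs:
        for b in wgs_segs:
            if a.chrom != b.chrom:
                continue
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if overlap <= 0:
                continue
            total += overlap
            if a.state == b.state:
                agree += overlap
    # NaN when the two call sets share no overlapping sequence at all
    return agree / total if total else float("nan")


numbat = [Segment("1", 0, 100, "amp"), Segment("2", 0, 50, "del")]
wgs = [Segment("1", 50, 150, "amp"), Segment("2", 0, 100, "neu")]
print(state_concordance(numbat, wgs))  # 0.5
```

The quadratic loop is fine for a few hundred segments; for genome-wide WGS segmentations an interval-tree or sorted sweep would scale better.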

@Nikki-Burdett
Author

Thanks very much for the reply. Re: point 2, my main question is whether you are surprised by such high MCN results, and whether there is a way to reliably exclude or filter (possibly spurious) results. LLR doesn't correlate with MCN, so it doesn't help me build a filter.

@teng-gao
Collaborator

teng-gao commented Jul 1, 2024

Hi, which column did you take to be the MCN? We don't report MCN in joint_post.

@Nikki-Burdett
Author

The major column, documented as "integer; Major allele count". Am I interpreting this wrong?

@teng-gao
Collaborator

teng-gao commented Jul 1, 2024

Ah, it's the total SNP pileup count derived from the major haplotype in the CNV region for the given cell. I'll update this description to be more precise.
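Because major is a raw read count, its size mostly reflects how many SNP reads a cell has in that region, so values in the hundreds or thousands are expected for large segments and say nothing about copy number. A minimal sketch of a more comparable per-cell summary: keep calls by p_cnv (as suggested above, rather than Z) and express major as a fraction of all allele counts. It assumes joint_post also has cell, seg and total columns (total being the summed counts across both haplotypes); verify these names against the Numbat output documentation. The 0.9 threshold and the function names are illustrative only.

```python
import csv
import gzip


def load_joint_post(path):
    """Read a joint_post_{i}.tsv.gz file into a list of dicts (column names assumed)."""
    with gzip.open(path, "rt", newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def summarize_calls(rows, min_p_cnv=0.9):
    """Keep confident per-cell CNV calls and convert major counts to a fraction."""
    out = []
    for row in rows:
        try:
            p_cnv = float(row["p_cnv"])
            major = float(row["major"])
            total = float(row["total"])  # assumed column: counts from both haplotypes
        except ValueError:
            continue  # skip missing values such as "NA"
        if p_cnv < min_p_cnv:
            continue
        out.append({
            "cell": row["cell"],
            "seg": row["seg"],
            "p_cnv": p_cnv,
            # close to 0.5 when balanced, larger under allelic imbalance
            "major_frac": major / total if total else float("nan"),
        })
    return out


rows = [
    {"cell": "c1", "seg": "1a", "p_cnv": "0.99", "major": "900", "total": "1000"},
    {"cell": "c2", "seg": "1a", "p_cnv": "0.2", "major": "10", "total": "12"},
]
print(summarize_calls(rows))
```

The fraction is only a descriptive view of allelic imbalance; p_cnv (or LLR) remains the measure of confidence in the call itself.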

@teng-gao added the tofix and documentation labels and removed the tofix label, Jul 1, 2024
@Nikki-Burdett
Author

Ah okay, that explains the very high numbers! Thanks for clarifying.


2 participants