Comparison to pseudobulk and sanity check of high major copy number #194

Nikki-Burdett · 2024-07-01T07:57:18Z

Thanks very much for this great tool, just hoping to clarify a couple of points:

1. We have run this with default parameters and have not (yet) used a custom reference. I would like to compare to our bulk WGS copy number, but the [segs_consensus_{i}.tsv.gz] file does not have a major copy number (MCN). Does numbat calculate a consensus pseudobulk MCN, or is there another way to reasonably compare to bulk data aside from visually comparing copy number plots (amp/del)?
2. I note that in the per-cell (allele or joint) files some cells have what seems an outrageous MCN (e.g. up to 1000). It negatively correlates with Z, which I understand is the total log likelihood of all states, so it seems reasonable to think that a higher Z translates to a more reliable (generally lower) MCN. I can largely exclude these by filtering on Z to remove the very high and seemingly inaccurate results, but is it surprising/unusual/concerning that we have seen numbers this high?

teng-gao · 2024-07-01T18:46:15Z

Hi,

The segs_consensus file and the joint_post files (documentation here) only report CNV states (del, loh, amp, etc.) and not the absolute copy number. So if you would like to compare with WGS, you can use the cnv_state column instead. The Z column in joint_post isn't statistical evidence, if that's what you're looking for (explanations are in the doc page above). You can use LLR or p_cnv instead.
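As a rough illustration of a state-level comparison with bulk WGS, here is a minimal sketch that collapses absolute bulk copy numbers into coarse states and measures agreement over overlapping base pairs. Only the cnv_state column and the del/loh/amp labels come from the thread; the tuple layouts, the "neu" label, the diploid baseline, and the half-open coordinates are assumptions made for the example, and `bulk_state` and `concordance` are hypothetical helpers, not part of numbat.

```python
from collections import defaultdict


def bulk_state(total_cn, minor_cn, ploidy=2):
    """Collapse absolute bulk copy numbers into a coarse state label.

    Assumes a diploid baseline; "neu" is a label chosen for this sketch.
    """
    if total_cn > ploidy:
        return "amp"
    if total_cn < ploidy:
        return "del"
    # Total equals the baseline: copy-neutral LOH if one allele is lost.
    return "loh" if minor_cn == 0 else "neu"


def concordance(sc_segs, bulk_segs):
    """Fraction of overlapping base pairs where the two states agree.

    sc_segs:   iterable of (chrom, start, end, cnv_state)
    bulk_segs: iterable of (chrom, start, end, total_cn, minor_cn)
    Coordinates are treated as half-open intervals [start, end).
    """
    bulk_by_chrom = defaultdict(list)
    for chrom, start, end, total_cn, minor_cn in bulk_segs:
        bulk_by_chrom[chrom].append((start, end, bulk_state(total_cn, minor_cn)))

    agree = 0
    overlap_total = 0
    for chrom, start, end, state in sc_segs:
        for b_start, b_end, b_state in bulk_by_chrom[chrom]:
            overlap = min(end, b_end) - max(start, b_start)
            if overlap <= 0:
                continue  # segments do not intersect
            overlap_total += overlap
            if state == b_state:
                agree += overlap
    return agree / overlap_total if overlap_total else float("nan")


# Toy data, invented for illustration only.
sc = [("1", 0, 100, "del"), ("2", 0, 100, "amp")]
bulk = [("1", 0, 50, 1, 0), ("1", 50, 100, 2, 1), ("2", 0, 100, 4, 2)]
print(concordance(sc, bulk))  # 0.75
```

In the toy data, half of the chromosome 1 deletion is called neutral in bulk, so 150 of the 200 overlapping base pairs agree.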

Nikki-Burdett · 2024-07-01T20:50:21Z

Thanks very much for the reply. RE: #2, my main question is whether you are surprised by seeing such high MCN results, and whether there is a way to reliably exclude/filter (?spurious) results. The LLR doesn't correlate with MCN so it does not help to create a filter.

teng-gao · 2024-07-01T21:36:55Z

Hi, which column did you think is MCN? We don't report MCN in joint_post.

Nikki-Burdett · 2024-07-01T22:24:41Z

major: integer; Major allele count - am I interpreting this wrong?

teng-gao · 2024-07-01T23:57:59Z

Ah, it's the total SNP pileup counts deriving from the major haplotype in the CNV region for the given cell. I'll update this description to be more precise.
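Because the value is a read count, it scales with coverage and segment size rather than with copy number. A minimal sketch of one way to put it on a comparable scale is to divide by the SNP read depth and to keep only confident calls using p_cnv. The `total` field (all phased SNP reads in the segment for that cell), the 0.9 cutoff, and the toy rows are assumptions for illustration; `major_fraction` and `confident_calls` are hypothetical helpers, not numbat functions.

```python
def major_fraction(major, total):
    """Share of SNP pileup reads that come from the major haplotype."""
    if total <= 0:
        return None  # no allele coverage for this cell and segment
    return major / total


def confident_calls(rows, min_p_cnv=0.9):
    """Keep rows whose p_cnv passes the cutoff and attach the fraction."""
    kept = []
    for row in rows:
        if row["p_cnv"] >= min_p_cnv:
            frac = major_fraction(row["major"], row["total"])
            kept.append(dict(row, major_frac=frac))
    return kept


# Toy per-cell rows, invented for illustration only.
rows = [
    {"cell": "c1", "major": 1000, "total": 1250, "p_cnv": 0.99},
    {"cell": "c2", "major": 6, "total": 12, "p_cnv": 0.10},
    {"cell": "c3", "major": 0, "total": 0, "p_cnv": 0.50},
]

for call in confident_calls(rows):
    print(call["cell"], call["major_frac"])  # c1 0.8
```

A count of 1000 here simply reflects deep coverage of the segment in that cell; the fraction of 0.8 is the quantity that carries allelic-imbalance information.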

Nikki-Burdett · 2024-07-02T01:15:10Z

Ah okay. Well yes, then that explains the very high numbers! Thanks for clarifying.

teng-gao added the tofix and documentation labels and removed the tofix label · Jul 1, 2024
