chrX support #11

lightning-auriga · 2021-01-27T13:40:58Z

Atlas investigators have requested chrX support in the pipeline. This is not too difficult, but it requires pulling in imputed files generated by someone else. Each downstream tool handles chrX differently, so support needs to be built into each individual association pipeline.

lightning-auriga · 2021-01-27T13:42:38Z

For whoever inherits this project: here are the locations of the PLCO chrX imputations, as I was informed by email:

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Oncoarray/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Oncoarray/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/OmniX/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni25M/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni25M/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni5/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni5/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/GSA/IMPUTATION_1000G/batch1 (batch2,batch3,batch4,batch5)
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/GSA/IMPUTATION_TOPMED/batch1 (batch2,batch3,batch4,batch5)
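A minimal sketch of how a pipeline config step might enumerate only the TOPMed chrX directories listed above, with the GSA entry expanded to its five batch subdirectories. The helper name `topmed_chrx_dirs` is hypothetical, and it only builds path strings; it does not check that the directories exist.

```python
from pathlib import PurePosixPath

# Root of the PLCO chrX imputations, as listed in this issue.
CHRX_ROOT = PurePosixPath(
    "/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation"
)

# Chips with a single TOPMed chrX imputation directory.
UNBATCHED_CHIPS = ["Oncoarray", "OmniX", "Omni25M", "Omni5"]
# GSA TOPMed chrX imputation is split into batch1..batch5.
GSA_BATCHES = [f"batch{i}" for i in range(1, 6)]


def topmed_chrx_dirs() -> list[PurePosixPath]:
    """Return the TOPMed chrX imputation directories (hypothetical helper).

    The IMPUTATION_1000G directories are deliberately left out.
    """
    dirs = [CHRX_ROOT / chip / "IMPUTATION_TOPMED" for chip in UNBATCHED_CHIPS]
    dirs += [CHRX_ROOT / "GSA" / "IMPUTATION_TOPMED" / b for b in GSA_BATCHES]
    return dirs


if __name__ == "__main__":
    for d in topmed_chrx_dirs():
        print(d)
```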

lightning-auriga · 2021-01-27T13:47:27Z

assorted comments:

The 1KG versions of these imputations were made for other testing purposes and should probably just be ignored for this project.
The TOPMed chrX imputation was performed using the open MIS variant, which seems to be against TOPMed v8, whereas the autosomes (in imputation freeze 2) were imputed against the private server/TOPMed v5b. So they are not synced, and eventually the autosomes need to catch up to X.
The TOPMed chrX imputation was performed using chip data prepared for the first imputation pass, which is now deprecated. The biggest issues with this are (1) the ancestries were not split out before imputation, so the chips weren't appropriately cleaned; and (2) the freeze 2 batch assignments for GSA/Europeans were shuffled to fit them into 4 batches instead of 5, and since this wasn't done for chrX, there is one more chrX/GSA/European batch than there are autosomal GSA/European batches.

All of the above eventually needs to be synchronized by reimputing everything against the public server's TOPMed panel with the better input prep. For the moment, though, I don't think the batch count discrepancy is much of an issue. There may be some step that assumes all chromosomes are present, but in general the pipeline merely processes whatever is present, so it shouldn't be too hard to force it to use these files as-is.
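One way to keep "process whatever is present" safe with the extra chrX/GSA/European batch is to build jobs per chromosome from the batches actually found, rather than assuming every chromosome shares the autosomal batch list. This is a sketch only: the pipeline's real data structures are not shown in this thread, so the chromosome-to-batch mapping and the function names `build_jobs` and `batch_count_mismatches` are hypothetical.

```python
from collections.abc import Mapping

# Hypothetical input shape: chromosome name -> batch labels found on disk.
BatchMap = Mapping[str, list[str]]


def build_jobs(batches_by_chrom: BatchMap) -> list[tuple[str, str]]:
    """Pair each chromosome with the batches actually present for it.

    Nothing here assumes every chromosome has the same batches, so a chrX
    with five GSA/European batches and autosomes with four both work.
    """
    return [
        (chrom, batch)
        for chrom, batches in sorted(batches_by_chrom.items())
        for batch in batches
    ]


def batch_count_mismatches(batches_by_chrom: BatchMap) -> dict[str, int]:
    """Report chromosomes whose batch count differs from the most common count.

    Ties between equally common counts resolve arbitrarily.
    """
    counts = [len(b) for b in batches_by_chrom.values()]
    if not counts:
        return {}
    typical = max(set(counts), key=counts.count)
    return {c: len(b) for c, b in batches_by_chrom.items() if len(b) != typical}


if __name__ == "__main__":
    example = {
        "chr1": ["batch1", "batch2", "batch3", "batch4"],
        "chr2": ["batch1", "batch2", "batch3", "batch4"],
        "chrX": ["batch1", "batch2", "batch3", "batch4", "batch5"],
    }
    print(len(build_jobs(example)), batch_count_mismatches(example))
```

Flagging the mismatch instead of failing on it matches the stance above: the discrepancy is tolerated for now but stays visible until everything is reimputed.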

shukwong · 2021-08-04T01:16:41Z

Need to have a .sample file linked to each chromosome, in case the samples differ slightly between chromosome X and the autosomes (which is the case in PLCO).
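A minimal sketch of a per-chromosome .sample lookup plus a check of which samples differ between the autosomal and chrX files. It assumes Oxford-format .sample files (line 1 is column names, line 2 is column types such as `0 0 0 D`, samples start on line 3) and uses only the first (ID_1) column. The mapping and the function names are hypothetical, not part of the pipeline.

```python
from pathlib import Path


def parse_sample_ids(text: str) -> list[str]:
    """Return the ID_1 column from the contents of an Oxford-format .sample file.

    Line 1 holds column names and line 2 holds column types,
    so sample rows start on line 3. Blank lines are skipped.
    """
    rows = text.splitlines()[2:]
    return [row.split()[0] for row in rows if row.strip()]


def read_sample_ids(sample_path: Path) -> list[str]:
    """Read sample IDs from a .sample file on disk."""
    return parse_sample_ids(sample_path.read_text())


def sample_file_for(chrom: str, sample_files: dict[str, Path]) -> Path:
    """Return the .sample file linked to `chrom` (hypothetical mapping)."""
    if chrom not in sample_files:
        raise KeyError(f"no .sample file linked for {chrom}")
    return sample_files[chrom]


def sample_differences(
    autosome_ids: list[str], chrx_ids: list[str]
) -> tuple[set[str], set[str]]:
    """Return (IDs only in the autosomal .sample, IDs only in the chrX .sample)."""
    auto, x = set(autosome_ids), set(chrx_ids)
    return auto - x, x - auto
```

Failing loudly in `sample_file_for` avoids silently reusing an autosomal .sample file for chrX when the sample sets differ.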

lightning-auriga added the enhancement (New feature or request) label Jan 27, 2021

shukwong mentioned this issue Aug 6, 2021

chrX par region added #35

Merged
