chrX support #11

Open
lightning-auriga opened this issue Jan 27, 2021 · 3 comments
Labels
enhancement New feature or request

@lightning-auriga
Contributor

Atlas investigators have requested chrX support in the pipeline. This is not too difficult, but it requires pulling in imputed files generated by someone else. Each downstream tool handles chrX differently, so support needs to be built into each individual association pipeline.

@lightning-auriga
Contributor Author

For whoever inherits this project: here are the locations of the PLCO chrX imputations, as I've been informed by email:

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Oncoarray/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Oncoarray/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/OmniX/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni25M/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni25M/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni5/IMPUTATION_1000G
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/Omni5/IMPUTATION_TOPMED

/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/GSA/IMPUTATION_1000G/batch1 (also batch2, batch3, batch4, batch5)
/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation/GSA/IMPUTATION_TOPMED/batch1 (also batch2, batch3, batch4, batch5)
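If it helps whoever picks this up, the directory list above can be expanded programmatically. This is only a sketch: the root path, chip names, panel names and GSA batch count are copied from the list, while the `PANELS` table and the `chrx_dirs` helper are hypothetical names, not anything that exists in the pipeline. It defaults to the TOPMed panel, since the 1000G imputations are described in a later comment as test runs to be ignored.

```python
from pathlib import PurePosixPath

# Root directory copied verbatim from the list above.
ROOT = PurePosixPath(
    "/DCEG/CGF/Bioinformatics/Production/Shilpa/Projects/PLCO_chrX_Imputation"
)

# Chip -> reference-panel directories that exist, per the list above.
# Note that OmniX only has a TOPMed imputation.
PANELS = {
    "Oncoarray": ["IMPUTATION_1000G", "IMPUTATION_TOPMED"],
    "OmniX": ["IMPUTATION_TOPMED"],
    "Omni25M": ["IMPUTATION_1000G", "IMPUTATION_TOPMED"],
    "Omni5": ["IMPUTATION_1000G", "IMPUTATION_TOPMED"],
    "GSA": ["IMPUTATION_1000G", "IMPUTATION_TOPMED"],
}

# Only GSA is split into batches (batch1 .. batch5).
GSA_BATCHES = [f"batch{i}" for i in range(1, 6)]


def chrx_dirs(panel: str = "IMPUTATION_TOPMED") -> list[PurePosixPath]:
    """Return every chrX imputation directory for one reference panel."""
    dirs = []
    for chip, panels in PANELS.items():
        if panel not in panels:
            continue  # e.g. OmniX has no 1000G imputation
        base = ROOT / chip / panel
        if chip == "GSA":
            dirs.extend(base / b for b in GSA_BATCHES)
        else:
            dirs.append(base)
    return dirs


if __name__ == "__main__":
    for d in chrx_dirs():
        print(d)
```

For TOPMed this yields nine directories (four single-batch chips plus five GSA batches).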

@lightning-auriga lightning-auriga added the enhancement New feature or request label Jan 27, 2021
@lightning-auriga
Contributor Author

Assorted comments:

  • The 1KG versions of these imputations were run for other testing purposes and should probably just be ignored for this project.
  • The TOPMed chrX imputation was performed on the open MIS instance, which appears to use TOPMed v8, whereas the autosomes (in imputation freeze 2) were imputed against the private server/TOPMed v5b. So the two are not in sync, and eventually the autosomes need to catch up to chrX.
  • The TOPMed chrX imputation was performed using chip data prepared for the first imputation pass, which is now deprecated. The biggest issues with this are (1) the ancestries were not split out before imputation, so the chips weren't appropriately cleaned; and (2) the batch assignments for freeze 2 were shuffled for GSA/Europeans to fit them into 4 batches instead of 5. Since this wasn't done for chrX, there is one more chrX GSA/European batch than there are autosomal GSA/European batches.

All of the above eventually needs to be synchronized by reimputing everything against the public server's TOPMed panel with the improved input prep. For the moment, though, I don't think the batch count discrepancy is much of an issue. There may be some step that assumes all chromosomes are present, but in general the pipeline merely processes whatever is present, so it shouldn't be too hard to force it to use these files as-is.
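To illustrate the "process whatever is present" idea, here is a minimal sketch that groups imputed files by chromosome without assuming a fixed chromosome list or a fixed batch count per chromosome. The `chr<CHROM>.dose.vcf.gz` naming pattern and the `group_by_chromosome` helper are assumptions for illustration only; the pattern would need to match the actual file names in these directories.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# ASSUMED naming convention: <batch dir>/chr<CHROM>.dose.vcf.gz
# Adjust this pattern to whatever the imputation output actually uses.
CHROM_RE = re.compile(r"^chr(\d+|X)\.dose\.vcf\.gz$")


def group_by_chromosome(paths):
    """Map chromosome -> sorted list of files, using only what is present.

    No fixed chromosome list is assumed, and each chromosome may have a
    different number of batches (e.g. 5 GSA/European batches for chrX
    versus 4 for the autosomes). Files not matching the pattern are skipped.
    """
    groups = defaultdict(list)
    for p in map(PurePosixPath, paths):
        m = CHROM_RE.match(p.name)
        if m:
            groups[m.group(1)].append(p)
    return {chrom: sorted(files) for chrom, files in groups.items()}
```

Downstream steps would then iterate over the keys of the returned dictionary rather than over a hard-coded 1..22 + X.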

@shukwong
Collaborator

shukwong commented Aug 4, 2021

We need a .sample file linked to each chromosome, since the sample sets can differ slightly between chromosome X and the autosomes (which is the case in PLCO).
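One way to picture this is a per-chromosome config where each chromosome carries its own sample file, plus a check that reports how each chromosome's samples differ from a reference chromosome. This is a sketch under assumptions: the config layout, file paths and helper names are hypothetical, and the reader assumes the Oxford `.sample` layout (a column-name line, a column-type line, then one sample per line with ID_1 and ID_2 first).

```python
import io
from typing import TextIO

# HYPOTHETICAL per-chromosome config: each chromosome points at its own
# .sample file instead of sharing one sample list across the whole run.
EXAMPLE_CONFIG = {
    "22": {"genotypes": "autosomes/chr22.bgen", "samples": "autosomes/chr22.sample"},
    "X": {"genotypes": "chrX/chrX.bgen", "samples": "chrX/chrX.sample"},
}


def read_sample_ids(handle: TextIO) -> list[tuple[str, str]]:
    """Read (ID_1, ID_2) pairs from an Oxford-format .sample file.

    The first two lines are the column-name header and the column-type
    line; every following non-empty line describes one sample.
    """
    ids = []
    for line in handle.read().splitlines()[2:]:
        fields = line.split()
        if fields:
            ids.append((fields[0], fields[1]))
    return ids


def sample_differences(samples_by_chrom: dict, reference: str) -> dict:
    """For each chromosome, list samples missing or extra vs. the reference."""
    ref = set(samples_by_chrom[reference])
    report = {}
    for chrom, ids in samples_by_chrom.items():
        s = set(ids)
        report[chrom] = {"missing": sorted(ref - s), "extra": sorted(s - ref)}
    return report


if __name__ == "__main__":
    # In-memory stand-ins for an autosomal and a chrX .sample file.
    auto = io.StringIO("ID_1 ID_2 missing\n0 0 0\nA A 0\nB B 0\n")
    chrx = io.StringIO("ID_1 ID_2 missing\n0 0 0\nA A 0\nC C 0\n")
    by_chrom = {"22": read_sample_ids(auto), "X": read_sample_ids(chrx)}
    print(sample_differences(by_chrom, reference="22"))
```

Keeping the sample file at the chromosome level means a chrX-specific sample set never has to be forced to match the autosomal one.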
