hg38 support #3

Open
pdiakumis opened this issue Feb 22, 2022 · 8 comments

Comments

@pdiakumis

Hi @luannnguyen!
Is hg38 supported? We've noticed a couple of hg19-specific code chunks in featureExtractor, and (I believe?) those HMF/PCAWG training samples were hg19. Or is hg38 on the roadmap?
Cheers - Peter

@luannnguyen
Collaborator

Hi Peter,

The training data was indeed hg19. In short, hg38 is not on the roadmap.

The regional mutation density (RMD) features (number of SNVs per 1 Mb bin across the genome) are the only genomic-position-sensitive features, and they are by far the most important. However, I'm not sure how much these 1 Mb bins will differ with hg38. Ideally CUPLR would be retrained on hg38 data, but I don't know whether the Hartwig Medical Foundation has rerun any or all of their samples with hg38. Also, the PCAWG samples would need to be rerun through the Hartwig pipeline with hg38 (we did this with hg19, and it took us 6 months!). So overall, retraining isn't possible in the short term.
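To make the RMD features concrete, here is a minimal sketch of counting SNVs per 1 Mb bin. It assumes SNVs are supplied as (chromosome, 1-based position) pairs and that bins are fixed-width windows starting at position 1. featureExtractor itself is an R package, and its actual binning and any normalisation may differ, so `rmd_counts` is a hypothetical helper for illustration only.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins, matching the RMD feature description


def rmd_counts(snvs, bin_size=BIN_SIZE):
    """Count SNVs per fixed-width genomic bin (hypothetical helper).

    snvs: iterable of (chrom, pos) tuples, pos being a 1-based coordinate.
    Returns a Counter keyed by (chrom, bin_index); empty bins are absent.
    """
    counts = Counter()
    for chrom, pos in snvs:
        # Positions 1..1_000_000 -> bin 0, 1_000_001..2_000_000 -> bin 1, etc.
        counts[(chrom, (pos - 1) // bin_size)] += 1
    return counts


snvs = [("chr1", 1), ("chr1", 1_000_000), ("chr1", 1_000_001), ("chr2", 5_500_000)]
print(dict(rmd_counts(snvs)))
# {('chr1', 0): 2, ('chr1', 1): 1, ('chr2', 5): 1}
```

Because the bin an SNV lands in depends only on its coordinate, the same mutation can fall into a different bin under hg19 and hg38, which is why these are the position-sensitive features.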

An alternative is to lift over the hg19 1 Mb bin coordinates to hg38 coordinates, count the number of SNVs for the RMD features using the hg38 coordinates, and then run the existing model. Let me know if you'd like this as an option and I can add it to featureExtractor. I don't know how this will affect the model's accuracy, so it would need to be tested with a set of hg38 samples of known cancer type.
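A rough sketch of the counting step in that liftover approach: after liftover, one hg19 bin may map to one or more hg38 segments of varying length (or to none), so counts can no longer be taken with simple integer division. This assumes the lifted coordinates were already produced by a separate liftover step (for example UCSC liftOver with an hg19-to-hg38 chain file) and are passed in as a hypothetical `lifted_bins` mapping; `count_in_lifted_bins` is an illustrative helper, not part of featureExtractor.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_in_lifted_bins(snvs, lifted_bins):
    """Count hg38 SNVs in bins whose coordinates were lifted over from hg19.

    snvs: iterable of (chrom, pos) with 1-based hg38 positions.
    lifted_bins: dict bin_id -> list of (chrom, start, end) hg38 segments,
        1-based and inclusive (hypothetical input; a bin that failed to lift
        over simply has an empty list).
    Returns dict bin_id -> SNV count. An SNV is counted once for every
    segment that contains it.
    """
    # Group and sort positions per chromosome so each lookup is a binary search.
    positions = defaultdict(list)
    for chrom, pos in snvs:
        positions[chrom].append(pos)
    for plist in positions.values():
        plist.sort()

    counts = {}
    for bin_id, segments in lifted_bins.items():
        total = 0
        for chrom, start, end in segments:
            plist = positions.get(chrom, [])
            # Number of sorted positions p with start <= p <= end.
            total += bisect_right(plist, end) - bisect_left(plist, start)
        counts[bin_id] = total
    return counts


snvs = [("chr1", 150), ("chr1", 900), ("chr1", 2_000), ("chr2", 50)]
lifted_bins = {
    "hg19_chr1_bin0": [("chr1", 100, 1_000), ("chr1", 1_900, 2_100)],  # split bin
    "hg19_chr2_bin0": [("chr2", 1, 40)],
}
print(count_in_lifted_bins(snvs, lifted_bins))
# {'hg19_chr1_bin0': 3, 'hg19_chr2_bin0': 0}
```

Keeping the original hg19 bin identifiers as keys means the resulting counts line up with the feature layout the existing model was trained on. Sorting positions once per chromosome keeps each segment lookup logarithmic in the number of SNVs.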

Luan

@ohofmann

We do have fairly comprehensive test data (which was used on CUPPA before). Could be worth a shot but also need to think about priorities here and where to slot this in. Thanks for the feedback!

@luannnguyen
Collaborator

OK, I can add hg38 support to featureExtractor in case you'd like to test CUPLR on hg38 data. I'll let you know when it's ready :)

@luannnguyen
Collaborator

luannnguyen commented Mar 2, 2022

Hi Oliver and Peter, I've added hg38 support to featureExtractor. You should now be able to set the genome to hg38 with setGenome('hg38') before running extractFeaturesCuplr().

@ohofmann

ohofmann commented Mar 2, 2022

Thank you!

@boutrys
Copy link

boutrys commented May 29, 2023

Hello all,

Did you test it on hg38, @ohofmann?

Are there any limitations to running it that way, even though the training was performed on hg19, @luannnguyen?

Thanks in advance :)

Impressive work, by the way.

@ohofmann
Copy link

ohofmann commented Jun 6, 2023

@boutrys We went with a slightly different path in the end, wrapping CUPPA (which now has hg38 support) into a Nextflow pipeline and have started testing that.

@AqsaAlam

AqsaAlam commented Jul 4, 2023

Hello!

Great work on CUPLR!

I am running setGenome('hg38') but get the following error:

Error in assign("BSGENOME", BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38, : cannot change value of locked binding for 'BSGENOME'

In fact, I get this error when doing setGenome('hg19') as well.

Have you come across this before? I have BSgenome installed as well as both the hg19 and hg38 packages.

EDIT: I believe I've isolated the issue to the following line in the setGenome function in genomeUtils.R, which produces a locked environment:

pkg_env <- parent.env(environment())

5 participants