Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How refmap is generated? #10

Open
michael-kotliar opened this issue Apr 9, 2018 · 2 comments
Open

How refmap is generated? #10

michael-kotliar opened this issue Apr 9, 2018 · 2 comments

Comments

@michael-kotliar
Copy link

michael-kotliar commented Apr 9, 2018

Hi!
Could you please explain how the refmap file is generated.
I tested insilico genome creation for 129S1_SvImJ with vcf file mgp.v5.merged.indels.dbSNP142.normed.vcf.gz, that includes only two indels for this specific strain:

3000019    G -> GA
3001236    A -> ATTTTT

For the rest of positions in this vcf file 129S1_SvImJ strain includes ./.:.:.:.:.:.:.:.:.:.:.:.:.:.. So, if I understood correctly, these positions should be skipped.
As reference genome I used chr1.fa from mm10 (includes only chr1) with >chr1 changed to >1 (file is not attached).

Command:

java -jar /opt/mea/bin/alea.jar insilico --input-fasta=../reference_genome/chr1.fa --input-vcf=../inputs/mgp.v5.merged.indels.dbSNP142.normed.vcf --strain=129S1_SvImJ --output-fasta=output.fa

The output I got output.fa.refmap.gz

>1
0 0 3000019
3000019 3000021 78
3000109 3000100 149
3000258 3000250 978
3001236 3001234 192470735

How did we calculate 3000109 3000100 149 and 3000258 3000250 978 lines?

I assume the correct output should be

>1
0       0       3000019
3000019 3000021 1217
3001236 3001244 192470735

Thanks!

@michael-kotliar
Copy link
Author

I thinks the problem is caused by differences between VCFv4.1 and VCFv4.2
For VCFv4.1 all strains that should be skipped are marked as . , so the following code works correct

if (genotypeString.equals(".")) { 
   continue; // skip. no variant for this sample
}

But for VCFv4.2 all the records, that should be skipped, are marked differently ./.:.:.:.:.:.:.:.:.:.:.:.:.:.

@abogutz
Copy link
Collaborator

abogutz commented Apr 10, 2018

Hi Michael, thanks for all your comments! Unfortunately, MEA was designed with VCFv4.1 in mind, and the changes in VCFv4.2 do make some of the code wonky. For now we'll ask that users stick with VCFv4.1, but we'll keep an eye on making everything compatible with both. The user guide will be updated to match this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants