-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
One nucleotide shift for every SNP Positions #169
Comments
Hi, The issue is although you mentioned "use 0-based start-end coords for .xmfa. Fixes #169", still the output of pasnp 2.1.1 is one nucleotide shift for every SNP Position. I tried to use parsnp 2.1.1 for 6 isolates from NCBI ( SRR863393, SRR863392, SRR1264965, SRR1514990, SRR1586566, and SRR1586589) with the reference sequence of NZ_CP016581.1. I got the VCF file from parsnp.ggr and the positions are like instead of: In addition, I need the exact positions of bases in parsnp.snps.mblocks file which is used to build parsnp.tree file. The VCF file I got showes 189 positions, while parsnp.snps.mblocks includes 120 bases for each isolate. Can you please tell me how I can have access to the exact positions of bases that are used to build parsnp.tree? Thank you so much. |
Hi @DrMahsaRahnama76 , Yes, the fix will apply to the next release of Parsnp (2.1.2). However, this issue unfortunately may be with HarvestTools, which expects the xmfa file to be 0-indexed when it should be 1-indexed. I'll reopen this for now until it is fixed. As far as the mblocks file goes, the reason for the 120 vs 189 difference is that the SNPs used to build the tree are a filtered subset (for example, no indels). There's currently no way to recover the coordinates from the mblocks file alone. Best, |
Could you update parsnp in bioconda from 2.1.1 to 2.1.2, please? |
Hi, do you have a time window for when this issue with HarvestTools may be adressed and fixed? Unfortunately an upgrade to parsnp 2.1.2 did not fix the issue mentioned above. Also, the results we gathered from parsnp2 not only differ from the results gathered with parsnp1, but most of the SNPs also get flagged as homoplastic SNPs instead of unique ones. Maybe this issue is connected? Kind regards, |
@fish2022Jul, Parsnp v2.1.2 is on conda now. Please let me know if you run into any issues. @Kadsae, to clarify, you are seeing off by one errors in v2.1.2 as well? The coordinates in 2.1.2 should be shifted (corrected) relative to other Parsnp 2 releases:
Also, I've create a new issue (#171) regarding your second question. |
Hi, I ran version 2.1.2, but the problem still exists. Can you please tell me which version of Harvesttools you used? You mentioned in your last comment that the positions are only different and ALT are the same, but in my results, I got the wrong ALT according to their wrong positions. For example, it showed 12731 with ALT G instead of 12730 with ALT A. Thank you. |
Hi @bkille, thank you for the fast responses and the eagerness to solve my issue(s). For some more clarification: We have an inhouse database with around 800 bacterial genomes. Command: For v1.6.2 I got 9719 SNPs reported, in comparison to published SNPs at the same positions. During the v2.1.2 run, one partition failed - therefore 50 genomes got excluded from the analysis. So yes, it seems the nucleotide shift is still there. Best, |
Hi @Kadsae, This is very strange. Other than some error reporting improvements, the only change between v2.1.1 and v2.1.2 is the use of the With respect to the nucleotide shift issue, I created separate conda envs for v1.6.2, v2.1.1, and v2.1.2 and ran on example data. The coordinates of v2.1.2 match v1.6.2 and are indeed -1 relative to v2.1.1.
Are you using the version from Bioconda, or a locally built version? My only guess is that if you are using a locally built Parsnp install, the Please feel free to email me if you'd prefer to discuss more offline. |
Hi @bkille, I did indeed use a locally built version for my last tests. This time I created new conda envs and used the bioconda versions of parsnp v1.6.2 and parsnp 2.1.2 respectively. Using the same dataset as last time (800 bacterial genomes), I got the same results as last time. Surprisingly, I did not run into a partition failure this time. I also did another test and tried to experiment with the parameter To my surprise, the results from v1.6.2 and v2.1.2 without partitioning match! The run with partitioning again showed SNPs off by +1 to the other runs. So it seems that the error with the nucleotide shifts has something to do with the partitioning of parsnp2 for bigger datasets. Another interesting aspect is that v1.6.2 reported 2703 SNPs, v2.1.2 w/o partitioning 2786 SNPs and v2.1.2 w/ partitioning 2211 SNPs. You mentioned in #171 your findings showed that parsnp2 should find more SNPs than parsnp1 and without the partitioning rule this statement seems to be correct. Lastly, I also checked the status of the SNPs across all three runs and for v1.6.2 and v2.1.2 w/o partitioning close to all of the SNPs are unique, while the run of v2.1.2 w/ partitioning again shows the problem reported in #171 (many homoplastic SNPs). Summarized, I think both problems, this one and #171, are connected and have something to do with the partitioning-flag. I hope I could help a little. If you want more info or discuss more offline feel free to email me as well. All the best, |
Hello,
I ran parsnp version 2.1.1 and 2.0.6, but both of them found wrong locations for SNP positions. For example, it should be 12730 but these versions found 12731. I made the VCF file from .ggr file by this code: harvesttools -i parsnp.ggr -V parsnp.vcf.
I, also, ran parsnp version 1.5.4 and 1.5.3, but although they showed the correct posiotions, they did not make the .mblocks file and I need the SNP matrix for my project.
I appriciate your help.
Best,
Mahsa
The text was updated successfully, but these errors were encountered: