Sam2conseq script will report "N" instead of majority consensus #17

Open
ewong347 opened this issue Apr 28, 2020 · 5 comments

Comments

@ewong347
Collaborator

Expected behavior: sam2conseq should call the nucleotide (NT) with the highest frequency when the minor NT is observed only once.
Observed behavior: sam2conseq reports N instead of the majority nucleotide, even when the ratio is 350:1 (position 7676).
Sample: WA11-UW7 (SRR11278092). The aligned sequence and depth frequencies are in langley/covid/problematic.
[Screenshot: position 224]

Freqs table:

pos   A    C  G   T   N  -  ins
224   0    0  1   15  0  0  {}
237   0    1  19  0   0  0  {}
441   0    0  40  1   0  0  {}
444   0    0  0   49  0  1  {}
447   0    1  0   49  0  0  {}
458   0    0  46  1   0  0  {}
7676  350  0  1   0   0  0  {}
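A minimal sketch of the expected behavior, assuming a simple majority rule: call the most frequent of A/C/G/T at each position, and fall back to N only when there is no coverage or the top base does not exceed a threshold share of the calls. The function name `call_base`, the 0.5 threshold, and ignoring the `N`, `-`, and `ins` columns are hypothetical choices for illustration, not sam2conseq's actual logic.

```python
from typing import Dict

BASES = ("A", "C", "G", "T")


def call_base(counts: Dict[str, int], min_fraction: float = 0.5) -> str:
    """Return the majority base for one position, or 'N'.

    Hypothetical rule: the most frequent of A/C/G/T wins if it accounts for
    strictly more than `min_fraction` of the A/C/G/T counts; otherwise 'N'.
    The N, deletion (-) and insertion columns are ignored in this sketch.
    """
    total = sum(counts.get(b, 0) for b in BASES)
    if total == 0:
        return "N"  # no coverage at this position
    best = max(BASES, key=lambda b: counts.get(b, 0))
    if counts.get(best, 0) / total > min_fraction:
        return best
    return "N"  # no clear majority (e.g. a 1:1 tie)


# A/C/G/T counts copied from the freqs table above
rows = {
    224: {"A": 0, "C": 0, "G": 1, "T": 15},
    237: {"A": 0, "C": 1, "G": 19, "T": 0},
    441: {"A": 0, "C": 0, "G": 40, "T": 1},
    444: {"A": 0, "C": 0, "G": 0, "T": 49},
    447: {"A": 0, "C": 1, "G": 0, "T": 49},
    458: {"A": 0, "C": 0, "G": 46, "T": 1},
    7676: {"A": 350, "C": 0, "G": 1, "T": 0},
}

consensus = {pos: call_base(c) for pos, c in rows.items()}
print(consensus)
# {224: 'T', 237: 'G', 441: 'G', 444: 'T', 447: 'T', 458: 'G', 7676: 'A'}
```

Under this rule, none of the positions in the table would be reported as N; a strict `>` comparison means an exact tie still yields N.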
@ArtPoon
Contributor

ArtPoon commented Apr 28, 2020

Thanks for reporting this, @ewong347. However, I can't reproduce this issue.
Here is what I am seeing:
[Screenshots]

@ewong347
Collaborator Author

ewong347 commented Apr 28, 2020

Weird, I just tried to reproduce this issue as well, with no luck. When I re-ran the pipeline on the sequence, the new alignment also included more NTs at the 5' end of the sequence. It might just have been a hiccup on my end.

I think we can close this issue for now. I'll keep an eye out for anything like this in the final couple of sequences.

@ArtPoon
Contributor

ArtPoon commented Apr 28, 2020

Well, hiccups are not OK!
Can you please post your workflow so I can try to repro?

@ewong347
Collaborator Author

I used the workflow on the main page for sam2conseq:

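# download the reads from SRA
fasterq-dump SRRxxxx
# quality-trim both read ends at Q20 and remove the 3' adapter sequence
cutadapt -q 20,20 -a CTGTCTCTTATACACATCT -o SRRxxxx.trim.fastq SRRxxxx.fastq
# local alignment against the NC_045512 reference index
bowtie2 -x NC_045512 -U SRRxxxx.trim.fastq -S SRRxxxx.sam --local
# write the per-position frequencies and the consensus sequence
python3 sam2conseq.py --unpaired SRRxxxx.sam freqs.csv SRRxxxx.conseq.txt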

@ArtPoon
Contributor

ArtPoon commented Apr 28, 2020

Nothing unusual there. @ewong347, can you please re-run the last step a few times to see whether you can reproduce this issue? Also, please check your command history to see if anything was different in that run.
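One way to carry out the re-run check, sketched under assumptions: run the last workflow step several times with distinct output names and compare SHA-256 digests of the resulting files. The helper names (`sha256_of`, `all_identical`, `rerun_sam2conseq`) and the `run{i}.*` output naming are hypothetical; the sam2conseq.py arguments are copied from the workflow posted above.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List, Tuple


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        # read in 64 KiB chunks so large files are not loaded at once
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def all_identical(paths: List[Path]) -> bool:
    """True if every file in `paths` has byte-identical contents."""
    return len({sha256_of(p) for p in paths}) <= 1


def rerun_sam2conseq(sam: str, runs: int = 3) -> Tuple[List[Path], List[Path]]:
    """Re-run the last workflow step `runs` times (hypothetical helper).

    Each run writes to its own run{i}.freqs.csv / run{i}.conseq.txt so the
    outputs can be compared afterwards.
    """
    freqs_files, conseq_files = [], []
    for i in range(runs):
        freqs = Path(f"run{i}.freqs.csv")
        conseq = Path(f"run{i}.conseq.txt")
        subprocess.run(
            ["python3", "sam2conseq.py", "--unpaired", sam, str(freqs), str(conseq)],
            check=True,  # stop if any run exits with an error
        )
        freqs_files.append(freqs)
        conseq_files.append(conseq)
    return freqs_files, conseq_files
```

For example, `freqs, conseqs = rerun_sam2conseq("SRRxxxx.sam")` followed by `all_identical(freqs) and all_identical(conseqs)` would indicate whether the step is deterministic on that input. If it is, the earlier N calls more likely came from a different upstream alignment, which would fit the observation that the re-run alignment had more NTs at the 5' end.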
