Ignore less reliable HMMer domain results (noted with '?') in reconstructing alignment #7

Open
dustine32 opened this issue Sep 17, 2020 · 0 comments


Attempting to graft this sequence:

>Cyanophora_paradoxa_CPAR027107_Apc11
QKTLTILAKDRNYKVEDFKAAGAIAKTRLDQQREPCSCKVAASDAHPCVRRVLFLNLSAA
VGAREPRLGARRAPALRSMKVKIVWHAVASWTWNVDDEACGICRNAYDGCCPDCKTPGDD
CPLWGECRHAFHLHCILKWVNSQQEGKQHCPMCRRDWKFRSSD

...onto the PANTHER 15.0 library, TreeGrafter outputs this error:

ERROR MSF of Cyanophora_paradoxa_CPAR027107_Apc11 should have length 90, actual length is 203

From debugging, the treeGrafter.pl script appears to parse the hmmscan output for the top hit (PTHR11210) incorrectly. As a result, the query sequence alignment that TreeGrafter reconstructs does not match the alignment length of the PTHR11210 family PIR file:

[image]
Specifically, the script recognizes that this hit has two domains and uses that count when iterating through the start/end alignment values. Unfortunately, the regex /!/ used to parse out those start/end values skips the first domain, which is marked '?', so the count and the parsed values no longer line up and the wrong values are used. We'll need to debug further to figure out how to line these parts up correctly.
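To make the mismatch concrete, here is a minimal sketch in Python (not the Perl from treeGrafter.pl, which is not reproduced here). It assumes the usual layout of the hmmscan per-domain table, where each row starts with a domain number followed by a '!' or '?' marker. The two sample rows use made-up values for illustration only, and the names `Domain`, `parse_domains`, and `reliable_domains` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Domain(NamedTuple):
    index: int      # domain number within the hit
    marker: str     # '!' (passes inclusion threshold) or '?' (does not)
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int


# Made-up rows laid out like an hmmscan per-domain table (illustrative values only).
SAMPLE = """\
   1 ?   -1.3   0.0      0.13      0.13      61      90 ..      32      59 ..      25      62 .. 0.72
   2 !   99.0   1.2   1.1e-31   1.1e-31       2      88 ..     105     190 ..     104     192 .. 0.95
"""

# Accept both markers so that no row is silently dropped during parsing.
ROW = re.compile(r"^\s*(\d+)\s+([!?])\s+(.*)$")


def parse_domains(text: str) -> List[Domain]:
    """Parse every domain row, keeping the '!'/'?' marker with its coordinates."""
    domains = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        # Remaining fields: score bias c-Evalue i-Evalue hmmfrom hmmto ..
        #                   alifrom alito .. envfrom envto .. acc
        f = m.group(3).split()
        domains.append(
            Domain(int(m.group(1)), m.group(2),
                   int(f[4]), int(f[5]), int(f[7]), int(f[8]))
        )
    return domains


def reliable_domains(domains: List[Domain]) -> List[Domain]:
    """Drop '?' domains after parsing, so the count and the values stay in sync."""
    return [d for d in domains if d.marker == "!"]


all_domains = parse_domains(SAMPLE)
kept = reliable_domains(all_domains)
# Iterate over len(kept), not over the domain count reported for the hit.
for d in kept:
    print(d.index, d.hmm_from, d.hmm_to, d.ali_from, d.ali_to)
```

The point of the sketch is the ordering: parse all rows first, filter on the marker second, and derive the loop bound from the filtered list. Matching only '!' while looping over the reported domain count is what lets the two get out of step.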
