sequence_locator: internal use case #37

Open
ArtPoon opened this issue Mar 18, 2021 · 0 comments

This is a script I used to batch process a FASTA file from within Python:

from poplars.sequence_locator import *
from poplars.common import *
import sys

# Read (header, sequence) pairs from the FASTA file given as the first argument
fasta = convert_fasta(open(sys.argv[1]))

virus = 'hiv'
base = 'NA'

# Load the reference configuration and unpack the pieces that Genome needs
configs = handle_args(virus, base)
ref_nt_seq, ref_aa_seq = configs[0][0][1], configs[1]
nt_coords = configs[2]
reference_sequence = configs[3]
nt_coords_handle = open(nt_coords, 'r')

# Build the reference genome once and reuse it for every query
ref_genome = Genome(virus, nt_coords_handle, ref_nt_seq, ref_aa_seq,
                    reference_sequence, base)

# Locate each sequence and write header, left and right coordinates as TSV
for h, s in fasta:
    query_seq = get_query(base, s, False)
    query = Query(base, ref_genome, query_sequence=query_seq)
    left, right = query.qcoords
    sys.stdout.write('{}\t{}\t{}\n'.format(h, left, right))

Some of this is unnecessarily complicated, such as setting up the Genome object. Ideally the workflow would look more like this:

import sys

from poplars import sequence_locator as locator
from poplars.common import convert_fasta

# Proposed interface (does not exist yet)
with open(sys.argv[1]) as handle:
    for h, s in convert_fasta(handle):
        result = locator(s, base='NT', virus='hiv')
        sys.stdout.write('{}\t{}\t{}\n'.format(h, result.left, result.right))
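One way the simplified call could be layered over the existing code is a small factory that hides the one-time reference setup behind a cache. This is only a rough sketch, not how poplars is implemented: `Result`, `make_locator`, `build_reference` and `find_coords` are hypothetical names introduced for illustration, and the two callables are supplied by the caller so that nothing here assumes poplars internals.

```python
from collections import namedtuple

# Hypothetical result type exposing the .left/.right attributes used above.
Result = namedtuple('Result', ['left', 'right'])


def make_locator(build_reference, find_coords):
    """Return a locate(seq, base, virus) function.

    build_reference(virus, base) does the expensive one-time setup
    (for example, constructing the reference genome object).
    find_coords(reference, base, seq) returns a (left, right) pair
    for a single query sequence.
    Both callables are assumptions supplied by the caller.
    """
    cache = {}  # (virus, base) -> reference object, built once and reused

    def locate(seq, base, virus):
        key = (virus, base)
        if key not in cache:
            cache[key] = build_reference(virus, base)
        left, right = find_coords(cache[key], base, seq)
        return Result(left, right)

    return locate
```

Here `build_reference` would wrap the `handle_args` and `Genome` steps from the first script, and `find_coords` would wrap `get_query` plus `Query(...).qcoords`. The batch loop then only needs `locate(s, base=..., virus=...)` per sequence, and the reference is built once per virus/base pair.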