sequence_locator: internal use case #37

@ArtPoon

This is the script I used to batch-process a FASTA file from within Python:

from poplars.sequence_locator import *
from poplars.common import *
import sys

fasta = convert_fasta(open(sys.argv[1]))

virus = 'hiv'
base = 'NA'

configs = handle_args(virus, base)
ref_nt_seq, ref_aa_seq = configs[0][0][1], configs[1]
nt_coords = configs[2]
reference_sequence = configs[3]
nt_coords_handle = open(nt_coords, 'r')

ref_genome = Genome(virus, nt_coords_handle, ref_nt_seq, ref_aa_seq,
                    reference_sequence, base)

for h, s in fasta:
    query_seq = get_query(base, s, False)
    query = Query(base, ref_genome, query_sequence=query_seq)
    left, right = query.qcoords
    sys.stdout.write('{}\t{}\t{}\n'.format(h, left, right))

Some of this is unnecessarily complicated, such as setting up the Genome object. Ideally the workflow would look more like this:

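import sys

from poplars.common import convert_fasta
# Proposed interface: a single callable that hides the Genome setup
from poplars import sequence_locator as locator

with open(sys.argv[1]) as handle:
    for h, s in convert_fasta(handle):
        result = locator(s, base='NT', virus='hiv')
        sys.stdout.write('{}\t{}\t{}\n'.format(h, result.left, result.right))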
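One way to get there without changing the existing classes might be a thin wrapper around the calls from the first script. This is only a rough sketch: `locate`, `load_genome`, `run_query` and `Result` are hypothetical names, and I am assuming that `handle_args`, `Genome`, `get_query` and `Query` can all be imported from `poplars.sequence_locator` (the script above gets them through star imports, so the exact module may differ). The poplars imports are deferred into the helper functions so the wrapper itself can be loaded and tested with stand-ins.

```python
from collections import namedtuple
from functools import lru_cache

# Hypothetical result type exposing the .left / .right attributes used above
Result = namedtuple('Result', ['left', 'right'])


@lru_cache(maxsize=None)
def load_genome(virus, base):
    """Build the reference Genome once per (virus, base) pair and cache it."""
    # Assumption: these names live in poplars.sequence_locator
    from poplars.sequence_locator import Genome, handle_args

    configs = handle_args(virus, base)
    ref_nt_seq, ref_aa_seq = configs[0][0][1], configs[1]
    # Left open, as in the original script, in case Genome reads it lazily
    nt_coords_handle = open(configs[2], 'r')
    return Genome(virus, nt_coords_handle, ref_nt_seq, ref_aa_seq,
                  configs[3], base)


def run_query(base, genome, seq):
    """Run a single query and return its (left, right) coordinates."""
    # Assumption: these names live in poplars.sequence_locator
    from poplars.sequence_locator import Query, get_query

    query_seq = get_query(base, seq, False)
    return Query(base, genome, query_sequence=query_seq).qcoords


def locate(seq, base='NA', virus='hiv', loader=load_genome, runner=run_query):
    """Locate one sequence; loader and runner can be swapped out for testing."""
    genome = loader(virus, base)
    left, right = runner(base, genome, seq)
    return Result(left, right)
```

With something like this, the loop body reduces to `result = locate(s)`, and the reference genome is only set up on the first call for each `(virus, base)` pair instead of by the caller.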
