BaSeRPro - Batch Sequence Retrieval and Processing for GenBank

Application Preview

Features

Portable Application (Windows / Mac / Linux)
Retrieves GenBank sequences from following arguments (comma seperated OR in multiple fields):
- Multiple species / organism (at least one argument must be supplied)
- Strain / Isolate / Title / Wildcard (Any)
- Gene / Product / Locus tag (at least one argument must be supplied)
- Sequence length / range
- Country
- Title of record (according to GenBank .GBFF format)
- Three RefSeq modes: All sequences / RefSeq only / RefSeq priority (preferentially extracts RefSeq sequences first. If number of target records is not achieved, continue retrieval with non-Refseq sequences)
- Four extraction modes:
  - Single gene
  - Gene range (Extracts segment of sequence from one gene to another [Requires input of two gene/product/locus tag arguments])
  - Gene order (Extracts segment of sequence when a specific order of genes is met [Requires input of two or moregene/product/locus tag arguments])
  - Entire sequence
Progress bar for fetching GenBank sequences
Automatic retries sequence retrieval to obtain desired number of target sequences if duplicate accession number is found (e.g. NZ_ vs. non-NZ_)
Automatically generates .xlsx file containing annotations/summary of extracted GenBank records
Automatically aligns sequences using FAMSA alignment
Automatically trims gaps in sequence using gap threshold (0-1)
- Example: If gappyness threshold is 0.7, at least 70 % of sequences must contain a gap in a position for EACH species for it to be trimmed
- Default: 0.95
Automatically generates consensus sequence based on trimmed alignment using consensus threhsold (0-1)
- Example: If consensus threhsold is 0.2, at least 20% of equences must contain the same base in a position for it to be incorporated into the consensus sequence
- Default: 0.1
- Full support for ambiguous bases (both input and output)
- Automatic detection of DNA, RNA and amino acids
- Base frequencies of degenerate bases are shown in .xlsx summary file
Automatically generates conservation plot based on consensus sequence (only positions with degenerate bases from consensus sequence are shown)
Automatically retries search using provided email only if API key is invalid
Advanced features
- Expand extracted sequence range by number of base pairs upstream/downstream (used for checking gene boundaries)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

BaSeRPro - Batch Sequence Retrieval and Processing for GenBank

Application Preview

Features

Files

README.md

Latest commit

History

README.md

File metadata and controls

BaSeRPro - Batch Sequence Retrieval and Processing for GenBank

Application Preview

Features