
Formats

Lydia Buntrock edited this page Nov 3, 2020 · 9 revisions

Annotation I/O

VCF & BCF GFF GFF2 & GTF* GFF3
Specific Header x no header no header
CHROM x seqname Reference sequence
source x x
method x type
POS x start & end start & end start & end
ID x seqid
ALT x
QUAL x score score score
strand x x x
frame x phase phase
group x
FILTER x
INFO*** x attribute attributes****
FORMAT***** (optional) x
SAMPLES (optional) x

*GTF is identical to GFF version 2.

**Gene, Variation, Similarity

***INFO: Arbitrary keys are permitted, although some sub-fields are reserved (albeit optional).

****attributes: ID, Name, Alias, Parent, Target, Gap, Derives_from, Note, Dbxref, Ontology_term

*****FORMAT Tags: AD, ADF, ADR, DP, EC, FT, GL, GP, GQ, GT, HQ, MQ, PL, PQ, PS
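To make the INFO and FORMAT footnotes more concrete, here is a minimal Python sketch of how these two VCF columns could be split into key/value pairs. The function names `parse_info` and `parse_samples` are hypothetical and not part of any library. The sketch assumes well-formed input and does not convert values to the types declared in the VCF header.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'NS=3;DP=14;DB' into a dict.

    Keys with values map to a list of strings (values are comma-separated);
    flag keys without '=' map to True. A lone '.' means "no INFO".
    """
    if info == ".":
        return {}
    result = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value.split(",") if sep else True
    return result


def parse_samples(format_field: str, sample_fields: list) -> list:
    """Pair the colon-separated FORMAT keys with each sample's values."""
    keys = format_field.split(":")
    # zip() stops at the shorter sequence, so samples that omit
    # trailing sub-fields simply yield fewer keys.
    return [dict(zip(keys, sample.split(":"))) for sample in sample_fields]


info = parse_info("NS=3;DP=14;AF=0.5;DB")
samples = parse_samples("GT:GQ:DP:HQ", ["0|0:48:1:51,51", "1|0:48:8"])
```

All values stay strings here. A real reader would look up each tag's declared type and number in the header before converting.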

Input / Output:

Formats:

VCF (Variant Call Format), BCF, GFF (General Feature Format), GFF2 (deprecated), GTF (General Transfer Format), GFF3, BED (Browser Extensible Data), GVF

HGVS vs BED vs GVF vs VCF Format example: https://www.ncbi.nlm.nih.gov/variation/tools/reporter/docs/examples#section-1.2.3

Format Overview

Format specification:

Desired format conversions

VCF to GFF: http://seqanswers.com/forums/showthread.php?t=9796&highlight=gff+vcf
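As a rough illustration of a VCF to GFF conversion, the following Python sketch maps one VCF data line onto the nine GFF3 columns, following the field correspondences in the table above (CHROM to seqid, POS to start & end, QUAL to score). Several choices are assumptions rather than anything prescribed by this page: the function name `vcf_line_to_gff3`, the default type `sequence_variant`, the lowercase `ref` and `alt` attribute keys, and the decision to drop FILTER, INFO and sample columns.

```python
# Characters that must be percent-encoded inside GFF3 attribute values.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C", "\t": "%09"}


def _escape(value: str) -> str:
    """Percent-encode the characters reserved in GFF3 column 9."""
    return "".join(_RESERVED.get(ch, ch) for ch in value)


def vcf_line_to_gff3(line: str, source: str = ".",
                     feature_type: str = "sequence_variant") -> str:
    """Convert one tab-separated VCF data line into one GFF3 feature line.

    Assumption: FILTER, INFO, FORMAT and sample columns are ignored.
    """
    chrom, pos, vcf_id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
    start = int(pos)            # VCF POS and GFF3 start are both 1-based
    end = start + len(ref) - 1  # GFF3 end is inclusive and spans REF

    attributes = []
    if vcf_id != ".":
        attributes.append("ID=" + _escape(vcf_id))
    attributes.append("ref=" + _escape(ref))
    # Multiple ALT alleles become a comma-separated multi-value attribute.
    attributes.append("alt=" + ",".join(_escape(a) for a in alt.split(",")))

    columns = [chrom, source, feature_type, str(start), str(end),
               qual, ".", ".", ";".join(attributes)]  # strand and phase unknown
    return "\t".join(columns)


gff3_line = vcf_line_to_gff3("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14")
```

A full converter would also skip VCF header lines (those starting with `#`) and write the `##gff-version 3` directive before the first feature line.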
