diff --git a/bcftools-man.html b/bcftools-man.html index f17e00be8..dca2ffbd5 100644 --- a/bcftools-man.html +++ b/bcftools-man.html @@ -50,7 +50,7 @@

DESCRIPTION

VERSION

-

This manual page was last updated 2023-05-30 09:18 BST and refers to bcftools git version 1.17-50-ga8249495+.

+

This manual page was last updated 2024-04-29 08:11 BST and refers to bcftools git version 1.20-6-g5977f1f3+.

@@ -426,9 +426,12 @@

Common Options

Use multithreading with INT worker threads. The option is currently used only for the compression of the output stream, only when --output-type is b or z. Default: 0.

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output files. Can be used only for compressed BCF and VCF output.

+

Automatically index the output files. FMT is optional and can be +one of "tbi" or "csi" depending on output file format. Defaults to +CSI unless specified otherwise. Can be used only for compressed +BCF and VCF output.
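As a rough illustration of how FMT relates to the output type, the sketch below maps an --output-type letter to an explicit --write-index=FMT argument. The helper name write_index_arg is hypothetical and not part of bcftools, and picking "tbi" for compressed VCF is only one valid choice (assumption: "csi" is accepted for compressed VCF as well).

```shell
#!/bin/sh
# Hypothetical helper (not part of bcftools): print an explicit --write-index
# argument for a given --output-type letter, or fail for uncompressed output.
write_index_arg() {
    case "$1" in
        z) echo "--write-index=tbi" ;;   # compressed VCF; "csi" is assumed to work too
        b) echo "--write-index=csi" ;;   # compressed BCF
        *) echo "cannot index uncompressed output type '$1'" >&2
           return 1 ;;
    esac
}

write_index_arg z
write_index_arg b
```

The printed argument could then be appended to a command that writes compressed output, with the file names chosen by the user.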

@@ -487,7 +490,7 @@

bcftools annotate [OPTIONS] FILE

Comma-separated list of columns or tags to carry over from the annotation file (see also -a, --annotations). If the annotation file is not a VCF/BCF, list describes the columns of the annotation file and must include CHROM, -POS (or, alternatively, FROM and TO), and optionally REF and ALT. Unused +POS (or, alternatively, FROM,TO or BEG,END), and optionally REF and ALT. Unused columns which should be ignored can be indicated by "-".  
 
@@ -511,16 +514,50 @@

bcftools annotate [OPTIONS] FILE

To append to existing values (rather than replacing or leaving untouched), use "=TAG" (instead of "TAG" or "+TAG"). To replace only existing values without modifying missing annotations, use "-TAG". +As a special case of this, if position needs to be replaced, mark the column with the new coordinate as "-POS". +(Note that in previous releases this used to be "~POS", now deprecated.) + 

To match the record also by ID or INFO/END, in addition to REF and ALT, use "~ID" or "~INFO/END". -If position needs to be replaced, mark the column with the new position as "~POS". +Note that this works only for ID and POS, for other fields see the description of -i below.  
 
If the annotation file is not a VCF/BCF, all new annotations must be defined via -h, --header-lines.  
 
-See also the -l, --merge-logic option.

+See also the -l, --merge-logic option. + 

+Summary of -c, --columns:

+ + +
+
+
    CHROM,POS,TAG       .. match by chromosome and position, transfer annotation from TAG
+    CHROM,POS,-,TAG     .. same as above, but ignore the third column of the annotation file
+    CHROM,BEG,END,TAG   .. match by region (BEG,END are synonymous to FROM,TO)
+    CHROM,POS,REF,ALT   .. match by CHROM, POS, REF and ALT
+
+    DST_TAG:=SRC_TAG    .. transfer the SRC_TAG using the new name DST_TAG
+    INFO                .. transfer all INFO annotations
+    ^INFO/TAG           .. transfer all INFO annotations except "TAG"
+
+    TAG       .. add or overwrite existing target value if source is not "." and skip otherwise
+    +TAG      .. add or overwrite existing target value only if it is "."
+    .TAG      .. add or overwrite existing target value even if source is "."
+    .+TAG     .. add new but never overwrite existing tag, regardless of its value; can transfer "." if target does not exist
+    -TAG      .. overwrite existing value, never add new if target does not exist
+    =TAG      .. do not overwrite but append value to existing tags
+
+    ~FIELD    .. use this column to match lines with -i/-e expression (see the description of -i below)
+    ~ID       .. in addition to CHROM,POS,REF,ALT match also by ID
+    ~INFO/END .. in addition to CHROM,POS,REF,ALT match also by INFO/END
+
+
+
+
-C, --columns-file file

Read the list of columns from a file (normally given via the -c, --columns option). @@ -532,7 +569,7 @@

bcftools annotate [OPTIONS] FILE

-e, --exclude EXPRESSION

exclude sites for which EXPRESSION is true. For valid expressions see -EXPRESSIONS.

+EXPRESSIONS and the extension described in -i, --include below.

--force
@@ -573,8 +610,27 @@

bcftools annotate [OPTIONS] FILE

-i, --include EXPRESSION

include only sites for which EXPRESSION is true. For valid expressions see -EXPRESSIONS.

+EXPRESSIONS. + 

+Additionally, the command bcftools annotate supports expressions updated from the annotation +file dynamically for each record:

+
+
+
+
+
    # The field 'STR' from the -a file is required to match INFO/TAG in VCF. In the first example
+    # the alleles REF,ALT must match, in the second example they are ignored. The option -k is required
+    # to output also records that are not annotated. The third example shows the same concept with
+    # a numerical expression.
+    bcftools annotate -a annots.tsv.gz -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k input.vcf
+    bcftools annotate -a annots.tsv.gz -c CHROM,POS,-,-,SCORE,~STR     -i'TAG={STR}' -k input.vcf
+    bcftools annotate -a annots.tsv.gz -c CHROM,POS,-,-,SCORE,~INT     -i'TAG>{INT}' -k input.vcf
+
+
+
+
-k, --keep-sites

keep sites which do not pass -i and -e expressions instead of discarding them

@@ -681,9 +737,10 @@

bcftools annotate [OPTIONS] FILE

"^INFO/FOO,INFO/BAR" (and similarly for FORMAT and FILTER). "INFO" can be abbreviated to "INF" and "FORMAT" to "FMT".

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -720,7 +777,7 @@

bcftools annotate [OPTIONS] FILE

# that INFO/END is already present in the VCF header. bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf - # For more examples see http://samtools.github.io/bcftools/howtos/annotate.html + # For (many) more examples see http://samtools.github.io/bcftools/howtos/annotate.html @@ -814,9 +871,10 @@

File format options:

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -830,6 +888,10 @@

Input/output options:

output all alternate alleles present in the alignments even if they do not appear in any of the genotypes

+
-*, --keep-unseen-allele
+
+

keep the unobserved allele <*> or <NON_REF>, useful mainly for gVCF output

+
-f, --format-fields list

comma-separated list of FORMAT fields to output for each sample. Currently @@ -866,7 +928,7 @@

Input/output options:

-G, --group-samples FILE|-
-

by default, all samples are assumed to come from a single population. This option allows to group samples +

by default, all samples are assumed to come from a single population. This option groups samples into populations and applies the HWE assumption within but not across the populations. FILE is a tab-delimited text file with sample names in the first column and group names in the second column. If - is given instead, no HWE assumption is made at all and single-sample calling is performed. (Note that @@ -1182,9 +1244,10 @@

bcftools concat [OPTIONS] FILE1 FILE2

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -1306,6 +1369,11 @@

bcftools consensus [OPTIONS] FILE

write output to a file

+
--regions-overlap 0|1|2
+
+

how to treat VCF variants overlapping the target region in the fasta file: +see Common Options

+
-s, --samples LIST

apply variants of the listed samples. See also the option -I, --iupac-codes

@@ -1401,9 +1469,10 @@

VCF input options:

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -1740,6 +1809,10 @@

bcftools csq [OPTIONS] FILE

if more are required, see the --ncsq option.

+

Note that the program annotates only records with a functional consequence and +intergenic regions will pass through unchanged.

+
+

The program requires on input a VCF/BCF file, the reference genome in fasta format (--fasta-ref) and genomic features in the GFF3 format downloadable from the Ensembl website (--gff-annot), and outputs an annotated VCF/BCF @@ -1789,7 +1862,7 @@

bcftools csq [OPTIONS] FILE

--force
-

run even if some sanity checks fail. Currently the option allows to skip +

run even if some sanity checks fail. Currently the option enables skipping transcripts in malformatted GFFs with incorrect phase

-g, --gff-annot FILE
@@ -1946,9 +2019,10 @@

bcftools csq [OPTIONS] FILE

and VCF, such as "chrX" vs "X". The chromosome names in the output VCF will match that of the input VCF. The default is to attempt the automatic translation.

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -2141,7 +2215,7 @@

bcftools filter [OPTIONS] FILE

-s, --soft-filter STRING|+

annotate FILTER column with STRING or, with +, a unique filter name generated -by the program ("Filter%d").

+by the program ("Filter%d"). Applies to records that do not meet the filter expression.

-S, --set-GTs .|0
@@ -2163,9 +2237,10 @@

bcftools filter [OPTIONS] FILE

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -2178,6 +2253,11 @@

bcftools gtcheck [OPTIONS] [-g ge is checked against the samples in the -g file. Without the -g option, multi-sample cross-check of samples in query.vcf.gz is performed.

+
+

Note that the interpretation of the discordance score depends on the options provided (specifically -e and +-u) and on the available annotations (FORMAT/PL vs FORMAT/GT). +The discordance score can be interpreted as the number of mismatching genotypes if only GT-vs-GT matching is performed.

+
--distinctive-sites NUM[,MEM[,DIR]]
@@ -2191,16 +2271,29 @@

bcftools gtcheck [OPTIONS] [-g ge

Stop after first record to estimate required time.

-
-e, --error-probability INT
+
-e, --exclude [qry|gt]:'EXPRESSION'
+
+

Exclude sites from query file (qry:) or genotype file (gt:) for which EXPRESSION is true. +For valid expressions see EXPRESSIONS.

+
+
-E, --error-probability INT

Interpret genotypes and genotype likelihoods probabilistically. The value of INT represents genotype quality when GT tag is used (e.g. Q=30 represents one error in 1,000 genotypes and Q=40 one error in 10,000 genotypes) and is ignored when PL tag is used (in that case an arbitrary -non-zero integer can be provided). See also the -u, --use option below. If set to 0, -the discordance equals to the number of mismatching genotypes when GT vs GT is compared. -Note that the values with and without -e are not comparable, only values generated -with -e 0 correspond to mismatching genotypes. -If performance is an issue, set to 0 for faster run but less accurate results.

+non-zero integer can be provided). + 

+If -E is set to 0, the discordance score can be interpreted as the number of mismatching genotypes, +but only in the GT-vs-GT matching mode. See the -u, --use option below for additional notes and caveats. + 

+If performance is an issue, set -E 0 for faster run times but less accurate results. + 

+Note that in previous versions of bcftools (<=1.18), this option used to be a lowercase -e. It +was changed to make room for the filtering option -e, --exclude and to stay consistent with other +commands.

-g, --genotypes FILE
@@ -2210,6 +2303,11 @@

bcftools gtcheck [OPTIONS] [-g ge

Homozygous genotypes only, useful with low coverage data (requires -g, --genotypes)

+
-i, --include [qry|gt]:'EXPRESSION'
+
+

Include sites from query file (qry:) or genotype file (gt:) for which EXPRESSION is true. +For valid expressions see EXPRESSIONS.

+
--n-matches INT

Print only top INT matches for each sample, 0 for unlimited. Use negative value @@ -2221,6 +2319,14 @@

bcftools gtcheck [OPTIONS] [-g ge

Disable calculation of HWE probability to reduce memory requirements with comparisons between very large number of sample pairs.

+
-o, --output FILE
+
+

Write to FILE rather than to standard output, where it is written by default.

+
+
-O, --output-type t|z
+
+

Write a plain (t) or compressed (z) text tab-delimited output.

+
-p, --pairs LIST

A comma-separated list of sample pairs to compare. When the -g option is given, the first @@ -2274,8 +2380,13 @@

bcftools gtcheck [OPTIONS] [-g ge
-u, --use TAG1[,TAG2]

specifies which tag to use in the query file (TAG1) and the -g (TAG2) file. -By default, the PL tag is used in the query file and GT in the -g file when -available.

+By default, the PL tag is used in the query file and, when available, the GT tags in the +-g file. + 

+Note that when the requested tag is not available, the program will attempt to use +the other tag. The output includes the number of sites that were matched by the four +possible modes (for example GT-vs-GT or GT-vs-PL).

@@ -2284,10 +2395,10 @@

bcftools gtcheck [OPTIONS] [-g ge
-
   # Check discordance of all samples from B against all sample in A
+
   # Check discordance of all samples from B against all samples in A
    bcftools gtcheck -g A.bcf B.bcf
 
-   # Limit comparisons to the fiven list of samples
+   # Limit comparisons to the given list of samples
    bcftools gtcheck -s gt:a1,a2,a3 -s qry:b1,b2 -g A.bcf B.bcf
 
    # Compare only two pairs a1,b1 and a1,b2
@@ -2322,6 +2433,13 @@ 

Options:

Also display the first INT variant records. By default, no variant records are displayed.

+
-s, --samples INT
+
+

Display the first INT variant records including the last #CHROM header line with samples. +Running with -s 0 alone outputs the #CHROM header line only. Note that +the list of samples, one sample per line, can be obtained with bcftools query using +the option -l, --list-samples.

+
@@ -2430,6 +2548,10 @@

bcftools isec [OPTIONS] A.vcf.gz B.vcf.gzinclude only sites for which EXPRESSION is true. See discussion of -e, --exclude above.

+
-f, --file-list FILE
+
+

Read file names from FILE, one file name per line.

+
-n, --nfiles [+-=]INT|~BITMAP

output positions present in this many (=), this many or more (+), this @@ -2474,12 +2596,14 @@

bcftools isec [OPTIONS] A.vcf.gz B.vcf.gz
-w, --write LIST
-

list of input files to output given as 1-based indices. With -p and no +

comma-separated list of input files to output given as 1-based indices. With -p and no -w, all files are written.

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file. This is done automatically with the -p option.

+

Automatically index the output file. FMT is optional and defaults +to tbi for vcf.gz and csi for bcf. This is done automatically +with the -p option if the output format is compressed.

@@ -2550,6 +2674,10 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz<
+
--force-no-index
+
+

synonymous to --no-index

+
--force-samples

if the merged files contain duplicate samples names, proceed anyway. @@ -2557,6 +2685,10 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz< as it appeared on the command line to the conflicting sample name (see 2:S3 in the above example).

+
--force-single
+
+

run even if only one file is given on input

+
--print-header

print only merged header and exit

@@ -2605,16 +2737,18 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz<

Sites with many alternate alleles can require extremely large storage space which can exceed the 2GB size limit representable by BCF. This is caused by Number=G tags (such as FORMAT/PL) which store a value for each combination of reference -and alternate alleles. The -L, --local-alleles option allows to replace such tags +and alternate alleles. The -L, --local-alleles option allows replacement of such tags with a localized tag (FORMAT/LPL) which only includes a subset of alternate alleles relevant for that sample. A new FORMAT/LAA tag is added which lists 1-based indices of the alternate alleles relevant (local) for the current sample. The number INT gives the maximum number of alternate alleles that can be included in the PL tag. The default value is 0 which disables the feature and outputs values for all alternate alleles.

-
-m, --merge snps|indels|both|snp-ins-del|all|none|id
+
-m, --merge snps|indels|both|snp-ins-del|all|none|id[,*]
-

The option controls what types of multiallelic records can be created:

+

The option controls what types of multiallelic records can be created. If a single asterisk +* is appended, the unobserved allele <*> or <NON_REF> will be removed at variant sites; +if two asterisks ** are appended, the unobserved allele will be removed at all sites.

@@ -2624,6 +2758,8 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz< -m snps .. allow multiallelic SNP records -m indels .. allow multiallelic indel records -m both .. both SNP and indel records can be multiallelic +-m both,* .. same as above but remove <*> (or <NON_REF>) from variant sites +-m both,** .. same as above but remove <*> (or <NON_REF>) at all sites -m all .. SNP records can be merged with indel records -m snp-ins-del .. allow multiallelic SNVs, insertions, deletions, but don't mix them -m id .. merge by ID @@ -2637,13 +2773,13 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz< alleles, vector fields pertaining to unobserved alleles are set to missing (.) by default. The METHOD is one of . (the default, use missing values), NUMBER (use a constant value, e.g. 0), max (the maximum value observed for other alleles in the sample). When --gvcf option is set, -the rule -M PL:max,AD:0 is implied. This can be overriden with providing -M - or -M PL:.,AD:.. +the rule -M PL:max,AD:0 is implied. This can be overridden with providing -M - or -M PL:.,AD:.. Note that if the unobserved allele is explicitly present as <*> or <NON_REF>, then its corresponding value will be used regardless of -M settings.

--no-index
-

the option allows to merge files without indexing them first. In order for this +

the option allows files to be merged without indexing them first. In order for this option to work, the user must ensure that the input files have chromosomes in the same order and consistent with the order of sequences in the VCF header.

@@ -2675,9 +2811,10 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz<

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -2817,7 +2954,23 @@

Input options

A new EXPERIMENTAL indel calling model which aims to address some known deficiencies of the current indel calling algorithm. Specifically, it uses diploid reference consensus sequence. Note that in the current version it has the potential to increase sensitivity -but at the cost of decreased specificity

+but at the cost of decreased specificity. +Only works with short-read sequencing technologies.

+ +
--indels-cns
+
+

Another EXPERIMENTAL indel calling method, predating indels-2.0 in +PR form, but merged more recently. It also uses a diploid +reference consensus, but with added parameters and heuristics to +optimise for a variety of sequencing platforms. This is usually +faster and more accurate than the default caller and --indels-2.0, +but has not been tested on non-diploid samples and samples without +approximately even allele frequency.

+
+
--no-indels-cns
+
+

May be used to turn off --indels-cns mode when using one of the +newer profiles that has this enabled by default.

-q, -min-MQ INT
@@ -2991,9 +3144,10 @@

Output options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -3004,15 +3158,70 @@

Options for SNP/I
-X, --config STR
-

Specify a platform specific configuration profile. The profile -should be one of 1.12, illumina, ont or pacbio-ccs. -Settings applied are as follows:

+

Specify a platform specific configuration profile. Specifying the +profile as "list" will list the available profile names and the +parameters they change. There are profiles named after a release, +which should be used if you wish to ensure forward compatibility +of results. The non-versioned names (eg "illumina") will always +point to the most recent set of parameters for that instrument type. +The current values are:

-
1.12           -Q13 -h100 -m1
-illumina       [ default values ]
-ont	           -B -Q5 --max-BQ 30 -I
-pacbio-ccs     -D -Q5 --max-BQ 50 -F0.1 -o25 -e1 -M99999
+
1.12            -Q13 -h100 -m1
+
+
+
+
+
bgi
+bgi-1.20        --indels-cns -B --indel-size 80 -F0.1 --indel-bias 0.9
+                --seqq-offset 120
+
+
+
+
+
illumina-1.18   [ default values ]
+
+
+
+
+
illumina
+illumina-1.20   --indels-cns --seqq-offset 125
+
+
+
+
+
ont             -B -Q5 --max-BQ 30 -I
+
+
+
+
+
ont-sup
+ont-sup-1.20    --indels-cns -B -Q1 --max-BQ 35 --delta-BQ 99 -F0.2
+                -o15 -e1 -h110 --del-bias 0.4 --indel-bias 0.7
+                --poly-mqual --seqq-offset 130 --indel-size 80
+
+
+
+
+
pacbio-ccs-1.18 -D -Q5 --max-BQ 50 -F0.1 -o25 -e1 -M99999
+
+
+
+
+
pacbio-ccs
+pacbio-ccs-1.20  --indels-cns -B -Q5 --max-BQ 50 -F0.1 -o25 -e1 -h300
+                 --delta-BQ 10 --del-bias 0.4 --poly-mqual
+                 --indel-bias 0.9 --seqq-offset 118 --indel-size 80
+                 --score-vs-ref 0.7
+
+
+
+
+
ultima
+ultima-1.20      --indels-cns -B -Q1 --max-BQ 30 --delta-BQ 10 -F0.15
+                 -o20 -e10 -h250 --del-bias 0.3 --indel-bias 0.7
+                 --poly-mqual --seqq-offset 140 --score-vs-ref 0.3
+                 --indel-size 80
@@ -3058,12 +3267,32 @@

Options for SNP/I 0.75) while higher depth samples or where you favour recall rates over precision may work better with a higher value such as 2.0.

+
--del-bias FLOAT
+
+

Skews the likelihood of deletions over insertions. Defaults to an +even distribution value of 1.0. Lower values imply a higher rate +of false positive deletions (meaning candidate deletions are less +likely to be real).

+
--indel-size INT

Indel window size to use when assessing the quality of candidate indels. Note that although the window size approximately corresponds to the maximum indel size considered, it is not an exact threshold [110]

+
--seqq-offset INT
+
+

Tunes the importance of indel sequence quality per depth. The +final "seqQ" quality used is "offset - 5*MIN(depth,20)". [120]
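The formula quoted above can be evaluated directly to see how the offset and depth interact. The function name seqq is hypothetical and only reproduces the arithmetic from the option description; it does not call bcftools.

```shell
#!/bin/sh
# seqQ = offset - 5 * MIN(depth, 20), as given in the option description.
seqq() {
    offset=$1
    depth=$2
    capped=$depth
    if [ "$capped" -gt 20 ]; then
        capped=20               # depth contributes only up to 20
    fi
    echo $(( offset - 5 * capped ))
}

seqq 120 10   # default offset at depth 10
seqq 120 50   # depth is capped at 20
```

With the default offset of 120, the value no longer changes once the depth reaches 20.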

+
+
--poly-mqual
+
+

Use the lowest quality value within a homopolymer run, instead of +the quality immediately adjacent to the indel. This may be +important for unclocked instruments, particularly ones with a flow +chemistry where runs of bases of identical type are incorporated +together.

+
-I, --skip-indels

Do not perform INDEL calling

@@ -3157,14 +3386,14 @@

bcftools norm [OPTIONS] file.vcf.gz

100 CC C,GG 1/2 # After: - # bcftools norm -a . + # bcftools norm -a --atom-overlaps . 100 C G ./1 100 CC C 1/. 101 C G ./1 # After: - # bcftools norm -a '*' - # bcftools norm -a \* + # bcftools norm -a --atom-overlaps '*' + # bcftools norm -a --atom-overlaps \* 100 C G,* 2/1 100 CC C,* 1/2 101 C G,* 2/1 @@ -3205,6 +3434,12 @@

bcftools norm [OPTIONS] file.vcf.gz

try to proceed with -m- even if malformed tags with incorrect number of fields are encountered, discarding such tags. (Experimental, use at your own risk.)

+
-g, --gff-annot FILE
+
+

when a GFF file is provided, follow the HGVS 3' rule and right-align variants in transcripts on the forward +strand. In case of overlapping transcripts, the default mode is to left-align the variant. For a +description of the supported GFF3 file format see bcftools csq.

+
--keep-sum TAG[,…​]

keep vector sum constant when splitting multiallelic sites. Only AD tag @@ -3218,7 +3453,11 @@

bcftools norm [OPTIONS] file.vcf.gz

together: If only SNP records should be split or merged, specify snps; if both SNPs and indels should be merged separately into two records, specify both; if SNPs and indels should be merged into a single record, specify -any.

+any. + 

+Note that multiallelic sites with both SNPs and indels will be split into +biallelic sites with both -m -snps and -m -indels.

--multi-overlaps 0|.
@@ -3285,9 +3524,10 @@

bcftools norm [OPTIONS] file.vcf.gz

maximum distance between two records to consider when locally sorting variants which changed position during the realignment

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -3364,9 +3604,10 @@

VCF output options:

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -3613,13 +3854,14 @@

List of plugins coming wi
split-vep
-

extract fields from structured annotations such as INFO/CSQ created by bcftools/csq or VEP. These -can be added as a new INFO field to the VCF or in a custom text format. See +

extract fields from structured annotations such as INFO/CSQ created by VEP or INFO/BCSQ created by +bcftools/csq. These can be added as a new INFO field to the VCF or in a custom text format. See http://samtools.github.io/bcftools/howtos/plugin.split-vep.html for more.

tag2tag
-

Convert between similar tags, such as GL,PL,GP or QR,QA,QS.

+

Convert between similar tags, such as GL,PL,GP or QR,QA,QS or tags with localized alleles e.g. LPL,LAD. +See http://samtools.github.io/bcftools/howtos/plugin.tag2tag.html for more.

trio-dnm2
@@ -3830,6 +4072,12 @@

bcftools query [OPTIONS] file.vcf.gz [file.

learn by example, see below

+
-F, --print-filtered STR
+
+

by default, samples failing -i/-e filtering expressions are suppressed from output +when FORMAT fields are queried (for example %CHROM %POS [ %GT]). With -F, such +fields will still be printed but instead of their actual value, STR will be used.

+
-H, --print-header

print header

@@ -3843,6 +4091,14 @@

bcftools query [OPTIONS] file.vcf.gz [file.

list sample names and exit

+
-N, --disable-automatic-newline
+
+

disable automatic addition of a missing newline character at the end of the formatting +expression. By default, the program checks if the expression contains a newline +and appends it if not, to prevent formatting the entire output into a single +line by mistake. Note that versions prior to 1.18 had no automatic check and newline +had to be included explicitly.
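The check described above can be emulated in shell to make the behaviour concrete. This is an emulation only, under the assumption that "contains a newline" means the two-character sequence \n anywhere in the formatting expression. The function name ensure_newline is hypothetical and the actual implementation in bcftools may differ.

```shell
#!/bin/sh
# Emulation only: append the two-character sequence \n to a formatting
# expression that does not already contain one.
ensure_newline() {
    case "$1" in
        *'\n'*) printf '%s\n' "$1" ;;      # already has \n, print unchanged
        *)      printf '%s\\n\n' "$1" ;;   # append a literal backslash-n
    esac
}

ensure_newline '%CHROM\t%POS'
ensure_newline '%CHROM\t%POS\n'
```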

+
-o, --output FILE

see Common Options

@@ -3913,6 +4169,7 @@

Format:

%TBCSQ Translated FORMAT/BCSQ. See the csq command above for explanation and examples. %TGT Translated genotype (e.g. C/A) %TYPE Variant type (REF, SNP, MNP, INDEL, BND, OTHER) +%VKX VariantKey, biallelic hexadecimal encoding of CHROM,POS,REF,ALT (https://github.com/tecnickcom/variantkey) [] Format fields must be enclosed in brackets to loop over all samples \n new line \t tab character @@ -3976,6 +4233,14 @@

Examples:

bcftools query -f '%AC{1}\n' -i 'AC[1]>10' file.vcf.gz +
+
+
# Print all samples at sites where at least one sample has DP=1 or DP=2. In the second case
+# print only samples with DP=1 or DP=2, the difference is in the logical operator used, || vs |.
+bcftools query -f '[%SAMPLE %GT %DP\n]' -i 'FMT/DP=1 || FMT/DP=2' file.vcf
+bcftools query -f '[%SAMPLE %GT %DP\n]' -i 'FMT/DP=1 |  FMT/DP=2' file.vcf
+
+
@@ -4010,7 +4275,7 @@

bcftools reheader [OPTIONS] file.vcf.gz

-T, --temp-prefix PATH
-

template for temporary file names, used with -f

+

this option is ignored, but left for compatibility with earlier versions of bcftools.

--threads INT
@@ -4248,11 +4513,13 @@

bcftools sort [OPTIONS] file.bcf

-T, --temp-dir DIR
-

Use this directory to store temporary files

+

Use this directory to store temporary files. If the last six characters of the string DIR are XXXXXX, +then these are replaced with a string that makes the directory name unique.
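The XXXXXX convention follows the same template convention as the mktemp utility, so its effect can be previewed in shell. Whether bcftools uses the same underlying mechanism is an assumption; the template path below is only an example.

```shell
#!/bin/sh
# mktemp -d replaces the trailing XXXXXX of the template with unique
# characters and creates the directory.
template="${TMPDIR:-/tmp}/bcftools-sort.XXXXXX"
tmpdir=$(mktemp -d "$template")

if [ -d "$tmpdir" ]; then
    echo "created $tmpdir"
fi

rmdir "$tmpdir"   # clean up the example directory
```

Passing such a template to -T lets several sort jobs share a parent directory without their temporary directories colliding.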

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -4457,9 +4724,10 @@

Output options

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -4468,6 +4736,11 @@

Output options

Subset options:

+
-A, --trim-unseen-alleles
+
+

remove the unseen allele <*> or <NON_REF> at variant sites when the option is given once (-A) or +at all sites when the option is given twice (-AA).

+
-a, --trim-alt-alleles

remove alleles not seen in the genotype fields from the ALT column. Note that if no alternate allele @@ -4660,6 +4933,98 @@

bcftools [--version-only]

+

SCRIPTS

+
+
+

gff2gff

+
+

Attempts to fix a GFF file to be correctly parsed by csq.

+
+
+
+
+
+
zcat in.gff.gz | gff2gff | gzip -c > out.gff.gz
+
+
+
+
+
+
+

plot-vcfstats [OPTIONS] file.vchk […​]

+
+

Script for processing output of bcftools stats. It can merge +results from multiple outputs (useful when running the stats for each +chromosome separately), plot graphs and create a PDF presentation.

+
+
+
+
-m, --merge
+
+

Merge vcfstats files to STDOUT, skip plotting.

+
+
-p, --prefix DIR
+
+

The output directory. This directory will be created if it does not exist.

+
+
-P, --no-PDF
+
+

Skip the PDF creation step.

+
+
-r, --rasterize
+
+

Rasterize PDF images for faster rendering. This is the default and the opposite of -v, --vectors.

+
+
-s, --sample-names
+
+

Use sample names for xticks rather than numeric IDs.

+
+
-t, --title STRING
+
+

Identify files by these titles in plots. The option can be given multiple +times, for each ID in the bcftools stats output. If not +present, the script will use abbreviated source file names for the titles.

+
+
-v, --vectors
+
+

Generate vector graphics for PDF images, the opposite of -r, --rasterize.

+
+
-T, --main-title STRING
+
+

Main title for the PDF.

+
+
+
+
+

Example:

+
+
+
+
+
+
# Generate the stats
+bcftools stats -s - > file.vchk
+
+
+
+
+
# Plot the stats
+plot-vcfstats -p outdir file.vchk
+
+
+
+
+
# The final looks can be customized by editing the generated
+# 'outdir/plot.py' script and re-running manually
+cd outdir && python plot.py && pdflatex summary.tex
+
+
+
+
+
+
+
+

FILTERING EXPRESSIONS

@@ -4669,8 +5034,7 @@

FILTERING EXPRESSIONS

Valid expressions may contain:
  • -

    numerical constants, string constants, file names (this is currently -supported only to filter by the ID column)

    +

    numerical constants, string constants, file names (indicated by the prefix @)

    1, 1.0, 1e-4
    @@ -4804,7 +5168,7 @@ 

    FILTERING EXPRESSIONS

  • -

    TYPE for variant type in REF,ALT columns (indel,snp,mnp,ref,bnd,other,overlap). Use the regex +

    TYPE for variant type in REF,ALT columns (indel,snp,mnp,ref,bnd,other,overlap, see TERMINOLOGY). Use the regex operator "\~" to require at least one allele of the given type or the equal sign "=" to require that all alleles are of the given type. Compare

    @@ -5052,12 +5416,17 @@

    FILTERING EXPRESSIONS

    -
    ID=@file       .. selects lines with ID present in the file
    +
    ID=@file               .. selects lines with ID present in the file
    +
    +
    +
    +
    +
    ID!=@~/file            .. skip lines with ID present in the ~/file
    -
    ID!=@~/file    .. skip lines with ID present in the ~/file
    +
    INFO/TAG=@file         .. selects lines with INFO/TAG value present in the file
    @@ -5096,91 +5465,27 @@

    FILTERING EXPRESSIONS

-

SCRIPTS

+

TERMINOLOGY

-
-

gff2gff

-
-

Attempts to fix a GFF file to be correctly parsed by csq.

-
-
-
-
-
-
zcat in.gff.gz | gff2gff | gzip -c > out.gff.gz
-
-
-
-
-
-
-

plot-vcfstats [OPTIONS] file.vchk […​]

-
-

Script for processing output of bcftools stats. It can merge -results from multiple outputs (useful when running the stats for each -chromosome separately), plots graphs and creates a PDF presentation.

-
-
-
-
-m, --merge
-
-

Merge vcfstats files to STDOUT, skip plotting.

-
-
-p, --prefix DIR
-
-

The output directory. This directory will be created if it does not exist.

-
-
-P, --no-PDF
-
-

Skip the PDF creation step.

-
-
-r, --rasterize
-
-

Rasterize PDF images for faster rendering. This is the default and the opposite of -v, --vectors.

-
-
-s, --sample-names
-
-

Use sample names for xticks rather than numeric IDs.

-
-
-t, --title STRING
-
-

Identify files by these titles in plots. The option can be given multiple -times, for each ID in the bcftools stats output. If not -present, the script will use abbreviated source file names for the titles.

-
-
-v, --vectors
-
-

Generate vector graphics for PDF images, the opposite of -r, --rasterize.

-
-
-T, --main-title STRING
-
-

Main title for the PDF.

-
-
-
-

Example:

+

The program and the documentation use the following terminology; multiple terms can be used +interchangeably for the same VCF record type:

-
# Generate the stats
-bcftools stats -s - > file.vchk
-
-
-
-
-
# Plot the stats
-plot-vcfstats -p outdir file.vchk
-
-
-
-
-
# The final looks can be customized by editing the generated
-# 'outdir/plot.py' script and re-running manually
-cd outdir && python plot.py && pdflatex summary.tex
-
+
REF   ALT
+---------
+C     .         .. reference allele / non-variant site / ref-only site
+C     T         .. SNP or SNV (single-nucleotide polymorphism or variant), used interchangeably
+CC    TT        .. MNP (multi-nucleotide polymorphism)
+CAAA  C         .. indel, deletion (regardless of length)
+C     CAAA      .. indel, insertion (regardless of length)
+C     <*>       .. gVCF block, the allele <*> is a placeholder for alternate allele possibly missed because of low coverage
+C     <NON_REF> .. synonymous to <*>
+C     *         .. overlapping deletion
+C     <INS>     .. symbolic allele, known also as 'other [than above]'
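The rows of the table above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, covering only the REF/ALT shapes listed in the table. The function name `classify` and the returned labels are hypothetical and are not the strings used by the bcftools `TYPE` filter; the actual classification in bcftools may treat other cases differently.

```python
def classify(ref: str, alt: str) -> str:
    """Map one REF/ALT pair to a term from the terminology table.

    Labels are illustrative only (assumption), not bcftools output.
    """
    if alt == ".":
        return "ref"                      # non-variant / ref-only site
    if alt in ("<*>", "<NON_REF>"):
        return "gvcf-block"               # placeholder for an unobserved allele
    if alt == "*":
        return "overlapping-deletion"
    if alt.startswith("<") and alt.endswith(">"):
        return "other"                    # any remaining symbolic allele
    if len(ref) == len(alt):
        return "snp" if len(ref) == 1 else "mnp"
    # Different lengths: an indel, regardless of how long it is
    return "deletion" if len(ref) > len(alt) else "insertion"


for ref, alt in [("C", "."), ("C", "T"), ("CC", "TT"), ("CAAA", "C"),
                 ("C", "CAAA"), ("C", "<*>"), ("C", "*"), ("C", "<INS>")]:
    print(ref, alt, classify(ref, alt))
```

The symbolic and special alleles are tested before the length comparison so that `<INS>` or `*` is never mistaken for an insertion or a SNP.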
@@ -5257,7 +5562,7 @@

COPYING

diff --git a/bcftools.html b/bcftools.html index f17e00be8..dca2ffbd5 100644 --- a/bcftools.html +++ b/bcftools.html @@ -50,7 +50,7 @@

DESCRIPTION

VERSION

-

This manual page was last updated 2023-05-30 09:18 BST and refers to bcftools git version 1.17-50-ga8249495+.

+

This manual page was last updated 2024-04-29 08:11 BST and refers to bcftools git version 1.20-6-g5977f1f3+.

@@ -426,9 +426,12 @@

Common Options

Use multithreading with INT worker threads. The option is currently used only for the compression of the output stream, only when --output-type is b or z. Default: 0.

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output files. Can be used only for compressed BCF and VCF output.

+

Automatically index the output files. FMT is optional and can be +one of "tbi" or "csi" depending on output file format. Defaults to +CSI unless specified otherwise. Can be used only for compressed +BCF and VCF output.

@@ -487,7 +490,7 @@

bcftools annotate [OPTIONS] FILE

Comma-separated list of columns or tags to carry over from the annotation file (see also -a, --annotations). If the annotation file is not a VCF/BCF, list describes the columns of the annotation file and must include CHROM, -POS (or, alternatively, FROM and TO), and optionally REF and ALT. Unused +POS (or, alternatively, FROM,TO or BEG,END), and optionally REF and ALT. Unused columns which should be ignored can be indicated by "-".  
 
@@ -511,16 +514,50 @@

bcftools annotate [OPTIONS] FILE

To append to existing values (rather than replacing or leaving untouched), use "=TAG" (instead of "TAG" or "+TAG"). To replace only existing values without modifying missing annotations, use "-TAG". +As a special case of this, if position needs to be replaced, mark the column with the new coordinate as "-POS". +(Note that in previous releases this used to be "~POS", now deprecated.) + 

To match the record also by ID or INFO/END, in addition to REF and ALT, use "~ID" or "~INFO/END". -If position needs to be replaced, mark the column with the new position as "~POS". +Note that this works only for ID and POS, for other fields see the description of -i below.  
 
If the annotation file is not a VCF/BCF, all new annotations must be defined via -h, --header-lines.  
 
-See also the -l, --merge-logic option.

+See also the -l, --merge-logic option. + 

+Summary of -c, --columns:

+ + +
+
+
    CHROM,POS,TAG       .. match by chromosome and position, transfer annotation from TAG
+    CHROM,POS,-,TAG     .. same as above, but ignore the third column of the annotation file
+    CHROM,BEG,END,TAG   .. match by region (BEG,END are synonymous to FROM,TO)
+    CHROM,POS,REF,ALT   .. match by CHROM, POS, REF and ALT
+
+    DST_TAG:=SRC_TAG    .. transfer the SRC_TAG using the new name DST_TAG
+    INFO                .. transfer all INFO annotations
+    ^INFO/TAG           .. transfer all INFO annotations except "TAG"
+
+    TAG       .. add or overwrite existing target value if source is not "." and skip otherwise
+    +TAG      .. add or overwrite existing target value only if it is "."
+    .TAG      .. add or overwrite existing target value even if source is "."
+    .+TAG     .. add new but never overwrite existing tag, regardless of its value; can transfer "." if target does not exist
+    -TAG      .. overwrite existing value, never add new if target does not exist
+    =TAG      .. do not overwrite but append value to existing tags
+
+    ~FIELD    .. use this column to match lines with -i/-e expression (see the description of -i below)
+    ~ID       .. in addition to CHROM,POS,REF,ALT also match by ID
+    ~INFO/END .. in addition to CHROM,POS,REF,ALT also match by INFO/END
+
+
+
+
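To make the summary of -c, --columns easier to follow, here is a minimal Python sketch that splits a column list and labels each entry according to the summary above. It only restates the prefix syntax; it does not reproduce how bcftools parses or applies the list. The names `describe_columns`, `KEY_COLUMNS` and `MODE_PREFIXES` and the role labels are hypothetical.

```python
# Columns used for matching records rather than carrying annotations
KEY_COLUMNS = {"CHROM", "POS", "REF", "ALT", "BEG", "END", "FROM", "TO"}

# Longest prefix first, so that ".+" is not mistaken for "."
MODE_PREFIXES = [
    (".+", "add new, never overwrite; may transfer '.'"),
    ("+", "add, or overwrite only if target is '.'"),
    (".", "add or overwrite even if source is '.'"),
    ("-", "overwrite existing, never add new"),
    ("=", "append to existing"),
]


def describe_columns(spec: str) -> list:
    """Return (name, role) pairs for a comma-separated -c list."""
    roles = []
    for item in spec.split(","):
        if item == "-":
            roles.append((item, "ignore"))            # unused column
        elif item.startswith("~"):
            roles.append((item[1:], "match"))         # matching-only field
        elif item.startswith("^"):
            roles.append((item[1:], "exclude"))       # all except this tag
        elif ":=" in item:
            dst, src = item.split(":=", 1)
            roles.append((dst, "rename from " + src))
        elif item in KEY_COLUMNS:
            roles.append((item, "key"))
        else:
            for prefix, role in MODE_PREFIXES:
                if item.startswith(prefix):
                    roles.append((item[len(prefix):], role))
                    break
            else:
                roles.append((item, "transfer"))      # plain TAG
    return roles


print(describe_columns("CHROM,POS,-,-,SCORE,~STR"))
```

A bare "-" is checked before the "-TAG" prefix because the two mean different things: the former skips a column of the annotation file, the latter restricts the update to existing values.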
-C, --columns-file file

Read the list of columns from a file (normally given via the -c, --columns option). @@ -532,7 +569,7 @@

bcftools annotate [OPTIONS] FILE

-e, --exclude EXPRESSION

exclude sites for which EXPRESSION is true. For valid expressions see -EXPRESSIONS.

+EXPRESSIONS and the extension described in -i, --include below.

--force
@@ -573,8 +610,27 @@

bcftools annotate [OPTIONS] FILE

-i, --include EXPRESSION

include only sites for which EXPRESSION is true. For valid expressions see -EXPRESSIONS.

+EXPRESSIONS. + 

+Additionally, the command bcftools annotate supports expressions updated from the annotation +file dynamically for each record:

+
+
+
+
+
    # The field 'STR' from the -a file is required to match INFO/TAG in VCF. In the first example
+    # the alleles REF,ALT must match, in the second example they are ignored. The option -k is required
+    # to output also records that are not annotated. The third example shows the same concept with
+    # a numerical expression.
+    bcftools annotate -a annots.tsv.gz -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k input.vcf
+    bcftools annotate -a annots.tsv.gz -c CHROM,POS,-,-,SCORE,~STR     -i'TAG={STR}' -k input.vcf
+    bcftools annotate -a annots.tsv.gz -c CHROM,POS,-,-,SCORE,~INT     -i'TAG>{INT}' -k input.vcf
+
+
+
+
-k, --keep-sites

keep sites which do not pass -i and -e expressions instead of discarding them

@@ -681,9 +737,10 @@

bcftools annotate [OPTIONS] FILE

"^INFO/FOO,INFO/BAR" (and similarly for FORMAT and FILTER). "INFO" can be abbreviated to "INF" and "FORMAT" to "FMT".

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -720,7 +777,7 @@

bcftools annotate [OPTIONS] FILE

# that INFO/END is already present in the VCF header. bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf - # For more examples see http://samtools.github.io/bcftools/howtos/annotate.html + # For (many) more examples see http://samtools.github.io/bcftools/howtos/annotate.html @@ -814,9 +871,10 @@

File format options:

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -830,6 +888,10 @@

Input/output options:

output all alternate alleles present in the alignments even if they do not appear in any of the genotypes

+
-*, --keep-unseen-allele
+
+

keep the unobserved allele <*> or <NON_REF>, useful mainly for gVCF output

+
-f, --format-fields list

comma-separated list of FORMAT fields to output for each sample. Currently @@ -866,7 +928,7 @@

Input/output options:

-G, --group-samples FILE|-
-

by default, all samples are assumed to come from a single population. This option allows to group samples +

by default, all samples are assumed to come from a single population. This option groups samples into populations and applies the HWE assumption within but not across the populations. FILE is a tab-delimited text file with sample names in the first column and group names in the second column. If - is given instead, no HWE assumption is made at all and single-sample calling is performed. (Note that @@ -1182,9 +1244,10 @@

bcftools concat [OPTIONS] FILE1 FILE2

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -1306,6 +1369,11 @@

bcftools consensus [OPTIONS] FILE

write output to a file

+
--regions-overlap 0|1|2
+
+

how to treat VCF variants overlapping the target region in the fasta file: +see Common Options

+
-s, --samples LIST

apply variants of the listed samples. See also the option -I, --iupac-codes

@@ -1401,9 +1469,10 @@

VCF input options:

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -1740,6 +1809,10 @@

bcftools csq [OPTIONS] FILE

if more are required, see the --ncsq option.

+

Note that the program annotates only records with a functional consequence and +intergenic regions will pass through unchanged.

+
+

The program requires on input a VCF/BCF file, the reference genome in fasta format (--fasta-ref) and genomic features in the GFF3 format downloadable from the Ensembl website (--gff-annot), and outputs an annotated VCF/BCF @@ -1789,7 +1862,7 @@

bcftools csq [OPTIONS] FILE

--force
-

run even if some sanity checks fail. Currently the option allows to skip +

run even if some sanity checks fail. Currently the option enables skipping transcripts in malformatted GFFs with incorrect phase

-g, --gff-annot FILE
@@ -1946,9 +2019,10 @@

bcftools csq [OPTIONS] FILE

and VCF, such as "chrX" vs "X". The chromosome names in the output VCF will match that of the input VCF. The default is to attempt the automatic translation.

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -2141,7 +2215,7 @@

bcftools filter [OPTIONS] FILE

-s, --soft-filter STRING|+

annotate FILTER column with STRING or, with +, a unique filter name generated -by the program ("Filter%d").

+by the program ("Filter%d"). Applies to records that do not meet filter expression.

-S, --set-GTs .|0
@@ -2163,9 +2237,10 @@

bcftools filter [OPTIONS] FILE

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -2178,6 +2253,11 @@

bcftools gtcheck [OPTIONS] [-g ge is checked against the samples in the -g file. Without the -g option, multi-sample cross-check of samples in query.vcf.gz is performed.

+
+

Note that the interpretation of the discordance score depends on the options provided (specifically -e and +-u) and on the available annotations (FORMAT/PL vs FORMAT/GT). +The discordance score can be interpreted as the number of mismatching genotypes if only GT-vs-GT matching is performed.

+
--distinctive-sites NUM[,MEM[,DIR]]
@@ -2191,16 +2271,29 @@

bcftools gtcheck [OPTIONS] [-g ge

Stop after first record to estimate required time.

-
-e, --error-probability INT
+
-e, --exclude [qry|gt]:'EXPRESSION'
+
+

Exclude sites from query file (qry:) or genotype file (gt:) for which EXPRESSION is true. +For valid expressions see EXPRESSIONS.

+
+
-E, --error-probability INT

Interpret genotypes and genotype likelihoods probabilistically. The value of INT represents genotype quality when GT tag is used (e.g. Q=30 represents one error in 1,000 genotypes and Q=40 one error in 10,000 genotypes) and is ignored when PL tag is used (in that case an arbitrary -non-zero integer can be provided). See also the -u, --use option below. If set to 0, -the discordance equals to the number of mismatching genotypes when GT vs GT is compared. -Note that the values with and without -e are not comparable, only values generated -with -e 0 correspond to mismatching genotypes. -If performance is an issue, set to 0 for faster run but less accurate results.

+non-zero integer can be provided). + 

+If -E is set to 0, the discordance score can be interpreted as the number of mismatching genotypes, +but only in the GT-vs-GT matching mode. See the -u, --use option below for additional notes and caveats. + 

+If performance is an issue, set -E 0 for faster run times but less accurate results. + 

+Note that in previous versions of bcftools (<=1.18), this option used to be a lowercase -e. It +was changed to make room for the filtering option -e, --exclude to stay consistent across other +commands.
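The relation between the quality value given to -E and the error rates quoted above (Q=30 for one error in 1,000 genotypes, Q=40 for one in 10,000) is the usual Phred scale. A minimal Python sketch of the conversion follows; the function name is hypothetical and the snippet is not taken from the bcftools source.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


for q in (30, 40):
    p = phred_to_error_probability(q)
    print(f"Q={q}: one error in about {round(1 / p):,} genotypes")
```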

-g, --genotypes FILE
@@ -2210,6 +2303,11 @@

bcftools gtcheck [OPTIONS] [-g ge

Homozygous genotypes only, useful with low coverage data (requires -g, --genotypes)

+
-i, --include [qry|gt]:'EXPRESSION'
+
+

Include sites from query file (qry:) or genotype file (gt:) for which EXPRESSION is true. +For valid expressions see EXPRESSIONS.

+
--n-matches INT

Print only top INT matches for each sample, 0 for unlimited. Use negative value @@ -2221,6 +2319,14 @@

bcftools gtcheck [OPTIONS] [-g ge

Disable calculation of HWE probability to reduce memory requirements with comparisons between very large number of sample pairs.

+
-o, --output FILE
+
+

Write to FILE rather than to standard output, where it is written by default.

+
+
-O, --output-type t|z
+
+

Write a plain (t) or compressed (z) text tab-delimited output.

+
-p, --pairs LIST

A comma-separated list of sample pairs to compare. When the -g option is given, the first @@ -2274,8 +2380,13 @@

bcftools gtcheck [OPTIONS] [-g ge
-u, --use TAG1[,TAG2]

specifies which tag to use in the query file (TAG1) and the -g (TAG2) file. -By default, the PL tag is used in the query file and GT in the -g file when -available.

+By default, the PL tag is used in the query file and, when available, the GT tags in the +-g file. + 

+Note that when the requested tag is not available, the program will attempt to use +the other tag. The output includes the number of sites that were matched by the four +possible modes (for example GT-vs-GT or GT-vs-PL).

@@ -2284,10 +2395,10 @@

bcftools gtcheck [OPTIONS] [-g ge
-
   # Check discordance of all samples from B against all sample in A
+
   # Check discordance of all samples from B against all samples in A
    bcftools gtcheck -g A.bcf B.bcf
 
-   # Limit comparisons to the fiven list of samples
+   # Limit comparisons to the given list of samples
    bcftools gtcheck -s gt:a1,a2,a3 -s qry:b1,b2 -g A.bcf B.bcf
 
    # Compare only two pairs a1,b1 and a1,b2
@@ -2322,6 +2433,13 @@ 

Options:

Also display the first INT variant records. By default, no variant records are displayed.

+
-s, --samples INT
+
+

Display the first INT variant records including the last #CHROM header line with samples. +Running with -s 0 alone outputs the #CHROM header line only. Note that +the list of samples, with each sample per line, can be obtained with bcftools query using +the option -l, --list-samples.

+
@@ -2430,6 +2548,10 @@

bcftools isec [OPTIONS] A.vcf.gz B.vcf.gzinclude only sites for which EXPRESSION is true. See discussion of -e, --exclude above.

+
-f, --file-list FILE
+
+

Read file names from FILE, one file name per line.

+
-n, --nfiles [+-=]INT|~BITMAP

output positions present in this many (=), this many or more (+), this @@ -2474,12 +2596,14 @@

bcftools isec [OPTIONS] A.vcf.gz B.vcf.gz
-w, --write LIST
-

list of input files to output given as 1-based indices. With -p and no +

comma-separated list of input files to output given as 1-based indices. With -p and no -w, all files are written.

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file. This is done automatically with the -p option.

+

Automatically index the output file. FMT is optional and defaults +to tbi for vcf.gz and csi for bcf. This is done automatically +with the -p option if the output format is compressed.

@@ -2550,6 +2674,10 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz<
+
--force-no-index
+
+

synonymous to --no-index

+
--force-samples

if the merged files contain duplicate samples names, proceed anyway. @@ -2557,6 +2685,10 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz< as it appeared on the command line to the conflicting sample name (see 2:S3 in the above example).

+
--force-single
+
+

run even if only one file is given on input

+
--print-header

print only merged header and exit

@@ -2605,16 +2737,18 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz<

Sites with many alternate alleles can require extremely large storage space which can exceed the 2GB size limit representable by BCF. This is caused by Number=G tags (such as FORMAT/PL) which store a value for each combination of reference -and alternate alleles. The -L, --local-alleles option allows to replace such tags +and alternate alleles. The -L, --local-alleles option allows replacement of such tags with a localized tag (FORMAT/LPL) which only includes a subset of alternate alleles relevant for that sample. A new FORMAT/LAA tag is added which lists 1-based indices of the alternate alleles relevant (local) for the current sample. The number INT gives the maximum number of alternate alleles that can be included in the PL tag. The default value is 0 which disables the feature and outputs values for all alternate alleles.

-
-m, --merge snps|indels|both|snp-ins-del|all|none|id
+
-m, --merge snps|indels|both|snp-ins-del|all|none|id[,*]
-

The option controls what types of multiallelic records can be created:

+

The option controls what types of multiallelic records can be created. If a single asterisk +* is appended, the unobserved allele <*> or <NON_REF> will be removed at variant sites; +if two asterisks ** are appended, the unobserved allele will be removed at all sites.

@@ -2624,6 +2758,8 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz< -m snps .. allow multiallelic SNP records -m indels .. allow multiallelic indel records -m both .. both SNP and indel records can be multiallelic +-m both,* .. same as above but remove <*> (or <NON_REF>) from variant sites +-m both,** .. same as above but remove <*> (or <NON_REF>) at all sites -m all .. SNP records can be merged with indel records -m snp-ins-del .. allow multiallelic SNVs, insertions, deletions, but don't mix them -m id .. merge by ID @@ -2637,13 +2773,13 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz< alleles, vector fields pertaining to unobserved alleles are set to missing (.) by default. The METHOD is one of . (the default, use missing values), NUMBER (use a constant value, e.g. 0), max (the maximum value observed for other alleles in the sample). When --gvcf option is set, -the rule -M PL:max,AD:0 is implied. This can be overriden with providing -M - or -M PL:.,AD:.. +the rule -M PL:max,AD:0 is implied. This can be overridden by providing -M - or -M PL:.,AD:.. Note that if the unobserved allele is explicitly present as <*> or <NON_REF>, then its corresponding value will be used regardless of -M settings.

--no-index
-

the option allows to merge files without indexing them first. In order for this +

the option allows files to be merged without indexing them first. In order for this option to work, the user must ensure that the input files have chromosomes in the same order and consistent with the order of sequences in the VCF header.

@@ -2675,9 +2811,10 @@

bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz<

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -2817,7 +2954,23 @@

Input options

A new EXPERIMENTAL indel calling model which aims to address some known deficiencies of the current indel calling algorithm. Specifically, it uses diploid reference consensus sequence. Note that in the current version it has the potential to increase sensitivity -but at the cost of decreased specificity

+but at the cost of decreased specificity. +Only works with short-read sequencing technologies.

+ +
--indels-cns
+
+

Another EXPERIMENTAL indel calling method, predating indels-2.0 in +PR form, but merged more recently. It also uses a diploid +reference consensus, but with added parameters and heuristics to +optimise for a variety of sequencing platforms. This is usually +faster and more accurate than the default caller and --indels-2.0, +but has not been tested on non-diploid samples and samples without +approximately even allele frequency.

+
+
--no-indels-cns
+
+

May be used to turn off --indels-cns mode when using one of the +newer profiles that has this enabled by default.

-q, -min-MQ INT
@@ -2991,9 +3144,10 @@

Output options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -3004,15 +3158,70 @@

Options for SNP/I
-X, --config STR
-

Specify a platform specific configuration profile. The profile -should be one of 1.12, illumina, ont or pacbio-ccs. -Settings applied are as follows:

+

Specify a platform specific configuration profile. Specifying the +profile as "list" will list the available profile names and the +parameters they change. There are profiles named after a release, +which should be used if you wish to ensure forward compatibility +of results. The non-versioned names (e.g. "illumina") will always +point to the most recent set of parameters for that instrument type. +The current values are:

-
1.12           -Q13 -h100 -m1
-illumina       [ default values ]
-ont	           -B -Q5 --max-BQ 30 -I
-pacbio-ccs     -D -Q5 --max-BQ 50 -F0.1 -o25 -e1 -M99999
+
1.12            -Q13 -h100 -m1
+
+
+
+
+
bgi
+bgi-1.20        --indels-cns -B --indel-size 80 -F0.1 --indel-bias 0.9
+                --seqq-offset 120
+
+
+
+
+
illumina-1.18   [ default values ]
+
+
+
+
+
illumina
+illumina-1.20   --indels-cns --seqq-offset 125
+
+
+
+
+
ont             -B -Q5 --max-BQ 30 -I
+
+
+
+
+
ont-sup
+ont-sup-1.20    --indels-cns -B -Q1 --max-BQ 35 --delta-BQ 99 -F0.2
+                -o15 -e1 -h110 --del-bias 0.4 --indel-bias 0.7
+                --poly-mqual --seqq-offset 130 --indel-size 80
+
+
+
+
+
pacbio-ccs-1.18 -D -Q5 --max-BQ 50 -F0.1 -o25 -e1 -M99999
+
+
+
+
+
pacbio-ccs
+pacbio-ccs-1.20  --indels-cns -B -Q5 --max-BQ 50 -F0.1 -o25 -e1 -h300
+                 --delta-BQ 10 --del-bias 0.4 --poly-mqual
+                 --indel-bias 0.9 --seqq-offset 118 --indel-size 80
+                 --score-vs-ref 0.7
+
+
+
+
+
ultima
+ultima-1.20      --indels-cns -B -Q1 --max-BQ 30 --delta-BQ 10 -F0.15
+                 -o20 -e10 -h250 --del-bias 0.3 --indel-bias 0.7
+                 --poly-mqual --seqq-offset 140 --score-vs-ref 0.3
+                 --indel-size 80
@@ -3058,12 +3267,32 @@

Options for SNP/I 0.75) while higher depth samples or where you favour recall rates over precision may work better with a higher value such as 2.0.

+
--del-bias FLOAT
+
+

Skews the likelihood of deletions over insertions. Defaults to an +even distribution value of 1.0. Lower values imply a higher rate +of false positive deletions (meaning candidate deletions are less +likely to be real).

+
--indel-size INT

Indel window size to use when assessing the quality of candidate indels. Note that although the window size approximately corresponds to the maximum indel size considered, it is not an exact threshold [110]

+
--seqq-offset INT
+
+

Tunes the importance of indel sequence quality per depth. The +final "seqQ" quality used is "offset - 5*MIN(depth,20)". [120]
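As a worked example of the formula quoted above, the following Python sketch evaluates "offset - 5*MIN(depth,20)" for a few depths. The function name `seqq` is hypothetical; only the formula and the default offset of 120 come from the text.

```python
def seqq(offset: int, depth: int) -> int:
    """Evaluate the documented formula: offset - 5 * MIN(depth, 20)."""
    return offset - 5 * min(depth, 20)


# With the default offset of 120 the value stops decreasing at depth 20
for depth in (4, 10, 20, 100):
    print(depth, seqq(120, depth))
```

Because the depth term is capped at 20, the result never drops below offset - 100, so raising the offset (for example to 125 in the illumina-1.20 profile) shifts every value up by the same amount.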

+
+
--poly-mqual
+
+

Use the lowest quality value within a homopolymer run, instead of +the quality immediately adjacent to the indel. This may be +important for unclocked instruments, particularly ones with a flow +chemistry where runs of bases of identical type are incorporated +together.

+
-I, --skip-indels

Do not perform INDEL calling

@@ -3157,14 +3386,14 @@

bcftools norm [OPTIONS] file.vcf.gz

100 CC C,GG 1/2 # After: - # bcftools norm -a . + # bcftools norm -a --atom-overlaps . 100 C G ./1 100 CC C 1/. 101 C G ./1 # After: - # bcftools norm -a '*' - # bcftools norm -a \* + # bcftools norm -a --atom-overlaps '*' + # bcftools norm -a --atom-overlaps \* 100 C G,* 2/1 100 CC C,* 1/2 101 C G,* 2/1 @@ -3205,6 +3434,12 @@

bcftools norm [OPTIONS] file.vcf.gz

try to proceed with -m- even if malformed tags with incorrect number of fields are encountered, discarding such tags. (Experimental, use at your own risk.)

+
-g, --gff-annot FILE
+
+

when a GFF file is provided, follow the HGVS 3' rule and right-align variants in transcripts on the forward +strand. In case of overlapping transcripts, the default mode is to left-align the variant. For a +description of the supported GFF3 file format see bcftools csq.

+
--keep-sum TAG[,…​]

keep vector sum constant when splitting multiallelic sites. Only AD tag @@ -3218,7 +3453,11 @@

bcftools norm [OPTIONS] file.vcf.gz

together: If only SNP records should be split or merged, specify snps; if both SNPs and indels should be merged separately into two records, specify both; if SNPs and indels should be merged into a single record, specify -any.

+any. + 

+Note that multiallelic sites with both SNPs and indels will be split into +biallelic sites with both -m -snps and -m -indels.

--multi-overlaps 0|.
@@ -3285,9 +3524,10 @@

bcftools norm [OPTIONS] file.vcf.gz

maximum distance between two records to consider when locally sorting variants which changed position during the realignment

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -3364,9 +3604,10 @@

VCF output options:

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -3613,13 +3854,14 @@

List of plugins coming wi
split-vep
-

extract fields from structured annotations such as INFO/CSQ created by bcftools/csq or VEP. These -can be added as a new INFO field to the VCF or in a custom text format. See +

extract fields from structured annotations such as INFO/CSQ created by VEP or INFO/BCSQ created by +bcftools/csq. These can be added as a new INFO field to the VCF or in a custom text format. See http://samtools.github.io/bcftools/howtos/plugin.split-vep.html for more.

tag2tag
-

Convert between similar tags, such as GL,PL,GP or QR,QA,QS.

+

Convert between similar tags, such as GL,PL,GP or QR,QA,QS or tags with localized alleles e.g. LPL,LAD. +See http://samtools.github.io/bcftools/howtos/plugin.tag2tag.html for more.

trio-dnm2
@@ -3830,6 +4072,12 @@

bcftools query [OPTIONS] file.vcf.gz [file.

learn by example, see below

+
-F, --print-filtered STR
+
+

by default, samples failing -i/-e filtering expressions are suppressed from output +when FORMAT fields are queried (for example %CHROM %POS [ %GT]). With -F, such +fields will still be printed, but STR will be used instead of their actual value.

+
-H, --print-header

print header

@@ -3843,6 +4091,14 @@

bcftools query [OPTIONS] file.vcf.gz [file.

list sample names and exit

+
-N, --disable-automatic-newline
+
+

disable automatic addition of a missing newline character at the end of the formatting +expression. By default, the program checks if the expression contains a newline +and appends it if not, to prevent formatting the entire output into a single +line by mistake. Note that versions prior to 1.18 had no automatic check and newline +had to be included explicitly.
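The behaviour described above can be sketched in a few lines of Python. This is an illustration of the documented rule only, assuming that the newline is written in the formatting expression as the two-character escape `\n`; the function name and the `disable` parameter (standing in for -N) are hypothetical.

```python
def ensure_newline(fmt: str, disable: bool = False) -> str:
    """Append the '\\n' escape when the expression has none, unless disabled."""
    if disable or "\\n" in fmt:
        return fmt
    return fmt + "\\n"


print(ensure_newline("%CHROM\\t%POS"))                 # escape is appended
print(ensure_newline("[%SAMPLE %GT\\n]"))              # already has one, unchanged
print(ensure_newline("%CHROM\\t%POS", disable=True))   # -N: left as given
```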

+
-o, --output FILE

see Common Options

@@ -3913,6 +4169,7 @@

Format:

%TBCSQ Translated FORMAT/BCSQ. See the csq command above for explanation and examples. %TGT Translated genotype (e.g. C/A) %TYPE Variant type (REF, SNP, MNP, INDEL, BND, OTHER) +%VKX VariantKey, biallelic hexadecimal encoding of CHROM,POS,REF,ALT (https://github.com/tecnickcom/variantkey) [] Format fields must be enclosed in brackets to loop over all samples \n new line \t tab character @@ -3976,6 +4233,14 @@

Examples:

bcftools query -f '%AC{1}\n' -i 'AC[1]>10' file.vcf.gz +
+
+
# Print all samples at sites where at least one sample has DP=1 or DP=2. In the second case
+# print only samples with DP=1 or DP=2, the difference is in the logical operator used, || vs |.
+bcftools query -f '[%SAMPLE %GT %DP\n]' -i 'FMT/DP=1 || FMT/DP=2' file.vcf
+bcftools query -f '[%SAMPLE %GT %DP\n]' -i 'FMT/DP=1 |  FMT/DP=2' file.vcf
+
+
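The difference between the two commands above can be modelled for a single site as follows. This Python sketch is a simplification that covers only this example: `||` is treated as a site-level test that keeps every sample once any sample passes, and `|` as a per-sample test. The function name and arguments are hypothetical.

```python
def select_samples(depths: dict, wanted: set, per_sample: bool) -> list:
    """Return the samples printed at one site.

    depths     : sample name -> FMT/DP value at this site
    wanted     : DP values accepted by the expression (here {1, 2})
    per_sample : True models '|', False models '||'
    """
    passing = {name for name, dp in depths.items() if dp in wanted}
    if not passing:
        return []                     # no sample passes, the site is skipped
    if per_sample:
        return [name for name in depths if name in passing]
    return list(depths)               # site passes, all samples are printed


site = {"A": 1, "B": 7, "C": 2}
print(select_samples(site, {1, 2}, per_sample=False))  # models ||
print(select_samples(site, {1, 2}, per_sample=True))   # models |
```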
@@ -4010,7 +4275,7 @@

bcftools reheader [OPTIONS] file.vcf.gz

-T, --temp-prefix PATH
-

template for temporary file names, used with -f

+

this option is ignored, but left for compatibility with earlier versions of bcftools.

--threads INT
@@ -4248,11 +4513,13 @@

bcftools sort [OPTIONS] file.bcf

-T, --temp-dir DIR
-

Use this directory to store temporary files

+

Use this directory to store temporary files. If the last six characters of the string DIR are XXXXXX, +then these are replaced with a string that makes the directory name unique.
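The XXXXXX convention resembles the template used by mkdtemp-style functions. The following Python sketch illustrates the substitution rule only; how bcftools actually chooses the replacement characters is not stated here, and the function name and alphabet are assumptions.

```python
import secrets
import string


def expand_template(dir_template: str) -> str:
    """Replace a trailing 'XXXXXX' with six random characters."""
    if not dir_template.endswith("XXXXXX"):
        return dir_template           # used as given
    alphabet = string.ascii_lowercase + string.digits   # assumed alphabet
    suffix = "".join(secrets.choice(alphabet) for _ in range(6))
    return dir_template[:-6] + suffix


print(expand_template("/tmp/bcftools-sort.XXXXXX"))
print(expand_template("/tmp/fixed-dir"))
```

Using a template is mainly useful when several sort jobs share one parent directory, since each job then gets its own temporary directory name.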

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -4457,9 +4724,10 @@

Output options

see Common Options

-
--write-index
+
-W[FMT], -W[=FMT], --write-index[=FMT]
-

Automatically index the output file

+

Automatically index the output file. FMT is optional and can be +one of "tbi" or "csi" depending on output file format.

@@ -4468,6 +4736,11 @@

Output options

Subset options:

+
-A, --trim-unseen-alleles
+
+

remove the unseen allele <*> or <NON_REF> at variant sites when the option is given once (-A) or +at all sites when the option is given twice (-AA).

+
-a, --trim-alt-alleles

remove alleles not seen in the genotype fields from the ALT column. Note that if no alternate allele @@ -4660,6 +4933,98 @@

bcftools [--version-only]

+

SCRIPTS

+
+
+

gff2gff

+
+

Attempts to fix a GFF file to be correctly parsed by csq.

+
+
+
+
+
+
zcat in.gff.gz | gff2gff | gzip -c > out.gff.gz
+
+
+
+
+
+
+

plot-vcfstats [OPTIONS] file.vchk […​]

+
+

Script for processing output of bcftools stats. It can merge +results from multiple outputs (useful when running the stats for each +chromosome separately), plot graphs and create a PDF presentation.

+
+
+
+
-m, --merge
+
+

Merge vcfstats files to STDOUT, skip plotting.

+
+
-p, --prefix DIR
+
+

The output directory. This directory will be created if it does not exist.

+
+
-P, --no-PDF
+
+

Skip the PDF creation step.

+
+
-r, --rasterize
+
+

Rasterize PDF images for faster rendering. This is the default and the opposite of -v, --vectors.

+
+
-s, --sample-names
+
+

Use sample names for xticks rather than numeric IDs.

+
+
-t, --title STRING
+
+

Identify files by these titles in plots. The option can be given multiple +times, for each ID in the bcftools stats output. If not +present, the script will use abbreviated source file names for the titles.

+
+
-v, --vectors
+
+

Generate vector graphics for PDF images, the opposite of -r, --rasterize.

+
+
-T, --main-title STRING
+
+

Main title for the PDF.

+
+
+
+
+

Example:

+
+
+
+
+
+
# Generate the stats
+bcftools stats -s - > file.vchk
+
+
+
+
+
# Plot the stats
+plot-vcfstats -p outdir file.vchk
+
+
+
+
+
# The final look can be customized by editing the generated
+# 'outdir/plot.py' script and re-running it manually
+cd outdir && python plot.py && pdflatex summary.tex
+
+
+
+
+
+
+
+

FILTERING EXPRESSIONS

@@ -4669,8 +5034,7 @@

FILTERING EXPRESSIONS

Valid expressions may contain:
  • -

    numerical constants, string constants, file names (this is currently -supported only to filter by the ID column)

    +

    numerical constants, string constants, file names (indicated by the prefix @)

    1, 1.0, 1e-4
    @@ -4804,7 +5168,7 @@ 

    FILTERING EXPRESSIONS

  • -

    TYPE for variant type in REF,ALT columns (indel,snp,mnp,ref,bnd,other,overlap). Use the regex +

    TYPE for variant type in REF,ALT columns (indel,snp,mnp,ref,bnd,other,overlap, see TERMINOLOGY). Use the regex operator "\~" to require at least one allele of the given type or the equal sign "=" to require that all alleles are of the given type. Compare

    @@ -5052,12 +5416,17 @@

    FILTERING EXPRESSIONS

    -
    ID=@file       .. selects lines with ID present in the file
    +
    ID=@file               .. selects lines with ID present in the file
    +
    +
    +
    +
    +
    ID!=@~/file            .. skip lines with ID present in the ~/file
    -
    ID!=@~/file    .. skip lines with ID present in the ~/file
    +
    INFO/TAG=@file         .. selects lines with INFO/TAG value present in the file
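
A minimal sketch of the @file syntax, assuming the list file holds one value per line. The names ids.txt and file.vcf.gz and the two IDs are placeholders; the bcftools calls are skipped when the program or the input is unavailable.

```shell
# Hypothetical list of IDs, one per line (assumed format)
workdir=$(mktemp -d)
printf '%s\n' rs0001 rs0002 > "$workdir/ids.txt"

if command -v bcftools >/dev/null 2>&1 && [ -f file.vcf.gz ]; then
    # Keep records whose ID is listed in the file
    bcftools view -i "ID=@$workdir/ids.txt" file.vcf.gz
    # Skip records whose ID is listed in the file
    bcftools view -i "ID!=@$workdir/ids.txt" file.vcf.gz
fi
```

The expression is placed in double quotes here only so that the shell expands $workdir; with a fixed path, single quotes as in the examples above work equally well.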
    @@ -5096,91 +5465,27 @@

    FILTERING EXPRESSIONS

-

SCRIPTS

+

TERMINOLOGY

-
-

gff2gff

-
-

Attempts to fix a GFF file to be correctly parsed by csq.

-
-
-
-
-
-
zcat in.gff.gz | gff2gff | gzip -c > out.gff.gz
-
-
-
-
-
-
-

plot-vcfstats [OPTIONS] file.vchk […​]

-
-

Script for processing output of bcftools stats. It can merge -results from multiple outputs (useful when running the stats for each -chromosome separately), plots graphs and creates a PDF presentation.

-
-
-
-
-m, --merge
-
-

Merge vcfstats files to STDOUT, skip plotting.

-
-
-p, --prefix DIR
-
-

The output directory. This directory will be created if it does not exist.

-
-
-P, --no-PDF
-
-

Skip the PDF creation step.

-
-
-r, --rasterize
-
-

Rasterize PDF images for faster rendering. This is the default and the opposite of -v, --vectors.

-
-
-s, --sample-names
-
-

Use sample names for xticks rather than numeric IDs.

-
-
-t, --title STRING
-
-

Identify files by these titles in plots. The option can be given multiple -times, for each ID in the bcftools stats output. If not -present, the script will use abbreviated source file names for the titles.

-
-
-v, --vectors
-
-

Generate vector graphics for PDF images, the opposite of -r, --rasterize.

-
-
-T, --main-title STRING
-
-

Main title for the PDF.

-
-
-
-

Example:

+

The program and the documentation use the following terminology; multiple terms can be used +interchangeably for the same VCF record type:

-
# Generate the stats
-bcftools stats -s - > file.vchk
-
-
-
-
-
# Plot the stats
-plot-vcfstats -p outdir file.vchk
-
-
-
-
-
# The final looks can be customized by editing the generated
-# 'outdir/plot.py' script and re-running manually
-cd outdir && python plot.py && pdflatex summary.tex
-
+
REF   ALT
+---------
+C     .         .. reference allele / non-variant site / ref-only site
+C     T         .. SNP or SNV (single-nucleotide polymorphism or variant), used interchangeably
+CC    TT        .. MNP (multi-nucleotide polymorphism)
+CAAA  C         .. indel, deletion (regardless of length)
+C     CAAA      .. indel, insertion (regardless of length)
+C     <*>       .. gVCF block, the allele <*> is a placeholder for an alternate allele possibly missed because of low coverage
+C     <NON_REF> .. synonymous with <*>
+C     *         .. overlapping deletion
+C     <INS>     .. symbolic allele, known also as 'other [than above]'
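
The table above can be read as a small decision procedure. The shell function below is a simplified sketch of that reading, not the classification code bcftools actually uses: classify_alleles and its output labels are hypothetical, and it handles a single ALT allele only (no comma-separated multiallelic ALT).

```shell
# Hypothetical helper: classify one REF/ALT pair following the table above.
classify_alleles() {
    ref=$1
    alt=$2
    case "$alt" in
        .)                 echo "ref" ;;                    # non-variant site
        '<*>'|'<NON_REF>') echo "gvcf-block" ;;             # unseen-allele placeholder
        '*')               echo "overlapping-deletion" ;;
        '<'*'>')           echo "other" ;;                  # any other symbolic allele
        *)
            if [ "${#ref}" -eq 1 ] && [ "${#alt}" -eq 1 ]; then
                echo "snp"
            elif [ "${#ref}" -eq "${#alt}" ]; then
                echo "mnp"
            else
                echo "indel"                                # insertion or deletion
            fi ;;
    esac
}

classify_alleles C T          # prints "snp"
classify_alleles CAAA C       # prints "indel"
classify_alleles C '<INS>'    # prints "other"
```

The symbolic and special alleles are tested first so that their lengths are never compared against REF.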
@@ -5257,7 +5562,7 @@

COPYING