From 061112dca3b732a229a9ad6182d205b18df4459e Mon Sep 17 00:00:00 2001
From: Petr Danecek
Date: Fri, 31 May 2024 09:37:28 +0100
Subject: [PATCH] Add FAQ item: XYZ not defined in the header

---
 howtos/FAQ.html         | 30 ++++++++++++++++++++++++++++++
 howtos/FAQ.txt          | 28 +++++++++++++++++++++++++++-
 howtos/roh-calling.html |  2 +-
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/howtos/FAQ.html b/howtos/FAQ.html
index f646b1d4..8b5d93c5 100644
--- a/howtos/FAQ.html
+++ b/howtos/FAQ.html
@@ -83,6 +83,36 @@

Frequently Asked Questions

+
+
'XYZ' is not defined in the header, assuming Type=String
+

The VCF specification recommends that all INFO and
+FORMAT tags that appear throughout the file body are defined in the VCF header.

+
+
+

Fix the header using the reheader command

+
+
+
+
# Write out the header to be modified
+bcftools view -h old.vcf > header.txt
+
+# Edit the header using your favorite text editor and add the missing definition, e.g.
+#   ##INFO=<ID=XYZ,Number=1,Type=Integer,Description="Describe the tag">
+vi header.txt
+
+# Reheader the file
+bcftools reheader -h header.txt -o new.vcf old.vcf
+
+
+
+
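The manual `vi` step above can also be scripted. The following is a minimal sketch, not part of the patch: it assumes the missing tag is an INFO tag named `XYZ` with the placeholder definition from the comment above, and it uses a small stand-in `header.txt` so it runs without BCFtools. The temporary file name `header.new.txt` is hypothetical.

```shell
# Stand-in for the output of `bcftools view -h old.vcf > header.txt`
# (assumption: a real header would contain many more lines).
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
} > header.txt

# Placeholder definition; adjust Number, Type and Description to the real tag.
new_line='##INFO=<ID=XYZ,Number=1,Type=Integer,Description="Describe the tag">'

# Insert the definition just before the #CHROM line, unless it is already there.
if ! grep -q '^##INFO=<ID=XYZ,' header.txt; then
  awk -v def="$new_line" '/^#CHROM/ { print def } { print }' header.txt > header.new.txt
  mv header.new.txt header.txt
fi
```

The edited `header.txt` can then be passed to `bcftools reheader -h header.txt -o new.vcf old.vcf` exactly as in the steps above.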

Why do you have to do it? Although the VCF specification allows undefined tags, HTSlib and BCFtools internally
+treat VCF as BCF, where all tags must be defined in the header. This is because of the way BCF is designed:
+the tags throughout the BCF file are represented as pointers to the dictionary of tags stored in the header.
+We work around this problem by adding missing definitions on the fly. Note this can work for read-only operations, but
+will still lead to problems when writing the file out as BCF: even though the reader
+updated its internal structures with a dummy definition and continued reading, the writer was not
+aware of the new tag when the header was written.

+
Incorrect number of fields at chr1:1234567

 This error is triggered when the number of values in the data line does not match
diff --git a/howtos/FAQ.txt b/howtos/FAQ.txt
index 279a982d..ddbf2603 100644
--- a/howtos/FAQ.txt
+++ b/howtos/FAQ.txt
@@ -4,8 +4,34 @@ include::header.inc[]
 Frequently Asked Questions
 --------------------------
 
-.*Incorrect number of fields at chr1:1234567*
+.*'XYZ' is not defined in the header, assuming Type=String*
+[#undefined-tag]
+The link:https://samtools.github.io/hts-specs/VCFv4.3.pdf[VCF specification] recommends that all INFO and
+FORMAT tags that appear throughout the file body are defined in the VCF header.
+
+Fix the header using the reheader command
+----
+# Write out the header to be modified
+bcftools view -h old.vcf > header.txt
+# Edit the header using your favorite text editor and add the missing definition, e.g.
+# ##INFO=
+vi header.txt
+
+# Reheader the file
+bcftools reheader -h header.txt -o new.vcf old.vcf
+----
+
+Why do you have to do it? Although the VCF specification allows undefined tags, HTSlib and BCFtools internally
+treat VCF as BCF, where all tags must be defined in the header. This is because of the way BCF is designed:
+the tags throughout the BCF file are represented as pointers to the dictionary of tags stored in the header.
+We work around this problem by adding missing definitions on the fly. Note this can work for read-only operations, but
+will still lead to problems when writing the file out as BCF: even though the reader
+updated its internal structures with a dummy definition and continued reading, the writer was not
+aware of the new tag when the header was written.
+
+
+.*Incorrect number of fields at chr1:1234567*
 [#incorrect-nfields]
 This error is triggered when the number of values in the data line does not match
 its definition in the header.
 For example, one may see an error like
diff --git a/howtos/roh-calling.html b/howtos/roh-calling.html
index 8bb6f77f..f52f9888 100644
--- a/howtos/roh-calling.html
+++ b/howtos/roh-calling.html
@@ -262,4 +262,4 @@

Feedback

- + \ No newline at end of file