Add FAQ item: XYZ not defined in the header
pd3 committed May 31, 2024
1 parent 30c6443 commit 061112d
Showing 3 changed files with 58 additions and 2 deletions.
30 changes: 30 additions & 0 deletions howtos/FAQ.html
@@ -83,6 +83,36 @@
<div class="sect1">
<h2 id="_frequently_asked_questions">Frequently Asked Questions</h2>
<div class="sectionbody">
<div id="undefined-tag" class="paragraph">
<div class="title"><strong>'XYZ' is not defined in the header, assuming Type=String</strong></div>
<p>The <a href="https://samtools.github.io/hts-specs/VCFv4.3.pdf">VCF specification</a> recommends that all INFO and
FORMAT tags used in the file body be defined in the VCF header.</p>
</div>
<div class="paragraph">
<p>Fix the header using the reheader command:</p>
</div>
<div class="listingblock">
<div class="content">
<pre># Write out the header to be modified
bcftools view -h old.vcf &gt; header.txt

# Edit the header using your favorite text editor and add the missing definition, e.g.
# ##INFO=&lt;ID=XYZ,Number=1,Type=Integer,Description="Describe the tag"&gt;
vi header.txt

# Reheader the file
bcftools reheader -h header.txt -o new.vcf old.vcf</pre>
</div>
</div>
<div class="paragraph">
<p>Why is this necessary? Although the VCF specification allows undefined tags, HTSlib and BCFtools internally
treat VCF as BCF, where all tags must be defined in the header. This follows from the design of BCF:
tags in the body of a BCF file are stored as integer indices into the dictionary of tags kept in the header.
We work around this problem by adding missing definitions on the fly. Note that this can work for read-only operations,
but it still leads to problems when the file is written out as BCF: even though the reader
updated its internal structures with a dummy definition and continued reading, the writer was not
aware of the new tag when the header was written.</p>
</div>
<div id="incorrect-nfields" class="paragraph">
<div class="title"><strong>Incorrect number of fields at chr1:1234567</strong></div>
<p>This error is triggered when the number of values in the data line does not match
28 changes: 27 additions & 1 deletion howtos/FAQ.txt
@@ -4,8 +4,34 @@ include::header.inc[]
Frequently Asked Questions
--------------------------

.*Incorrect number of fields at chr1:1234567*
.*'XYZ' is not defined in the header, assuming Type=String*
[#undefined-tag]
The link:https://samtools.github.io/hts-specs/VCFv4.3.pdf[VCF specification] recommends that all INFO and
FORMAT tags used in the file body be defined in the VCF header.

Fix the header using the reheader command:
----
# Write out the header to be modified
bcftools view -h old.vcf > header.txt

# Edit the header using your favorite text editor and add the missing definition, e.g.
# ##INFO=<ID=XYZ,Number=1,Type=Integer,Description="Describe the tag">
vi header.txt

# Reheader the file
bcftools reheader -h header.txt -o new.vcf old.vcf
----
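The interactive editing step can also be scripted. The following is a minimal sketch, not part of BCFtools: the helper name `add_header_line` is hypothetical, and it assumes the header contains exactly one `#CHROM` line. The new definition is inserted just before that line because `#CHROM` must remain the last line of a VCF header.

```shell
# Hypothetical helper (not a BCFtools command): copy the header in file $1 to
# file $3, inserting the definition line $2 immediately before the #CHROM line.
# The definition is passed through the environment so that awk does not
# interpret backslash escapes in it.
add_header_line() {
    DEF="$2" awk '/^#CHROM/ { print ENVIRON["DEF"] } { print }' "$1" > "$3"
}

# Example usage, replacing the manual "vi header.txt" step:
#   bcftools view -h old.vcf > header.txt
#   add_header_line header.txt \
#       '##INFO=<ID=XYZ,Number=1,Type=Integer,Description="Describe the tag">' \
#       header.fixed.txt
#   bcftools reheader -h header.fixed.txt -o new.vcf old.vcf
```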

Why is this necessary? Although the VCF specification allows undefined tags, HTSlib and BCFtools internally
treat VCF as BCF, where all tags must be defined in the header. This follows from the design of BCF:
tags in the body of a BCF file are stored as integer indices into the dictionary of tags kept in the header.
We work around this problem by adding missing definitions on the fly. Note that this can work for read-only operations,
but it still leads to problems when the file is written out as BCF: even though the reader
updated its internal structures with a dummy definition and continued reading, the writer was not
aware of the new tag when the header was written.
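Following the header-dictionary model described above, one can check ahead of time which INFO tags would trigger the warning: collect the IDs defined by `##INFO` header lines, then look up every key found in the INFO column of the body. This is a hypothetical diagnostic, not a BCFtools command; the function name `undefined_info_tags` is illustrative. It assumes an uncompressed VCF (a bgzipped file can be piped in, e.g. `gzip -dc old.vcf.gz | undefined_info_tags -`) and checks INFO tags only, not FORMAT.

```shell
# Hypothetical diagnostic (not a BCFtools command): print each INFO key used in
# the body of the uncompressed VCF $1 that has no ##INFO=<ID=...> header line.
# Pass "-" to read from standard input.
undefined_info_tags() {
    awk -F '\t' '
        # Header: record every defined INFO ID (the "dictionary" of tags).
        /^##INFO=<ID=/ {
            id = $0
            sub(/^##INFO=<ID=/, "", id)
            sub(/[,>].*$/, "", id)
            defined[id] = 1
            next
        }
        # Skip all other header lines.
        /^#/ { next }
        # Body: column 8 is INFO, a ";"-separated list of KEY=VALUE or flag KEYs.
        $8 != "." {
            n = split($8, fields, ";")
            for (i = 1; i <= n; i++) {
                key = fields[i]
                sub(/=.*$/, "", key)
                # Report each undefined key once, in order of first appearance.
                if (!(key in defined) && !(key in reported)) {
                    print key
                    reported[key] = 1
                }
            }
        }
    ' "$1"
}
```

Each key printed needs a definition added to the header, for example with the reheader steps above.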


.*Incorrect number of fields at chr1:1234567*
[#incorrect-nfields]
This error is triggered when the number of values in the data line does not match
its definition in the header. For example, one may see an error like
2 changes: 1 addition & 1 deletion howtos/roh-calling.html
@@ -262,4 +262,4 @@ <h3 id="_feedback">Feedback</h3>
</div>
</div>
</body>
</html>
</html>
