Commit e54b570

Implement double escaping for alpn in SVCB RR
1 parent bdccca2 commit e54b570

5 files changed

+399
-80
lines changed


doc/manual/format.rst

+239
@@ -0,0 +1,239 @@
1+
.. include:: links.rst
2+
3+
###################
4+
Presentation format
5+
###################
6+
7+
DNS resource records (RRs) can be expressed in text form using the DNS
8+
presentation format. The format is originally defined in
9+
|url::rfc1035_section_5_1| and |url::rfc1034_section_3_6_1| and is most
10+
frequently used to define a zone in master files, more commonly known as
11+
zone files. The term "presentation format" is officially established in
12+
|url::rfc8499_section_5|.
13+
14+
The presentation format is a concise tabular serialization format with
15+
provisions for convenient editing. The DNS is intentionally extensible and
16+
many RFCs define additional types and the typical representation for the
17+
corresponding RDATA sections. Consequently, the presentation format is not
18+
defined by one single specification, but rather many specifications.
19+
20+
The presentation format is NOT context-free and correct interpretation of the
21+
specification(s) depends on extensive knowledge of the DNS.
22+
23+
.. note::
24+
This document is meant to be a concise source on interpretation of the
25+
presentation format, but is still very much a work in progress. Please
26+
consider contributing if anything is unclear or incorrect.
27+
28+
Format
29+
======
30+
31+
.. note:: Modified text from |url::rfc1035_section_5_1|
32+
33+
The presentation format defines a number of entries. Entries are predominantly
34+
line-oriented, though parentheses can be used to continue a list of items
35+
across a line boundary, and text literals can contain CRLF within the text.
36+
Any combination of tabs and spaces acts as a delimiter between the separate
37+
items that make up an entry. Any line can end with a comment.
38+
Comments start with a ``;`` (semicolon).
39+
40+
The following entries are defined:
41+
42+
<blank>[<comment>]
43+
44+
$ORIGIN <domain-name> [<comment>]
45+
46+
$INCLUDE <file-name> [<domain-name>] [<comment>]
47+
48+
$TTL <TTL> [<comment>]
49+
50+
<domain-name><rr> [<comment>]
51+
52+
<blank><rr> [<comment>]
53+
54+
Blank lines, with or without comments, are allowed anywhere in the file.
55+
56+
Three control entries are defined: $ORIGIN, $INCLUDE and $TTL (defined in
57+
|url::rfc2308_section_4|). $ORIGIN is followed by a domain name, and resets the
58+
current origin for relative domain names to the stated name. $INCLUDE inserts
59+
the named file into the current file, and may optionally specify a domain name
60+
that sets the relative domain name origin for the included file. $INCLUDE may
61+
also have a comment. Note that a $INCLUDE entry never changes the relative
62+
origin of the parent file, regardless of changes to the relative origin made
63+
within the included file. $TTL is followed by a decimal integer, and resets
64+
the default TTL for RRs which do not explicitly include a TTL value.
65+
66+
The last two forms represent RRs. If an entry for an RR begins with a
67+
``<blank>``, then the RR is assumed to be owned by the last stated owner. If
68+
an RR entry begins with a ``<domain-name>``, then the owner name is reset.
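
The bookkeeping for the origin, the default TTL and the last stated owner can
be sketched in Python as follows. This is a minimal illustration, not how any
particular parser is implemented: the name ``walk`` and its return shape are
hypothetical, and the sketch assumes there is no ``$INCLUDE``, no parentheses,
no quoted strings and no escape sequences in the input.

```python
def walk(lines, origin="."):
    """Track $ORIGIN, $TTL and the last stated owner across entries.

    Returns a list of (owner, default_ttl, remaining_fields) tuples.
    """
    default_ttl, owner, records = None, None, []
    for line in lines:
        # Naive comment removal: assumes no ';' inside quoted strings.
        line = line.split(";", 1)[0].rstrip()
        if not line:
            continue  # blank line, with or without a comment
        fields = line.split()
        if fields[0] == "$ORIGIN":
            origin = fields[1]
        elif fields[0] == "$TTL":
            default_ttl = int(fields[1])
        elif fields[0].startswith("$"):
            raise NotImplementedError(fields[0])  # e.g. $INCLUDE
        else:
            if not line[0].isspace():
                # The entry starts with a domain name: reset the owner.
                name = fields.pop(0)
                if name == "@":
                    owner = origin
                elif name.endswith("."):  # simplification: escapes ignored
                    owner = name
                elif origin == ".":
                    owner = name + "."
                else:
                    owner = name + "." + origin
            elif owner is None:
                raise ValueError("no previous owner to inherit")
            records.append((owner, default_ttl, fields))
    return records
```

An entry that starts with whitespace inherits the owner of the previous RR,
while ``@`` resolves to whatever origin is current at that point in the file.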
69+
70+
``<rr>`` contents take one of the following forms:
71+
72+
[<TTL>] [<class>] <type> <RDATA>
73+
74+
[<class>] [<TTL>] <type> <RDATA>
75+
76+
The RR begins with optional TTL and class fields, followed by a type and
77+
RDATA field appropriate to the type and class. Class and type use the
78+
standard mnemonics, TTL is a decimal integer. Omitted class and TTL
79+
values default to the last explicitly stated values. Since type and
80+
class mnemonics are disjoint, the parse is unique. (Note that this
81+
order is different from wire format order; the given order allows easier
82+
parsing and defaulting.)
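
Because TTLs are decimal integers and class mnemonics do not collide with type
mnemonics, both orderings can be handled by one loop. The following Python
sketch illustrates this; the function name ``split_rr`` and the small
``CLASSES`` set are assumptions made for the example, not part of any
specification.

```python
CLASSES = {"IN", "CH", "HS"}  # illustrative subset of class mnemonics


def split_rr(fields, last_ttl=None, last_class=None):
    """Split the fields following the owner into (ttl, class, type, rdata).

    Accepts both "[<TTL>] [<class>]" and "[<class>] [<TTL>]" and falls
    back to the last explicitly stated values for omitted fields.
    """
    ttl, rr_class = last_ttl, last_class
    seen_ttl = seen_class = False
    i = 0
    while i < len(fields):
        token = fields[i]
        if not seen_ttl and token.isascii() and token.isdigit():
            ttl, seen_ttl = int(token), True
        elif not seen_class and token.upper() in CLASSES:
            rr_class, seen_class = token.upper(), True
        else:
            break  # first token that is neither TTL nor class is the type
        i += 1
    if i >= len(fields):
        raise ValueError("missing type")
    return ttl, rr_class, fields[i].upper(), fields[i + 1:]
```

For example, ``split_rr("IN 3600 A 192.0.2.1".split())`` and
``split_rr("3600 IN A 192.0.2.1".split())`` yield the same tuple.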
83+
84+
<domain-name>s make up a large share of the data in the master file.
85+
The labels in the domain name are expressed as character strings and
86+
separated by dots. Quoting conventions allow arbitrary characters to be
87+
stored in domain names. Domain names that end in a dot are called
88+
absolute, and are taken as complete. Domain names which do not end in a
89+
dot are called relative; the actual domain name is the concatenation of
90+
the relative part with an origin specified in a $ORIGIN, $INCLUDE, or as
91+
an argument to the master file loading routine. A relative name is an
92+
error when no origin is available.
93+
94+
<character-string> is expressed in one of two ways: as a contiguous set of
95+
characters without interior spaces, or as a string beginning with a ``"``
96+
and ending with a ``"``. Inside a ``"`` delimited string any character can
97+
occur, except for a ``"`` itself, which must be quoted using ``\``
98+
(backslash).
99+
100+
Because these files are text files several special encodings are
101+
necessary to allow arbitrary data to be loaded. In particular:
102+
103+
of the root.
104+
105+
@ A free standing @ is used to denote the current origin.
106+
107+
\X where X is any character other than a digit (0-9), is
108+
used to quote that character so that its special meaning
109+
does not apply. For example, "\." can be used to place
110+
a dot character in a label.
111+
112+
\DDD where each D is a digit is the octet corresponding to
113+
the decimal number described by DDD. The resulting
114+
octet is assumed to be text and is not checked for
115+
special meaning.
116+
117+
( ) Parentheses are used to group data that crosses a line
118+
boundary. In effect, line terminations are not
119+
recognized within parentheses.
120+
121+
; Semicolon is used to start a comment; the remainder of
122+
the line is ignored.
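
The ``\X`` and ``\DDD`` encodings listed above can be resolved with a small
loop. The Python sketch below is one possible way to do it; the function name
``unescape`` is hypothetical and the input is assumed to be ASCII text that
has already been isolated as a single token.

```python
def unescape(text):
    """Resolve backslash escapes in a token and return the raw octets."""
    data = text.encode("ascii")  # assumption: the token is plain ASCII
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != 0x5C:  # not a backslash: copy the octet as-is
            out.append(data[i])
            i += 1
            continue
        digits = data[i + 1:i + 4]
        if len(digits) == 3 and digits.isdigit():
            value = int(digits)  # \DDD is a decimal octet value
            if value > 255:
                raise ValueError("\\DDD escape out of range")
            out.append(value)
            i += 4
        elif i + 1 < len(data) and not digits[:1].isdigit():
            out.append(data[i + 1])  # \X quotes a single non-digit character
            i += 2
        else:
            raise ValueError("malformed escape sequence")
    return bytes(out)
```

Note that the result is a sequence of octets rather than text, which is why
the sketch returns ``bytes``.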
123+
124+
125+
Handling of Unknown DNS Resource Record (RR) Types
126+
--------------------------------------------------
127+
128+
The intentional extensibility in the DNS may lead to software implementations
129+
lagging behind in support. |url::rfc3597_section_5| introduces generic
130+
notations to represent unknown types, classes and the corresponding RDATA in
131+
text form.
132+
133+
.. note:: Modified text from |url::rfc3597_section_5|.
134+
135+
The type field for an unknown RR type is represented by the word ``TYPE``
136+
immediately followed by the decimal RR type code, with no intervening
137+
whitespace. In the class field, an unknown class is similarly represented
138+
as the word ``CLASS`` immediately followed by the decimal class code.
139+
140+
This convention allows types and classes to be distinguished from each other
141+
and from TTL values, allowing both <rr> forms to be unambiguously parsed.
142+
143+
[<TTL>] [<class>] <type> <RDATA>
144+
145+
[<class>] [<TTL>] <type> <RDATA>
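
A minimal Python sketch of how a single field could be classified under this
convention is shown below. The function name ``classify`` and the returned
labels are hypothetical; a real parser would additionally look up known
mnemonics.

```python
import re

# TYPE or CLASS immediately followed by a decimal code, no whitespace.
_GENERIC = re.compile(r"(TYPE|CLASS)([0-9]+)", re.IGNORECASE)


def classify(token):
    """Return a (kind, value) pair for one field of an RR entry."""
    match = _GENERIC.fullmatch(token)
    if match:
        code = int(match.group(2))
        if code > 0xFFFF:
            raise ValueError("type and class codes are 16-bit values")
        return match.group(1).lower(), code
    if token.isascii() and token.isdigit():
        return "ttl", int(token)
    # Anything else is left for a mnemonic lookup (not shown here).
    return "mnemonic", token.upper()
```

Since ``TYPE65``, ``CLASS1`` and ``3600`` all fall into different branches,
the order of the TTL and class fields does not matter to the classifier.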
146+
147+
148+
The RDATA section of an RR of unknown type is represented as a sequence of
149+
white space separated words as follows:
150+
151+
The special token ``\#`` (a backslash immediately followed by a hash
152+
sign), which identifies the RDATA as having the generic encoding
153+
defined herein rather than a traditional type-specific encoding.
154+
155+
An unsigned decimal integer specifying the RDATA length in octets.
156+
157+
Zero or more words of hexadecimal data encoding the actual RDATA field,
158+
each containing an even number of hexadecimal digits.
159+
160+
If the RDATA is of zero length, the text representation contains only the
161+
``\#`` token and the single zero representing the length.
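
The generic RDATA encoding is simple enough to decode in a few lines. The
Python sketch below assumes the RDATA has already been split into
whitespace-separated words; the function name ``parse_generic_rdata`` is
hypothetical.

```python
def parse_generic_rdata(words):
    """Decode RDATA in the generic encoding into raw octets."""
    if len(words) < 2 or words[0] != "\\#":
        raise ValueError("expected the \\# token followed by a length")
    if not (words[1].isascii() and words[1].isdigit()):
        raise ValueError("length must be an unsigned decimal integer")
    length = int(words[1])
    data = bytearray()
    for word in words[2:]:
        if len(word) % 2:
            raise ValueError("odd number of hexadecimal digits")
        data += bytes.fromhex(word)  # raises ValueError on non-hex input
    if len(data) != length:
        raise ValueError("RDATA length does not match the stated length")
    return bytes(data)
```

For instance, the words ``\# 4 c000 0201`` decode to the four octets of the
IPv4 address 192.0.2.1.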
162+
163+
Even though an RR of known type represented in the ``\#`` format is effectively
164+
treated as an unknown type for the purpose of parsing the RDATA text
165+
representation, all further processing by the server MUST treat it as a
166+
known type and take into account any applicable type-specific rules regarding
167+
compression, canonicalization, etc.
168+
169+
170+
Service Binding and Parameter Specification via the DNS
171+
-------------------------------------------------------
172+
173+
|url::rfc9460| introduces a key-value syntax to the presentation format for
174+
the ``SVCB`` and ``HTTPS`` types (initially). The addition is a major change
175+
for implementors of presentation format parsers.
176+
177+
.. note::
178+
Write (or copy) a section on the format from |url::rfc9460_section_2_1|.
179+
180+
The RFC specifies a number of initial Service Parameter Keys (SvcParamKeys).
181+
IANA maintains these and additional keys in the Service Parameter Keys
182+
(SvcParamKeys) registry in the |url::dns-svcb| category.
183+
184+
alpn and no-default-alpn
185+
^^^^^^^^^^^^^^^^^^^^^^^^
186+
187+
|url::rfc9460_section_7_1_1| specifies the ``alpn`` and ``no-default-alpn``
188+
SvcParamKeys. The ``alpn`` SvcParamKey takes a comma-separated list of
189+
Application-Layer Protocol Negotiation (ALPN) Protocol IDs (maintained
190+
by IANA in the |url::tls-extensiontype-values| category), the syntax for which
191+
is defined in |url::rfc9460_appendix_a_1|.
192+
193+
A problem arises when items in the comma-separated list may contain a ``,``
194+
(comma) or ``\`` (backslash). |url::rfc9460_section_2_1| specifies
195+
SvcParamValue to be a ``char-string`` and some implementations (incorrectly)
196+
unescape ``char-string`` during the scanner stage. Consequently, the fact that
197+
a character is ``escaped`` (``\000`` or ``\X``) is lost to the comma-separated
198+
list parser. None of the registered protocol identifiers (currently) contains
199+
a ``,`` (comma) and the specification dismisses the issue in the interest of
200+
progress.
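
The double escaping that results from unescaping the ``char-string`` first and
the comma-separated list second can be sketched in Python as follows. All
function names are hypothetical, the input is assumed to be the ASCII value
with any surrounding quotes already removed, and the sample values are
illustrative only.

```python
def unescape_char_string(text):
    """First level: resolve backslash escapes of the char-string."""
    data, out, i = text.encode("ascii"), bytearray(), 0
    while i < len(data):
        if data[i] != 0x5C:
            out.append(data[i])
            i += 1
            continue
        digits = data[i + 1:i + 4]
        if len(digits) == 3 and digits.isdigit():
            out.append(int(digits))  # append() raises ValueError above 255
            i += 4
        elif i + 1 < len(data) and not digits[:1].isdigit():
            out.append(data[i + 1])
            i += 2
        else:
            raise ValueError("malformed escape sequence")
    return bytes(out)


def split_value_list(octets):
    """Second level: split on commas, honoring backslash before ',' or backslash."""
    items, item, i = [], bytearray(), 0
    while i < len(octets):
        c = octets[i]
        if c == 0x5C:
            if octets[i + 1:i + 2] not in (b",", b"\\"):
                raise ValueError("backslash must precede ',' or a backslash")
            item.append(octets[i + 1])
            i += 2
            continue
        if c == 0x2C:  # unescaped comma ends the current item
            items.append(bytes(item))
            item = bytearray()
        else:
            item.append(c)
        i += 1
    items.append(bytes(item))
    return items


def alpn_to_wire(text):
    """Encode an alpn value as length-prefixed protocol identifiers."""
    wire = bytearray()
    for item in split_value_list(unescape_char_string(text)):
        if not 1 <= len(item) <= 255:
            raise ValueError("each protocol ID must be 1 to 255 octets")
        wire.append(len(item))
        wire += item
    return bytes(wire)
```

With this scheme a literal comma inside an identifier has to be written as
``\\,`` and a literal backslash as ``\\\\`` in the zone file, because each
level consumes one layer of escaping.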
201+
202+
|url::rfc9460_appendix_a_1| specifies ``simple-comma-separated``, for lists of
203+
items that cannot contain either of the aforementioned characters, and
204+
``comma-separated`` for lists of items that can. The specification overlooks
205+
that ``alpn``, or comma-separated lists, are encoded on the wire as a sequence
206+
of strings, each consisting of a length octet followed by a maximum of 255 data
207+
octets. A name server writing a transfer to disk in plain text can therefore
208+
not encode data using the ``simple-comma-separated`` scheme.
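
The reverse direction, writing a wire-format ``alpn`` value back to text, shows
why the escaped form is needed. The Python sketch below is an illustration
under stated assumptions: the function name ``alpn_to_text`` is hypothetical,
and the choice to always quote the output and to emit ``\DDD`` for anything
outside printable ASCII is a design decision of the sketch, not a requirement.

```python
def alpn_to_text(wire):
    """Render a wire-format alpn value as a quoted presentation string."""
    items, i = [], 0
    while i < len(wire):
        length = wire[i]
        item = wire[i + 1:i + 1 + length]
        if length == 0 or len(item) != length:
            raise ValueError("malformed alpn value")
        items.append(item)
        i += 1 + length
    # Level 1: escape backslash and comma inside each list item.
    value_list = b",".join(
        item.replace(b"\\", b"\\\\").replace(b",", b"\\,") for item in items
    )
    # Level 2: escape the result again as a char-string.
    out = []
    for octet in value_list:
        if octet in b'"\\':
            out.append("\\" + chr(octet))
        elif 0x21 <= octet <= 0x7E:
            out.append(chr(octet))
        else:
            out.append("\\%03d" % octet)  # non-printable or space
    return '"' + "".join(out) + '"'
```

An identifier containing a comma or a backslash on the wire can only be
written out this way, since the simple scheme has no means to express it.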
209+
210+
The specification contradicts itself in |url::rfc9460_section_7_1_1| by
211+
stating that presentation format parsers MAY simply disallow the ``,`` and
212+
``\`` characters in ALPN IDs instead of implementing the value-list escaping
213+
procedure by relying on the opaque key format (e.g., ``key1=\002h2``) in the
214+
event that these characters are needed. Since SvcParamValue is defined to be
215+
``char-string``, the problem persists. To implementations that unescape during
216+
the scanner stage, the escape sequence is still lost and implementations that
217+
unescape during the parser stage did not have the problem to start with.
218+
219+
|url::rfc9460| incorrectly assumes that ``char-string`` presents text.
220+
Programming languages typically classify a token as string if it is quoted,
221+
an identifier or keyword if it is a contiguous set of characters, etc.
222+
Unescaping is then typically done by the scanner because tokens can be
223+
classified during that stage. The presentation format defines basic syntax to
224+
identify tokens, but as the format is NOT context-free and intentionally
225+
extensible, the token can only be classified during the parser stage. Simply
226+
put, ``char-string`` in the presentation format cannot be unescaped during the
227+
scanner stage as the scanner does not know the type of information the
228+
``char-string`` presents. Domain names are a prime example.
229+
230+
The RR ``foo. NS bar\.`` defines ``bar\.`` as a relative domain name. The ``\``
231+
(backslash) is important because it signals that the trailing dot does not
232+
serve as a label separator.
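
The following Python sketch illustrates why the escape must survive until the
domain name is parsed. The function name ``parse_name`` and the representation
of names as lists of label octets are assumptions made for the example.

```python
def parse_name(text, origin=()):
    """Split a domain name into labels, resolving escapes per label.

    `origin` is a sequence of labels appended to relative names.
    """
    if text == ".":
        return []  # the root has no labels
    data = text.encode("ascii")  # assumption: the token is plain ASCII
    labels, label, absolute, i = [], bytearray(), False, 0
    while i < len(data):
        c = data[i]
        if c == 0x5C:  # backslash: the next character is never a separator
            digits = data[i + 1:i + 4]
            if len(digits) == 3 and digits.isdigit():
                label.append(int(digits))  # raises ValueError above 255
                i += 4
            elif i + 1 < len(data) and not digits[:1].isdigit():
                label.append(data[i + 1])
                i += 2
            else:
                raise ValueError("malformed escape sequence")
        elif c == 0x2E:  # unescaped dot: label separator
            if not label:
                raise ValueError("empty label")
            labels.append(bytes(label))
            label = bytearray()
            i += 1
            absolute = i == len(data)  # a trailing unescaped dot
        else:
            label.append(c)
            i += 1
    if label:
        labels.append(bytes(label))
    if not absolute:
        labels.extend(origin)
    return labels
```

With an origin of ``example.``, the name ``bar\.`` yields the labels ``bar.``
and ``example``, whereas ``bar.`` yields only ``bar``. A scanner that had
already unescaped the token would make the two indistinguishable.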
233+
234+
.. note::
235+
This issue has been `discussed <https://mailarchive.ietf.org/arch/msg/dnsop/SXnlsE1B8gmlDjn4HtOo1lwtqAI/>`_ on the DNSOP IETF mailing list.
236+
237+
As BIND, Knot and NSD implement double escaping, so does simdzone even though
238+
the behavior is incorrect.
239+

doc/manual/index.rst

+2-1
@@ -8,7 +8,7 @@ Fast and standards compliant DNS presentation format parser.
88
DNS resource records (RRs) can be expressed in text form using the
99
presentation format. The format is most frequently used to define a zone in
1010
master files, more commonly known as zone files, and is best considered a
11-
tabular serialization format with provisions for convenient editing.
11+
concise tabular serialization format with provisions for convenient editing.
1212

1313
The format is originally defined in RFC1035 section 5 and
1414
RFC1034 section 3.6.1, but as the DNS is intentionally extensible, the format
@@ -20,4 +20,5 @@ has been extended over time.
2020
building_sources
2121
getting_started
2222
api_reference
23+
format
2324
design_notes

doc/manual/links.rst

+44
@@ -29,3 +29,47 @@
2929
.. |url::simdjson| raw:: html
3030

3131
<a href="https://github.com/simdjson/simdjson" target="_blank">simdjson</a>
32+
33+
.. |url::dns-svcb| raw:: html
34+
35+
<a href="https://www.iana.org/assignments/dns-svcb/dns-svcb.xhtml" target="_blank">DNS Service Bindings (SVCB)</a>
36+
37+
.. |url::tls-extensiontype-values| raw:: html
38+
39+
<a href="https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml#alpn-protocol-ids" target="_blank">TLS Application-Layer Protocol Negotiation (ALPN) Protocol IDs</a>
40+
41+
.. |url::rfc1034_section_3_6_1| raw:: html
42+
43+
<a href="https://datatracker.ietf.org/doc/html/rfc1034#section-3.6.1" target="_blank">RFC1034 section 3.6.1</a>
44+
45+
.. |url::rfc1035_section_5_1| raw:: html
46+
47+
<a href="https://datatracker.ietf.org/doc/html/rfc1035#section-5.1" target="_blank">RFC1035 section 5.1</a>
48+
49+
.. |url::rfc2308_section_4| raw:: html
50+
51+
<a href="https://datatracker.ietf.org/doc/html/rfc2308#section-4" target="_blank">RFC2308 section 4</a>
52+
53+
.. |url::rfc3597_section_5| raw:: html
54+
55+
<a href="https://datatracker.ietf.org/doc/html/rfc3597#section-5" target="_blank">RFC3597 section 5</a>
56+
57+
.. |url::rfc8499_section_5| raw:: html
58+
59+
<a href="https://datatracker.ietf.org/doc/html/rfc8499#section-5" target="_blank">RFC8499 section 5</a>
60+
61+
.. |url::rfc9460| raw:: html
62+
63+
<a href="https://datatracker.ietf.org/doc/html/rfc9460" target="_blank">RFC9460</a>
64+
65+
.. |url::rfc9460_section_2_1| raw:: html
66+
67+
<a href="https://datatracker.ietf.org/doc/html/rfc9460#section-2.1" target="_blank">RFC9460 section 2.1</a>
68+
69+
.. |url::rfc9460_section_7_1_1| raw:: html
70+
71+
<a href="https://datatracker.ietf.org/doc/html/rfc9460#section-7.1.1" target="_blank">RFC9460 section 7.1.1</a>
72+
73+
.. |url::rfc9460_appendix_a_1| raw:: html
74+
75+
<a href="https://datatracker.ietf.org/doc/html/rfc9460#appendix-A.1" target="_blank">RFC9460 Appendix A.1</a>
