Commit f54bafb

Revise notes on the presentation format
Applying the default TTL to RRs that omit it is not as straightforward as one might think. Use detailed notes by @wtoorop to update notes accordingly.
1 parent a411742 commit f54bafb

2 files changed

+126
-61
lines changed

FORMAT.md

+124-61
@@ -10,10 +10,13 @@ document aims to clarify the format by listing (some of) the relevant
1010
specifications and then proceed to explain why certain design decisions were
1111
made in simdzone.
1212

13-
* [RFC1035 Section 5](https://datatracker.ietf.org/doc/html/rfc1035#section-5)
14-
* [RFC2308 Section 4](https://datatracker.ietf.org/doc/html/rfc2308#section-4)
15-
* [RFC3597 Section 5](https://datatracker.ietf.org/doc/html/rfc3597#section-5)
16-
* [draft-ietf-dnsop-svcb-https Section 2.1](https://www.ietf.org/archive/id/draft-ietf-dnsop-svcb-https-12.html#name-zone-file-presentation-form)
13+
* [RFC 1034 Section 3.6.1][rfc1034#3.6.1]
14+
* [RFC 1035 Section 5][rfc1035#5]
15+
* [RFC 2065 Section 4.5][rfc2065#4.5]
16+
* [RFC 2181 Section 8][rfc2181#8]
17+
* [RFC 2308 Section 4][rfc2308#4]
18+
* [RFC 3597 Section 5][rfc3597#5]
19+
* [RFC 9460 Section 2.1][rfc9460#2.1]
1720

1821

1922
## Clarification (work-in-progress)
@@ -22,7 +25,7 @@ made in simdzone.
2225
2326
Historically, master files were edited by hand, which is reflected in the
2427
syntax. Consider the format a tabular serialization format with provisions
25-
for easier editing. i.e. the owner, class and ttl fields may be omitted
28+
for convenient editing, i.e. the owner, class and TTL fields may be omitted
2629
(provided the line starts with \<blank\> for the owner) and $INCLUDE directives
2730
can be used for templating.
2831

@@ -31,12 +34,13 @@ may represent either a type, class or ttl and a symbolic constant, e.g. A
3134
or NS, may have a different meaning if specified as an RDATA field.
3235

3336
The DNS is intentionally extensible. The specification is not explicit about
34-
how that affects syntax, but it may explain why no specific notation for
35-
data-types is enforced. To make it easier for data-types to be added at a later
36-
stage the syntax cannot enforce a certain notation (or the scanner would need
37-
to be revised). As such, it seems logical for the scanner to only identify
38-
character strings, which can be expressed as either a contiguous set of
39-
characters without interior spaces, or as a quoted string.
37+
how that affects syntax, but it explains why no specific notation for
38+
data-types can be enforced by RFC 1035. To make it easier for data-types to
39+
be added at a later stage the syntax cannot enforce a certain notation (or
40+
the scanner would need to be revised). Consequently, the scanner only
41+
identifies items (or fields) and structural characters, which can be
42+
expressed as either a contiguous set of characters without interior spaces,
43+
or as a quoted string.
4044

4145
The format allows for including structural characters in fields by means of
4246
escaping the actual character or enclosing the field in quotes. The example
@@ -45,40 +49,35 @@ The dot is normally a label separator, replaced by the length of the label
4549
on the wire. If a domain name includes an actual ASCII dot, the character
4650
must be escaped in the textual representation (`\X` or `\DDD`).
4751

48-
Note that ASCII dot characters must be escaped whether the name is contained
49-
in a quoted section or not. The same is not true for newlines and parentheses.
52+
Note that ASCII dot characters strictly speaking do not have to be escaped
53+
in a quoted string. RFC 1035 clearly states labels in domain names are
54+
expressed as character strings. However, behavior differs across
55+
implementations, so support for quoted labels is best dropped (see below).
5056

51-
Going by the specification, integer values like the TTL may be written as
52-
a plain number, contain escape sequences (\DDD can encode an ASCII digit) or
53-
may be enclosed in quotes. However, going by common sense, writing it down as
54-
anything but a plain number only requires more space and needlessly
55-
complicates things (impacting parsing performance). The pragmatic approach is
56-
to allow escape sequences only in fields that may actually contain data that
57-
needs escaping (domain names and text strings).
57+
RFC 1035 states both \<contiguous\> and \<quoted\> are \<character-string\>.
58+
Meaning, items can be either \<contiguous\> or \<quoted\>. Whether a specific
59+
item is interpreted as a \<character-string\> depends on the type of value for
60+
that item. E.g., TTLs are decimal integers and therefore cannot be expressed
61+
as \<quoted\> as it is not a \<character-string\>. Similarly, base64
62+
sequences are encoded binary blobs, not \<character-string\>s and therefore
63+
cannot be expressed as such. Escape sequences are valid only in
64+
\<character-string\>s.
5865

59-
RFC1035 states both \<contiguous\> and \<quoted\> are \<character-string\>.
60-
However, it makes little sense to quote e.g. a TTL because it cannot contain
61-
characters that overlap with any structural characters and in practice, it
62-
really never happens. The same applies to base64 sequences, which was
63-
specifically designed to encode binary data in printable ASCII characters. To
64-
quote a field and include whitespace is more-or-less instructing the parser
65-
to not ignore it. Fields that cannot contain structural characters, i.e.
66-
anything other than domain names and text strings, MUST not be quoted.
66+
* Mnemonics are NOT character strings.
6767

68-
> BIND does not accept quoted fields for A or NS RDATA. TTL values in SOA
69-
> RDATA, base64 Signature in DNSKEY RDATA, as well as type, class and TTL
70-
> header fields all result in a syntax error too if quoted.
68+
> BIND does not accept quoted fields for A or NS RDATA. TTL values in SOA
69+
> RDATA, base64 Signature in DNSKEY RDATA, as well as type, class and TTL
70+
> header fields all result in a syntax error too if quoted.
7171
72-
73-
* Some integer fields allow for using symbolic values. e.g. the algorithm
72+
* Some integer fields allow for using mnemonics too. E.g., the algorithm
7473
field in RRSIG records.
7574

76-
* RFC1035 states: A freestanding @ denotes the current origin.
75+
* RFC 1035 states: A freestanding @ denotes the current origin.
7776
There has been discussion in which locations @ is interpreted as the origin.
7877
e.g. how should a freestanding @ be interpreted in the RDATA section of a TXT RR?
7978
Note that there is no mention of text expansion in the original text. A
8079
freestanding @ denotes the origin. As such, it stands to reason that its
81-
use is limited to locations where domain names are expected, which also
80+
use is limited to locations where domain names are expressed, which also
8281
happens to be the most practical way to implement the functionality.
8382

8483
> This also seems to be the behavior that other name servers implement (at
@@ -111,24 +110,24 @@ anything other than domain names and text strings, MUST not be quoted.
111110
112111
* The encoding is non-ASCII. Some characters have special meaning, but users
113112
are technically allowed to put in non-printable octets outside the ASCII
114-
range without custom encoding.
115-
Of course, this rarely occurs in practice and users are encouraged to use
116-
the \DDD encoding for "special".
113+
range without custom encoding. Of course, this rarely occurs in practice
114+
and users are encouraged to use the \DDD encoding for "special".
117115

118116
* Parentheses may not be nested.
119117

120118
* $ORIGIN must be an absolute domain.
121119

122-
* Escape sequences must not be unescaped in the lexer as is common with
120+
* Escape sequences must NOT be unescaped in the scanner as is common with
123121
programming languages like C that have a preprocessor. Instead, the
124-
original text is necessary in the parsing stage to distinguish between dots.
122+
original text is necessary in the parsing stage to distinguish between
123+
label separators (dots).
125124

126-
* RFC1035 specifies that the current origin should be restored after an
125+
* RFC 1035 specifies that the current origin should be restored after an
127126
$INCLUDE, but it is silent on whether the current domain name should also be
128127
restored. BIND 9 restores both of them. This could be construed as a
129128
deviation from RFC 1035, a feature, or both.
130129

131-
* RFC1035 states: and text literals can contain CRLF within the text.
130+
* RFC 1035 states: and text literals can contain CRLF within the text.
132131
BIND, however, does not allow newlines in text (escaped or not). For
133132
performance reasons, we may adopt the same behavior as that would relieve
134133
the need to keep track of possibly embedded newlines.
@@ -148,14 +147,65 @@ anything other than domain names and text strings, MUST not be quoted.
148147
> the number of includes to 10 by default (compile option). For security, it
149148
> must be possible to set a hard limit.
150149
151-
* Should quoting of domain names be supported?
152-
RFC1035: The labels in the domain name are expressed as character strings
153-
and separated by dots.
154-
RFC1035: \<character-string\> is expressed in one or two ways:
155-
as \<contiguous\> (characters without interior spaces), or as \<quoted\>.
156-
157-
However, quoted domain names are very uncommon. Implementations handle
158-
quoted names both in OWNER and RDATA very differently.
150+
* Default values for TTLs can be quite complicated.
151+
152+
A [commit to ldns](https://github.com/NLnetLabs/ldns/commit/cb101c9) by
153+
@wtoorop nicely sums it up in code.
154+
155+
RFC 1035 section 5.1:
156+
> Omitted class and TTL values are default to the last explicitly stated
157+
> values.
158+
159+
This behavior is updated by RFC 2308 section 4:
160+
> All resource records appearing after the directive, and which do not
161+
> explicitly include a TTL value, have their TTL set to the TTL given
162+
> in the $TTL directive. SIG records without a explicit TTL get their
163+
> TTL from the "original TTL" of the SIG record [RFC 2065 Section 4.5].
164+
165+
The TTL rules for `SIG` RRs stated in RFC 2065 Section 4.5:
166+
> If the original TTL, which applies to the type signed, is the same as
167+
> the TTL of the SIG RR itself, it may be omitted. The date field
168+
> which follows it is larger than the maximum possible TTL so there is
169+
> no ambiguity.
170+
171+
The same applies to `RRSIG` RRs, although not stated as explicitly
172+
in RFC 4034 Section 3:
173+
> The TTL value of an RRSIG RR MUST match the TTL value of the RRset it
174+
> covers. This is an exception to the [RFC2181] rules for TTL values
175+
> of individual RRs within a RRset: individual RRSIG RRs with the same
176+
> owner name will have different TTL values if the RRsets they cover
177+
> have different TTL values.
178+
179+
Logic spanning RRs must not be handled during deserialization. The order in
180+
which RRs appear in the zone file is not relevant and keeping a possibly
181+
infinite backlog of RRs to handle it "automatically" is inefficient. As
182+
the name server retains RRs in a database already it seems most elegant to
183+
signal the TTL value was omitted and a default was used so that it may be
184+
updated in some post processing step.
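The approach of signalling an omitted TTL can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the simdzone or ldns implementation: `TTLState`, `directive` and `resolve` are hypothetical names, and the precedence shown (a `$TTL` value first, otherwise the last explicitly stated TTL) is one reading of RFC 1035 Section 5.1 as updated by RFC 2308 Section 4. The special handling of `SIG` and `RRSIG` RRs is deliberately left to a later step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTLState:
    """Per-file TTL bookkeeping for a zone parser (illustrative only)."""

    default_ttl: Optional[int] = None  # set by a $TTL directive (RFC 2308)
    last_ttl: Optional[int] = None     # last explicitly stated TTL (RFC 1035)

    def directive(self, ttl: int) -> None:
        """Record the value of a $TTL directive."""
        self.default_ttl = ttl

    def resolve(self, explicit: Optional[int]) -> tuple[int, bool]:
        """Return (ttl, omitted) for one RR.

        The omitted flag tells the application that a default was applied,
        so it can revise the value in a post-processing step.
        """
        if explicit is not None:
            self.last_ttl = explicit
            return explicit, False
        if self.default_ttl is not None:
            return self.default_ttl, True
        if self.last_ttl is not None:
            return self.last_ttl, True
        raise ValueError("no TTL stated and no default available")
```

Returning the flag alongside the value keeps the parser free of any logic that spans RRs, which matches the reasoning above.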
185+
186+
[RFC 2181 Section 8][rfc2181#8] contains additional notes on the maximum
187+
value for TTLs. During deserialization, any value exceeding the specified
188+
maximum is considered an error in "primary" mode. The error is downgraded
189+
to a warning in "secondary" mode.
190+
191+
[RFC 2181 Section 5.2][rfc2181#5.2] states the TTLs of all RRs in an RRSet
192+
must be the same. As with default values for `SIG` and `RRSIG` RRs, this
193+
must NOT be handled during deserialization. Presumably, the application
194+
should transparently fix TTLs (NLnetLabs/nsd#178).
195+
196+
* Do NOT allow for quoted labels in domain names.
197+
[RFC 1035 Section 5][rfc1035#5] states:
198+
> The labels in the domain name are expressed as character strings and
199+
> separated by dots.
200+
201+
[RFC 1035 section 5][rfc1035#5] states:
202+
> \<character-string\> is expressed in one or two ways: as a contiguous set
203+
> of characters without interior spaces, or as string beginning with a " and
204+
> ending with a ".
205+
206+
However, quoted labels in domain names are very uncommon and implementations
207+
handle quoted names both in OWNER and RDATA very differently. The Flex+Bison
208+
based parser used in NSD before was the only parser that got it right.
159209

160210
* BIND
161211
* owner: yes, interpreted as quoted
@@ -185,17 +235,14 @@ anything other than domain names and text strings, MUST not be quoted.
185235
example.com. xxx IN NS \"quoted.example.com.\".example.com.
186236
```
187237
188-
> The text "The labels in the domain name" can be confusing as one might
189-
> interpret that as stating that each label can individually can be quoted,
190-
> that is however not the case. NSD and BIND both print a syntax error if
191-
> such a construct occurs.
192-
193238
> [libzscanner](https://github.com/CZ-NIC/knot/tree/master/src/libzscanner),
194239
> the (standalone) zone parser used by Knot seems most consistent.
195240
241+
Drop support for quoted labels or domain names for consistent behavior.
242+
196243
* Should any domain names that are not valid host names as specified by
197-
RFC1123 section 2, i.e. use characters not in the preferred naming syntax
198-
as specified by RFC1035 section 2.3.1, be accepted? RFC2181 section 11 is
244+
RFC 1123 section 2, i.e. use characters not in the preferred naming syntax
245+
as specified by RFC 1035 section 2.3.1, be accepted? RFC 2181 section 11 is
199246
very specific on this topic, but it merely states that labels may contain
200247
characters outside the set on the wire, it does not address what is, or is
201248
not, allowed in zone files.
@@ -205,9 +252,9 @@ anything other than domain names and text strings, MUST not be quoted.
205252
additionally accepts `-`, `_` and `/` according to
206253
[NOTES](https://github.com/CZ-NIC/knot/blob/master/src/libzscanner/NOTES).
207254
208-
* [RFC1123 section 2](https://datatracker.ietf.org/doc/html/rfc1123#section-2)
209-
* [RFC1035 section 2.3.1](https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.1)
210-
* [RFC2181 section 11](https://datatracker.ietf.org/doc/html/rfc2181#section-11)
255+
* [RFC 1035 Section 2.3.1][rfc1035#2.3.1]
256+
* [RFC 1123 Section 2][rfc1123#2]
257+
* [RFC 2181 Section 11][rfc2181#11]
211258
212259
* RFC1035 specifies two control directives "$INCLUDE" and "$ORIGIN". RFC2308
213260
specifies the "$TTL" directive. BIND additionally implements the "$DATE" and
@@ -236,4 +283,20 @@ anything other than domain names and text strings, MUST not be quoted.
236283
strings longer than 255 characters. Others (BIND, simdzone) will throw a
237284
syntax error.
238285
239-
* Leading zeroes in integers appear to be allowed judging by the zone file generated for the [socket10kxfr](https://github.com/NLnetLabs/nsd/blob/86a6961f2ca64f169d7beece0ed8a5e1dd1cd302/tpkg/long/socket10kxfr.tdir/socket10kxfr.pre#L64) test in NSD. BIND and Knot (and the old parser in NSD) all parsed it without problems.
286+
* Leading zeroes in integers appear to be allowed judging by the zone file
287+
generated for the [socket10kxfr][socket10kxfr.pre#L64] test in NSD. BIND
288+
and Knot parsed it without problems too.
289+
290+
[rfc1034#3.6.1]: https://datatracker.ietf.org/doc/html/rfc1034#section-3.6.1
291+
[rfc1035#5]: https://datatracker.ietf.org/doc/html/rfc1035#section-5
292+
[rfc1035#2.3.1]: https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.1
293+
[rfc1123#2]: https://datatracker.ietf.org/doc/html/rfc1123#section-2
294+
[rfc2065#4.5]: https://datatracker.ietf.org/doc/html/rfc2065#section-4.5
295+
[rfc2181#5.2]: https://datatracker.ietf.org/doc/html/rfc2181#section-5.2
296+
[rfc2181#8]: https://datatracker.ietf.org/doc/html/rfc2181#section-8
297+
[rfc2181#11]: https://datatracker.ietf.org/doc/html/rfc2181#section-11
298+
[rfc2308#4]: https://datatracker.ietf.org/doc/html/rfc2308#section-4
299+
[rfc3597#5]: https://datatracker.ietf.org/doc/html/rfc3597#section-5
300+
[rfc9460#2.1]: https://datatracker.ietf.org/doc/html/rfc9460#section-2.1
301+
302+
[socket10kxfr.pre#L64]: https://github.com/NLnetLabs/nsd/blob/86a6961f2ca64f169d7beece0ed8a5e1dd1cd302/tpkg/long/socket10kxfr.tdir/socket10kxfr.pre#L64

README.md

+2
@@ -84,3 +84,5 @@ $ cmake --build .
8484
## Contributing
8585
Contributions in any way, shape or form are very welcome! Please see
8686
[CONTRIBUTING.md](CONTRIBUTING.md) to find out how you can help.
87+
88+
Design decisions and notes on the [FORMAT](FORMAT.md).
