@@ -10,10 +10,13 @@ document aims to clarify the format by listing (some of) the relevant
10
10
specifications and then proceed to explain why certain design decisions were
11
11
made in simdzone.
12
12
13
- * [ RFC1035 Section 5] ( https://datatracker.ietf.org/doc/html/rfc1035#section-5 )
14
- * [ RFC2308 Section 4] ( https://datatracker.ietf.org/doc/html/rfc2308#section-4 )
15
- * [ RFC3597 Section 5] ( https://datatracker.ietf.org/doc/html/rfc3597#section-5 )
16
- * [ draft-ietf-dnsop-svcb-https Section 2.1] ( https://www.ietf.org/archive/id/draft-ietf-dnsop-svcb-https-12.html#name-zone-file-presentation-form )
13
+ * [RFC 1034 Section 3.6.1][rfc1034#3.6.1]
14
+ * [RFC 1035 Section 5][rfc1035#5]
15
+ * [RFC 2065 Section 4.5][rfc2065#4.5]
16
+ * [RFC 2181 Section 8][rfc2181#8]
17
+ * [RFC 2308 Section 4][rfc2308#4]
18
+ * [RFC 3597 Section 5][rfc3597#5]
19
+ * [RFC 9460 Section 2.1][rfc9460#2.1]
17
20
18
21
19
22
## Clarification (work-in-progress)
@@ -22,7 +25,7 @@ made in simdzone.
22
25
23
26
Historically, master files were edited by hand, which is reflected in the
24
27
syntax. Consider the format a tabular serialization format with provisions
25
- for easier editing. i.e. the owner, class and ttl fields may be omitted
28
+ for convenient editing, i.e. the owner, class and ttl fields may be omitted
26
29
(provided the line starts with \<blank\> for the owner) and $INCLUDE directives
27
30
can be used for templating.
28
31
@@ -31,12 +34,13 @@ may represent either a type, class or ttl and a symbolic constant, e.g. A
31
34
or NS, may have a different meaning if specified as an RDATA field.
32
35
33
36
The DNS is intentionally extensible. The specification is not explicit about
34
- how that affects syntax, but it may explain why no specific notation for
35
- data-types is enforced. To make it easier for data-types to be added at a later
36
- stage the syntax cannot enforce a certain notation (or the scanner would need
37
- to be revised). As such, it seems logical for the scanner to only identify
38
- character strings, which can be expressed as either a contiguous set of
39
- characters without interior spaces, or as a quoted string.
37
+ how that affects syntax, but it explains why no specific notation for
38
+ data-types can be enforced by RFC 1035. To make it easier for data-types to
39
+ be added at a later stage the syntax cannot enforce a certain notation (or
40
+ the scanner would need to be revised). Consequently, the scanner only
41
+ identifies items (or fields) and structural characters, which can be
42
+ expressed as either a contiguous set of characters without interior spaces,
43
+ or as a quoted string.
40
44
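The scanner behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the simdzone implementation: the function name `scan_fields` and the `(kind, text)` tuples are hypothetical, and parentheses, directives and multi-line records are left out on purpose. Escape sequences are skipped over but deliberately left in the field text.

```python
def scan_fields(line: str) -> list[tuple[str, str]]:
    """Split one line into (kind, text) fields.

    kind is "contiguous" or "quoted". Escape sequences are NOT
    unescaped here; the raw text is passed on to the parser.
    Simplification (assumption): no parentheses, no multi-line records.
    """
    fields = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch in " \t":
            i += 1
        elif ch == ";":
            # A comment runs to the end of the line.
            break
        elif ch == '"':
            j = i + 1
            while j < n and line[j] != '"':
                # Skip the character following a backslash.
                j += 2 if line[j] == "\\" else 1
            if j >= n:
                raise ValueError("unterminated quoted field")
            fields.append(("quoted", line[i + 1:j]))
            i = j + 1
        else:
            j = i
            while j < n and line[j] not in ' \t;"':
                j += 2 if line[j] == "\\" else 1
            fields.append(("contiguous", line[i:min(j, n)]))
            i = j
    return fields
```

Calling `scan_fields` on a record line yields the owner, TTL, class and type as contiguous fields and a quoted TXT string as a single quoted field, with any backslashes still in place for the parsing stage.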
41
45
The format allows for including structural characters in fields by means of
42
46
escaping the actual character or enclosing the field in quotes. The example
@@ -45,40 +49,35 @@ The dot is normally a label separator, replaced by the length of the label
45
49
on the wire. If a domain name includes an actual ASCII dot, the character
46
50
must be escaped in the textual representation (`\X` or `\DDD`).
47
51
48
- Note that ASCII dot characters must be escaped whether the name is contained
49
- in a quoted section or not. The same is not true for newlines and parentheses.
52
+ Note that ASCII dot characters strictly speaking do not have to be escaped
53
+ in a quoted string. RFC 1035 clearly states labels in domain names are
54
+ expressed as character strings. However, behavior differs across
55
+ implementations, so support for quoted labels is best dropped (see below).
50
56
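To illustrate why the raw text matters, here is a small sketch (an assumption-laden example, not simdzone code) that splits an unquoted name into labels while treating `\X` and `\DDD` as label content rather than separators. The name `split_labels` is hypothetical, input is assumed to be ASCII or Latin-1, and label and name length limits are not checked.

```python
DIGITS = "0123456789"


def split_labels(name: str) -> list[bytes]:
    """Split a presentation-format domain name into raw labels.

    An unescaped dot separates labels; an escaped dot (\\. or \\046)
    becomes part of the label. Length limits are not enforced here.
    """
    if name == ".":
        return []  # the root has no labels
    labels: list[bytes] = []
    current = bytearray()
    i, n = 0, len(name)
    while i < n:
        ch = name[i]
        if ch == "\\":
            if i + 1 >= n:
                raise ValueError("dangling backslash")
            if name[i + 1] in DIGITS:
                ddd = name[i + 1:i + 4]
                if len(ddd) != 3 or any(c not in DIGITS for c in ddd):
                    raise ValueError("\\DDD needs exactly three digits")
                value = int(ddd)
                if value > 255:
                    raise ValueError("\\DDD value out of range")
                current.append(value)
                i += 4
            else:
                current.append(ord(name[i + 1]))
                i += 2
        elif ch == ".":
            if not current:
                raise ValueError("empty label")
            labels.append(bytes(current))
            current = bytearray()
            i += 1
        else:
            current.append(ord(ch))
            i += 1
    if current:
        labels.append(bytes(current))
    return labels
```

If escape sequences had already been resolved before this step, `a\.b.example.` and `a.b.example.` would be indistinguishable.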
51
- Going by the specification, integer values like the TTL may be written as
52
- a plain number, contain escape sequences (\DDD can encode an ASCII digit) or
53
- may be enclosed in quotes. However, going by common sense, writing it down as
54
- anything but a plain number only requires more space and needlessly
55
- complicates things (impacting parsing performance). The pragmatic approach is
56
- to allow escape sequences only in fields that may actually contain data that
57
- needs escaping (domain names and text strings).
57
+ RFC 1035 states both \<contiguous\> and \<quoted\> are \<character-string\>.
58
+ Meaning, items can be either \<contiguous\> or \<quoted\>. Whether a specific
59
+ item is interpreted as a \<character-string\> depends on the type of value for
60
+ that item. E.g., TTLs are decimal integers and therefore cannot be expressed
61
+ as \<quoted\> because a TTL is not a \<character-string\>. Similarly, base64
62
+ sequences are encoded binary blobs, not \<character-string\>s, and therefore
63
+ cannot be expressed as such. Escape sequences are valid only in
64
+ \<character-string\>s.
58
65
59
- RFC1035 states both \< contiguous\> and \< quoted\> are \< character-string\> .
60
- However, it makes little sense to quote e.g. a TTL because it cannot contain
61
- characters that overlap with any structural characters and in practice, it
62
- really never happens. The same applies to base64 sequences, which was
63
- specifically designed to encode binary data in printable ASCII characters. To
64
- quote a field and include whitespace is more-or-less instructing the parser
65
- to not ignore it. Fields that cannot contain structural characters, i.e.
66
- anything other than domain names and text strings, MUST not be quoted.
66
+ * Mnemonics are NOT character strings.
67
67
68
- > BIND does not accept quoted fields for A or NS RDATA. TTL values in SOA
69
- > RDATA, base64 Signature in DNSKEY RDATA, as well as type, class and TTL
70
- > header fields all result in a syntax error too if quoted.
68
+ > BIND does not accept quoted fields for A or NS RDATA. TTL values in SOA
69
+ > RDATA, base64 Signature in DNSKEY RDATA, as well as type, class and TTL
70
+ > header fields all result in a syntax error too if quoted.
71
71
72
-
73
- * Some integer fields allow for using symbolic values. e.g. the algorithm
72
+ * Some integer fields allow for using mnemonics too. E.g., the algorithm
74
73
field in RRSIG records.
75
74
76
- * RFC1035 states: A freestanding @ denotes the current origin.
75
+ * RFC 1035 states: A freestanding @ denotes the current origin.
77
76
There has been discussion in which locations @ is interpreted as the origin.
78
77
e.g. how should a freestanding @ be interpreted in the RDATA section of a TXT RR?
79
78
Note that there is no mention of text expansion in the original text. A
80
79
freestanding @ denotes the origin. As such, it stands to reason that its
81
- use is limited to locations where domain names are expected , which also
80
+ use is limited to locations where domain names are expressed, which also
82
81
happens to be the most practical way to implement the functionality.
83
82
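A rough sketch of that interpretation, assuming the parser already knows which fields hold domain names: `@` is only replaced where a name is expected and is left alone elsewhere (e.g. in TXT RDATA). The function names and the `"name"`/`"text"` kinds are hypothetical and only serve the illustration.

```python
def is_absolute(name: str) -> bool:
    """True if the name ends in an unescaped dot."""
    if not name.endswith("."):
        return False
    body = name[:-1]
    # An odd number of backslashes before the final dot escapes it.
    trailing_backslashes = len(body) - len(body.rstrip("\\"))
    return trailing_backslashes % 2 == 0


def resolve_name(field: str, origin: str) -> str:
    """Resolve a name field against the current origin."""
    if not is_absolute(origin):
        raise ValueError("origin must be an absolute domain name")
    if field == "@":
        return origin
    if is_absolute(field):
        return field
    return field + "." if origin == "." else field + "." + origin


def resolve_field(kind: str, field: str, origin: str) -> str:
    """Only fields of kind "name" are subject to @ and origin handling."""
    return resolve_name(field, origin) if kind == "name" else field
```

With this split, `resolve_field("text", "@", origin)` returns `@` unchanged, which matches the reading that no text expansion takes place.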
84
83
> This also seems to be the behavior that other name servers implement (at
@@ -111,24 +110,24 @@ anything other than domain names and text strings, MUST not be quoted.
111
110
112
111
* The encoding is non-ASCII. Some characters have special meaning, but users
113
112
are technically allowed to put in non-printable octets outside the ASCII
114
- range without custom encoding.
115
- Of course, this rarely occurs in practice and users are encouraged to use
116
- the \DDD encoding for "special".
113
+ range without custom encoding. Of course, this rarely occurs in practice
114
+ and users are encouraged to use the \DDD encoding for "special" octets.
117
115
118
116
* Parentheses may not be nested.
119
117
120
118
* $ORIGIN must be an absolute domain.
121
119
122
- * Escape sequences must not be unescaped in the lexer as is common with
120
+ * Escape sequences must NOT be unescaped in the scanner as is common with
123
121
programming languages like C that have a preprocessor. Instead, the
124
- original text is necessary in the parsing stage to distinguish between dots.
122
+ original text is necessary in the parsing stage to distinguish between
123
+ label separators (dots).
125
124
126
- * RFC1035 specifies that the current origin should be restored after an
125
+ * RFC 1035 specifies that the current origin should be restored after an
127
126
$INCLUDE, but it is silent on whether the current domain name should also be
128
127
restored. BIND 9 restores both of them. This could be construed as a
129
128
deviation from RFC 1035, a feature, or both.
130
129
131
- * RFC1035 states: and text literals can contain CRLF within the text.
130
+ * RFC 1035 states: and text literals can contain CRLF within the text.
132
131
BIND, however, does not allow newlines in text (escaped or not). For
133
132
performance reasons, we may adopt the same behavior as that would relieve
134
133
the need to keep track of possibly embedded newlines.
@@ -148,14 +147,66 @@ anything other than domain names and text strings, MUST not be quoted.
148
147
> the number of includes to 10 by default (compile option). For security, it
149
148
> must be possible to set a hard limit.
150
149
151
- * Should quoting of domain names be supported?
152
- RFC1035: The labels in the domain name are expressed as character strings
153
- and separated by dots.
154
- RFC1035: \< character-string\> is expressed in one or two ways:
155
- as \< contiguous\> (characters without interior spaces), or as \< quoted\> .
156
-
157
- However, quoted domain names are very uncommon. Implementations handle
158
- quoted names both in OWNER and RDATA very differently.
150
+ * Default values for TTLs can be quite complicated.
151
+
152
+ A [ commit to ldns] ( https://github.com/NLnetLabs/ldns/commit/cb101c9 ) by
153
+ @wtoorop nicely sums it up in code.
154
+
155
+ RFC 1035 section 5.1:
156
+ > Omitted class and TTL values are default to the last explicitly stated
157
+ > values.
158
+
159
+ This behavior is updated by RFC 2308 section 4:
160
+ > All resource records appearing after the directive, and which do not
161
+ > explicitly include a TTL value, have their TTL set to the TTL given
162
+ > in the $TTL directive. SIG records without a explicit TTL get their
163
+ > TTL from the "original TTL" of the SIG record [ RFC 2065 Section 4.5] .
164
+
165
+ The TTL rules for `SIG` RRs are stated in RFC 2065 Section 4.5:
166
+ > If the original TTL, which applies to the type signed, is the same as
167
+ > the TTL of the SIG RR itself, it may be omitted. The date field
168
+ > which follows it is larger than the maximum possible TTL so there is
169
+ > no ambiguity.
170
+
171
+ The same applies to `RRSIG` RRs, although not stated as explicitly
172
+ in RFC 4034 Section 3:
173
+ > The TTL value of an RRSIG RR MUST match the TTL value of the RRset it
174
+ > covers. This is an exception to the [ RFC2181] rules for TTL values
175
+ > of individual RRs within a RRset: individual RRSIG RRs with the same
176
+ > owner name will have different TTL values if the RRsets they cover
177
+ > have different TTL values.
178
+
179
+ Logic spanning RRs must not be handled during deserialization. The order in
180
+ which RRs appear in the zone file is not relevant and keeping a possibly
181
+ infinite backlog of RRs to handle it "automatically" is inefficient. As
182
+ the name server retains RRs in a database already it seems most elegant to
183
+ signal the TTL value was omitted and a default was used so that it may be
184
+ updated in some post processing step.
185
+
186
+ [RFC 2181 Section 8][rfc2181#8] contains additional notes on the maximum
187
+ value for TTLs. During deserialization, any value exceeding 2147483647 is
188
+ considered an error in primary mode, or a warning in secondary mode.
189
+ [RFC 8767 Section 4][rfc8767#4] updates the text, but that update does not
190
+ change handling during deserialization.
191
+
192
+ [RFC 2181 Section 5.2][rfc2181#5.2] states the TTLs of all RRs in an RRSet
193
+ must be the same. As with default values for `SIG` and `RRSIG` RRs, this
194
+ must NOT be handled during deserialization. Presumably, the application
195
+ should transparently fix TTLs (NLnetLabs/nsd#178).
196
+
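The TTL handling described in this item could look roughly like the sketch below. It is an illustration under assumptions, not the simdzone API: `resolve_ttl` and its return shape are hypothetical, TTLs are taken to be plain decimal integers (no unit suffixes), and what value a secondary keeps after the warning is a guess.

```python
import re

MAX_TTL = 2147483647  # 2**31 - 1, see RFC 2181 Section 8
DECIMAL = re.compile(r"[0-9]+")


def resolve_ttl(field, default_ttl, *, secondary=False):
    """Return (ttl, is_default, warning) for one record.

    field is the raw TTL field, or None if it was omitted.
    is_default signals that default_ttl was used, so a later
    post-processing step may still correct it (e.g. for RRSIG RRs).
    """
    if field is None:
        return default_ttl, True, None
    # Quoted or escaped TTLs are rejected: only plain digits pass.
    if not DECIMAL.fullmatch(field):
        raise ValueError(f"invalid TTL: {field!r}")
    ttl = int(field)  # leading zeroes are accepted
    if ttl > MAX_TTL:
        message = f"TTL {ttl} exceeds {MAX_TTL}"
        if not secondary:
            raise ValueError(message)
        # Assumption: a secondary keeps the value and only warns.
        return ttl, False, message
    return ttl, False, None
```

Returning the `is_default` flag instead of fixing the value in place keeps logic that spans multiple RRs out of deserialization.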
197
+ * Do NOT allow for quoted labels in domain names.
198
+ [RFC 1035 Section 5][rfc1035#5] states:
199
+ > The labels in the domain name are expressed as character strings and
200
+ > separated by dots.
201
+
202
+ [RFC 1035 Section 5][rfc1035#5] states:
203
+ > \< character-string\> is expressed in one or two ways: as a contiguous set
204
+ > of characters without interior spaces, or as a string beginning with a " and
205
+ > ending with a ".
206
+
207
+ However, quoted labels in domain names are very uncommon and implementations
208
+ handle quoted names both in OWNER and RDATA very differently. The Flex+Bison
209
+ based parser used in NSD before was the only parser that got it right.
159
210
160
211
* BIND
161
212
* owner: yes, interpreted as quoted
@@ -185,17 +236,14 @@ anything other than domain names and text strings, MUST not be quoted.
185
236
example.com. xxx IN NS \"quoted.example.com.\".example.com.
186
237
```
187
238
188
- > The text "The labels in the domain name" can be confusing as one might
189
- > interpret that as stating that each label can individually can be quoted,
190
- > that is however not the case. NSD and BIND both print a syntax error if
191
- > such a construct occurs.
192
-
193
239
> [libzscanner](https://github.com/CZ-NIC/knot/tree/master/src/libzscanner),
194
240
> the (standalone) zone parser used by Knot seems most consistent.
195
241
242
+ Drop support for quoted labels or domain names for consistent behavior.
243
+
196
244
* Should any domain names that are not valid host names as specified by
197
- RFC1123 section 2, i.e. use characters not in the preferred naming syntax
198
- as specified by RFC1035 section 2.3.1, be accepted? RFC2181 section 11 is
245
+ RFC 1123 section 2, i.e. use characters not in the preferred naming syntax
246
+ as specified by RFC 1035 section 2.3.1, be accepted? RFC 2181 section 11 is
199
247
very specific on this topic, but it merely states that labels may contain
200
248
characters outside the set on the wire, it does not address what is, or is
201
249
not, allowed in zone files.
@@ -205,11 +253,11 @@ anything other than domain names and text strings, MUST not be quoted.
205
253
additionally accepts `-`, `_` and `/` according to
206
254
[NOTES](https://github.com/CZ-NIC/knot/blob/master/src/libzscanner/NOTES).
207
255
208
- * [RFC1123 section 2](https://datatracker.ietf.org/doc/html/rfc1123#section-2)
209
- * [RFC1035 section 2.3.1](https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.1)
210
- * [RFC2181 section 11](https://datatracker.ietf.org/doc/html/ rfc2181#section-11)
256
+ * [RFC1035 Section 2.3.1][rfc1035#2.3.1]
257
+ * [RFC1123 Section 2][rfc1123#2]
258
+ * [RFC2181 Section 11][rfc2181#11]
211
259
212
- * RFC1035 specifies two control directives "$INCLUDE" and "$ORIGIN". RFC2308
260
+ * RFC 1035 specifies two control directives "$INCLUDE" and "$ORIGIN". RFC 2308
213
261
specifies the "$TTL" directive. BIND additionally implements the "$DATE" and
214
262
"$GENERATE" directives. Since "$" (dollar sign) is not reserved, both
215
263
"$DATE" and "$GENERATE" (and "$TTL" before RFC2308) are considered valid
@@ -236,4 +284,28 @@ anything other than domain names and text strings, MUST not be quoted.
236
284
strings longer than 255 characters. Others (BIND, simdzone) will throw a
237
285
syntax error.
238
286
239
- * Leading zeroes in integers appear to be allowed judging by the zone file generated for the [socket10kxfr](https://github.com/NLnetLabs/nsd/blob/86a6961f2ca64f169d7beece0ed8a5e1dd1cd302/tpkg/long/socket10kxfr.tdir/socket10kxfr.pre#L64) test in NSD. BIND and Knot (and the old parser in NSD) all parsed it without problems.
287
+ * How do we handle the corner case where the first record does not have a TTL
288
+ when the file does not define a zone? (from @shane-kerr).
289
+
290
+ At this point in time, the application provides a default TTL value before
291
+ parsing. Whether that is the right approach is unclear, but it is what NSD
292
+ did before.
293
+
294
+ * Leading zeroes in integers appear to be allowed judging by the zone file
295
+ generated for the [socket10kxfr][socket10kxfr.pre#L64] test in NSD. BIND
296
+ and Knot parsed it without problems too.
297
+
298
+ [rfc1034#3.6.1]: https://datatracker.ietf.org/doc/html/rfc1034#section-3.6.1
299
+ [rfc1035#5]: https://datatracker.ietf.org/doc/html/rfc1035#section-5
300
+ [rfc1035#2.3.1]: https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.1
301
+ [rfc1123#2]: https://datatracker.ietf.org/doc/html/rfc1123#section-2
302
+ [rfc2065#4.5]: https://datatracker.ietf.org/doc/html/rfc2065#section-4.5
303
+ [rfc2181#5.2]: https://datatracker.ietf.org/doc/html/rfc2181#section-5.2
304
+ [rfc2181#8]: https://datatracker.ietf.org/doc/html/rfc2181#section-8
305
+ [rfc2181#11]: https://datatracker.ietf.org/doc/html/rfc2181#section-11
306
+ [rfc2308#4]: https://datatracker.ietf.org/doc/html/rfc2308#section-4
307
+ [rfc3597#5]: https://datatracker.ietf.org/doc/html/rfc3597#section-5
308
+ [rfc8767#4]: https://www.rfc-editor.org/rfc/rfc8767#section-4
309
+ [rfc9460#2.1]: https://datatracker.ietf.org/doc/html/rfc9460#section-2.1
310
+
311
+ [socket10kxfr.pre#L64]: https://github.com/NLnetLabs/nsd/blob/86a6961f2ca64f169d7beece0ed8a5e1dd1cd302/tpkg/long/socket10kxfr.tdir/socket10kxfr.pre#L64