| 1 | +.. include:: links.rst |
| 2 | + |
| 3 | +################### |
| 4 | +Presentation format |
| 5 | +################### |
| 6 | + |
| 7 | +DNS resource records (RRs) can be expressed in text form using the DNS |
| 8 | +presentation format. The format is originally defined in |
| 9 | +|url::rfc1035_section_5_1| and |url::rfc1034_section_3_6_1| and is most |
| 10 | +frequently used to define a zone in master files, more commonly known as |
| 11 | +zone files. The term "presentation format" is officially established in |
| 12 | +|url::rfc8499_section_5|. |
| 13 | + |
| 14 | +The presentation format is a concise tabular serialization format with |
| 15 | +provisions for convenient editing. The DNS is intentionally extensible and |
| 16 | +many RFCs define additional types and the typical representation for the |
| 17 | +corresponding RDATA sections. Consequently, the presentation format is not |
| 18 | +defined by one single specification, but rather many specifications. |
| 19 | + |
| 20 | +The presentation format is NOT context-free and correct interpretation of the |
| 21 | +specification(s) depends on extensive knowledge of the DNS. |
| 22 | + |
| 23 | +.. note:: |
| 24 | + This document is meant to be a concise source on interpretation of the |
| 25 | + presentation format, but is still very much a work in progress. Please |
| 26 | + consider contributing if anything is unclear or incorrect. |
| 27 | + |
| 28 | +Format |
| 29 | +====== |
| 30 | + |
| 31 | +.. note:: Modified text from |url::rfc1035_section_5_1| |
| 32 | + |
| 33 | +The presentation format defines a number of entries. Entries are predominantly |
| 34 | +line-oriented, though parentheses can be used to continue a list of items |
| 35 | +across a line boundary, and text literals can contain CRLF within the text. |
| 36 | +Any combination of tabs and spaces acts as a delimiter between the separate |
| 37 | +items that make up an entry. Any line can end with a comment. |
| 38 | +Comments start with a ``;`` (semicolon). |
| 39 | + |
| 40 | +The following entries are defined: |
| 41 | + |
| 42 | + <blank>[<comment>] |
| 43 | + |
| 44 | + $ORIGIN <domain-name> [<comment>] |
| 45 | + |
| 46 | + $INCLUDE <file-name> [<domain-name>] [<comment>] |
| 47 | + |
| 48 | + $TTL <TTL> [<comment>] |
| 49 | + |
| 50 | + <domain-name><rr> [<comment>] |
| 51 | + |
| 52 | + <blank><rr> [<comment>] |
| 53 | + |
| 54 | +Blank lines, with or without comments, are allowed anywhere in the file. |
| 55 | + |
| 56 | +Three control entries are defined: $ORIGIN, $INCLUDE and $TTL (defined in |
| 57 | +|url::rfc2308_section_4|). $ORIGIN is followed by a domain name, and resets the |
| 58 | +current origin for relative domain names to the stated name. $INCLUDE inserts |
| 59 | +the named file into the current file, and may optionally specify a domain name |
| 60 | +that sets the relative domain name origin for the included file. $INCLUDE may |
| 61 | +also have a comment. Note that a $INCLUDE entry never changes the relative |
| 62 | +origin of the parent file, regardless of changes to the relative origin made |
| 63 | +within the included file. $TTL is followed by a decimal integer, and resets |
| 64 | +the default TTL for RRs which do not explicitly include a TTL value. |
| 65 | + |
| 66 | +The last two forms represent RRs. If an entry for an RR begins with a |
| 67 | +``<blank>``, then the RR is assumed to be owned by the last stated owner. If |
| 68 | +an RR entry begins with a ``<domain-name>``, then the owner name is reset. |
| 69 | + |
| 70 | +``<rr>`` contents take one of the following forms: |
| 71 | + |
| 72 | + [<TTL>] [<class>] <type> <RDATA> |
| 73 | + |
| 74 | + [<class>] [<TTL>] <type> <RDATA> |
| 75 | + |
| 76 | +The RR begins with optional TTL and class fields, followed by a type and |
| 77 | +RDATA field appropriate to the type and class. Class and type use the |
| 78 | +standard mnemonics, TTL is a decimal integer. Omitted class and TTL |
| 79 | +values default to the last explicitly stated values. Since type and |
| 80 | +class mnemonics are disjoint, the parse is unique. (Note that this |
| 81 | +order is different from wire format order; the given order allows easier |
| 82 | +parsing and defaulting.) |
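The two ``<rr>`` forms can be told apart with a small amount of lookahead. The following is a minimal sketch, not part of any referenced implementation: the function name ``split_rr_header`` and the ``CLASSES`` subset are illustrative assumptions, and the input is assumed to be already tokenized with the owner removed.

```python
CLASSES = {"IN", "CH", "HS"}  # illustrative subset of class mnemonics (assumption)


def split_rr_header(fields, last_ttl=None, last_class=None):
    """Split the fields after the owner into (ttl, class, type, rdata).

    Accepts both orderings: [<TTL>] [<class>] and [<class>] [<TTL>].
    """
    ttl, rr_class = None, None
    index = 0
    # At most two optional fields may precede the type.
    while index < len(fields) and index < 2:
        token = fields[index]
        if ttl is None and token.isascii() and token.isdigit():
            ttl = int(token)
        elif rr_class is None and token.upper() in CLASSES:
            rr_class = token.upper()
        else:
            break
        index += 1
    if index >= len(fields):
        raise ValueError("missing type field")
    # Omitted values default to the last explicitly stated ones.
    ttl = last_ttl if ttl is None else ttl
    rr_class = last_class if rr_class is None else rr_class
    return ttl, rr_class, fields[index].upper(), fields[index + 1:]
```

Because class mnemonics, type mnemonics and decimal TTLs do not overlap, the loop never has to backtrack; the first token that is neither a TTL nor a class is taken to be the type.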
| 83 | + |
| 84 | +<domain-name>s make up a large share of the data in the master file. |
| 85 | +The labels in the domain name are expressed as character strings and |
| 86 | +separated by dots. Quoting conventions allow arbitrary characters to be |
| 87 | +stored in domain names. Domain names that end in a dot are called |
| 88 | +absolute, and are taken as complete. Domain names which do not end in a |
| 89 | +dot are called relative; the actual domain name is the concatenation of |
| 90 | +the relative part with an origin specified in a $ORIGIN, $INCLUDE, or as |
| 91 | +an argument to the master file loading routine. A relative name is an |
| 92 | +error when no origin is available. |
| 93 | + |
| 94 | +<character-string> is expressed in one of two ways: as a contiguous set of |
| 95 | +characters without interior spaces, or as a string beginning with a ``"`` |
| 96 | +and ending with a ``"``. Inside a ``"`` delimited string any character can |
| 97 | +occur, except for a ``"`` itself, which must be quoted using ``\`` |
| 98 | +(backslash). |
| 99 | + |
| 100 | +Because these files are text files several special encodings are |
| 101 | +necessary to allow arbitrary data to be loaded. In particular: |
| 102 | + |
| 103 | + of the root. |
| 104 | + |
| 105 | +``@`` A free standing ``@`` is used to denote the current origin. |
| 106 | + |
| 107 | +``\X`` where X is any character other than a digit (0-9), is |
| 108 | + used to quote that character so that its special meaning |
| 109 | + does not apply. For example, ``\.`` can be used to place |
| 110 | + a dot character in a label. |
| 111 | + |
| 112 | +``\DDD`` where each D is a digit is the octet corresponding to |
| 113 | + the decimal number described by DDD. The resulting |
| 114 | + octet is assumed to be text and is not checked for |
| 115 | + special meaning. |
| 116 | + |
| 117 | +``( )`` Parentheses are used to group data that crosses a line |
| 118 | + boundary. In effect, line terminations are not |
| 119 | + recognized within parentheses. |
| 120 | + |
| 121 | +``;`` Semicolon is used to start a comment; the remainder of |
| 122 | + the line is ignored. |
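The two backslash encodings above can be decoded as follows. This is a minimal sketch under stated assumptions: the function name ``unescape`` is hypothetical, the input is a single token whose quotes have already been removed, and unescaped characters are encoded as UTF-8.

```python
def unescape(text: str) -> bytes:
    """Decode \\X and \\DDD escapes in one presentation-format token."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        digits = text[i + 1:i + 4]
        if len(digits) == 3 and digits.isascii() and digits.isdigit():
            # \DDD: exactly three decimal digits naming one octet.
            value = int(digits)
            if value > 255:
                raise ValueError("\\DDD escape out of range")
            out.append(value)
            i += 4
        elif i + 1 < len(text) and not text[i + 1].isdigit():
            # \X: any non-digit character is taken literally.
            out += text[i + 1].encode("utf-8")
            i += 2
        else:
            raise ValueError("malformed escape sequence")
    return bytes(out)
```

The result is a byte string because ``\DDD`` can name any octet, including ones that are not valid text.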
| 123 | + |
| 124 | + |
| 125 | +Handling of Unknown DNS Resource Record (RR) Types |
| 126 | +-------------------------------------------------- |
| 127 | + |
| 128 | +The intentional extensibility in the DNS may lead to software implementations |
| 129 | +lagging behind in support. |url::rfc3597_section_5| introduces generic |
| 130 | +notations to represent unknown types, classes and the corresponding RDATA in |
| 131 | +text form. |
| 132 | + |
| 133 | +.. note:: Modified text from |url::rfc3597_section_5|. |
| 134 | + |
| 135 | +The type field for an unknown RR type is represented by the word ``TYPE`` |
| 136 | +immediately followed by the decimal RR type code, with no intervening |
| 137 | +whitespace. In the class field, an unknown class is similarly represented |
| 138 | +as the word ``CLASS`` immediately followed by the decimal class code. |
| 139 | + |
| 140 | +This convention allows types and classes to be distinguished from each other |
| 141 | +and from TTL values, allowing both <rr> forms to be unambiguously parsed. |
| 142 | + |
| 143 | + [<TTL>] [<class>] <type> <RDATA> |
| 144 | + |
| 145 | + [<class>] [<TTL>] <type> <RDATA> |
| 146 | + |
| 147 | + |
| 148 | +The RDATA section of an RR of unknown type is represented as a sequence of |
| 149 | +white space separated words as follows: |
| 150 | + |
| 151 | + The special token ``\#`` (a backslash immediately followed by a hash |
| 152 | + sign), which identifies the RDATA as having the generic encoding |
| 153 | + defined herein rather than a traditional type-specific encoding. |
| 154 | + |
| 155 | + An unsigned decimal integer specifying the RDATA length in octets. |
| 156 | + |
| 157 | + Zero or more words of hexadecimal data encoding the actual RDATA field, |
| 158 | + each containing an even number of hexadecimal digits. |
| 159 | + |
| 160 | +If the RDATA is of zero length, the text representation contains only the |
| 161 | +``\#`` token and the single zero representing the length. |
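The generic RDATA encoding is simple enough to parse in a few lines. The sketch below is illustrative only: ``parse_generic_rdata`` is a hypothetical name, and the input is assumed to be the whitespace-separated words of the RDATA section.

```python
def parse_generic_rdata(words):
    """Parse RDATA in the generic form: \\# <length> [<hex> ...]."""
    if len(words) < 2 or words[0] != "\\#":
        raise ValueError("generic RDATA must start with \\# and a length")
    if not (words[1].isascii() and words[1].isdigit()):
        raise ValueError("length must be an unsigned decimal integer")
    length = int(words[1])
    data = bytearray()
    for word in words[2:]:
        # Each word must hold an even number of hexadecimal digits.
        if len(word) % 2:
            raise ValueError("odd number of hexadecimal digits")
        data += bytes.fromhex(word)  # raises ValueError on non-hex input
    if len(data) != length:
        raise ValueError("stated length does not match the data")
    return bytes(data)
```

For example, the words ``\# 4 0A000001`` and ``\# 4 0A00 0001`` both yield the same four octets, since the grouping of hexadecimal digits into words carries no meaning.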
| 162 | + |
| 163 | +Even though an RR of known type represented in the ``\#`` format is effectively |
| 164 | +treated as an unknown type for the purpose of parsing the RDATA text |
| 165 | +representation, all further processing by the server MUST treat it as a |
| 166 | +known type and take into account any applicable type-specific rules regarding |
| 167 | +compression, canonicalization, etc. |
| 168 | + |
| 169 | + |
| 170 | +Service Binding and Parameter Specification via the DNS |
| 171 | +------------------------------------------------------- |
| 172 | + |
| 173 | +|url::rfc9460| introduces a key-value syntax to the presentation format for |
| 174 | +the ``SVCB`` and ``HTTPS`` types (initially). The addition is a major change |
| 175 | +for implementors of presentation format parsers. |
| 176 | + |
| 177 | +.. note:: |
| 178 | + Write (or copy) a section on the format from |url::rfc9460_section_2_1|. |
| 179 | + |
| 180 | +The RFC specifies a number of initial Service Parameter Keys (SvcParamKeys). |
| 181 | +IANA maintains these and additional keys in the Service Parameter Keys |
| 182 | +(SvcParamKeys) registry in the |url::dns-svcb| category. |
| 183 | + |
| 184 | +alpn and no-default-alpn |
| 185 | +^^^^^^^^^^^^^^^^^^^^^^^^ |
| 186 | + |
| 187 | +|url::rfc9460_section_7_1_1| specifies the ``alpn`` and ``no-default-alpn`` |
| 188 | +SvcParamKeys. The ``alpn`` SvcParamKey takes a comma-separated list of |
| 189 | +Application-Layer Protocol Negotiation (ALPN) Protocol IDs (maintained |
| 190 | +by IANA in the |url::tls-extensiontype-values| category), the syntax for which |
| 191 | +is defined in |url::rfc9460_appendix_a_1|. |
| 192 | + |
| 193 | +A problem arises when items in the comma-separated list may contain a ``,`` |
| 194 | +(comma) or ``\`` (backslash). |url::rfc9460_section_2_1| specifies |
| 195 | +SvcParamValue to be a ``char-string`` and some implementations (incorrectly) |
| 196 | +unescape ``char-string`` during the scanner stage. Consequently, the fact that |
| 197 | +a character is ``escaped`` (``\000`` or ``\X``) is lost to the comma-separated |
| 198 | +list parser. None of the registered protocol identifiers (currently) contains |
| 199 | +a ``,`` (comma) and the specification dismisses the issue in the interest of |
| 200 | +progress. |
| 201 | + |
| 202 | +|url::rfc9460_appendix_a_1| specifies ``simple-comma-separated``, for lists of |
| 203 | +items that cannot contain either of the aforementioned characters, and |
| 204 | +``comma-separated`` for lists of items that can. The specification overlooks |
| 205 | +that ``alpn``, like any comma-separated list, is encoded on the wire as a sequence |
| 206 | +of strings, each consisting of a length octet followed by a maximum of 255 data |
| 207 | +octets. A name server writing a transfer to disk in plain text can therefore |
| 208 | +not encode data using the ``simple-comma-separated`` scheme. |
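To make the wire encoding concrete, the following sketch builds an ``alpn`` SvcParamValue from a list of protocol identifiers. The function name ``encode_alpn`` is an assumption for illustration; the identifiers are taken to be raw octets that have already been unescaped.

```python
def encode_alpn(protocol_ids):
    """Encode ALPN protocol IDs as a sequence of length-prefixed strings."""
    wire = bytearray()
    for alpn_id in protocol_ids:
        # A single length octet limits each identifier to 1..255 octets.
        if not 1 <= len(alpn_id) <= 255:
            raise ValueError("ALPN ID must be 1 to 255 octets long")
        wire.append(len(alpn_id))
        wire += alpn_id
    return bytes(wire)
```

Nothing in this encoding prevents an identifier from containing a comma or a backslash, which is why text written back from wire data may need the escaping that ``simple-comma-separated`` cannot express.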
| 209 | + |
| 210 | +The specification contradicts itself in |url::rfc9460_section_7_1_1| by |
| 211 | +stating that presentation format parsers MAY simply disallow the ``,`` and |
| 212 | +``\`` characters in ALPN IDs instead of implementing the value-list escaping |
| 213 | +procedure by relying on the opaque key format (e.g., ``key1=\002h2``) in the |
| 214 | +event that these characters are needed. Since SvcParamValue is defined to be |
| 215 | +``char-string``, the problem persists. For implementations that unescape during |
| 216 | +the scanner stage, the escape sequence is still lost, and implementations that |
| 217 | +unescape during the parser stage never had the problem to start with. |
| 218 | + |
| 219 | +|url::rfc9460| incorrectly assumes that ``char-string`` presents text. |
| 220 | +Programming languages typically classify a token as string if it is quoted, |
| 221 | +an identifier or keyword if it is a contiguous set of characters, etc. |
| 222 | +Unescaping is then typically done by the scanner because tokens can be |
| 223 | +classified during that stage. The presentation format defines basic syntax to |
| 224 | +identify tokens, but as the format is NOT context-free and intentionally |
| 225 | +extensible, the token can only be classified during the parser stage. Simply |
| 226 | +put, ``char-string`` in the presentation format cannot be unescaped during the |
| 227 | +scanner stage as the scanner does not know the type of information the |
| 228 | +``char-string`` presents. Domain names are a prime example. |
| 229 | + |
| 230 | +The RR ``foo. NS bar\.`` defines ``bar\.`` as a relative domain name. The ``\`` |
| 231 | +(backslash) is important because it signals that the trailing dot does not |
| 232 | +serve as a label separator. |
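The effect of the escape on label separation can be sketched as follows. The function name ``split_labels`` is hypothetical, and the sketch deliberately leaves escape sequences in place so that a later stage can still see them; empty labels are not validated.

```python
def split_labels(name: str):
    """Split a domain name into labels, honoring backslash escapes.

    Returns (labels, absolute); escapes are kept verbatim in the labels.
    """
    if name == ".":
        return [], True  # the root has no labels
    labels, current = [], ""
    absolute = False
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\" and i + 1 < len(name):
            # Keep the backslash and the escaped character together.
            current += name[i:i + 2]
            i += 2
        elif ch == ".":
            labels.append(current)
            current = ""
            i += 1
            # Only an unescaped trailing dot makes the name absolute.
            absolute = i == len(name)
        else:
            current += ch
            i += 1
    if current:
        labels.append(current)
    return labels, absolute
```

With this approach ``foo.`` is absolute with the single label ``foo``, whereas ``bar\.`` stays relative with the single label ``bar\.``. Had the scanner already removed the backslash, the two cases would be indistinguishable.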
| 233 | + |
| 234 | +.. note:: |
| 235 | + This issue has been `discussed <https://mailarchive.ietf.org/arch/msg/dnsop/SXnlsE1B8gmlDjn4HtOo1lwtqAI/>`_ on the DNSOP IETF mailing list. |
| 236 | + |
| 237 | +As BIND, Knot and NSD implement double escaping, so does simdzone even though |
| 238 | +the behavior is incorrect. |
| 239 | + |