Commit eb1572c

rolandshoemaker authored and gopherbot committed
html: another shot at security doc

Be clearer about the operation of the tokenizer and the parser (and
their differences), and be explicit about the need for
re-serialization when they are being used in security contexts.

Change-Id: Ieb8f2a9d4806fb7a8849a15671667396e81c53b9
Reviewed-on: https://go-review.googlesource.com/c/net/+/484795
Auto-Submit: Roland Shoemaker <[email protected]>
Reviewed-by: Damien Neil <[email protected]>
Run-TryBot: Roland Shoemaker <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
1 parent 9001ca7 commit eb1572c

1 file changed: +14 -8 lines changed

html/doc.go

@@ -99,14 +99,20 @@ Care should be taken when parsing and interpreting HTML, whether full documents
 or fragments, within the framework of the HTML specification, especially with
 regard to untrusted inputs.
 
-This package provides both a tokenizer and a parser. Only the parser constructs
-a DOM according to the HTML specification, resolving malformed and misplaced
-tags where appropriate. The tokenizer simply tokenizes the HTML presented to it,
-and as such does not resolve issues that may exist in the processed HTML,
-producing a literal interpretation of the input.
-
-If your use case requires semantically well-formed HTML, as defined by the
-WHATWG specification, the parser should be used rather than the tokenizer.
+This package provides both a tokenizer and a parser, which implement the
+tokenization, and tokenization and tree construction stages of the WHATWG HTML
+parsing specification respectively. While the tokenizer parses and normalizes
+individual HTML tokens, only the parser constructs the DOM tree from the
+tokenized HTML, as described in the tree construction stage of the
+specification, dynamically modifying or extending the docuemnt's DOM tree.
+
+If your use case requires semantically well-formed HTML documents, as defined by
+the WHATWG specification, the parser should be used rather than the tokenizer.
+
+In security contexts, if trust decisions are being made using the tokenized or
+parsed content, the input must be re-serialized (for instance by using Render or
+Token.String) in order for those trust decisions to hold, as the process of
+tokenization or parsing may alter the content.
 */
 package html // import "golang.org/x/net/html"
 
