Commit 78f8712

Fix handling with "xml:" prefixed namespace (#208)
I found that parsing XHTML documents like the one below has failed since v3.3.3:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>XHTML Document</title>
  </head>
  <body>
    <h1>XHTML Document</h1>
    <p xml:lang="ja" lang="ja">この段落は日本語です。</p>
  </body>
</html>
```

The [XML namespace spec][spec] is a little ambiguous, but the document above is valid according to an [article W3C serves][article].

I fixed the parsing algorithm. Can you review it?

As an aside, the `<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">` style of language declaration is often used in XHTML files included in EPUB files, because the [sample EPUB files][samples] provided by IDPF, the former EPUB spec authority, use that style.

[spec]: https://www.w3.org/TR/REC-xml-names/#defaulting
[article]: https://www.w3.org/International/questions/qa-html-language-declarations#attributes
[samples]: https://github.com/IDPF/epub3-samples
1 parent 2e1cd64 commit 78f8712

2 files changed: +38 -2 lines changed

Diff for: lib/rexml/parsers/baseparser.rb

+3 -2

@@ -156,6 +156,7 @@ module Private
         default_entities.each do |term|
           DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
         end
+        XML_PREFIXED_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
       end
       private_constant :Private
 
@@ -185,7 +186,7 @@ def stream=( source )
         @tags = []
         @stack = []
         @entities = []
-        @namespaces = {}
+        @namespaces = {"xml" => Private::XML_PREFIXED_NAMESPACE}
         @namespaces_restore_stack = []
       end
 
@@ -790,7 +791,7 @@ def parse_attributes(prefixes)
             @source.match(/\s*/um, true)
             if prefix == "xmlns"
               if local_part == "xml"
-                if value != "http://www.w3.org/XML/1998/namespace"
+                if value != Private::XML_PREFIXED_NAMESPACE
                   msg = "The 'xml' prefix must not be bound to any other namespace "+
                     "(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
                   raise REXML::ParseException.new( msg, @source, self )
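
The hunks above seed the namespace table with the `xml` prefix instead of starting from an empty hash. The following is a minimal sketch of why that matters. It is a simplified model, not REXML's implementation, and it assumes the v3.3.3 failure came from looking up attribute prefixes in a table that started empty. `check_attributes`, `UndefinedPrefixError`, and `InvalidXmlBindingError` are hypothetical names introduced only for illustration.

```ruby
# Simplified, hypothetical model of attribute-prefix checking.
# This is NOT REXML's code; all names except the namespace URI are made up.
XML_PREFIXED_NAMESPACE = "http://www.w3.org/XML/1998/namespace"

class UndefinedPrefixError < StandardError; end
class InvalidXmlBindingError < StandardError; end

# Validates the attribute names of one element against the namespaces
# in scope and returns the scope that applies inside that element.
def check_attributes(attributes, namespaces)
  scope = namespaces.dup

  # First pass: record xmlns:prefix="uri" declarations made on this element.
  attributes.each do |name, value|
    prefix, local_part = name.split(":", 2)
    next unless prefix == "xmlns" && local_part
    if local_part == "xml" && value != XML_PREFIXED_NAMESPACE
      raise InvalidXmlBindingError,
            "The 'xml' prefix must not be bound to any other namespace"
    end
    scope[local_part] = value
  end

  # Second pass: every prefixed attribute must use a prefix that is in scope.
  attributes.each_key do |name|
    prefix, local_part = name.split(":", 2)
    next if local_part.nil? || prefix == "xmlns"
    unless scope.key?(prefix)
      raise UndefinedPrefixError, "Undefined prefix #{prefix} found"
    end
  end

  scope
end

html_attributes = {
  "xmlns" => "http://www.w3.org/1999/xhtml",
  "xml:lang" => "en",
  "lang" => "en",
}

# Starting from an empty table, "xml:lang" looks like an undeclared prefix.
begin
  check_attributes(html_attributes, {})
rescue UndefinedPrefixError => error
  puts "empty table: #{error.message}"
end

# Seeding the table with the predefined binding accepts the same element.
seeded = {"xml" => XML_PREFIXED_NAMESPACE}
check_attributes(html_attributes, seeded)
puts "seeded table: accepted"
```

Seeding the table keeps the lookup uniform: the `xml` prefix needs no special case in the check itself, and the existing rule that `xmlns:xml` may only name the XML namespace stays in the declaration pass.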

Diff for: test/parser/test_base_parser.rb

+35

@@ -23,5 +23,40 @@ def test_large_xml
         parser.position < xml.bytesize
       end
     end
+
+    def test_attribute_prefixed_by_xml
+      xml = <<-XML
+        <?xml version="1.0" encoding="UTF-8"?>
+        <!DOCTYPE html>
+        <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+          <head>
+            <title>XHTML Document</title>
+          </head>
+          <body>
+            <h1>XHTML Document</h1>
+            <p xml:lang="ja" lang="ja">この段落は日本語です。</p>
+          </body>
+        </html>
+      XML
+
+      parser = REXML::Parsers::BaseParser.new(xml)
+      5.times {parser.pull}
+
+      html = parser.pull
+      assert_equal([:start_element,
+                    "html",
+                    {"xmlns" => "http://www.w3.org/1999/xhtml",
+                     "xml:lang" => "en",
+                     "lang" => "en"}],
+                   html)
+
+      15.times {parser.pull}
+
+      p = parser.pull
+      assert_equal([:start_element,
+                    "p",
+                    {"xml:lang" => "ja", "lang" => "ja"}],
+                   p)
+    end
   end
 end
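
The test reaches the `html` and `p` start tags by pulling a fixed number of events first. As a rough alternative for trying the behaviour by hand, the sketch below filters events by type instead of counting them. It assumes an installed REXML release that includes this fix (v3.3.3 is expected to raise on this input), and `start_elements` is a hypothetical helper, not part of REXML.

```ruby
require "rexml/document"

xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html>
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <body>
      <p xml:lang="ja" lang="ja">この段落は日本語です。</p>
    </body>
  </html>
XML

# Hypothetical helper: pulls events until the document ends and keeps
# only the :start_element events, so no event counting is needed.
def start_elements(xml)
  parser = REXML::Parsers::BaseParser.new(xml)
  events = []
  loop do
    event = parser.pull
    break if event[0] == :end_document
    events << event if event[0] == :start_element
  end
  events
end

# Each event has the shape [:start_element, name, attributes].
start_elements(xml).each do |_type, name, attributes|
  puts "#{name}: #{attributes.inspect}"
end
```

Filtering by event type is less sensitive to how whitespace and the DOCTYPE are split into events, which is what the fixed pull counts in the test depend on.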
