
Commit 2362298

Prevent "unknown encoding: ASCII-8BIT" errors
Nokogiri raised these errors when serializing the fragment with #to_s. Instead, call the format-specific serializer (#to_xml or #to_html) directly and pass a UTF-8 encoding.
1 parent 1206877 commit 2362298


2 files changed (+12 -1 lines)


lib/rails/html/sanitizer.rb (+5 -1)

@@ -120,7 +120,7 @@ def sanitize(html, options = {})
           loofah_fragment.scrub!(:strip)
         end
 
-        loofah_fragment.to_s
+        properly_encode(loofah_fragment, encoding: 'UTF-8')
       end
 
       def sanitize_css(style_string)
@@ -136,6 +136,10 @@ def allowed_tags(options)
       def allowed_attributes(options)
         options[:attributes] || self.class.allowed_attributes
       end
+
+      def properly_encode(fragment, options)
+        fragment.xml? ? fragment.to_xml(options) : fragment.to_html(options)
+      end
     end
   end
 end
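The new `properly_encode` helper only chooses a serializer based on `#xml?` and forwards the options hash unchanged. The sketch below reproduces that dispatch in plain Ruby so it runs without Nokogiri or Loofah. `FakeFragment` is a hypothetical stand-in, not a Nokogiri or Loofah class: it just reports which serializer was called and which encoding it received.

```ruby
# Hypothetical stand-in for a parsed fragment (NOT a Nokogiri/Loofah class).
# Instead of producing real markup, each serializer echoes the encoding it
# was given, so the dispatch can be observed.
FakeFragment = Struct.new(:xml_flag, :markup) do
  def xml?
    xml_flag
  end

  def to_xml(options = {})
    "xml:#{options[:encoding]}:#{markup}"
  end

  def to_html(options = {})
    "html:#{options[:encoding]}:#{markup}"
  end
end

# Same body as the committed helper: choose the serializer by document type
# and pass the caller's options (here, the target encoding) straight through.
def properly_encode(fragment, options)
  fragment.xml? ? fragment.to_xml(options) : fragment.to_html(options)
end

html = FakeFragment.new(false, '<a>hello</a>')
xml  = FakeFragment.new(true, '<a>hello</a>')

puts properly_encode(html, encoding: 'UTF-8') # html:UTF-8:<a>hello</a>
puts properly_encode(xml, encoding: 'UTF-8')  # xml:UTF-8:<a>hello</a>
```

The caller decides the encoding; the helper itself never inspects `options`, which keeps the choice of 'UTF-8' in `sanitize`.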

test/sanitizer_test.rb (+7)

@@ -441,6 +441,13 @@ def test_x03a_legitimate
     assert_sanitized %(<a href="http&#x3A;//legit">), %(<a href="http://legit">)
   end
 
+  def test_sanitize_ascii_8bit_string
+    white_list_sanitize('<a>hello</a>'.encode('ASCII-8BIT')).tap do |sanitized|
+      assert_equal '<a>hello</a>', sanitized
+      assert_equal Encoding::UTF_8, sanitized.encoding
+    end
+  end
+
 protected
 
   def xpath_sanitize(input, options = {})
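For readers less familiar with Ruby string encodings, this plain-Ruby snippet (no Rails or Nokogiri required) shows what the new test feeds in and what its second assertion demands of the result. `force_encoding` appears here only to build the expected value for illustration. The commit itself obtains UTF-8 output by passing `encoding: 'UTF-8'` to the serializer, not by relabelling strings.

```ruby
# The test input: an ASCII-only literal re-encoded as ASCII-8BIT.
# This succeeds because every character is ASCII, so the bytes are unchanged.
input = '<a>hello</a>'.encode('ASCII-8BIT')

puts input.encoding.name                 # ASCII-8BIT, the name quoted in the error
puts input.encoding == Encoding::BINARY  # true: BINARY is an alias of ASCII-8BIT

# What the test's assertions require of the sanitizer's return value:
# the same characters, labelled UTF-8 rather than ASCII-8BIT.
# (Illustration only: relabel the bytes; the commit does not do this.)
expected = input.dup.force_encoding(Encoding::UTF_8)

puts expected.encoding == Encoding::UTF_8 # true
puts expected == '<a>hello</a>'           # true
puts input.bytes == expected.bytes        # true: only the label changed
```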
