
Commit 67d21be

Reduced regular expression processing by skipping white space first (#237)
## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.4.1/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     19.849      20.109        36.064       38.655 i/s - 100.000 times in 5.038102s 4.972864s 2.772838s 2.586981s
                 sax     30.339      30.449        52.946       54.873 i/s - 100.000 times in 3.296102s 3.284176s 1.888722s 1.822391s
                pull     34.785      34.916        65.808       65.219 i/s - 100.000 times in 2.874810s 2.863976s 1.519581s 1.533305s
              stream     34.766      34.921        61.920       63.277 i/s - 100.000 times in 2.876359s 2.863571s 1.615000s 1.580354s

Comparison:
                              dom
         after(YJIT):        38.7 i/s
        before(YJIT):        36.1 i/s - 1.07x slower
               after:        20.1 i/s - 1.92x slower
              before:        19.8 i/s - 1.95x slower

                              sax
         after(YJIT):        54.9 i/s
        before(YJIT):        52.9 i/s - 1.04x slower
               after:        30.4 i/s - 1.80x slower
              before:        30.3 i/s - 1.81x slower

                             pull
        before(YJIT):        65.8 i/s
         after(YJIT):        65.2 i/s - 1.01x slower
               after:        34.9 i/s - 1.88x slower
              before:        34.8 i/s - 1.89x slower

                           stream
         after(YJIT):        63.3 i/s
        before(YJIT):        61.9 i/s - 1.02x slower
               after:        34.9 i/s - 1.81x slower
              before:        34.8 i/s - 1.82x slower
```

- YJIT=ON : 0.99x - 1.07x faster
- YJIT=OFF : 1.00x - 1.01x faster
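The two summary bullets can be cross-checked against the `Calculating` table. The Ruby sketch below divides each "after" rate by the matching "before" rate and rounds to two decimals; `RESULTS` and `speedups` are illustrative names only, not part of REXML or benchmark-driver.

```ruby
# Iterations per second copied from the benchmark output above, stored as
# [before, after] pairs per parser API, with YJIT off and on.
RESULTS = {
  "dom"    => { off: [19.849, 20.109], yjit: [36.064, 38.655] },
  "sax"    => { off: [30.339, 30.449], yjit: [52.946, 54.873] },
  "pull"   => { off: [34.785, 34.916], yjit: [65.808, 65.219] },
  "stream" => { off: [34.766, 34.921], yjit: [61.920, 63.277] },
}.freeze

# Ratio after / before, rounded to two decimals; above 1.0 means "after" is faster.
def speedups(mode)
  RESULTS.transform_values do |runs|
    before, after = runs.fetch(mode)
    (after / before).round(2)
  end
end

p speedups(:yjit).values.minmax # => [0.99, 1.07]
p speedups(:off).values.minmax  # => [1.0, 1.01]
```

Under YJIT the `pull` benchmark comes out at 0.99 (marginally slower), which is why the lower bound of the YJIT=ON range is below 1.0.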
1 parent f63c510 commit 67d21be


2 files changed (+13 -10 lines)


lib/rexml/parsers/baseparser.rb (+8 -5)

    @@ -297,10 +297,11 @@ def pull_event
           raise REXML::ParseException.new(message, @source)
         end
         name = parse_name(base_error_message)
    -    if @source.match?(/\s*\[/um, true)
    +    @source.match?(/\s*/um, true) # skip spaces
    +    if @source.match?("[", true)
           id = [nil, nil, nil]
           @document_status = :in_doctype
    -    elsif @source.match?(/\s*>/um, true)
    +    elsif @source.match?(">", true)
           id = [nil, nil, nil]
           @document_status = :after_doctype
           @source.ensure_buffer

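This hunk replaces two regular expressions that each begin with `\s*` by a single white-space skip followed by one-character string matches, so a failed `[` test no longer leaves the `>` test to rescan the same spaces. The sketch below illustrates the idea with Ruby's standard `strscan` library as a stand-in for the parser's source object; `doctype_state_before`, `doctype_state_after` and the `:other` fallback are hypothetical names, and this is not REXML's own code.

```ruby
require "strscan"

# Before: each branch carries its own leading \s*, so the same white space
# can be scanned again for every alternative that is tried.
def doctype_state_before(scanner)
  if scanner.skip(/\s*\[/)
    :in_doctype
  elsif scanner.skip(/\s*>/)
    :after_doctype
  else
    :other
  end
end

# After: consume white space once, then compare plain one-character strings.
def doctype_state_after(scanner)
  scanner.skip(/\s*/) # always matches, possibly zero characters
  if scanner.skip("[")
    :in_doctype
  elsif scanner.skip(">")
    :after_doctype
  else
    :other
  end
end

[" [", "\n  >", "[", " SYSTEM>"].each do |input|
  before = doctype_state_before(StringScanner.new(input))
  after = doctype_state_after(StringScanner.new(input))
  puts "#{input.inspect}: #{before} / #{after}"
end
```

Both versions choose the same branch for each input; they differ only in how much work is spent re-reading leading white space.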
    @@ -312,9 +313,10 @@ def pull_event
           # For backward compatibility
           id[1], id[2] = id[2], nil
         end
    -    if @source.match?(/\s*\[/um, true)
    +    @source.match?(/\s*/um, true) # skip spaces
    +    if @source.match?("[", true)
           @document_status = :in_doctype
    -    elsif @source.match?(/\s*>/um, true)
    +    elsif @source.match?(">", true)
           @document_status = :after_doctype
           @source.ensure_buffer
         else

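In these hunks the return value of the `# skip spaces` call is discarded. That is safe because `\s*` can match the empty string, so the call succeeds whether or not any white space is present; only its side effect of advancing past the spaces matters. A minimal sketch, assuming a source object whose `match?(pattern, cons)` tests a pattern at the current position and consumes the match when `cons` is true; the `MiniSource` class is hypothetical and only loosely modeled on the calls in the diff.

```ruby
require "strscan"

# Hypothetical stand-in for the parser's source object.
class MiniSource
  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  # True when pattern matches at the current position.
  # The matched text is consumed only when cons is true.
  def match?(pattern, cons = false)
    if cons
      !@scanner.skip(pattern).nil?
    else
      !@scanner.match?(pattern).nil?
    end
  end

  # Text not yet consumed.
  def rest
    @scanner.rest
  end
end

source = MiniSource.new("[<!ELEMENT r EMPTY>]>")
p source.match?(/\s*/um, true) # => true, even though nothing was consumed
p source.match?("[", true)     # => true
p source.rest                  # => "<!ELEMENT r EMPTY>]>"
```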
    @@ -409,7 +411,8 @@ def pull_event
         id = parse_id(base_error_message,
                       accept_external_id: true,
                       accept_public_id: true)
    -    unless @source.match?(/\s*>/um, true)
    +    @source.match?(/\s*/um, true) # skip spaces
    +    unless @source.match?(">", true)
           message = "#{base_error_message}: garbage before end >"
           raise REXML::ParseException.new(message, @source)
         end

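The third hunk applies the same reordering to the check that follows the external ID: optional white space is consumed first, then the closing `>` is matched as a plain string, and anything else raises. A minimal sketch of that control flow, with `SketchParseError` standing in for `REXML::ParseException` and `expect_doctype_end` as a hypothetical helper name:

```ruby
require "strscan"

# Hypothetical stand-in for REXML::ParseException.
class SketchParseError < StandardError; end

# Skips optional white space, then requires ">" to close the declaration.
# Returns the scanner position just after ">" on success.
def expect_doctype_end(scanner, base_error_message)
  scanner.skip(/\s*/) # skip spaces
  unless scanner.skip(">")
    raise SketchParseError, "#{base_error_message}: garbage before end >"
  end
  scanner.pos
end

p expect_doctype_end(StringScanner.new("  > <r/>"), "Malformed DOCTYPE") # => 3

begin
  expect_doctype_end(StringScanner.new(" x>"), "Malformed DOCTYPE")
rescue SketchParseError => e
  puts e.message # prints "Malformed DOCTYPE: garbage before end >"
end
```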

test/parse/test_document_type_declaration.rb (+5 -5)

    @@ -153,7 +153,7 @@ def test_no_literal
     Line: 3
     Position: 26
     Last 80 unconsumed characters:
    - SYSTEM> <r/>
    +SYSTEM> <r/>
         DETAIL
       end
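The updated expectation follows from where white space is now consumed. With the old regular expressions, a failed `/\s*\[/` or `/\s*>/` match consumed nothing, so the unconsumed text still began with the space after the DOCTYPE name; the new code skips that space unconditionally before testing for `[` or `>`. A sketch with Ruby's `strscan`, assuming for illustration that the text remaining after the name is ` SYSTEM> <r/>` with a single leading space; `leftover_before` and `leftover_after` are hypothetical names.

```ruby
require "strscan"

# Assumed remaining input, shaped like the "unconsumed characters" in the test.
REMAINING = " SYSTEM> <r/>"

# Old behavior: each alternative includes \s*, and neither matches here,
# so the scanner does not move.
def leftover_before(text)
  scanner = StringScanner.new(text)
  scanner.skip(/\s*\[/) || scanner.skip(/\s*>/)
  scanner.rest
end

# New behavior: white space is skipped first, even if no "[" or ">" follows.
def leftover_after(text)
  scanner = StringScanner.new(text)
  scanner.skip(/\s*/)
  scanner.skip("[") || scanner.skip(">")
  scanner.rest
end

p leftover_before(REMAINING) # => " SYSTEM> <r/>"
p leftover_after(REMAINING)  # => "SYSTEM> <r/>"
```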

    @@ -200,7 +200,7 @@ def test_content_double_quote
     Line: 3
     Position: 62
     Last 80 unconsumed characters:
    - PUBLIC 'double quote " is invalid' "r.dtd"> <r/>
    +PUBLIC 'double quote " is invalid' "r.dtd"> <r/>
         DETAIL
       end

    @@ -228,10 +228,10 @@ def test_garbage_after_literal
         end
         assert_equal(<<-DETAIL.chomp, exception.to_s)
     Malformed DOCTYPE: garbage after external ID
    -Line: 3
    -Position: 65
    +Line: 1
    +Position: 58
     Last 80 unconsumed characters:
    -x'> <r/>
    +x'>
         DETAIL
       end
