Commit 1ec9be1

naitoh and kou committed
Improve CDATA parse performance
## Why?

GitHub: fix #243

## Benchmark (Comparison with rexml 3.4.1)

```
$ benchmark-driver benchmark/parse_cdata.yaml
Calculating -------------------------------------
                     rexml 3.4.1      master  3.4.1(YJIT)  master(YJIT)
                 dom     648.361      1.178k      591.590        1.046k i/s -     100.000 times in 0.154235s 0.084913s 0.169036s 0.095627s
                 sax     699.061      1.378k      651.148        1.196k i/s -     100.000 times in 0.143049s 0.072549s 0.153575s 0.083611s
                pull     699.271      1.379k      660.275        1.210k i/s -     100.000 times in 0.143006s 0.072527s 0.151452s 0.082622s
              stream     701.725      1.383k      659.483        1.228k i/s -     100.000 times in 0.142506s 0.072307s 0.151634s 0.081455s

Comparison:
                              dom
              master:      1177.7 i/s
        master(YJIT):      1045.7 i/s - 1.13x  slower
         rexml 3.4.1:       648.4 i/s - 1.82x  slower
         3.4.1(YJIT):       591.6 i/s - 1.99x  slower

                              sax
              master:      1378.4 i/s
        master(YJIT):      1196.0 i/s - 1.15x  slower
         rexml 3.4.1:       699.1 i/s - 1.97x  slower
         3.4.1(YJIT):       651.1 i/s - 2.12x  slower

                             pull
              master:      1378.8 i/s
        master(YJIT):      1210.3 i/s - 1.14x  slower
         rexml 3.4.1:       699.3 i/s - 1.97x  slower
         3.4.1(YJIT):       660.3 i/s - 2.09x  slower

                           stream
              master:      1383.0 i/s
        master(YJIT):      1227.7 i/s - 1.13x  slower
         rexml 3.4.1:       701.7 i/s - 1.97x  slower
         3.4.1(YJIT):       659.5 i/s - 2.10x  slower
```

- YJIT=ON : 1.76x - 1.83x faster
- YJIT=OFF : 1.82x - 1.97x faster

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
1 parent b97e454 commit 1ec9be1

4 files changed: +77 -5 lines changed

benchmark/parse_cdata.yaml

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+loop_count: 100
+contexts:
+  - gems:
+      rexml: 3.2.6
+    require: false
+    prelude: require 'rexml'
+  - name: master
+    prelude: |
+      $LOAD_PATH.unshift(File.expand_path("lib"))
+      require 'rexml'
+  - name: 3.2.6(YJIT)
+    gems:
+      rexml: 3.2.6
+    require: false
+    prelude: |
+      require 'rexml'
+      RubyVM::YJIT.enable
+  - name: master(YJIT)
+    prelude: |
+      $LOAD_PATH.unshift(File.expand_path("lib"))
+      require 'rexml'
+      RubyVM::YJIT.enable
+
+prelude: |
+  require 'rexml/document'
+  require 'rexml/parsers/sax2parser'
+  require 'rexml/parsers/pullparser'
+  require 'rexml/parsers/streamparser'
+  require 'rexml/streamlistener'
+
+  def build_xml(size)
+    xml = "<?xml version=\"1.0\"?>\n" +
+          "<root>Test</root>\n" +
+          "<![CDATA[" + "a" * size + "]]>\n"
+  end
+  xml = build_xml(100000)
+
+  class Listener
+    include REXML::StreamListener
+  end
+
+benchmark:
+  'dom'    : REXML::Document.new(xml)
+  'sax'    : REXML::Parsers::SAX2Parser.new(xml).parse
+  'pull'   : |
+    parser = REXML::Parsers::PullParser.new(xml)
+    while parser.has_next?
+      parser.pull
+    end
+  'stream' : REXML::Parsers::StreamParser.new(xml, Listener.new).parse
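
For readers without benchmark-driver installed, the `dom` case can be approximated in plain Ruby. This is only a rough sketch, not the benchmark used for the numbers in the commit message: `measure_dom` is a hypothetical helper, the loop count is reduced to 10, and (as an assumption made here for well-formedness) the CDATA section is placed inside `<root>` rather than after it.

```ruby
require "rexml/document"

# Assumption: the CDATA section sits inside <root>, unlike the YAML prelude,
# so the input is well-formed regardless of the installed REXML version.
def build_xml(size)
  "<?xml version=\"1.0\"?>\n" \
  "<root><![CDATA[" + "a" * size + "]]></root>\n"
end

# Hypothetical helper: parses the document loop_count times with the DOM
# parser and returns the observed iterations per second.
def measure_dom(xml, loop_count)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop_count.times { REXML::Document.new(xml) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  loop_count / elapsed
end

xml = build_xml(100_000)
ips = measure_dom(xml, 10)
puts format("dom: %.1f i/s", ips)
```

The absolute figure depends on the machine and the installed rexml version, so it is only meaningful when comparing two versions on the same host.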

lib/rexml/parsers/baseparser.rb

Lines changed: 7 additions & 3 deletions
@@ -471,9 +471,13 @@ def pull_event
               end
 
               return [ :comment, md[1] ]
-            else
-              md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true)
-              return [ :cdata, md[1] ] if md
+            elsif @source.match?("[CDATA[", true)
+              text = @source.read_until("]]>")
+              if text.chomp!("]]>")
+                return [ :cdata, text ]
+              else
+                raise REXML::ParseException.new("Malformed CDATA: Missing end ']]>'", @source)
+              end
             end
             raise REXML::ParseException.new( "Declarations can only occur "+
               "in the doctype declaration.", @source)

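The baseparser.rb change replaces a single lazy regex (`(.*?)\]\]>`) with a prefix check followed by a search for the literal terminator. A minimal sketch of that idea using only `StringScanner` from the standard library follows. `read_cdata` is a hypothetical helper, not REXML's API, and it differs from the patch in one respect: `scan_until` returns `nil` when the terminator is missing, whereas the patch detects the missing terminator through `chomp!` returning `nil`.

```ruby
require "strscan"

# Hypothetical helper for illustration only; REXML itself goes through
# its Source class (match? and read_until) rather than a bare StringScanner.
def read_cdata(scanner)
  # Step 1: consume the literal opening marker, or report "not a CDATA section".
  return nil unless scanner.skip(/<!\[CDATA\[/)

  # Step 2: scan forward to the terminator; the result includes "]]>".
  text = scanner.scan_until(/\]\]>/)
  raise ArgumentError, "Malformed CDATA: Missing end ']]>'" if text.nil?

  # Step 3: strip the terminator so only the section body remains.
  text.chomp!("]]>")
  text
end

scanner = StringScanner.new("<![CDATA[a > b]]><next/>")
puts read_cdata(scanner)
puts scanner.rest
```

Searching for a fixed terminator avoids capturing the body through a lazy quantifier, which is the part the commit message's benchmark targets.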
lib/rexml/source.rb

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ class Source
     module Private
       SCANNER_RESET_SIZE = 100000
       PRE_DEFINED_TERM_PATTERNS = {}
-      pre_defined_terms = ["'", '"', "<"]
+      pre_defined_terms = ["'", '"', "<", "]]>"]
       if StringScanner::Version < "3.1.1"
         pre_defined_terms.each do |term|
           PRE_DEFINED_TERM_PATTERNS[term] = /#{Regexp.escape(term)}/
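
Adding `"]]>"` to `pre_defined_terms` means the pattern for the CDATA terminator is built once at load time instead of on every read. The sketch below illustrates that caching idea for the regex branch visible in the diff. `TERM_PATTERNS` and `pattern_for` are hypothetical names chosen for this example, and the fallback for unknown terms is an assumption, not taken from the source.

```ruby
# Hypothetical stand-in for the PRE_DEFINED_TERM_PATTERNS table in the diff.
TERM_PATTERNS = {}
["'", '"', "<", "]]>"].each do |term|
  # Regexp.escape makes "]]>" match literally rather than as regex syntax.
  TERM_PATTERNS[term] = /#{Regexp.escape(term)}/
end
TERM_PATTERNS.freeze

# Assumed lookup: reuse the cached pattern, or build one on demand
# for a terminator that was not registered up front.
def pattern_for(term)
  TERM_PATTERNS[term] || /#{Regexp.escape(term)}/
end

cached = pattern_for("]]>")
puts cached.equal?(pattern_for("]]>"))  # same Regexp object on every lookup
puts "a]]>b".index(cached)
```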

test/parse/test_cdata.rb

Lines changed: 19 additions & 1 deletion
@@ -7,10 +7,28 @@ module REXMLTests
   class TestParseCData < Test::Unit::TestCase
     include Test::Unit::CoreAssertions
 
+    def parse(xml)
+      REXML::Document.new(xml)
+    end
+
     def test_linear_performance_gt
       seq = [10000, 50000, 100000, 150000, 200000]
       assert_linear_performance(seq, rehearsal: 10) do |n|
-        REXML::Document.new('<description><![CDATA[ ' + ">" * n + ' ]]></description>')
+        parse('<description><![CDATA[ ' + ">" * n + ' ]]></description>')
+      end
+    end
+
+    class TestInvalid < self
+      def test_unclosed_cdata
+        exception = assert_raise(REXML::ParseException) do
+          parse("<root><![CDATA[a]></root>")
+        end
+        assert_equal(<<~DETAIL, exception.to_s)
+          Malformed CDATA: Missing end ']]>'
+          Line: 1
+          Position: 25
+          Last 80 unconsumed characters:
+        DETAIL
       end
     end
   end
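
Outside the test suite, the two cases the tests cover can be tried directly against the public DOM API. This is a small usage sketch, assuming the rexml gem is installed. It deliberately prints only the exception class, because the exact message ("Malformed CDATA: Missing end ']]>'") is introduced by this commit and older releases may word it differently.

```ruby
require "rexml/document"

# Well-formed input: the CDATA body comes back verbatim, ">" included.
doc = REXML::Document.new("<root><![CDATA[a > b]]></root>")
cdata_text = doc.root.cdatas.first.value

# Malformed input, the same string the new test uses: "]>" instead of "]]>".
error = begin
  REXML::Document.new("<root><![CDATA[a]></root>")
  nil
rescue REXML::ParseException => e
  e
end

puts cdata_text
puts error.class
```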
