Skip to content

Slow store_xml performance due to rexml performing one "require" on per event #83

Closed
@jsvd

Description

@jsvd

After observing a quite poor ~1000 eps performance in a test I noticed the thread dump points to the plugin spending a significant time trying to require a library again, and again:

--------------------------------------------------------------------------------
18.46 % of cpu usage, state: blocked, thread name: '[main]>worker9', thread id: 62 
	app//org.jruby.runtime.load.LibrarySearcher.findLibraryForRequire(LibrarySearcher.java:82)
	app//org.jruby.runtime.load.LoadService.searchForRequire(LoadService.java:744)
	app//org.jruby.runtime.load.LoadService.smartLoadInternal(LoadService.java:550)
	app//org.jruby.runtime.load.LoadService.require(LoadService.java:423)
	app//org.jruby.RubyKernel.requireCommon(RubyKernel.java:1191)
	app//org.jruby.RubyKernel.require(RubyKernel.java:1184)
	app//org.jruby.RubyKernel$INVOKER$s$1$0$require.call(RubyKernel$INVOKER$s$1$0$require.gen)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.source.invokeOther6:require(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/source.rb:38)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.source.RUBY$method$create_from$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/source.rb:38)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.parsers.baseparser.invokeOther0:create_from(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/parsers/baseparser.rb:183)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.parsers.baseparser.RUBY$method$stream=$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/parsers/baseparser.rb:183)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.parsers.baseparser.invokeOther0:stream=(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/parsers/baseparser.rb:164)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.parsers.baseparser.RUBY$method$initialize$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/parsers/baseparser.rb:164)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.parsers.treeparser.RUBY$method$initialize$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/parsers/treeparser.rb:10)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.document.RUBY$method$build$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/document.rb:452)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.rexml_minus_3_dot_3_dot_9.lib.rexml.document.RUBY$method$initialize$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/rexml-3.3.9/lib/rexml/document.rb:103)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.xml_minus_simple_minus_1_dot_1_dot_9.lib.xmlsimple.RUBY$method$parse$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/xml-simple-1.1.9/lib/xmlsimple.rb:979)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.xml_minus_simple_minus_1_dot_1_dot_9.lib.xmlsimple.RUBY$method$xml_in$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/xml-simple-1.1.9/lib/xmlsimple.rb:164)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.xml_minus_simple_minus_1_dot_1_dot_9.lib.xmlsimple.RUBY$method$xml_in$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/xml-simple-1.1.9/lib/xmlsimple.rb:203)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.vendor.bundle.jruby.$3_dot_1_dot_0.gems.logstash_minus_filter_minus_xml_minus_4_dot_2_dot_0.lib.logstash.filters.xml.RUBY$method$filter$0(/private/tmp/meh/logstash-8.16.0/vendor/bundle/jruby/3.1.0/gems/logstash-filter-xml-4.2.0/lib/logstash/filters/xml.rb:195)
	private.tmp.meh.logstash_minus_8_dot_16_dot_0.logstash_minus_core.lib.logstash.filters.base.RUBY$method$do_filter$0(/private/tmp/meh/logstash-8.16.0/logstash-core/lib/logstash/filters/base.rb:158)

This is due to xml-simple's XmlSimple.xml_in calling rexml filter's SourceFactory::create_from per event, code:

    def SourceFactory::create_from(arg)
      if arg.respond_to? :read and
          arg.respond_to? :readline and
          arg.respond_to? :nil? and
          arg.respond_to? :eof?
        IOSource.new(arg)
      elsif arg.respond_to? :to_str
        require 'stringio'
        IOSource.new(StringIO.new(arg))
...

Since JRuby doesn't have a GIL, running this code in multiple threads will cause the require 'string' to hit a synchronized code block in JRuby.

A potential fix is to monkeypatch SourceFactory::create_from to not require stringio.

In a local experiment, not having that require causes a performance improvement of up to 10x.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions