Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Event#from_json method #4631

Merged
merged 1 commit into from
Feb 9, 2016

Conversation

colinsurprenant
Copy link
Contributor

will fix #4595

This introduces Event#from_json which takes as argument a json string and returns a new Event object. The idea here is to update the json_lines codec to use this method to avoid double type conversion by first parsing the json to a Ruby structure then passing that structure to the Java Event and go through unnecessary type conversions. the POC has shown a ~50% performance increase using this.

Note that this PR adds specs for the input corner cases behaviour of the current deserialization behaviour to make sure from_json behaves the same way.

@colinsurprenant
Copy link
Contributor Author

I will push a json_lines codec PR for this shortly to show the usage in a backward compatible way.

@colinsurprenant
Copy link
Contributor Author

the build fail is because of the ruby-maven problem I believe, fixed in #4630. I will merge and rebase here to see.

@andrewvc
Copy link
Contributor

andrewvc commented Feb 4, 2016

@colinsurprenant isn't this a bit strange? Shouldn't we be just rewriting JSON Lines to be a java plugin?

@andrewvc
Copy link
Contributor

andrewvc commented Feb 4, 2016

Just to clarify, we've basically moved one codec into the Event class, isn't this not a pattern we want to encourage?

@colinsurprenant
Copy link
Contributor Author

@andrewvc well, ultimately yes, having the codec in Java will be awesome, until then, this is the best option I believe. In any case, I think that if we have a to_json method, it does somehow make sense to also have a from_json? Since json ingestion support is such an important use-case to support, I think this will provide good value by bringing the performance improvement of today's Java Event implementation to to json ingestion use-case. WDYT?

@colinsurprenant
Copy link
Contributor Author

rebased against master, let's see if it solves the ruby-maven bobo

@colinsurprenant
Copy link
Contributor Author

@andrewvc also, this does not moves the codec into the event class, it moves the equivalent of LogStash::Json.load in Event
The codec still does the line tokenization etc

@colinsurprenant
Copy link
Contributor Author

now the error is

rspec ./logstash-core/spec/logstash/runner_spec.rb:36 # LogStash::Runner argument parsing with no arguments should show help

which seems unrelated to these commits 😕

@colinsurprenant
Copy link
Contributor Author

Ok, I am able to reproduce the build failure when on Java 7 and running

$ SPEC_OPTS="--seed 61716" rake test:core

IMO it is not related to this PR. Will create a separate issue.

The complete error is

Failures:

  1) LogStash::Runner argument parsing with no arguments should show help
     Failure/Error: subject.run(args)
       (#<Cabin::Channel:0x15462876>).warn("You may be interested in the '--configtest' flag which you can use to validate\nlogstash's configuration before you choose to restart a running system.")
           expected: 1 time with any arguments
           received: 2 times with arguments: ("You may be interested in the '--configtest' flag which you can use to validate\nlogstash's configuration before you choose to restart a running system.")
     # ./logstash-core/lib/logstash/runner.rb:164:in `execute'
     # ./vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in `run'
     # ./logstash-core/spec/logstash/runner_spec.rb:40:in `(root)'
     # ./vendor/bundle/jruby/1.9/gems/rspec-wait-0.0.8/lib/rspec/wait.rb:46:in `(root)'
     # ./rakelib/test.rake:42:in `(root)'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/task.rb:240:in `execute'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/task.rb:235:in `execute'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/task.rb:179:in `invoke_with_call_chain'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/task.rb:172:in `invoke_with_call_chain'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/task.rb:165:in `invoke'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/application.rb:150:in `invoke_task'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/application.rb:106:in `top_level'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/application.rb:106:in `top_level'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/application.rb:115:in `run_with_threads'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/application.rb:100:in `top_level'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/application.rb:78:in `run'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/application.rb:176:in `standard_exception_handling'
     # ./vendor/bundle/jruby/1.9/gems/rake-10.4.2/lib/rake/application.rb:75:in `run'

Finished in 13.78 seconds (files took 3.14 seconds to load)
1202 examples, 1 failure

Failed examples:

rspec ./logstash-core/spec/logstash/runner_spec.rb:36 # LogStash::Runner argument parsing with no arguments should show help

Randomized with seed 61716

@andrewvc
Copy link
Contributor

andrewvc commented Feb 4, 2016

@colinsurprenant makes sense for now! So long as we don't add any more from/to methods I'm 👍

@colinsurprenant
Copy link
Contributor Author

ok, cool. who's next? 😬

try {
return RubyString.newString(context.runtime, event.toJson());
} catch (Exception e) {
throw new RaiseException(context.runtime, context.runtime.getModule("LogStash").defineOrGetModuleUnder("Json").getClass("GeneratorError"), e.getMessage(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this usage of defineOrGetModuleUnder is slightly confusing.

If you end up defining it, then wouldn't the proceeding getClass return null? since nothing is using it after that fact, I suppose it is fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is and AFAIK this is the only way to get a reference of an inner module. maybe this could be established earlier on in the Library load so that if this namespace does not exist it will crash on startup and not at exception time. thanks, good observation.

@talevy
Copy link
Contributor

talevy commented Feb 4, 2016

in general, what is our stance on testing for our java-core + ruby integration? It looks like you chose to not introduce java tests in favor of doing ruby testing, should we stick to doing both? or is that only important once more of core becomes java. or did I miss the java tests somewhere?

@colinsurprenant
Copy link
Contributor Author

@tal good point. in this case I did not include specific java tests and probably I should. there is a potential redundancy issue but I think it's fair that some tests overlaps but both are required since they both define different interfaces.

@talevy
Copy link
Contributor

talevy commented Feb 5, 2016

I don't view it as redundant. Since we going to be using direct java more and more, I view it as future-proofing :)

@colinsurprenant
Copy link
Contributor Author

@talevy fair enough, I agree.

@colinsurprenant
Copy link
Contributor Author

@talevy I think I covered what we discussed, initializing the error classes at load time and added Java Event#fromJson tests.

@suyograo
Copy link
Contributor

suyograo commented Feb 5, 2016

heh, tests pass now on Java 7 http://build-eu-00.elastic.co/job/Logstash_PR/2170/

@tal
Copy link

tal commented Feb 5, 2016

Hi

@talevy
Copy link
Contributor

talevy commented Feb 5, 2016

hi @tal, meet @talevy

@suyograo
Copy link
Contributor

suyograo commented Feb 5, 2016

We need a @tal vs @talevy PR showdown

@talevy
Copy link
Contributor

talevy commented Feb 5, 2016

I'm fine with that, we can assign all my tickets to the real @tal.

@colinsurprenant
Copy link
Contributor Author

lol - sorry to have pulled you in @tal and thanks for stopping by and say Hi 😬

@colinsurprenant
Copy link
Contributor Author

the related PR for the json_lines codec is at logstash-plugins/logstash-codec-json_lines#19

@guyboertje
Copy link
Contributor

@colinsurprenant - are you doing the json codec too?

@guyboertje
Copy link
Contributor

@colinsurprenant - I am presuming that the Java tests are only done in your IDE or is there some rspec magic that runs the Java tests too?

@colinsurprenant
Copy link
Contributor Author

@guyboertje gradle build includes running tests, alternatively gradle test will only compile and run tests. If you don't have gradle 2.8+ installed locally you can use the provided wrapper gradlew.

@colinsurprenant
Copy link
Contributor Author

@guyboertje good point, I actually forgot about the plain json codec and there is a subtlety in there... it has 2 behaviours depending on the input json structure type:

  • Object/Hash : it generates an Event just like the json_lines codec, all good.
  • Array : it generates an array of Event, one for each json array element.

I am not sure on the best strategy for this latter case. I could easily change the contract of from_json to output Event[] and deal with the array case in there. I think that would be the only way to deal with that efficiently otherwise we are stuck at doing the deserialization before, in the codec, in Ruby, then checking the resulting object type.

Hm. I believe changing the contract to

public static Event[] fromJson(String json)

or

public static List<Event> fromJson(String json)

would be acceptable? the added benefit would be to also support the array case in json_lines?

Thoughts?

@guyboertje
Copy link
Contributor

@colinsurprenant

gradle test will only compile and run tests.

The reason for my question about the Java tests is: "Can the reviewer rely on the Jenkins test green tick in this PR to know that ALL the tests have been run and pass?".

I guess at the moment the reviewer must assume that if you are committing the jar file - the Java tests must have passed.

Is this the best we can do or can we make Travis do the gradle test and bundle exec rspec?

@guyboertje
Copy link
Contributor

@colinsurprenant - I saw the json codec with its two operations, that is way I asked. 😄

I vote we change the fromJson contract to List<Event> - the jruby wrapper would convert this to a RubyArray of RubyEvent. In the codecs we can expect an array of events even if it has just one element.

@colinsurprenant
Copy link
Contributor Author

@guyboertje

  • yup, agree with List<Event>.
  • the build process actually calls gradlew jar which does not includes running tests. good catch. we should definitely figure what the best strategy is for having the gradle test task invoked somewhere in the build process.
  • «I saw the json codec with its two operations» 😬 I appreciate your politeness in gently bringing this to my attention, maybe this has to do with this very British trait ;) but I wouldn't have been offended if you just stated your observation right there in the first place. But that's fine, we'll get there either way :P

@guyboertje
Copy link
Contributor

@colinsurprenant - you give me more credit than I'm due. I did not leap as far as thinking that what happens in the json codec will affect the Java Event fromJson contract design. I only meant that there is an optimisation opportunity therein.

But I was also thinking, what happens if the JSON is a well formed array of scalars - but not suited for LS Event generation?

@colinsurprenant
Copy link
Contributor Author

@guyboertje in this case I think the best/only thing to do is to honour the legacy behaviour which will be to throw an exception but the new implementation will throw an exception sooner by making sure the json array only contains maps, as for the legacy implementation would crash later when trying to initialize a new Event with a scalar parameter value.

I just finished the refactor to handle the json array of map case and I am a bit unhappy because it causes non negligible overhead in creating collections and converting these to Ruby and then the codec has to iterate the result. I am trying to see how this could be optimized...

@colinsurprenant
Copy link
Contributor Author

@guyboertje I think I'll play the dynamic game and have from_json Ruby interface either return a single Event or Array<Event> and it's up to the method consumer to either know the passed json is a map yielding a single Event object otherwise testing on the returned object type result.is_a?(Array) or force-normalizing into an array Array(result). I will try to see if I can improve by doing this at the cost of a weirder interface. Suggestions welcome 😛

@colinsurprenant
Copy link
Contributor Author

I finally decided on a middle ground, Event#from_json will always return an Array which makes for a more coherent interface but the implementation is using Java arrays instead of collections and this yields acceptable performance.

Note that this last commit introduces a bit more changes because I decided to upgrade Jackson which was really old and it is using a new namespace in v2

@guyboertje
Copy link
Contributor

👍 on returning an array.

@colinsurprenant
Copy link
Contributor Author

@talevy ^^ added test for partial invalid json array

@talevy
Copy link
Contributor

talevy commented Feb 8, 2016

sweet. any other todos? things look good to me

@colinsurprenant
Copy link
Contributor Author

no, all good on my side. wanna merge. 🙏

@talevy
Copy link
Contributor

talevy commented Feb 8, 2016

LGTM

Object o = mapper.readValue(json, Object.class);
// we currently only support Map or Array json objects
if (o instanceof Map) {
result = new Event[]{ new Event((Map)o) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍

@guyboertje
Copy link
Contributor

LGTM

@colinsurprenant
Copy link
Contributor Author

great! squashing and merging.

from_json POC

add Event#from_json with corresponding specs

pre-inititalize error class constants

tests for Event#from_json

support array of events in from_json, upgrade to latest jackson

add test for partially invalid json array
@colinsurprenant
Copy link
Contributor Author

merged into master and 2.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

introduce Event#from_json to avoid type conversions in Java Event
6 participants