Skip to content

Conversation

@liyafan82
Copy link
Contributor

@liyafan82 liyafan82 commented Aug 8, 2019

We provide benchmarks to evaluate the performance of setting IntVector in 3 different ways:

through a value holder
through a writer
directly set the value through a set method

@tianchen92
Copy link
Contributor

tianchen92 commented Aug 8, 2019

Agreed here with ValueHolder will perform worse. I did a refactor for Jdbc adapter in #4978 (also use writer to write data instead of ValueHolder) which has some overleap with this one

@liyafan82
Copy link
Contributor Author

Agreed here with ValueHolder will perform worse. I did a refactor for Jdbc adapter in #4978 (also use writer to write data instead of ValueHolder) which has some overleap with this one

@tianchen92 I see. Thanks. I think #4978 and this are orthogonal ways of improving the performance of JDBC adapter.

1 similar comment
@liyafan82
Copy link
Contributor Author

Agreed here with ValueHolder will perform worse. I did a refactor for Jdbc adapter in #4978 (also use writer to write data instead of ValueHolder) which has some overleap with this one

@tianchen92 I see. Thanks. I think #4978 and this are orthogonal ways of improving the performance of JDBC adapter.

@emkornfield
Copy link
Contributor

@liyafan82 since these changes are going to conflict, maybe we should just keep the benchmark and update JIRA description and title here? Or do you think the current code layout is better then what is #4978?

@liyafan82
Copy link
Contributor Author

@liyafan82 since these changes are going to conflict, maybe we should just keep the benchmark and update JIRA description and title here? Or do you think the current code layout is better then what is #4978?

@emkornfield I think the code layout of #4978 is better.
I will provide more benchmarks, and change the titles accordingly.

@liyafan82
Copy link
Contributor Author

@liyafan82 since these changes are going to conflict, maybe we should just keep the benchmark and update JIRA description and title here? Or do you think the current code layout is better then what is #4978?

@emkornfield
Performance evaluations show that the performance of writer is even worse than the value holder:

IntBenchmarks.setIntDirectly avgt 5 15.372 ± 0.010 us/op
IntBenchmarks.setWithValueHolder avgt 5 19.192 ± 0.759 us/op
IntBenchmarks.setWithWriter avgt 5 21.040 ± 0.489 us/op

I will do more investigations.

@emkornfield
Copy link
Contributor

@liyafan82 thanks, that is interesting. I'm not seeing the writer benchmark, maybe upload for a sanity check? I wonder why we have the writer if it is actually worse (maybe as some sort of abstraction)?

@tianchen92 this probably affects #4978 and the work on Avro. Sorry I think I suggested writers initially instead of direct writes.

@tianchen92
Copy link
Contributor

@liyafan82 thanks, that is interesting. I'm not seeing the writer benchmark, maybe upload for a sanity check? I wonder why we have the writer if it is actually worse (maybe as some sort of abstraction)?

@tianchen92 this probably affects #4978 and the work on Avro. Sorry I think I suggested writers initially instead of direct writes.

Also surprised,with the IntBenchmarks in this PR,I add the writer benchmark as below:

@benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void setIntWithWriter() {
IntWriter writer = new IntWriterImpl(vector);
for (int i = 0; i < VECTOR_LENGTH; i++) {
boolean isSet = i % 3 == 0;
if (isSet) {
writer.writeInt(i);
}
writer.setPosition(writer.getPosition() + 1);
}
}

These three methods all calls setSafe and the result I got is totally different in my machine.

Benchmark Mode Cnt Score Error Units
IntBenchmarks.setIntDirectly avgt 5 15.791 ± 1.526 us/op
IntBenchmarks.setIntWithWriter avgt 5 9.602 ± 1.244 us/op
IntBenchmarks.setWithValueHolder avgt 5 12.518 ± 1.023 us/op

Looks writer performs best... maybe results are affected by different machines?

@emkornfield Doesn't matter, I guess maybe you could have a test also to see if this is really a problem. If writer actually has performance problem, I will do refactoring for jdbc and avro adapters, its alright.

@emkornfield
Copy link
Contributor

@emkornfield Doesn't matter, I guess maybe you could have a test also to see if this is really a problem. If writer actually has performance problem, I will do refactoring for jdbc and avro adapters, its alright.

I'll test on my machine. Curious if you made sure to turn-off bounds checking?

@tianchen92
Copy link
Contributor

Curious if you made sure to turn-off bounds checking?

No, do you mean NullCheckingForGet? Does it only checks for get method?

@emkornfield
Copy link
Contributor

No, do you mean NullCheckingForGet? Does it only checks for get method?
I meant bounds checking NullCheckingForGet I believe you are correct that it is only used for get.

@tianchen92
Copy link
Contributor

No, do you mean NullCheckingForGet? Does it only checks for get method?
I meant bounds checking NullCheckingForGet I believe you are correct that it is only used for get.

Thanks for clarify, I got the result above without change this flag.

@liyafan82
Copy link
Contributor Author

@tianchen92 sure. I will upload the code later, because it is in another computer.

@liyafan82
Copy link
Contributor Author

@emkornfield @tianchen92 I have uploaded the benchmark. Sorry for the delay.
The newly produced benchmark results:

Benchmark Mode Cnt Score Error Units
IntBenchmarks.setIntDirectly avgt 5 15.373 ± 0.024 us/op
IntBenchmarks.setWithValueHolder avgt 5 19.255 ± 0.719 us/op
IntBenchmarks.setWithWriter avgt 5 21.065 ± 0.433 us/op

@emkornfield
Copy link
Contributor

Running without bounds checking turned, I saw some variations where with Writer was twice as slow as the other implementations. But on other runs I got results consistent with @tianchen92:

Benchmark Mode Cnt Score Error Units
IntBenchmarks.setAWithWriter avgt 5 7.157 ± 0.143 us/op
IntBenchmarks.setAWithWriterInlinedIf avgt 5 7.433 ± 0.610 us/op
IntBenchmarks.setIntDirectly avgt 5 7.560 ± 0.589 us/op
IntBenchmarks.setWithValueHolder avgt 5 7.836 ± 0.011 us/op

Here is the code I ran for the writer tests:

 @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public void setAWithWriter() {
    IntWriter writer = new IntWriterImpl(vector);
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      boolean isSet = i % 3 == 0;
      if (isSet) {
        writer.writeInt(i);
      }
      writer.setPosition(writer.getPosition()+1);
    }
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public void setAWithWriterInlinedIf() {
    IntWriter writer = new IntWriterImpl(vector);
    for (int i = 0; i < VECTOR_LENGTH; i++) {
      if (i % 3 == 0) {
        writer.writeInt(i);
      }
      writer.setPosition(writer.getPosition()+1);
    }
  }

this is on a Mac with JDK8.

@liyafan82
Copy link
Contributor Author

Below are the results after disabling bound check & null check for get:

Benchmark Mode Cnt Score Error Units
IntBenchmarks.setIntDirectly avgt 5 8.740 ± 0.028 us/op
IntBenchmarks.setWithValueHolder avgt 5 8.939 ± 0.019 us/op
IntBenchmarks.setWithWriter avgt 5 12.636 ± 1.408 us/op

The results are similar to above results.
I was running on a linux server, with JDK 8.

Do you think we need to switch to setting values directly?

@emkornfield
Copy link
Contributor

Do you think we need to switch to setting values directly?

Given the variability in the test I'm not sure it is worth the effort at this point. If we had numbers showing it directly affects either JDBC or Avro performance, I think we should make the change then.

@liyafan82 liyafan82 changed the title ARROW-6172 [Java] Avoid creating value holders repeatedly when reading data from JDBC ARROW-6172 [Java] Provide benchmarks to set IntVector with different methods Aug 14, 2019
@liyafan82
Copy link
Contributor Author

Do you think we need to switch to setting values directly?

Given the variability in the test I'm not sure it is worth the effort at this point. If we had numbers showing it directly affects either JDBC or Avro performance, I think we should make the change then.

@emkornfield Sounds reasonable.

I have updated the PR leaving only the benchmarks, and changed the title of the PR and JIRA accordingly. Please take a look.

@emkornfield
Copy link
Contributor

+1, LGTM.

@emkornfield
Copy link
Contributor

Thank you.

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…methods

We provide benchmarks to evaluate the performance of setting IntVector in 3 different ways:

through a value holder
through a writer
directly set the value through a set method

Closes apache#5042 from liyafan82/fly_0808_jdbc1 and squashes the following commits:

b19693b <liyafan82> ARROW-6172  Provide benchmarks to set IntVector with different methods

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants