ARROW-6458: [Java] Remove value boxing/unboxing for ApproxEqualsVisitor #5304

liyafan82 · 2019-09-06T02:11:38Z

As discussed in #5195 (comment), there are some problems with the current way of comparing floating point vectors. We solve them in this PR:

1. There are if statements and duplicated members in ApproxEqualsVisitor, which make the code redundant and less clear.
2. The comparison of float4 and float8 values is based on the wrapper objects Float and Double, which may incur a performance penalty.
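
To make the boxing concern concrete, here is a minimal sketch contrasting a generic comparison (which forces `Double` wrappers) with a primitive one. The class name `BoxingSketch`, the interface `PrimitiveDiff`, and the epsilon value are hypothetical and chosen only for illustration; this is not the actual ApproxEqualsVisitor code.

```java
import java.util.function.BiFunction;

/** Hypothetical illustration of boxed vs. primitive comparison; not Arrow code. */
final class BoxingSketch {
  static final double EPSILON = 1.0e-6; // assumed tolerance for this sketch

  // Generic path: arguments arrive as Double objects, so every call boxes
  // the inputs, unboxes them for the arithmetic, and boxes the Boolean result.
  static final BiFunction<Double, Double, Boolean> BOXED =
      (a, b) -> Math.abs(a - b) <= EPSILON;

  // Primitive path: a dedicated interface keeps values as raw doubles.
  interface PrimitiveDiff {
    boolean approxEquals(double a, double b);
  }

  static final PrimitiveDiff PRIMITIVE = (a, b) -> Math.abs(a - b) <= EPSILON;

  public static void main(String[] args) {
    System.out.println(BOXED.apply(1.0, 1.0 + 1.0e-9));
    System.out.println(PRIMITIVE.approxEquals(1.0, 1.0 + 1.0e-9));
  }
}
```

Both paths return the same answers; the difference is only in the allocation and unwrapping work done per call, which is why a benchmark is needed to see whether it matters in practice.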

emkornfield

@liyafan82 I think this change should probably be broken into two different changes: one that avoids boxing/unboxing and a second one that consolidates code. The code consolidation has the potential to actually hurt performance. As part of the boxing/unboxing change we should also introduce a performance test to verify it actually improves performance (which can then be used to measure the impact of the code consolidation behind the interface).

liyafan82 · 2019-09-07T04:34:38Z

> @liyafan82 I think this change should probably be broken into two different changes. One that avoid boxing/unboxing and a second one that consolidates code. The code consolidation has the potential to actually hurt performance. As part of the boxing/unboxing we should also introduce a performance test to verify it actually improves performance (which can then be used to measure the impact of the code consolidation behind the interface).

It is a good suggestion to split into two issues for the two problems. I will revise the code and provide the performance benchmark accordingly.

liyafan82 · 2019-09-09T13:10:45Z

@emkornfield I have split the issue into two, according to your suggestion.
This is the first part, to remove boxing/unboxing of floating point values.

The performance benchmark shows an improvement of over 10%:

Before:
Float8Benchmarks.approxEqualsBenchmark avgt 5 7.161 ± 0.026 us/op

After:
Float8Benchmarks.approxEqualsBenchmark avgt 5 6.355 ± 0.004 us/op

emkornfield · 2019-09-10T02:42:03Z

java/performance/src/test/java/org/apache/arrow/vector/Float8Benchmarks.java

please add float4 vectors as part of this benchmark.

Sure. I will add benchmarks for float4 to class Float4Benchmarks.

Please consolidate into 1 floatbenchmarks class and a single benchmark. (reasoning behind this: http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/)

Sure. Thanks for the article. We have also found some problems with megamorphic methods in the current code base. Will improve them later.

For things not on the hot path it probably isn't worth changing. The only reason I'm bringing it up here is that the code seemed to already be written to avoid those calls. If there are a lot of places you intend to fix, then having a discussion on the mailing list would be a good idea.

Sure. Sounds reasonable.
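
For readers unfamiliar with the term, the sketch below shows what a megamorphic call site looks like in plain Java: one `diff.approxEquals(...)` call in a loop that receives several different implementation classes over the life of the program. As the linked article describes, HotSpot can usually inline a call site that sees one or two receiver types, but tends to fall back to slower interface dispatch once it sees three or more. The names `CallSiteSketch`, `Diff`, and the three implementations are hypothetical, and this is not the JMH benchmark from the PR.

```java
/** Hypothetical illustration of a call site shared by several implementations. */
final class CallSiteSketch {
  interface Diff {
    boolean approxEquals(double a, double b);
  }

  // Three distinct implementation classes of the same interface.
  static final class Exact implements Diff {
    public boolean approxEquals(double a, double b) {
      return a == b;
    }
  }

  static final class Loose implements Diff {
    public boolean approxEquals(double a, double b) {
      return Math.abs(a - b) <= 1.0e-3;
    }
  }

  static final class Tight implements Diff {
    public boolean approxEquals(double a, double b) {
      return Math.abs(a - b) <= 1.0e-9;
    }
  }

  // The single call site: which class 'diff' is depends on the caller.
  static int countEqual(Diff diff, double[] left, double[] right) {
    int equal = 0;
    for (int i = 0; i < left.length; i++) {
      if (diff.approxEquals(left[i], right[i])) {
        equal++;
      }
    }
    return equal;
  }

  public static void main(String[] args) {
    double[] left = {1.0, 2.0, 3.0};
    double[] right = {1.0, 2.0005, 3.5};
    // After these three calls the loop's call site has seen three receiver types.
    System.out.println(countEqual(new Exact(), left, right));
    System.out.println(countEqual(new Loose(), left, right));
    System.out.println(countEqual(new Tight(), left, right));
  }
}
```

A benchmark that only ever passes one implementation through such a loop measures the best case, so results can differ from a run where several implementations share the same code path.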

emkornfield · 2019-09-10T02:53:14Z

java/vector/src/main/java/org/apache/arrow/vector/compare/ApproxEqualsVisitor.java

I think this change is too much. I think the right thing to do is to split DiffFunction into two interfaces:

interface DiffFunctionFloat {
  boolean approxEquals(float v1, float v2);
}

interface DiffFunctionDouble {
  boolean approxEquals(double v1, double v2);
}

and

I see. Will revise accordingly.
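
A minimal sketch of how the two suggested interfaces could be used, assuming a fixed-epsilon comparison and plain arrays standing in for Float4Vector and Float8Vector. The interface names and method signatures follow the review comment; the factory methods, the `rangeApproxEquals` helpers, and the class name `DiffFunctionSketch` are hypothetical and not taken from the merged code.

```java
/** Hypothetical usage of the two primitive interfaces suggested in the review. */
final class DiffFunctionSketch {
  interface DiffFunctionFloat {
    boolean approxEquals(float v1, float v2);
  }

  interface DiffFunctionDouble {
    boolean approxEquals(double v1, double v2);
  }

  // Assumed default behaviour: absolute difference within epsilon.
  static DiffFunctionFloat floatDiff(float epsilon) {
    return (v1, v2) -> Math.abs(v1 - v2) <= epsilon;
  }

  static DiffFunctionDouble doubleDiff(double epsilon) {
    return (v1, v2) -> Math.abs(v1 - v2) <= epsilon;
  }

  // Element-wise comparison; no Float objects are created along the way.
  static boolean rangeApproxEquals(float[] left, float[] right, DiffFunctionFloat diff) {
    if (left.length != right.length) {
      return false;
    }
    for (int i = 0; i < left.length; i++) {
      if (!diff.approxEquals(left[i], right[i])) {
        return false;
      }
    }
    return true;
  }

  // Same shape for doubles; the duplication is the price of staying primitive.
  static boolean rangeApproxEquals(double[] left, double[] right, DiffFunctionDouble diff) {
    if (left.length != right.length) {
      return false;
    }
    for (int i = 0; i < left.length; i++) {
      if (!diff.approxEquals(left[i], right[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    float[] f1 = {1.0f, 2.0f};
    float[] f2 = {1.0005f, 2.0f};
    System.out.println(rangeApproxEquals(f1, f2, floatDiff(1.0e-3f)));
  }
}
```

The trade-off in this design is some duplicated code per primitive type in exchange for avoiding generics, which in Java cannot be instantiated with `float` or `double` directly.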

@emkornfield
I have revised the code to make the changes smaller.
I have also merged the benchmarks into a single class and a single benchmark, according to your suggestion.

The benchmark results are as follows:
Before:
FloatingPointBenchmarks.approxEqualsBenchmark avgt 5 14.480 ± 0.023 us/op

After:
FloatingPointBenchmarks.approxEqualsBenchmark avgt 5 13.517 ± 0.041 us/op

By removing boxing/unboxing, we get a performance improvement of about 6.7%.
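
The percentages quoted in this thread can be checked from the reported averages as (before - after) / before. The helper below is a small sketch of that arithmetic; the class and method names are hypothetical. It ignores the reported error bounds, which are small relative to the differences.

```java
/** Hypothetical helper for checking the improvement figures quoted in the thread. */
final class SpeedupSketch {
  // Relative reduction in average time per operation, as a percentage.
  static double improvementPercent(double beforeUsPerOp, double afterUsPerOp) {
    return (beforeUsPerOp - afterUsPerOp) / beforeUsPerOp * 100.0;
  }

  public static void main(String[] args) {
    // Float8Benchmarks run: 7.161 us/op before, 6.355 us/op after.
    System.out.printf("%.1f%%%n", improvementPercent(7.161, 6.355));
    // FloatingPointBenchmarks run: 14.480 us/op before, 13.517 us/op after.
    System.out.printf("%.1f%%%n", improvementPercent(14.480, 13.517));
  }
}
```

This gives roughly 11.3% for the earlier Float8-only run and roughly 6.7% for the consolidated run, matching the "over 10%" and "about 6.7%" figures stated in the comments.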

emkornfield

See comments.

codecov-io · 2019-09-16T12:36:46Z

Codecov Report

Merging #5304 into master will increase coverage by 0.98%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5304      +/-   ##
==========================================
+ Coverage   88.59%   89.57%   +0.98%     
==========================================
  Files         950      702     -248     
  Lines      126185   107156   -19029     
  Branches     1495        0    -1495     
==========================================
- Hits       111791    95990   -15801     
+ Misses      14029    11166    -2863     
+ Partials      365        0     -365

Impacted Files	Coverage Δ
cpp/src/arrow/filesystem/s3_internal.h	`90.74% <0%> (-3.71%)`	⬇️
cpp/src/arrow/json/converter.cc	`90.05% <0%> (-1.76%)`	⬇️
cpp/src/arrow/json/chunked_builder.cc	`79.91% <0%> (-1.68%)`	⬇️
cpp/src/plasma/thirdparty/ae/ae.c	`70.75% <0%> (-0.95%)`	⬇️
python/pyarrow/tests/test_parquet.py	`96.09% <0%> (-0.12%)`	⬇️
r/src/recordbatch.cpp
go/arrow/math/uint64_amd64.go
r/R/list.R
r/src/symbols.cpp
r/src/array_to_vector.cpp
... and 244 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update abc7860...907c17d. Read the comment docs.

emkornfield · 2019-09-17T04:38:58Z

+1, thank you.

As discussed in apache#5195 (comment), there are some problems with the current way of comparing floating point vectors. We solve them in this PR:

1. There are if statements and duplicated members in ApproxEqualsVisitor, which make the code redundant and less clear.
2. The comparison of float4 and float8 values is based on the wrapper objects Float and Double, which may incur a performance penalty.

Closes apache#5304 from liyafan82/fly_0905_float and squashes the following commits:

907c17d <liyafan82> Remove value boxing/unboxing for ApproxEqualsVisitor

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>

emkornfield added the Component: Java label Sep 6, 2019

emkornfield requested changes Sep 7, 2019

liyafan82 force-pushed the fly_0905_float branch from 530aa0a to 4ce7f8d Compare September 9, 2019 13:06

liyafan82 changed the title ~~ARROW-6458: [Java] Improve the performance and code structure for ApproxEqualsVisitor~~ ARROW-6458: [Java] Remove value boxing/unboxing for ApproxEqualsVisitor Sep 9, 2019

emkornfield reviewed Sep 10, 2019

emkornfield requested changes Sep 10, 2019

liyafan82 force-pushed the fly_0905_float branch from 4ce7f8d to ce46bcf Compare September 12, 2019 13:24

[ARROW-6458][Java] Remove value boxing/unboxing for ApproxEqualsVisitor

907c17d

liyafan82 force-pushed the fly_0905_float branch from ce46bcf to 907c17d Compare September 16, 2019 11:46

emkornfield closed this in 645307c Sep 17, 2019

asfimport mentioned this pull request Sep 17, 2019

[Java] Remove value boxing/unboxing for ApproxEqualsVisitor #22828

Closed

3 participants