Skip to content

Conversation

@liyafan82
Copy link
Contributor

This is a follow up Jira for the improvement suggested by Francois Saint-Jacques in the PR for
#4925

After careful analysis and proof, I found @fsaintjacques 's proposal really works. The idea behind the algorithm is tricky. This PR is the implementation based on @fsaintjacques 's proposal.

The benefit of this implementation:

  1. the code is much clearer
  2. there is fewer branches, which is beneficial for performance.

The drawback of this implementation:

  1. there can be dumb iterations of the main loop. That is, the while loop may run a few more times after the "solution" is found, just to wait for the loop termination condition is satisfied.

However, the time complexity of the algorithm is still O(logn), and the number of dumb iterations can not be large.

@codecov-io
Copy link

codecov-io commented Aug 5, 2019

Codecov Report

Merging #5011 into master will increase coverage by 2.1%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5011      +/-   ##
==========================================
+ Coverage   87.57%   89.67%    +2.1%     
==========================================
  Files        1008      670     -338     
  Lines      143790    99342   -44448     
  Branches     1418        0    -1418     
==========================================
- Hits       125924    89088   -36836     
+ Misses      17504    10254    -7250     
+ Partials      362        0     -362
Impacted Files Coverage Δ
cpp/src/arrow/csv/column-builder.cc 95.29% <0%> (-1.77%) ⬇️
cpp/src/plasma/thirdparty/ae/ae.c 70.75% <0%> (-0.95%) ⬇️
r/src/recordbatch.cpp
r/R/Table.R
js/src/util/fn.ts
go/arrow/array/bufferbuilder.go
r/src/symbols.cpp
rust/datafusion/src/execution/projection.rs
rust/datafusion/src/execution/filter.rs
rust/arrow/src/csv/writer.rs
... and 331 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 229fe33...f47a2e7. Read the comment docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method name seems odd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch. thank you so much.

@emkornfield
Copy link
Contributor

Small nit, otherwise LGTM.

@emkornfield
Copy link
Contributor

+1, LGTM

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…angeSearcher

This is a follow up Jira for the improvement suggested by Francois Saint-Jacques in the PR for
apache#4925

After careful analysis and proof, I found @fsaintjacques 's proposal really works. The idea behind the algorithm is tricky. This PR is the implementation based on @fsaintjacques 's proposal.

The benefit of this implementation:
1. the code is much clearer
2. there is fewer branches, which is beneficial for performance.

The drawback of this implementation:
1. there can be dumb iterations of the main loop. That is, the while loop may run a few more times after the "solution" is found, just to wait for the loop termination condition is satisfied.

However, the time complexity of the algorithm is still O(logn), and the number of dumb iterations can not be large.

Closes apache#5011 from liyafan82/fly_0805_range and squashes the following commits:

f47a2e7 <liyafan82>  reduce branches in algo for first match in VectorRangeSearcher

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants