Skip to content

Conversation

@TheNeuralBit
Copy link
Member

@TheNeuralBit TheNeuralBit commented Feb 10, 2019

It seems that using Object.create in the bind method slows down proxy generation substantially. I separated the Row and RowProxyGenerator concepts - now the generator creates a prototype based on Row and the parent vector when it is created, and constructs instances based on that prototype repeatedly in bind.

This change improved the included benchmark from 1,114.78 ms to 154.96 ms on my laptop (Intel i5-7200U).

new a prototype rather than using \`Object.create\`
@TheNeuralBit TheNeuralBit changed the title ARROW-4523: [JS] Add row proxy generation benchmark ARROW-{4523,4524}: [JS] Add row proxy generation benchmark, improve performance Feb 10, 2019
@codecov-io
Copy link

codecov-io commented Feb 10, 2019

Codecov Report

Merging #3601 into master will increase coverage by 3.11%.
The diff coverage is 94.49%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3601      +/-   ##
==========================================
+ Coverage   87.76%   90.87%   +3.11%     
==========================================
  Files         673       71     -602     
  Lines       82837     4965   -77872     
  Branches     1069     1081      +12     
==========================================
- Hits        72702     4512   -68190     
+ Misses      10020      447    -9573     
+ Partials      115        6     -109
Impacted Files Coverage Δ
js/src/vector/map.ts 77.77% <100%> (ø) ⬆️
js/src/vector/struct.ts 77.77% <100%> (ø) ⬆️
js/src/visitor/get.ts 100% <100%> (ø) ⬆️
js/src/vector/row.ts 78.57% <88.37%> (+1.64%) ⬆️
js/src/util/vector.ts 95.45% <98.3%> (+1.9%) ⬆️
python/pyarrow/ipc.pxi
cpp/src/arrow/csv/chunker-test.cc
cpp/src/parquet/column_page.h
cpp/src/parquet/bloom_filter-test.cc
cpp/src/arrow/array/builder_decimal.cc
... and 599 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f0bcd8...2442545. Read the comment docs.

@trxcllnt
Copy link
Contributor

@TheNeuralBit I could be wrong, but this seems like a bigger change than necessary to fix the problem? Isn't the issue with the second argument to Object.create() here? If so, couldn't we remove the extra property descriptor map, and set bound.rowIndex=index?

@trxcllnt
Copy link
Contributor

trxcllnt commented Feb 11, 2019

@TheNeuralBit PR here: TheNeuralBit#2. I made the internal Row properties Symbols to avoid naming collisions, and switched the new BoundRow() constructor invocation back to a bare Object.create(this) call, yielding the following results:

Current version of proxy-bench:

Table Iterate "tracks":
length: 1000000
 x 6.88 ops/sec ±0.82% (21 runs sampled)
   avg: 145.4ms
   872.4% of a frame @ 60FPS 

This PR:

Table Iterate "tracks":
length: 1000000
 x 8.36 ops/sec ±1.02% (25 runs sampled)
   avg: 119.58ms
   717.48% of a frame @ 60FPS 

@TheNeuralBit
Copy link
Member Author

@trxcllnt yeah that's a fair point, I didn't think to try Object.create without the extra property descriptor map - definitely fine switching to that since it's faster!

I still am partial to separating out the RowProxyGenerator and the Rows themselves though. I think it's less confusing, and also they won't pollute each other with methods that are only relevant for one or the other (e.g. with the Object.create(this) approach, table.get(0).bind refers to Row.bind).

I pushed up some commits to demonstrate it, perf seems comparable:

Table Iterate "tracks":
length: 1000000
 x 8.56 ops/sec ±1.15% (25 runs sampled)
   avg: 116.78ms
   700.68% of a frame @ 60FPS

Still needs some cleanup and a switch to Symbols if we go with it

Copy link
Contributor

@trxcllnt trxcllnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. use Symbols for internal names
  2. No extra inner Row class generation
  3. Update comparator for symbols/clarity

@trxcllnt
Copy link
Contributor

@TheNeuralBit looks good to me 👍. last question is do we want to benchmark changing the column index lookups as Symbols?

@TheNeuralBit
Copy link
Member Author

I think I'd rather merge this as is, and do that in another PR, I made ARROW-4551 to track it

@TheNeuralBit TheNeuralBit changed the title ARROW-{4523,4524}: [JS] Add row proxy generation benchmark, improve performance ARROW-4523,ARROW-4524: [JS] Add row proxy generation benchmark, improve performance Feb 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants