Commit e327747
ARROW-1990: [JS] C++ Refactor, Add DataFrame
This PR moves the `Table` class out of the Vector hierarchy and adds optimized dataframe operations to it. Currently implements an optimized `scan()` method, `filter(predicate)`, `count()`, and `countBy(column_name)` (only works on dictionary-encoded columns).
Some usage examples, based on the file generated by `js/test/data/tables/generate.py`:
``` js
> let table = Table.from(...);
> table.count()
1000000
> table.filter(col('lat').gteq(0)).count()
499718
> table.countBy('origin').toJSON()
{ Charlottesville: 166839,
'New York': 166251,
'San Francisco': 166642,
Seattle: 166659,
'Terre Haute': 166756,
'Washington, DC': 166853 }
> table.filter(col('lng').gteq(0)).countBy('origin').toJSON()
{ Charlottesville: 83109,
'New York': 83221,
'San Francisco': 83515,
Seattle: 83362,
'Terre Haute': 83314,
'Washington, DC': 83479 }
```
There are performance tests for the dataframe operations, to run them you must first generate the test data by running `npm run create:perfdata`.
The PR also includes @trxcllnt's refactor of the JS implementation to make it more closely resemble the C++ implementation. This refactor resolves multiple JIRAs: ARROW-1903, ARROW-1898, ARROW-1502, ARROW-1952 (partially), and ARROW-1985
Author: Paul Taylor <paul.e.taylor@me.com>
Author: Brian Hulette <brian.hulette@ccri.com>
Author: Brian Hulette <hulettbh@gmail.com>
Closes apache#1482 from TheNeuralBit/table-scan-perf and squashes the following commits:
52f1e0e [Brian Hulette] <, > are not commutative, misc cleanup
04b1838 [Brian Hulette] even more table tests
16b9ccb [Brian Hulette] Merge pull request #4 from trxcllnt/js-cpp-refactor
fe300df [Paul Taylor] fix closure es5/umd toString() iterator
3d5240a [Paul Taylor] fix more externs
10c48ad [Paul Taylor] Merge branch 'table-scan-perf' of github.com:ccri/arrow into js-cpp-refactor
dbe7f81 [Brian Hulette] Add more Table unit tests
1910962 [Brian Hulette] Add optional bind callback to scan
5bdf17f [Brian Hulette] Fix perf
8cf2473 [Brian Hulette] Merge remote-tracking branch 'origin/master' into table-scan-perf
4a41b18 [Paul Taylor] add src/predicate to the list of exports we should save from uglify
5a91fab [Paul Taylor] add more view, predicate externs
f6adfb3 [Brian Hulette] Create predicate namespace
f7bb0ed [Paul Taylor] Merge branch 'table-scan-perf' of github.com:ccri/arrow into js-cpp-refactor
e148ee4 [Paul Taylor] Merge branch 'extern-woes' into js-cpp-refactor
25cdc4a [Paul Taylor] add src/predicate to the list of exports we should save from uglify
dc7c728 [Paul Taylor] add more view, predicate externs
25e6af7 [Brian Hulette] Create predicate namespace
579ab1f [Brian Hulette] Merge pull request #2 from trxcllnt/js-cpp-refactor
f3cde1a [Paul Taylor] fix lint
9769773 [Paul Taylor] fix vector perf tests
016ba78 [Brian Hulette] Merge pull request #1 from trxcllnt/js-cpp-refactor
272d293 [Paul Taylor] Merge pull request #4 from ccri/empty-table
7bc7363 [Brian Hulette] Fix exception for empty Table
8ddce0a [Paul Taylor] check bounds in getChildAt(i) to avoid NPEs
f1dead0 [Paul Taylor] compute chunked nested childData list correctly
18807c6 [Paul Taylor] rename ChunkData's fields so it's more clear they're not semantically similar to other similarly named fields
7e43b78 [Paul Taylor] add test:integration npm script
a5f200f [Paul Taylor] Merge pull request #3 from ccri/table-from-struct
c8cd286 [Brian Hulette] Add Table.fromStruct
a00415e [Brian Hulette] Fix perf
54d4f5b [Paul Taylor] lazily allocate table and recordbatch columns, support NestedView's getChildAt(i) method in ChunkedView
40b3638 [Paul Taylor] run integration tests with local data for coverage stats
fe31ee0 [Paul Taylor] slice the flat data values before returning an iterator of them
e537789 [Paul Taylor] make it easier to run all integration tests from local data
c0fd2f9 [Paul Taylor] use the dictionary of the last chunked vector list for chunked dictionary vectors
e33c068 [Paul Taylor] Merge pull request #2 from ccri/fixed-size-list
5bb63af [Brian Hulette] Don't read OFFSET vector for FixedSizeList
614b688 [Paul Taylor] add asEpochMs to date and timestamp vectors
87334a5 [Paul Taylor] Merge branch 'table-scan-perf' of github.com:ccri/arrow into js-cpp-refactor
b7f5bfb [Paul Taylor] rename numRows to length, add table.getColumn()
e81082f [Paul Taylor] export vector views, allow cloning data as another type
700a47c [Paul Taylor] export visitors
e859e13 [Paul Taylor] fix package.json bin entry
0620cfd [Brian Hulette] use Math.fround
0126dc4 [Brian Hulette] Don't recompute total length
e761eee [Brian Hulette] Rename asJSON to toJSON
6c91ed4 [Paul Taylor] Merge branch 'master' of github.com:apache/arrow into js-cpp-refactor-merge_with-table-scan-perf
d2b18d5 [Paul Taylor] Merge remote-tracking branch 'ccri/table-scan-perf' into js-cpp-refactor-merge_with-table-scan-perf
f3f3b86 [Paul Taylor] rename table.ts to recordbatch.ts in preparation for merging latest
e3f629d [Paul Taylor] fix rest of the mangling issues
fa7c17a [Paul Taylor] passing all tests except es5 umd mangler ones
e20decd [Brian Hulette] Add license headers
edcbdbe [Brian Hulette] cleanup
20717d5 [Brian Hulette] Fixed countBy(string)
7244887 [Brian Hulette] Add table unit tests...
6719147 [Brian Hulette] Add DataFrame.countBy operation
2f4a349 [Brian Hulette] Minor tweaks
2e118ab [Brian Hulette] linter
a788db3 [Brian Hulette] Cleanup
a9fff89 [Brian Hulette] Move Table out of the Vector hierarchy
1d60aa1 [Brian Hulette] Moved DataFrame ops to Table. DataFrame is now an interface
e8979ba [Brian Hulette] Refactor DataFrame to extend Vector<StructRow>
6a41d68 [Brian Hulette] clean up table benchmarks
2744c63 [Brian Hulette] Remove Chunked/Simple DataFrame distinction
aa999f8 [Brian Hulette] Add DictionaryVector optimization for equals predicate
4d9e8c0 [Brian Hulette] Add concept of predicates for filtering dataframes
796f45d [Brian Hulette] add DataFrame filter and count ops
30f0330 [Brian Hulette] Add basic DataFrame impl ...
a1edac2 [Brian Hulette] Add perf tests for table scans
d18d915 [Paul Taylor] fix struct and map rows
61dc699 [Paul Taylor] WIP -- refactor types to closer match arrow-cpp
62db338 [Paul Taylor] update dependencies and add es6+ umd targets to jest transform ignore patterns to fix ci
6ff18e9 [Paul Taylor] ship es2015 commonJS in main package to avoid confusion
74e828a [Paul Taylor] fix typings issues (ARROW-1903)1 parent f84af8f commit e327747
File tree
67 files changed
+6025
-3065
lines changed- js
- bin
- gulp
- perf
- src
- bin
- fb
- format
- ipc
- reader
- reader
- util
- vector
- traits
- test
- data/tables
- integration
- unit
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
67 files changed
+6025
-3065
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
| 21 | + | |
20 | 22 | | |
21 | 23 | | |
22 | 24 | | |
| |||
29 | 31 | | |
30 | 32 | | |
31 | 33 | | |
32 | | - | |
| 34 | + | |
| 35 | + | |
33 | 36 | | |
34 | 37 | | |
35 | 38 | | |
36 | 39 | | |
37 | | - | |
| 40 | + | |
| 41 | + | |
38 | 42 | | |
39 | 43 | | |
40 | 44 | | |
| |||
66 | 70 | | |
67 | 71 | | |
68 | 72 | | |
69 | | - | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
70 | 100 | | |
71 | 101 | | |
72 | 102 | | |
73 | | - | |
| 103 | + | |
74 | 104 | | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
75 | 109 | | |
76 | | - | |
77 | | - | |
| 110 | + | |
78 | 111 | | |
79 | 112 | | |
80 | 113 | | |
81 | 114 | | |
82 | 115 | | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
83 | 127 | | |
84 | 128 | | |
85 | 129 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
18 | 22 | | |
19 | 23 | | |
20 | 24 | | |
21 | 25 | | |
22 | 26 | | |
23 | 27 | | |
24 | 28 | | |
25 | | - | |
26 | | - | |
27 | 29 | | |
28 | 30 | | |
29 | 31 | | |
30 | | - | |
31 | | - | |
| 32 | + | |
| 33 | + | |
32 | 34 | | |
33 | 35 | | |
34 | 36 | | |
| |||
38 | 40 | | |
39 | 41 | | |
40 | 42 | | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
41 | 64 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
36 | 36 | | |
37 | 37 | | |
38 | 38 | | |
39 | | - | |
| 39 | + | |
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
49 | | - | |
50 | 49 | | |
51 | 50 | | |
52 | 51 | | |
| |||
60 | 59 | | |
61 | 60 | | |
62 | 61 | | |
| 62 | + | |
63 | 63 | | |
64 | 64 | | |
65 | 65 | | |
66 | 66 | | |
67 | | - | |
68 | 67 | | |
69 | 68 | | |
70 | | - | |
| 69 | + | |
| 70 | + | |
71 | 71 | | |
72 | 72 | | |
73 | 73 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
| 48 | + | |
48 | 49 | | |
49 | 50 | | |
50 | 51 | | |
51 | | - | |
| 52 | + | |
52 | 53 | | |
53 | 54 | | |
54 | 55 | | |
| |||
63 | 64 | | |
64 | 65 | | |
65 | 66 | | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
70 | | - | |
71 | | - | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
72 | 75 | | |
73 | 76 | | |
74 | 77 | | |
75 | 78 | | |
76 | | - | |
77 | | - | |
78 | | - | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
79 | 82 | | |
80 | 83 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
44 | 44 | | |
45 | 45 | | |
46 | 46 | | |
47 | | - | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
48 | 50 | | |
49 | 51 | | |
50 | 52 | | |
51 | | - | |
52 | | - | |
53 | 53 | | |
54 | | - | |
55 | | - | |
| 54 | + | |
| 55 | + | |
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | | - | |
| 37 | + | |
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
| |||
52 | 52 | | |
53 | 53 | | |
54 | 54 | | |
55 | | - | |
| 55 | + | |
56 | 56 | | |
57 | 57 | | |
58 | | - | |
| 58 | + | |
59 | 59 | | |
60 | 60 | | |
61 | 61 | | |
62 | 62 | | |
63 | | - | |
| 63 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | | - | |
| 32 | + | |
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
| |||
84 | 84 | | |
85 | 85 | | |
86 | 86 | | |
87 | | - | |
88 | | - | |
89 | | - | |
90 | | - | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
91 | 97 | | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
92 | 101 | | |
93 | 102 | | |
94 | 103 | | |
| |||
104 | 113 | | |
105 | 114 | | |
106 | 115 | | |
107 | | - | |
108 | | - | |
| 116 | + | |
| 117 | + | |
109 | 118 | | |
110 | 119 | | |
111 | 120 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
87 | 87 | | |
88 | 88 | | |
89 | 89 | | |
90 | | - | |
| 90 | + | |
91 | 91 | | |
92 | 92 | | |
93 | 93 | | |
| |||
108 | 108 | | |
109 | 109 | | |
110 | 110 | | |
111 | | - | |
112 | 111 | | |
113 | 112 | | |
114 | 113 | | |
115 | 114 | | |
116 | 115 | | |
117 | | - | |
118 | | - | |
| 116 | + | |
| 117 | + | |
119 | 118 | | |
120 | 119 | | |
121 | 120 | | |
| |||
0 commit comments