Conversation

@trxcllnt (Contributor) commented Feb 13, 2019

This PR closes the following JIRAs:

  • ARROW-4552 - Add Table and Schema assign(other) implementations
  • ARROW-2764 - Easy way to create a new Table with an additional column
  • ARROW-4553 - Implement Schema/Field/DataType comparators
  • ARROW-4554 - Implement logic for combining Vectors with different lengths/chunksizes
  • ARROW-4555 - Add high-level Table and Column creation methods
  • ARROW-4557 - Add Table/Schema/RecordBatch selectAt(...indices) method

I extracted a few more high-level helper methods I've had lying around for creating, selecting, or manipulating Tables/Columns/Schemas/RecordBatches.

  1. We currently have a table.select(...colNames) implementation, so I also added a complementary table.selectAt(...colIndices) method. It's super handy when a table has duplicate column names.
  2. I added a basic table.assign(otherTable) impl, along with logic to compare Schemas/Fields/DataTypes in order to de-dupe reliably, which lives in the TypeComparator Visitor. I expose this via compareTo() methods on Schema, Field, and DataType for ease of use (a small usage sketch follows the examples below). Bonus: the Writer can now distinguish between RecordBatches of the same stream whose Schemas aren't reference-equal.
  3. I've also added logic to distribute Vectors of different lengths (or different internal chunk sizes) evenly across RecordBatches, to support a nearly zero-copy Table#assign() impl. I say nearly zero-copy, because there's a bit of allocation/copying to backfill null bitmaps if chunks don't exactly line up. But this also means it's a bit easier now to create Tables or RecordBatches from values in-memory whose lengths may not exactly line up:
const table = Table.new(
  Column.new('foo', IntVector.from(arange(new Int32Array(10)))),
  Column.new('bar', FloatVector.from(arange(new Float32Array(100))))
);
  4. And lastly, I added some more tests to ensure various combinations of select/slice/concat/assign can round-trip through IPC and back again:
const table1 = Table.new(
    Column.new('a', Int32Vector.from(i32s)),
    Column.new('b', Float32Vector.from(f32s)),
    Column.new('c', Float64Vector.from(f64s))
);

const table2 = Table.new(
    Column.new('d', Utf8Vector.from(strs)),
    Column.new('d', BoolVector.from(bools)),
    Column.new('d', Int32Vector.from(i32s)),
);

const table3 = table1.select('b', 'c').assign(table2.selectAt(0, 1));

console.log(table3.schema.fields)
// > [
// >     ('b', Float32), 
// >     ('c', Float64), 
// >     ('d', Utf8), 
// >     ('d', Bool)
// > ]
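
The comparators from item 2 can then be used like this (a small sketch continuing the example above; the boolean return type shown for compareTo() is an assumption):

```ts
const schemaA = table1.select('b', 'c').schema;
const schemaB = table3.selectAt(0, 1).schema;

console.log(schemaA.compareTo(schemaB));                     // true: same field names and types
console.log(schemaA.fields[0].compareTo(schemaB.fields[1])); // false: Float32 vs Float64
```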

(cc: @domoritz)

…r [Vector[], name | Field[]] arguments

Supports creating tables from variable-length columns. Distributes inner chunks uniformly across RecordBatches without copying data (null bitmaps may be copied)
@codecov-io commented Feb 13, 2019

Codecov Report

Merging #3634 into master will increase coverage by 2.65%.
The diff coverage is 82.64%.

@@            Coverage Diff             @@
##           master    #3634      +/-   ##
==========================================
+ Coverage    87.7%   90.36%   +2.65%     
==========================================
  Files         685       74     -611     
  Lines       83743     5325   -78418     
  Branches     1081     1200     +119     
==========================================
- Hits        73449     4812   -68637     
+ Misses      10183      504    -9679     
+ Partials      111        9     -102
Impacted Files Coverage Δ
js/src/vector/float.ts 78.57% <ø> (+21.42%) ⬆️
js/src/ipc/reader.ts 88.06% <ø> (ø) ⬆️
js/src/vector/int.ts 69.23% <0%> (+3.84%) ⬆️
js/src/compute/dataframe.ts 92.15% <100%> (ø) ⬆️
js/src/ipc/writer.ts 90.29% <100%> (ø) ⬆️
js/src/vector.ts 100% <100%> (ø) ⬆️
js/src/ipc/node/reader.ts 100% <100%> (ø) ⬆️
js/src/visitor/vectorassembler.ts 82.17% <100%> (-0.85%) ⬇️
js/src/vector/chunked.ts 83.56% <100%> (ø) ⬆️
js/src/ipc/node/writer.ts 100% <100%> (ø) ⬆️
... and 639 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ed6fb5...bfbcc8b. Read the comment docs.

@TheNeuralBit (Member) left a comment

This looks great, thanks!

I'm a little concerned about allowing users to create Tables from different length vectors though. Is that something that's allowed in the other implementations? It feels like a case where the user is probably making a mistake and we should throw an error.

js/src/table.ts (Outdated)
- public static fromVectors<T extends { [key: string]: DataType; } = any>(vectors: VType<T[keyof T]>[], names?: (keyof T)[]) {
-     return new Table(RecordBatch.from(vectors, names));
+ public static fromVectors<T extends { [key: string]: DataType; } = any>(vectors: Vector<T[keyof T]>[], fields?: (keyof T | Field<T[keyof T]>)[]) {
+     return Table.new<T>(vectors, fields);
Member

I like the pattern where from methods are intended for conversions from other types and new methods are effectively constructors where we can use generics.

Do you think we should just go ahead and get rid of this to follow that pattern? or mark it deprecated or something? It's only been part of 0.4.0 for a couple of days I don't think anyone's using it.

Contributor Author

I'm down for either. I couldn't remember if this method predated 0.4 or not, so I left it unchanged.

Member

I think anyone would expect constructors instead of .new methods. Since we are pre 1.0 everything is fair game and you can be backwards incompatible in minor version bumps.

Member

@domoritz agreed, I say let's just rename this (and RecordBatch.from) as an overload of new, we haven't even put out a blog post or anything touting these functions in 0.4.0. We can just tout the new versions instead.

vectors.reduce((len, vec) => Math.max(len, vec.length), 0),
vectors
);
public static from<T extends { [key: string]: DataType } = any>(chunks: (Data<T[keyof T]> | Vector<T[keyof T]>)[], names: (keyof T)[] = []) {
Member

Maybe this should also be deprecated in favor of a new version?

@trxcllnt (Contributor Author) commented Feb 13, 2019

@TheNeuralBit: I'm a little concerned about allowing users to create Tables from different length vectors though. Is that something that's allowed in the other implementations? It feels like a case where the user is probably making a mistake and we should throw an error.

I don't agree that error-throwing would be appropriate here for two main reasons:

  1. Practically speaking, aligning Column lengths before creating Tables is an unnecessary burden on library consumers. It involves first knowing or finding the max column length, then re-scanning all the columns to concatenate null chunks onto the shorter ones. Presently this is a non-trivial task (considering the internal Vector buffer layout) and/or would be slower to do externally (copies, suboptimal null bitmap creation, etc.). I think it's quite a bit more neighborly of us to do this automatically.

  2. On principle, node idioms suggest Errors should be used to communicate catastrophic unrecoverable failure modes, indicating program[mer] errors instead of logic errors. Tools like TS enforce most of these (e.g. it won't compile if you pass a string to a function that accepts numbers), but there are others (null reference, garbage data, etc.) that can happen at runtime, and error-throwing is appropriate then.

I'll clarify the behavior in this PR a bit further, since it is really two features rolled into one.

First, there's the re-chunking of the Table based on the chunk layout of the Columns we're joining. This feature means that if we're combining two columns of equal length into a table, the table will contain as many RecordBatches as the child with the most chunks. This is 100% zero-copy, since it only slices the Vectors and distributes them evenly by lowest common chunk size:

a = Column.new('a', [1, 2, 3, 4, 5, 6])
b = Column.new('b', [1, 2, 3], [4, 5, 6])
assert(Table.new(a, b).chunks.length === 2)
// row_id | a | b   ___
//      0 | 1 | 1      |
//      1 | 2 | 2      | <-- RecordBatch 1
//      2 | 3 | 3   ___|
//      3 | 4 | 4      |
//      4 | 5 | 5      | <-- RecordBatch 2
//      5 | 6 | 6   ___|

I'm not certain whether it uses the same algorithm, but I think I recall @wesm mentioning the C++ implementation does support re-chunking.

The second part deals with Columns of different lengths. For assign(), this means re-chunking up to the length of each column, then filling any remaining slots with nulls:

a = Column.new('a', [1, 2, 3, 4])
b = Column.new('b', [1, 2, 3], [4, 5, 6], [7, 8, 9])
assert(Table.new(a, b).chunks.length === 4)
// row_id |    a | b   ___
//      0 |    1 | 1      |
//      1 |    2 | 2      | <-- RecordBatch 1
//      2 |    3 | 3   ___|
//      3 |    4 | 4   ___| <-- RecordBatch 2
//      4 | null | 5      |
//      5 | null | 6   ___| <-- RecordBatch 3
//      6 | null | 7      |
//      7 | null | 8      | <-- RecordBatch 4
//      8 | null | 9   ___|

This example illustrates both features working in concert:

  1. Column A is split into two chunks (sizes [3, 1]), because B's first chunk size is 3
  2. Column A's second chunk has 1 data value, so Column B's second chunk is split into two, since the LCD of A and B's second chunk is 1
  3. Column B's third chunk is the leftover from the previous split, and its length is 2. Column A has no more chunks, so a null-filled chunk of length 2 is created as a placeholder
  4. Column B's last chunk is left un-split, and another all-null chunk is created for A
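
Here's a rough standalone sketch of the distribution idea over plain arrays of chunks (illustrative only; the actual implementation slices Vectors and Data without copying values, and the names here are invented for the sketch):

```ts
type Chunk = Array<number | null>;

// Each column is a list of chunks; each output "record batch" takes the smallest
// remaining chunk length across the columns that still have data, and exhausted
// columns contribute null-filled chunks of that same length.
function distributeChunks(columns: Chunk[][]): Chunk[][] {
  const numRows = Math.max(...columns.map((c) => c.reduce((n, chunk) => n + chunk.length, 0)));
  // Per-column cursor: which chunk we're in and how far into it we've read.
  const cursors = columns.map(() => ({ chunk: 0, offset: 0 }));
  const batches: Chunk[][] = [];

  for (let consumed = 0; consumed < numRows; ) {
    // Batch length = smallest remaining chunk length among non-exhausted columns.
    let batchLen = numRows - consumed;
    columns.forEach((chunks, i) => {
      const { chunk, offset } = cursors[i];
      if (chunk < chunks.length) {
        batchLen = Math.min(batchLen, chunks[chunk].length - offset);
      }
    });
    // One aligned slice per column; back-fill nulls where a column has run out.
    const batch = columns.map((chunks, i) => {
      const cur = cursors[i];
      if (cur.chunk >= chunks.length) {
        return new Array<number | null>(batchLen).fill(null);
      }
      const slice = chunks[cur.chunk].slice(cur.offset, cur.offset + batchLen);
      cur.offset += batchLen;
      if (cur.offset >= chunks[cur.chunk].length) { cur.chunk++; cur.offset = 0; }
      return slice;
    });
    batches.push(batch);
    consumed += batchLen;
  }
  return batches;
}

// Reproduces the example above: A is one chunk of 4 values, B is three chunks of 3.
const batches = distributeChunks([
  [[1, 2, 3, 4]],
  [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
]);
console.log(batches.map((cols) => cols.map((chunk) => chunk.length)));
// > [ [ 3, 3 ], [ 1, 1 ], [ 2, 2 ], [ 3, 3 ] ]
```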

@TheNeuralBit (Member)

Yeah, I agree that the ability to pad short columns with nulls up to the length of the longest column is a nice bit of code to give our users when they need it; I'm just not sure it's always the right thing to do. We could still give this functionality to users somehow without making it the default.

I really think most of the time someone is trying to create a Table from columns with an unequal length it's because they messed up somewhere, not because they want the shorter ones padded with nulls.

All that being said, I did some tests with pandas and pyarrow to see how they handle this. pandas will let you create a DataFrame from unequal length Series:

  In [14]: pd.DataFrame({'a': pd.Series([1,2,3]), 'b': pd.Series([1,2,3,4,5,6])})
Out[14]: 
     a  b
0  1.0  1
1  2.0  2
2  3.0  3
3  NaN  4
4  NaN  5
5  NaN  6

pyarrow does not, but pyarrow is lower-level than I think we are trying to target with Arrow JS:

pa.RecordBatch.from_arrays([pa.Array.from_pandas(pd.Series([1,2,3])), pa.Array.from_pandas(pd.Series([1,2,3,4]))], ['a', 'b'])
# raises ValueError

Given the precedent in pandas I could relent on this point.

@trxcllnt (Contributor Author)

@TheNeuralBit Yeah this behavior was inspired by pandas df.assign(). I do feel like padding with nulls is the friendliest way to go considering the alternatives -- if you accidentally combine some columns with uneven lengths and didn't mean to, you could always table.slice() after the fact to remove any unwanted rows from the end. Perhaps we should add table.setColumnAt(idx, col), to make it easier to correct on a per-column basis:

let vals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
let table = Table.new(Column.new('a', vals.slice(0, 4)),  Column.new('b', vals))
// correct the mistake
let a = table.getColumn('a').slice(0, 4)
let a2 = IntVector.new(vals.slice(4))
table.setColumn('a', a.concat(a2))

@trxcllnt (Contributor Author) commented Feb 15, 2019

@domoritz I borrowed the Ctor.new() initialization pattern from tensorflow-js. It's a bit more flexible, since in TS constructors can't have generics of their own and can't be typed to return anything other than the exact type for which they're a constructor. I first used it in Vector.new() as an easy way to create the concrete Vector subclass for any Arrow DataType:

const data = Data.Int(new Int64(), ...rest);
const vec = Vector.new(data); // vec type is inferred as Int64Vector
assert(vec instanceof Int64Vector) // true
assert(typeof vec.toBigInt64Array == 'function') // has the methods specific to Int64Vector
assert(vec.toBigInt64Array() instanceof BigInt64Array) // true
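
Here's a stripped-down illustration of why that matters (simplified names, not Arrow's actual class hierarchy): static new overloads can narrow the return type to the concrete subclass, which a constructor can't do:

```ts
abstract class DataType { abstract readonly typeId: string; }
class Int64 extends DataType { readonly typeId: 'Int64' = 'Int64'; }
class Float64 extends DataType { readonly typeId: 'Float64' = 'Float64'; }

class Vector<T extends DataType = DataType> {
  constructor(public readonly type: T) {}
  // A constructor can only ever return `Vector<T>`; these static overloads can
  // narrow the return type to the concrete subclass for each DataType.
  static new(type: Int64): Int64Vector;
  static new(type: Float64): Float64Vector;
  static new(type: DataType): Vector {
    if (type instanceof Int64) { return new Int64Vector(type); }
    if (type instanceof Float64) { return new Float64Vector(type); }
    return new Vector(type);
  }
}

class Int64Vector extends Vector<Int64> {
  toBigInt64Array() { return new BigInt64Array(0); } // subclass-specific method
}
class Float64Vector extends Vector<Float64> {
  toFloat64Array() { return new Float64Array(0); }
}

const vec = Vector.new(new Int64()); // inferred as Int64Vector, no cast needed
vec.toBigInt64Array();               // OK: the subclass-specific method is visible
```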

@domoritz (Member)

I see, that makes sense. I also think it's better not to have two different ways of initializing objects (constructors and .new) depending on whether you need to use generics in the constructor or not.

@trxcllnt (Contributor Author)

@domoritz I forgot to mention the distinction between Ctor.from() vs. Ctor.new(). Ctor.new is meant to be a more flexible constructor, since it's easier to define polymorphic argument lists and narrow the return signatures. Ctor.from is meant to initialize an instance from non-library types or data sources, which may require parsing, compute, or copies. An example in the core library is Array.from(), which takes Array-likes, Iterables, and an optional map function.
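
Restated with the calls already used in this thread (a rough sketch, not an exhaustive description of either API):

```ts
// `from`: build library objects from non-library inputs (typed arrays, iterables),
// which may involve parsing, compute, or copies.
const ints = Int32Vector.from(new Int32Array([1, 2, 3]));

// `new`: assemble library objects from other library objects; effectively a
// generically-typed constructor with a polymorphic argument list.
const col = Column.new('ints', ints);
const table = Table.new(col);
```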

@TheNeuralBit (Member)

On the null-padding point: Maybe the right thing to do is to add a docstring and put a note about padding unequal length vectors.

Then at least people browsing the API docs would know it's happening.

@domoritz (Member)

What do you think about using console.warn?

@TheNeuralBit (Member)

I'd be fine with it if @trxcllnt is

@trxcllnt (Contributor Author)

👍 for adding a note in the doc description

👎 for console.warn() because it doesn't play nice with production log strategies. It's difficult in general for libraries to log output test-ably/cross-platform/efficiently without taking on external dependencies, exposing hooks, creating/maintaining custom debugging tools, or dictating something else about the user's setup.

Relying on globals like console forces people into awful workarounds like patching the console Object. The debug package also requires patching and doesn't produce logs that are easy to ingest. The warning package used by React expects the warnings to be compiled out via Webpack, and would require us to start building/shipping dev versions of our UMD bundles.

@domoritz (Member)

Vega uses a tiny logger that users can override (https://github.com/vega/vega/blob/master/packages/vega-util/src/logger.js). Would it make sense for arrow to add a logger as well?
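
As a rough hypothetical sketch of that pattern (nothing like this is part of this PR; setLogger and the other names here are invented):

```ts
export interface ArrowLogger {
  warn(...args: unknown[]): void;
}

// Silent by default; library code only ever calls the hook.
let logger: ArrowLogger = { warn: () => {} };

export function setLogger(custom: ArrowLogger): void {
  logger = custom;
}

// Example call site inside the library, e.g. when padding unequal-length columns:
export function warnUnequalLengths(shortest: number, longest: number): void {
  logger.warn(`Columns have unequal lengths (${shortest} vs ${longest}); shorter columns will be padded with nulls.`);
}

// A consumer opting in to console output:
setLogger({ warn: (...args) => console.warn(...args) });
```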

@trxcllnt (Contributor Author)

@domoritz it looks like that's using the global console, which causes pains for production log ingestion setups. In general I'd prefer not to reimplement logging in a dataframe library.

Furthermore, I'm not convinced this is even warn-level behavior since it's only allocating a few bitmaps. Pandas doesn't warn here either.

@domoritz (Member)

First, this is just the default logger and you can override it to print to something else. Second, I somewhat disagree that filling nulls when sizes mismatch is something that should go unnoticed. I think it is valuable to fill nulls, but not showing any warnings will lead to errors that are hard to debug. I am okay with deferring this discussion, but I feel that at some point Arrow will need warnings.

@trxcllnt (Contributor Author) commented Feb 15, 2019

@domoritz I agree with you on principle, we will probably want to log things. I do think doing it in a way that's easy to configure externally is important, but outside the scope of this PR. Could you add a JIRA issue so someone can pick it up after this is merged?

@domoritz (Member)

Here it is https://issues.apache.org/jira/browse/ARROW-4588

@TheNeuralBit (Member) left a comment

I'm good with this once you address these items:

  • Remove Table.fromVectors, we'll direct users to Table.new instead
  • Rename RecordBatch.from to new (technically this did exist prior to 0.4, so if you'd rather just indicate it's deprecated somehow that's fine with me)
  • Add a docstring to Table.new describing that it handles unequal length columns, but will allocate memory

@trxcllnt force-pushed the js/high-level-table-column-fns branch from 053db43 to 1758063 on February 20, 2019 19:45
@trxcllnt (Contributor Author)

@TheNeuralBit ok, I removed Table.fromVectors, added RecordBatch.new, and added the following description to the Table.new tsdoc:
[screenshot: the new Table.new tsdoc description]

@TheNeuralBit (Member)

Perfect! thank you

TheNeuralBit pushed a commit that referenced this pull request Feb 23, 2019
…support

This started as a continuation of #3634, but grew enough to deserve its own PR. I've made a PR to my own fork that highlights just the changes here: trxcllnt#8. I'll rebase this PR after #3634 is merged so only these changes are included.

This PR reverts the behavior of `Float16Vector#toArray()` back to returning a zero-copy slice of the underlying `Uint16Array` data, and exposes the copying behavior via new `toFloat32Array()` and `toFloat64Array()` methods. `Float16Array.from()` will also convert any incoming 32 or 64-bit floats to Uint16s if necessary.

It also adds tighter integration with the new `BigInt`, `BigInt64Array`, and `BigUint64Array` primitives (if available):
1. Use the native `BigInt` to convert/stringify i64s/u64s
2. Support the `BigInt` type in element comparator and `indexOf()`
3. Add zero-copy `toBigInt64Array()` and `toBigUint64Array()` methods to `Int64Vector` and `Uint64Vector`, respectively

0.4.0 added support for basic conversion to the native `BigInt` when available, but would only create positive `BigInts`, and was slower than necessary. This PR uses the native Arrays to create the BigInts, so we should see some speed ups there. Ex:

```ts
const vec = Int64Vector.from(new Int32Array([-1, 2147483647]))
const big = vec.get(0)
assert(big[0] === -1) // true
assert(big[1] === 2147483647) // true
const num = 0n + big // or BigInt(big)
assert(num === (2n ** 63n - 1n)) // true
```

JIRAs associated with this PR are:
* [ARROW-4578](https://issues.apache.org/jira/browse/ARROW-4578) - Float16Vector toArray should be zero-copy
* [ARROW-4579](https://issues.apache.org/jira/browse/ARROW-4579) - Add more interop with BigInt/BigInt64Array/BigUint64Array
* [ARROW-4580](https://issues.apache.org/jira/browse/ARROW-4580) - Accept Iterables in IntVector/FloatVector from() signatures

Author: ptaylor <paul.e.taylor@me.com>

Closes #3653 from trxcllnt/js/int-and-float-fixes and squashes the following commits:

69ee6f7 <ptaylor> cleanup after rebase
f44e97b <ptaylor> ensure truncated bitmap size isn't larger than it should be
7ac081a <ptaylor> fix lint
6046e66 <ptaylor> remove more getters in favor of readonly direct property accesses
94d5633 <ptaylor> support BigInt in comparitor/indexOf
760a219 <ptaylor> update BN to use BigIntArrays for signed/unsigned 64bit integers if possible
77fcd40 <ptaylor> add initial BigInt64Array and BigUint64Array support
d561204 <ptaylor> ensure Float16Vector.toArray() is zero-copy again, add toFloat32Array() and toFloat64Array() methods instead
854ae66 <ptaylor> ensure Int/FloatVector.from return signatures are as specific as possible, and accept Iterable<number>
4656ea5 <ptaylor> cleanup/rename Table + Schema + RecordBatch from -> new, cleanup argument extraction util fns
69abf40 <ptaylor> add initial RecordBatch.new and select tests
9c7ed3d <ptaylor> guard against out-of-bounds selections
a4222f8 <ptaylor> clean up: eliminate more getters in favor of read-only properties
8eabb1c <ptaylor> clean up/speed up: move common argument flattening methods into a utility file
b3b4f1f <ptaylor> add Table and Schema assign() impls
79f9db1 <ptaylor> add selectAt() method to Table, Schema, and RecordBatch for selecting columns by index
@asfimport mentioned this pull request Aug 17, 2019