
Conversation


@trxcllnt trxcllnt commented Feb 14, 2019

  • Fix IntVector.from() and FloatVector.from() type signatures
    • Allow them to accept Iterable<number> as data if called on the more specific subclasses
    • Convert float32/float64 to uint16s when creating a Float16Vector (see the sketch after this list)
  • Make Float16Vector#toArray() return a subarray slice of the Uint16Array data
    • Adds Float16Vector#toFloat32Array() and Float16Vector#toFloat64Array() to enable the up-casting copy behavior
  • Add zero-copy Int64Vector#toBigInt64Array() and Uint64Vector#toBigUint64Array() for envs where available
  • Use BigInt and related arrays to print int64/uint64s inside the BigNumMixin
  • Support BigInt in comparator and Int64Vector#indexOf() and Uint64Vector#indexOf() signatures
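
Here's a minimal sketch of the two conversions referenced above, assuming a truncating (not round-to-nearest) half-float encoding; both helpers are illustrative, not the library's actual implementation.

```ts
// Scratch buffers for reinterpreting float32 bits.
const f32 = new Float32Array(1);
const u32 = new Uint32Array(f32.buffer);

// Re-encode a JS number as IEEE 754 half-precision bits (mantissa is truncated).
function float64ToFloat16Bits(value: number): number {
  f32[0] = value;                           // narrow to float32 first
  const x = u32[0];
  const sign = (x >>> 16) & 0x8000;         // move the sign bit into place
  const exp32 = (x >>> 23) & 0xff;
  const frac = (x >>> 13) & 0x03ff;         // keep the top 10 mantissa bits
  if (exp32 === 0xff) {                     // NaN / Infinity
    return sign | 0x7c00 | (frac ? 0x0200 : 0);
  }
  const exp16 = exp32 - 127 + 15;           // rebias exponent from 8 bits to 5
  if (exp16 >= 0x1f) return sign | 0x7c00;  // overflow -> signed Infinity
  if (exp16 <= 0) return sign;              // underflow -> signed zero (subnormals dropped)
  return sign | (exp16 << 10) | frac;
}

// Zero-copy: reinterpret Int64 data (stored as pairs of Int32s) as a
// BigInt64Array, in environments where BigInt64Array exists.
function int32sToBigInt64s(data: Int32Array): BigInt64Array {
  return new BigInt64Array(data.buffer, data.byteOffset, data.length >> 1);
}
```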

TheNeuralBit and others added 10 commits February 12, 2019 19:55
…ve performance

It seems that using `Object.create` in the `bind` method slows down proxy generation substantially. I separated the `Row` and `RowProxyGenerator` concepts - now the generator creates a prototype based on `Row` and the parent vector when it is created, and constructs instances based on that prototype repeatedly in `bind`.

This change improved the included benchmark from 1,114.78 ms to 154.96 ms on my laptop (Intel i5-7200U).
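
The pattern, reduced to a sketch (the `Column` shape and names below are simplified stand-ins, not the actual `Row`/`RowProxyGenerator` code): the expensive `defineProperty` work runs once per schema, and each `bind` only pays for a bare `Object.create`.

```ts
type Column = { name: string; values: ReadonlyArray<unknown> };

class RowProxyGenerator {
  private readonly prototype: object;
  constructor(columns: Column[]) {
    // Expensive part, ONCE per schema: one getter per column on a shared
    // prototype, each reading through the row's index slot.
    const proto = Object.create(null);
    for (const col of columns) {
      Object.defineProperty(proto, col.name, {
        get(this: { rowIndex: number }) { return col.values[this.rowIndex]; },
        enumerable: true,
      });
    }
    this.prototype = proto;
  }
  bind(rowIndex: number): Record<string, unknown> {
    // Cheap part, once per row: no property descriptors, just a prototype link.
    const row = Object.create(this.prototype);
    row.rowIndex = rowIndex;
    return row;
  }
}

const rows = new RowProxyGenerator([{ name: 'id', values: [10, 20, 30] }]);
console.log(rows.bind(1).id); // 20
```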

Author: Brian Hulette <hulettbh@gmail.com>

Closes apache#3601 from TheNeuralBit/proxy-bench and squashes the following commits:

2442545 <Brian Hulette> Remove inner class, don't re-define columns
7bb6e4f <Brian Hulette> Update comparator, switch to Symbols for internal variables in Row
4079439 <Brian Hulette> linter fixes
da4d97c <Brian Hulette> Switch back to Object.create with no extra property descriptor
8a5d162 <Brian Hulette> Improve Row proxy performance
fb7a0f0 <Brian Hulette> add row proxy benchmark
Untested but proposed fix to apache#3629 using the standard UMD pattern as generated by Rollup.
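
For reference, the Rollup-generated UMD wrapper has roughly this shape (a generic sketch with a placeholder `Arrow` global, not the exact emitted bundle):

```
(function (global, factory) {
  typeof exports === 'object' && typeof module !== 'undefined'
    ? factory(exports)                                 // CommonJS / Node
    : typeof define === 'function' && define.amd
      ? define(['exports'], factory)                   // AMD loaders such as require.js
      : factory((global.Arrow = global.Arrow || {}));  // browser global fallback
})(typeof self !== 'undefined' ? self : this, function (exports) {
  'use strict';
  exports.hello = function () { return 'arrow'; };     // library body goes here
});
```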

Author: Mike Bostock <mbostock@gmail.com>

Closes apache#3630 from mbostock/fix-amd and squashes the following commits:

0930385 <Mike Bostock> Single quotes.
a6ed870 <Mike Bostock> Fix AMD pattern.
Author: Paddy Horan <paddyhoran@hotmail.com>

Closes apache#3632 from paddyhoran/bitmap-and-or and squashes the following commits:

b32b6d0 <Paddy Horan> Implemented `BitAnd` and `BitOr` for `Bitmap`
This is the start of a Scalar object model suitable for static and dynamic dispatch to correspond with the existing array and array builder types.

I modified the first aggregation kernel (sum) to use these types for outputs.

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3604 from wesm/ARROW-47 and squashes the following commits:

0d01bb3 <Wes McKinney> Fix unit test on MSVC for small integer types
d57f7aa <Wes McKinney> Remove ARROW_GTEST_VENDORED
03ca01c <Wes McKinney> Changes because MSVC tries to instantiate NumericScalar for Time/Timestamp types
271d602 <Wes McKinney> Add notes that Scalar API is experimental
e4a13b4 <Wes McKinney> flake
6626035 <Wes McKinney> Fix up date/time scalars, add tests
704daee <Wes McKinney> Code review comments
d922bd2 <Wes McKinney> Use new Scalar objects in aggregation code
fa89bd0 <Wes McKinney> Drafting, untested
94d5e62 <Wes McKinney> start
* ARROW-4539: [Java] Fix child vector count for lists.

- The child vector count was not set correctly for lists. Fixed to use the correct count.

* ARROW-4539: [Java] Add license header.
I admit this feels gross, but it's less gross than what was there before. I can do some more cleanup, but I wanted to get feedback before spending any more time on it.

So, the problem partially lies with the gRPC C++ library. The obvious thing, and first thing I tried, was to specialize `SerializationTraits<protocol::FlightData>` and do casts between `FlightData` and `protocol::FlightData` (the proto) at the last possible moment. Unfortunately, this seems to not be possible because of this:

https://github.com/grpc/grpc/blob/master/include/grpcpp/impl/codegen/proto_utils.h#L100

So I had to override that Googly hack and resort to some shenanigans (see protocol.h/protocol.cc) to make sure the same templates are always visible both in `Flight.grpc.pb.cc` and in our client.cc/server.cc

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3633 from wesm/flight-cpp-avoid-ub and squashes the following commits:

ed6eb80 <Wes McKinney> Further refinements, make protocol.h an internal header. Comments per feedback
b3609d4 <Wes McKinney> Add comments about the purpose of protocol.cc
ac405b3 <Wes McKinney> Ensure .proto file is compiled before anything else
23fe416 <Wes McKinney> Implement gRPC customizations another way without calling reinterpret_cast on the client and server C++ types
Following the instructions in docker-compose.yml gives me a working IWYU build now:

```
export PYTHON_VERSION=3.6
docker-compose build cpp
docker-compose build python
docker-compose build lint
docker-compose run iwyu
```

Author: François Saint-Jacques <fsaintjacques@gmail.com>
Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3643 from wesm/ARROW-4340 and squashes the following commits:

56733fc <François Saint-Jacques> Refactor iwyu build into docker install script. (#8)
e1c46c7 <Wes McKinney> Build IWYU for LLVM 7 in iwyu docker-compose job
Author: Pindikura Ravindra <ravindra@dremio.com>

Closes apache#3636 from pravindra/dsub and squashes the following commits:

ee60c00 <Pindikura Ravindra> ARROW-4204:  add support for decimal subtract
- [ ] Add docs
- [ ] Format code
- [ ] Include Python in integration tests (requires binding the JSON reader/writer from C++)
- [ ] Validate performance?
- [ ] Complete server bindings if approach makes sense

Author: David Li <David.M.Li@twosigma.com>
Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3566 from lihalite/flight-python and squashes the following commits:

ac29ab8 <David Li> Clean up to-be-implemented parts of Flight Python bindings
9d5442a <David Li> Clarify various RecordBatchStream{Reader,Writer} wrappers
e1c298a <David Li> Lint CMake files
7764444 <Wes McKinney> Reformat cmake
c6b02aa <David Li> Add basic Python bindings for Flight
Also add a debug check on the C++ side.

Author: Antoine Pitrou <antoine@python.org>

Closes apache#3647 from pitrou/ARROW-4563-py-validate-decimal128-inputs and squashes the following commits:

5a4cd6a <Antoine Pitrou> ARROW-4563:  Validate decimal128() precision input
wesm and others added 19 commits February 14, 2019 22:56
…template instantiation to not generate dead identity cast code

Also resolves ARROW-4110, which has been on my list for some time.

This ended up being a huge pain.

* `detail::PrimitiveAllocatingUnaryKernel` can now allocate memory for any kind of fixed width type.
* I factored out simple bitmap propagation into `detail::PropagateNulls`
* I moved the null count resolution code one level down into `ArrayData`, since there are cases where it may be set to `kUnknownNullCount` (e.g. after a slice) and you need to know what it is. This isn't tested, but I suggest addressing it in a follow-up patch

I also moved hand-maintained macro spaghetti for instantiating CastFunctors into a Python code-generation script. This might be the most controversial change in this patch, but the problem here is that we needed to exclude 1 macro case for each numeric type -- previously they were relying on `NUMERIC_CASES`. This means the list of generated types is slightly different for each type, leading to poor code reuse. Rather than maintaining this code by hand, it is _so much simpler_ to generate it with a script.

Speaking of code generation, I think we should continue to invest in code-generation scripts to make generating mundane C++ code for pre-compiled kernels simpler. I checked the file in, but I'm not opposed to auto-generating the files as part of the CMake build -- we could do that in a follow-up PR.

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3642 from wesm/ARROW-1896 and squashes the following commits:

57d1084 <Wes McKinney> Fix another clang warning
0d3a7b3 <Wes McKinney> Fix clang warning on macOS
8aeaf96 <Wes McKinney> Code review
ab534d1 <Wes McKinney> Fix dictionary->dense conversion for Decimal128
7a178a4 <Wes McKinney> Refactoring around kernel memory allocation, do not allocate memory inside CastKernel. Use code generation to avoid instantiating CastFunctors for identity casts that are never used
Author: Antoine Pitrou <antoine@python.org>

Closes apache#3650 from pitrou/ARROW-4576-py-benchmarks-fix and squashes the following commits:

67e8069 <Antoine Pitrou> ARROW-4576:  Fix error during benchmarks
Previously we would return an incorrect result.

Author: Antoine Pitrou <antoine@python.org>

Closes apache#3648 from pitrou/ARROW-3669-numpy-byteswapped-arrays and squashes the following commits:

1e0e10d <Antoine Pitrou> ARROW-3669:  Raise error on Numpy byte-swapped array
Author: François Saint-Jacques <fsaintjacques@gmail.com>

Closes apache#3646 from fsaintjacques/ARROW-4529-rounddown-test and squashes the following commits:

8f3116b <François Saint-Jacques> reformat.
271d3fd <François Saint-Jacques> ARROW-4529:  Add test for BitUtil::RoundDown
Author: David Li <li.davidm96@gmail.com>

Closes apache#3555 from lihalite/arrow-4474 and squashes the following commits:

36ed4df <David Li> Use signed integers in FlightInfo payload size fields
Minimal reproducing example:

```
import dask
import pandas as pd
import pyarrow as pa
import numpy as np

def segfault_me(df):
    pa.Table.from_pandas(df, nthreads=1)

while True:
    df = pd.DataFrame(
        {"P": np.arange(0, 10), "L": np.arange(0, 10), "TARGET": np.arange(10, 20)}
    )
    dask.compute([
        dask.delayed(segfault_me)(df),
        dask.delayed(segfault_me)(df),
        dask.delayed(segfault_me)(df),
        dask.delayed(segfault_me)(df),
        dask.delayed(segfault_me)(df),
    ])
```

Segfaults are more likely when running under AddressSanitizer or on an otherwise slow system with many cores. It is important that the same df is always passed into the functions.

The issue was that the reference count of the underlying NumPy array was increased at the same time by multiple threads. The decrease then happened while holding the GIL, so the array was sometimes destroyed while still in use.

Author: Korn, Uwe <Uwe.Korn@blue-yonder.com>

Closes apache#3655 from xhochy/ARROW-4582 and squashes the following commits:

7f9838d <Korn, Uwe> docker-compose run clang-format
3d6e5ee <Korn, Uwe> ARROW-4582:  Acquire the GIL on Py_INCREF
This leads to the necessary change in `build.ninja`
```diff
--- build.ninja.orig	2019-02-15 16:07:48.000000000 +0100
+++ build.ninja	2019-02-15 16:10:25.000000000 +0100
@@ -4863,7 +4863,7 @@
 #############################################
 # Order-only phony target for arrow_flight_testing_objlib

-build cmake_object_order_depends_target_arrow_flight_testing_objlib: phony || src/arrow/flight/CMakeFiles/arrow_flight_testing_objlib.dir
+build cmake_object_order_depends_target_arrow_flight_testing_objlib: phony || src/arrow/flight/flight_grpc_gen
```

Author: Korn, Uwe <Uwe.Korn@blue-yonder.com>

Closes apache#3658 from xhochy/ARROW-4585 and squashes the following commits:

b71ff92 <Korn, Uwe> ARROW-4585:  Add protoc dependency to flight_testing
… there are none

Author: Uwe L. Korn <uwelk@xhochy.com>

Closes apache#3652 from xhochy/ARROW-4577 and squashes the following commits:

e9d6b14 <Uwe L. Korn> ARROW-4577:  Don't set interface link libs on arrow_shared where there are none
`mod.rs` is not needed as `lib.rs` imports sub-modules directly.  In fact, it's not compiled at all from what I can see...

Author: Paddy Horan <paddyhoran@hotmail.com>

Closes apache#3659 from paddyhoran/remove-mod and squashes the following commits:

513eaa4 <Paddy Horan> Removed `arrow/mod.rs`
… "array_ops"

This PR adds explicit SIMD for boolean ops, `and`, `or` and `not`.

I moved `array_ops` into the new `compute` module. From the outside this module serves the same purpose as the previous `array_ops` module (all kernels will be accessible from this namespace), and the remaining `array_ops` implementations are currently exposed via the `compute` module. As I add explicit SIMD for more kernels they will migrate from `array_ops` into their own modules under `compute`. I am keeping sub-modules under `compute` (as opposed to a single compute.rs) as SIMD can get rather verbose and it seems that `compute` may expand in the future.

I have included benchmarks where I re-create the old default implementations for others to take a look at the speed improvement. It's not clear whether we need the non-SIMD versions in the benchmarks long term, but I left them in for now to allow the non-SIMD/SIMD comparison.

There are likely more optimizations possible (processing the values and null bit buffers in a single loop for instance) but I wanted to get the cleanest impl first and add further optimizations later if needed.

Author: Paddy Horan <paddyhoran@hotmail.com>

Closes apache#3641 from paddyhoran/boolean-kernels and squashes the following commits:

89588e6 <Paddy Horan> Removed `compute` from `mod.rs`
f9ae58a <Paddy Horan> Updated benchmarks
e321cec <Paddy Horan> Updated `not` to use trait impls
da16486 <Paddy Horan> Implemented `Not` for `Buffer`
f21253d <Paddy Horan> Updated datafusion and comments
a3c01c8 <Paddy Horan> Added SIMD binary boolean kernels
…se TypedBufferBuilder<T>

This reduces code duplication.

Author: Benjamin Kietzman <bengilgit@gmail.com>
Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3575 from bkietz/ARROW-4341-primitive-builders-use-bufferbuilder and squashes the following commits:

3ef2972 <Wes McKinney> Fix BooleanBuilder::AppendNulls, remove valid_bytes argument from AppendNulls methods
40c4d8d <Benjamin Kietzman> TypedBufferBuilder<bool>'s output was not correctly sized
b389c13 <Wes McKinney> Revert changes to arrow/util/logging.h
daf5244 <Wes McKinney> Revert change to UnsafeAppend that broke Python unit test
3cc5a0c <Wes McKinney> Restore memory zeroing. Add missing override
21ce285 <Wes McKinney> Fix RETURN_NOT_OK usages
d4ab3b5 <Wes McKinney> Move NumericBuilder implementation to headers to avoid symbol visibility concerns
6c1e99d <Wes McKinney> Add TypedBufferBuilder<bool> UnsafeAppend compile-time option to not track falses. Restore faster code from before this patch for appending C arrays and vector<bool>
09d2bfe <Benjamin Kietzman> reduce unnecessary zeroing in BufferBuilder
bd736c3 <Benjamin Kietzman> add ArrowLogIgnore and use for release mode DCHECK*
7ba692c <Benjamin Kietzman> moving to iterator append in NumericBuilder
188b7b9 <Benjamin Kietzman> fix format
8934573 <Benjamin Kietzman> add explicit cast
88e57fe <Benjamin Kietzman> remove PrimitiveBuilder
9c050b4 <Benjamin Kietzman> Use TypedBufferBuilder for PrimitiveBuilder
078497a <Benjamin Kietzman> fix BooleanBuilder::AppendNull
88eb71c <Benjamin Kietzman> Use TypedBufferBuilder<bool> in BooleanBuilder
Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#3665 from kou/glib-exit-on-error and squashes the following commits:

d8bc073 <Kouhei Sutou>  Stop configure immediately when GLib isn't available
…d of Arrow::Array

This is a compatibility-breaking change, but we warned about this change
in 0.12.0.

Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#3667 from kou/ruby-struct-array-ref and squashes the following commits:

5b30449 <Kouhei Sutou>  Arrow::StructArray#[] returns Arrow::Struct instead of Arrow::Array
Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#3666 from kou/ruby-array-ref-out-of-range and squashes the following commits:

f64df5b <Kouhei Sutou>  Array#[] returns nil
Author: Korn, Uwe <Uwe.Korn@blue-yonder.com>

Closes apache#3657 from xhochy/ARROW-4584 and squashes the following commits:

00a53a8 <Korn, Uwe> ARROW-4584:  Add built wheel to manylinux1 dockerignore
Author: Nicolas Trinquier <nstq@protonmail.ch>

Closes apache#3663 from ntrinquier/arrow-4377 and squashes the following commits:

0dbf90c <Nicolas Trinquier> Propagate Err
bceddd0 <Nicolas Trinquier> Display arrays vertically
c0e7d55 <Nicolas Trinquier> Handle null case
802501a <Nicolas Trinquier> Implement Debug for PrimitiveArrays
We can detect LLVM installed by Homebrew automatically in CMake.

Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#3674 from kou/travis-remove-needless-llvm-dir and squashes the following commits:

70d4bf6 <Kouhei Sutou>  Remove needless LLVM_DIR for macOS
Error message:

    In file included from C:/msys64/mingw64/include/thrift/TApplicationException.h:23,
                     from C:/Users/kou/work/arrow/arrow.kou/cpp/src/parquet/thrift.h:35,
                     from C:/Users/kou/work/arrow/arrow.kou/cpp/src/parquet/column_reader.cc:36:
    C:/msys64/mingw64/include/thrift/Thrift.h:32:10: fatal error: netinet/in.h: No such file or directory
     #include <netinet/in.h>
              ^~~~~~~~~~~~~~

Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#3676 from kou/cpp-parquet-fix-build-error-with-mingw and squashes the following commits:

248063c <Kouhei Sutou>  Fix build error with MinGW
This PR adds the first query optimizer rule, which rewrites a logical plan to push the projection down to the TableScan.

Once this is merged, I will create a follow up PR to integrate this into the query engine so that only the necessary columns are loaded from disk.
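
The rule itself lives in Rust, but the rewrite is simple enough to sketch language-agnostically; the two-variant `LogicalPlan` below is an invented stand-in for the real plan types:

```ts
// Invented, minimal plan shape; DataFusion's real LogicalPlan has many more variants.
type LogicalPlan =
  | { kind: 'Scan'; table: string; projection?: number[] }         // columns the scan must read
  | { kind: 'Projection'; columns: number[]; input: LogicalPlan }; // column indices, simplified

// Rewrite the plan so the TableScan only loads the columns the projection needs.
function pushDownProjection(plan: LogicalPlan): LogicalPlan {
  if (plan.kind !== 'Projection') return plan;
  const input = pushDownProjection(plan.input); // recurse first: handle nested projections
  if (input.kind === 'Scan') {
    const needed = [...new Set(plan.columns)].sort((a, b) => a - b);
    return { ...plan, input: { ...input, projection: needed } };
  }
  return { ...plan, input };
}
```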

Author: Andy Grove <andygrove73@gmail.com>

Closes apache#3664 from andygrove/ARROW-4589-wip and squashes the following commits:

b876f28 <Andy Grove> revert formatting change that broke the tests
2051deb <Andy Grove> formatting comments and test strings to be < 90 columns wide
8effde3 <Andy Grove> Address PR feedback, fix bug, add extra unit test
ecdd32a <Andy Grove> refactor code to reduce duplication
6229b32 <Andy Grove> refactor code to reduce duplication
f959500 <Andy Grove> implement projection push down for rest of logical plan variants
5fd5382 <Andy Grove> implement collect_expr and rewrite_expr for all expression types
bd49f17 <Andy Grove> improve error handling
92918dd <Andy Grove> Implement projection push-down for selection and make projection deterministic
a80cfdf <Andy Grove> Implement mapping and expression rewrite logic
26fd3b4 <Andy Grove> revert change
d7c4822 <Andy Grove> formatting and add assertion to test
e81af14 <Andy Grove> Roughing out projection push down rule
@trxcllnt trxcllnt force-pushed the js/int-and-float-fixes branch from 6350e9e to 2557277 Compare February 21, 2019 07:28
pitrou and others added 4 commits February 21, 2019 11:06
…heir names

Author: Antoine Pitrou <antoine@python.org>

Closes apache#3718 from pitrou/ARROW-4559-special-chars-filename and squashes the following commits:

2e85442 <Antoine Pitrou> ARROW-4559:  Allow Parquet files with special characters in their names
As it's only meant for integration testing, rename it to json-integration.h.

Author: Antoine Pitrou <antoine@python.org>

Closes apache#3716 from pitrou/ARROW-3981-rename-json-h and squashes the following commits:

1ea70bf <Antoine Pitrou> ARROW-3981:  Rename json.h
We were not running the pyarrow tests after installing the manylinux wheels, which can lead to uncaught issues, like: https://travis-ci.org/kszucs/crossbow/builds/484284104

Author: Krisztián Szűcs <szucs.krisztian@gmail.com>

Closes apache#3484 from kszucs/manylinux_tests and squashes the following commits:

3b1da30 <Krisztián Szűcs> use sudo
c573a56 <Krisztián Szűcs> use env variables insude the container
fd5e3fe <Krisztián Szűcs> use latest docker image tag
d5531d9 <Krisztián Szűcs> test imports inside the wheel container
1aa19f1 <Krisztián Szűcs> reenable travis builds
b399496 <Krisztián Szűcs> test py27mu and py36m wheels
71233c7 <Krisztián Szűcs> test 2.7,16 wheel
2372f3d <Krisztián Szűcs> fix requirements path; disable other CI tests
3e4ec2a <Krisztián Szűcs> unterminated llvm:MemoryBuffer; fix check_import.py path
7c88d61 <Krisztián Szűcs> only build python 3.6 wheel
18c5488 <Krisztián Szűcs> install wheel from dist dir
0bb07a7 <Krisztián Szűcs> better bash split
54fc653 <Krisztián Szűcs> don't export
d3cb058 <Krisztián Szűcs> fix wheel building script
0d29b31 <Krisztián Szűcs> remove not existing resources from gandiva's pom
5d75adb <Krisztián Szűcs> initialize jni loader
09d829a <Krisztián Szűcs> build wheels for a single python distribution at a time; adjust travis and crossbow scripts
79abc0e <Krisztián Szűcs> mark .cc file as generated
af78be2 <Krisztián Szűcs> don't bundle irhelpers in the jar
a88cd37 <Krisztián Szűcs> cmake format
7deb359 <Krisztián Szűcs> fix REGEX; remove byteCodeFilePath from java configuration object
fa19529 <Krisztián Szűcs> properly construct llvm:StringRef
5841dcd <Krisztián Szűcs> remove commented code
42391b1 <Krisztián Szűcs> don't pass precompiled bitcode all around the constructors
d480c83 <Krisztián Szűcs> use const string ref for now
b0b1117 <Krisztián Szűcs> conda llvmdev
169f43a <Krisztián Szűcs> build gandiva in cpp docker image
cb69625 <Krisztián Szűcs> silent maven download msgs
19200c3 <Krisztián Szűcs> don't run wheel tests twice; cmake format
f2205d0 <Krisztián Szűcs> gandiva jni
dbf5b1c <Krisztián Szűcs> embed precompiled bitcode as char array; load precompiled IR from string
00d98e0 <Krisztián Szűcs> try to bundle bytecode files
97fe94b <Krisztián Szűcs> fix requirements-test.txt path
86e7e5b <Krisztián Szűcs> run pyarrow tests in manylinux CI build
This PR closes the following JIRAs:

* [ARROW-4552](https://issues.apache.org/jira/browse/ARROW-4552) - Add Table and Schema `assign(other)` implementations
* [ARROW-2764](https://issues.apache.org/jira/browse/ARROW-2764) - Easy way to create a new Table with an additional column
* [ARROW-4553](https://issues.apache.org/jira/browse/ARROW-4553) - Implement Schema/Field/DataType comparators
* [ARROW-4554](https://issues.apache.org/jira/browse/ARROW-4554) - Implement logic for combining Vectors with different lengths/chunksizes
* [ARROW-4555](https://issues.apache.org/jira/browse/ARROW-4555) - Add high-level Table and Column creation methods
* [ARROW-4557](https://issues.apache.org/jira/browse/ARROW-4557) - Add Table/Schema/RecordBatch `selectAt(...indices)` method

I extracted a few more high-level helper methods I've had lying around for creating, selecting, or manipulating Tables/Columns/Schemas/RecordBatches.

1. We currently have a `table.select(...colNames)` implementation, so I also added a `table.selectAt(...colIndices)` method to complement it. Super handy when you have duplicate column names.
2. I added a basic `table.assign(otherTable)` impl, along with logic to compare Schemas/Fields/DataTypes in order to de-dupe reliably, which lives in the [`TypeComparator` Visitor](https://github.com/trxcllnt/arrow/blob/a67bd562cf6c4860bdce027981df859398e41b6d/js/src/visitor/typecomparator.ts#L83). I expose this via `compareTo()` methods on the Schema, Field, and DataType for ease of use. Bonus: the Writer [can now discern](https://github.com/trxcllnt/arrow/blob/a67bd562cf6c4860bdce027981df859398e41b6d/js/src/ipc/writer.ts#L129) between RecordBatches of the same stream whose Schemas aren't reference-equal.
3. I've also added logic to distribute Vectors of different lengths (or different internal chunk sizes) evenly across RecordBatches, to support a nearly zero-copy `Table#assign()` impl. I say nearly zero-copy, because there's a bit of allocation/copying to backfill null bitmaps if chunks don't exactly line up. But this also means [it's a bit easier](https://github.com/trxcllnt/arrow/blob/a67bd562cf6c4860bdce027981df859398e41b6d/js/test/unit/table-tests.ts#L178) now to create Tables or RecordBatches from values in-memory whose lengths may not exactly line up:
```ts
const table = Table.new(
  Column.new('foo', IntVector.from(arange(new Int32Array(10)))),
  Column.new('bar', FloatVector.from(arange(new Float32Array(100))))
);
```
4. And lastly, I added [some more tests](https://github.com/trxcllnt/arrow/blob/js/high-level-table-column-fns/js/test/unit/table/serialize-tests.ts#L38) to ensure various combinations of select/slice/concat/assign can round-trip through IPC and back again.

```ts

const table1 = Table.new(
    Column.new('a', Int32Vector.from(i32s)),
    Column.new('b', Float32Vector.from(f32s)),
    Column.new('c', Float64Vector.from(f64s))
);

const table2 = Table.new(
    Column.new('d', Utf8Vector.from(strs)),
    Column.new('d', BoolVector.from(bools)),
    Column.new('d', Int32Vector.from(i32s)),
);

const table3 = table1.select('b', 'c').assign(table2.selectAt(0, 1));

console.log(table3.schema.fields)
// > [
// >     ('b', Float32),
// >     ('c', Float64),
// >     ('d', Utf8),
// >     ('d', Bool)
// > ]
```
(cc: @domoritz)

Author: ptaylor <paul.e.taylor@me.com>

Closes apache#3634 from trxcllnt/js/high-level-table-column-fns and squashes the following commits:

9943d9c <ptaylor> fix lint
4b8fb54 <ptaylor> add a test for table and recordbatch with a single column
1758063 <ptaylor> add Table.new docstring
bfbcc8b <ptaylor> cleanup/rename Table + Schema + RecordBatch from -> new, cleanup argument extraction util fns
5b6d938 <ptaylor> cleanup
98c8e52 <ptaylor> add initial RecordBatch.new and select tests
dc80143 <ptaylor> remove Table.fromVectors in favor of Table.new
73b8af7 <ptaylor> fix Int64Vector typings
83de5ed <ptaylor> guard against out-of-bounds selections
a67bd56 <ptaylor> clean up: eliminate more getters in favor of read-only properties
7a8daad <ptaylor> clean up/speed up: move common argument flattening methods into a utility file
41aa902 <ptaylor> Add more tests to ensure Tables can serialize through various slice, concat, assign steps
07a2c96 <ptaylor> add basic Table#assign tests
e4a5d87 <ptaylor> split out the generated data validators for reuse
99e8888 <ptaylor> add Table and Schema assign() impls
0ac786c <ptaylor> add selectAt() method to Table, Schema, and RecordBatch for selecting columns by index
cf6f97a <ptaylor> add TypeComparator visitor so we can compare Schemas, Fields, and DataTypes
b2153aa <ptaylor> ensure the Vector map types always fall back to BaseVector
9d8f493 <ptaylor> cleanup: use the specialized typed array casting functions
85d0e00 <ptaylor> fix uniform chunk distribution when the new chunks are longer than the current chunks
8218f40 <ptaylor> Ensure Chunked#slice() range end is correct when there's only a single chunk
3f16c81 <ptaylor> fix typo
c9eeb05 <ptaylor> fix lint
933b531 <ptaylor> Narrow the signature of Schema#fields to Field<T>, cleanup
bdb23b8 <ptaylor> ensure uniform chunk lengths in RecordBatch.from()
9d1f2ad <ptaylor> add Table.new() convenience method for creating Tables from Columns or [Vector, name | Field] arguments
0407cd7 <ptaylor> add Column.new() convenience method for creating Columns with string names and polymorphic chunk types
db39031 <ptaylor> add public Field#clone impl for convenience
97c349e <ptaylor> add nullable and metadata getters to the Column class
5dfc100 <ptaylor> make the abstract Vector a type alias to trick TS into letting us override static methods
6ddfaf8 <ptaylor> narrow the FloatVector.from() return signatures


I don't really follow this syntax with the `this` argument; could you clarify what's going on here?

I assume it somehow makes it so that both FloatVector.from(new Float32Array()) and Float32Vector.from([1,2,3]) yield a Float32Vector but I don't really understand how.


@trxcllnt trxcllnt Feb 21, 2019


@TheNeuralBit yep! In plain English it'd be something like, "If this == Float16Vector constructor, accept data as either Uint16Array or any Iterable<number>, and always return an instance of Float16Vector." It's there to ensure that Float16Vector.from([1, 2, 3]) instanceof Float16Vector is always true at compile and run-times.

TypeScript doesn't grant much latitude when it comes to subclasses overriding inherited static functions, so we have to hoist the narrower type definitions up to the base. You can also see here how I figured out how to check this case in the unit tests.
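
A stripped-down sketch of that trick (classes and data shapes simplified from the real vectors):

```ts
class FloatVector {
  // The `this:` pseudo-parameter types the constructor `from` was invoked on,
  // so the return type narrows to whichever subclass made the call.
  static from<T extends FloatVector>(
    this: new (data: Float32Array) => T,
    data: Iterable<number> | Float32Array
  ): T {
    const values = data instanceof Float32Array ? data : Float32Array.from(data);
    return new this(values); // `new this` constructs the calling subclass
  }
  constructor(public readonly data: Float32Array) {}
}

class Float32Vector extends FloatVector {}

// Typed AND constructed as Float32Vector, not as the base FloatVector:
const vec: Float32Vector = Float32Vector.from([1, 2, 3]);
console.log(vec instanceof Float32Vector); // true
```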

@trxcllnt trxcllnt force-pushed the js/int-and-float-fixes branch from 2557277 to 69ee6f7 Compare February 21, 2019 17:51
@trxcllnt trxcllnt closed this Feb 21, 2019
TheNeuralBit pushed a commit to apache/arrow that referenced this pull request Feb 23, 2019
…support

This started as a continuation of #3634, but grew enough to deserve its own PR. I've made a PR to my own fork that highlights just the changes here: trxcllnt#8. I'll rebase this PR after #3634 is merged so only these changes are included.

This PR reverts the behavior of `Float16Vector#toArray()` back to returning a zero-copy slice of the underlying `Uint16Array` data, and exposes the copying behavior via new `toFloat32Array()` and `toFloat64Array()` methods. `Float16Vector.from()` will also convert any incoming 32 or 64-bit floats to Uint16s if necessary.
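
For context, the copying `toFloat32Array()`/`toFloat64Array()` paths have to decode each Uint16 element roughly like this (an illustrative decoder, not the library's code):

```ts
// Decode IEEE 754 half-precision bits (one Uint16 element) into a JS number.
function float16BitsToNumber(bits: number): number {
  const sign = bits & 0x8000 ? -1 : 1;
  const exp = (bits >>> 10) & 0x1f;
  const frac = bits & 0x03ff;
  if (exp === 0x1f) return frac ? NaN : sign * Infinity; // NaN / Infinity
  if (exp === 0) return sign * frac * 2 ** -24;          // subnormal (or signed zero)
  return sign * (1 + frac * 2 ** -10) * 2 ** (exp - 15); // normal
}
```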

It also adds tighter integration with the new `BigInt`, `BigInt64Array`, and `BigUint64Array` primitives (if available):
1. Use the native `BigInt` to convert/stringify i64s/u64s
2. Support the `BigInt` type in element comparator and `indexOf()`
3. Add zero-copy `toBigInt64Array()` and `toBigUint64Array()` methods to `Int64Vector` and `Uint64Vector`, respectively

0.4.0 added support for basic conversion to the native `BigInt` when available, but would only create positive `BigInts`, and was slower than necessary. This PR uses the native Arrays to create the BigInts, so we should see some speed ups there. Ex:

```ts
const vec = Int64Vector.from(new Int32Array([-1, 2147483647]))
const big = vec.get(0)
assert(big[0] === -1) // true
assert(big[1] === 2147483647) // true
const num = 0n + big // or BigInt(big)
assert(num === (2n ** 63n - 1n)) // true
```

JIRAs associated with this PR are:
* [ARROW-4578](https://issues.apache.org/jira/browse/ARROW-4578) - Float16Vector toArray should be zero-copy
* [ARROW-4579](https://issues.apache.org/jira/browse/ARROW-4579) - Add more interop with BigInt/BigInt64Array/BigUint64Array
* [ARROW-4580](https://issues.apache.org/jira/browse/ARROW-4580) - Accept Iterables in IntVector/FloatVector from() signatures

Author: ptaylor <paul.e.taylor@me.com>

Closes #3653 from trxcllnt/js/int-and-float-fixes and squashes the following commits:

69ee6f7 <ptaylor> cleanup after rebase
f44e97b <ptaylor> ensure truncated bitmap size isn't larger than it should be
7ac081a <ptaylor> fix lint
6046e66 <ptaylor> remove more getters in favor of readonly direct property accesses
94d5633 <ptaylor> support BigInt in comparitor/indexOf
760a219 <ptaylor> update BN to use BigIntArrays for signed/unsigned 64bit integers if possible
77fcd40 <ptaylor> add initial BigInt64Array and BigUint64Array support
d561204 <ptaylor> ensure Float16Vector.toArray() is zero-copy again, add toFloat32Array() and toFloat64Array() methods instead
854ae66 <ptaylor> ensure Int/FloatVector.from return signatures are as specific as possible, and accept Iterable<number>
4656ea5 <ptaylor> cleanup/rename Table + Schema + RecordBatch from -> new, cleanup argument extraction util fns
69abf40 <ptaylor> add initial RecordBatch.new and select tests
9c7ed3d <ptaylor> guard against out-of-bounds selections
a4222f8 <ptaylor> clean up: eliminate more getters in favor of read-only properties
8eabb1c <ptaylor> clean up/speed up: move common argument flattening methods into a utility file
b3b4f1f <ptaylor> add Table and Schema assign() impls
79f9db1 <ptaylor> add selectAt() method to Table, Schema, and RecordBatch for selecting columns by index
trxcllnt pushed a commit that referenced this pull request Jun 5, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). apache#7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes apache#7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
trxcllnt pushed a commit that referenced this pull request Apr 7, 2021
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively, and the mutex is non-reentrant, hence the deadlock.

To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the number and status of in-flight requests.

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
domoritz pushed a commit that referenced this pull request Dec 24, 2021
The conda-integration job is currently failing on GitHub Actions (but I'm not able to reproduce locally), being unable to find a correct solution when installing the conda dependencies for Archery:
https://github.com/apache/arrow/runs/4107211303?check_suite_focus=true

Log excerpt:
```
#8 [3/6] RUN conda install -q         --file arrow/ci/conda_env_archery.txt         numpy         compilers         maven=3.5         nodejs=14         yarn         openjdk=8 &&     conda clean --all --force-pkgs-dirs
#8 sha256:c96c59f55397d6e90bff7d2897eb1247ddfa19b8ffab8019be5ec0bbfdab7dc8
#8 0.450 mesg: ttyname failed: Inappropriate ioctl for device
#8 2.279 Collecting package metadata (current_repodata.json): ...working... done
#8 10.18 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#8 10.19 Collecting package metadata (repodata.json): ...working... done
#8 41.80 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#8 79.28
#8 79.28 PackagesNotFoundError: The following packages are not available from current channels:
#8 79.28
#8 79.28   - python=3.1
#8 79.28
```

Work around by forcing a reasonable minimum Python version.

Closes apache#11609 from pitrou/conda-integration-fix

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
kou pushed a commit to apache/arrow-js that referenced this pull request May 14, 2025
…support
QuietCraftsmanship pushed a commit to QuietCraftsmanship/arrow that referenced this pull request Jul 7, 2025
…support