Conversation

@projjal (Owner) commented Mar 23, 2021

@frank400 @jpedroantunes
creating this dummy PR so that I can add comments.

@projjal (Owner, Author):
I think you should not add streams to this class. See these comments.

Reply:

Done

@projjal (Owner, Author):

Rather than adding redundant precision and scale properties: since this is a templated class, it's better to separate out the Decimal class and add precision/scale only to it.

Reply:

done

@github-actions (bot):
Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename the pull request title to the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}


@projjal (Owner, Author):

I think it is better to use gandiva::DecimalScalar128 instead of arrow::Decimal128, since in gandiva we use gandiva::DecimalScalar128 as the literal type.

Reply:

Substituted

@projjal (Owner, Author):

Again, it is better to implement hash for gandiva::DecimalScalar instead of arrow::Decimal128. Implementing hash for arrow::Decimal128 might conflict with an implementation added to the Arrow codebase later.
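As an illustration, a minimal sketch of such a specialization, assuming the accessors value(), precision() and scale() from gandiva/decimal_scalar.h (the hash-combining constants are illustrative, not the patch's actual code):

```c++
#include <cstdint>
#include <functional>

#include "gandiva/decimal_scalar.h"

namespace std {
template <>
struct hash<gandiva::DecimalScalar128> {
  size_t operator()(const gandiva::DecimalScalar128& s) const {
    // Combine the 128-bit value with precision and scale so scalars
    // differing only in scale hash differently.
    size_t h = hash<uint64_t>()(static_cast<uint64_t>(s.value().high_bits()));
    h ^= hash<uint64_t>()(s.value().low_bits()) + 0x9e3779b9 + (h << 6) + (h >> 2);
    h ^= hash<int32_t>()(s.precision()) + 0x9e3779b9 + (h << 6) + (h >> 2);
    h ^= hash<int32_t>()(s.scale()) + 0x9e3779b9 + (h << 6) + (h >> 2);
    return h;
  }
};
}  // namespace std
```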

Reply:

done

…expose options to python

This adds ReadOptions to CsvFileFormat and exposes ReadOptions, ConvertOptions, and CsvFragmentScanOptions to Python.

ReadOptions was added to CsvFileFormat as its options can affect the discovered schema. For the block size, which does not need to be global, a field was added to CsvFragmentScanOptions.
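A rough sketch of how these pieces could compose on the C++ side (member names such as read_options and block_size are assumptions here, not the exact API):

```c++
#include <memory>

#include "arrow/dataset/file_csv.h"

void ConfigureCsv() {
  auto format = std::make_shared<arrow::dataset::CsvFileFormat>();
  // ReadOptions live on the format because they can affect the
  // discovered schema (member name assumed):
  format->read_options.skip_rows = 1;

  auto scan = std::make_shared<arrow::dataset::CsvFragmentScanOptions>();
  scan->convert_options.strings_can_be_null = true;
  // Block size is per-scan rather than global (field name assumed):
  scan->block_size = 1 << 20;
}
```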

Closes apache#9725 from lidavidm/arrow-8631

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
@projjal (Owner, Author):

-> bool is_decimal = false;

Reply:

done

@projjal (Owner, Author):

is it required?

Reply:

I had already removed it. Done.

bkietz and others added 12 commits March 23, 2021 16:06
This patch adds basic building blocks for grouped aggregation:

- `Grouper` for producing integer arrays encoding group id from batches of keys
- `HashAggregateKernel` for consuming batches of arguments and group ids, updating internal sums/counts/...

For testing purposes, a one-shot grouped aggregation function is provided:
```c++
std::shared_ptr<arrow::Array> needs_sum = ...;
std::shared_ptr<arrow::Array> needs_min_max = ...;
std::shared_ptr<arrow::Array> key_0 = ...;
std::shared_ptr<arrow::Array> key_1 = ...;

ARROW_ASSIGN_OR_RAISE(arrow::Datum out,
  arrow::compute::internal::GroupBy({
    needs_sum,
    needs_min_max,
  }, {
    key_0,
    key_1,
  }, {
    {"sum", nullptr},  // first argument will be summed
    {"min_max", &min_max_options},  // second argument's extrema will be found
}));

// Unpack struct array result (a four-field array)
auto out_array = out.array_as<StructArray>();
std::shared_ptr<arrow::Array> sums = out_array->field(0);
std::shared_ptr<arrow::Array> mins_and_maxes = out_array->field(1);
std::shared_ptr<arrow::Array> group_key_0 = out_array->field(2);
std::shared_ptr<arrow::Array> group_key_1 = out_array->field(3);
```

Closes apache#9621 from bkietz/groupby1

Lead-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: michalursa <michal@ursacomputing.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
The given input stream must be kept alive while the new GArrowCSVReader is alive.

Closes apache#9777 from kou/glib-csv-reader-refer-input

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…le namespacing

This is an implementation of catalog and schema providers to support table namespacing (see the [design doc](https://docs.google.com/document/d/1_bCP_tjVRLJyOrMBOezSFNpF0hwPa1ZS_qMWv1uvtS4/edit?usp=sharing)).

I'm creating this draft PR as a supporting implementation for the proposal, to prove out that the work can be done whilst minimising API churn and still allowing for use cases that don't care at all about the notion of catalogs or schemas. In this new setup, the default namespace is `datafusion.public`, which is created automatically with the default execution context config and allows for table registration.

## Highlights
- Datasource map removed in execution context state, replaced with catalog map
- Execution context allows for registering new catalog providers
- Catalog providers can be queried for their constituent schema providers
- Schema providers can be queried for table providers, similarly to the old datasource map
- Includes basic implementations of `CatalogProvider` and `SchemaProvider` backed by hashmaps
- New `TableReference` enum maps to the various ways of referring to a table in SQL
  - Bare: `my_table`
  - Partial: `schema.my_table`
  - Full: `catalog.schema.my_table`
- Given a default catalog and schema, `TableReference` instances of any variant can be converted to a `ResolvedTableReference`, which always includes all three components

Closes apache#9762 from returnString/catalog

Lead-authored-by: Ruan Pearce-Authers <ruanpa@outlook.com>
Co-authored-by: Ruan Pearce-Authers <ruan@reservoirdb.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Closes apache#9757 from pachamaltese/ARROW-11912

Lead-authored-by: Mauricio Vargas <mvargas@dcc.uchile.cl>
Co-authored-by: Pachamaltese <mvargas@dcc.uchile.cl>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
[ARROW-12012](https://issues.apache.org/jira/browse/ARROW-12012)
An exception is thrown when BinaryConsumer consumes a large amount of data.

Closes apache#9744 from zxf/fix/jdbc-binary-consumer

Authored-by: Felix Zhu <felix.zhu@netis.com.cn>
Signed-off-by: liyafan82 <fan_li_ya@foxmail.com>
…ag for python binding

Hi,

I am making a PR following the discussion in [ARROW-11497](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11497)

This is my first PR to this project; please let me know if I'm missing something, and I will try to address all problems as best I can.

Cheers,
Truc

Closes apache#9489 from trucnguyenlam/provide-parquet-enable-compliant-nested-type-flag

Authored-by: Truc Lam Nguyen <truc.nguyen@jagex.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
…rsion of Map

These items can all stand on their own and they are used by the async datasets conversion.

MergeMap - Given an `AsyncGenerator<AsyncGenerator<T>>`, returns an `AsyncGenerator<T>`. This method flattens a generator of generators into a generator of items. It may reorder the items.

ConcatMap - Same as MergeMap, but it only pulls items from one inner subscription at a time. This reduced parallelism allows items to be returned in order.

Async-reentrant Map - In some cases the map function is slow. Even if the source is not async-reentrant, this map can still be async-reentrant by allowing multiple instances of the map function to run at once. The resulting mapped generator is async-reentrant, but it will not pull reentrantly from the source.

Vector utilities - To make migrating from Iterator code to vector code easier, I added some map-style utilities. These copy the vectors (where an iterator wouldn't), so some care should be taken, but they can still be useful.

Moved Future/AsyncGenerator into top level type_fwd.  This is needed for the RecordBatchGenerator alias in the same way Iterator is needed at the top level.

Added `IsEnd` to `IterationTraits`. This allows non-comparable types to be iterated on, and it allows us to create an `AsyncGenerator<AsyncGenerator<T>>`: since `AsyncGenerator` is a std::function, we can use an empty instance as an end token even though std::function is not comparable.
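A minimal sketch of that idea, assuming the IterationTraits shape from arrow/util/iterator.h (illustrative, not the exact implementation):

```c++
#include <functional>

#include "arrow/util/future.h"

// Primary traits (normally arrow::IterationTraits in arrow/util/iterator.h):
template <typename T>
struct IterationTraits {
  static T End() { return T(); }
  static bool IsEnd(const T& val) { return val == End(); }
};

template <typename T>
using AsyncGenerator = std::function<arrow::Future<T>()>;

// std::function is not equality-comparable, so the specialization checks
// emptiness instead of comparing against an end token:
template <typename T>
struct IterationTraits<AsyncGenerator<T>> {
  static AsyncGenerator<T> End() { return AsyncGenerator<T>(); }
  static bool IsEnd(const AsyncGenerator<T>& gen) { return !gen; }
};
```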

Closes apache#9643 from westonpace/feature/arrow-11883

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
A segfault would occur when a field is inferred as null in a first block and then as list in a second block.

Also re-enable `chunked_builder_test.cc`, which wasn't compiled.

Closes apache#9783 from pitrou/ARROW-12065-json-segfault

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#9775 from jorgecarleitao/clippy_clean

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This adds a function `from_trusted_len_iter_bool` to speed up the creation of boolean arrays.

Benchmarks are a bit noisy, but this seems to be ~10-20% faster for comparison kernels. It also has some positive effect on DataFusion queries, as they contain quite a few (nested) comparisons in filters. For example, executing TPC-H query 6 in memory is ~7% faster.

```
Gnuplot not found, using plotters backend
eq Float32              time:   [54.204 us 54.284 us 54.364 us]
                        change: [-29.087% -28.838% -28.581%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild

eq scalar Float32       time:   [43.660 us 43.743 us 43.830 us]
                        change: [-30.819% -30.545% -30.269%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

neq Float32             time:   [68.726 us 68.893 us 69.048 us]
                        change: [-14.045% -13.772% -13.490%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

neq scalar Float32      time:   [46.251 us 46.322 us 46.395 us]
                        change: [-12.204% -11.952% -11.702%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

lt Float32              time:   [50.264 us 50.438 us 50.613 us]
                        change: [-21.300% -20.964% -20.649%] (p = 0.00 < 0.05)
                        Performance has improved.

lt scalar Float32       time:   [48.847 us 48.929 us 49.013 us]
                        change: [-10.132% -9.9180% -9.6910%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

lt_eq Float32           time:   [46.105 us 46.198 us 46.282 us]
                        change: [-21.276% -20.966% -20.703%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) low severe
  13 (13.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

lt_eq scalar Float32    time:   [47.359 us 47.456 us 47.593 us]
                        change: [+0.2766% +0.5240% +0.7821%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

gt Float32              time:   [57.313 us 57.363 us 57.412 us]
                        change: [-18.328% -18.177% -18.031%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild

gt scalar Float32       time:   [44.091 us 44.132 us 44.175 us]
                        change: [-9.4233% -9.2747% -9.1273%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild

gt_eq Float32           time:   [55.856 us 55.932 us 56.007 us]
                        change: [-7.4997% -7.2656% -7.0334%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

gt_eq scalar Float32    time:   [42.365 us 42.419 us 42.482 us]
                        change: [+0.5289% +0.7174% +0.9116%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
```

Closes apache#9759 from Dandandan/optimize_comparison

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This adds support for CTE syntax:

```sql
WITH
   name AS (SELECT ...)
   [, name2 AS (SELECT ...)]
SELECT ...
FROM ...
```

Before this PR, the CTE syntax was ignored.

This PR supports CTEs referencing a previous CTE within the same query (but not forward references).

Closes apache#9776 from Dandandan/cte_support

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This likely needs more testing, especially where I had to implement functionality in (Basic)Decimal256. Also, we may want to extend the scalar cast benchmarks to cover decimals. There's also potentially some redundancy to eliminate in the tests.
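For context, a hedged usage sketch of the new cast path, widening decimal128 data to decimal256 via the generic Cast entry point (the precision/scale values are arbitrary):

```c++
#include <memory>

#include "arrow/api.h"
#include "arrow/compute/cast.h"

arrow::Result<arrow::Datum> WidenToDecimal256(
    const std::shared_ptr<arrow::Array>& decimal128_array) {
  // Default (safe) cast options; out-of-range values surface as errors.
  return arrow::compute::Cast(decimal128_array,
                              arrow::decimal256(/*precision=*/40, /*scale=*/2));
}
```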

Closes apache#9751 from lidavidm/arrow-10606

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
ianmcook and others added 7 commits March 24, 2021 14:04
This fixes a `NOTE` from `R CMD check` caused by ARROW-11700

Closes apache#9793 from ianmcook/ARROW-12073

Authored-by: Ian Cook <ianmcook@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
This PR adds the *utf8_length* compute kernel to the string scalar functions to support calculating the string length (as number of characters) for UTF-8 encoded STRINGs and LARGE STRINGs. The implementation makes use of utf8proc (utf8proc_iterate) to perform the calculation.
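A hedged usage sketch, invoking the kernel through the generic compute entry point (the function name utf8_length is from this PR; the wrapper below is illustrative):

```c++
#include <memory>

#include "arrow/api.h"
#include "arrow/compute/api.h"

arrow::Result<arrow::Datum> Utf8Lengths(
    const std::shared_ptr<arrow::Array>& strings) {
  // Returns per-element character counts (not byte lengths).
  return arrow::compute::CallFunction("utf8_length", {strings});
}
```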

Closes apache#9786 from edponce/ARROW-11693-Add-string-length-kernel

Authored-by: Eduardo Ponce <edponce00@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Users should use Meson.

Closes apache#9787 from kou/glib-remove-autotools

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Closes apache#9788 from kou/glib-json-reader-refer

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
There was a logical conflict between apache@eebf64b, which removed the Arc in `ArrayData`, and apache@8dd6abb, which optimized the compute kernels.

FYI @Dandandan  and @nevi-me

Closes apache#9796 from alamb/alamb/fix-build

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
…l Iterator<Item=Expr> rather than &[Expr]

# NOTE:
Since this is a fairly major backwards-incompatible change (many callsites need to be updated, though mostly mechanically), I gathered some feedback on this approach in apache#9692, and this is the PR I propose for merge.

I'll leave this open for several days and also send a note to the mailing lists for additional comment

It is part of my overall plan to make the DataFusion optimizer more idiomatic and do much less copying [ARROW-11689](https://issues.apache.org/jira/browse/ARROW-11689)

# Rationale:
All callsites currently need an owned `Vec` (or equivalent) so they can pass in `&[Expr]`, and then DataFusion copies all the `Expr`s. Many times the original `Vec<Expr>` is discarded immediately after use (I'll point out where this happens in a few places below). Thus it would be better (more idiomatic, and often faster with less copying) to take something that can produce an iterator over `Expr`.

# Changes
1. Change `Dataframe` so it takes `Vec<Expr>` rather than `&[Expr]`
2. Change `LogicalPlanBuilder` so it takes `impl Iterator<Item=Expr>` rather than `&[Expr]`

I couldn't figure out how to allow the `Dataframe` API (which is a Trait) to take an `impl Iterator<Item=Expr>`

Closes apache#9703 from alamb/alamb/less_copy_in_plan_builder_final

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
…ed code

Closes apache#9789 from jorisvandenbossche/ARROW-11983

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jvictorhuguenin force-pushed the feature/add-decimal-in-expression branch from 1d7bdcd to 89b2172 on April 1, 2021 12:34
@github-actions bot added the flight label Apr 1, 2021
projjal pushed a commit that referenced this pull request Apr 7, 2021
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant, hence the deadlock.

To fix it, I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the number/status of in-flight requests.
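A minimal sketch of the TaskGroup pattern (namespace and signatures assumed from arrow/util/task_group.h, not quoted from the patch):

```c++
#include "arrow/status.h"
#include "arrow/util/task_group.h"
#include "arrow/util/thread_pool.h"

arrow::Status Walk() {
  // The group tracks in-flight tasks itself, so recursive callbacks no
  // longer need a (non-reentrant) mutex around shared counters.
  auto group = arrow::internal::TaskGroup::MakeThreaded(
      arrow::internal::GetCpuThreadPool());
  group->Append([] {
    // One ListObjectsV2 request; its callback may Append() child walks.
    return arrow::Status::OK();
  });
  return group->Finish();  // waits for all spawned tasks
}
```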

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@projjal closed this Apr 8, 2021
projjal pushed a commit that referenced this pull request Jun 17, 2021
Before change:

```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in
    #1 0x7f28ae5826f4 in
    #2 0x7f28ae57fa5d in
    #3 0x7f28ae58cb0f in
    #4 0x7f28ae58bda0 in
    ...
```

After change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in posix_memalign (/build/cpp/debug/arrow-dataset-file-csv-test+0x522f09)
    #1 0x7f28ae5826f4 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:213:24
    #2 0x7f28ae57fa5d in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:405:5
    #3 0x7f28ae58cb0f in arrow::PoolBuffer::Reserve(long) /arrow/cpp/src/arrow/memory_pool.cc:717:9
    #4 0x7f28ae58bda0 in arrow::PoolBuffer::Resize(long, bool) /arrow/cpp/src/arrow/memory_pool.cc:741:7
    ...
```

Closes apache#10498 from westonpace/feature/ARROW-13027--c-fix-asan-stack-traces-in-ci

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
projjal pushed a commit that referenced this pull request Dec 1, 2021
Error log of Valgrind failure:
```
[----------] 3 tests from TestArrowReadDeltaEncoding
[ RUN      ] TestArrowReadDeltaEncoding.DeltaBinaryPacked
[       OK ] TestArrowReadDeltaEncoding.DeltaBinaryPacked (812 ms)
[ RUN      ] TestArrowReadDeltaEncoding.DeltaByteArray
==12587== Conditional jump or move depends on uninitialised value(s)
==12587==    at 0x4F12C57: Advance (bit_stream_utils.h:426)
==12587==    by 0x4F12C57: parquet::(anonymous namespace)::DeltaBitPackDecoder<parquet::PhysicalType<(parquet::Type::type)1> >::GetInternal(int*, int) (encoding.cc:2216)
==12587==    by 0x4F13823: Decode (encoding.cc:2091)
==12587==    by 0x4F13823: parquet::(anonymous namespace)::DeltaByteArrayDecoder::SetData(int, unsigned char const*, int) (encoding.cc:2360)
==12587==    by 0x4E89EF5: parquet::(anonymous namespace)::ColumnReaderImplBase<parquet::PhysicalType<(parquet::Type::type)6> >::InitializeDataDecoder(parquet::DataPage const&, long) (column_reader.cc:797)
==12587==    by 0x4E9AE63: ReadNewPage (column_reader.cc:614)
==12587==    by 0x4E9AE63: HasNextInternal (column_reader.cc:576)
==12587==    by 0x4E9AE63: parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)6> >::ReadRecords(long) (column_reader.cc:1228)
==12587==    by 0x4DFB19F: parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long) (reader.cc:467)
==12587==    by 0x4DF513C: parquet::arrow::ColumnReaderImpl::NextBatch(long, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:108)
==12587==    by 0x4DFB74D: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::vector<int, std::allocator<int> > const&, parquet::arrow::ColumnReader*, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:273)
==12587==    by 0x4E11FDA: operator() (reader.cc:1180)
==12587==    by 0x4E11FDA: arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > arrow::internal::OptionalParallelForAsync<parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::shared_ptr<arrow::ChunkedArray> >(bool, std::vector<std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::allocator<arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > > >, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, arrow::internal::Executor*) (parallel.h:95)
==12587==    by 0x4E126A9: parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*) (reader.cc:1198)
==12587==    by 0x4E12F50: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:1160)
==12587==    by 0x4DFA2BC: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:198)
==12587==    by 0x4DFA392: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::shared_ptr<arrow::Table>*) (reader.cc:289)
==12587==    by 0x1DCE62: parquet::arrow::TestArrowReadDeltaEncoding::ReadTableFromParquetFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<arrow::Table>*) (arrow_reader_writer_test.cc:4174)
==12587==    by 0x2266D2: parquet::arrow::TestArrowReadDeltaEncoding_DeltaByteArray_Test::TestBody() (arrow_reader_writer_test.cc:4209)
==12587==    by 0x4AD2C9B: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
==12587==    by 0x4AC9DD1: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
==12587==    by 0x4AA4C02: testing::Test::Run() (gtest.cc:2682)
==12587==    by 0x4AA563A: testing::TestInfo::Run() (gtest.cc:2861)
==12587==    by 0x4AA600F: testing::TestSuite::Run() (gtest.cc:3015)
==12587==    by 0x4AB631B: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5855)
==12587==    by 0x4AD3CE7: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2607)
==12587==    by 0x4ACB063: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2643)
==12587==    by 0x4AB47B6: testing::UnitTest::Run() (gtest.cc:5438)
==12587==    by 0x4218918: RUN_ALL_TESTS() (gtest.h:2490)
==12587==    by 0x421895B: main (gtest_main.cc:52)
```

Closes apache#11725 from pitrou/ARROW-14704-parquet-valgrind

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
projjal pushed a commit that referenced this pull request Mar 29, 2022
TODOs:
Convert cheat sheet to PDF and hide slide #1.

Closes apache#12445 from pachadotdev/patch-4

Lead-authored-by: Stephanie Hazlitt <stephhazlitt@gmail.com>
Co-authored-by: Pachá <mvargas@dcc.uchile.cl>
Co-authored-by: Mauricio Vargas <mavargas11@uc.cl>
Co-authored-by: Pachá <mavargas11@uc.cl>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>