Feature/add decimal in expression #1
Conversation
Done
cpp/src/gandiva/node.h
Outdated
Instead of adding redundant precision and scale properties to this templated class, it is better to separate out a Decimal class and add precision/scale only to it.
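A minimal sketch of the suggested split (class and member names here are illustrative, not the actual gandiva definitions): the generic templated literal stays lean, and precision/scale live only on a dedicated decimal node.
```c++
#include <cstdint>

// Generic literal: no decimal-specific baggage.
template <typename VALUE_TYPE>
class LiteralNode {
 public:
  explicit LiteralNode(VALUE_TYPE value) : value_(value) {}
  VALUE_TYPE value() const { return value_; }

 private:
  VALUE_TYPE value_;
};

// Decimal literal: the only place precision and scale appear.
class DecimalLiteralNode {
 public:
  DecimalLiteralNode(int64_t high, uint64_t low, int32_t precision,
                     int32_t scale)
      : high_(high), low_(low), precision_(precision), scale_(scale) {}

  int32_t precision() const { return precision_; }
  int32_t scale() const { return scale_; }

 private:
  int64_t high_;   // high 64 bits of the 128-bit value
  uint64_t low_;   // low 64 bits
  int32_t precision_;
  int32_t scale_;
};
```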
done
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Could you also rename the pull request title in the following format? See also:
cpp/src/gandiva/tree_expr_builder.cc
Outdated
I think it is better to use gandiva::DecimalScalar128 instead of arrow::Decimal128, since in gandiva we are using gandiva::DecimalScalar128 as the literal type.
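For context, a hedged sketch of the distinction (illustrative shape only, not the real class definition): arrow::Decimal128 is just the 128-bit integer value, whereas a decimal literal also needs its precision and scale to be interpreted, which is what gandiva::DecimalScalar128 bundles together.
```c++
#include <cstdint>

// Illustrative shape of a decimal literal scalar (not the real class):
struct DecimalScalarSketch {
  int64_t high;       // high 64 bits of the 128-bit value
  uint64_t low;       // low 64 bits
  int32_t precision;  // total number of significant digits
  int32_t scale;      // digits after the decimal point
};

// The literal 12.345 as decimal(5, 3) carries all three pieces:
// DecimalScalarSketch lit{0, 12345, /*precision=*/5, /*scale=*/3};
```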
Substituted
Again, it is better to implement hash for gandiva::DecimalScalar instead of arrow::Decimal128. Implementing hash for arrow::Decimal128 might conflict with implementations added to the Arrow code later.
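A hedged sketch of what hashing the gandiva-owned type might look like; the accessor names (`value().high_bits()`, `value().low_bits()`, `precision()`, `scale()`) and the header path are assumptions about the interface, and the hash-combining scheme is arbitrary, not the one used in this PR. Keeping the specialization on gandiva::DecimalScalar128 means a future `std::hash<arrow::Decimal128>` defined inside Arrow itself cannot collide with it.
```c++
#include <cstddef>
#include <functional>

#include "gandiva/decimal_scalar.h"  // assumed header path

namespace std {
template <>
struct hash<gandiva::DecimalScalar128> {
  size_t operator()(const gandiva::DecimalScalar128& s) const {
    // Combine the 128-bit value with precision and scale.
    size_t h = hash<int64_t>()(s.value().high_bits());
    h = h * 31 + hash<uint64_t>()(s.value().low_bits());
    h = h * 31 + hash<int32_t>()(s.precision());
    h = h * 31 + hash<int32_t>()(s.scale());
    return h;
  }
};
}  // namespace std
```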
done
…expose options to python

This adds ReadOptions to CsvFileFormat and exposes ReadOptions, ConvertOptions, and CsvFragmentScanOptions to Python. ReadOptions was added to CsvFileFormat as its options can affect the discovered schema. For the block size, which does not need to be global, a field was added to CsvFragmentScanOptions.

Closes apache#9725 from lidavidm/arrow-8631
Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
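A hedged C++ sketch of the configuration shape this commit describes (member names follow the commit text and may differ across Arrow versions): schema-affecting read options hang off the format, while the per-scan block size lives on the fragment scan options.
```c++
#include <memory>

#include "arrow/csv/options.h"
#include "arrow/dataset/file_csv.h"

void ConfigureCsvScan() {
  // Format-level options: these can affect the discovered schema.
  auto format = std::make_shared<arrow::dataset::CsvFileFormat>();

  // Per-scan options: the block size does not need to be global, so it sits
  // on the fragment scan options instead of the format.
  auto scan_options =
      std::make_shared<arrow::dataset::CsvFragmentScanOptions>();
  scan_options->read_options.block_size = 1 << 20;  // 1 MiB blocks
  scan_options->convert_options.strings_can_be_null = true;
}
```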
cpp/src/gandiva/llvm_generator.cc
Outdated
Suggested change: `bool is_decimal = false;` (initialize the flag).
done
cpp/src/gandiva/llvm_types.h
Outdated
is it required?
I had already removed it. Done.
This patch adds basic building blocks for grouped aggregation:
- `Grouper` for producing integer arrays encoding group id from batches of keys
- `HashAggregateKernel` for consuming batches of arguments and group ids, updating internal sums/counts/...
For testing purposes, a one-shot grouped aggregation function is provided:
```c++
std::shared_ptr<arrow::Array> needs_sum = ...;
std::shared_ptr<arrow::Array> needs_min_max = ...;
std::shared_ptr<arrow::Array> key_0 = ...;
std::shared_ptr<arrow::Array> key_1 = ...;
ARROW_ASSIGN_OR_RAISE(arrow::Datum out,
arrow::compute::internal::GroupBy({
needs_sum,
needs_min_max,
}, {
key_0,
key_1,
}, {
{"sum", nullptr}, // first argument will be summed
{"min_max", &min_max_options}, // second argument's extrema will be found
}));
// Unpack struct array result (a four-field array)
auto out_array = out.array_as<StructArray>();
std::shared_ptr<arrow::Array> sums = out_array->field(0);
std::shared_ptr<arrow::Array> mins_and_maxes = out_array->field(1);
std::shared_ptr<arrow::Array> group_key_0 = out_array->field(2);
std::shared_ptr<arrow::Array> group_key_1 = out_array->field(3);
```
Closes apache#9621 from bkietz/groupby1
Lead-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: michalursa <michal@ursacomputing.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
The given input stream should be alive while a new GArrowCSVReader is alive. Closes apache#9777 from kou/glib-csv-reader-refer-input Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…le namespacing

This is an implementation of catalog and schema providers to support table namespacing (see the [design doc](https://docs.google.com/document/d/1_bCP_tjVRLJyOrMBOezSFNpF0hwPa1ZS_qMWv1uvtS4/edit?usp=sharing)). I'm creating this draft PR as a supporting implementation for the proposal, to prove out that the work can be done whilst minimising API churn and still allowing for use cases that don't care at all about the notion of catalogs or schemas; in this new setup, the default namespace is `datafusion.public`, which will be created automatically with the default execution context config and allows for table registration.

## Highlights
- Datasource map removed in execution context state, replaced with a catalog map
- Execution context allows for registering new catalog providers
- Catalog providers can be queried for their constituent schema providers
- Schema providers can be queried for table providers, similarly to the old datasource map
- Includes basic implementations of `CatalogProvider` and `SchemaProvider` backed by hashmaps
- New `TableReference` enum maps to the various ways of referring to a table in SQL:
  - Bare: `my_table`
  - Partial: `schema.my_table`
  - Full: `catalog.schema.my_table`
- Given a default catalog and schema, `TableReference` instances of any variant can be converted to a `ResolvedTableReference`, which always includes all three components

Closes apache#9762 from returnString/catalog
Lead-authored-by: Ruan Pearce-Authers <ruanpa@outlook.com>
Co-authored-by: Ruan Pearce-Authers <ruan@reservoirdb.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Closes apache#9757 from pachamaltese/ARROW-11912 Lead-authored-by: Mauricio Vargas <mvargas@dcc.uchile.cl> Co-authored-by: Pachamaltese <mvargas@dcc.uchile.cl> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
[ARROW-12012](https://issues.apache.org/jira/browse/ARROW-12012) An exception will be thrown when BinaryConsumer consumes a large amount of data. Closes apache#9744 from zxf/fix/jdbc-binary-consumer Authored-by: Felix Zhu <felix.zhu@netis.com.cn> Signed-off-by: liyafan82 <fan_li_ya@foxmail.com>
…ag for python binding Hi, I am making a PR following the discussion in [ARROW-11497](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11497). This is my first PR to this project; please let me know if I'm missing something, and I will try to address all problems as best I can. Cheers, Truc Closes apache#9489 from trucnguyenlam/provide-parquet-enable-compliant-nested-type-flag Authored-by: Truc Lam Nguyen <truc.nguyen@jagex.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
…rsion of Map

These items can all stand on their own, and they are used by the async datasets conversion.

- MergeMap: given AsyncGenerator<AsyncGenerator<T>>, return AsyncGenerator<T>. This method flattens a generator of generators into a generator of items. It may reorder the items.
- ConcatMap: same as MergeMap, but it will only pull items from one inner subscription at a time. This reduced parallelism allows items to be returned in order.
- Async-reentrant Map: in some cases the map function is slow. Even if the source is not async-reentrant, this map can still be async-reentrant by allowing multiple instances of the map function to run at once. The resulting mapped generator is async-reentrant, but it will not pull reentrantly from the source.
- Vector utilities: to make migrating from Iterator code to vector code easier, I added some map-style utilities. These copy the vectors (where an iterator wouldn't), so some care should be taken, but they can still be useful.
- Moved Future/AsyncGenerator into the top-level type_fwd. This is needed for the RecordBatchGenerator alias in the same way Iterator is needed at the top level.
- Added `IsEnd` to `IterationTraits`. This allows non-comparable types to be iterated on. It allows us to create an AsyncGenerator<AsyncGenerator<T>>, since AsyncGenerator is std::function and we can use an empty instance as an end token even though std::function is not comparable.

Closes apache#9643 from westonpace/feature/arrow-11883
Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
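A synchronous analogue of the flattening idea, as a hedged sketch (this is not Arrow's async implementation): a generator is modeled as a std::function returning an empty optional as the end token, mirroring the empty-std::function end token mentioned above. The sketch shows ConcatMap's behavior of draining one inner generator at a time; MergeMap differs by pulling from several inner generators concurrently, which is why it may reorder items.
```c++
#include <functional>
#include <memory>
#include <optional>

// A generator modeled synchronously: an empty optional signals the end.
template <typename T>
using Gen = std::function<std::optional<T>()>;

// Concat-style flattening: pull items from one inner generator at a time,
// advancing to the next inner generator only when the current one is done.
template <typename T>
Gen<T> ConcatSketch(Gen<Gen<T>> outer) {
  auto inner = std::make_shared<std::optional<Gen<T>>>();
  return [outer, inner]() -> std::optional<T> {
    while (true) {
      if (!inner->has_value()) {
        auto next = outer();             // pull the next inner generator
        if (!next) return std::nullopt;  // outer exhausted: end of stream
        *inner = *next;
      }
      if (auto item = (**inner)()) return item;  // item from current inner
      inner->reset();                    // current inner exhausted: advance
    }
  };
}
```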
A segfault would occur when a field is inferred as null in a first block and then as list in a second block. Also re-enable `chunked_builder_test.cc`, which wasn't compiled. Closes apache#9783 from pitrou/ARROW-12065-json-segfault Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#9775 from jorgecarleitao/clippy_clean Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This adds a function `from_trusted_len_iter_bool` to speed up the creation of boolean arrays.
Benchmarks are a bit noisy, but this seems to be ~10-20% faster for comparison kernels. It also has a positive effect on DataFusion queries, as they contain quite a few (nested) comparisons in filters. For example, executing TPC-H query 6 in memory is ~7% faster.
```
Gnuplot not found, using plotters backend
eq Float32 time: [54.204 us 54.284 us 54.364 us]
change: [-29.087% -28.838% -28.581%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) low mild
1 (1.00%) high mild
eq scalar Float32 time: [43.660 us 43.743 us 43.830 us]
change: [-30.819% -30.545% -30.269%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
neq Float32 time: [68.726 us 68.893 us 69.048 us]
change: [-14.045% -13.772% -13.490%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
neq scalar Float32 time: [46.251 us 46.322 us 46.395 us]
change: [-12.204% -11.952% -11.702%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
5 (5.00%) high mild
lt Float32 time: [50.264 us 50.438 us 50.613 us]
change: [-21.300% -20.964% -20.649%] (p = 0.00 < 0.05)
Performance has improved.
lt scalar Float32 time: [48.847 us 48.929 us 49.013 us]
change: [-10.132% -9.9180% -9.6910%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
lt_eq Float32 time: [46.105 us 46.198 us 46.282 us]
change: [-21.276% -20.966% -20.703%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
2 (2.00%) low severe
13 (13.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
lt_eq scalar Float32 time: [47.359 us 47.456 us 47.593 us]
change: [+0.2766% +0.5240% +0.7821%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
8 (8.00%) high mild
2 (2.00%) high severe
gt Float32 time: [57.313 us 57.363 us 57.412 us]
change: [-18.328% -18.177% -18.031%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low severe
1 (1.00%) low mild
gt scalar Float32 time: [44.091 us 44.132 us 44.175 us]
change: [-9.4233% -9.2747% -9.1273%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) low mild
3 (3.00%) high mild
gt_eq Float32 time: [55.856 us 55.932 us 56.007 us]
change: [-7.4997% -7.2656% -7.0334%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
gt_eq scalar Float32 time: [42.365 us 42.419 us 42.482 us]
change: [+0.5289% +0.7174% +0.9116%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
```
Closes apache#9759 from Dandandan/optimize_comparison
Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This adds support for CTE syntax:
```sql
WITH name AS (SELECT ...) [, name2 AS (SELECT ...)] SELECT ... FROM ...
```
Before this PR, the CTE syntax was ignored. This PR supports CTEs referencing a previous CTE within the same query (but no forward references).

Closes apache#9776 from Dandandan/cte_support
Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This likely needs more testing, especially where I had to implement functionality in (Basic)Decimal256. Also, we may want to extend the scalar cast benchmarks to cover decimals. There's also potentially some redundancy to eliminate in the tests. Closes apache#9751 from lidavidm/arrow-10606 Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
This fixes a `NOTE` from `R CMD check` caused by ARROW-11700 Closes apache#9793 from ianmcook/ARROW-12073 Authored-by: Ian Cook <ianmcook@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
This PR adds the *utf8_length* compute kernel to the string scalar functions to support calculating the string length (as number of characters) for UTF-8 encoded STRINGs and LARGE STRINGs. The implementation makes use of utf8proc (utf8proc_iterate) to perform the calculation. Closes apache#9786 from edponce/ARROW-11693-Add-string-length-kernel Authored-by: Eduardo Ponce <edponce00@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
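A hedged usage sketch for the kernel described above, invoked by name through Arrow's generic compute entry point (the kernel name "utf8_length" follows the commit text):
```c++
#include <memory>

#include "arrow/api.h"
#include "arrow/compute/api.h"

// Returns per-string character counts (not byte counts) of a UTF-8 array.
arrow::Result<arrow::Datum> Utf8Lengths(
    const std::shared_ptr<arrow::Array>& utf8_array) {
  return arrow::compute::CallFunction("utf8_length",
                                      {arrow::Datum(utf8_array)});
}
```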
Users should use Meson. Closes apache#9787 from kou/glib-remove-autotools Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Closes apache#9788 from kou/glib-json-reader-refer Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
There was a logical conflict between apache@eebf64b, which removed the Arc in `ArrayData`, and apache@8dd6abb, which optimized the compute kernels. FYI @Dandandan and @nevi-me. Closes apache#9796 from alamb/alamb/fix-build Authored-by: Andrew Lamb <andrew@nerdnetworks.org> Signed-off-by: Neville Dipale <nevilledips@gmail.com>
…l Iterator<Item=Expr> rather than &[Expr]

# NOTE:
Since this is a fairly major backwards-incompatible change (many callsites need to be updated, though mostly mechanically), I gathered some feedback on this approach in apache#9692 and this is the PR I propose for merge. I'll leave this open for several days and also send a note to the mailing lists for additional comment. It is part of my overall plan to make the DataFusion optimizer more idiomatic and do much less copying. [ARROW-11689](https://issues.apache.org/jira/browse/ARROW-11689)

# Rationale:
All callsites currently need an owned `Vec` (or equivalent) so they can pass in `&[Expr]`, and then DataFusion copies all the `Expr`s. Many times the original `Vec<Expr>` is discarded immediately after use (I'll point out where this happens in a few places below). Thus it would be better (more idiomatic, and often less copying/faster) to take something that could produce an iterator over `Expr`.

# Changes
1. Change `Dataframe` so it takes `Vec<Expr>` rather than `&[Expr]`
2. Change `LogicalPlanBuilder` so it takes `impl Iterator<Item=Expr>` rather than `&[Expr]`

I couldn't figure out how to allow the `Dataframe` API (which is a Trait) to take an `impl Iterator<Item=Expr>`.

Closes apache#9703 from alamb/alamb/less_copy_in_plan_builder_final
Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
…ed code Closes apache#9789 from jorisvandenbossche/ARROW-11983 Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…rom arrow::Decimal128 to gandiva::DecimalScalar128
Force-pushed from 1d7bdcd to 89b2172
From a deadlocked run...
```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```
The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant, hence the deadlock. To fix it, I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the number/status of in-flight requests.

Closes apache#9842 from westonpace/bugfix/arrow-12040
Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
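For illustration, a minimal sketch of the failure mode this commit describes (this is not the Arrow code, and the function names are made up): a callback re-enters a function that already holds a non-reentrant std::mutex, so the second lock attempt blocks forever. The fix above avoids holding any lock across callback invocation by delegating in-flight tracking to TaskGroup.
```c++
#include <mutex>

std::mutex state_mutex;  // non-reentrant, like the one removed in this fix

void OnListFinished();

void SpawnList() {
  std::lock_guard<std::mutex> lock(state_mutex);
  // If the request completes synchronously, its callback runs right here,
  // while state_mutex is still held...
  OnListFinished();
}

void OnListFinished() {
  SpawnList();  // ...so this second lock on state_mutex deadlocks.
}
```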
Before change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
#0 0x522f09 in
#1 0x7f28ae5826f4 in
#2 0x7f28ae57fa5d in
#3 0x7f28ae58cb0f in
#4 0x7f28ae58bda0 in
...
```
After change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
#0 0x522f09 in posix_memalign (/build/cpp/debug/arrow-dataset-file-csv-test+0x522f09)
#1 0x7f28ae5826f4 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:213:24
#2 0x7f28ae57fa5d in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:405:5
#3 0x7f28ae58cb0f in arrow::PoolBuffer::Reserve(long) /arrow/cpp/src/arrow/memory_pool.cc:717:9
#4 0x7f28ae58bda0 in arrow::PoolBuffer::Resize(long, bool) /arrow/cpp/src/arrow/memory_pool.cc:741:7
...
```
Closes apache#10498 from westonpace/feature/ARROW-13027--c-fix-asan-stack-traces-in-ci
Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Error log of Valgrind failure:
```
[----------] 3 tests from TestArrowReadDeltaEncoding
[ RUN ] TestArrowReadDeltaEncoding.DeltaBinaryPacked
[ OK ] TestArrowReadDeltaEncoding.DeltaBinaryPacked (812 ms)
[ RUN ] TestArrowReadDeltaEncoding.DeltaByteArray
==12587== Conditional jump or move depends on uninitialised value(s)
==12587== at 0x4F12C57: Advance (bit_stream_utils.h:426)
==12587== by 0x4F12C57: parquet::(anonymous namespace)::DeltaBitPackDecoder<parquet::PhysicalType<(parquet::Type::type)1> >::GetInternal(int*, int) (encoding.cc:2216)
==12587== by 0x4F13823: Decode (encoding.cc:2091)
==12587== by 0x4F13823: parquet::(anonymous namespace)::DeltaByteArrayDecoder::SetData(int, unsigned char const*, int) (encoding.cc:2360)
==12587== by 0x4E89EF5: parquet::(anonymous namespace)::ColumnReaderImplBase<parquet::PhysicalType<(parquet::Type::type)6> >::InitializeDataDecoder(parquet::DataPage const&, long) (column_reader.cc:797)
==12587== by 0x4E9AE63: ReadNewPage (column_reader.cc:614)
==12587== by 0x4E9AE63: HasNextInternal (column_reader.cc:576)
==12587== by 0x4E9AE63: parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)6> >::ReadRecords(long) (column_reader.cc:1228)
==12587== by 0x4DFB19F: parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long) (reader.cc:467)
==12587== by 0x4DF513C: parquet::arrow::ColumnReaderImpl::NextBatch(long, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:108)
==12587== by 0x4DFB74D: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::vector<int, std::allocator<int> > const&, parquet::arrow::ColumnReader*, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:273)
==12587== by 0x4E11FDA: operator() (reader.cc:1180)
==12587== by 0x4E11FDA: arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > arrow::internal::OptionalParallelForAsync<parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::shared_ptr<arrow::ChunkedArray> >(bool, std::vector<std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::allocator<arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > > >, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, arrow::internal::Executor*) (parallel.h:95)
==12587== by 0x4E126A9: parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*) (reader.cc:1198)
==12587== by 0x4E12F50: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:1160)
==12587== by 0x4DFA2BC: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:198)
==12587== by 0x4DFA392: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::shared_ptr<arrow::Table>*) (reader.cc:289)
==12587== by 0x1DCE62: parquet::arrow::TestArrowReadDeltaEncoding::ReadTableFromParquetFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<arrow::Table>*) (arrow_reader_writer_test.cc:4174)
==12587== by 0x2266D2: parquet::arrow::TestArrowReadDeltaEncoding_DeltaByteArray_Test::TestBody() (arrow_reader_writer_test.cc:4209)
==12587== by 0x4AD2C9B: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
==12587== by 0x4AC9DD1: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
==12587== by 0x4AA4C02: testing::Test::Run() (gtest.cc:2682)
==12587== by 0x4AA563A: testing::TestInfo::Run() (gtest.cc:2861)
==12587== by 0x4AA600F: testing::TestSuite::Run() (gtest.cc:3015)
==12587== by 0x4AB631B: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5855)
==12587== by 0x4AD3CE7: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2607)
==12587== by 0x4ACB063: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2643)
==12587== by 0x4AB47B6: testing::UnitTest::Run() (gtest.cc:5438)
==12587== by 0x4218918: RUN_ALL_TESTS() (gtest.h:2490)
==12587== by 0x421895B: main (gtest_main.cc:52)
```
Closes apache#11725 from pitrou/ARROW-14704-parquet-valgrind
Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
TODOs: Convert cheat sheet to PDF and hide slide #1. Closes apache#12445 from pachadotdev/patch-4 Lead-authored-by: Stephanie Hazlitt <stephhazlitt@gmail.com> Co-authored-by: Pachá <mvargas@dcc.uchile.cl> Co-authored-by: Mauricio Vargas <mavargas11@uc.cl> Co-authored-by: Pachá <mavargas11@uc.cl> Signed-off-by: Jonathan Keane <jkeane@gmail.com>
@frank400 @jpedroantunes
Creating this dummy PR so that I can add comments.