Conversation

@projjal (Owner) commented Mar 23, 2021

@frank400 @jpedroantunes
creating this dummy PR so that I can add comments.

@projjal (Owner, Author):
I think you should not add streams to this class. See these comments.

Reply:

Done

@projjal (Owner, Author):

Rather than adding redundant precision and scale properties: since this is a templated class, it's better to separate out the Decimal class and add precision/scale only to it.

Reply:

done

@github-actions (bot):
Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename the pull request title to the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}


@projjal (Owner, Author):

I think it is better to use gandiva::DecimalScalar128 instead of arrow::Decimal128, since in gandiva we use gandiva::DecimalScalar128 as the literal type.

Reply:

Substituted

@projjal (Owner, Author):

Again, it is better to implement hash for gandiva::DecimalScalar instead of arrow::Decimal128. Implementing hash for arrow::Decimal128 might conflict with an implementation added to the Arrow codebase later.
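As an illustration, a minimal sketch of such a specialization, assuming the accessors value(), precision() and scale() from gandiva/decimal_scalar.h (the hash-combining constants are illustrative, not the patch's actual code):

```c++
#include <cstdint>
#include <functional>

#include "gandiva/decimal_scalar.h"

namespace std {
template <>
struct hash<gandiva::DecimalScalar128> {
  size_t operator()(const gandiva::DecimalScalar128& s) const {
    // Combine the 128-bit value with precision and scale so scalars
    // differing only in scale hash differently.
    size_t h = hash<uint64_t>()(static_cast<uint64_t>(s.value().high_bits()));
    h ^= hash<uint64_t>()(s.value().low_bits()) + 0x9e3779b9 + (h << 6) + (h >> 2);
    h ^= hash<int32_t>()(s.precision()) + 0x9e3779b9 + (h << 6) + (h >> 2);
    h ^= hash<int32_t>()(s.scale()) + 0x9e3779b9 + (h << 6) + (h >> 2);
    return h;
  }
};
}  // namespace std
```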

Reply:

done

…expose options to python

This adds ReadOptions to CsvFileFormat and exposes ReadOptions, ConvertOptions, and CsvFragmentScanOptions to Python.

ReadOptions was added to CsvFileFormat as its options can affect the discovered schema. For the block size, which does not need to be global, a field was added to CsvFragmentScanOptions.
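A rough sketch of how these pieces could compose on the C++ side (member names such as read_options and block_size are assumptions here, not the exact API):

```c++
#include <memory>

#include "arrow/dataset/file_csv.h"

void ConfigureCsv() {
  auto format = std::make_shared<arrow::dataset::CsvFileFormat>();
  // ReadOptions live on the format because they can affect the
  // discovered schema (member name assumed):
  format->read_options.skip_rows = 1;

  auto scan = std::make_shared<arrow::dataset::CsvFragmentScanOptions>();
  scan->convert_options.strings_can_be_null = true;
  // Block size is per-scan rather than global (field name assumed):
  scan->block_size = 1 << 20;
}
```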

Closes apache#9725 from lidavidm/arrow-8631

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
@projjal (Owner, Author):

-> bool is_decimal = false;

Reply:

done

@projjal (Owner, Author):

is it required?

Reply:

I had already removed it. Done.

bkietz and others added 12 commits March 23, 2021 16:06
This patch adds basic building blocks for grouped aggregation:

- `Grouper` for producing integer arrays encoding group id from batches of keys
- `HashAggregateKernel` for consuming batches of arguments and group ids, updating internal sums/counts/...

For testing purposes, a one-shot grouped aggregation function is provided:
```c++
std::shared_ptr<arrow::Array> needs_sum = ...;
std::shared_ptr<arrow::Array> needs_min_max = ...;
std::shared_ptr<arrow::Array> key_0 = ...;
std::shared_ptr<arrow::Array> key_1 = ...;

ARROW_ASSIGN_OR_RAISE(arrow::Datum out,
  arrow::compute::internal::GroupBy({
    needs_sum,
    needs_min_max,
  }, {
    key_0,
    key_1,
  }, {
    {"sum", nullptr},  // first argument will be summed
    {"min_max", &min_max_options},  // second argument's extrema will be found
}));

// Unpack struct array result (a four-field array)
auto out_array = out.array_as<StructArray>();
std::shared_ptr<arrow::Array> sums = out_array->field(0);
std::shared_ptr<arrow::Array> mins_and_maxes = out_array->field(1);
std::shared_ptr<arrow::Array> group_key_0 = out_array->field(2);
std::shared_ptr<arrow::Array> group_key_1 = out_array->field(3);
```

Closes apache#9621 from bkietz/groupby1

Lead-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: michalursa <michal@ursacomputing.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
The given input stream must be kept alive while the new GArrowCSVReader is alive.

Closes apache#9777 from kou/glib-csv-reader-refer-input

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…le namespacing

This is an implementation of catalog and schema providers to support table namespacing (see the [design doc](https://docs.google.com/document/d/1_bCP_tjVRLJyOrMBOezSFNpF0hwPa1ZS_qMWv1uvtS4/edit?usp=sharing)).

I'm creating this draft PR as a supporting implementation for the proposal, to prove out that the work can be done whilst minimising API churn and still allowing for use cases that don't care at all about the notion of catalogs or schemas. In this new setup, the default namespace is `datafusion.public`, which is created automatically with the default execution context config and allows for table registration.

## Highlights
- Datasource map removed in execution context state, replaced with catalog map
- Execution context allows for registering new catalog providers
- Catalog providers can be queried for their constituent schema providers
- Schema providers can be queried for table providers, similarly to the old datasource map
- Includes basic implementations of `CatalogProvider` and `SchemaProvider` backed by hashmaps
- New `TableReference` enum maps to the various ways of referring to a table in SQL
  - Bare: `my_table`
  - Partial: `schema.my_table`
  - Full: `catalog.schema.my_table`
- Given a default catalog and schema, `TableReference` instances of any variant can be converted to a `ResolvedTableReference`, which always includes all three components

Closes apache#9762 from returnString/catalog

Lead-authored-by: Ruan Pearce-Authers <ruanpa@outlook.com>
Co-authored-by: Ruan Pearce-Authers <ruan@reservoirdb.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Closes apache#9757 from pachamaltese/ARROW-11912

Lead-authored-by: Mauricio Vargas <mvargas@dcc.uchile.cl>
Co-authored-by: Pachamaltese <mvargas@dcc.uchile.cl>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
[ARROW-12012](https://issues.apache.org/jira/browse/ARROW-12012)
An exception is thrown when BinaryConsumer consumes a large amount of data.

Closes apache#9744 from zxf/fix/jdbc-binary-consumer

Authored-by: Felix Zhu <felix.zhu@netis.com.cn>
Signed-off-by: liyafan82 <fan_li_ya@foxmail.com>
…ag for python binding

Hi,

I am making a PR following the discussion in [ARROW-11497](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11497)

This is my first PR to this project; please let me know if I'm missing something, and I will try to address all problems as best I can.

Cheers,
Truc

Closes apache#9489 from trucnguyenlam/provide-parquet-enable-compliant-nested-type-flag

Authored-by: Truc Lam Nguyen <truc.nguyen@jagex.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
…rsion of Map

These items can all stand on their own and they are used by the async datasets conversion.

MergeMap - Given an `AsyncGenerator<AsyncGenerator<T>>`, returns an `AsyncGenerator<T>`. This method flattens a generator of generators into a generator of items. It may reorder the items.

ConcatMap - Same as MergeMap, but it only pulls items from one inner subscription at a time. This reduced parallelism allows items to be returned in order.

Async-reentrant Map - In some cases the map function is slow. Even if the source is not async-reentrant, this map can still be async-reentrant by allowing multiple instances of the map function to run at once. The resulting mapped generator is async-reentrant, but it will not pull reentrantly from the source.

Vector utilities - To make migrating from Iterator code to vector code easier, I added some map-style utilities. These copy the vectors (where an iterator wouldn't), so some care should be taken, but they can still be useful.

Moved Future/AsyncGenerator into top level type_fwd.  This is needed for the RecordBatchGenerator alias in the same way Iterator is needed at the top level.

Added `IsEnd` to `IterationTraits`. This allows non-comparable types to be iterated on, and it allows us to create an `AsyncGenerator<AsyncGenerator<T>>`: since `AsyncGenerator` is a std::function, we can use an empty instance as an end token even though std::function is not comparable.
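A minimal sketch of that idea, assuming the IterationTraits shape from arrow/util/iterator.h (illustrative, not the exact implementation):

```c++
#include <functional>

#include "arrow/util/future.h"

// Primary traits (normally arrow::IterationTraits in arrow/util/iterator.h):
template <typename T>
struct IterationTraits {
  static T End() { return T(); }
  static bool IsEnd(const T& val) { return val == End(); }
};

template <typename T>
using AsyncGenerator = std::function<arrow::Future<T>()>;

// std::function is not equality-comparable, so the specialization checks
// emptiness instead of comparing against an end token:
template <typename T>
struct IterationTraits<AsyncGenerator<T>> {
  static AsyncGenerator<T> End() { return AsyncGenerator<T>(); }
  static bool IsEnd(const AsyncGenerator<T>& gen) { return !gen; }
};
```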

Closes apache#9643 from westonpace/feature/arrow-11883

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
A segfault would occur when a field is inferred as null in a first block and then as list in a second block.

Also re-enable `chunked_builder_test.cc`, which wasn't compiled.

Closes apache#9783 from pitrou/ARROW-12065-json-segfault

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Closes apache#9775 from jorgecarleitao/clippy_clean

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This adds a function `from_trusted_len_iter_bool` to speed up the creation of boolean arrays.

Benchmarks are a bit noisy, but this seems to be ~10-20% faster for comparison kernels. It also has some positive effect on DataFusion queries, as they contain quite a few (nested) comparisons in filters. For example, executing TPC-H query 6 in memory is ~7% faster.

```
Gnuplot not found, using plotters backend
eq Float32              time:   [54.204 us 54.284 us 54.364 us]
                        change: [-29.087% -28.838% -28.581%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild

eq scalar Float32       time:   [43.660 us 43.743 us 43.830 us]
                        change: [-30.819% -30.545% -30.269%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

neq Float32             time:   [68.726 us 68.893 us 69.048 us]
                        change: [-14.045% -13.772% -13.490%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

neq scalar Float32      time:   [46.251 us 46.322 us 46.395 us]
                        change: [-12.204% -11.952% -11.702%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

lt Float32              time:   [50.264 us 50.438 us 50.613 us]
                        change: [-21.300% -20.964% -20.649%] (p = 0.00 < 0.05)
                        Performance has improved.

lt scalar Float32       time:   [48.847 us 48.929 us 49.013 us]
                        change: [-10.132% -9.9180% -9.6910%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

lt_eq Float32           time:   [46.105 us 46.198 us 46.282 us]
                        change: [-21.276% -20.966% -20.703%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) low severe
  13 (13.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

lt_eq scalar Float32    time:   [47.359 us 47.456 us 47.593 us]
                        change: [+0.2766% +0.5240% +0.7821%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

gt Float32              time:   [57.313 us 57.363 us 57.412 us]
                        change: [-18.328% -18.177% -18.031%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild

gt scalar Float32       time:   [44.091 us 44.132 us 44.175 us]
                        change: [-9.4233% -9.2747% -9.1273%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild

gt_eq Float32           time:   [55.856 us 55.932 us 56.007 us]
                        change: [-7.4997% -7.2656% -7.0334%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

gt_eq scalar Float32    time:   [42.365 us 42.419 us 42.482 us]
                        change: [+0.5289% +0.7174% +0.9116%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
```

Closes apache#9759 from Dandandan/optimize_comparison

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This adds support for CTE syntax:

```sql
WITH
   name AS (SELECT ...)
   [, name2 AS (SELECT ...)]
SELECT ...
FROM ...
```

Before this PR, the CTE syntax was ignored.

This PR supports CTEs referencing a previous CTE within the same query (but not forward references).

Closes apache#9776 from Dandandan/cte_support

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This likely needs more testing, especially where I had to implement functionality in (Basic)Decimal256. Also, we may want to extend the scalar cast benchmarks to cover decimals. There's also potentially some redundancy to eliminate in the tests.
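For context, a hedged usage sketch of the new cast path, widening decimal128 data to decimal256 via the generic Cast entry point (the precision/scale values are arbitrary):

```c++
#include <memory>

#include "arrow/api.h"
#include "arrow/compute/cast.h"

arrow::Result<arrow::Datum> WidenToDecimal256(
    const std::shared_ptr<arrow::Array>& decimal128_array) {
  // Default (safe) cast options; out-of-range values surface as errors.
  return arrow::compute::Cast(decimal128_array,
                              arrow::decimal256(/*precision=*/40, /*scale=*/2));
}
```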

Closes apache#9751 from lidavidm/arrow-10606

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
ianmcook and others added 7 commits March 24, 2021 14:04
This fixes a `NOTE` from `R CMD check` caused by ARROW-11700

Closes apache#9793 from ianmcook/ARROW-12073

Authored-by: Ian Cook <ianmcook@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
This PR adds the *utf8_length* compute kernel to the string scalar functions to support calculating the string length (as number of characters) for UTF-8 encoded STRINGs and LARGE STRINGs. The implementation makes use of utf8proc (utf8proc_iterate) to perform the calculation.
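A hedged usage sketch, invoking the kernel through the generic compute entry point (the function name utf8_length is from this PR; the wrapper below is illustrative):

```c++
#include <memory>

#include "arrow/api.h"
#include "arrow/compute/api.h"

arrow::Result<arrow::Datum> Utf8Lengths(
    const std::shared_ptr<arrow::Array>& strings) {
  // Returns per-element character counts (not byte lengths).
  return arrow::compute::CallFunction("utf8_length", {strings});
}
```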

Closes apache#9786 from edponce/ARROW-11693-Add-string-length-kernel

Authored-by: Eduardo Ponce <edponce00@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Users should use Meson.

Closes apache#9787 from kou/glib-remove-autotools

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Closes apache#9788 from kou/glib-json-reader-refer

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
There was a logical conflict between apache@eebf64b, which removed the Arc in `ArrayData`, and apache@8dd6abb, which optimized the compute kernels.

FYI @Dandandan  and @nevi-me

Closes apache#9796 from alamb/alamb/fix-build

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
…l Iterator<Item=Expr> rather than &[Expr]

# NOTE:
Since this is a fairly major backwards-incompatible change (many callsites need to be updated, though mostly mechanically), I gathered some feedback on this approach in apache#9692, and this is the PR I propose for merge.

I'll leave this open for several days and also send a note to the mailing lists for additional comment

It is part of my overall plan to make the DataFusion optimizer more idiomatic and do much less copying [ARROW-11689](https://issues.apache.org/jira/browse/ARROW-11689)

# Rationale:
All callsites currently need an owned `Vec` (or equivalent) so they can pass in `&[Expr]`, and then DataFusion copies all the `Expr`s. Many times the original `Vec<Expr>` is discarded immediately after use (I'll point out where this happens in a few places below). Thus it would be better (more idiomatic, and often faster with less copying) to take something that can produce an iterator over `Expr`.

# Changes
1. Change `Dataframe` so it takes `Vec<Expr>` rather than `&[Expr]`
2. Change `LogicalPlanBuilder` so it takes `impl Iterator<Item=Expr>` rather than `&[Expr]`

I couldn't figure out how to allow the `Dataframe` API (which is a Trait) to take an `impl Iterator<Item=Expr>`

Closes apache#9703 from alamb/alamb/less_copy_in_plan_builder_final

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
…ed code

Closes apache#9789 from jorisvandenbossche/ARROW-11983

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jvictorhuguenin force-pushed the feature/add-decimal-in-expression branch from 1d7bdcd to 89b2172 on April 1, 2021 12:34
@github-actions bot added the flight label Apr 1, 2021
projjal pushed a commit that referenced this pull request Apr 7, 2021
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant, hence the deadlock.

To fix it, I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the number/status of in-flight requests.
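A minimal sketch of the TaskGroup pattern (namespace and signatures assumed from arrow/util/task_group.h, not quoted from the patch):

```c++
#include "arrow/status.h"
#include "arrow/util/task_group.h"
#include "arrow/util/thread_pool.h"

arrow::Status Walk() {
  // The group tracks in-flight tasks itself, so recursive callbacks no
  // longer need a (non-reentrant) mutex around shared counters.
  auto group = arrow::internal::TaskGroup::MakeThreaded(
      arrow::internal::GetCpuThreadPool());
  group->Append([] {
    // One ListObjectsV2 request; its callback may Append() child walks.
    return arrow::Status::OK();
  });
  return group->Finish();  // waits for all spawned tasks
}
```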

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@projjal closed this Apr 8, 2021
projjal pushed a commit that referenced this pull request Jun 17, 2021
Before change:

```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in
    #1 0x7f28ae5826f4 in
    #2 0x7f28ae57fa5d in
    #3 0x7f28ae58cb0f in
    #4 0x7f28ae58bda0 in
    ...
```

After change:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in posix_memalign (/build/cpp/debug/arrow-dataset-file-csv-test+0x522f09)
    #1 0x7f28ae5826f4 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:213:24
    #2 0x7f28ae57fa5d in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:405:5
    #3 0x7f28ae58cb0f in arrow::PoolBuffer::Reserve(long) /arrow/cpp/src/arrow/memory_pool.cc:717:9
    #4 0x7f28ae58bda0 in arrow::PoolBuffer::Resize(long, bool) /arrow/cpp/src/arrow/memory_pool.cc:741:7
    ...
```

Closes apache#10498 from westonpace/feature/ARROW-13027--c-fix-asan-stack-traces-in-ci

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
projjal pushed a commit that referenced this pull request Dec 1, 2021
Error log of Valgrind failure:
```
[----------] 3 tests from TestArrowReadDeltaEncoding
[ RUN      ] TestArrowReadDeltaEncoding.DeltaBinaryPacked
[       OK ] TestArrowReadDeltaEncoding.DeltaBinaryPacked (812 ms)
[ RUN      ] TestArrowReadDeltaEncoding.DeltaByteArray
==12587== Conditional jump or move depends on uninitialised value(s)
==12587==    at 0x4F12C57: Advance (bit_stream_utils.h:426)
==12587==    by 0x4F12C57: parquet::(anonymous namespace)::DeltaBitPackDecoder<parquet::PhysicalType<(parquet::Type::type)1> >::GetInternal(int*, int) (encoding.cc:2216)
==12587==    by 0x4F13823: Decode (encoding.cc:2091)
==12587==    by 0x4F13823: parquet::(anonymous namespace)::DeltaByteArrayDecoder::SetData(int, unsigned char const*, int) (encoding.cc:2360)
==12587==    by 0x4E89EF5: parquet::(anonymous namespace)::ColumnReaderImplBase<parquet::PhysicalType<(parquet::Type::type)6> >::InitializeDataDecoder(parquet::DataPage const&, long) (column_reader.cc:797)
==12587==    by 0x4E9AE63: ReadNewPage (column_reader.cc:614)
==12587==    by 0x4E9AE63: HasNextInternal (column_reader.cc:576)
==12587==    by 0x4E9AE63: parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)6> >::ReadRecords(long) (column_reader.cc:1228)
==12587==    by 0x4DFB19F: parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long) (reader.cc:467)
==12587==    by 0x4DF513C: parquet::arrow::ColumnReaderImpl::NextBatch(long, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:108)
==12587==    by 0x4DFB74D: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::vector<int, std::allocator<int> > const&, parquet::arrow::ColumnReader*, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:273)
==12587==    by 0x4E11FDA: operator() (reader.cc:1180)
==12587==    by 0x4E11FDA: arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > arrow::internal::OptionalParallelForAsync<parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::shared_ptr<arrow::ChunkedArray> >(bool, std::vector<std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::allocator<arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > > >, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, arrow::internal::Executor*) (parallel.h:95)
==12587==    by 0x4E126A9: parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*) (reader.cc:1198)
==12587==    by 0x4E12F50: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:1160)
==12587==    by 0x4DFA2BC: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:198)
==12587==    by 0x4DFA392: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::shared_ptr<arrow::Table>*) (reader.cc:289)
==12587==    by 0x1DCE62: parquet::arrow::TestArrowReadDeltaEncoding::ReadTableFromParquetFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<arrow::Table>*) (arrow_reader_writer_test.cc:4174)
==12587==    by 0x2266D2: parquet::arrow::TestArrowReadDeltaEncoding_DeltaByteArray_Test::TestBody() (arrow_reader_writer_test.cc:4209)
==12587==    by 0x4AD2C9B: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
==12587==    by 0x4AC9DD1: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
==12587==    by 0x4AA4C02: testing::Test::Run() (gtest.cc:2682)
==12587==    by 0x4AA563A: testing::TestInfo::Run() (gtest.cc:2861)
==12587==    by 0x4AA600F: testing::TestSuite::Run() (gtest.cc:3015)
==12587==    by 0x4AB631B: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5855)
==12587==    by 0x4AD3CE7: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2607)
==12587==    by 0x4ACB063: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2643)
==12587==    by 0x4AB47B6: testing::UnitTest::Run() (gtest.cc:5438)
==12587==    by 0x4218918: RUN_ALL_TESTS() (gtest.h:2490)
==12587==    by 0x421895B: main (gtest_main.cc:52)
```

Closes apache#11725 from pitrou/ARROW-14704-parquet-valgrind

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
projjal pushed a commit that referenced this pull request Mar 29, 2022
TODOs:
Convert cheat sheet to PDF and hide slide #1.

Closes apache#12445 from pachadotdev/patch-4

Lead-authored-by: Stephanie Hazlitt <stephhazlitt@gmail.com>
Co-authored-by: Pachá <mvargas@dcc.uchile.cl>
Co-authored-by: Mauricio Vargas <mavargas11@uc.cl>
Co-authored-by: Pachá <mavargas11@uc.cl>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>