Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle buffered dwrf write exception to avoid server crash #9784

Closed
wants to merge 2,021 commits into from

Conversation

xiaoxmeng
Copy link
Contributor

Catch exception during BufferedWriter processing to avoid server crash as ~BufferedWriter
expect either the object is aborted or all the data mutations have been flushed.

mbasmanova and others added 30 commits April 5, 2024 12:18
Summary:
Pull Request resolved: facebookincubator#9386

CAST expression used to handle INTERVAL DAY TO SECOND values as BIGINT. This allowed invalid casts (e.g. from varchar, to double, etc.) and produced incorrect results when casting to varchar.

Casting 1 second interval used to return '1000' instead of '0 00:00:01.000'.

Fixes facebookincubator#9384

Reviewed By: xiaoxmeng

Differential Revision: D55796600

fbshipit-source-id: 3d51a8eb18b434315bf9285f58b8f2cdbedca63d
…lid (facebookincubator#8977)

Summary:
`buildSelectedType()` takes a constant typeWithId type tree and applies the selector on it to return a selected new type tree. However, it modifies the input typeWithId unexpectedly because it duplicates the internal type nodes, but not the leaves(primitive types). The leaf nodes were directly moved to the result instead of copied, thus making in the original input typeWithId incomplete.
```
std::shared_ptr<const TypeWithId> typeutils::buildSelectedType(
const std::shared_ptr<const TypeWithId>& typeWithId,
const std::function<bool(size_t)>& selector)  {
  return visit(typeWithId, selector);
}
```
```
std::shared_ptr<const TypeWithId> visit(
    const std::shared_ptr<const TypeWithId>& typeWithId,
    const std::function<bool(size_t)>& selector) {
  if (typeWithId->type()->isPrimitiveType()) {
    return typeWithId;   // directly returned
  }
  if (typeWithId->type()->isRow()) {
    std::vector<std::string> names;
    std::vector<std::shared_ptr<const TypeWithId>> typesWithId;  // constructing a new children vector
    std::vector<std::shared_ptr<const Type>> types;
    auto& row = typeWithId->type()->asRow();
    for (auto i = 0; i < typeWithId->size(); ++i) {
      auto& child = typeWithId->childAt(i);
      if (selector(child->id())) {
        names.push_back(row.nameOf(i));
        std::shared_ptr<const TypeWithId> twid;
        twid = visit(child, selector);   // twid was pointing to the original typeWithId leaf
        typesWithId.push_back(twid);   // twid pushed to the new vector typesWithId
        types.push_back(twid->type());
      }
    }
    VELOX_USER_CHECK(
        !types.empty(), "selected nothing from row: " + row.toString());
    return std::make_shared<TypeWithId>(.   // Creating a new type node
        ROW(std::move(names), std::move(types)),
        std::move(typesWithId),      // But with the children vector that contains the original leaves. And the original ones are now nullptr.
        typeWithId->id(),
        typeWithId->maxId(),
        typeWithId->column());
```

This PR fixes the problem by making a copy of the leaf nodes as well. After this change, the original input typeWithId would still be valid.

Pull Request resolved: facebookincubator#8977

Reviewed By: pedroerp

Differential Revision: D55756100

Pulled By: Yuhta

fbshipit-source-id: 135ac4432adef69aa240f5c0b6e550683981e84a
Summary:
Fix reduceAgg bencnmark test setup by calling OperatorTestBase::SetUpTestCase
in benchmark main instead of its constructor. SetUpTestCase is supposed to call
once before any unit test cases run (like OperatorTestBase constructor). Correspondingly
move SetUpTestCase/TearDownTestCase to public section to use for non-google unit
test use case such as reduceAgg benchmark which use it for benchmark testing setup

Pull Request resolved: facebookincubator#9394

Reviewed By: mbasmanova

Differential Revision: D55820274

Pulled By: xiaoxmeng

fbshipit-source-id: 60e72837e6c867367da72c7c90feb664ccdef49a
…#9364)

Summary:
Pull Request resolved: facebookincubator#9364

When evaluating shared subexpressions with TRY, we only copy the values that are valid (did not encounter exceptions) into
the sharedSubexprValues.  This means that the size of sharedSubexprValues may not have values for all the selected rows (in
particular it may have a smaller length than rows).

We need to handle this when copying values from sharedSubexprValues into the result Vector, we should only copy values
that did not encounter exceptions.  TRY will handle this upstream by appending nulls for the exceptions.

I encountered this in fuzzing, the map function produces a result Vector that is only large enough to hold values that did not
encounter exceptions and this causes an error in context.moveOrCopyResult because it was trying to copy more elements than exist.

Reviewed By: Yuhta

Differential Revision: D55719677

fbshipit-source-id: bfc0cc67f86902d742fcf5fd34798f2ba1eaf342
…cebookincubator#9396)

Summary:
Pull Request resolved: facebookincubator#9396

facebookincubator#9067 introduced an initialized flag that we include alongside the null flag
for aggregations in the rows in RowContainer.

Global aggregations do not use a RowContainer to create rows, rather a single row is constructed manually inside
GroupingSets.  When that PR was landed this logic was not properly updated, we did not allocate enough space in the row
for the additional bits which could lead to null/initialized flags overwriting the aggregate values when there were a lot of
them.

This change fixes it so that we allocate the correct amount of space for flags for global aggregations as an immediate fix.

Longer term it would be better to centralize the logic for creating rows in RowContainer (I'd like to get us back to a working
state before attempting such a refactor).

Reviewed By: Yuhta

Differential Revision: D55823098

fbshipit-source-id: 9a15f257669273c64dc0361bf735c4472e45957d
Summary:
Add early flushed bytes in writer that is due to memory reclaiming.

Pull Request resolved: facebookincubator#9378

Reviewed By: xiaoxmeng

Differential Revision: D55786890

Pulled By: tanjialiang

fbshipit-source-id: f81dd83390032bd1b6e944bfee9ee56c0072a2ea
Summary:
This PR is addressing this issue facebookincubator#9135

Pull Request resolved: facebookincubator#9228

Reviewed By: Yuhta

Differential Revision: D55609322

Pulled By: xiaoxmeng

fbshipit-source-id: c1921fd5551c553384fde4ffddf80ad659471094
…incubator#9377)

Summary:
Only collect task pause time if it is not zero
Include reclaim duration in task reclamation log

Pull Request resolved: facebookincubator#9377

Reviewed By: tanjialiang

Differential Revision: D55786597

Pulled By: xiaoxmeng

fbshipit-source-id: e35aeff84a6abee602e8ccdb5546310e2f1f762f
facebookincubator#9308)

Summary:
Refactor the greatest/least functions using simple function API.

Also, add support for NaN comparisons for DOUBLE and REAL. NaN is the biggest according to prestodb/presto#22391

Fixes facebookincubator#3728

Pull Request resolved: facebookincubator#9308

Reviewed By: xiaoxmeng

Differential Revision: D55793910

Pulled By: mbasmanova

fbshipit-source-id: c389bad91197f00ced549d816a15efab5a2dd910
…okincubator#9407)

Summary:
Pull Request resolved: facebookincubator#9407

The approx_distinct result verifier for window fuzzer is giving false positive signals. It need to be fixed as described in facebookincubator#9347. Disable this result verifier until it is fixed.

Reviewed By: kgpai

Differential Revision: D55879765

fbshipit-source-id: 23d3c30cc4660ccb111ef641ed22ef111f79e6a4
…9408)

Summary:
When memory arbitrator aborts a query, it waits for all the driver threads
of the query to finish, and then abort the query operators by freeing their
memory resources. The query operator or driver thread which is currently
under memory arbitration is under suspended driver state which is not counted
as running. We shouldn't abort such operator as it might be under memory
processing within a component such as row container in hash build, and the
component might be reset by operator abort. Even though we won't continue the
query procesing but the excception throw might need to access the component
state on the exception return path.
This PR fixes this problem by avoiding abort an operator which driver is running
and under suspension state.

Pull Request resolved: facebookincubator#9408

Reviewed By: bikramSingh91, oerling

Differential Revision: D55883111

Pulled By: xiaoxmeng

fbshipit-source-id: c5fd68401bf82be92db2b9f34b8e8a7d1a7f13c2
Summary:
1. Fix hash probe spill hang for right semi join types
For right semi join (project and filter), it produce output from the probed rows in hash table
after processing all the probe inputs. A probe input vector (set in Operator::input_) might be
processed in multiple output calls until all the input rows have been processed. The probe side
spill needs to spill output from the current probe input vector before clear the hash table and
the current logic wait until the output is null. This is not correct for right semi joins as it always
return null, and instead we shall check on if input_ has been reset.
Add unit test to verify the fix plus join fuzzer run.

2. Disable hash probe spill if dynamic filters have been generated
Hash probe might generate dynamic filters based on the hash join key ranges from build side and
pushdown to upstream operators. If this has been triggered, then we can't spill from hash probe
side as it might produce the incorrect result as detected by join fuzzer test. The fix is to disable
hash probe spill if dynamic filters have been generated.
Add unit test to verif the fix plus join fuzzer run.

Pull Request resolved: facebookincubator#9400

Reviewed By: kagamiori, oerling

Differential Revision: D55828946

Pulled By: xiaoxmeng

fbshipit-source-id: ea4cd62abf0f9d169ad61cd5718bae4c44429e68
…r#9373)

Summary:
Pull Request resolved: facebookincubator#9373

ArgumentTypeFuzzer serves two purposes.

1. Given a generic function signature, generate a random valid return type.
2. Given a generic function signature and a concrete return type, generate a list of valid argument types.

Consider a signature of the map_keys function:

```
  map(K, V) -> array(K)
```

This signature has 2 type variables: K and V. This signature doesn’t fully specify the return type, just that it must be an array. There are many possible valid return types, but all should be of the form array(K). When asked to generate a valid return type for this signature ArgumentTypeFuzzer may return array(bigint) or array(row(...)), but it should not return ‘varchar’.

Now, if we fix the return type to array(bigint) and ask ArgumentTypeFuzzer to generate valid argument types, we may get back a map(bigint, varchar) or a map(bigint, double), but we do not expect a ‘varchar’ or a map(integer, float). By specifying the return type as array(bigint) we effectively bind one of the type variables: K = bigint. At the same time we leave V unspecified and ArgumentTypeFuzzer is free to choose any type for it.

To generate a return type, create an ArgumentTypeFuzzer by specifying a function signature and a random number generator, then call fuzzReturnType() method.

```
ArgumentTypeFuzzer fuzzer(signature, rng)
auto returnType = fuzzer.fuzzReturnType()
```

To generate argument types for a given return type, create an ArgumentTypeFuzzer by specifying a function signature, a return type and a random number generator, then call fuzzArgumentTypes() method.

```
ArgumentTypeFuzzer fuzzer(signature, returnType, rng)
if (fuzzer.fuzzArgumentTypes()) {
  auto argumentTypes = fuzzer.argumentTypes();
}
```

Notice that fuzzArgumentTypes may fail to generate valid signatures. This can happen if specified 'returnType' is not a valid return type for this signature.

This change extends ArgumentTypeFuzzer to support signatures that use generic decimal types.

Consider a signature of least function:

```
(decimal(p, s),...) -> decimal(p, s)
```

This signature has 2 integer variables: p and s. The return type is not fully specified. It can be any valid decimal type. ArgumentTypeFuzzer::fuzzReturnType needs to generate values for ‘p’ and ‘s’ and return a decimal(p, s) type. If return type is fixed, say decimal(10, 7), ArgumentTypeFuzzer::fuzzArgumentTypes() needs to figure out that p=10 and s=7 and return a random number of argument types all of which are decimal(10, 7).

Consider slightly different function: between

```
(decimal(p, s), decimal(p, s)) -> boolean
```

This signature also has 2 integer variables: p and s. The return type is fully specified though. Hence, ArgumentTypeFuzzer::fuzzReturnType should always return ‘boolean’. However, when return type is fixed to the only possible value, ‘boolean’, ArgumentTypeFuzzer::fuzzArgumentTypes() may generate any valid values for p and s and return any decimal type for the arguments as long as both argument types are the same. A pair of {decimal(10, 7), decimal(10, 7)} is a valid response, as well as {decimal(18, 15), decimal(18, 15)}.

Let’s also look at a the signature of the ‘floor’ function:

```
(decimal(p, s)) -> decimal(rp, 0)
```

This function has 3 integer variables: p, s, and rp. The ‘rp’ variable has a constraint:

```
rp  = min(38, p - s + min(s, 1))
```

The return type can be any decimal with scale 0. ArgumentTypeFuzzer::fuzzReturnType may return decimal(10, 0) or decimal(7, 0), but it should not return decimal(5, 2).

If we fix return type and ask ArgumentTypeFuzzer to generate valid argument types, it will need to figure out how to generate values p and s such that rp = min(38, p - s + min(s, 1)). This is a pretty challenging task. Hence, ArgumentTypeFuzzer::fuzzArgumentTypes() doesn’t support signatures with constraints on integer variables.

It should be noted that ArgumentTypeFuzzer::fuzzReturnType() may also need to make sure that generated ‘rp’ is such that there exist ‘p’ and ‘s’ for which the formula above is true. For this particular formula this is easy because a solution exists for any rp: p = rp, s = 0. However, this is not true in general. It might be better to not support ArgumentTypeFuzzer::fuzzReturnType() for signatures with constraints on integer variables.

To fuzz argument or return types, ArgumentTypeFuzzer needs to generate valid values for integer variables. Unlike type variables, integer variables have implicit constraints. A variable that represents a precision must have a value in [1, 38] range. A variable that represents scale must have a value in [1, precision] range. The fuzzer needs a way to determine which variable represents precision, which represents scale and for scale variables it needs to figure out what is the corresponding precision. The fuzzer infers these properties from the way variables are used. It examines the types of arguments and return type to figure out what each variable represents. When encountering decimal(p, s) type, the fuzzer determines that p is precision and s is scale. When encountering decimal(p, 5) type, the fuzzer determines that p is precision that must be >= 5. When encountering decimal(10, s), the fuzzer determines that s is scale that must be in [0, 5] range.

This logic is implemented in the ArgumentTypeFuzzer::determineUnboundedIntegerVariables method.

Reviewed By: bikramSingh91

Differential Revision: D55772808

fbshipit-source-id: 708f202fc7270aaeaa59e28175aea5147b4a7981
…facebookincubator#9286)

Summary:
When the result scale of decimal multiply exceeds the precision, exception
occurs during the decimal type creation.

Pull Request resolved: facebookincubator#9286

Reviewed By: xiaoxmeng

Differential Revision: D55915055

Pulled By: mbasmanova

fbshipit-source-id: b7bd9eea73cdb6a863d955da85d4b10fd9668be4
Summary:
Fixes accounting of kMetricArbitratorLocalArbitrationCountwhich was
previously sometimes incremented for global arbitration.

Also adds additional operator level metrics for keeping track of
global and local arbitration attempts initiated by them.

Pull Request resolved: facebookincubator#9398

Reviewed By: xiaoxmeng

Differential Revision: D55826917

Pulled By: bikramSingh91

fbshipit-source-id: 4f7bd112ea1bc6ba07f7ce5501a0143f4fd7631f
Summary:
The output size was incorrectly estimated in certain cases due to a bug. This was causing flush to happen either too early or too late depending on a query shape.

Pull Request resolved: facebookincubator#9311

Reviewed By: xiaoxmeng

Differential Revision: D55804404

Pulled By: arhimondr

fbshipit-source-id: 6ca83c74e206a06090327855d544eb231cdbba1f
…tor#9399)

Summary:
Pull Request resolved: facebookincubator#9399

In Presto, the default standard error for `approx_set`
(and `empty_approx_set`) is
[0.01625](https://prestodb.io/docs/current/functions/hyperloglog.html#approx_set),
which is different from `approx_distinct`
([0.023](https://prestodb.io/docs/current/functions/aggregate.html#approx_distinct)).
In Velox we are using 0.023 for both, this causes problem (in addition to the
precision loss) that when `empty_approx_set` is constant folded by the
coordinator, it has a different standard error and failing the sanity check.
Fix this by aligning the default error value with Presto.

Reviewed By: mbasmanova

Differential Revision: D55814871

fbshipit-source-id: 4ecb650d02ba004fe5b1a6369d3d44464f3427ec
…or#9140)

Summary:
This per test sanity check allows test to reveal failures on test level instead of on the entire suite level. This will greatly reduce the time engineers spent in locating failed tests due to memory leak.

Pull Request resolved: facebookincubator#9140

Reviewed By: xiaoxmeng

Differential Revision: D55043514

Pulled By: tanjialiang

fbshipit-source-id: d6c37bc56fe3c802f49ad6a36ae0daa2c708f4b4
…bator#9416)

Summary:
Add velox.memory_pool_initial_capacity_bytes as histogram to track the query memory pool's
initial capacity distribution. This indicates if the overall memory arbitration is under high query
memory usage pressure. If a small query starts with sufficient amount of memory, then it can
run through completion without the interference of the slow memory arbitration by some large
query.

Add elox.memory_pool_capacity_growth_count to track the number of query memory pool
capacity growth attempts through memory arbitrations. The less number of growth, the less
interference with the memory arbitration by the other queries.

Both metrics can be used to evaluate the effectiveness of the followup optimizations: reserved
query capacity and memory capacity growth control within memory arbitrator

Pull Request resolved: facebookincubator#9416

Reviewed By: bikramSingh91, oerling

Differential Revision: D55925034

Pulled By: xiaoxmeng

fbshipit-source-id: fe289e43f1eb99e4b798d32cdb82103dd4d8d04b
Summary:
Pull Request resolved: facebookincubator#9410

Currently the key type of `MAP` is specified as `knownTypeVariable` which refuses to bind `UNKNOWN` type, while in Presto `MAP(UNKNOWN, UNKNOWN)` is valid.

Reviewed By: kagamiori

Differential Revision: D55902535

fbshipit-source-id: 8c57736859d9f726b1cb7aa43cec74d60990c630
…incubator#8648)

Summary:
Currently we only have a warning log for double free in mmap allocator
and we shall also add a metric for production alert.

Pull Request resolved: facebookincubator#8648

Reviewed By: bikramSingh91

Differential Revision: D53341067

Pulled By: xiaoxmeng

fbshipit-source-id: 3d0979a1621ebae1180b3d59cd98345fd0c4dff8
…#9387)

Summary:
This PR makes a change to remove the globally applied `-Wno-return-type`, fixes some violations. This PR originally facebookincubator#4618 disabled it. In the case of a code bug, it may become difficult to diagnose the actual cause. An example of this is

```
#include <stdio.h>

bool test1()
{
    printf("Called inner\n");
}
void test()
{
    test1();
}

int main(int argc, char* argv[])
{
    printf("Called test\n");
    test();
    printf("Finished test\n");
    return 0;
}
```
which crashes when compiled with -O2 due to bad codegen.

Pull Request resolved: facebookincubator#9387

Reviewed By: pedroerp

Differential Revision: D55933467

Pulled By: mbasmanova

fbshipit-source-id: da748fc5df19adb3d6e295973cc12e75ac43f2c1
Summary: Pull Request resolved: facebookincubator#9343

Reviewed By: xiaoxmeng, amitkdutta

Differential Revision: D55954222

Pulled By: kgpai

fbshipit-source-id: 0cafeb35f98c642e2845d114700e5214dd321327
Summary:
Pull Request resolved: facebookincubator#9421

For the queries that read HLL digests from table and call `merge` on
them, we used to create empty accumulator and merge it with the serialized
digest, and then serialize the merged accumulator again, resulting in wasted
CPU.  Fix this by passing the serialized digests directly in case of abandon
partial aggregation.

Reviewed By: mbasmanova

Differential Revision: D55935663

fbshipit-source-id: 35ace659fde3490040c52238fad41895e9f759b1
Summary:
Pull Request resolved: facebookincubator#9423

Logically, HLL digest represents an array of 2^n numbers. Merging two such arrays consists of computing element-wise max of the numbers. The sizes of the arrays are always the same.

Example,

```
	HLL1: [1, 2, 3, 4, 5]
	HLL2: [5, 4, 3, 2, 1]

	Combined HLL: [5,4, 3, 4, 5]
```

This should be a fairly quick operation that easily lends itself to vectorization. But there is a catch.

The numbers stored in HLL digest are 8-bit integers that represent the number of leading zeros in a 64-bit hash. These numbers are from [0, 63] range. Hence, require at most 6 bits.

The size of the HLL array is a power of 2 (2^n), where n is a function of max standard error specified in approx_distinct. Larger values of 'n' allow for higher accuracy at the expense of using more memory.

In production workloads, we often see n = 16.

Storing 2^16 8-bit numbers requires 65KB (each number uses 1 byte). However, Presto uses a compact representation of the HLL digest that requires about half the memory.

Compact representation includes:

* One 8-bit baseline number: minimum value of all the numbers in the array
* 2^n 4-bit deltas from baseline capped at 15 (maximum value representable in 4 bits): delta = min(value - baseline, 15).
* A small list of overflow values.
  * A list of indices where overflow has occurred (value > baseline + 15)
  * A matching list of overflows: overflow = value - baseline - 15.

For example, given HLL: [10, 2, 3, 18, 5], its compact representation is

* baseline: 2
* deltas: [8, 0, 1, 15, 3]
*	overflow indices: [3]
*	overflow values: [1]

There is only one value that has an overflow: 18 = 2 (baseline) + 15 (max delta) + 1 (overflow). Hence, overflow indices and values arrays have just one entry each.

In practice, for HLL of size 2^16, we see ~20 overflows.

Compact representation helps save memory and network bandwidth (shuffle), but requires a more sophisticated and therefore expensive processing. Merging compactly represented HLLs is a lot slower.

It is possible to speed up merging of compact HLLs using SIMD instructions. The implementation here is 2-3x faster than the original scalar implementation.

There are a few challenges in SIMD-zing HLL merge.

First, there are no instructions to operate on 4-bit integers. The smallest supported integer is 8-bit. There is also no instruction to load 4-bit integers into SIMD registers.

The approach implemented here is to process deltas in even slots separately from deltas in odd slots.

We process deltas in batches of 64. Each batch is located in a 32 byte-buffer, each byte storing 2 values. Overflows are processed separately using scalar code path.

Add microbenchmark to measure the speed of DenseHll::mergeWith(serialized) for 3 different values of the hash bits: 11 (default), 12 and 16. The benchmark run on  Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz shows speed up of 4x, 2.5x, 25x respectively. However, end-to-end measurements show much smaller gains of 2-3x.

Before:

```
============================================================================
[...]n/hyperloglog/benchmarks/DenseHll.cpp     relative  time/iter   iters/s
============================================================================
mergeSerialized11                                          25.10us    39.84K
mergeSerialized12                                          54.29us    18.42K
mergeSerialized16                                         569.50us     1.76K
```

After:

```
============================================================================
[...]n/hyperloglog/benchmarks/DenseHll.cpp     relative  time/iter   iters/s
============================================================================
mergeSerialized11                                           6.33us   158.02K
mergeSerialized12                                          20.04us    49.91K
mergeSerialized16                                          22.37us    44.70K
```

Reviewed By: Yuhta

Differential Revision: D55936471

fbshipit-source-id: 61f44e5e0573907ba9bf10ae5aca00f70363ec8e
…ncubator#9433)

Summary:
Saving ccache is resulting in some errors finalizing the upload
which is preventing scheduled jobs from running. Removing this
step temporarily till its resolved.

Pull Request resolved: facebookincubator#9433

Reviewed By: kgpai

Differential Revision: D55987674

Pulled By: bikramSingh91

fbshipit-source-id: bc5655df1ef96dbd710871b0f1ed30b06cf41e4a
Summary:
To prepare for the upcoming equality delete file read, we need to refactor the SplitReader a bit.

Pull Request resolved: facebookincubator#8995

Reviewed By: xiaoxmeng

Differential Revision: D55072998

Pulled By: Yuhta

fbshipit-source-id: 662459041b947d51ffaa98b57a50e4ebdd5b36e3
…incubator#9434)

Summary:
Aggregation test is flaky on circle ci which has seen unexpected spill stats with zero partial
input rows: facebookincubator#9292
Since we can't reproduce this offline, add one more log to print out the number of input rows
for final aggregation. If there is non-zero final aggregation input rows, then we need to look
into if task stats collection is not reliable. Eventually, we shall check spill stats based on whether
spill injections have been triggered or not which is more reliable. But let's first figure out if
anything else in the test is wrong first.

Pull Request resolved: facebookincubator#9434

Reviewed By: mbasmanova

Differential Revision: D55987679

Pulled By: xiaoxmeng

fbshipit-source-id: 8789f077adc877d1762f20b391c77ab87f4e6a61
Summary:
Allow specifying the file length via `FileOptions`.
When opening a file for reading, this helps avoid a call to remote storage to fetch file metadata.
This is one of the changes needed to support facebookincubator#9314

Pull Request resolved: facebookincubator#9313

Reviewed By: Yuhta

Differential Revision: D55982134

Pulled By: mbasmanova

fbshipit-source-id: b358622a974ed05eeebd0d9e63a29e8ec933c40b
…tor#9409)

Summary:
Pull Request resolved: facebookincubator#9409

The custom result verification for approx_distinct in window
operations requires a different handling from aggregation. The reason
is that the error bound for approx_distinc only applies to independent
input sets, while input sets for rows in the same partition in a window
operation are correlated. Therefore, this diff changes
ApproxDistinctResultVerifier to only take the max error in each
partition when verifying the error bound for results of window
operations.

This diff fixes facebookincubator#9347.

Reviewed By: Yuhta

Differential Revision: D55898634

fbshipit-source-id: 847a70b1bfc3870bedb768f8f86711073b32e0d2
xiaoxmeng and others added 25 commits May 6, 2024 23:46
Summary: Pull Request resolved: facebookincubator#9724

Reviewed By: zacw7

Differential Revision: D57012132

Pulled By: xiaoxmeng

fbshipit-source-id: 390ec69d19206a7dac4aabc915db4b3b140076f4
…rSpec.h (facebookincubator#9720)

Summary:
Pull Request resolved: facebookincubator#9720

C++20 has [eliminated](https://en.cppreference.com/w/cpp/types/result_of) `result_of` in favour of `invoke_result`. It's mysterious that this code even still works, but, nevertheless, I'm fixing it.

Differential Revision: D56987447

fbshipit-source-id: 0cf99659c63739911b503a3c0d97be641f98ace9
…#9737)

Summary:
Avoid unnecessary reclaim on non-reclaimable query which might spent time on waiting
for query task to pause.
Also allows arbitration to reclaim reserved capacity from the request pool itself.

Pull Request resolved: facebookincubator#9737

Reviewed By: tanjialiang

Differential Revision: D57055391

Pulled By: xiaoxmeng

fbshipit-source-id: 34fd61a343e445c2e5a4e228f3ce54d630c65f24
Summary:
Tested on MacOS 14.4.1 w/ M1 chip.

* Per facebookincubator#1446 (comment), `CPU_TARGET="arm64"` is not needed any more.
  ```
  ++ arch
  + '[' arm64 == arm64 ']'
  ```
* Add `INSTALL_PREFIX` to address dependency installation permission issues in facebookincubator#2016

CC: majetideepak

Pull Request resolved: facebookincubator#9717

Reviewed By: xiaoxmeng

Differential Revision: D57072389

Pulled By: bikramSingh91

fbshipit-source-id: 4a48333b5034afc4db8ffac411769323b4f77a82
Summary:
When cmake 3.16 is used, build velox fails with the below error.

```
CMake Error at CMake/resolve_dependency_modules/stemmer.cmake:48 (add_library):
  add_library cannot create ALIAS target "stemmer::stemmer" because target
  "stemmer" is imported but not globally visible.
Call Stack (most recent call first):
  CMake/ResolveDependency.cmake:43 (include)
  CMake/ResolveDependency.cmake:81 (build_dependency)
  CMakeLists.txt:557 (resolve_dependency)
```

I note [cmake doc](https://cmake.org/cmake/help/latest/command/add_library.html#alias-libraries) says, "New in version 3.18: An ALIAS can target a non-GLOBAL Imported Target".
So to allow using cmake before version 3.18, this pr proposes to explicitly set GLOBAL for the imported
stemmer target.

Pull Request resolved: facebookincubator#9670

Reviewed By: xiaoxmeng

Differential Revision: D57072511

Pulled By: bikramSingh91

fbshipit-source-id: b7a4a6221900317393ac95051355b0a70466966c
…ator#9738)

Summary:
The lock order issue detected by tsan is as follows:
T1: task startup creates driver and constructor operators which holds the task lock
T2: value operator constructor allocates memory from memory pool for parallelized value node mode
      which triggers memory arbitration
T3: the memory arbitration grabs arbitration lock for arbitration
T4: the memory reclaim tries to grabs the task lock for reclaim such as to pause the task execution

This forms a deadlock and we have the logic to detect the memory pool allocation from the operator
constructor in driver execution framework. However the memory pool of vectors used by value nodes
is not the query memory pool but owns by the test framework. This PR moves the memory pool allocation
from constructor to initialize() method as the other operator does.

Pull Request resolved: facebookincubator#9738

Test Plan:
The test passed 100 stress test iterations:
https://www.internalfb.com/intern/testinfra/testrun/10414574171435627
https://www.internalfb.com/intern/testinfra/testrun/13229323938244819

Reviewed By: bikramSingh91

Differential Revision: D57063553

Pulled By: xiaoxmeng

fbshipit-source-id: 3ad661c35bfa46693729e60420f22920f83a7ff0
…9699)

Summary:
facebookincubator#8715 made a modification to the hash tag in the hash value. The hash tag was changed from (32,38] to (38,44]. We should also update the documentation of the hash table to reflect this change.

Pull Request resolved: facebookincubator#9699

Reviewed By: xiaoxmeng

Differential Revision: D57072452

Pulled By: bikramSingh91

fbshipit-source-id: 4048ee5e536ca90d7145efad59f85cc5c33d1d0f
…facebookincubator#9713)

Summary:
When having N columns, PartitionIdGenerator relies on N VectorHashers
 to generate ValueIds, with each VectorHasher’s multiplier correctly set up.

However when having multiple bool columns, it triggers a corner case that
VectorHasher’s multipliers won’t be set at all.

Add multiplierInitialized flag and properly set it to fix this corner case.

Pull Request resolved: facebookincubator#9713

Reviewed By: xiaoxmeng, spershin

Differential Revision: D57062896

Pulled By: kewang1024

fbshipit-source-id: b02002e1e920735606eb830be2a025fde5a8d101
…tor#9663)

Summary:
Pull Request resolved: facebookincubator#9663

- Change Driver::toString() to log more relevant info.
- Change Task::toString() to log more relevant info.
- Change Operator::toString() to log more relevant info.

Reviewed By: xiaoxmeng

Differential Revision: D56734557

fbshipit-source-id: ef21e2c0d589ce257db176ef7942283bba7b8861
Summary:
When SsdFileTracker looks for regions to evict, it calculates average
score of all regions and picks regions with no pins and whose scores no
larger than the average as candidates. It first adds up all the scores
then divided it by the number of regions to calculate the average.
However, if the scores are large, the sum can overflow to a small
integer, leading to a scenario where none of the individual region
scores is less or equal to the average. When this happens, even when
all regions are unpinned, the ssd cache file will not be able to find
any candidates, blocking all cache writes.
Therefore, overflow needs to be handled properly.

Pull Request resolved: facebookincubator#9709

Reviewed By: xiaoxmeng

Differential Revision: D56965266

Pulled By: zacw7

fbshipit-source-id: 414e67975568618a0d6ba48b947c0de73cb8103c
…r#8723)

Summary:
This change introduces decimal signature for min_by/max_by functions.
For 2-arg version of min_by/max_by, the existing signature definition is generic and no additional changes needed.
For 3-arg version of min_by/max_by, an explicit signature for decimal as compare type need to be added. The 3-arg version doesn't support complex types as compare type and this we cannot define a generic signature.

**Segmentation Fault in priority_queue STL:**
For min_by/max_by aggregates with 3 arguments (specifies N for top N results), we need to maintain a priority_queue (PQ).
This PQ is part of the accumulator. We take care of Accumulator to be memory aligned with int128_t type but PQ itself
must be 128-bit aligned or else we hit a segfault. To avoid this, we can use AlignedStlAllocator to allocate PQ elements whenever value and compare types are int128_t. This seems to fix this test failure:
```
280: [ RUN      ] MinMaxByGlobalByAggregationTest.decimalSignatureTest
280: *** Aborted at 1707784514 (Unix time, try 'date -d 1707784514') ***
280: *** Signal 11 (SIGSEGV) (0x0) received by PID 27168 (pthread TID 0x7f75aa623700) (linux TID 85440) (code: 128), stack trace: ***
280: (error retrieving stack trace)
```

Testing:
Added unit tests for decimals.
Also, extended the existing tests to include HUGEINT type.
Some of the min_by/max_by unit tests also test plan with tableScans. But, due to this issue: facebookincubator#7775, HUGEINT type is not tested for tableScans. This PR: facebookincubator#8910 already skips testing tableScans if type is not supported by DWRF.

Pull Request resolved: facebookincubator#8723

Reviewed By: amitkdutta

Differential Revision: D57087211

Pulled By: kagamiori

fbshipit-source-id: 42b3f2af75544f70f6484e9160e0378575eb9687
…r sizes is nullptr (facebookincubator#9725)

Summary:
Pull Request resolved: facebookincubator#9725

`offset_` or `sizes_` can be `nullptr` if it is multi-referenced and cleared during `prepareForReuse`, in that case the old code results in crash.  Reallocate them using `mutableOffsets` or `mutableSizes` in `copyRangesImpl`.

Reviewed By: kevinwilfong

Differential Revision: D57012135

fbshipit-source-id: 85df434f3086ac0362b220245476b88dd54fd83b
…#9425)

Summary:
In Gluten, both presto functions & spark functions are registered. But, there are some conflict issues. For example, when we expect to call a vector function from spark registry, a simple function with same signature from Presto registry can be called unexpectedly. So it's better to let upper framework register its own scalar function set. If some presto functions are re-usable, we need to explicitly register it for spark as this PR proposed.

Pull Request resolved: facebookincubator#9425

Reviewed By: kagamiori

Differential Revision: D57111008

Pulled By: pedroerp

fbshipit-source-id: 8abc99989d03318b3e6925e9738883c8eb3d63d3
Summary:
Return of kernel without __syncthreads() does not imply immediate visibility of writes to next kernel or host accessing unified memory. Add a barrier at kernel ends. Add GPU implementation of scatterBits.

Pull Request resolved: facebookincubator#9745

Reviewed By: Yuhta

Differential Revision: D57109409

Pulled By: oerling

fbshipit-source-id: ea29cd8fc81dacf02f41562ec945a163d217f170
Summary:
Pull Request resolved: facebookincubator#9703

X-link: facebookincubator/nimble#48

Modify interface to accept rowsToSkip.

Readers will soon support it.

Reviewed By: Sullivan-Patrick

Differential Revision: D56953541

fbshipit-source-id: 8280a139347c60e5c918eb3bd817eb8295b2b19f
Summary:
Pull Request resolved: facebookincubator#9706

I'll use it from somewhere else too.

Reviewed By: kevinwilfong, Sullivan-Patrick

Differential Revision: D56962043

fbshipit-source-id: b2dbe8fd09e21a77878941eeed330aec941bd669
Summary:
When TTL control ages out old cache entries and evict out the entire regions from a ssd file, it tries to add those evicted region back to the writable regions and this potentially could cause duplicate writable regions as the evicted region here might not necessarily be non-writable region.

Pull Request resolved: facebookincubator#9746

Reviewed By: xiaoxmeng

Differential Revision: D57114817

Pulled By: zacw7

fbshipit-source-id: 7cc41c186d78fc94b5751368e12e2d38e3c226d7
…9718)

Summary:
We now use alwaysFalse of Filter.h and some method not implemented causing runtime error. So, we'd like to implment some alwaysFalse method to avoid the error.

Pull Request resolved: facebookincubator#9718

Reviewed By: Yuhta

Differential Revision: D57072475

Pulled By: bikramSingh91

fbshipit-source-id: a8986387f044cdf6f21096d06c5fdfc070aba09a
Summary:
Pull Request resolved: facebookincubator#9751

To have visibility into any possibly hanging drivers.

Reviewed By: xiaoxmeng

Differential Revision: D57120114

fbshipit-source-id: 6df2f8c33d028d428ecb1776aa444ece564bbc78
…r#9723)

Summary:
Added memory allocator and cache stats to PeriodicStatsReporter.
Added documentations to monitoring doc.

Pull Request resolved: facebookincubator#9723

Reviewed By: xiaoxmeng, zacw7

Differential Revision: D57085696

Pulled By: tanjialiang

fbshipit-source-id: 7dbc3791d314212e0c7658d0d868dac074f34fcf
…kincubator#9734)

Summary:
Hash join run spill in parallel by using async worker. The spill work might trigger memory allocation
from non-spill memory pool such as lazy io triggered when materializing the column vector to write
out. We do bypass memory arbitration if the memory allocation is for arbitration by thread-local context.
However, async worker doesn't set it when running on background executor. This causes the recursive
arbitration which will deadlock.
This PR fixes the issue by providing createAsyncMemoryReclaimTask utility which helps setup memory
arbitration context and uses it hash join spill. Unit test is added for verification.

Pull Request resolved: facebookincubator#9734

Reviewed By: bikramSingh91, tanjialiang, tonyxug, oerling

Differential Revision: D57092581

Pulled By: xiaoxmeng

fbshipit-source-id: 8c3ab0d838214cf8bb8b657be395c4690c954dd1
…ebookincubator#9533)

Summary:
According to the spec of list backward compatibility [link](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists).
`If the repeated field is a group with multiple fields, then its type is the element type and elements are required.`

Currently Parquet reader assumes list type only has one child, which would fail for this backward compatibility case. This change makes it that when there are multiple fields, creating a new row type instance which has all the fields as its children.

This diff is co-authored with qqibrow .

Pull Request resolved: facebookincubator#9533

Reviewed By: mbasmanova

Differential Revision: D56783371

Pulled By: Yuhta

fbshipit-source-id: 50a03e55e8b49ae6f82ca8ae4101b953b60ba5e8
…ookincubator#9749)

Summary:
Pull Request resolved: facebookincubator#9749

In order to reuse this base class in nimble, we need to move it to
`tests/utils` which will be included in `VELOX_BUILD_MINIMAL_WITH_DWIO`.  Also
factor out `ColumnSelector::fromScanSpec` so that we can reuse this logic in
nimble without depending on `HiveConnectorUtil`.

Reviewed By: pedroerp

Differential Revision: D57114400

fbshipit-source-id: 406413afb924a85d1a6c93c59c8ca5d755f0b725
Summary: Pull Request resolved: facebookincubator#9760

Reviewed By: mbasmanova

Differential Revision: D57162713

fbshipit-source-id: 5a19fdd0cc0917cfde5cd8fe7654bd79aac36523
)

Summary:
Spark support people query row_index metadata column of parquet file. Checking this spark part implement -apache/spark@95aebcb

however, velox doesn't support row_index metadata, below spark query would return null for row index column

```
select a, _tmp_metadata_row_index from table;
```

related issue facebookincubator#9165

The PR introduces a new column handle type `kRowIndex` which can be used to indicate which column is row index column need be generated if you want to add a new column to the results containing the row numbers.  The new column contains row number of type `BIGINT` in the file starting from 0 before any filtering and mutation, and works on all file formats supported.

Pull Request resolved: facebookincubator#9174

Reviewed By: mbasmanova

Differential Revision: D56472291

Pulled By: Yuhta

fbshipit-source-id: 848693a9ccc5ee5e3279f012d6198721b7691d6f
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 13, 2024
Catch exception during BufferedWriter processing to avoid server crash as ~BufferedWriter
expect either the object is aborted or all the data mutations have been flushed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.