Closed
Description
Describe the bug
* FLOAT, DOUBLE - Signed comparison with special handling of NaNs and
  signed zeros. The details are documented in the
  [Thrift definition](src/main/thrift/parquet.thrift) in the
  `ColumnOrder` union. They are summarized here but the Thrift definition
  is considered authoritative:
  * NaNs should not be written to min or max statistics fields.
  * If the computed max value is zero (whether negative or positive),
    `+0.0` should be written into the max statistics field.
  * If the computed min value is zero (whether negative or positive),
    `-0.0` should be written into the min statistics field.

  For backwards compatibility when reading files:
  * If the min is a NaN, it should be ignored.
  * If the max is a NaN, it should be ignored.
  * If the min is +0, the row group may contain -0 values as well.
  * If the max is -0, the row group may contain +0 values as well.
  * When looking for NaN values, min and max should be ignored.
The column writer does not follow the rules quoted above, specifically the points about the computed max and min values when they are negative or positive zero.
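To make the writer-side rules concrete, here is a minimal sketch of how a computed min/max pair could be adjusted before being written. The function `normalize_float_min_max` is hypothetical and is not part of the parquet crate. It also assumes the pair has already been computed, and simply declines to produce statistics if either bound is NaN.

```rust
/// Hypothetical helper (not a parquet crate API): adjusts a computed
/// (min, max) pair for an f32 column according to the writer rules above.
/// Returns `None` when either bound is NaN, meaning that no min/max
/// statistics should be written.
fn normalize_float_min_max(min: f32, max: f32) -> Option<(f32, f32)> {
    // NaNs should not be written to min or max statistics fields.
    if min.is_nan() || max.is_nan() {
        return None;
    }
    // A zero min of either sign is written as -0.0.
    // `min == 0.0` is true for both +0.0 and -0.0.
    let min = if min == 0.0 { -0.0_f32 } else { min };
    // A zero max of either sign is written as +0.0.
    let max = if max == 0.0 { 0.0_f32 } else { max };
    Some((min, max))
}
```

With this sketch, a column containing only `0.0` would yield a min with the sign bit set and a max with the sign bit clear, which is what the test below checks for.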
To Reproduce
Add test to parquet/src/column/writer/mod.rs:
    #[test]
    fn test_float_statistics_zero_only() {
        let stats = statistics_roundtrip::<FloatType>(&[0.0]);
        assert!(stats.has_min_max_set());
        assert!(stats.is_min_max_backwards_compatible());
        if let Statistics::Float(stats) = stats {
            assert_eq!(stats.min(), &-0.0);
            assert!(stats.min().is_sign_negative());
            assert_eq!(stats.max(), &0.0);
            assert!(stats.max().is_sign_positive());
        } else {
            panic!("expecting Statistics::Float");
        }
    }

Run:
parquet$ cargo test --lib column::writer::tests::test_float_statistics_zero_only -- --nocapture --exact
Finished test [unoptimized + debuginfo] target(s) in 0.06s
Running unittests src/lib.rs (/media/jeffrey/1tb_860evo_ssd/.cargo_target_cache/debug/deps/parquet-b466af98c7f74484)
running 1 test
thread 'column::writer::tests::test_float_statistics_zero_only' panicked at parquet/src/column/writer/mod.rs:2121:13:
assertion failed: stats.min().is_sign_negative()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test column::writer::tests::test_float_statistics_zero_only ... FAILED
failures:
failures:
column::writer::tests::test_float_statistics_zero_only
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 617 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
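Note that the panic comes from the `is_sign_negative()` check and not from the preceding `assert_eq!`. Under IEEE 754 comparison `-0.0 == 0.0` is true, so an equality assertion alone cannot tell the two zeros apart. A minimal sketch of the distinction, using a hypothetical helper `is_negative_zero` that is not part of the parquet crate:

```rust
/// Hypothetical helper: true only for a zero whose sign bit is set (-0.0).
fn is_negative_zero(value: f32) -> bool {
    // Equality matches both zeros, so the sign bit has to be checked too.
    value == 0.0 && value.is_sign_negative()
}

/// Hypothetical helper: compares two floats by bit pattern, so that
/// +0.0 and -0.0 are treated as different values.
fn same_bits(a: f32, b: f32) -> bool {
    a.to_bits() == b.to_bits()
}
```

Here `is_negative_zero(-0.0)` is true and `is_negative_zero(0.0)` is false, while `same_bits(0.0, -0.0)` is false even though `0.0 == -0.0` holds.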
Expected behavior
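The test should succeed: for a column containing only `0.0`, the min statistic should be written as `-0.0` and the max statistic as `+0.0`.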
Additional context