Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Apr 12, 2022

Which issue does this PR close?

Closes #1544
Closes #1239

Rationale for this change

We have a full validation mode that validates the full contents of creating Arrays but it is (on purpose) not called in many places in arrow for performance reasons: https://docs.rs/arrow/11.1.0/arrow/array/struct.ArrayData.html#method.validate_full

This leads to two possible issues:

  1. Lack of test coverage of arrays constructed within arrow
  2. Possible Lack of test coverage of the validation routine itself

What changes are included in this PR?

  1. Add a force_validate feature flag that forced the validation check for all array creations (defaults to off)
  2. Add a new CI check that runs with this flag on (see example failure)

Found / filed new tickets or existing tickets:

Are there any user-facing changes?

New optional feature force_validate


// Provide a force_validate mode
#[cfg(feature = "force_validate")]
new_self.validate_full().unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the validation call

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb marked this pull request as ready for review April 12, 2022 14:43
@alamb alamb requested a review from tustvold April 12, 2022 14:43
@codecov-commenter
Copy link

Codecov Report

Merging #1546 (19dc05a) into master (68038f5) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

❗ Current head 19dc05a differs from pull request most recent head 58e3c28. Consider uploading reports for the commit 58e3c28 to get more accurate results

@@            Coverage Diff             @@
##           master    #1546      +/-   ##
==========================================
- Coverage   82.82%   82.82%   -0.01%     
==========================================
  Files         190      190              
  Lines       54941    54943       +2     
==========================================
- Hits        45507    45506       -1     
- Misses       9434     9437       +3     
Impacted Files Coverage Δ
arrow/src/array/array_binary.rs 92.93% <ø> (ø)
arrow/src/array/array_boolean.rs 93.18% <ø> (ø)
arrow/src/array/array_list.rs 95.52% <ø> (ø)
arrow/src/array/array_primitive.rs 94.85% <ø> (ø)
arrow/src/array/data.rs 82.98% <0.00%> (-0.15%) ⬇️
arrow/src/datatypes/datatype.rs 66.40% <0.00%> (-0.40%) ⬇️
arrow/src/datatypes/field.rs 53.79% <0.00%> (-0.31%) ⬇️
parquet/src/encodings/encoding.rs 93.56% <0.00%> (+0.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68038f5...58e3c28. Read the comment docs.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this also closes #1239

Edit: Once we bash out the failing tests, we could look to enable this in the DataFusion CI as well for even more coverage 😄


// Provide a force_validate mode
#[cfg(feature = "force_validate")]
new_self.validate_full().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

}

#[test]
// Fails when validation enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least we know there is a problem. So it is progress 🤷 we'll get it fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to fix it at #1567

@alamb
Copy link
Contributor Author

alamb commented Apr 14, 2022

I will try and polish this up shortly

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better now 😄

@alamb alamb merged commit 3fed612 into apache:master Apr 14, 2022
@alamb alamb deleted the alamb/test_validation branch April 14, 2022 17:35
@alamb alamb added the development-process Related to development process of arrow-rs label Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate development-process Related to development process of arrow-rs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add CI check for full validation mode Add test_validate feature flag + CI check

4 participants