Extends parquet fuzz tests to also test nulls, dictionaries and row groups with multiple pages (#1053) #1110
    /// Total number of batches to attempt to read.
    /// `record_batch_size` * `num_iterations` should be greater
    /// than `num_rows` to ensure the data can be read back completely
    num_iterations: usize,
This didn't seem to serve a purpose, as it was always set in such a way as to read all the data, so I removed it
I agree that it is redundant when record_batch_size is provided (which means the data is not all read in one big chunk, but is read in record_batch_size chunks)
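To illustrate why a separate iteration count is redundant once the batch size is fixed, here is a minimal std-only sketch. `read_in_batches` and `expected_batches` are hypothetical helpers written for this illustration, not functions from the parquet crate; a plain slice stands in for the reader.

```rust
/// Hypothetical stand-in for a reader: returns the data in batches of at
/// most `record_batch_size` rows.
fn read_in_batches(data: &[i32], record_batch_size: usize) -> Vec<Vec<i32>> {
    assert!(record_batch_size > 0, "batch size must be non-zero");
    // `chunks` stops on its own once the data is exhausted, so no
    // explicit `num_iterations` is needed to read everything back.
    data.chunks(record_batch_size).map(|c| c.to_vec()).collect()
}

/// Number of batches implied by the row count and batch size:
/// ceil(num_rows / record_batch_size).
fn expected_batches(num_rows: usize, record_batch_size: usize) -> usize {
    (num_rows + record_batch_size - 1) / record_batch_size
}
```

Calling `read_in_batches` on 10 rows with a batch size of 4 yields batches of 4, 4 and 2 rows. The batch count follows from `num_rows` and `record_batch_size` alone, which is the sense in which `num_iterations` carried no extra information.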
Codecov Report

 @@            Coverage Diff             @@
##           master    #1110      +/-   ##
==========================================
+ Coverage   82.55%   82.56%   +0.01%     
==========================================
  Files         169      169              
  Lines       50456    50535      +79     
==========================================
+ Hits        41655    41726      +71     
- Misses       8801     8809       +8     
Thanks to @yordan-pavlov's work on #1130, this now passes on master 🎉
LGTM
Nice work @tustvold
Which issue does this PR close?
Closes #1053.
Rationale for this change
See ticket
What changes are included in this PR?
This extends the parquet fuzz tests to also test nulls, dictionaries and row groups with multiple pages.
Currently this runs into what appears to be a bug in the null handling for ArrowArrayReader. This is likely the same as in apache/datafusion#1441 - I have temporarily switched back to ComplexObjectArrayReader to get the test to pass, and will look into a fix prior to marking this ready for review.
This has been fixed by #1130
Are there any user-facing changes?
No, this only adds tests
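
For readers unfamiliar with this style of test, the following is a rough std-only sketch of the general shape: generate nullable data under a set of options, push it through dictionary-encoded pages, read it back in batches, and compare with the input. Every name here (`FuzzOptions`, `XorShift`, `DictPage`, `generate`, `round_trip`) is hypothetical and invented for this illustration; the actual tests in this PR use the parquet and arrow crates, not this toy encoding.

```rust
/// Hypothetical options, loosely mirroring the knobs described above.
#[derive(Debug, Clone)]
struct FuzzOptions {
    num_rows: usize,
    record_batch_size: usize,
    /// Percentage of values that should be null, if any.
    null_percent: Option<u64>,
    /// Rows per page; small values force multiple pages.
    rows_per_page: usize,
}

/// Tiny xorshift PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Generates `num_rows` values, nulling roughly `null_percent` of them.
fn generate(opts: &FuzzOptions, seed: u64) -> Vec<Option<i32>> {
    let mut rng = XorShift(seed.max(1)); // xorshift must not start at 0
    (0..opts.num_rows)
        .map(|_| {
            let is_null = opts.null_percent.map_or(false, |p| rng.next_u64() % 100 < p);
            if is_null { None } else { Some((rng.next_u64() % 1000) as i32) }
        })
        .collect()
}

/// A toy page: distinct values plus one index per row (`None` marks a null).
struct DictPage {
    dict: Vec<i32>,
    indices: Vec<Option<usize>>,
}

fn encode_page(rows: &[Option<i32>]) -> DictPage {
    let mut dict: Vec<i32> = Vec::new();
    let indices: Vec<Option<usize>> = rows
        .iter()
        .map(|v| {
            v.map(|x| {
                let pos = dict.iter().position(|d| *d == x);
                match pos {
                    Some(i) => i,
                    None => {
                        dict.push(x);
                        dict.len() - 1
                    }
                }
            })
        })
        .collect();
    DictPage { dict, indices }
}

fn decode_page(page: &DictPage) -> Vec<Option<i32>> {
    page.indices.iter().map(|i| i.map(|i| page.dict[i])).collect()
}

/// Writes the values as multiple pages, then reads them back in batches.
fn round_trip(values: &[Option<i32>], opts: &FuzzOptions) -> Vec<Vec<Option<i32>>> {
    let pages: Vec<DictPage> = values.chunks(opts.rows_per_page).map(encode_page).collect();
    let decoded: Vec<Option<i32>> = pages.iter().flat_map(decode_page).collect();
    decoded.chunks(opts.record_batch_size).map(|c| c.to_vec()).collect()
}
```

A test would run `generate` and `round_trip` over several option combinations (for example, with and without nulls, and with `rows_per_page` smaller than `num_rows`) and assert that the concatenated batches equal the generated input.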