-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix datafusion-cli
print output
#8895
Conversation
datafusion-cli
print output
@@ -161,7 +161,7 @@ impl PrintFormat { | |||
maxrows: MaxRows, | |||
with_header: bool, | |||
) -> Result<()> { | |||
if batches.is_empty() || batches[0].num_rows() == 0 { | |||
if batches.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the fix (thanks @Jefffrey for finding this). The rest of the PR is tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep the intended behavior, I think this should probably be:
if batches.is_empty() { | |
if batches.is_empty() || batches.iter().all(|b| b.num_rows() == 0) { |
However, I think the intention is wrong. We should totally print the header even if there's no data. This entire "don't print anything if the result is empty" is super confusing, at least in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I think the intention is wrong. We should totally print the header even if there's no data. This entire "don't print anything if the result is empty" is super confusing, at least in my opinion.
This is a good call -- I will update and add a test
I believe the logic is to support generating output incrementally ("Streaming") rather than waiting for the entire output to be generated as Vec<RecordBatch>
.
However, that logic is sort of split between this module and print_format.rs
which I agree is confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 2ef1f9d
Ok(()) | ||
} | ||
|
||
struct PrintBatchesTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to rework the rest of the tests in this module as a follow on PR using this struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to merge this tomorrow morning US Eastern time unless anyone else would like more time to review |
.filter(|b| b.num_rows() > 0) | ||
.cloned() | ||
.collect(); | ||
if batches.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen for queries that return empty results?
e.g.
select * from table where 1=0
Will this print nothing at all, not even the table outline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also like to mention that a similar behavior exists for streaming results. If the first batch of the stream is empty, the header will not be printed, even if it was set to true (this issue existed prior to this PR). Perhaps we should print an empty record batch for the header before these checks if the flag is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this print nothing at all, not even the table outline?
Right, it will NOT print the outline / header. Also see #8895 (comment) , i.e. I also find this rather confusing. However this PR here just fixes the status quo to not swallow the output for aggregates. I personally think we should change the behavior in a follow-up to at least print the table outline / header even for empty results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, I didn't realize the "print nothing if no data" was a previous issue before that streaming PR
Then yeah I agree can fix that separately if it wasn't technically a regression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #8904 to track (and I confirmed the "print nothing if no data" behavior was present in versions 33 and 25).
Thanks everyone for the reviews |
I added better test coverage in #8896 as promised |
Which issue does this PR close?
Resolves #8702
Rationale for this change
This is a regression introduced in #8651 where if the first record batch happens to be empty data in non empty batches will not be printed.
What changes are included in this PR?
Are these changes tested?
Yes, new tests
Are there any user-facing changes?
Fix bug