Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix datafusion-cli print output #8895

Merged
merged 3 commits into from
Jan 18, 2024
Merged

Fix datafusion-cli print output #8895

merged 3 commits into from
Jan 18, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 17, 2024

Which issue does this PR close?

Resolves #8702

Rationale for this change

This is a regression introduced in #8651 where if the first record batch happens to be empty data in non empty batches will not be printed.

What changes are included in this PR?

  1. Fix bug
  2. Add test

Are these changes tested?

Yes, new tests

Are there any user-facing changes?

Fix bug

@alamb alamb changed the title Fix datafusion-cli print output Fix datafusion-cli print output Jan 17, 2024
@@ -161,7 +161,7 @@ impl PrintFormat {
maxrows: MaxRows,
with_header: bool,
) -> Result<()> {
if batches.is_empty() || batches[0].num_rows() == 0 {
if batches.is_empty() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix (thanks @Jefffrey for finding this). The rest of the PR is tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the intended behavior, I think this should probably be:

Suggested change
if batches.is_empty() {
if batches.is_empty() || batches.iter().all(|b| b.num_rows() == 0) {

However, I think the intention is wrong. We should totally print the header even if there's no data. This entire "don't print anything if the result is empty" is super confusing, at least in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I think the intention is wrong. We should totally print the header even if there's no data. This entire "don't print anything if the result is empty" is super confusing, at least in my opinion.

This is a good call -- I will update and add a test

I believe the logic is to support generating output incrementally ("Streaming") rather than waiting for the entire output to be generated as Vec<RecordBatch>.

However, that logic is sort of split between this module and print_format.rs which I agree is confusing

cc @berkaysynnada

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 2ef1f9d

Ok(())
}

struct PrintBatchesTest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to rework the rest of the tests in this module as a follow on PR using this struct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb
Copy link
Contributor Author

alamb commented Jan 17, 2024

I plan to merge this tomorrow morning US Eastern time unless anyone else would like more time to review

.filter(|b| b.num_rows() > 0)
.cloned()
.collect();
if batches.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen for queries that return empty results?

e.g.

select * from table where 1=0

Will this print nothing at all, not even the table outline?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to mention that a similar behavior exists for streaming results. If the first batch of the stream is empty, the header will not be printed, even if it was set to true (this issue existed prior to this PR). Perhaps we should print an empty record batch for the header before these checks if the flag is set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this print nothing at all, not even the table outline?

Right, it will NOT print the outline / header. Also see #8895 (comment) , i.e. I also find this rather confusing. However this PR here just fixes the status quo to not swallow the output for aggregates. I personally think we should change the behavior in a follow-up to at least print the table outline / header even for empty results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I didn't realize the "print nothing if no data" was a previous issue before that streaming PR

Then yeah I agree can fix that separately if it wasn't technically a regression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #8904 to track (and I confirmed the "print nothing if no data" behavior was present in versions 33 and 25).

@alamb alamb merged commit bdf5e9d into apache:main Jan 18, 2024
23 checks passed
@alamb
Copy link
Contributor Author

alamb commented Jan 18, 2024

Thanks everyone for the reviews

@alamb alamb deleted the alamb/cli-fix branch January 18, 2024 11:34
@alamb
Copy link
Contributor Author

alamb commented Jan 18, 2024

I added better test coverage in #8896 as promised

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

datafusion-cli: results of group by aggregation query not showing
4 participants