-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix datafusion-cli
print output
#8895
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -161,23 +161,29 @@ impl PrintFormat { | |
maxrows: MaxRows, | ||
with_header: bool, | ||
) -> Result<()> { | ||
if batches.is_empty() || batches[0].num_rows() == 0 { | ||
// filter out any empty batches | ||
let batches: Vec<_> = batches | ||
.iter() | ||
.filter(|b| b.num_rows() > 0) | ||
.cloned() | ||
.collect(); | ||
if batches.is_empty() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What will happen for queries that return empty results? e.g. select * from table where 1=0 Will this print nothing at all, not even the table outline? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also like to mention that a similar behavior exists for streaming results. If the first batch of the stream is empty, the header will not be printed, even if it was set to true (this issue existed prior to this PR). Perhaps we should print an empty record batch for the header before these checks if the flag is set. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Right, it will NOT print the outline / header. Also see #8895 (comment) , i.e. I also find this rather confusing. However this PR here just fixes the status quo to not swallow the output for aggregates. I personally think we should change the behavior in a follow-up to at least print the table outline / header even for empty results. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah ok, I didn't realize the "print nothing if no data" was a previous issue before that streaming PR Then yeah I agree can fix that separately if it wasn't technically a regression There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Filed #8904 to track (and I confirmed the "print nothing if no data" behavior was present in versions 33 and 25). |
||
return Ok(()); | ||
} | ||
|
||
match self { | ||
Self::Csv | Self::Automatic => { | ||
print_batches_with_sep(writer, batches, b',', with_header) | ||
print_batches_with_sep(writer, &batches, b',', with_header) | ||
} | ||
Self::Tsv => print_batches_with_sep(writer, batches, b'\t', with_header), | ||
Self::Tsv => print_batches_with_sep(writer, &batches, b'\t', with_header), | ||
Self::Table => { | ||
if maxrows == MaxRows::Limited(0) { | ||
return Ok(()); | ||
} | ||
format_batches_with_maxrows(writer, batches, maxrows) | ||
format_batches_with_maxrows(writer, &batches, maxrows) | ||
} | ||
Self::Json => batches_to_json!(ArrayWriter, writer, batches), | ||
Self::NdJson => batches_to_json!(LineDelimitedWriter, writer, batches), | ||
Self::Json => batches_to_json!(ArrayWriter, writer, &batches), | ||
Self::NdJson => batches_to_json!(LineDelimitedWriter, writer, &batches), | ||
} | ||
} | ||
} | ||
|
@@ -189,7 +195,7 @@ mod tests { | |
|
||
use super::*; | ||
|
||
use arrow::array::Int32Array; | ||
use arrow::array::{ArrayRef, Int32Array}; | ||
use arrow::datatypes::{DataType, Field, Schema}; | ||
use datafusion::error::Result; | ||
|
||
|
@@ -351,4 +357,111 @@ mod tests { | |
|
||
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn test_print_batches_empty_batches() -> Result<()> { | ||
let batch = one_column_batch(); | ||
let empty_batch = RecordBatch::new_empty(batch.schema()); | ||
|
||
#[rustfmt::skip] | ||
let expected =&[ | ||
"+---+", | ||
"| a |", | ||
"+---+", | ||
"| 1 |", | ||
"| 2 |", | ||
"| 3 |", | ||
"+---+\n", | ||
]; | ||
|
||
PrintBatchesTest::new() | ||
.with_format(PrintFormat::Table) | ||
.with_batches(vec![empty_batch.clone(), batch, empty_batch]) | ||
.with_expected(expected) | ||
.run(); | ||
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn test_print_batches_empty_batches_no_header() -> Result<()> { | ||
let empty_batch = RecordBatch::new_empty(one_column_batch().schema()); | ||
|
||
// empty batches should not print a header | ||
let expected = &[""]; | ||
|
||
PrintBatchesTest::new() | ||
.with_format(PrintFormat::Table) | ||
.with_batches(vec![empty_batch]) | ||
.with_header(true) | ||
.with_expected(expected) | ||
.run(); | ||
Ok(()) | ||
} | ||
|
||
struct PrintBatchesTest { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I plan to rework the rest of the tests in this module as a follow on PR using this struct There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
format: PrintFormat, | ||
batches: Vec<RecordBatch>, | ||
maxrows: MaxRows, | ||
with_header: bool, | ||
expected: Vec<&'static str>, | ||
} | ||
|
||
impl PrintBatchesTest { | ||
fn new() -> Self { | ||
Self { | ||
format: PrintFormat::Table, | ||
batches: vec![], | ||
maxrows: MaxRows::Unlimited, | ||
with_header: false, | ||
expected: vec![], | ||
} | ||
} | ||
|
||
/// set the format | ||
fn with_format(mut self, format: PrintFormat) -> Self { | ||
self.format = format; | ||
self | ||
} | ||
|
||
/// set the batches to convert | ||
fn with_batches(mut self, batches: Vec<RecordBatch>) -> Self { | ||
self.batches = batches; | ||
self | ||
} | ||
|
||
/// set whether to include a header | ||
fn with_header(mut self, with_header: bool) -> Self { | ||
self.with_header = with_header; | ||
self | ||
} | ||
|
||
/// set expected output | ||
fn with_expected(mut self, expected: &[&'static str]) -> Self { | ||
self.expected = expected.to_vec(); | ||
self | ||
} | ||
|
||
/// run the test | ||
fn run(self) { | ||
let mut buffer: Vec<u8> = vec![]; | ||
self.format | ||
.print_batches(&mut buffer, &self.batches, self.maxrows, self.with_header) | ||
.unwrap(); | ||
let actual = String::from_utf8(buffer).unwrap(); | ||
let expected = self.expected.join("\n"); | ||
assert_eq!( | ||
actual, expected, | ||
"actual:\n\n{actual}expected:\n\n{expected}" | ||
); | ||
} | ||
} | ||
|
||
/// return a batch with one column and three rows | ||
fn one_column_batch() -> RecordBatch { | ||
RecordBatch::try_from_iter(vec![( | ||
"a", | ||
Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef, | ||
)]) | ||
.unwrap() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the fix (thanks @Jefffrey for finding this). The rest of the PR is tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep the intended behavior, I think this should probably be:
However, I think the intention is wrong. We should totally print the header even if there's no data. This entire "don't print anything if the result is empty" is super confusing, at least in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good call -- I will update and add a test
I believe the logic is to support generating output incrementally ("Streaming") rather than waiting for the entire output to be generated as
Vec<RecordBatch>
.However, that logic is sort of split between this module and
print_format.rs
which I agree is confusingcc @berkaysynnada
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 2ef1f9d