Support pruning for boolean columns #490

Closed
alamb opened this issue Jun 3, 2021 · 0 comments · Fixed by #500
Labels: datafusion (Changes in the datafusion crate), enhancement (New feature or request)

Comments

alamb (Contributor) commented Jun 3, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When attempting to prune containers such as Parquet row groups using predicates on boolean columns (e.g. a flag column), the pruning logic has no effect: no containers are pruned.

For example, a query like

select * from my_parquet_based_table where my_flag_column = true

will not prune any row groups based on the my_flag_column predicate, even row groups whose statistics show that my_flag_column is false in every row.
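To make the expected behavior concrete, here is a minimal, self-contained Rust sketch (not DataFusion code) of which row groups a reader would still need to scan for my_flag_column = true, given per-row-group min/max statistics. The BoolStats struct and the row_groups_to_scan function are hypothetical. A row group is skipped only when its max is known to be false; missing statistics (None) are treated as unknown, so those row groups are kept.

```rust
/// Hypothetical per-row-group statistics for a boolean column.
/// `None` means the statistic is missing (unknown).
#[allow(dead_code)] // `min` is not needed for the `= true` predicate
#[derive(Debug, Clone, Copy)]
struct BoolStats {
    min: Option<bool>,
    max: Option<bool>,
}

/// Returns the indices of row groups that may contain a row where
/// `my_flag_column = true`. Only a row group whose max is known to be
/// `false` (every non-null value is `false`) can be skipped.
fn row_groups_to_scan(stats: &[BoolStats]) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max != Some(false))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let stats = [
        BoolStats { min: Some(false), max: Some(false) }, // all false: skip
        BoolStats { min: Some(false), max: Some(true) },  // mixed: scan
        BoolStats { min: Some(true), max: Some(true) },   // all true: scan
        BoolStats { min: None, max: None },               // unknown: scan
    ];
    println!("{:?}", row_groups_to_scan(&stats)); // prints [1, 2, 3]
}
```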

Describe the solution you'd like
I would like pruning to work for predicates on boolean columns, i.e. support for them should be added in https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_optimizer/pruning.rs
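One way to add this support is to rewrite a boolean-column predicate into a predicate over that column's min/max statistics, as is already done for comparisons on other types. The sketch below shows only the rewrite step, using a tiny hypothetical Expr enum rather than DataFusion's expression types. The column_min / column_max naming of the statistics columns is an assumption made for illustration. Containers for which the rewritten predicate evaluates to false can be pruned, while an unknown (NULL) result must keep the container.

```rust
/// A tiny, hypothetical predicate language: just enough to express the
/// boolean-column cases discussed in this issue.
#[derive(Debug)]
enum Expr {
    /// A bare boolean column used as a predicate, e.g. `b1`
    Column(String),
    /// Logical negation, e.g. `NOT b1`
    Not(Box<Expr>),
    /// `column = literal` for a boolean literal
    EqBool(String, bool),
}

/// Rewrites a boolean-column predicate into a predicate over the
/// (assumed) `<column>_min` / `<column>_max` statistics columns.
/// Returns `None` when the expression is not supported, meaning
/// no pruning should be attempted.
fn rewrite_for_stats(expr: &Expr) -> Option<String> {
    match expr {
        // `b` or `b = true`: some row can be true only if max is true.
        Expr::Column(c) | Expr::EqBool(c, true) => Some(format!("{c}_max")),
        // `b = false`: some row can be false only if min is false.
        Expr::EqBool(c, false) => Some(format!("NOT {c}_min")),
        // `NOT b` behaves like `b = false`.
        Expr::Not(inner) => match inner.as_ref() {
            Expr::Column(c) => Some(format!("NOT {c}_min")),
            _ => None,
        },
    }
}

fn main() {
    let b1 = || Expr::Column("b1".to_string());
    let examples = [
        b1(),
        Expr::Not(Box::new(b1())),
        Expr::EqBool("my_flag_column".to_string(), true),
        Expr::EqBool("my_flag_column".to_string(), false),
    ];
    for e in &examples {
        println!("{:?} => {:?}", e, rewrite_for_stats(e));
    }
}
```

Mapping NOT b to the min statistic (rather than max) follows from the fact that a container can only contain a false value if its minimum is false.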

Here is an example test that fails:

diff --git a/datafusion/src/physical_optimizer/pruning.rs b/datafusion/src/physical_optimizer/pruning.rs
index 3a5a64c6f..d2e93b9b5 100644
--- a/datafusion/src/physical_optimizer/pruning.rs
+++ b/datafusion/src/physical_optimizer/pruning.rs
@@ -508,6 +508,16 @@ mod tests {
             }
         }
 
+        fn new_bool(
+            min: impl IntoIterator<Item = Option<bool>>,
+            max: impl IntoIterator<Item = Option<bool>>,
+        ) -> Self {
+            Self {
+                min: Arc::new(min.into_iter().collect::<BooleanArray>()),
+                max: Arc::new(max.into_iter().collect::<BooleanArray>()),
+            }
+        }
+
         fn min(&self) -> Option<ArrayRef> {
             Some(self.min.clone())
         }
@@ -927,8 +937,8 @@ mod tests {
     #[test]
     fn prune_api() {
         let schema = Arc::new(Schema::new(vec![
-            Field::new("s1", DataType::Utf8, false),
-            Field::new("s2", DataType::Int32, false),
+            Field::new("s1", DataType::Utf8, true),
+            Field::new("s2", DataType::Int32, true),
         ]));
 
         // Prune using s2 > 5
@@ -953,4 +963,35 @@ mod tests {
 
         assert_eq!(result, expected);
     }
+
+
+    #[test]
+    fn prune_api_bool() {
+        let schema = Arc::new(Schema::new(vec![
+            Field::new("b1", DataType::Boolean, true),
+        ]));
+
+        let statistics = TestStatistics::new().with(
+            "b1",
+            ContainerStats::new_bool(
+                vec![Some(false), Some(false), Some(true), None, Some(false)], // min
+                vec![Some(false), Some(true), Some(true), None, None], // max
+            ),
+        );
+
+        // For predicate "b1" (boolean expr)
+        // b1 [false, false] ==> no rows should pass
+        // b1 [false, true] ==> some rows could pass
+        // b1 [true, true] ==> some rows could pass
+        // b1 [NULL, NULL]  ==> no rows could pass
+        // b1 [false, NULL]  ==> no rows could pass
+        let expr = col("b1");
+        let expected = vec![false, true, true, false, false];
+
+        let p = PruningPredicate::try_new(&expr, schema).unwrap();
+        let result = p.prune(&statistics).unwrap();
+
+        assert_eq!(result, expected);
+    }
+
 }
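For comparison, the sketch below replays the five containers from prune_api_bool through two hypothetical keep rules, one for the predicate b1 and one for NOT b1, in plain Rust rather than through DataFusion's PruningPredicate. It assumes that a None min or max means the statistic is unknown and keeps such containers, because pruning must never skip a container that could hold matching rows. Under that assumption it keeps the last two containers, whereas the expected vector in the test above prunes them.

```rust
/// Hypothetical container statistics for a boolean column:
/// `None` means the statistic is unknown.
#[derive(Debug, Clone, Copy)]
struct BoolStats {
    min: Option<bool>,
    max: Option<bool>,
}

/// Predicate `b1`: keep unless max is known to be false.
fn keep_for_b1(s: &BoolStats) -> bool {
    s.max != Some(false)
}

/// Predicate `NOT b1`: keep unless min is known to be true.
fn keep_for_not_b1(s: &BoolStats) -> bool {
    s.min != Some(true)
}

fn main() {
    // Same min/max values as the five containers in prune_api_bool.
    let mins = [Some(false), Some(false), Some(true), None, Some(false)];
    let maxs = [Some(false), Some(true), Some(true), None, None];
    let stats: Vec<BoolStats> = mins
        .iter()
        .zip(maxs.iter())
        .map(|(&min, &max)| BoolStats { min, max })
        .collect();

    let b1: Vec<bool> = stats.iter().map(keep_for_b1).collect();
    let not_b1: Vec<bool> = stats.iter().map(keep_for_not_b1).collect();
    println!("b1:     {:?}", b1);
    println!("NOT b1: {:?}", not_b1);
}
```

With these rules, b1 yields [false, true, true, true, true] and NOT b1 yields [true, true, false, true, true].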
alamb added the enhancement and datafusion labels Jun 3, 2021
alamb self-assigned this Jun 3, 2021
alamb closed this as completed in #500 Jun 4, 2021