-
Notifications
You must be signed in to change notification settings - Fork 986
DRILL-8526: Hive Predicate Push Down for ORC and Parquet #2995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cgivre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR! There are a few minor comments, but the biggest thing that we'll need to merge this are some unit tests. Can you please add some to confirm that this is working as you intend?
...rage-hive/core/src/main/java/org/apache/drill/exec/store/hive/readers/filter/HiveFilter.java
Show resolved
Hide resolved
...rage-hive/core/src/main/java/org/apache/drill/exec/store/hive/readers/filter/HiveFilter.java
Show resolved
Hide resolved
| RuntimeException> { | ||
| private static final Logger logger = LoggerFactory.getLogger(HiveFilterBuilder.class); | ||
|
|
||
| final private HiveScan groupScan; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please be consistent with private final vs final private
|
Wow! This is a serious performance improvement. Nicely done! One other thing... I've found that the limit was also not pushed down in a lot of storage plugins. It is usually fairly easy to add. Would you consider checking to see if the limit pushdown is there and if not, add it as well? |
@cgivre I did some test for this feature, including data type testing, logical expression testing, and some special expressions.as following:
i will add some unit tests for it |
|
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/drill/exec/store/hive/readers/filter/HiveCompareFunctionsProcessor.java
Outdated
Show resolved
Hide resolved
| import org.apache.hive.com.esotericsoftware.kryo.io.Output; | ||
|
|
||
| public class HiveFilter { | ||
| private SearchArgument searchArgument; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make final.
...ve/core/src/main/java/org/apache/drill/exec/store/hive/readers/filter/HiveFilterBuilder.java
Outdated
Show resolved
Hide resolved
019ce0d to
0d20e3d
Compare
cgivre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I submitted some minor changes. Just a reminder but we really need unit tests in order to merge this.
Also, have you considered adding a limit pushdown? It is usually pretty easy to do and only involves:
- Implementing two methods in the group scan (
HiveScan) which are:supportsLimitPushdownandapplyLimit. - Passing the limit through the subscans.
- Adding some logic in the readers to stop when the limit is reached.
Maybe it would be best to open a new JIRA for that, but IMHO, it is one of the easiest and most effective pushdowns that can be implemented yet Drill didn't seem to do for all the plugins.
...main/java/org/apache/drill/exec/store/hive/readers/filter/HiveCompareFunctionsProcessor.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/drill/exec/store/hive/readers/filter/HiveCompareFunctionsProcessor.java
Outdated
Show resolved
Hide resolved
| } | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Please add a blank line at the end of all classes. Here and elsewhere.
...rage-hive/core/src/main/java/org/apache/drill/exec/store/hive/readers/filter/HiveFilter.java
Show resolved
Hide resolved
| private SearchArgument.Builder builder; | ||
| private HashMap<String, SqlTypeName> dataTypeMap; | ||
|
|
||
| HiveFilterBuilder(LogicalExpression le, HashMap<String, SqlTypeName> dataTypeMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be public?
|
|
||
| private boolean allExpressionsConverted = false; | ||
| private SearchArgument.Builder builder; | ||
| private HashMap<String, SqlTypeName> dataTypeMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make any variables final which can be made final.
| logger.warn("Unsupported logical operator:{} for push down", functionName); | ||
| return builder; | ||
| } | ||
| for (int i = 0; i < args.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace with enhanced for-loop?
| private PredicateLeaf.Type convertLeafType(SqlTypeName sqlTypeName) { | ||
| PredicateLeaf.Type type = null; | ||
| switch (sqlTypeName) { | ||
| case BOOLEAN: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Please indent switch statement.
@cgivre ok, I will add some unit tests and supoort limit push down later |
|
@shfshihuafeng Just to be clear, we can't merge this without unit tests. You don't have to do the limit pushdown if you don't want to or aren't able to, but I just wanted to recommend it as it can be a big performance benefit for not much work. Alternatively, you can create a separate pull request for that and do it later. It is your choice.. |
@cgivre I think it is best to open a new JIRA,so you can refer to following link for hive limit push down,and i commit pr for this |
f1c632f to
f3ef7a7
Compare
|
@cgivre I added unit tests which were not run on JDK 11+. Therefore, I submitted another PR in version drill 18 to run these unit tests on jdk 8. |
ee1ea37 to
692a81b
Compare
cgivre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1 Unless there are any objections, I think we should merge this. It doesn't break anything and apparently the Hive unit tests were not running (and already broken).
@shfshihuafeng It would be nice if you could fix them and assist with updating the Hive client to a newer version, but since this is adding value for you, I'd like to get it merged.
DRILL-8526: Hive Predicate Push Down for ORC and Parquet
Description
Drill do not support filter push down for orc format.When a large amount of data is filtered out, Predicate PushDown can significantly improve the query performance of ORC format.
I have implemented this featur and supported the data types are as follows:
int
bigint,
INYINT,
SMALLINT,
double,
float,
BOOLEAN
date,
TIMESTAMP,
decimal
string,
varchar
char
Testing
Through comparative testing , ORC format with filter pushdown achieves nearly a 20x performance improvement over execution without pushdown on following sql.
1.Import TPCH data. In my test, the amount of Table lineitem was 6001215.
2.set "store.hive.optimize_scan_filter_pushdown =true" by the option on drill web
3. execute sql select * from hive.lineitem_o where L_ORDERKEY=1;
(1)store.hive.optimize_scan_filter_pushdown =false

(2)store.hive.optimize_scan_filter_pushdown =true