Support for pushdown like filter (endsWith and contains) #9683

yabola · 2024-02-08T05:05:35Z

This PR supports pushing down endsWith (like '%x') and contains (like '%x%') filters to Iceberg. The benefits are:

Partition columns in partitioned tables can be filtered early. Before this PR, Iceberg needs to scan the entire table.
Aggregates can be pushed down in certain scenarios. Before this PR, endsWith and contains could not be pushed down, so the aggregate could not be pushed down either.
For regular columns, files can be filtered using Parquet dictionaries.

Before this PR, Iceberg only supports startsWith.
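As a rough illustration of the mapping described above, the sketch below classifies a simple LIKE pattern as startsWith, endsWith, or contains. `LikePatternClassifier` and its methods are hypothetical names made up for this example; they are not Iceberg or Spark APIs, and patterns with inner wildcards or escapes are simply reported as unsupported here.

```java
// Hypothetical helper for illustration only; not part of the Iceberg or Spark APIs.
final class LikePatternClassifier {

  enum Kind { EQUALS, STARTS_WITH, ENDS_WITH, CONTAINS, UNSUPPORTED }

  // Returns the pattern with at most one leading and one trailing '%' removed.
  static String literal(String pattern) {
    int start = pattern.startsWith("%") ? 1 : 0;
    boolean trailing = pattern.length() > 1 && pattern.endsWith("%");
    int end = trailing ? pattern.length() - 1 : pattern.length();
    return pattern.substring(start, end);
  }

  static Kind classify(String pattern) {
    String literal = literal(pattern);
    // Empty literals, inner wildcards, and escape characters are left to the
    // engine's full LIKE evaluation in this sketch.
    if (literal.isEmpty()
        || literal.indexOf('%') >= 0
        || literal.indexOf('_') >= 0
        || literal.indexOf('\\') >= 0) {
      return Kind.UNSUPPORTED;
    }
    boolean leading = pattern.startsWith("%");
    boolean trailing = pattern.endsWith("%");
    if (leading && trailing) {
      return Kind.CONTAINS;     // like '%x%'
    } else if (leading) {
      return Kind.ENDS_WITH;    // like '%x'
    } else if (trailing) {
      return Kind.STARTS_WITH;  // like 'x%'
    }
    return Kind.EQUALS;         // no wildcard at all
  }

  // Evaluates a supported pattern against a value using plain String methods.
  static boolean matches(String pattern, String value) {
    String literal = literal(pattern);
    switch (classify(pattern)) {
      case EQUALS:
        return value.equals(literal);
      case STARTS_WITH:
        return value.startsWith(literal);
      case ENDS_WITH:
        return value.endsWith(literal);
      case CONTAINS:
        return value.contains(literal);
      default:
        throw new IllegalArgumentException("Pattern needs full LIKE evaluation: " + pattern);
    }
  }

  public static void main(String[] args) {
    System.out.println(classify("%01"));   // suffix pattern
    System.out.println(classify("%01%"));  // substring pattern
    System.out.println(classify("01%"));   // prefix pattern
  }
}
```

Only the prefix case was pushable before this PR; the suffix and substring cases are the ones the PR adds.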

yabola · 2024-02-08T05:47:59Z

Here is an example performance comparison.
The table p_lineorder_ice is partitioned by the column LO_ORDERDATE.

pushdown partition column
test sql: select * from ice.ssb10.p_lineorder_ice where LO_ORDERDATE like '%01'
before this pr:

after this pr:

pushdown agg column
test sql: select count(1) from ice.ssb10.p_lineorder_ice where LO_ORDERDATE like '%01';
before this PR:

after this PR:
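To make the two test cases above more concrete, here is a simplified model of why a pushed-down suffix filter helps. It assumes identity partitioning (each file holds exactly one LO_ORDERDATE value), no delete files, and per-file record counts available in metadata. `FileInfo`, `prune`, and `countFromMetadata` are hypothetical names, and the partition values and counts are invented sample data, not results from the benchmark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model for illustration; not Iceberg's planning code.
final class PartitionSuffixPruning {

  // Hypothetical stand-in for per-file metadata (not Iceberg's DataFile interface).
  static final class FileInfo {
    final String partitionValue;
    final long recordCount;

    FileInfo(String partitionValue, long recordCount) {
      this.partitionValue = partitionValue;
      this.recordCount = recordCount;
    }
  }

  // Keeps only files whose partition value ends with the suffix, so the
  // remaining files never have to be opened.
  static List<FileInfo> prune(List<FileInfo> files, String suffix) {
    List<FileInfo> kept = new ArrayList<>();
    for (FileInfo file : files) {
      if (file.partitionValue.endsWith(suffix)) {
        kept.add(file);
      }
    }
    return kept;
  }

  // Answers count(1) from metadata alone: because the filter is decided entirely
  // by the partition value, every row in a kept file matches.
  static long countFromMetadata(List<FileInfo> files, String suffix) {
    long total = 0;
    for (FileInfo file : prune(files, suffix)) {
      total += file.recordCount;
    }
    return total;
  }

  // Invented sample data.
  static List<FileInfo> sampleFiles() {
    return Arrays.asList(
        new FileInfo("19920101", 100L),
        new FileInfo("19920102", 200L),
        new FileInfo("19920201", 50L));
  }

  public static void main(String[] args) {
    List<FileInfo> files = sampleFiles();
    System.out.println("files kept: " + prune(files, "01").size());
    System.out.println("count(1): " + countFromMetadata(files, "01"));
  }
}
```

Without the pushdown, the engine in this model would have to read all three files and apply the LIKE filter row by row.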

amogh-jahagirdar

@yabola This is very cool, could we break up the PR though for easier review? Let me take a deeper look at the code before I propose a way to break it up into separate PRs. On the surface though it seems like we should be able to break apart the expression changes in API/Core and then have separate PRs for the file formats and Spark integration?

amogh-jahagirdar · 2024-02-08T16:16:00Z

Also, I'll need to think more if we can actually support this for delete files. If not, this will need to only be applied for CoW tables. For example, for agg pushdown, we don't support MoR tables (well unless it's compacted)

yabola · 2024-02-09T03:37:55Z

> @yabola This is very cool, could we break up the PR though for easier review? Let me take a deeper look at the code before I propose a way to break it up into separate PRs. On the surface though it seems like we should be able to break apart the expression changes in API/Core and then have separate PRs for the file formats and Spark integration?

I will break up my PR. Thanks for your advice.

yabola · 2024-02-11T13:54:38Z

> Also, I'll need to think more if we can actually support this for delete files. If not, this will need to only be applied for CoW tables. For example, for agg pushdown, we don't support MoR tables (well unless it's compacted)

@amogh-jahagirdar I checked the code. When there are row-level deletes, the aggregate is not pushed down, and I have tested this.

iceberg/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java

Lines 244 to 247 in 5f577f1

    
    if (!task.deletes().isEmpty()) {
      LOG.info("Skipping aggregate pushdown: detected row level deletes");
      return false;
    }
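The guard quoted above can be modeled in isolation roughly as follows. This is a self-contained sketch of the same idea, not the actual `SparkScanBuilder` code: `TaskInfo` and `canPushDownAggregate` are hypothetical names, and the delete-file count stands in for `task.deletes()`.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model of the row-level-delete guard; not the Iceberg implementation.
final class AggregatePushdownGuard {

  // Hypothetical stand-in for a scan task and the delete files attached to it.
  static final class TaskInfo {
    final String dataFilePath;
    final int deleteFileCount;

    TaskInfo(String dataFilePath, int deleteFileCount) {
      this.dataFilePath = dataFilePath;
      this.deleteFileCount = deleteFileCount;
    }
  }

  // Metadata-based aggregates would count deleted rows, so pushdown is
  // refused as soon as any task carries delete files.
  static boolean canPushDownAggregate(List<TaskInfo> tasks) {
    for (TaskInfo task : tasks) {
      if (task.deleteFileCount > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<TaskInfo> clean = Arrays.asList(new TaskInfo("a.parquet", 0));
    List<TaskInfo> withDeletes =
        Arrays.asList(new TaskInfo("a.parquet", 0), new TaskInfo("b.parquet", 1));
    System.out.println(canPushDownAggregate(clean));
    System.out.println(canPushDownAggregate(withDeletes));
  }
}
```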

github-actions · 2024-10-14T00:16:14Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions · 2024-10-22T00:15:14Z

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

yabola · 2025-02-10T12:51:10Z

@rdblue @amogh-jahagirdar Hi~ could I reopen this?

Support pushdown endswith and contains filter

de82bdb

github-actions bot added API spark parquet core ORC labels Feb 8, 2024

yabola changed the title ~~Support pushdown like filter (endsWith and contains)~~ Support for pushdown like filter (endsWith and contains) Feb 8, 2024

fix ut

45a638e

amogh-jahagirdar assigned amogh-jahagirdar and unassigned amogh-jahagirdar Feb 8, 2024

amogh-jahagirdar self-requested a review February 8, 2024 15:50

amogh-jahagirdar reviewed Feb 8, 2024


yabola mentioned this pull request Feb 11, 2024

Add filter pushdown API for contains and endsWith #9710

Closed

github-actions bot added the stale label Oct 14, 2024

github-actions bot closed this Oct 22, 2024
