Can get wrong results when querying Delta tables #1121

@Kimahriman

Describe the bug

Delta scans work by using a subclass of ParquetFileFormat within a normal Hadoop relation. Comet sees the ParquetFileFormat and simply replaces it with a CometParquetFileFormat, losing all the Delta-specific behavior in Delta's own subclass, such as deletion vector support and column mapping. Queries against Delta tables can then return wrong results.

Steps to reproduce

I don't have exact steps right now. I noticed this while testing things out, and I had more or less expected it to be a problem.

Expected behavior

A Delta scan should not be eligible for conversion to a CometScan. The check for ParquetFileFormat probably needs to be an exact class comparison that excludes subclasses.
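To illustrate the difference between the two kinds of check, here is a minimal sketch. It assumes the current eligibility test is a subtype match, which I have not confirmed against the Comet source. The classes below are empty stand-ins rather than the real Spark, Delta, or Comet classes, and `DeltaLikeFileFormat`, `ScanEligibility`, and both method names are hypothetical.

```scala
// Empty stand-ins for illustration only; not the real Spark/Delta classes.
class ParquetFileFormat
// Hypothetical stand-in for Delta's subclass of ParquetFileFormat.
class DeltaLikeFileFormat extends ParquetFileFormat

object ScanEligibility {
  // Subtype match: a type pattern also matches subclasses,
  // so the Delta-like format is treated as plain Parquet.
  def eligibleBySubtype(format: AnyRef): Boolean = format match {
    case _: ParquetFileFormat => true
    case _ => false
  }

  // Exact class comparison: only ParquetFileFormat itself is eligible.
  def eligibleByExactClass(format: AnyRef): Boolean =
    format.getClass == classOf[ParquetFileFormat]
}

object Demo {
  def main(args: Array[String]): Unit = {
    val plain = new ParquetFileFormat
    val delta = new DeltaLikeFileFormat

    println(ScanEligibility.eligibleBySubtype(delta))    // true: would be converted
    println(ScanEligibility.eligibleByExactClass(delta)) // false: left alone
    println(ScanEligibility.eligibleByExactClass(plain)) // true: still converted
  }
}
```

The exact comparison is deliberately conservative: any other subclass of ParquetFileFormat would also stop being converted, which seems like the safer default since Comet cannot know what behavior a subclass adds.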

Longer term, it would be interesting to see whether the necessary behavior can be delegated to custom file formats. However, all the work on pushing Parquet scans down to DataFusion might make that impossible unless a different approach is taken, such as using delta-rs directly as in #174.

Additional context

No response

Labels: bug
