Add support for `InMemoryRelation` #137

sunchao · 2024-02-29T17:44:07Z

What is the problem the feature request solves?

Currently, Comet cannot be triggered when Spark users read data from a cached RDD. To support this use case, we'll need to add support for Spark's InMemoryRelation.

It looks like we may need to implement Arrow support for CachedBatchSerializer.

Describe the potential solution

Add Comet support for InMemoryRelation, so that Spark queries that start from a cached RDD can also use Comet native execution.

Additional context

It is not a priority right now, but it would be good to have in the future.

advancedxy · 2024-03-01T11:18:14Z

Another way to read InMemoryRelation is to wrap it with a CometRowToColumnarExec, as I proposed in #119.

sunchao · 2024-03-01T16:32:18Z

@advancedxy Yea, CometRowToColumnarExec could be a more general solution, not only for InMemoryRelation, but also for other types of data sources like CSV, JSON, etc. The advantage of implementing Arrow for CachedBatchSerializer here is that we can avoid the extra cost from row to columnar conversion, and potentially be more space efficient because of better compression.
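To make the trade-off concrete, here is a toy Python sketch of what a row-to-columnar step has to do: visit every value of every row and append it to a per-column buffer. This is not Comet or Spark code. The function name `rows_to_columns` and the dict-of-lists layout are assumptions made purely for illustration. The real operator works on Spark's internal row format and Arrow vectors.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]


def rows_to_columns(schema: Sequence[str], rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot row tuples into one list per column (hypothetical helper)."""
    columns: Dict[str, List[Any]] = {name: [] for name in schema}
    for row in rows:
        if len(row) != len(schema):
            raise ValueError(f"row has {len(row)} fields, expected {len(schema)}")
        # Every value is touched once: this per-value work is the conversion cost.
        for name, value in zip(schema, row):
            columns[name].append(value)
    return columns


if __name__ == "__main__":
    batch = rows_to_columns(["id", "name"], [(1, "a"), (2, "b"), (3, "c")])
    print(batch)  # {'id': [1, 2, 3], 'name': ['a', 'b', 'c']}
```

If the cache already stored its batches in a columnar layout, this pivot would not be needed when reading them back, which is the saving described above.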

advancedxy · 2024-03-01T16:49:23Z

> The advantage of implementing Arrow for CachedBatchSerializer here is that we can avoid the extra cost from row to columnar conversion, and potentially be more space efficient because of better compression.

Yea, of course. I get the rationale. We could always add specialized operators to improve performance, as long as it's worth the effort and there's interest in implementing it.

sunchao added the enhancement (New feature or request) label Feb 29, 2024

advancedxy mentioned this issue Mar 15, 2024

feat: Add CometRowToColumnar operator #206

Merged

sunchao closed this as completed in #206 Apr 10, 2024
