
Conversation

@lonless9 (Contributor) commented on Jun 30, 2025

part of #171

This PR adds Delta table read/write operations to the Spark SQL and DataFrame APIs.

@lonless9 self-assigned this on Jun 30, 2025
@lonless9 changed the title from "feat: Delta Lake integration" to "feat: delta lake integration" on Jun 30, 2025
@lonless9 marked this pull request as ready for review on July 17, 2025 06:10
@lonless9 requested a review from linhr on July 17, 2025 06:33
@lonless9 changed the title from "feat: delta lake integration" to "feat: delta lake Read/Write operations" on Jul 17, 2025
@linhr (Contributor) left a comment


This is amazing!! Great work!!! 🚀

let mut all_batches = Vec::new();
let mut total_rows = 0u64;

// Execute all partitions and collect the data

This is how the existing implementation works: it collects all data in memory and writes it from a single process. It would be much more scalable if the writer tasks were distributed and ingested data in a streaming fashion.

(This is just a note to explain the future work.)
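To make the trade-off concrete, here is an illustrative sketch in Python (all names, such as ListWriter, write_collected, and write_streaming, are invented for this example; the actual implementation is Rust code operating on Arrow record batches):

```python
# Illustrative sketch only: these names are hypothetical and not part
# of the actual codebase.

class ListWriter:
    """Minimal stand-in for a table writer that buffers record batches."""

    def __init__(self):
        self.batches = []

    def write(self, batches):
        self.batches.extend(batches)


def write_collected(partitions, writer):
    """Current approach: gather every batch in memory, then write once."""
    all_batches = []
    total_rows = 0
    for partition in partitions:
        for batch in partition:
            all_batches.append(batch)
            total_rows += len(batch)
    writer.write(all_batches)  # a single process holds the full dataset
    return total_rows


def write_streaming(partitions, writers):
    """Scalable alternative: each partition streams into its own writer."""
    total_rows = 0
    for partition, writer in zip(partitions, writers):
        for batch in partition:
            writer.write([batch])  # ingest incrementally, bounded memory
            total_rows += len(batch)
    return total_rows
```

Both functions produce the same table contents; the difference is that the streaming variant never materializes more than one batch per partition at a time, so memory stays bounded and the writers can run on separate workers.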

}

#[async_trait]
impl TableProvider for DeltaTableProvider {

WriterBuilder is used here in insert_into(), but we have similar writer logic in DeltaDataSink, while TableProvider is used only for reading. Is my understanding correct?

Comment on lines 12 to 15
# Test constants
YEAR_2025 = 2025
YEAR_2026 = 2026
EXPECTED_RESULT_COUNT = 2

I think it's OK not to define constants here. Using the literal values directly in the tests could make them more readable.

The PR author replied:

Hatch fmt complains about it; I'll fix this.

The reviewer replied:

Oh I see. The Python linter can get annoying sometimes, especially in tests. You can simply bypass a certain rule for a particular line with a # noqa: <RULE> comment (where <RULE> is the rule being violated).
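For example (a hypothetical test line; PLR2004 is ruff's "magic value used in comparison" rule, so substitute whichever rule is actually reported):

```python
# Hypothetical example: silence one lint rule on a single line.
# PLR2004 is ruff's magic-value-comparison rule; replace it with
# whatever rule hatch fmt actually reports.
results = ["row1", "row2"]
assert len(results) == 2  # noqa: PLR2004
```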

@linhr changed the title from "feat: delta lake Read/Write operations" to "feat: basic read/write operations for Delta Lake" on Jul 17, 2025
@lonless9 merged commit c2c28fb into main on Jul 17, 2025
15 checks passed
@lonless9 deleted the delta-lake-integration branch on July 17, 2025 13:07
@lonless9 mentioned this pull request on Aug 29, 2025

Labels: run spark tests (Trigger Spark tests on a pull request)