Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262

Closed
@liamphmurphy

Description

Environment

Delta-rs version: python v0.16

Binding: Python

Environment:

  • Cloud provider: AWS S3 with DynamoDB

Bug

What happened:

To test the rust engine, we cleared out any existing delta tables in our nonprod environment and switched from pyarrow over to the rust engine with schema merging, with this write_deltalake call:

 write_deltalake(s3_path, table, schema=pyarrow_schema, mode="append", engine="rust", partition_by=["Uid","date","hour"], schema_mode="merge", configuration={"delta.logRetentionDuration": "interval 7 day"})

Although this is a brand-new Delta table and some writes succeeded, the Lambdas eventually started failing with Generic DeltaTable error: Version mismatch. I believe the error is coming from here:

return Err(DeltaTableError::Generic("Version mismatch".to_string()));

What you expected to happen:

Especially since we are testing with a fresh table, I'd expect all writes to work (and not just some) even with the new schema merge flag set.

How to reproduce it:
I was not able to reproduce this locally with a randomly generated dataset, so my guess is that it has more to do with the DynamoDB locking on S3. If you have thoughts on how I could test this better, please let me know.

Note that we have roughly 10 concurrent Lambdas that could potentially write to the table. Before this change, however, we had 50 writing with pyarrow without any problems.
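
One way to test this more directly might be a small harness that fires the same write from several concurrent workers and tallies the error messages. The sketch below is only an approximation: threads in one process stand in for separate Lambdas, and run_concurrent_writes, make_delta_writer, and make_table are hypothetical names, not part of deltalake. Only the write_deltalake call itself is taken from above. To exercise the locking path at all, s3_path would presumably have to point at a bucket configured with the same DynamoDB lock setup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_concurrent_writes(write_fn, n_writers=10, writes_per_writer=5):
    """Call write_fn(writer_id, attempt) from n_writers threads.

    Returns a Counter mapping each error message to how often it occurred.
    """
    def worker(writer_id):
        errors = []
        for attempt in range(writes_per_writer):
            try:
                write_fn(writer_id, attempt)
            except Exception as exc:  # record the failure and keep writing
                errors.append(str(exc))
        return errors

    with ThreadPoolExecutor(max_workers=n_writers) as pool:
        results = list(pool.map(worker, range(n_writers)))
    return Counter(msg for errors in results for msg in errors)


def make_delta_writer(s3_path, make_table, pyarrow_schema):
    """Build a write_fn that issues the write_deltalake call from this issue.

    make_table(writer_id, attempt) is a hypothetical helper that returns a
    pyarrow.Table for one write.
    """
    from deltalake import write_deltalake  # imported lazily; only needed here

    def write_fn(writer_id, attempt):
        write_deltalake(
            s3_path,
            make_table(writer_id, attempt),
            schema=pyarrow_schema,
            mode="append",
            engine="rust",
            partition_by=["Uid", "date", "hour"],
            schema_mode="merge",
            configuration={"delta.logRetentionDuration": "interval 7 day"},
        )

    return write_fn
```

Calling run_concurrent_writes(make_delta_writer(s3_path, make_table, pyarrow_schema)) against a fresh table and printing the returned Counter should show whether Version mismatch appears once writers overlap, and roughly how often.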

Labels: bug (Something isn't working), storage/aws (AWS S3 storage related)
