Backblaze B2 S3-Compatible API: Lance Dataset Versioning Support Issue #5614

@brumocas

Summary

Backblaze B2's S3-compatible API returns 501 Not Implemented when Lance attempts to write version manifest files to the _versions/ directory, preventing direct writes to B2 storage.
Also reported at: Backblaze/b2-sdk-python#558

Error

OSError: LanceError(IO): Generic S3 error: Error performing PUT 
https://s3.eu-central-003.backblazeb2.com/ml-datasets-misc/agit-synth-datasets/datasets.lance/_versions/1.manifest
- Server returned non-2xx status code: 501 Not Implemented

Request: PUT https://s3.eu-central-003.backblazeb2.com/ml-datasets-misc/agit-synth-datasets/datasets.lance/_versions/1.manifest
Response: 501 Not Implemented

Problem

Lance (a columnar data format for ML/AI) uses a versioned dataset structure:

  • _versions/1.manifest - Version history files
  • _latest.manifest - Points to current version
  • data/ - Data fragments

When writing to S3-compatible storage, Lance must PUT manifest objects under the _versions/ prefix, but Backblaze B2 returns 501 Not Implemented for this request.

Minimal Reproduction

import lance
import pyarrow as pa

storage_options = {
    "endpoint": "https://s3.eu-central-003.backblazeb2.com",
    "access_key_id": "YOUR_KEY_ID",
    "secret_access_key": "YOUR_SECRET_KEY",
    "region": "eu-central-003",
    "virtual_hosted_style_request": "false",
}

table = pa.Table.from_pylist([{"id": 1, "name": "test"}])
s3_path = "s3://bucket-name/dataset.lance"

# Fails with 501 when writing _versions/1.manifest
lance.write_dataset(table, s3_path, mode="create", storage_options=storage_options)

Impact

  • Blocks: Direct writes to B2 using Lance's native S3 support
  • Workaround: Write the dataset locally, then upload the entire directory structure manually
  • Affects: ML data pipelines, real-time ingestion, distributed systems using Lance
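
For reference, the workaround can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: the helper names (iter_upload_keys, upload_dataset, write_locally_then_upload) are hypothetical, `client` is assumed to be a boto3 S3 client pointed at the B2 endpoint, and uploading the _versions/ manifests last is my own choice so that a reader never sees a manifest before the data it refers to.

```python
import tempfile
from pathlib import Path


def iter_upload_keys(local_root, key_prefix):
    """Yield (local_path, object_key) pairs for every file under local_root.

    Files under _versions/ are yielded last (an assumption of this sketch).
    """
    root = Path(local_root)
    rel_paths = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    # False sorts before True, so non-manifest files come first.
    rel_paths.sort(key=lambda rel: (rel.startswith("_versions/"), rel))
    for rel in rel_paths:
        yield root / rel, f"{key_prefix.rstrip('/')}/{rel}"


def upload_dataset(client, bucket, local_root, key_prefix):
    """Upload each file with a plain upload through a boto3-style S3 client."""
    uploaded = []
    for path, key in iter_upload_keys(local_root, key_prefix):
        client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded


def write_locally_then_upload(table, client, bucket, key_prefix):
    """Write a Lance dataset to a temporary directory, then copy it to the bucket."""
    import lance  # imported lazily so the helpers above work without Lance installed

    with tempfile.TemporaryDirectory() as tmp:
        local_root = str(Path(tmp) / "dataset.lance")
        lance.write_dataset(table, local_root, mode="create")
        return upload_dataset(client, bucket, local_root, key_prefix)
```

This copies a whole dataset in one go, so it does not help with incremental appends or concurrent writers, which is why it is only a stopgap for the pipelines listed above.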

Request

Please clarify:

  1. Is this a known limitation? If so, is there a timeline for support?
  2. Is there a configuration to enable _versions/ directory writes?
  3. Are specific S3 operations blocked that need to be enabled?
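
To help narrow down question 3, a probe along these lines could compare a plain PUT with a conditional one against the same prefix. This is only a sketch under assumptions: I have not confirmed that a conditional header is what triggers the 501, the function names (put_status, probe_put_variants) and the probe key names are made up for illustration, and `client` is assumed to be a boto3 S3 client from a version recent enough to accept the IfNoneMatch parameter on put_object.

```python
def put_status(client, bucket, key, body=b"probe", **extra):
    """Return the HTTP status code of a PutObject call on a boto3-style client."""
    try:
        resp = client.put_object(Bucket=bucket, Key=key, Body=body, **extra)
        return resp["ResponseMetadata"]["HTTPStatusCode"]
    except Exception as exc:
        # botocore's ClientError exposes the parsed response on `.response`;
        # anything without a status code (e.g. a network error) is re-raised.
        meta = getattr(exc, "response", {}).get("ResponseMetadata", {})
        status = meta.get("HTTPStatusCode")
        if status is None:
            raise
        return status


def probe_put_variants(client, bucket, prefix):
    """PUT two small probe objects under <prefix>/_versions/ and report each status."""
    base = f"{prefix.rstrip('/')}/_versions"
    return {
        # Unconditional write to the same prefix Lance uses for manifests.
        "plain": put_status(client, bucket, f"{base}/probe-plain.manifest"),
        # Same write, but only if the key does not exist yet (If-None-Match: *).
        "if_none_match": put_status(
            client, bucket, f"{base}/probe-conditional.manifest", IfNoneMatch="*"
        ),
    }


# Example wiring (placeholders, not real credentials):
#   import boto3
#   client = boto3.client(
#       "s3",
#       endpoint_url="https://s3.eu-central-003.backblazeb2.com",
#       aws_access_key_id="YOUR_KEY_ID",
#       aws_secret_access_key="YOUR_SECRET_KEY",
#       region_name="eu-central-003",
#   )
#   print(probe_put_variants(client, "bucket-name", "dataset.lance"))
```

If the plain PUT succeeds and only the conditional one returns 501, that would point at the request headers rather than the _versions/ path itself. The probe objects would need to be deleted afterwards.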

Environment

  • Endpoint: s3.eu-central-003.backblazeb2.com
  • Lance: 1.0.0
  • Python: 3.11

Kind regards,
Bruno Costa
