Skip to content

Dev#1499

Merged
phenobarbital merged 24 commits into
masterfrom
dev
Mar 20, 2026
Merged

Dev#1499
phenobarbital merged 24 commits into
masterfrom
dev

Conversation

@phenobarbital
Copy link
Copy Markdown
Owner

@phenobarbital phenobarbital commented Mar 20, 2026

Summary by Sourcery

Refactor the Delta driver’s create/write APIs to fully wrap delta-rs, add comprehensive tests and updated examples, and record the work in SDD specs and completed task docs.

New Features:

  • Expose full delta-rs write_deltalake options through the delta driver’s write() and create() methods, including schema evolution, configuration, predicates, and metadata fields.
  • Add new example scripts demonstrating usage of the Delta, ElasticSearch, and various backend inspection utilities.
  • Introduce workspace and supporting artifacts for local development and Rust-based conversions.

Enhancements:

  • Normalize Polars inputs through Arrow and run blocking Delta writes in background threads to keep the async driver non-blocking.
  • Extend the delta driver write() API with typed parameters, deprecated if_exists handling, and unified storage_options behavior.

Tests:

  • Add tests covering all Delta write modes, schema evolution, configuration/metadata forwarding, predicate overwrites, multiple data types, and create() alignment with the new API.

phenobarbital and others added 24 commits May 9, 2021 18:29
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ta-rs API surface

- New write() signature with mode, schema_mode, configuration, predicate,
  partition_by, storage_options, name, description as keyword-only params
- Deprecated if_exists param with DeprecationWarning
- Unified Polars/Pandas/Arrow paths through write_deltalake
- Wrapped blocking write_deltalake in asyncio.to_thread()
- Removed deprecated engine="rust" parameter
- Forward **kwargs for advanced options (writer_properties, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…h new write interface

- Added schema_mode, configuration params to create()
- Wrapped write_deltalake in asyncio.to_thread()
- Added Polars->Arrow normalization
- Fixed Path(str)->Path(path) and Path(str)->Path(data) bugs
- Removed engine="rust", omit None values from args

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…a write interface

19 tests covering: all 4 write modes, Pandas/Polars/Arrow input,
schema_mode merge/overwrite, configuration forwarding, predicate
targeted overwrite, if_exists deprecation warning, partition_by,
storage_options fallback, name/description metadata, and create()
with schema_mode and configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added test_write_new_api() demonstrating: mode overwrite, schema_mode
merge, configuration metadata, predicate targeted overwrite, name/
description, backward-compatible if_exists, and partition_by.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
spec status changed to approved by linter
Merged new-driver branch with all FEAT-003 changes:
- TASK-011: Refactored write() with full delta-rs API surface
- TASK-012: Refactored create() method
- TASK-013: Unit tests (19/19 passing)
- TASK-014: Example script updates

All 4 tasks verified and closed.
@phenobarbital phenobarbital merged commit 129efae into master Mar 20, 2026
1 of 2 checks passed
@phenobarbital phenobarbital deleted the dev branch March 20, 2026 01:28
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Mar 20, 2026

Reviewer's Guide

Refactors the Delta driver’s create/write APIs to be thin async wrappers around delta-rs write_deltalake with expanded parameter support, adds comprehensive tests and specs for the new behavior, updates examples, and records completion of several SDD tasks; also adds a few local helper scripts and workspace artifacts.

Sequence diagram for the refactored DeltaDriver.write async flow

sequenceDiagram
    actor Caller
    participant DeltaDriver
    participant AsyncIO as asyncio.to_thread
    participant DeltaRS as write_deltalake

    Caller->>DeltaDriver: write(data, table_id, path, mode, schema_mode, partition_by, configuration, storage_options, predicate, name, description, if_exists, **kwargs)
    activate DeltaDriver

    alt if_exists is not None
        DeltaDriver->>DeltaDriver: warnings.warn("if_exists is deprecated")
        DeltaDriver->>DeltaDriver: mode = if_exists
    end

    DeltaDriver->>DeltaDriver: if isinstance(data, pl.DataFrame): data = data.to_arrow()
    DeltaDriver->>DeltaDriver: destination = Path(path) / table_id
    DeltaDriver->>DeltaDriver: build args dict (mode, schema_mode, partition_by, configuration, predicate, name, description, storage_options or self.storage_options, **kwargs) excluding None

    DeltaDriver->>AsyncIO: to_thread(write_deltalake, destination, data, **args)
    activate AsyncIO
    AsyncIO->>DeltaRS: write_deltalake(destination, data, **args)
    activate DeltaRS
    DeltaRS-->>AsyncIO: success or raises DeltaError/DeltaProtocolError
    deactivate DeltaRS
    AsyncIO-->>DeltaDriver: result or exception
    deactivate AsyncIO

    alt success
        DeltaDriver->>DeltaDriver: self._delta = destination
        DeltaDriver-->>Caller: return None
    else DeltaError or DeltaProtocolError
        DeltaDriver->>DeltaDriver: wrap in DriverError
        DeltaDriver-->>Caller: raise DriverError
    else other Exception
        DeltaDriver->>DeltaDriver: wrap in DriverError
        DeltaDriver-->>Caller: raise DriverError
    end

    deactivate DeltaDriver
Loading

Sequence diagram for the refactored DeltaDriver.create async flow

sequenceDiagram
    actor Caller
    participant DeltaDriver
    participant FS as FileSystem
    participant AsyncIO as asyncio.to_thread
    participant DeltaRS as write_deltalake

    Caller->>DeltaDriver: create(path, data, name, mode, schema_mode, configuration, **kwargs)
    activate DeltaDriver

    alt path is str
        DeltaDriver->>DeltaDriver: path = Path(path).resolve()
    end

    alt data is str (file path)
        DeltaDriver->>DeltaDriver: data = Path(data).resolve()
    end

    alt data is Path (file)
        DeltaDriver->>FS: read file based on suffix (csv, xls/xlsx, parquet)
        FS-->>DeltaDriver: data as DataFrame or Table
    end

    DeltaDriver->>DeltaDriver: if isinstance(data, pl.DataFrame): data = data.to_arrow()
    DeltaDriver->>DeltaDriver: build args dict (mode, name, schema_mode, configuration, **kwargs) excluding None

    DeltaDriver->>AsyncIO: to_thread(write_deltalake, path, data, **args)
    activate AsyncIO
    AsyncIO->>DeltaRS: write_deltalake(path, data, **args)
    activate DeltaRS
    DeltaRS-->>AsyncIO: success or raises DeltaError
    deactivate DeltaRS
    AsyncIO-->>DeltaDriver: result or exception
    deactivate AsyncIO

    alt success
        DeltaDriver-->>Caller: return None
    else DeltaError
        DeltaDriver->>DeltaDriver: wrap in DriverError
        DeltaDriver-->>Caller: raise DriverError
    else other Exception
        DeltaDriver->>DeltaDriver: wrap in DriverError
        DeltaDriver-->>Caller: raise DriverError
    end

    deactivate DeltaDriver
Loading

Updated class diagram for the Delta driver create and write interfaces

classDiagram
    class DeltaDriver {
        +dict storage_options
        +Path _delta
        +async create(path, data, name, mode, schema_mode, configuration, **kwargs) None
        +async write(data, table_id, path, mode, schema_mode, partition_by, configuration, storage_options, predicate, name, description, if_exists, **kwargs) None
    }

    class write_deltalake {
        <<function>>
        +call(path, data, mode, schema_mode, partition_by, configuration, storage_options, predicate, name, description, **kwargs)
    }

    class DeltaError {
        <<exception>>
    }

    class DeltaProtocolError {
        <<exception>>
    }

    class DriverError {
        <<exception>>
    }

    class asyncio_to_thread {
        <<function>>
        +call(func, arg1, arg2, **kwargs)
    }

    class pandas_DataFrame {
        <<data>>
    }

    class polars_DataFrame {
        <<data>>
        +to_arrow() Table
    }

    class pyarrow_Table {
        <<data>>
    }

    DeltaDriver ..> write_deltalake : uses
    DeltaDriver ..> DeltaError : handles
    DeltaDriver ..> DeltaProtocolError : handles
    DeltaDriver ..> DriverError : raises
    DeltaDriver ..> asyncio_to_thread : wraps call
    DeltaDriver ..> pandas_DataFrame : accepts
    DeltaDriver ..> polars_DataFrame : accepts
    DeltaDriver ..> pyarrow_Table : accepts

    polars_DataFrame --> pyarrow_Table : to_arrow() conversion
Loading

File-Level Changes

Change Details Files
Refactor Delta driver create() to support schema/configuration options and run write_deltalake asynchronously.
  • Extended create() signature with schema_mode and configuration parameters plus **kwargs, returning None.
  • Fixed path/data string handling and simplified Excel engine selection.
  • Normalized Polars DataFrame inputs to Arrow tables.
  • Built a filtered argument dict including mode, name, schema_mode, configuration, and extra kwargs.
  • Replaced direct write_deltalake call with asyncio.to_thread and improved error messages.
asyncdb/drivers/delta.py
Refactor Delta driver write() API into a full-featured async wrapper around write_deltalake with backward-compatible if_exists handling.
  • Expanded write() signature to accept Arrow Table inputs, flexible path types, and keyword-only options including mode, schema_mode, partition_by, configuration, storage_options, predicate, name, description, and deprecated if_exists.
  • Issued DeprecationWarning when if_exists is used and mapped it onto mode.
  • Normalized Polars DataFrame inputs via to_arrow for a unified write_deltalake code path.
  • Constructed destination path from root path and table_id and built an args dict that includes partitioning, configuration, predicate, metadata, and storage_options with driver-level defaults.
  • Wrapped write_deltalake in asyncio.to_thread, updated self._delta on success, and preserved DriverError wrapping of delta-rs errors.
asyncdb/drivers/delta.py
Add focused async tests validating the new Delta driver write/create behavior and API surface.
  • Created a driver stub fixture that binds delta.write and delta.create without invoking the full MRO chain.
  • Added tests for write() modes (append, overwrite, ignore, error) including DriverError on error mode when table exists.
  • Verified support for Pandas, Polars, and PyArrow inputs, schema_mode merge/overwrite, and partition_by behavior.
  • Tested configuration metadata, predicate-based targeted overwrite, storage_options fallback to driver defaults, and name/description forwarding.
  • Added tests for create() with Pandas/Polars, schema_mode-driven evolution, and configuration forwarding, plus deprecation behavior of if_exists.
tests/test_delta_write.py
Update the Delta example script to demonstrate the new write() API options.
  • Added an async example function demonstrating the new write() API with overwrite, schema_mode merge, configuration, predicate, name/description, partitioning, and deprecated if_exists usage.
  • Bound write/create methods onto a lightweight delta driver stub for demonstration without full initialization.
  • Printed simple progress markers for each scenario to serve as a runnable reference.
examples/test_delta.py
Document and mark completion of several SDD tasks related to exception migration, tests, and Delta driver refactor.
  • Moved TASK-001, TASK-002, TASK-003 Markdown specs from active to completed with detailed completion notes describing pure-Python exception implementation, removal of Cython artifacts, and unit test coverage.
  • Added FEAT-003 migration spec outlining the Delta driver move to delta-rs and its design, API, risks, and acceptance criteria.
  • Recorded completion details for TASK-010, TASK-011, TASK-012, TASK-013, and TASK-014 in new completed task files, aligning them with the implemented code and tests.
  • Updated the SDD task index JSON (structure not shown in diff) to reflect new/updated specs and tasks.
sdd/specs/migrate-delta-driver.spec.md
sdd/tasks/active/TASK-001-pure-python-exceptions.md
sdd/tasks/active/TASK-002-remove-cython-update-exports.md
sdd/tasks/active/TASK-003-exception-unit-tests.md
sdd/tasks/completed/TASK-001-pure-python-exceptions.md
sdd/tasks/completed/TASK-002-remove-cython-update-exports.md
sdd/tasks/completed/TASK-003-exception-unit-tests.md
sdd/tasks/completed/TASK-010-iceberg-tests-example.md
sdd/tasks/completed/TASK-011-refactor-write-method.md
sdd/tasks/completed/TASK-012-refactor-create-method.md
sdd/tasks/completed/TASK-013-delta-write-tests.md
sdd/tasks/completed/TASK-014-update-delta-example.md
sdd/tasks/.index.json
Add several exploratory scripts and examples for other systems (ArangoDB, ElasticSearch, DocumentDB, rust-based conversions) plus workspace and data artifacts.
  • Added small scripts to inspect python-arango and Arango async support, including package layout, versions, and method sync/async status.
  • Added a script to benchmark a Rust-based rst_convert.todict helper against Python list comprehension and to load results into Pandas.
  • Added an ElasticSearch example using asyncdb.drivers.elastic with hard-coded Navigator cluster credentials and a simple match_all query.
  • Added a Mongo/DocumentDB example script using pymongo, plus helper scripts to inspect Arango client behavior and versions.
  • Committed local workspace, CSVs, CA bundle, Cargo.lock, and a symlink or path to an external python-driver library under libraries/.
examples/tst_rst_convert.py
examples/test_elastic.py
examples/internals/test_db.py
scripts/check_arango_version.py
scripts/find_arango_async_import.py
scripts/inspect_arango_async.py
scripts/inspect_arango_structure.py
scripts/inspect_arango_async_db.py
.claude/worktrees/feat-001-exception-migration
.claude/worktrees/feat-002-apache-iceberg-support
.claude/worktrees/feat-053-ontological-graph-rag
activities.csv
stores.csv
asyncdb.code-workspace
asyncdb/conversions/rst_convert/Cargo.lock
global-bundle.pem
libraries/python-driver

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • The new write() implementation unconditionally lets if_exists override mode; this can surprise callers who explicitly pass mode—consider only honoring if_exists when mode is still at its default to match the spec’s intent.
  • The examples/test_elastic.py file contains hard-coded production-like host and password values; these should be removed or replaced with obvious placeholders before merging to avoid credential leakage.
  • Several non-source artifacts appear to have been added (e.g. activities.csv, stores.csv, global-bundle.pem, workspace files, libraries/python-driver, .claude/worktrees/*); please confirm these are intended to live in the repo and otherwise drop or gitignore them.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `write()` implementation unconditionally lets `if_exists` override `mode`; this can surprise callers who explicitly pass `mode`—consider only honoring `if_exists` when `mode` is still at its default to match the spec’s intent.
- The `examples/test_elastic.py` file contains hard-coded production-like host and password values; these should be removed or replaced with obvious placeholders before merging to avoid credential leakage.
- Several non-source artifacts appear to have been added (e.g. `activities.csv`, `stores.csv`, `global-bundle.pem`, workspace files, `libraries/python-driver`, `.claude/worktrees/*`); please confirm these are intended to live in the repo and otherwise drop or gitignore them.

## Individual Comments

### Comment 1
<location path="asyncdb/drivers/delta.py" line_range="165-167" />
<code_context>
         if isinstance(data, str):
-            data = Path(str).resolve()
+            data = Path(data).resolve()
         if isinstance(data, Path):
-            # open this file with Pandas or Arrow
+            # Read file into Arrow/Pandas based on extension
             ext = data.suffix
             if ext == ".csv":
                 read_options = pcsv.ReadOptions()
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider explicitly handling or rejecting unsupported file extensions when `data` is a `Path`.

For `Path` inputs you only special-case `.csv`, Excel, and `.parquet`; for anything else, the `Path` is passed straight to `write_deltalake`, which expects a DataFrame/Arrow object. Please either raise a clear error for unsupported extensions or validate and document the supported set here rather than silently forwarding the `Path`.
</issue_to_address>

### Comment 2
<location path="asyncdb/drivers/delta.py" line_range="451-457" />
<code_context>
         """
-        args = {"mode": if_exists, "engine": "rust", **kwargs}
+        # Handle deprecated if_exists parameter
+        if if_exists is not None:
+            warnings.warn(
+                "if_exists is deprecated, use mode instead",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+            mode = if_exists
+
+        # Normalize Polars DataFrame to Arrow for a unified code path
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Deprecated `if_exists` is passed directly to `mode` without validation.

Because `mode` is now typed as `Literal[...]`, assigning `if_exists` directly can allow legacy callers to pass unexpected strings that bypass this contract and flow into `write_deltalake`. Please validate/normalize `if_exists` against the allowed mode values and raise early on invalid input before assigning it to `mode`.

Suggested implementation:

```python
from pathlib import Path
from typing import Literal, cast

```

```python
        # Handle deprecated if_exists parameter
        if if_exists is not None:
            warnings.warn(
                "if_exists is deprecated, use mode instead",
                DeprecationWarning,
                stacklevel=2,
            )
            # Normalize and validate legacy ``if_exists`` values against supported modes
            allowed_modes: set[str] = {"error", "append", "overwrite", "ignore", "merge"}
            normalized_if_exists = str(if_exists).lower()
            if normalized_if_exists not in allowed_modes:
                raise ValueError(
                    f"Invalid value for deprecated argument 'if_exists': {if_exists!r}. "
                    f"Expected one of: {sorted(allowed_modes)}."
                )
            mode = cast(
                Literal["error", "append", "overwrite", "ignore", "merge"],
                normalized_if_exists,
            )

```

1. If `asyncdb/drivers/delta.py` already imports `Literal` from `typing`, adjust the import edit so you only add `cast` instead of re-importing `Literal`, for example:
   `from typing import Literal, cast` → or extend the existing `from typing import ...` line.
2. Ensure the `mode` parameter’s `Literal[...]` annotation (in the function signature you didn’t show) matches the set `{"error", "append", "overwrite", "ignore", "merge"}`. If your project uses a different subset or naming, update both `allowed_modes` and the `Literal[...]` in the cast to the exact set used elsewhere.
</issue_to_address>

### Comment 3
<location path="examples/tst_rst_convert.py" line_range="7-12" />
<code_context>
+import pandas as pd
+import rst_convert
+
+credentials = {
+    "user": "troc_pgdata",
+    "password": "12345678",
+    "host": "127.0.0.1",
+    "port": "5432",
+    "database": "navigator"
+}
+
</code_context>
<issue_to_address>
**🚨 issue (security):** Avoid committing hard-coded database credentials in example scripts.

This example uses literal user/password/host/database values. Even if these are non-sensitive now, it normalizes committing credentials and raises the risk of real ones being pushed. Please load connection details from environment variables or a config file with safe defaults instead of embedding them directly in code.
</issue_to_address>

### Comment 4
<location path="tests/test_delta_write.py" line_range="56-65" />
<code_context>
+# --- Backward compatibility: if_exists deprecation ---
+
+
+@pytest.mark.asyncio
+async def test_write_if_exists_deprecated(driver, sample_pandas_df, tmp_path):
+    """write() with if_exists emits DeprecationWarning and still works."""
+    with warnings.catch_warnings(record=True) as w:
+        warnings.simplefilter("always")
+        await driver.write(sample_pandas_df, "t1", tmp_path, if_exists="append")
+        dep_warnings = [x for x in w if issubclass(x.category, DeprecationWarning)]
+        assert len(dep_warnings) == 1
+        assert "if_exists is deprecated" in str(dep_warnings[0].message)
+
+    dt = DeltaTable(str(tmp_path / "t1"))
+    assert dt.to_pandas().shape[0] == 3
+
+
+# --- create() tests ---
+
+
+@pytest.mark.asyncio
+async def test_create_pandas(driver, sample_pandas_df, tmp_path):
+    """create() creates a Delta table from a Pandas DataFrame."""
+    dest = tmp_path / "new_table"
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for create() using file-based inputs and string paths to cover the new path-handling logic.

The refactored `create()` now normalizes string `path` values and branches on file-type when `data` is a `str`/`Path` (CSV/Excel/Parquet). Current tests only cover in-memory inputs and `Path` destinations. Please add tests that:

1. Use `path` as a string and verify the resulting Delta table is created correctly.
2. Use `data` as a CSV file path written to `tmp_path` and assert the resulting table contents.
3. Optionally repeat for Parquet and one Excel format (e.g. `.xlsx`) to ensure the correct readers are invoked and the new branches are covered.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread asyncdb/drivers/delta.py
Comment on lines 165 to 167
if isinstance(data, Path):
# open this file with Pandas or Arrow
# Read file into Arrow/Pandas based on extension
ext = data.suffix
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Consider explicitly handling or rejecting unsupported file extensions when data is a Path.

For Path inputs you only special-case .csv, Excel, and .parquet; for anything else, the Path is passed straight to write_deltalake, which expects a DataFrame/Arrow object. Please either raise a clear error for unsupported extensions or validate and document the supported set here rather than silently forwarding the Path.

Comment thread asyncdb/drivers/delta.py
Comment on lines +451 to +457
if if_exists is not None:
warnings.warn(
"if_exists is deprecated, use mode instead",
DeprecationWarning,
stacklevel=2,
)
mode = if_exists
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Deprecated if_exists is passed directly to mode without validation.

Because mode is now typed as Literal[...], assigning if_exists directly can allow legacy callers to pass unexpected strings that bypass this contract and flow into write_deltalake. Please validate/normalize if_exists against the allowed mode values and raise early on invalid input before assigning it to mode.

Suggested implementation:

from pathlib import Path
from typing import Literal, cast
        # Handle deprecated if_exists parameter
        if if_exists is not None:
            warnings.warn(
                "if_exists is deprecated, use mode instead",
                DeprecationWarning,
                stacklevel=2,
            )
            # Normalize and validate legacy ``if_exists`` values against supported modes
            allowed_modes: set[str] = {"error", "append", "overwrite", "ignore", "merge"}
            normalized_if_exists = str(if_exists).lower()
            if normalized_if_exists not in allowed_modes:
                raise ValueError(
                    f"Invalid value for deprecated argument 'if_exists': {if_exists!r}. "
                    f"Expected one of: {sorted(allowed_modes)}."
                )
            mode = cast(
                Literal["error", "append", "overwrite", "ignore", "merge"],
                normalized_if_exists,
            )
  1. If asyncdb/drivers/delta.py already imports Literal from typing, adjust the import edit so you only add cast instead of re-importing Literal, for example:
    from typing import Literal, cast → or extend the existing from typing import ... line.
  2. Ensure the mode parameter’s Literal[...] annotation (in the function signature you didn’t show) matches the set {"error", "append", "overwrite", "ignore", "merge"}. If your project uses a different subset or naming, update both allowed_modes and the Literal[...] in the cast to the exact set used elsewhere.

Comment on lines +7 to +12
credentials = {
"user": "troc_pgdata",
"password": "12345678",
"host": "127.0.0.1",
"port": "5432",
"database": "navigator"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 issue (security): Avoid committing hard-coded database credentials in example scripts.

This example uses literal user/password/host/database values. Even if these are non-sensitive now, it normalizes committing credentials and raises the risk of real ones being pushed. Please load connection details from environment variables or a config file with safe defaults instead of embedding them directly in code.

Comment thread tests/test_delta_write.py
Comment on lines +56 to +65
@pytest.mark.asyncio
async def test_write_append(driver, sample_pandas_df, tmp_path):
"""write() with mode='append' creates table then appends rows."""
await driver.write(sample_pandas_df, "t1", tmp_path, mode="append")
await driver.write(sample_pandas_df, "t1", tmp_path, mode="append")
dt = DeltaTable(str(tmp_path / "t1"))
result = dt.to_pandas()
assert len(result) == 6


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add tests for create() using file-based inputs and string paths to cover the new path-handling logic.

The refactored create() now normalizes string path values and branches on file-type when data is a str/Path (CSV/Excel/Parquet). Current tests only cover in-memory inputs and Path destinations. Please add tests that:

  1. Use path as a string and verify the resulting Delta table is created correctly.
  2. Use data as a CSV file path written to tmp_path and assert the resulting table contents.
  3. Optionally repeat for Parquet and one Excel format (e.g. .xlsx) to ensure the correct readers are invoked and the new branches are covered.

phenobarbital added a commit that referenced this pull request Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant