Skip to content

Conversation

@jackye1995
Copy link
Contributor

@jackye1995 jackye1995 commented Dec 23, 2025

Make fixes for the following breaking changes in 0.4.0:

  1. introduced full error handling spec, update rust interface to implement the spec, and also dir and rest implementations in rust, python, java based on it
  2. fix that create_empty_table is deprecated and declare_table is introduced. We will start to use declare_table, and mark create_empty_table as deprecated and will be removed in 3.0.0. If declare_table fails, it currently falls back to create_empty_table.
  3. we add deregister_table support for dir namespace (without manifest), by adding a .lance-deregistered file. So when checking table existence, we check (1) the table directory exists, (2) if table directory exists, the .lance-deregistered file does not exist. This allows user to deregister a table immediately without deleting data which could be long running.

@github-actions github-actions bot added enhancement New feature or request python java labels Dec 23, 2025
@github-actions
Copy link
Contributor

Code Review

Summary

This PR upgrades lance-namespace to 0.4.0, implementing the new error handling spec and transitioning from create_empty_table to declare_table.

Critical Issues (P0/P1)

P1: Missing tests for new declare_table and deregister_table implementations

The PR adds significant new functionality in dir.rs and manifest.rs:

  • declare_table implementation for both DirectoryNamespace and ManifestNamespace
  • deregister_table implementation for V1 mode (using .lance-deregistered marker files)
  • Skip logic for deregistered tables in list_tables

However, I don't see corresponding tests for these new code paths. Per the project's review guidelines: "We do not merge code without tests."

Specifically, please add tests for:

  1. declare_table in both V1 (DirectoryNamespace) and manifest mode
  2. deregister_table in V1 mode
  3. list_tables skipping deregistered tables
  4. describe_table and table_exists returning errors for deregistered tables

P1: Race condition in deregistration check

In dir.rs, operations like describe_table first check table_exists via reserved file, then separately check is_table_deregistered. Between these checks, another process could deregister the table. Consider combining these checks or using atomic operations.

Minor Observations

  • The error module in lance-namespace/src/error.rs is well-structured with comprehensive error codes and good test coverage.
  • The Python/Java binding implementations for error conversion look correct.
  • The deprecation of create_empty_table in favor of declare_table is handled properly with the #[deprecated] attribute.

@github-actions
Copy link
Contributor

Code Review: feat: upgrade lance-namespace to 0.4.0

Summary

This PR upgrades lance-namespace to 0.4.0, implementing the new error handling spec and introducing declare_table as a replacement for the deprecated create_empty_table. It also adds deregister_table support for directory namespace without manifest.

P1 Issues

1. Race condition in deregister_table for directory namespace (rust/lance-namespace-impls/src/dir.rs:1167-1174)

The sequence of checking if a table exists, then checking if it's deregistered, then creating the marker file is not atomic. Between the check and the write, another process could perform the same operation, leading to potential inconsistencies. While the PR description acknowledges this, consider at minimum using an atomic put-if-not-exists pattern for the marker file creation if the object store supports it.

2. Silent failure masking in is_table_deregistered (rust/lance-namespace-impls/src/dir.rs:512-517)

async fn is_table_deregistered(&self, table_name: &str) -> bool {
    ...
    .unwrap_or(false)
}

This silently treats object store errors (network issues, permission problems) as "not deregistered", which could lead to operations succeeding on tables that should be inaccessible. Consider propagating the error or at least logging it.

3. Missing test coverage for deregister_table in non-manifest mode

The deregister_table implementation for directory namespace without manifest is a significant new feature but I don't see corresponding test coverage in the diff. Tests should verify:

  • Successful deregistration creates the marker file
  • Deregistered tables are excluded from list_tables
  • describe_table fails for deregistered tables
  • Double-deregistration returns an appropriate error

Minor Observations

  • The fallback pattern in Java's WriteDatasetBuilder catching UnsupportedOperationException is correct for the deprecation path.
  • Error code consistency across Rust/Python/Java implementations looks good.
  • The ErrorCode::from_u32 returning None for unknown codes with fallback to Internal in NamespaceError::from_code is a reasonable approach.

@github-actions
Copy link
Contributor

Code Review: PR #5568 - Upgrade lance-namespace to 0.4.0

I reviewed this PR focusing on the most critical aspects. Overall, the implementation looks well-structured with comprehensive test coverage.

P0 Issues

None found.

P1 Issues

  1. Potential Bug in table_exists without manifest (rust/lance-namespace-impls/src/dir.rs:834-862)

    The diff shows the table_exists function uses check_table_status() for atomic checks when manifest is disabled. However, when the manifest fallback logic fails at line 838, the function falls through to do a directory-only check that does NOT use the atomic check_table_status() method. This means deregistered tables won't be properly filtered in this fallback path.

    The code at line 847 uses read_dir(table_path) directly without checking the .lance-deregistered marker file, which inconsistently handles the V1 deregistration semantics.

    Recommendation: Ensure the fallback path also uses check_table_status() and properly checks is_deregistered.

  2. Error Code Consistency in JNI Layer (java/lance-jni/src/error.rs:104-189)

    The throw_namespace_exception function uses unsafe JNI calls. While necessary for JNI interop, the fallback error handling (line 102-107) silently falls back to RuntimeException if the specialized exception class is not found. This could cause issues during development/debugging if the Java exception classes are misnamed or missing.

    Recommendation: Consider logging a warning when falling back to RuntimeException so developers can detect misconfigurations.

Observations

  • Good use of atomic TableStatus checks via check_table_status() to avoid TOCTOU race conditions
  • Comprehensive test coverage for the new deregistration feature (V1 mode tests)
  • Proper deprecation handling for create_empty_table -> declare_table
  • The new error handling spec with ErrorCode enum provides good cross-language consistency

@github-actions
Copy link
Contributor

Code Review: feat: upgrade lance-namespace to 0.4.0

This PR upgrades lance-namespace to 0.4.0 with new error handling, declare_table API, and deregister_table support for V1 mode. Overall, the implementation is well-structured with comprehensive tests.

P0 Issues (Must Fix)

None identified.

P1 Issues (Should Fix)

  1. Race condition in check_table_status fallback path (rust/lance-namespace-impls/src/dir.rs:533-540)

    The fallback path when read_dir fails silently returns TableStatus { exists: false, ... }. This could mask legitimate I/O errors (e.g., permission denied, network timeout) and cause incorrect behavior where operations proceed as if the table doesn't exist when it actually does.

    Consider propagating the error or at least distinguishing between "directory doesn't exist" vs other I/O errors:

    Err(e) if is_not_found_error(&e) => TableStatus { exists: false, ... },
    Err(e) => return Err(e),  // Propagate other errors
  2. Potential for stale marker files (deregister_table in V1 mode)

    If a table is deregistered via .lance-deregistered marker but the physical table data is later deleted and recreated, the marker file could persist and block access to the new table. Consider documenting this behavior or adding cleanup logic.

  3. Missing declare_table in LanceNamespace trait (rust/lance-namespace/src/namespace.rs)

    The declare_table method is added with a default "not implemented" response. The trait should document that this is the preferred method over the deprecated create_empty_table. The deprecation warning is present but could be more visible in the trait docs.

Minor Observations (Non-blocking)

  • The Python fallback from declare_table to create_empty_table (python/python/lance/dataset.py:5532-5551) catches AttributeError which may mask unrelated attribute access issues. Consider being more specific.

  • Good test coverage for the new deregistration and declaration features.

  • The atomic check_table_status approach via single directory listing is a good pattern for avoiding TOCTOU issues.

@github-actions
Copy link
Contributor

PR Review: feat: upgrade lance-namespace to 0.4.0

Summary

This PR upgrades lance-namespace to 0.4.0, introducing error handling improvements, the new declare_table API (replacing deprecated create_empty_table), and deregister_table support for directory namespaces without manifest via .lance-deregistered marker files.

Potential Issues (P0/P1)

P1: Race condition in deregistration check for list_directory_tables

In rust/lance-namespace-impls/src/dir.rs, the list_directory_tables method iterates through directories and calls is_table_deregistered for each table:

for entry in entries {
    // ...
    let table_name = &path[..path.len() - 6];
    // Skip deregistered tables
    if self.is_table_deregistered(table_name).await {
        continue;
    }
    tables.push(table_name.to_string());
}

This makes a separate exists() call for each table, while other methods like describe_table and table_exists use the atomic check_table_status function. For consistency and to avoid potential race conditions, consider using the same atomic check approach here. While this is a minor concern for listing (worst case is briefly inconsistent list), it would be better to have consistent behavior.

P1: Python fallback exception handling may be too broad

In python/python/lance/dataset.py:5530-5546, the fallback catches NotImplementedError and AttributeError:

except (UnsupportedOperationError, NotImplementedError, AttributeError):
    # Fall back to deprecated create_empty_table

Catching AttributeError could mask legitimate bugs where the namespace object is misconfigured or methods are being called incorrectly. Consider being more specific about what AttributeError cases are acceptable to catch (e.g., only when declare_table method doesn't exist).

Other Observations

  • Good use of TableStatus struct for atomic table state checking
  • Comprehensive test coverage for the new deregister_table V1 mode functionality
  • Proper deprecation warnings for create_empty_table
  • Error code spec implementation looks solid with consistent codes across Rust/Python/Java

@github-actions
Copy link
Contributor

Code Review: Upgrade lance-namespace to 0.4.0

I reviewed this PR which upgrades the lance-namespace library to 0.4.0 with new error handling, declare_table API, and deregister_table support.

P0/P1 Issues

1. [P1] Missing declare_table implementation in Python namespace.rs

The Python bindings in python/src/namespace.rs add deregister_table method to PyDirectoryNamespace and PyRestNamespace, but do not expose declare_table. The Python dataset.py expects a declare_table method on the namespace object, but it will only work through the Python lance-namespace library wrapper, not through the native Rust bindings. This could cause confusion or failures when using the native namespace implementations directly.

2. [P1] Error handling mismatch in Python error.rs

The current python/src/error.rs does not handle the LanceError::Namespace variant. The PR description mentions implementing the error spec, but the Python error conversion code only handles generic cases and does not import or use NamespaceError like the Java bindings do. Namespace errors will fall through to runtime_error() without proper error code propagation.

Compare with Java which properly extracts the NamespaceError code:

LanceError::Namespace { source, .. } => {
    if let Some(ns_err) = source.downcast_ref::<NamespaceError>() {
        Self::namespace_error(ns_err.code().as_u32(), ns_err.to_string())
    } else {
        Self::runtime_error(err.to_string())
    }
}

3. [P1] Potential race condition in Python fallback logic

In python/python/lance/dataset.py, the fallback from declare_table to create_empty_table catches UnsupportedOperationError and NotImplementedError, but the check if hasattr(namespace, "declare_table") happens before the try block. If another exception occurs during declare_table, it will propagate unexpectedly rather than falling back.

Minor Observations

  • The PR description mentions adding .lance-deregistered file support for directory namespace deregistration without manifest, but I could not find this implementation in the diff. The deregister_table method in dir.rs:1036-1049 returns NotSupported when manifest is disabled. Please verify this matches the intended behavior.

  • The deprecation path looks correct: create_empty_table marked with #[allow(deprecated)] in JNI code, with fallback to deprecated API when declare_table is not available.

  • Test coverage for deregister_table is present in both dir.rs and rest_adapter.rs.

Summary

The core functionality is well-implemented. The main concerns are consistency between Java and Python error handling, and ensuring the Python bindings expose all necessary namespace methods.

@github-actions
Copy link
Contributor

PR Review: feat: upgrade lance-namespace to 0.4.0

This PR upgrades the lance-namespace dependency to 0.4.0 and implements full error handling, the new declare_table API (with deprecation of create_empty_table), and deregister_table support for directory namespaces.

P0/P1 Issues

No critical issues found. The implementation is well-structured with proper error handling across Rust, Python, and Java bindings.

Minor Observations (non-blocking)

  1. Fallback Logic for declare_table: The Java and Python implementations fall back to deprecated create_empty_table when declare_table fails. This is a reasonable approach for backward compatibility, but ensure the fallback path is tested. The PR description mentions this is intended for older namespace implementations.

  2. Deregister File Approach: The .lance-deregistered marker file for soft-deleting tables without manifest mode is a practical design choice that avoids long-running delete operations. The existence check in table_exists correctly handles both the directory existence and the absence of this marker file.

  3. Test Coverage: The diff includes test additions in Java (NamespaceIntegrationTest.java) which covers the new functionality.

Summary

The implementation cleanly separates the error handling spec implementation, API migration from create_empty_table to declare_table, and the new deregistration capability. The cross-language bindings (Rust, Python, Java) are consistent in their approach to error propagation and the new namespace error types.

LGTM ✓

@github-actions
Copy link
Contributor

Code Review: feat: upgrade lance-namespace to 0.4.0

Summary

This PR upgrades to lance-namespace 0.4.0, adding error handling spec support, introducing declare_table (deprecating create_empty_table), and implementing deregister_table for directory namespaces via .lance-deregistered marker files.

P0/P1 Issues

1. Race Condition in check_table_status (P1)

The method at rust/lance-namespace-impls/src/dir.rs:532-550 performs a single directory listing, but there is a race window between checking status and performing operations. For example, in declare_table, the status check and file creation are separate operations. While this is inherent to filesystem-based operations, consider using put_if_not_exists (if available in the object store API) for atomic creation of the reserved file.

2. Missing re-registration path (P1)

The deregister_table implementation creates a .lance-deregistered marker file, but there is no corresponding way to re-register (remove the marker). This could leave users stuck if they accidentally deregister a table. Consider adding a register_table implementation for V1 mode that removes the deregistered marker.

3. Error code downcast in Java/Python may fail silently (P0)

In java/lance-jni/src/error.rs:213-218 and python/src/error.rs:69-74, the code downcasts source to NamespaceError. If the NamespaceError type comes from a different crate version or is wrapped differently, the downcast could fail and fall back to a generic error, losing the structured error code. This could break client-side error handling. Consider adding logging when the downcast fails to aid debugging.

4. check_table_status visibility (P1)

The method check_table_status at rust/lance-namespace-impls/src/dir.rs:536 appears to be public based on test usage. If intended as public API, it needs documentation. If an internal helper, consider making it private or pub(crate).

Minor Observations

  • Good backward compatibility handling in Python with deprecation warnings
  • Good test coverage for V1 deregistration and declaration
  • The atomic status check pattern (TableStatus struct) improves over separate existence checks

Generated with Claude Code

@github-actions
Copy link
Contributor

PR Review: feat: upgrade lance-namespace to 0.4.0

This PR upgrades the lance-namespace-reqwest-client dependency to version 0.4.0 and implements several changes in the namespace API including error handling improvements and new declare_table/deregister_table functionality.

Summary

The changes look well-structured overall with good test coverage. I have one P1 concern to highlight:


P1: Missing declare_table implementation and fallback logic

The PR description mentions:

create_empty_table is deprecated and declare_table is introduced. We will start to use declare_table, and mark create_empty_table as deprecated. If declare_table fails, it currently falls back to create_empty_table.

However, I don't see the fallback logic implemented in the Rust namespace trait or Python/Java bindings. The Java JNI code adds declareTableNative methods but:

  1. The LanceNamespace trait in rust/lance-namespace/src/namespace.rs doesn't define a declare_table method with fallback behavior
  2. The ManifestNamespace and DirectoryNamespace implementations only have create_empty_table
  3. The deprecation annotation (#[deprecated] or #[allow(deprecated)]) is only applied to the JNI wrapper but not the trait method

If this fallback behavior is meant to happen at the REST/client layer (in the external lance-namespace-reqwest-client 0.4.0 crate), that's fine. But if it's supposed to be in this repository, the implementation appears incomplete.

Recommendation: Please clarify if the declare_table -> create_empty_table fallback is implemented in the external crate or should be added here. If it's external, consider adding a code comment documenting this behavior for maintainability.


Minor observations (not blocking)

  1. Error code propagation: The Java JNI error handling in java/lance-jni/src/error.rs has good downcast logic for NamespaceError to extract error codes. The panic! on missing exception class is appropriate since the namespace dependency is bundled.

  2. Python error handling: The changes to python/src/error.rs properly extend PythonErrorExt to handle namespace errors. The implementation uses the same pattern as other error types.

  3. Test coverage: The PR includes tests for the new deregister_table functionality in both test_namespace_dir.py and test_namespace_rest.py.


🤖 Generated with Claude Code

@github-actions
Copy link
Contributor

PR Review: feat: upgrade lance-namespace to 0.4.0

Summary

This PR upgrades the lance-namespace dependency to 0.4.0 and implements several breaking changes from that version: a full error handling spec, deprecating create_empty_table in favor of declare_table, and adding deregister_table support for directory namespaces.

Review

No P0/P1 issues identified. The implementation is well-structured with good test coverage.

Minor Observations (non-blocking):

  1. Version bump direction: The Cargo.toml shows version changes from 2.0.0-beta.5 to 2.0.0-beta.4 and lance-namespace-reqwest-client from 0.3.1 to 0.4.0. The internal crate version decrease appears intentional (likely a reversion from a prior bump), but worth confirming this is expected.

  2. Deregistration mechanism (dir.rs:1180-1194): The deregister_table implementation correctly delegates to manifest namespace when enabled and returns NotSupported otherwise. The manifest implementation properly removes the entry from manifest while leaving physical data intact - this is the expected behavior for deregistration.

  3. Test coverage: The manifest.rs tests are parameterized with #[case::with_optimization(true/false)] for inline optimization, providing good coverage of both code paths.

  4. Register table validation (manifest.rs:1649-1678): Good security checks for path traversal (..) and absolute paths in register_table.

Overall, the changes are clean and well-tested. The integration with the 0.4.0 namespace spec appears complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request java python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant