Skip to content

fix(prefect-gcp): prevent double-nesting in GcsBucket._resolve_path#18

Open
tomerqodo wants to merge 6 commits intocoderabbit_combined_20260121_augment_sentry_coderabbit_1_base_fixprefect-gcp_prevent_double-nesting_in_gcsbucket_resolve_path_pr559from
coderabbit_combined_20260121_augment_sentry_coderabbit_1_head_fixprefect-gcp_prevent_double-nesting_in_gcsbucket_resolve_path_pr559
Open

fix(prefect-gcp): prevent double-nesting in GcsBucket._resolve_path#18
tomerqodo wants to merge 6 commits intocoderabbit_combined_20260121_augment_sentry_coderabbit_1_base_fixprefect-gcp_prevent_double-nesting_in_gcsbucket_resolve_path_pr559from
coderabbit_combined_20260121_augment_sentry_coderabbit_1_head_fixprefect-gcp_prevent_double-nesting_in_gcsbucket_resolve_path_pr559

Conversation

@tomerqodo
Copy link

@tomerqodo tomerqodo commented Jan 22, 2026

Benchmark PR from qodo-benchmark#559

Summary by CodeRabbit

  • Bug Fixes

    • Resolved redundant prefixing of bucket folders in storage paths.
    • Added validation to prevent bucket folder from matching the bucket name.
  • Tests

    • Added regression test for path handling correctness.
  • Chores

    • Updated test environment warning filters and timeout configuration.

✏️ Tip: You can customize this high-level summary in your review settings.

zzstoatzz and others added 6 commits January 21, 2026 15:46
When storage_block_id is null (e.g., context serialized to Ray workers),
create_result_record() adds bucket_folder to storage_key via _resolve_path.
Then write_path() calls _resolve_path again, causing double-nested paths
like "results/results/abc123" instead of "results/abc123".

Add duplicate-prefix check to _resolve_path, matching the existing check
in _join_bucket_folder.

Fixes PrefectHQ#20174

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- fix .prefectignore content in test fixture (remove leading whitespace)
- add filterwarnings to pyproject.toml to suppress upstream deprecation warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
remove extra disable_logging fixture and temporary_settings wrapper
to match pattern used by prefect-aws and other integrations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
fixes ephemeral server startup timeouts on Python 3.11+ with xdist workers
see PrefectHQ#16397

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

Walkthrough

This PR adds cross-field validation to the GcsBucket class ensuring bucket_folder differs from bucket, implements a guard in _resolve_path to prevent duplicate path prefixing, updates test configuration by removing automatic logging disablement and adding server timeout settings, and introduces a regression test for the path nesting fix.

Changes

Cohort / File(s) Summary
Cross-field validation and path resolution
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py, src/integrations/prefect-gcp/tests/test_cloud_storage.py
Added model_validator import and cross-field validation to bucket_folder validator to enforce that bucket_folder ≠ bucket using info.data. Updated _bucket_folder_suffix method signature to accept info parameter. Introduced guard in _resolve_path to skip re-prefixing when bucket_folder already present in path. Added regression test test_resolve_path_no_double_nesting validating path resolution prevents double-nesting.
Test configuration updates
src/integrations/prefect-gcp/tests/conftest.py, src/integrations/prefect-gcp/pyproject.toml
Removed disable_logging fixture and related imports. Updated prefect_db fixture to invoke prefect_test_harness with server_timeout=60. Replaced single DeprecationWarning ignore rule with two broader ignores targeting httplib2 and pathspec deprecation warnings.
Test file reformatting
src/integrations/prefect-gcp/tests/projects/test_steps.py
Reformatted multi-line .prefectignore string to single-line format with newline separators; no functional change.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A hop through validation's garden fair,
Where bucket_folder paths take extra care,
No double-nesting leads us astray,
Cross-field checks keep bugs at bay!
With tests to prove the guard works tight,

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description is minimal and provides only a reference to a benchmark PR without substantial detail about the changes, testing, or rationale. Expand the description to include an overview of proposed changes, explain the problem being fixed, note which tests were added, and ensure all checklist items are addressed.
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: preventing double-nesting in GcsBucket._resolve_path, which is the primary fix across the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py`:
- Around line 726-735: The current check uses "self.bucket_folder in path" which
matches substrings anywhere; change it to a path-prefix-aware check: normalize
leading/trailing slashes and then test whether the path equals the bucket_folder
or startswith bucket_folder + "/" (e.g., normalized_path ==
normalized_bucket_folder or normalized_path.startswith(normalized_bucket_folder
+ "/")). Update the logic in the same method where self.bucket_folder and path
are used (the _resolve_path/write_path area) to use this prefix check so only
true prefixes are skipped.
- Line 10: The import model_validator is unused because the cross-field
validator _bucket_folder_suffix currently uses `@field_validator`; change that to
a model-level validator: replace the `@field_validator` decorator on
_bucket_folder_suffix with `@model_validator`(mode="after") and adjust the
function signature to accept (cls, values) (or (self, values) depending on
classmethod/instance semantics) and return the validated values so the
cross-field logic runs correctly; alternatively, if you opt not to convert the
validator, simply remove model_validator from the import list to fix the lint
error.

In `@src/integrations/prefect-gcp/tests/conftest.py`:
- Around line 14-19: The fixture prefect_db is passing an invalid keyword
argument to prefect_test_harness (server_timeout) which will raise a TypeError;
change the argument name to server_startup_timeout when calling
prefect_test_harness in the prefect_db fixture so it matches the function
signature and increases the startup timeout for CI.

In `@src/integrations/prefect-gcp/tests/test_cloud_storage.py`:
- Around line 143-165: The test fixture builds an already-prefixed path with an
extra slash because bucket_folder contains a trailing '/', so update
test_resolve_path_no_double_nesting to construct already_prefixed_path without
adding another '/' (e.g., concatenate bucket_folder and "abc123" directly or
strip/normalize the trailing slash) before calling gcs_bucket._resolve_path;
ensure assertions still check equality to that normalized already_prefixed_path
and that it does not start with bucket_folder repeated.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bda7a19 and 9077e3a.

📒 Files selected for processing (5)
  • src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
  • src/integrations/prefect-gcp/pyproject.toml
  • src/integrations/prefect-gcp/tests/conftest.py
  • src/integrations/prefect-gcp/tests/projects/test_steps.py
  • src/integrations/prefect-gcp/tests/test_cloud_storage.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/integrations/prefect-gcp/tests/projects/test_steps.py (1)
src/prefect/utilities/dockerutils.py (1)
  • write_text (298-307)
src/integrations/prefect-gcp/tests/test_cloud_storage.py (1)
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py (1)
  • _resolve_path (715-743)
src/integrations/prefect-gcp/tests/conftest.py (1)
src/prefect/testing/utilities.py (1)
  • prefect_test_harness (114-197)
🪛 GitHub Actions: Integrations Packages Tests
src/integrations/prefect-gcp/tests/conftest.py

[error] 19-19: TypeError: prefect_test_harness() got an unexpected keyword argument 'server_timeout'. This prevented test harness setup for Prefect integration tests.

🪛 GitHub Actions: Static analysis
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py

[error] 1-1: ruff-check: 1 error fixed (unused import). Removed unused 'model_validator' import; file modified by pre-commit hook. Command: pre-commit run --show-diff-on-failure --color=always --all-files.

🪛 Ruff (0.14.13)
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py

711-711: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
src/integrations/prefect-gcp/pyproject.toml (1)

103-105: Warning filter updates look fine.

src/integrations/prefect-gcp/tests/projects/test_steps.py (1)

57-57: LGTM — ignore patterns remain covered.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

from typing import Any, BinaryIO, Dict, List, Optional, Tuple, Union

from pydantic import Field, field_validator
from pydantic import Field, field_validator, model_validator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# First, let's check the file exists and read the import statement around line 10
echo "=== Checking import statement at line 10 ==="
sed -n '1,20p' src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py

# Check for any usage of model_validator in the file
echo -e "\n=== Searching for model_validator usage ==="
rg -n 'model_validator' src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py

# Check what's at lines 698-713 (the mentioned additional location)
echo -e "\n=== Checking lines 698-713 ==="
sed -n '698,713p' src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py

# Count total lines in file for context
echo -e "\n=== File info ==="
wc -l src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py

Repository: qodo-benchmark/prefect-combined-coderabbit

Length of output: 1902


Fix unused model_validator import (CI lint failure).

The model_validator import on line 10 is unused. The cross-field validation in _bucket_folder_suffix (lines 698-713) currently uses @field_validator but should use @model_validator for proper Pydantic validation semantics. Either remove the unused import or refactor the validation to actually use @model_validator.

Option 1: Remove unused import
-from pydantic import Field, field_validator, model_validator
+from pydantic import Field, field_validator
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from pydantic import Field, field_validator, model_validator
from pydantic import Field, field_validator
🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` at line 10, The
import model_validator is unused because the cross-field validator
_bucket_folder_suffix currently uses `@field_validator`; change that to a
model-level validator: replace the `@field_validator` decorator on
_bucket_folder_suffix with `@model_validator`(mode="after") and adjust the
function signature to accept (cls, values) (or (self, values) depending on
classmethod/instance semantics) and return the validated values so the
cross-field logic runs correctly; alternatively, if you opt not to convert the
validator, simply remove model_validator from the import list to fix the lint
error.

Comment on lines 726 to +735
# If bucket_folder provided, it means we won't write to the root dir of
# the bucket. So we need to add it on the front of the path.
#
# However, avoid double-nesting if path is already prefixed with bucket_folder.
# This can happen when storage_block_id is null (e.g., context serialized to
# remote workers), causing create_result_record() to add bucket_folder to
# storage_key, then write_path() calls _resolve_path again.
# See https://github.com/PrefectHQ/prefect/issues/20174
if self.bucket_folder and self.bucket_folder in path:
return path
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid substring matches when checking for existing prefixes.
self.bucket_folder in path will skip prefixing when the folder name appears later in the path (e.g., archive/results/file). Use a prefix/path-aware check instead.

🛠️ Proposed fix
-        if self.bucket_folder and self.bucket_folder in path:
+        if self.bucket_folder and path.startswith(self.bucket_folder):
             return path
🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` around lines 726 -
735, The current check uses "self.bucket_folder in path" which matches
substrings anywhere; change it to a path-prefix-aware check: normalize
leading/trailing slashes and then test whether the path equals the bucket_folder
or startswith bucket_folder + "/" (e.g., normalized_path ==
normalized_bucket_folder or normalized_path.startswith(normalized_bucket_folder
+ "/")). Update the logic in the same method where self.bucket_folder and path
are used (the _resolve_path/write_path area) to use this prefix check so only
true prefixes are skipped.

Comment on lines 14 to +19
@pytest.fixture(scope="session", autouse=True)
def prefect_db():
with prefect_test_harness():
yield


@pytest.fixture(scope="session", autouse=True)
def disable_logging():
with temporary_settings({PREFECT_LOGGING_TO_API_ENABLED: False}):
# Increase timeout for CI environments where multiple xdist workers
# start servers simultaneously, which can be slower on Python 3.11+
# See https://github.com/PrefectHQ/prefect/issues/16397
with prefect_test_harness(server_timeout=60):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the definition of prefect_test_harness
rg -n "def prefect_test_harness" -g "*.py" -B 2 -A 10

# Find all usages to understand the pattern
rg -n "prefect_test_harness\(" -g "*.py" -B 1 -A 1

Repository: qodo-benchmark/prefect-combined-coderabbit

Length of output: 7663


Use the correct prefect_test_harness parameter name.

server_timeout is not a valid argument. The correct parameter is server_startup_timeout as defined in the function signature. Using an invalid keyword argument will cause a TypeError at runtime.

🛠️ Proposed fix
-    with prefect_test_harness(server_timeout=60):
+    with prefect_test_harness(server_startup_timeout=60):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@pytest.fixture(scope="session", autouse=True)
def prefect_db():
with prefect_test_harness():
yield
@pytest.fixture(scope="session", autouse=True)
def disable_logging():
with temporary_settings({PREFECT_LOGGING_TO_API_ENABLED: False}):
# Increase timeout for CI environments where multiple xdist workers
# start servers simultaneously, which can be slower on Python 3.11+
# See https://github.com/PrefectHQ/prefect/issues/16397
with prefect_test_harness(server_timeout=60):
`@pytest.fixture`(scope="session", autouse=True)
def prefect_db():
# Increase timeout for CI environments where multiple xdist workers
# start servers simultaneously, which can be slower on Python 3.11+
# See https://github.com/PrefectHQ/prefect/issues/16397
with prefect_test_harness(server_startup_timeout=60):
🧰 Tools
🪛 GitHub Actions: Integrations Packages Tests

[error] 19-19: TypeError: prefect_test_harness() got an unexpected keyword argument 'server_timeout'. This prevented test harness setup for Prefect integration tests.

🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/tests/conftest.py` around lines 14 - 19, The
fixture prefect_db is passing an invalid keyword argument to
prefect_test_harness (server_timeout) which will raise a TypeError; change the
argument name to server_startup_timeout when calling prefect_test_harness in the
prefect_db fixture so it matches the function signature and increases the
startup timeout for CI.

Comment on lines +143 to +165
def test_resolve_path_no_double_nesting(self, gcs_bucket):
"""
Regression test for https://github.com/PrefectHQ/prefect/issues/20174

When storage_block_id is null (e.g., context serialized to Ray workers),
create_result_record() adds bucket_folder to storage_key via _resolve_path.
Then write_path() calls _resolve_path again. Without the duplicate check,
this causes double-nested paths like "results/results/abc123".
"""
bucket_folder = gcs_bucket.bucket_folder
if not bucket_folder:
pytest.skip("Test only applies when bucket_folder is set")

# Simulate path that already has bucket_folder prefix
# (as would happen when create_result_record calls _resolve_path)
already_prefixed_path = f"{bucket_folder}/abc123"

# When write_path calls _resolve_path again, it should NOT double-nest
result = gcs_bucket._resolve_path(already_prefixed_path)

# Should return the same path, not bucket_folder/bucket_folder/abc123
assert result == already_prefixed_path
assert not result.startswith(f"{bucket_folder}{bucket_folder}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid double slash in the already‑prefixed path fixture.
bucket_folder already includes a trailing /, so interpolating /{...} produces // and isn’t representative of actual paths.

🛠️ Proposed fix
-        already_prefixed_path = f"{bucket_folder}/abc123"
+        already_prefixed_path = str(PurePosixPath(bucket_folder) / "abc123")
🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/tests/test_cloud_storage.py` around lines 143 -
165, The test fixture builds an already-prefixed path with an extra slash
because bucket_folder contains a trailing '/', so update
test_resolve_path_no_double_nesting to construct already_prefixed_path without
adding another '/' (e.g., concatenate bucket_folder and "abc123" directly or
strip/normalize the trailing slash) before calling gcs_bucket._resolve_path;
ensure assertions still check equality to that normalized already_prefixed_path
and that it does not start with bucket_folder repeated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants