fix(prefect-gcp): prevent double-nesting in GcsBucket._resolve_path#31
Conversation
When storage_block_id is null (e.g., context serialized to Ray workers), create_result_record() adds bucket_folder to storage_key via _resolve_path. Then write_path() calls _resolve_path again, causing double-nested paths like "results/results/abc123" instead of "results/abc123". Add duplicate-prefix check to _resolve_path, matching the existing check in _join_bucket_folder. Fixes PrefectHQ#20174 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- fix .prefectignore content in test fixture (remove leading whitespace) - add filterwarnings to pyproject.toml to suppress upstream deprecation warnings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
remove extra disable_logging fixture and temporary_settings wrapper to match pattern used by prefect-aws and other integrations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
fixes ephemeral server startup timeouts on Python 3.11+ with xdist workers see PrefectHQ#16397 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
WalkthroughThis PR enhances GCS bucket path resolution with cross-field validation preventing bucket_folder conflicts with bucket names, adds a no-double-nesting guard to prevent path duplication, adjusts test configuration for CI reliability, and includes regression tests validating the corrected behavior. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py`:
- Around line 726-735: The substring check using "self.bucket_folder in path"
can falsely treat mid-path occurrences as prefixes; in the _resolve_path /
write_path logic replace that membership test with a proper prefix check (e.g.,
normalize both values and use startswith or compare path segments) so only true
leading-folder matches skip re-prefixing; update the condition that currently
reads "if self.bucket_folder and self.bucket_folder in path: return path" to a
prefix-aware check using self.bucket_folder and path.
- Around line 698-713: Split the responsibilities: keep the existing
`@field_validator` method _bucket_folder_suffix to only ensure bucket_folder ends
with a "/" (remove the cross-field check), and add a new
`@model_validator`(mode="after") classmethod (e.g., _validate_bucket_vs_folder)
that receives the validated model, compares model.bucket and model.bucket_folder
(strip trailing slashes) and raises ValueError("bucket_folder cannot be the same
as bucket name") if they match; reference the existing _bucket_folder_suffix and
add the new model validator to perform the invariant check after all fields are
validated.
In `@src/integrations/prefect-gcp/tests/conftest.py`:
- Around line 15-19: The context manager call in the prefect_db fixture uses the
wrong keyword argument; change the prefect_test_harness(...) invocation to use
server_startup_timeout=60 (replace server_timeout) so the prefect_db fixture
passes the correct parameter to prefect_test_harness.
In `@src/integrations/prefect-gcp/tests/test_cloud_storage.py`:
- Around line 143-165: The test builds already_prefixed_path by concatenating
bucket_folder and "abc123" but bucket_folder is normalized with a trailing
slash, causing a double-slash (e.g., base_folder//abc123) and a false positive;
fix by constructing already_prefixed_path without introducing a duplicate slash
(e.g., use bucket_folder.rstrip('/') + '/' + 'abc123' or os.path.join semantics)
and update the negative prefix assertion to compare against the normalized form
(use bucket_folder.rstrip('/') when composing the
f"{bucket_folder}{bucket_folder}" check) so test_resolve_path_no_double_nesting
and its checks against _resolve_path correctly detect double-nesting rather than
double-separators.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.pysrc/integrations/prefect-gcp/pyproject.tomlsrc/integrations/prefect-gcp/tests/conftest.pysrc/integrations/prefect-gcp/tests/projects/test_steps.pysrc/integrations/prefect-gcp/tests/test_cloud_storage.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/integrations/prefect-gcp/tests/test_cloud_storage.py (1)
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py (1)
_resolve_path(715-743)
🪛 GitHub Actions: Integrations Packages Tests
src/integrations/prefect-gcp/tests/conftest.py
[error] 1-1: TypeError: prefect_test_harness() got an unexpected keyword argument 'server_timeout'. This is causing widespread test failures across many test modules.
🪛 Ruff (0.14.14)
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
711-711: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit checks
🔇 Additional comments (2)
src/integrations/prefect-gcp/tests/projects/test_steps.py (1)
57-57: LGTM: explicit.prefectignorecontentUsing a single-line literal avoids indentation/newline surprises from multiline strings.
src/integrations/prefect-gcp/pyproject.toml (1)
103-105: Confirm warning filters still match emitted deprecationsThese patterns are string‑sensitive; if upstream warning text changes they’ll stop suppressing. Please verify with a local pytest run that the filters still catch the intended warnings.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| @field_validator("bucket_folder") | ||
| @classmethod | ||
| def _bucket_folder_suffix(cls, value): | ||
| def _bucket_folder_suffix(cls, value, info): | ||
| """ | ||
| Ensures that the bucket folder is suffixed with a forward slash. | ||
| Also validates that bucket_folder doesn't conflict with bucket name. | ||
| """ | ||
| if value != "" and not value.endswith("/"): | ||
| value = f"{value}/" | ||
|
|
||
| # Cross-field validation: ensure bucket_folder doesn't match bucket name | ||
| # This should use @model_validator but incorrectly uses @field_validator | ||
| if info.data.get("bucket") and value.strip("/") == info.data.get("bucket"): | ||
| raise ValueError("bucket_folder cannot be the same as bucket name") | ||
|
|
||
| return value |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
# First, verify the file exists and check the actual code
find . -type f -name "cloud_storage.py" | grep prefect_gcpRepository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 145
🏁 Script executed:
# If found, check the specific lines
if [ -f "src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py" ]; then
sed -n '698,713p' src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
fiRepository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 796
🌐 Web query:
Pydantic v2 cross-field validation between two fields best practice @model_validatorvs@field_validator``
💡 Result:
Best practice (Pydantic v2): use @model_validator for true cross-field invariants
Prefer @model_validator(mode="after") when the rule depends on multiple fields
- Runs after field parsing/validation, so you can safely read all fields from the model instance.
- This is what Pydantic recommends “in most cases” for multi-field validation, because
@field_validatorcross-field access can be order-dependent. [1]
from pydantic import BaseModel, model_validator
class User(BaseModel):
password: str
password_repeat: str
`@model_validator`(mode="after")
def passwords_match(self):
if self.password != self.password_repeat:
raise ValueError("Passwords do not match")
return selfUse @field_validator only for “one field depends on earlier fields” (and you accept ordering constraints)
- You can read other already-validated fields via
info.data, but validation happens in field definition order, so later fields may not be available yet. [1][2] - Works well for cases like validating
password_repeatagainstpassword(whenpasswordis defined first). [1]
from pydantic import BaseModel, field_validator, ValidationInfo
class User(BaseModel):
password: str
password_repeat: str
`@field_validator`("password_repeat", mode="after")
`@classmethod`
def passwords_match(cls, v: str, info: ValidationInfo) -> str:
if v != info.data["password"]:
raise ValueError("Passwords do not match")
return vRule of thumb
- Multiple fields / whole-model invariant:
@model_validator(mode="after")(safer, not order-fragile). [1] - Single field with a dependency on earlier fields:
@field_validator(..., mode="after")withinfo.data, but be careful about field order. [1][2] - For v1 migrations:
@root_validator→@model_validator. [3]
Sources: Pydantic validators docs (v2) [1][2], migration guide [3].
[1] https://docs.pydantic.dev/latest/concepts/validators/
[2] https://docs.pydantic.dev/2.5/concepts/validators/
[3] https://docs.pydantic.dev/2.0/migration/
Refactor to use @model_validator for the cross-field bucket/bucket_folder check
Cross-field validation in Pydantic v2 is best handled with @model_validator(mode="after"), which runs after all fields are validated. The current approach using @field_validator with info.data.get() is order-dependent and—as noted in the code comment—is not the correct pattern. The refactor keeps the field validator focused on its single responsibility (ensuring the slash suffix) while the model validator handles the cross-field invariant.
Suggested refactor
`@field_validator`("bucket_folder")
`@classmethod`
-def _bucket_folder_suffix(cls, value, info):
+def _bucket_folder_suffix(cls, value):
"""
Ensures that the bucket folder is suffixed with a forward slash.
- Also validates that bucket_folder doesn't conflict with bucket name.
"""
if value != "" and not value.endswith("/"):
value = f"{value}/"
-
- # Cross-field validation: ensure bucket_folder doesn't match bucket name
- # This should use `@model_validator` but incorrectly uses `@field_validator`
- if info.data.get("bucket") and value.strip("/") == info.data.get("bucket"):
- raise ValueError("bucket_folder cannot be the same as bucket name")
-
return value
+
+@model_validator(mode="after")
+def _validate_bucket_folder(self):
+ if self.bucket_folder.strip("/") == self.bucket:
+ raise ValueError("bucket_folder cannot be the same as bucket name")
+ return self🧰 Tools
🪛 Ruff (0.14.14)
711-711: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` around lines 698 -
713, Split the responsibilities: keep the existing `@field_validator` method
_bucket_folder_suffix to only ensure bucket_folder ends with a "/" (remove the
cross-field check), and add a new `@model_validator`(mode="after") classmethod
(e.g., _validate_bucket_vs_folder) that receives the validated model, compares
model.bucket and model.bucket_folder (strip trailing slashes) and raises
ValueError("bucket_folder cannot be the same as bucket name") if they match;
reference the existing _bucket_folder_suffix and add the new model validator to
perform the invariant check after all fields are validated.
| # If bucket_folder provided, it means we won't write to the root dir of | ||
| # the bucket. So we need to add it on the front of the path. | ||
| # | ||
| # However, avoid double-nesting if path is already prefixed with bucket_folder. | ||
| # This can happen when storage_block_id is null (e.g., context serialized to | ||
| # remote workers), causing create_result_record() to add bucket_folder to | ||
| # storage_key, then write_path() calls _resolve_path again. | ||
| # See https://github.com/PrefectHQ/prefect/issues/20174 | ||
| if self.bucket_folder and self.bucket_folder in path: | ||
| return path |
There was a problem hiding this comment.
Substring check can skip prefixing for unrelated paths
self.bucket_folder in path matches mid‑path occurrences (e.g., backup/data/file when bucket_folder is data/) and will incorrectly bypass prefixing. Use a prefix check instead.
🐛 Proposed fix
- if self.bucket_folder and self.bucket_folder in path:
+ if self.bucket_folder and path.startswith(self.bucket_folder):
return path📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # If bucket_folder provided, it means we won't write to the root dir of | |
| # the bucket. So we need to add it on the front of the path. | |
| # | |
| # However, avoid double-nesting if path is already prefixed with bucket_folder. | |
| # This can happen when storage_block_id is null (e.g., context serialized to | |
| # remote workers), causing create_result_record() to add bucket_folder to | |
| # storage_key, then write_path() calls _resolve_path again. | |
| # See https://github.com/PrefectHQ/prefect/issues/20174 | |
| if self.bucket_folder and self.bucket_folder in path: | |
| return path | |
| # If bucket_folder provided, it means we won't write to the root dir of | |
| # the bucket. So we need to add it on the front of the path. | |
| # | |
| # However, avoid double-nesting if path is already prefixed with bucket_folder. | |
| # This can happen when storage_block_id is null (e.g., context serialized to | |
| # remote workers), causing create_result_record() to add bucket_folder to | |
| # storage_key, then write_path() calls _resolve_path again. | |
| # See https://github.com/PrefectHQ/prefect/issues/20174 | |
| if self.bucket_folder and path.startswith(self.bucket_folder): | |
| return path |
🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` around lines 726 -
735, The substring check using "self.bucket_folder in path" can falsely treat
mid-path occurrences as prefixes; in the _resolve_path / write_path logic
replace that membership test with a proper prefix check (e.g., normalize both
values and use startswith or compare path segments) so only true leading-folder
matches skip re-prefixing; update the condition that currently reads "if
self.bucket_folder and self.bucket_folder in path: return path" to a
prefix-aware check using self.bucket_folder and path.
| def prefect_db(): | ||
| with prefect_test_harness(): | ||
| yield | ||
|
|
||
|
|
||
| @pytest.fixture(scope="session", autouse=True) | ||
| def disable_logging(): | ||
| with temporary_settings({PREFECT_LOGGING_TO_API_ENABLED: False}): | ||
| # Increase timeout for CI environments where multiple xdist workers | ||
| # start servers simultaneously, which can be slower on Python 3.11+ | ||
| # See https://github.com/PrefectHQ/prefect/issues/16397 | ||
| with prefect_test_harness(server_timeout=60): |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find dependency specification files
find . -maxdepth 3 -type f \( -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "requirements*.txt" \) | head -20Repository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 164
🏁 Script executed:
# Look for the prefect-gcp integration and its dependencies
cat src/integrations/prefect-gcp/pyproject.toml 2>/dev/null || echo "File not found, searching for it..."Repository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 3119
🏁 Script executed:
# Search for where prefect_test_harness is imported from
rg "prefect_test_harness" --type py -B 2 -A 2Repository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 19414
🏁 Script executed:
# Check the actual function definition and signature
sed -n '1,100p' src/prefect/testing/utilities.py | head -60Repository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 2060
🏁 Script executed:
# Get the full function signature and docstring
ast-grep --pattern 'def prefect_test_harness($_) {
$$$
}'Repository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 68
🏁 Script executed:
# More focused search on the function definition
rg "def prefect_test_harness" -A 30 src/prefect/testing/utilities.pyRepository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 1025
Use correct parameter name: server_startup_timeout
The parameter is named server_startup_timeout, not server_timeout. Change line 19 to:
with prefect_test_harness(server_startup_timeout=60):🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/tests/conftest.py` around lines 15 - 19, The
context manager call in the prefect_db fixture uses the wrong keyword argument;
change the prefect_test_harness(...) invocation to use server_startup_timeout=60
(replace server_timeout) so the prefect_db fixture passes the correct parameter
to prefect_test_harness.
| def test_resolve_path_no_double_nesting(self, gcs_bucket): | ||
| """ | ||
| Regression test for https://github.com/PrefectHQ/prefect/issues/20174 | ||
|
|
||
| When storage_block_id is null (e.g., context serialized to Ray workers), | ||
| create_result_record() adds bucket_folder to storage_key via _resolve_path. | ||
| Then write_path() calls _resolve_path again. Without the duplicate check, | ||
| this causes double-nested paths like "results/results/abc123". | ||
| """ | ||
| bucket_folder = gcs_bucket.bucket_folder | ||
| if not bucket_folder: | ||
| pytest.skip("Test only applies when bucket_folder is set") | ||
|
|
||
| # Simulate path that already has bucket_folder prefix | ||
| # (as would happen when create_result_record calls _resolve_path) | ||
| already_prefixed_path = f"{bucket_folder}/abc123" | ||
|
|
||
| # When write_path calls _resolve_path again, it should NOT double-nest | ||
| result = gcs_bucket._resolve_path(already_prefixed_path) | ||
|
|
||
| # Should return the same path, not bucket_folder/bucket_folder/abc123 | ||
| assert result == already_prefixed_path | ||
| assert not result.startswith(f"{bucket_folder}{bucket_folder}") |
There was a problem hiding this comment.
already_prefixed_path adds a double slash when bucket_folder already ends with /
bucket_folder is normalized to include a trailing slash, so the extra / yields base_folder//abc123. That makes the test accept a double‑separator path.
✅ Suggested fix
- already_prefixed_path = f"{bucket_folder}/abc123"
+ already_prefixed_path = f"{bucket_folder}abc123"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_resolve_path_no_double_nesting(self, gcs_bucket): | |
| """ | |
| Regression test for https://github.com/PrefectHQ/prefect/issues/20174 | |
| When storage_block_id is null (e.g., context serialized to Ray workers), | |
| create_result_record() adds bucket_folder to storage_key via _resolve_path. | |
| Then write_path() calls _resolve_path again. Without the duplicate check, | |
| this causes double-nested paths like "results/results/abc123". | |
| """ | |
| bucket_folder = gcs_bucket.bucket_folder | |
| if not bucket_folder: | |
| pytest.skip("Test only applies when bucket_folder is set") | |
| # Simulate path that already has bucket_folder prefix | |
| # (as would happen when create_result_record calls _resolve_path) | |
| already_prefixed_path = f"{bucket_folder}/abc123" | |
| # When write_path calls _resolve_path again, it should NOT double-nest | |
| result = gcs_bucket._resolve_path(already_prefixed_path) | |
| # Should return the same path, not bucket_folder/bucket_folder/abc123 | |
| assert result == already_prefixed_path | |
| assert not result.startswith(f"{bucket_folder}{bucket_folder}") | |
| def test_resolve_path_no_double_nesting(self, gcs_bucket): | |
| """ | |
| Regression test for https://github.com/PrefectHQ/prefect/issues/20174 | |
| When storage_block_id is null (e.g., context serialized to Ray workers), | |
| create_result_record() adds bucket_folder to storage_key via _resolve_path. | |
| Then write_path() calls _resolve_path again. Without the duplicate check, | |
| this causes double-nested paths like "results/results/abc123". | |
| """ | |
| bucket_folder = gcs_bucket.bucket_folder | |
| if not bucket_folder: | |
| pytest.skip("Test only applies when bucket_folder is set") | |
| # Simulate path that already has bucket_folder prefix | |
| # (as would happen when create_result_record calls _resolve_path) | |
| already_prefixed_path = f"{bucket_folder}abc123" | |
| # When write_path calls _resolve_path again, it should NOT double-nest | |
| result = gcs_bucket._resolve_path(already_prefixed_path) | |
| # Should return the same path, not bucket_folder/bucket_folder/abc123 | |
| assert result == already_prefixed_path | |
| assert not result.startswith(f"{bucket_folder}{bucket_folder}") |
🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/tests/test_cloud_storage.py` around lines 143 -
165, The test builds already_prefixed_path by concatenating bucket_folder and
"abc123" but bucket_folder is normalized with a trailing slash, causing a
double-slash (e.g., base_folder//abc123) and a false positive; fix by
constructing already_prefixed_path without introducing a duplicate slash (e.g.,
use bucket_folder.rstrip('/') + '/' + 'abc123' or os.path.join semantics) and
update the negative prefix assertion to compare against the normalized form (use
bucket_folder.rstrip('/') when composing the f"{bucket_folder}{bucket_folder}"
check) so test_resolve_path_no_double_nesting and its checks against
_resolve_path correctly detect double-nesting rather than double-separators.
Benchmark PR from agentic-review-benchmarks#5
Summary by CodeRabbit
Release Notes
Bug Fixes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.