fix(prefect-gcp): prevent double-nesting in GcsBucket._resolve_path#18
Conversation
When storage_block_id is null (e.g., context serialized to Ray workers), create_result_record() adds bucket_folder to storage_key via _resolve_path. Then write_path() calls _resolve_path again, causing double-nested paths like "results/results/abc123" instead of "results/abc123". Add duplicate-prefix check to _resolve_path, matching the existing check in _join_bucket_folder. Fixes PrefectHQ#20174 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- fix .prefectignore content in test fixture (remove leading whitespace) - add filterwarnings to pyproject.toml to suppress upstream deprecation warnings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
remove extra disable_logging fixture and temporary_settings wrapper to match pattern used by prefect-aws and other integrations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
fixes ephemeral server startup timeouts on Python 3.11+ with xdist workers see PrefectHQ#16397 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
WalkthroughThis PR adds cross-field validation to the GcsBucket class ensuring bucket_folder differs from bucket, implements a guard in _resolve_path to prevent duplicate path prefixing, updates test configuration by removing automatic logging disablement and adding server timeout settings, and introduces a regression test for the path nesting fix. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py`:
- Around line 726-735: The current check uses "self.bucket_folder in path" which
matches substrings anywhere; change it to a path-prefix-aware check: normalize
leading/trailing slashes and then test whether the path equals the bucket_folder
or startswith bucket_folder + "/" (e.g., normalized_path ==
normalized_bucket_folder or normalized_path.startswith(normalized_bucket_folder
+ "/")). Update the logic in the same method where self.bucket_folder and path
are used (the _resolve_path/write_path area) to use this prefix check so only
true prefixes are skipped.
- Line 10: The import model_validator is unused because the cross-field
validator _bucket_folder_suffix currently uses `@field_validator`; change that to
a model-level validator: replace the `@field_validator` decorator on
_bucket_folder_suffix with `@model_validator`(mode="after") and adjust the
function signature to accept (cls, values) (or (self, values) depending on
classmethod/instance semantics) and return the validated values so the
cross-field logic runs correctly; alternatively, if you opt not to convert the
validator, simply remove model_validator from the import list to fix the lint
error.
In `@src/integrations/prefect-gcp/tests/conftest.py`:
- Around line 14-19: The fixture prefect_db is passing an invalid keyword
argument to prefect_test_harness (server_timeout) which will raise a TypeError;
change the argument name to server_startup_timeout when calling
prefect_test_harness in the prefect_db fixture so it matches the function
signature and increases the startup timeout for CI.
In `@src/integrations/prefect-gcp/tests/test_cloud_storage.py`:
- Around line 143-165: The test fixture builds an already-prefixed path with an
extra slash because bucket_folder contains a trailing '/', so update
test_resolve_path_no_double_nesting to construct already_prefixed_path without
adding another '/' (e.g., concatenate bucket_folder and "abc123" directly or
strip/normalize the trailing slash) before calling gcs_bucket._resolve_path;
ensure assertions still check equality to that normalized already_prefixed_path
and that it does not start with bucket_folder repeated.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.pysrc/integrations/prefect-gcp/pyproject.tomlsrc/integrations/prefect-gcp/tests/conftest.pysrc/integrations/prefect-gcp/tests/projects/test_steps.pysrc/integrations/prefect-gcp/tests/test_cloud_storage.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/integrations/prefect-gcp/tests/projects/test_steps.py (1)
src/prefect/utilities/dockerutils.py (1)
write_text(298-307)
src/integrations/prefect-gcp/tests/test_cloud_storage.py (1)
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py (1)
_resolve_path(715-743)
src/integrations/prefect-gcp/tests/conftest.py (1)
src/prefect/testing/utilities.py (1)
prefect_test_harness(114-197)
🪛 GitHub Actions: Integrations Packages Tests
src/integrations/prefect-gcp/tests/conftest.py
[error] 19-19: TypeError: prefect_test_harness() got an unexpected keyword argument 'server_timeout'. This prevented test harness setup for Prefect integration tests.
🪛 GitHub Actions: Static analysis
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
[error] 1-1: ruff-check: 1 error fixed (unused import). Removed unused 'model_validator' import; file modified by pre-commit hook. Command: pre-commit run --show-diff-on-failure --color=always --all-files.
🪛 Ruff (0.14.13)
src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
711-711: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
src/integrations/prefect-gcp/pyproject.toml (1)
103-105: Warning filter updates look fine.src/integrations/prefect-gcp/tests/projects/test_steps.py (1)
57-57: LGTM — ignore patterns remain covered.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| from typing import Any, BinaryIO, Dict, List, Optional, Tuple, Union | ||
|
|
||
| from pydantic import Field, field_validator | ||
| from pydantic import Field, field_validator, model_validator |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's check the file exists and read the import statement around line 10
echo "=== Checking import statement at line 10 ==="
sed -n '1,20p' src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
# Check for any usage of model_validator in the file
echo -e "\n=== Searching for model_validator usage ==="
rg -n 'model_validator' src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
# Check what's at lines 698-713 (the mentioned additional location)
echo -e "\n=== Checking lines 698-713 ==="
sed -n '698,713p' src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
# Count total lines in file for context
echo -e "\n=== File info ==="
wc -l src/integrations/prefect-gcp/prefect_gcp/cloud_storage.pyRepository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 1902
Fix unused model_validator import (CI lint failure).
The model_validator import on line 10 is unused. The cross-field validation in _bucket_folder_suffix (lines 698-713) currently uses @field_validator but should use @model_validator for proper Pydantic validation semantics. Either remove the unused import or refactor the validation to actually use @model_validator.
Option 1: Remove unused import
-from pydantic import Field, field_validator, model_validator
+from pydantic import Field, field_validator📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from pydantic import Field, field_validator, model_validator | |
| from pydantic import Field, field_validator |
🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` at line 10, The
import model_validator is unused because the cross-field validator
_bucket_folder_suffix currently uses `@field_validator`; change that to a
model-level validator: replace the `@field_validator` decorator on
_bucket_folder_suffix with `@model_validator`(mode="after") and adjust the
function signature to accept (cls, values) (or (self, values) depending on
classmethod/instance semantics) and return the validated values so the
cross-field logic runs correctly; alternatively, if you opt not to convert the
validator, simply remove model_validator from the import list to fix the lint
error.
| # If bucket_folder provided, it means we won't write to the root dir of | ||
| # the bucket. So we need to add it on the front of the path. | ||
| # | ||
| # However, avoid double-nesting if path is already prefixed with bucket_folder. | ||
| # This can happen when storage_block_id is null (e.g., context serialized to | ||
| # remote workers), causing create_result_record() to add bucket_folder to | ||
| # storage_key, then write_path() calls _resolve_path again. | ||
| # See https://github.com/PrefectHQ/prefect/issues/20174 | ||
| if self.bucket_folder and self.bucket_folder in path: | ||
| return path |
There was a problem hiding this comment.
Avoid substring matches when checking for existing prefixes.
self.bucket_folder in path will skip prefixing when the folder name appears later in the path (e.g., archive/results/file). Use a prefix/path-aware check instead.
🛠️ Proposed fix
- if self.bucket_folder and self.bucket_folder in path:
+ if self.bucket_folder and path.startswith(self.bucket_folder):
return path🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py` around lines 726 -
735, The current check uses "self.bucket_folder in path" which matches
substrings anywhere; change it to a path-prefix-aware check: normalize
leading/trailing slashes and then test whether the path equals the bucket_folder
or startswith bucket_folder + "/" (e.g., normalized_path ==
normalized_bucket_folder or normalized_path.startswith(normalized_bucket_folder
+ "/")). Update the logic in the same method where self.bucket_folder and path
are used (the _resolve_path/write_path area) to use this prefix check so only
true prefixes are skipped.
| @pytest.fixture(scope="session", autouse=True) | ||
| def prefect_db(): | ||
| with prefect_test_harness(): | ||
| yield | ||
|
|
||
|
|
||
| @pytest.fixture(scope="session", autouse=True) | ||
| def disable_logging(): | ||
| with temporary_settings({PREFECT_LOGGING_TO_API_ENABLED: False}): | ||
| # Increase timeout for CI environments where multiple xdist workers | ||
| # start servers simultaneously, which can be slower on Python 3.11+ | ||
| # See https://github.com/PrefectHQ/prefect/issues/16397 | ||
| with prefect_test_harness(server_timeout=60): |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the definition of prefect_test_harness
rg -n "def prefect_test_harness" -g "*.py" -B 2 -A 10
# Find all usages to understand the pattern
rg -n "prefect_test_harness\(" -g "*.py" -B 1 -A 1Repository: qodo-benchmark/prefect-combined-coderabbit
Length of output: 7663
Use the correct prefect_test_harness parameter name.
server_timeout is not a valid argument. The correct parameter is server_startup_timeout as defined in the function signature. Using an invalid keyword argument will cause a TypeError at runtime.
🛠️ Proposed fix
- with prefect_test_harness(server_timeout=60):
+ with prefect_test_harness(server_startup_timeout=60):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @pytest.fixture(scope="session", autouse=True) | |
| def prefect_db(): | |
| with prefect_test_harness(): | |
| yield | |
| @pytest.fixture(scope="session", autouse=True) | |
| def disable_logging(): | |
| with temporary_settings({PREFECT_LOGGING_TO_API_ENABLED: False}): | |
| # Increase timeout for CI environments where multiple xdist workers | |
| # start servers simultaneously, which can be slower on Python 3.11+ | |
| # See https://github.com/PrefectHQ/prefect/issues/16397 | |
| with prefect_test_harness(server_timeout=60): | |
| `@pytest.fixture`(scope="session", autouse=True) | |
| def prefect_db(): | |
| # Increase timeout for CI environments where multiple xdist workers | |
| # start servers simultaneously, which can be slower on Python 3.11+ | |
| # See https://github.com/PrefectHQ/prefect/issues/16397 | |
| with prefect_test_harness(server_startup_timeout=60): |
🧰 Tools
🪛 GitHub Actions: Integrations Packages Tests
[error] 19-19: TypeError: prefect_test_harness() got an unexpected keyword argument 'server_timeout'. This prevented test harness setup for Prefect integration tests.
🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/tests/conftest.py` around lines 14 - 19, The
fixture prefect_db is passing an invalid keyword argument to
prefect_test_harness (server_timeout) which will raise a TypeError; change the
argument name to server_startup_timeout when calling prefect_test_harness in the
prefect_db fixture so it matches the function signature and increases the
startup timeout for CI.
| def test_resolve_path_no_double_nesting(self, gcs_bucket): | ||
| """ | ||
| Regression test for https://github.com/PrefectHQ/prefect/issues/20174 | ||
|
|
||
| When storage_block_id is null (e.g., context serialized to Ray workers), | ||
| create_result_record() adds bucket_folder to storage_key via _resolve_path. | ||
| Then write_path() calls _resolve_path again. Without the duplicate check, | ||
| this causes double-nested paths like "results/results/abc123". | ||
| """ | ||
| bucket_folder = gcs_bucket.bucket_folder | ||
| if not bucket_folder: | ||
| pytest.skip("Test only applies when bucket_folder is set") | ||
|
|
||
| # Simulate path that already has bucket_folder prefix | ||
| # (as would happen when create_result_record calls _resolve_path) | ||
| already_prefixed_path = f"{bucket_folder}/abc123" | ||
|
|
||
| # When write_path calls _resolve_path again, it should NOT double-nest | ||
| result = gcs_bucket._resolve_path(already_prefixed_path) | ||
|
|
||
| # Should return the same path, not bucket_folder/bucket_folder/abc123 | ||
| assert result == already_prefixed_path | ||
| assert not result.startswith(f"{bucket_folder}{bucket_folder}") |
There was a problem hiding this comment.
Avoid double slash in the already‑prefixed path fixture.
bucket_folder already includes a trailing /, so interpolating /{...} produces // and isn’t representative of actual paths.
🛠️ Proposed fix
- already_prefixed_path = f"{bucket_folder}/abc123"
+ already_prefixed_path = str(PurePosixPath(bucket_folder) / "abc123")🤖 Prompt for AI Agents
In `@src/integrations/prefect-gcp/tests/test_cloud_storage.py` around lines 143 -
165, The test fixture builds an already-prefixed path with an extra slash
because bucket_folder contains a trailing '/', so update
test_resolve_path_no_double_nesting to construct already_prefixed_path without
adding another '/' (e.g., concatenate bucket_folder and "abc123" directly or
strip/normalize the trailing slash) before calling gcs_bucket._resolve_path;
ensure assertions still check equality to that normalized already_prefixed_path
and that it does not start with bucket_folder repeated.
Benchmark PR from qodo-benchmark#559
Summary by CodeRabbit
Bug Fixes
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.