Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions src/integrations/prefect-gcp/prefect_gcp/cloud_storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
from pathlib import Path, PurePosixPath
from typing import Any, BinaryIO, Dict, List, Optional, Tuple, Union

from pydantic import Field, field_validator
from pydantic import Field, field_validator, model_validator

from prefect import task
from prefect.blocks.abstract import ObjectStorageBlock
Expand Down Expand Up @@ -697,12 +697,19 @@ def basepath(self) -> str:

@field_validator("bucket_folder")
@classmethod
def _bucket_folder_suffix(cls, value):
def _bucket_folder_suffix(cls, value, info):
"""
Ensures that the bucket folder is suffixed with a forward slash.
Also validates that bucket_folder doesn't conflict with bucket name.
"""
if value != "" and not value.endswith("/"):
value = f"{value}/"

# Cross-field validation: ensure bucket_folder doesn't match bucket name
# This should use @model_validator but incorrectly uses @field_validator
if info.data.get("bucket") and value.strip("/") == info.data.get("bucket"):
raise ValueError("bucket_folder cannot be the same as bucket name")
Comment on lines +708 to +711

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Quality: Cross-field validation belongs in @model_validator, not @field_validator

The code adds cross-field validation (checking bucket_folder against bucket) inside a @field_validator, and line 709 even includes a comment acknowledging this is wrong: "This should use @model_validator but incorrectly uses @field_validator."

While this technically works because bucket is defined before bucket_folder in the model (so it's available in info.data during field validation), this approach is fragile and not idiomatic Pydantic v2:

  1. Fragile field ordering dependency: If someone reorders the fields, bucket may no longer be in info.data when bucket_folder is validated, silently skipping the check.
  2. Unused import: model_validator is imported (line 10) but never used, suggesting the intent was to use it.
  3. Self-admitted anti-pattern: The comment explicitly calls out this is wrong but does it anyway.
  4. Unrelated to the stated fix: This validation is unrelated to the double-nesting bug (issue [prefect-gcp] GcsBucket._resolve_path causes double-nesting when storage_block_id is null PrefectHQ/prefect#20174) and adds scope creep to the PR.

This should either be a @model_validator(mode='after') (which is what the import and comment suggest), or be removed from this PR if it's not related to the double-nesting fix.

Was this helpful? React with 👍 / 👎

  • Apply suggested fix


return value

def _resolve_path(self, path: str) -> str:
Expand All @@ -718,6 +725,14 @@ def _resolve_path(self, path: str) -> str:
"""
# If bucket_folder provided, it means we won't write to the root dir of
# the bucket. So we need to add it on the front of the path.
#
# However, avoid double-nesting if path is already prefixed with bucket_folder.
# This can happen when storage_block_id is null (e.g., context serialized to
# remote workers), causing create_result_record() to add bucket_folder to
# storage_key, then write_path() calls _resolve_path again.
# See https://github.com/PrefectHQ/prefect/issues/20174
if self.bucket_folder and self.bucket_folder in path:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Substring in check causes false positives; use startswith

The double-nesting guard on line 734 uses self.bucket_folder in path (substring containment) instead of path.startswith(self.bucket_folder). This is a bug that will cause false positives:

  • If bucket_folder is "data/" and the path is "my-data/file.txt", the in check matches the substring "data/" within the path, incorrectly returning the path without prepending the bucket folder. The correct result should be "data/my-data/file.txt".
  • Another example: bucket_folder = "a/" and path "banana/file" — the in check matches "a/" in "banana/".

The sibling method _join_bucket_folder in the same class (around line 868) correctly uses startswith():

if self.bucket_folder != "" and bucket_path.startswith(self.bucket_folder):

The PR description and context also explicitly state startswith() should be used.

Was this helpful? React with 👍 / 👎

Suggested change
if self.bucket_folder and self.bucket_folder in path:
if self.bucket_folder and path.startswith(self.bucket_folder):
return path
  • Apply suggested fix

return path
path = (
str(PurePosixPath(self.bucket_folder, path)) if self.bucket_folder else path
)
Expand Down
3 changes: 2 additions & 1 deletion src/integrations/prefect-gcp/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,8 @@ asyncio_default_fixture_loop_scope = "session"
asyncio_mode = "auto"
env = ["PREFECT_TEST_MODE=1"]
filterwarnings = [
"ignore:Type google._upb._message.* uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14:DeprecationWarning",
"ignore:'.*' deprecated - use .*:DeprecationWarning:httplib2",
"ignore:GitWildMatchPattern .* is deprecated:DeprecationWarning:pathspec",
]

[tool.uv.sources]
Expand Down
12 changes: 4 additions & 8 deletions src/integrations/prefect-gcp/tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,19 +8,15 @@
from google.cloud.exceptions import NotFound
from prefect_gcp.credentials import GcpCredentials

from prefect.settings import PREFECT_LOGGING_TO_API_ENABLED, temporary_settings
from prefect.testing.utilities import prefect_test_harness


@pytest.fixture(scope="session", autouse=True)
def prefect_db():
with prefect_test_harness():
yield


@pytest.fixture(scope="session", autouse=True)
def disable_logging():
with temporary_settings({PREFECT_LOGGING_TO_API_ENABLED: False}):
# Increase timeout for CI environments where multiple xdist workers
# start servers simultaneously, which can be slower on Python 3.11+
# See https://github.com/PrefectHQ/prefect/issues/16397
with prefect_test_harness(server_timeout=60):
yield


Expand Down
7 changes: 1 addition & 6 deletions src/integrations/prefect-gcp/tests/projects/test_steps.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,12 +54,7 @@ def tmp_files(tmp_path: Path):
"testdir2/testfile5.txt",
]

(tmp_path / ".prefectignore").write_text(
"""
testdir1/*
.prefectignore
"""
)
(tmp_path / ".prefectignore").write_text("testdir1/*\n.prefectignore\n")

for file in files:
filepath = tmp_path / file
Expand Down
24 changes: 24 additions & 0 deletions src/integrations/prefect-gcp/tests/test_cloud_storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,30 @@ def test_resolve_path(self, gcs_bucket, path):
expected = None
assert actual == expected

def test_resolve_path_no_double_nesting(self, gcs_bucket):
"""
Regression test for https://github.com/PrefectHQ/prefect/issues/20174

When storage_block_id is null (e.g., context serialized to Ray workers),
create_result_record() adds bucket_folder to storage_key via _resolve_path.
Then write_path() calls _resolve_path again. Without the duplicate check,
this causes double-nested paths like "results/results/abc123".
"""
bucket_folder = gcs_bucket.bucket_folder
if not bucket_folder:
pytest.skip("Test only applies when bucket_folder is set")

# Simulate path that already has bucket_folder prefix
# (as would happen when create_result_record calls _resolve_path)
already_prefixed_path = f"{bucket_folder}/abc123"

# When write_path calls _resolve_path again, it should NOT double-nest
result = gcs_bucket._resolve_path(already_prefixed_path)

# Should return the same path, not bucket_folder/bucket_folder/abc123
assert result == already_prefixed_path
assert not result.startswith(f"{bucket_folder}{bucket_folder}")

def test_read_path(self, gcs_bucket):
assert gcs_bucket.read_path("blob") == b"bytes"

Expand Down
Loading