Skip to content

Conversation

@gverkes
Copy link

@gverkes gverkes commented Oct 13, 2025

Fixes #6669

Tracking issue

Closes flyteorg/flyte#6669

Why are the changes needed?

When deserializing Pydantic models containing FlyteFile or FlyteDirectory fields using model_validate(), the deserialized objects were
missing private attributes (_remote_source, _downloader, etc.). This caused an AttributeError when attempting to re-serialize these
objects with model_dump(), breaking the serialize → deserialize → serialize cycle.

This is a critical bug that prevents users from using FlyteFile/FlyteDirectory within Pydantic BaseModel classes in a normal way.

What changes were proposed in this pull request?

Added Pydantic model validators to both FlyteFile and FlyteDirectory classes that ensure private attributes are properly initialized during
deserialization:

  1. FlyteFile (flytekit/types/file/file.py):

    • Enhanced deserialize_flyte_file validator to check if private attributes exist
    • If missing, reconstructs the FlyteFile using dict_to_flyte_file() transformer
    • If attributes already exist (e.g., when passing already-constructed FlyteFile), returns as-is
  2. FlyteDirectory (flytekit/types/directory/types.py):

    • Applied same fix to deserialize_flyte_dir validator
    • Uses dict_to_flyte_directory() to properly reconstruct the object

How was this patch tested?

Added two new unit tests in test_pydantic_basemodel_transformer.py:

  1. test_flytefile_pydantic_model_dump_validate_cycle - Verifies FlyteFile can be serialized, deserialized, and re-serialized without errors
  2. test_flytedirectory_pydantic_model_dump_validate_cycle - Same for FlyteDirectory

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This pull request fixes a critical bug in deserializing Pydantic models with FlyteFile and FlyteDirectory fields by adding model validators for private attributes. It ensures proper initialization during deserialization and includes new unit tests to validate the functionality of these enhancements.

@welcome
Copy link

welcome bot commented Oct 13, 2025

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

if info.context is None or info.context.get("deserialize") is not True:
return self
# Check if all private attributes are already set up (e.g., from __init__)
if hasattr(self, "_downloader") and hasattr(self, "_remote_source"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:
Do we need to check for the other private vars like _downloaded too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or will they all together be set or not set anyways?


dict_obj = {"path": str(self.path)}
metadata = getattr(self, "metadata", None)
if metadata is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check:
@gverkes did you test whether we need to do this for flyte directory too?

fg91
fg91 previously approved these changes Oct 13, 2025
Copy link
Member

@fg91 fg91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested and can confirm that the fix works, thanks for fixing 🙏

My approval appears to be not enough to merge, I'm not a code owner for those dirs. Could you please take a look @davidmirror-ops or @pingsutw ? 🙏

  Fixes #6669

Signed-off-by: Govert Verkes <gmverkes@gmail.com>
Signed-off-by: Govert Verkes <govert@cusp.ai>
@gverkes gverkes force-pushed the fix/pydantic-flytefile-deserialization branch from 82990a6 to 628cf96 Compare October 13, 2025 19:23
@codecov
Copy link

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 18.18182% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.32%. Comparing base (ff4c79c) to head (628cf96).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
flytekit/types/file/file.py 14.28% 5 Missing and 1 partial ⚠️
flytekit/types/directory/types.py 25.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3339       +/-   ##
===========================================
+ Coverage   48.48%   79.32%   +30.83%     
===========================================
  Files         236      216       -20     
  Lines       23723    22687     -1036     
  Branches     2970     2973        +3     
===========================================
+ Hits        11502    17996     +6494     
+ Misses      11643     3854     -7789     
- Partials      578      837      +259     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…y validators

Signed-off-by: Govert Verkes <govert@cusp.ai>
@gverkes gverkes force-pushed the fix/pydantic-flytefile-deserialization branch from e279566 to f4a16e4 Compare October 18, 2025 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Deserializing Pydantic models with FlyteFile fields creates FlyteFile instances that can't be serialized again.

2 participants