Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix python GC traversal for validators and serializers #787

Merged
merged 1 commit into from
Jul 18, 2023
Merged

Conversation

davidhewitt
Copy link
Contributor

@davidhewitt davidhewitt commented Jul 18, 2023

Change Summary

This PR separates the py_gc_traverse methods from serializers and validators out into a separate PyGcTraverse trait, forcing all types to provide an implementation.

This gave me a mechanical way to check the traversal for all validators and serializers and implement them fully.

I also added a spattering of tests which attempt to generate cycles and leak them. They fail without this patch which gives me confidence this fixes the original pydantic issue listed below.

There is still no guarantee that the traversal implementations are correct, as they're hand-written, but this should make them prevalent enough to force us to remember to look at them.

Related issue number

pydantic/pydantic#6727

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @adriangb

@codecov
Copy link

codecov bot commented Jul 18, 2023

Codecov Report

Merging #787 (10f329e) into main (33fea1e) will decrease coverage by 0.15%.
The diff coverage is 70.45%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #787      +/-   ##
==========================================
- Coverage   93.67%   93.53%   -0.15%     
==========================================
  Files          99      100       +1     
  Lines       14322    14376      +54     
  Branches       25       25              
==========================================
+ Hits        13416    13446      +30     
- Misses        900      924      +24     
  Partials        6        6              
Impacted Files Coverage Δ
src/lib.rs 100.00% <ø> (ø)
src/serializers/computed_fields.rs 95.86% <0.00%> (-2.44%) ⬇️
src/serializers/fields.rs 94.47% <ø> (ø)
src/serializers/mod.rs 99.50% <ø> (+0.95%) ⬆️
src/serializers/type_serializers/any.rs 100.00% <ø> (ø)
src/serializers/type_serializers/bytes.rs 100.00% <ø> (ø)
src/serializers/type_serializers/dataclass.rs 90.90% <ø> (-0.32%) ⬇️
src/serializers/type_serializers/datetime_etc.rs 98.30% <ø> (ø)
src/serializers/type_serializers/definitions.rs 93.22% <ø> (ø)
src/serializers/type_serializers/dict.rs 95.55% <ø> (ø)
... and 58 more

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33fea1e...10f329e. Read the comment docs.

@davidhewitt
Copy link
Contributor Author

please review

@davidhewitt
Copy link
Contributor Author

(I also removed the py_gc_clear methods, because they were all noops and not needed - we can generally let python dicts take responsibility for breaking cycles.)

@codspeed-hq
Copy link

codspeed-hq bot commented Jul 18, 2023

CodSpeed Performance Report

Merging #787 will not alter performance

Comparing dh/py-gc (10f329e) with main (33fea1e)

Summary

✅ 126 untouched benchmarks

Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Is the testing strategy basically to test any validators that have a reference to python objects? Is this a "complete" test scenario or a sample?

Comment on lines +20 to +36
impl<T: PyGcTraverse> PyGcTraverse for Vec<T> {
fn py_gc_traverse(&self, visit: &PyVisit<'_>) -> Result<(), PyTraverseError> {
for item in self {
item.py_gc_traverse(visit)?;
}
Ok(())
}
}

impl<T: PyGcTraverse> PyGcTraverse for AHashMap<String, T> {
fn py_gc_traverse(&self, visit: &PyVisit<'_>) -> Result<(), PyTraverseError> {
for item in self.values() {
item.py_gc_traverse(visit)?;
}
Ok(())
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like these might even make sense in PyO3 at some point, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though probably not in this form. In PyO3 I'd like (if it's possible) to do it automatically with a #[derive] macro, however the challenge is how to make most types noops and enable gc traversal for just the ones that matter. I'm sure there's an issue about this in the PyO3 backlog but I can't find it right now.

tests/validators/test_model_init.py Outdated Show resolved Hide resolved
@davidhewitt
Copy link
Contributor Author

In principle we would want to test all validators and serializers for all fields they accept which can contain python objects (to check that those fields are properly traversed), however that's a long set and wouldn't catch any changes to the implementations.

At least the tests I added should cover the cases which are most likely to produce a cycle.

@davidhewitt davidhewitt enabled auto-merge (squash) July 18, 2023 18:53
@davidhewitt davidhewitt merged commit 68453df into main Jul 18, 2023
@davidhewitt davidhewitt deleted the dh/py-gc branch July 18, 2023 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants