Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a Unique validator #1793

Open
wants to merge 15 commits into
base: dev
Choose a base branch
from

Conversation

bonastreyair
Copy link

@bonastreyair bonastreyair commented Apr 15, 2021

At @midokura we have encountered that marshmallow did not have any duplicate validator. We are sharing the implementation we are currently using so that others could benefit from it too.

This new validator is designed to work only with iterables.

In the case of having a value which is a list of objects you can check if one is repeated using an attribute of the object to check if it is repeated or not.

pytests have also been added.

Hope it can be reviewed and at some point included in a future release.

@bonastreyair bonastreyair changed the title add NoDuplicates new validator add NoDuplicates validator Apr 15, 2021
Copy link
Member

@deckar01 deckar01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature has been requested in the past. See #1539. The resolution was fairly neutral, so I suspect a working implementation coupled with nested attribute handling gives this a better chance of landing.

src/marshmallow/validate.py Outdated Show resolved Hide resolved
src/marshmallow/validate.py Outdated Show resolved Hide resolved
src/marshmallow/validate.py Outdated Show resolved Hide resolved
@lafrech
Copy link
Member

lafrech commented Apr 15, 2021

I implemented something similar in a project of mine, but never clean enough to share it here.

I had issues, and I think this implem has the same issues, if the iterable contains items that are not hashable because of the set being used.

I ended up using a list rather than a set for the comparison if items are not hashable. I added a hashable argument to the validator to use sets when possible, for performance, but a list would probably be fine in all cases.

This is old and I didn't investigate it much more than this today. Not 100% sure it is relevant. Just bringing this up to avoid letting the issue slip through.

@deckar01
Copy link
Member

deckar01 commented Apr 16, 2021

items that are not hashable

I am tempted to suggest ignoring that use case, but it wouldn't take much code to fall back to the list based membership check when a TypeError occurs.

def __call__(self, value):
    ids = [
        getattr(item, self.attribute) if self.attribute else item
        for item in value
    ]
    try:
        duplicate = self._duplicate_hash(ids)
    except TypeError:
        duplicate = self._duplicate_equal(ids)
    if duplicate:
        raise ValidationError(self._format_error(duplicate))
Side rant

I would be nice if the type system inherited a generic Mutable interface that exposed the add method on lists so that the iterable type could simply be configured by the user.

@bonastreyair bonastreyair changed the title add NoDuplicates validator add a Unique validator Apr 19, 2021
@bonastreyair
Copy link
Author

@deckar01 and @lafrech check out your suggested changes

@lafrech
Copy link
Member

lafrech commented Apr 22, 2021

Thanks.

This leaves open @deckar01's comment above about get_value. Also, validation happens on load and we don't generally have objects after a load but rather builtin structures (dict, list), so I don't see the point of attribute access.

The docstring and error messages could be reworded for better english but this can be done afterwards.

The non-hashable case is not tested.

We generally don't send back faulty values in error messages. Just the fact that the value is invalid.

Otherwise, I agree on the principle.

@bonastreyair
Copy link
Author

  • Added utils.get_value()
  • Added hashable tests
  • Improved typing

Should we remove the value that is not unique in the error then?

Regarding the use of attribute:
Imagine we have a marshmallow schema with a field which is a List of another marshmallow object. Then we may encounter that the elements in that List are objects, and that's why the use of attributes might help to check if the elements on the list are unique.

That solves the case when you have this List with many different objects and an id is assigned to each one of them. If the id is not unique (even if the object is different) the the validation will fail.

@lafrech
Copy link
Member

lafrech commented Apr 23, 2021

I understand the need to check the uniticy of a subitem (I managed the same case in an app of mine).

What I'm saying is that unless you use a post-load decorator to instantiate an object, you're dealing with dicts at this stage so getattr shouldn't work. Or am I missing something?

@bonastreyair
Copy link
Author

bonastreyair commented Apr 23, 2021

Actually that's a question I had when dealing with this.

If I set a field of a List[<marshmallow_object>] that uses the proposed Unique validator, the variable value in the function def __call__(self, value) already receives the <marshmallow_object> itself. That's why I set the attribute.

Example:

@dataclass
class Bar:
    custom_id: int


@dataclass
class Foo:
    bar_list: List[Bar] = field(
        metadata={
            "validate": Unique(attribute="custom_id")
        }
    )


foo = {"bar_list": [{"custom_id": 1}, {"custom_id": 1}]}

FooSchema = marshmallow_dataclass.class_schema(Foo)()
FooSchema.load(foo)

This triggers the ValidationError exception:

marshmallow.exceptions.ValidationError: {'bar_list': ['Found a duplicate object attribute (custom_id): 1.']}

Debugging the variables when Unique function gets triggered I see that value already has a List[Bar] and not a List[Dict].

value = [Bar(custom_id=1), Bar(custom_id=1)]

Let me know if I'm also missing something... may it be because I'm using marshmallow_dataclass library to generate the Schemas?

@lafrech
Copy link
Member

lafrech commented Apr 23, 2021

may it be because I'm using marshmallow_dataclass library to generate the Schemas?

marshmallow-dataclass does instantiation on load so yes, that would be the reason you have an object at this stage and not a dict.

A test should be added to validate unique "attribute" using dicts, not objects.

@bonastreyair
Copy link
Author

bonastreyair commented Apr 30, 2021

@lafrech dict unittests have been added, also added a nested attribute unittests.

Should object tests be removed?

@bonastreyair
Copy link
Author

@deckar01 @lafrech is there anything missing before it can be merged? I've just merged from dev to add latest changes

Copy link
Member

@lafrech lafrech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If value is not attribute/item, I'm afraid get_value will return missing.

If the list has multiple elements, validation will fail due to multiple missing but if it has a single element, it will pass silently, which is bad, at this stage (after deserialization).

Maybe it's not such a good idea to use get_value, or missing should be caught. Or I'm totally mistaken.

Also, I wouldn't refer to value as "attribute" name since it can also be an item. Maybe key, to match data_key.

It might be too complicated to achieve in a generic enough fashion to reach the core.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants