Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support omitting Struct attributes from encoding/decoding #199

Open
jcrist opened this issue Oct 8, 2022 · 12 comments
Open

Support omitting Struct attributes from encoding/decoding #199

jcrist opened this issue Oct 8, 2022 · 12 comments

Comments

@jcrist
Copy link
Owner

jcrist commented Oct 8, 2022

Since Struct types force users to declare all Struct attributes, there's currently no way to hold extra state on a Struct without having it also be encoded/decoded.

Two ways to handle this:

  • Add omit=True to Meta annotations. This could also be packaged up as a parametrized type, so users could write
class Point(Struct):
    x: int
    y: int
    _private: Omit[some_type]
  • Excluded attributes with a leading underscore by default? I'm not sure if we should support this one, it may be confusing behavior. As a reference point, golang's json doesn't encode/decode private struct attributes by default.
@IcyMidnight
Copy link

Yes please! I would like to cache some (expensive) computed values in my objects, but not serialize them.

@jcrist
Copy link
Owner Author

jcrist commented Nov 28, 2022

Thanks for the motivating use case (and apologies for the delayed response)! I hope to tackle this issue in the next few weeks before issuing a new release.

Since you commented here - as a user, would you want omitted attributes to also be excluded from __init__? Giving a concrete example:

class Point(Struct):
    x: int
    y: int
    _private_1: Omit[int]
    _private_2: Omit[int] = 0

Should __init__ include _private_1 and _private_2 arguments? Or should they be excluded from __init__, and only default values filled in? If they're excluded, that would mean that _private_1 would raise an AttributeError until it's explicitly set, and _private_2 would have an initial value of 0 after __init__.

@IcyMidnight
Copy link

Honestly I haven't been using the constructors much (decoding from a web request, encoding to a disk cache, decoding from the disk cache) so I don't have a strong opinion. I guess given that the object is valid without those values, the constructor shouldn't require them, but it would be nice if you could include them.

Also I would lean towards excluding attributes with leading underscores by default (without the Omit type), but I could see it being more annoying to have to include them if some API you're using has names like that.

@jcrist
Copy link
Owner Author

jcrist commented Jan 26, 2023

Ok, we finally have the internals setup to make this fairly straightforward to implement.

First, I plan to call these "private" attributes, since that's effectively what they are. A private attribute won't be encoded or decoded, but will have space allocated on its respective struct.

There are some open questions though.

How should this be spelled?

There are two places that field configuration like this can go:

In the type annotation

class Example(Struct):
    x: int
    y: Private[int]
    z: Private[float] = 0.0

An option in msgspec.field

class Example(Struct):
    x: int
    y: int = msgspec.field(private=True)
    z: int = msgspec.field(default=0.0, private=True)

The former I find a bit easier to read, but it adds some complications:

  • The type annotations would have to be parsed at struct definition time, same as they would for ClassVar. This is doable, but to remain efficient can't handle all cases in the case of postponed annotations.
  • A Private type annotation by its nature must be applied to a field - a type list[Private[int]] is invalid, but python would happily let the user spell such a type. Using a default value with field ties this annotation to struct fields alone, avoiding this issue.
  • It's much harder to get this approach to play well with dataclass_transform, treating Private fields differently from public ones.

Right now I'm leaning towards the msgspec.field approach. Many users will already be familiar with dataclasses.field, which includes support for omitting fields from the generated __init__ via dataclasses.field(init=False). This is similar enough to private=True that it shouldn't be that wild of a concept.

Should fields with leading underscores be private by default?

Since we're providing a way to set whether a field is public/private explicitly, users can always override any default behavior chosen here. Since fields with leading underscores are usually treated as private in python by convention, I'm tempted to do so here as well. So the following would be equivalent:

class Example1(Struct):
    _field: int = 0  # private, since the leading underscore

class Example2(Struct):
    _field: int = field(default=0, private=True)  # explicitly private

On the other hand, I'd expect most users to never need to define private fields, so maybe inferring private/public status from the field name itself is overreaching.

Are private attributes excluded from __init__?

Right now I'm planning on excluding private attributes from basically everything in msgspec. They'll have space allocated in the __slots__ of the generated Struct type (and corresponding attributes defined), but will be excluded from:

  • __init__
  • __copy__
  • __reduce__ (pickling)
  • __repr__
  • __eq__ and other comparison methods
  • __hash__
  • __match_args__
  • encode
  • decode

This makes them most useful for including extra state during runtime that is initialized after the Struct instance is already created. cached_property usage is related here (#204), since you'd need a slot allocated to store the cached value.

If you have a use case for including private attributes in __init__, please let comment below. If needed, worst case we can add another init parameter for including/excluding a field from __init__. private=True will imply init=False by default, but you can manually re-include it as field(private=True, init=True).

Should private fields require a default value/factory?

Right now all msgspec.Struct types require all fields be initialized as part of the generated __init__. The only way to get an AttributeError for a field on a struct is if you manually del the attribute off an existing struct. I think this makes sense. mypy/pyright type annotations are designed to handle types (or unions of types), but not the presence/absence of an attribute.

Keeping with this design (and assuming private fields are excluded from __init__), it may make sense to enforce that all private fields have a default value (since there's no other way a value could be set).

class Example(Struct):
    private1: int | None = field(private=True, default=None)  # this is ok, there's a default value
    private2: int | None = field(private=True)  # this is invalid, there's no default value

Summary

In summary, my current leanings on these questions are:

  • Use msgspec.field for configuring if a field is private.
  • Make fields with a leading underscore private by default.
  • Exclude private fields from __init__. We can add a way to override this later if needed.
  • Require all private fields set a default value.

I don't plan on adding this before the next release (hopefully out next week), but after that whatever choices made here should (hopefully) be fairly straightforward to implement 🤞 .

@Solumin
Copy link

Solumin commented Jul 11, 2023

This would be of great use to my current project --- like IcyMidnight, we've got a cache that would be really pointless to encode. The current plan is to clear the cache before encoding, which will work as long as we remember to do it.

  • Use msgspec.field for configuring if a field is private.
  • Make fields with a leading underscore private by default.

What happens if you do _something = field(private=False)? I think this should be allowed: a private field (in the Python convention sense) that's not private (in the msgspec sense).

Additionally, automatically treating fields with a leading underscore as private would break backwards compatibility. I found a few examples of Structs with _private fields which makes me think that making private fields an explicit, opt-in choice is a better idea.

Really looking forward to this feature, I hope you have time to work on it soon.

@illeatmyhat
Copy link

illeatmyhat commented Aug 10, 2023

Adding a private=True parameter to field() seems fine to me.
Private omitted fields are the only way to have self-referential Structs.
e.g. a nested and occasionally recursive data structure with access to the _root node.

We can also use it to compute and store the path in that data structure because Structs aren't aware of their parent nodes.

@FeryET
Copy link

FeryET commented Sep 19, 2023

Hi. Any update on this? I'm confused whether the library has this attribute.

@ahopkins
Copy link

Right now I'm leaning towards the msgspec.field approach. Many users will already be familiar with dataclasses.field, which includes support for omitting fields from the generated init via dataclasses.field(init=False). This is similar enough to private=True that it shouldn't be that wild of a concept.

If this is still up for debate, I concur that this is the right approach.

If you have a use case for including private attributes in init, please let comment below. If needed, worst case we can add another init parameter for including/excluding a field from init. private=True will imply init=False by default, but you can manually re-include it as field(private=True, init=True).

class Primitive(Struct):
    iid: int = field(private=True)    # internal ID, think auto-incrementing sequence meant to be private
    eid: str = field(private=False)   # external ID, some sort of public-exposed string identifier

In this use case, the object when fetched from the DB would have both IDs so the application layer has them available. But, when serialized as an HTTP response, we strip out the private iid.

Use msgspec.field for configuring if a field is private.

Agreed.

Make fields with a leading underscore private by default.

This is a nice API 😎

Exclude private fields from init. We can add a way to override this later if needed.

See my comment above. I am not sold on this one. The two paramater approach makes more sense to me as they are different use cases. I am fine if private=True implies init=False. But, I want to have the flexibility for private=True, init=True.

Require all private fields set a default value.

I guess I am okay with this, as long as I can still require it as an __init__. But once that is allowed, it seems to me this requirement falls off.

@Solumin
Copy link

Solumin commented Nov 21, 2023

Would private fields also be excluded from __struct_fields__?

@Solumin Solumin mentioned this issue Nov 21, 2023
@dhirschfeld
Copy link

dhirschfeld commented Nov 23, 2023

I came here looking for this functionality so I can store some extra metadata on a Struct that I don't want to be serialized or part of the __init__.

I'm envisaging an api where I can have an alternative constructer which can take optional (metadata) kwargs and store them on the struct without having them be included in the serialized representation - e.g.

class MyStruct(msgspec.Struct, tag=True):
    metadata: dict = field(default={}, private=True)
    
    @classmethod
    def create(cls, **kwargs) -> MyStruct:
        instance = MyStruct()
        instance.metadata.update(**kwargs)
        return instance
>>> obj = MyStruct.create(key='value')

>>> obj
MyStruct(metadata={'key': 'value'})

>>> msgspec.json.encode(obj)
b'{"type":"MyStruct"}'

>>> msgspec.json.decode(msgspec.json.encode(obj), type=MyStruct)
MyStruct(metadata={})

My actual use-case is a bit more complex but it boils down to wanting metadata associated with my objects but only for their lifetime in the current interpreter.

@dhirschfeld
Copy link

Another example, of something I'd like to do is to attach an fsspec.AbstractFileSystem to the class which would then be used in a save method to serialize itself and write to disk - e.g.

class MyStruct(msgspec.Struct, tag=True):
    metadata: dict = field(default={})
    fs: fsspec.AbstractFileSystem = field(private=True)

    @classmethod
    def create(cls, fs: fsspec.AbstractFileSystem, **kwargs) -> MyStruct:
        instance = MyStruct()
        instance.metadata.update(**kwargs)
        instance.fs = fs
        return instance

    def save(self, filepath: str) -> None:
        self.fs.write_bytes(
            path=filepath,
            value=msgspec.msgpack.encode(self),
        )

msgspec doesn't know how to serialize an AbstractFileSystem, and I wouldn't want it to be serialized along with my actual data, however it would be convenient to be able to store it on the instances.

@c0x65o
Copy link

c0x65o commented Apr 22, 2024

Any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants