Allow more than one str-like type in unions #610

Open
dhirschfeld opened this issue Dec 12, 2023 · 8 comments


dhirschfeld commented Dec 12, 2023

Description

There is a wide variety of objects that get mapped to strings, which seems to preclude properly deserializing them with msgspec 😢

Simple example - encode/decode kwargs consisting of ~primitive types:

>>> from datetime import datetime
>>> import msgspec
>>> kwargs = dict(seed=42, method='test', effective_date=datetime.now())
>>> kwargs
{'seed': 42,
 'method': 'test',
 'effective_date': datetime.datetime(2023, 12, 12, 13, 32, 18, 588721)}

Round-tripping converts datetime to str:

>>> msgspec.json.decode(msgspec.json.encode(kwargs))
{'seed': 42, 'method': 'test', 'effective_date': '2023-12-12T13:32:18.588721'}

...and it's impossible to specify the actual types:

>>> msgspec.json.decode(msgspec.json.encode(kwargs), type=dict[str, int | str | datetime])
Traceback (most recent call last):
  File "C:\python\envs\dev-py310\lib\site-packages\IPython\core\interactiveshell.py", line 3548, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-82-72fbd13b7d1a>", line 1, in <module>
    msgspec.json.decode(msgspec.json.encode(kwargs), type=dict[str, int | str | datetime])
TypeError: Type unions may not contain more than one str-like type (`str`, `Enum`, `Literal[str values]`, `datetime`, `date`, `time`, `timedelta`, `uuid`, `decimal`, `bytes`, `bytearray`) - type `int | str | datetime.datetime` is not supported

It seems there are several similar requests:

Also to allow overriding default representations (which would solve my problem as shown at the bottom):

@dhirschfeld
Author

dhirschfeld commented Dec 12, 2023

The crux of the issue seems to be the reuse of str as the serialized representation of multiple different types.

Rather than serializing multiple types as str, might it be possible to give each type its own custom wrapper, so that it has a unique serialized representation as a tagged object? E.g.

from typing import Any, Type
import msgspec
from msgspec import Struct

from datetime import (
    date as Date,
    datetime as DateTime,
)
import uuid


class Wrapper(Struct, frozen=True, tag=True):
    pass
    
class DateWrapper(Wrapper):
    value: Date

class DateTimeWrapper(Wrapper):
    value: DateTime

class UUIDWrapper(Wrapper):
    value: uuid.UUID


type_map = {
    Date: DateWrapper,
    DateTime: DateTimeWrapper,
    uuid.UUID: UUIDWrapper,
}


def enc_hook(obj: Any) -> Any:
    try:
        wrapper = type_map[type(obj)]
    except KeyError:
        return obj
    else:
        return wrapper(obj)


def dec_hook(type_: Type, obj: Any) -> Any:
    if isinstance(obj, Wrapper):
        return obj.value
    return obj

@dhirschfeld
Author

☝️ pseudo-code, doesn't actually work, but could something like that be made to work?

@dhirschfeld
Author

> ☝️ pseudo-code, doesn't actually work, but could something like that be made to work?

IIUC the example code doesn't work because enc_hook isn't called for "known" types such as datetime so the conversion to str is effectively hardcoded and not overridable?

If that's correct, the fix could perhaps be to just allow enc_hook to override the encoding for "known" types so I can e.g. provide my own object wrapper for datetime?

@dhirschfeld
Author

This is pretty critical functionality for me and could be a dealbreaker for my using msgspec to serialize arbitrary kwargs (with a finite set of known types).

If I define an example set of arguments:

from functools import singledispatch
from typing import Any, Type
import msgspec
from msgspec import Struct

from datetime import (
    date as Date,
    datetime as DateTime,
)
import uuid


kwargs = dict(
    seed=42,
    method='test',
    effective_date=DateTime.now(),
    today=DateTime.now().date(),
    uid=uuid.uuid4(),
)
>>> kwargs
{'seed': 42,
 'method': 'test',
 'effective_date': datetime.datetime(2023, 12, 14, 20, 59, 7, 281790),
 'today': datetime.date(2023, 12, 14),
 'uid': UUID('b4aed260-210b-499a-8035-65dbba88b26b')}

I would like to be able to use msgspec to serialize these kwargs to JSON.

I'm fine defining custom serializers/deserializers, I just want the capability, somehow, in msgspec.

Am I missing something obvious or is there really no way to serialize the example kwargs to JSON using msgspec?

@dhirschfeld
Author

dhirschfeld commented Dec 14, 2023

It's possible in the stdlib:

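import json
import pandas as pd  # used by dec_hook below to parse ISO 8601 strings

@singledispatch
def enc_hook(obj: Any) -> Any:
    # Fallback for any type without a registered encoder
    raise NotImplementedError(f"Objects of type {type(obj).__name__} are not supported")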

@enc_hook.register
def _(obj: DateTime) -> dict:
    return dict(type='DateTime', value=obj.isoformat())

@enc_hook.register
def _(obj: Date) -> dict:
    return dict(type='Date', value=obj.isoformat())

@enc_hook.register
def _(obj: uuid.UUID) -> dict:
    return dict(type='UUID', value=obj.hex)


def dec_hook(obj: Any) -> Any:
    match obj:
        case {'type': 'DateTime'}:
            return pd.to_datetime(obj['value'], format='ISO8601').to_pydatetime()
        case {'type': 'Date'}:
            return pd.to_datetime(obj['value'], format='ISO8601').date()
        case {'type': 'UUID'}:
            return uuid.UUID(hex=obj['value'])
        case _:
            return obj
>>> json.loads(json.dumps(kwargs, default=enc_hook), object_hook=dec_hook)
{'seed': 42,
 'method': 'test',
 'effective_date': datetime.datetime(2023, 12, 14, 20, 59, 7, 281790),
 'today': datetime.date(2023, 12, 14),
 'uid': UUID('b4aed260-210b-499a-8035-65dbba88b26b')}

Is there a way with msgspec to similarly losslessly roundtrip any object by defining custom serializers/deserializers?

>>> json.loads(json.dumps(kwargs, default=enc_hook), object_hook=dec_hook) == kwargs
True

>>> msgspec.json.decode(msgspec.json.encode(kwargs))
{'seed': 42,
 'method': 'test',
 'effective_date': '2023-12-14T20:59:07.281790',
 'today': '2023-12-14',
 'uid': 'b4aed260-210b-499a-8035-65dbba88b26b'}

>>> msgspec.json.decode(msgspec.json.encode(kwargs)) == kwargs
False
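
The tagged round trip above parses dates with pandas (`pd`). As a minimal sketch of the same idea using only the standard library, the version below swaps in `datetime.fromisoformat` and `date.fromisoformat`; the `DECODERS` table and the `dumps`/`loads` helper names are illustrative. Note that any ordinary dict of the exact shape `{"type": ..., "value": ...}` with a known tag would also be converted, which is an inherent ambiguity of this tagging scheme.

```python
import json
import uuid
from datetime import date, datetime
from functools import singledispatch
from typing import Any


@singledispatch
def enc_hook(obj: Any) -> Any:
    # Called by json.dumps only for objects it cannot serialize itself.
    raise TypeError(f"Objects of type {type(obj).__name__} are not supported")


@enc_hook.register
def _(obj: datetime) -> dict:
    return {"type": "DateTime", "value": obj.isoformat()}


@enc_hook.register
def _(obj: date) -> dict:
    return {"type": "Date", "value": obj.isoformat()}


@enc_hook.register
def _(obj: uuid.UUID) -> dict:
    return {"type": "UUID", "value": obj.hex}


# Tag -> function that rebuilds the original object from its string form.
DECODERS = {
    "DateTime": datetime.fromisoformat,
    "Date": date.fromisoformat,
    "UUID": lambda value: uuid.UUID(hex=value),
}


def dec_hook(obj: dict) -> Any:
    # Only dicts of the exact tagged shape are converted back.
    tag = obj.get("type")
    if isinstance(tag, str) and tag in DECODERS and set(obj) == {"type", "value"}:
        return DECODERS[tag](obj["value"])
    return obj


def dumps(data: Any) -> str:
    return json.dumps(data, default=enc_hook)


def loads(text: str) -> Any:
    return json.loads(text, object_hook=dec_hook)
```

With these helpers, `loads(dumps(kwargs)) == kwargs` holds for the example kwargs in this thread, since `isoformat()` output is accepted by the matching `fromisoformat` parser.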

@dhirschfeld
Author

Being able to disambiguate, and hence roundtrip, a dict with both string and date values (amongst others) is critical functionality for me, so I'm going with the standard-library json module for now. It gives you the flexibility to define your own serialization formats, which makes this workable.

I'm just posting now to ask if this is likely to be supported by msgspec in the near future?

@ljnsn

ljnsn commented May 8, 2024

I think what you're suggesting would be a nice addition. While this isn't supported, it's possible to do what you want by defining a model dynamically.

kwargs = dict(
    seed=42,
    method="test",
    effective_date=DateTime.now(),
    today=DateTime.now().date(),
    uid=uuid.uuid4(),
)


def to_dict(obj: msgspec.Struct) -> dict[str, Any]:
    return {field: getattr(obj, field) for field in obj.__struct_fields__}


Model = msgspec.defstruct("Model", [(k, type(v)) for k, v in kwargs.items()])

to_dict(msgspec.json.decode(msgspec.json.encode(kwargs), type=Model)) == kwargs  # True

@dhirschfeld-ffma

> While this isn't supported, it's possible to do what you want by defining a model dynamically.


[image]
