-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encode sort_keys argument #609
Comments
No |
I assume it's to make the output more human-readable. My view is that |
I can provide a use-case: we do a lot of snapshot testing, comparing json serialized AST tree’s. Since we frequently change the tree structure and often have to manually review diffs, we sort the keys before dumping the output. It is pretty easy to sort the keys after the fact, but obviously slower. Although, to be fair, not a big deal since it is only important in testing. |
My purpose for using key sorting, and why I asked, was to make a smooth transition in my work project from the standard Flask json serializer (based on the built-in json library) to msgspec. The Flask developers have made the default key sorting - https://github.com/pallets/flask/blob/main/src/flask/json/provider.py#L149 and I would like all our API users and testers not to notice the transition but I don't find this feature mandatory. It would be great only as compatibility with json/ujson/orjson that have key sorting |
I have exactly the same case as @evgenii-moriakhin, I'm trying to integrate msgspec into Flask applications. |
Update - I have a working implementation of this, only need to figure out a nice spelling. In summary - we expose 3 levels of sorting:
The question is - what to name this kwarg and options? For kwarg name, I don't want to use For values, I think I want The reason I want # Example of potential kwarg naming and semantics
msgspec.json.encode(obj) # defaults to no sorting
msgspec.json.encode(obj, sort=False) # no sorting
msgspec.json.encode(obj, sort=True) # only sorts dicts and sets, Structs remain in field order
msgspec.json.encode(obj, sort="canonical") # sorts sets, dicts, and any object keys Thoughts? |
I don't like the ability to pass both a boolean type and a string; from an API friendliness standpoint, it seems quite ambiguous. At the same time, what alternatives can be suggested? The only things that come to mind are Enums or option flags like in https://github.com/ijl/orjson#option. Neither of these two approaches is KISS-like, but they are more explicit (and more meaningful naming can be chosen), which, in my opinion, corresponds to the Zen of Python. It is also worth considering whether there are plans for other serialization options in msgspec in the future? (Again, see example https://github.com/ijl/orjson#option). In this case, the flags option seems the most advantageous, as it allows for expressing explicitness, meaningful naming, and the ability to extend serialization options in the future. This configuration approach with bitwise flags is also used in other popular libraries used in Python (not just orjson) - for example, libvirt. |
I’d vote strongly in favor of an enum. Afaik this is much more widespread [in Python ecosystem] and explicit than binary flags like orjson uses. Flags & bits are more domain of lower level language like C. Unless you plan to support dict/set and structured objects sorting independently. Then flag it is. P.S. curious why sorting structured types is a bigger performance impact than dicts/sets. Thought that for structured types it is a “compile” (encoder build) time impact, while dicts/sets it is an actual item sort. |
I personally don't like using bitwise flags as they don't feel pythonic, and am against the usage of enums (or enums alone) to avoid the need for additional imports just to set a config. There's lots of precedence in python for usage of a fixed set of literals to configure an option (hence the existence of
The implementation I have currently doesn't do the sorting at compile-time because I didn't want to bother with it yet and assumed any code paths requiring sorting would be less performance oriented than the default no-sorting path. We could optimize this path further with structured types (easiest for |
However, you suggested using boolean types. Are there any reasons to use True/False for values instead of string literals? It could be useful if the argument was called sort_keys, but the name is being changed anyway. |
Sure, but boolean values are valid in literal types as well. This would be typed something like: def encode(..., sort: Literal[True, False, "canonical"]=False): ... I'm not attached to using |
Got it, thanks for clarification. As I’ve posted, in my use case it is not performance critical indeed. Rgd enums vs. literals. If you want zero imports, I would then vote for two independent kwargs, e.g. Not sure if a union of boolean & string is pythonic. Since you are not evolving an existing API, but creating a new one from scratch, there is nothing that forces you to mix types like this. What if this is stored as an app config - deserializing a config that may be a string and boolean may be ambiguous itself (depending on the format of course) |
Would having two arguments work? def encode(..., sort: bool, sort_options: Literal["collections", "all"]='collections'): ... I'm not sure about the naming of the Edit: Just saw the above post 🤦 If you wanted control to sort struct fields independently of collections the |
I think splitting the sorting capability into separate options is excessive, and overall I like the approach with string literals without boolean values
it solves the problems of explicit and meaningful naming |
I don't find this argument convincing - if we cared about this we'd only ever pass in strings as config variables to any function. A limited type system in some legacy config format shouldn't limit the options we can use in python.
I also would really prefer to keep this configured via one keyword argument. Beyond limiting the number of options a user needs to understand, there's no real use case for the case of
A few additional kwarg ideas: # def encode(..., order: Literal[None, "deterministic", "sorted"] = None):
# could also call the kwarg `ordering`
encode(obj, order=None) # could also call this "unordered", but I prefer None
encode(obj, order="deterministic")
encode(obj, order="sorted")
# def encode(..., sort_level: Literal[0, 1, 2] = 0):
# treats sorting as a knob to turn up. I don't really like this one.
encode(obj, sort_level=0)
encode(obj, sort_level=1)
encode(obj, sort_level=2) Right now I'm leaning towards the former. I agree it reads better than the |
# def encode(..., order: Literal[None, "deterministic", "sorted"] = None):
# could also call the kwarg `ordering`
encode(obj, order=None) # could also call this "unordered", but I prefer None
encode(obj, order="deterministic")
encode(obj, order="sorted") I like this one |
Question
Hi
I couldn't find any documentation on open\closed issues either
Is there any possibility to sort keys during encode operation?
Like json.dumps(..., sort_keys=True)
The text was updated successfully, but these errors were encountered: