Infer naming convention when converting objects to structs #636

Merged (1 commit) on Jan 22, 2024

@jcrist (Owner) commented Jan 22, 2024

Struct types support renaming of fields for encoding/decoding. A common use of this is to enforce a camelCase convention in the serialized format:

```python
import msgspec

class Example(msgspec.Struct, rename="camel"):
    field_one: int
    field_two: int

x = Example(1, 2)
print(msgspec.json.encode(x))
#> b'{"fieldOne":1,"fieldTwo":2}'
```

Previously, when converting an object to a struct we'd always use the renamed field names rather than the original names. This was true whether the input was a `dict`, a non-dict mapping, or an arbitrary object read via attributes (with `from_attributes=True`). The latter two inputs will rarely, if ever, come from a serialization framework, but are common with database/ORM-like objects. In that case the *original* attribute names are more likely to be the useful ones, since both the database and struct representations are internal to the application (unlike the serialized names, which may have to match an external convention like camelCase).
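The old rule can be modeled in a few lines of plain Python. This is a hypothetical sketch, not msgspec's implementation: `get_value` and `old_lookup` are illustrative helpers, `FIELDS` assumes the camelCase renaming of `Example` above, and a `SimpleNamespace` stands in for an ORM row.

```python
from collections.abc import Mapping
from types import SimpleNamespace

# (attribute name, renamed name) pairs for `Example`, assuming rename="camel".
FIELDS = [("field_one", "fieldOne"), ("field_two", "fieldTwo")]

_MISSING = object()


def get_value(obj, name):
    """Key lookup for mappings, attribute lookup for any other object."""
    if isinstance(obj, Mapping):
        return obj.get(name, _MISSING)
    return getattr(obj, name, _MISSING)


def old_lookup(obj):
    """Hypothetical model of the old rule: only ever look up renamed names."""
    out = {}
    for attr, renamed in FIELDS:
        value = get_value(obj, renamed)
        if value is _MISSING:
            # Message wording is illustrative, not msgspec's exact error text.
            raise ValueError(f"Object missing required field `{renamed}`")
        out[attr] = value
    return out


# Serialized-style input uses the camelCase names, so it converts fine.
print(old_lookup({"fieldOne": 1, "fieldTwo": 2}))

# An ORM-like row exposes the Python attribute names, so it used to fail.
row = SimpleNamespace(field_one=1, field_two=2)
err = None
try:
    old_lookup(row)
except ValueError as exc:
    err = str(exc)
print(err)
```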

We now infer the intended naming scheme when a non-dict mapping or object is passed to `msgspec.convert` to convert to a `msgspec.Struct` type. The inference process is as follows:

  • The attribute names are tried first.

  • If an attribute name is present in the input AND it doesn't match the renamed name, then attribute names are used exclusively for the remainder of the conversion process.

  • If an attribute name isn't present AND it doesn't match the renamed name, then the renamed name is tried. If the renamed name is present, then renamed names are used exclusively for the remainder of the conversion process.
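The steps above can be sketched in plain Python. This is a hypothetical model under stated assumptions, not msgspec's C implementation: `infer_and_collect` and `get_value` are illustrative names, every field is treated as required (defaults are not modeled), and `MappingProxyType` stands in for a non-dict mapping.

```python
from collections.abc import Mapping
from types import MappingProxyType, SimpleNamespace

_MISSING = object()


def get_value(obj, name):
    """Key lookup for mappings, attribute lookup for any other object."""
    if isinstance(obj, Mapping):
        return obj.get(name, _MISSING)
    return getattr(obj, name, _MISSING)


def infer_and_collect(obj, fields):
    """Hypothetical sketch of the naming inference (not msgspec's code).

    `fields` is a list of (attribute name, renamed name) pairs. Every field is
    treated as required. Returns a dict keyed by attribute name, or raises
    ValueError naming the first missing field.
    """
    use_renamed = None  # None until a field with differing names decides it
    out = {}
    for attr, renamed in fields:
        if attr == renamed or use_renamed is not None:
            # Names agree, or the convention is already fixed: one lookup.
            name = renamed if use_renamed else attr
            value = get_value(obj, name)
        else:
            # Undecided: try the attribute name first...
            name = attr
            value = get_value(obj, attr)
            if value is not _MISSING:
                use_renamed = False
            else:
                # ...then fall back to the renamed name.
                alt = get_value(obj, renamed)
                if alt is not _MISSING:
                    name, value, use_renamed = renamed, alt, True
        if value is _MISSING:
            raise ValueError(f"Object missing required field `{name}`")
        out[attr] = value
    return out


FIELDS = [("field_one", "fieldOne"), ("field_two", "fieldTwo")]

# An ORM-like object with Python attribute names.
print(infer_and_collect(SimpleNamespace(field_one=1, field_two=2), FIELDS))
# {'field_one': 1, 'field_two': 2}

# A non-dict mapping with camelCase keys.
print(infer_and_collect(MappingProxyType({"fieldOne": 1, "fieldTwo": 2}), FIELDS))
# {'field_one': 1, 'field_two': 2}
```

In this sketch, once a field with differing names has been resolved, every later field costs exactly one lookup; only the deciding field can cost two.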

A key point here is that inputs may not mix attribute and renamed names: the inference process uses only one convention or the other, depending on which names are present. Using `Example` above:

  • An input with `field_one` and `field_two` would be valid.

  • An input with `fieldOne` and `fieldTwo` would be valid.

  • An input with `field_one` and `fieldTwo` would error saying `field_two` is missing.

  • An input with `fieldOne` and `field_two` would error saying `fieldTwo` is missing.
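Because the choice is never revisited once made, the outcome for each of the four inputs above can be predicted from the set of names present alone. A hypothetical sketch, assuming all fields are required; `predict` is an illustrative name, not a msgspec function:

```python
def predict(present, fields):
    """Predict the conversion outcome from the set of names present.

    `fields` is a list of (attribute name, renamed name) pairs. Returns
    ("ok", convention) or ("missing", name). Hypothetical sketch only.
    """
    use_renamed = None
    for attr, renamed in fields:
        if attr != renamed and use_renamed is None:
            # The first field whose names differ fixes the convention.
            if attr in present:
                use_renamed = False
            elif renamed in present:
                use_renamed = True
        # While still undecided (None), fall back to the attribute name.
        name = renamed if use_renamed else attr
        if name not in present:
            return ("missing", name)
    return ("ok", "renamed" if use_renamed else "attribute")


FIELDS = [("field_one", "fieldOne"), ("field_two", "fieldTwo")]

cases = [
    {"field_one", "field_two"},  # ('ok', 'attribute')
    {"fieldOne", "fieldTwo"},    # ('ok', 'renamed')
    {"field_one", "fieldTwo"},   # ('missing', 'field_two')
    {"fieldOne", "field_two"},   # ('missing', 'fieldTwo')
]
for present in cases:
    print(predict(present, FIELDS))
```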

The overhead of this inference process is low: at worst, only one extra `getattr` call is made to determine whether to use the original or renamed names.

To reiterate, this change only affects object (non-dict mapping or arbitrary object) inputs to `msgspec.convert` when converting to a `Struct` type. Inputs of other types, like `dict`, are still assumed to have come from a serialization protocol and will always use the renamed names.
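The scope rule above amounts to a dispatch on the input's type. A minimal sketch, assuming only the input categories named in this paragraph; `naming_strategy` is a hypothetical helper, and treating `dict` subclasses like `dict` is an assumption of this sketch rather than documented msgspec behavior:

```python
from collections.abc import Mapping
from types import MappingProxyType, SimpleNamespace


def naming_strategy(obj, from_attributes=False):
    """Which names a Struct conversion would look up for `obj` (hypothetical sketch)."""
    if isinstance(obj, dict):
        # Assumption: dict subclasses are treated like dict here.
        # dicts are assumed to be serialized data: renamed names only.
        return "renamed"
    if isinstance(obj, Mapping):
        # Non-dict mappings: infer attribute vs. renamed names.
        return "infer"
    if from_attributes:
        # Arbitrary objects read via attributes: infer as well.
        return "infer"
    # Without from_attributes=True, arbitrary objects aren't read by attribute.
    return None


print(naming_strategy({"fieldOne": 1, "fieldTwo": 2}))                      # renamed
print(naming_strategy(MappingProxyType({"field_one": 1})))                   # infer
print(naming_strategy(SimpleNamespace(field_one=1), from_attributes=True))   # infer
print(naming_strategy(SimpleNamespace(field_one=1)))                        # None
```

Checking `dict` before the general `Mapping` case matters: a `dict` is also a `Mapping`, so reversing the order would wrongly route serialized data through inference.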

Fixes #630.

@jcrist merged commit de1a87b into main on Jan 22, 2024 (8 checks passed).
@jcrist deleted the from-attributes-infer-use-orig-names branch on January 22, 2024, 03:29.
Linked issue: Cannot convert with `from_attributes` when using a rename convention