Infer naming convention when converting objects to structs #636

Merged 1 commit on Jan 22, 2024

Commits on Jan 22, 2024

  1. Infer naming convention when converting objects to structs

    Struct types support renaming of fields for encoding/decoding. A common
    use of this is to enforce a camelCase convention in the serialized
    format:
    
    ```python
    import msgspec
    
    class Example(msgspec.Struct, rename="camel"):
        field_one: int
        field_two: int
    
    x = Example(1, 2)
    print(msgspec.json.encode(x))
    #> b'{"fieldOne":1,"fieldTwo":2}'
    ```
    
    Previously when converting an object to a struct we'd always use the
    renamed field names rather than the original names. This was true
    whether the input was a `dict`, a non-dict mapping, or an
    arbitrary object via attributes if `from_attributes=True`. The latter
    two inputs will rarely/never occur when coming from a serialization
    framework, but are more commonly used with database/ORM-like objects. In
    this case it's more likely that the *original* attribute names are more
    useful, as both the database and struct object representations are
    internal to the application (unlike the serialized names which may have
    to match some external convention like camelCase).
    
    We now infer the intended naming scheme when a non-dict mapping or object
    is passed to `msgspec.convert` to convert to a `msgspec.Struct` type.
    The inference process is as follows:
    
    - The attribute names are tried first
    
    - If an attribute name is present in the input AND the attribute name
      doesn't match the renamed name, then attribute names are used
      exclusively for the remainder of the conversion process.
    
    - If an attribute name isn't present AND the attribute name doesn't
      match the renamed name, then the renamed name is tried. If the renamed
      name is present, then renamed names are used exclusively for the
      remainder of the conversion process.
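    
    The steps above can be sketched in plain Python. This is an illustrative
    model only, not msgspec's actual implementation: `collect_fields`,
    `MissingField`, `from_mapping`, and the `(attribute name, renamed name)`
    pair layout are all hypothetical names made up for this sketch, and every
    field is treated as required.
    
    ```python
    from typing import Any, Callable, Dict, Sequence, Tuple
    
    _MISSING = object()  # sentinel meaning "name not present in the input"
    
    
    class MissingField(Exception):
        """Hypothetical error raised when a required name is absent."""
    
    
    def collect_fields(
        lookup: Callable[[str], Any],
        fields: Sequence[Tuple[str, str]],
    ) -> Dict[str, Any]:
        """Gather field values, inferring which naming scheme the input uses.
    
        `lookup(name)` returns the value or `_MISSING`; `fields` holds
        (attribute_name, renamed_name) pairs.
        """
        scheme = None  # None until inferred, then "attr" or "renamed"
        out = {}
        for attr, renamed in fields:
            if scheme == "renamed":
                name, value = renamed, lookup(renamed)
            else:
                # Attribute names are always tried first.
                name, value = attr, lookup(attr)
                if scheme is None and attr != renamed:
                    if value is not _MISSING:
                        scheme = "attr"  # commit to attribute names
                    else:
                        alt = lookup(renamed)
                        if alt is not _MISSING:
                            scheme = "renamed"  # commit to renamed names
                            name, value = renamed, alt
            if value is _MISSING:
                raise MissingField(name)
            out[attr] = value
        return out
    
    
    # Field pairs matching the `Example` struct above.
    FIELDS = [("field_one", "fieldOne"), ("field_two", "fieldTwo")]
    
    
    def from_mapping(data):
        return collect_fields(lambda n: data.get(n, _MISSING), FIELDS)
    ```
    
    In this sketch the scheme is only decided by a field whose two names
    differ; fields whose attribute and renamed names are identical leave the
    decision open for a later field.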
    
    A key point here is that inputs may not mix attribute and renamed names
    together - the inference process commits to one scheme or the other
    depending on which names are present. Using `Example` above:
    
    - An input with `field_one` and `field_two` would be valid
    
    - An input with `fieldOne` and `fieldTwo` would be valid
    
    - An input with `field_one` and `fieldTwo` would error saying
      `field_two` is missing.
    
    - An input with `fieldOne` and `field_two` would error saying `fieldTwo`
      is missing.
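    
    A short sketch of these cases, assuming a msgspec release that includes
    this change. `types.SimpleNamespace` is used here only as a stand-in for
    an ORM-like object; the variable names are illustrative.
    
    ```python
    import types
    
    import msgspec
    
    
    class Example(msgspec.Struct, rename="camel"):
        field_one: int
        field_two: int
    
    
    # Attribute-style inputs; SimpleNamespace stands in for a database row.
    by_attr = types.SimpleNamespace(field_one=1, field_two=2)
    by_renamed = types.SimpleNamespace(fieldOne=1, fieldTwo=2)
    mixed = types.SimpleNamespace(field_one=1, fieldTwo=2)
    
    # Both consistent naming schemes convert to the same struct.
    a = msgspec.convert(by_attr, Example, from_attributes=True)
    b = msgspec.convert(by_renamed, Example, from_attributes=True)
    
    # Mixing the two schemes is rejected with a validation error.
    mixed_error = None
    try:
        msgspec.convert(mixed, Example, from_attributes=True)
    except msgspec.ValidationError as exc:
        mixed_error = str(exc)
    
    print(a, b, mixed_error)
    ```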
    
    The overhead of this inference process is low - at worst only one excess
    `getattr` call is made to determine whether to use the original or
    renamed names.
    
    To reiterate, this change only affects object (non-dict mapping or
    arbitrary object) inputs to `msgspec.convert` when converting to a
    `Struct` type. Inputs of other types like `dict` are still assumed to
    have come from a serialization protocol and will always use the renamed
    names.
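    
    For contrast, a minimal sketch of the unchanged `dict` behavior (variable
    names are illustrative): only the renamed keys are accepted, and a `dict`
    keyed by the original attribute names fails validation.
    
    ```python
    import msgspec
    
    
    class Example(msgspec.Struct, rename="camel"):
        field_one: int
        field_two: int
    
    
    # dict inputs are treated as serialized data: renamed keys are required.
    ok = msgspec.convert({"fieldOne": 1, "fieldTwo": 2}, Example)
    
    # The original attribute names are not inferred for dict inputs.
    dict_error = None
    try:
        msgspec.convert({"field_one": 1, "field_two": 2}, Example)
    except msgspec.ValidationError as exc:
        dict_error = str(exc)
    
    print(ok, dict_error)
    ```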
    jcrist committed Jan 22, 2024
    666c427