Records are mapped to a Python class. A record type accepts some special Avro attributes: `aliases`, `namespace` and `doc` (all optional). The `doc` attribute can be set via the class docstring. The `aliases` and `namespace` attributes must be set using `class Meta`.
import dataclasses

from dataclasses_avroschema import AvroModel


@dataclasses.dataclass
class User(AvroModel):
    "My User Class"
    name: str
    age: int
    has_pets: bool = False
    money: float = 100.3

    class Meta:
        namespace = "test.com.ar/user/v1"
        aliases = ["User", "My favorite User"]


User.avro_schema()

'{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "long"},
        {"name": "has_pets", "type": "boolean", "default": false},
        {"name": "money", "type": "double", "default": 100.3}
    ],
    "doc": "My User Class",
    "namespace": "test.com.ar/user/v1",
    "aliases": ["User", "My favorite User"]
}'
(This script is complete, it should run "as is")
The `class Meta` is used to specify schema attributes that are not represented by the class fields, such as `namespace`, `aliases` and whether to include the schema documentation. It also accepts a custom schema name via the `schema_name` attribute (the default is the class name), `alias_nested_items` for custom naming of nested items, a custom `dacite` configuration via `dacite_config`, `field_order` and `exclude`.
class Meta:
    schema_name = "Name other than the class name"
    schema_doc = False
    namespace = "test.com.ar/user/v1"
    aliases = ["User", "My favorite User"]
    alias_nested_items = {"address": "Address"}
    field_order = ["age", "name",]
    exclude = ["last_name",]
    dacite_config = {
        "strict_unions_match": True,
        "strict": True,
    }
`schema_doc` `Union[bool, str]`
:   Whether to include the schema documentation generated from docstrings. Default `True`. If the value is a string, it will be used as the schema documentation.

`namespace` `Optional[str]`
:   Schema namespace. Default `None`.

`aliases` `Optional[List[str]]`
:   Schema aliases. Default `None`.

`alias_nested_items` `Optional[Dict[str, str]]`
:   Names for nested items.

`field_order` `Optional[List[str]]`
:   List of field names that specifies their order in the output schema.

`exclude` `Optional[List[str]]`
:   List of field names to exclude from the output schema.
You can get the `json` and `dict` representations of your instance using the `to_json` and `to_dict` methods:
import dataclasses

from dataclasses_avroschema import AvroModel


@dataclasses.dataclass
class User(AvroModel):
    "My User Class"
    name: str
    age: int
    has_pets: bool = False
    money: float = 100.3


user = User(name="Bond", age=50)

user.to_json()
# >>> '{"name": "Bond", "age": 50, "has_pets": false, "money": 100.3}'

user.to_dict()
# >>> {'name': 'Bond', 'age': 50, 'has_pets': False, 'money': 100.3}
(This script is complete, it should run "as is")
You can create Python instances from a dictionary using the `parse_obj` method. If you are familiar with `pydantic`, it works the same way. Under the hood, `dataclasses-avroschema` uses `dacite` with the following default configuration:

"check_types": False,
"forward_references": {
    Model.__name__: Model,
},

where `Model` is the model that you have defined in your code.
import typing
from dataclasses import dataclass

from dataclasses_avroschema import AvroModel


@dataclass
class Bus(AvroModel):
    driver: str
    total: int


data = {"driver": "Marcos", "total": 10}
bus = Bus.parse_obj(data=data)

print(bus)
# >>> Bus(driver='Marcos', total=10)
(This script is complete, it should run "as is")
Some use cases need a custom `dacite` configuration, which you can provide through the `dacite_config` attribute of the `class Meta`. For example, to enable strict union matching:
=== "Default dacite configuration"

    ```python
    import typing
    from dataclasses import dataclass

    from dataclasses_avroschema import AvroModel


    @dataclass
    class Car(AvroModel):
        total: int


    @dataclass
    class Bus(AvroModel):
        driver: str
        total: int


    @dataclass
    class Trip(AvroModel):
        transport: typing.Union[Car, Bus]


    data = {"driver": "Marcos", "total": 10}
    bus = Bus.parse_obj(data=data)
    serialized_val = Trip(transport=bus).serialize()

    print(Trip.deserialize(serialized_val, create_instance=False))
    # >>> {"transport": {"driver": "Marcos", "total": 10}}

    instance = Trip.deserialize(serialized_val)
    print(instance.transport)  # This is a Car but it should be a Bus!!!
    # >>> Car(total=10)
    ```
=== "Custom dacite configuration"

    ```python
    import typing
    from dataclasses import dataclass

    from dataclasses_avroschema import AvroModel


    @dataclass
    class Car(AvroModel):
        total: int


    @dataclass
    class Bus(AvroModel):
        driver: str
        total: int


    @dataclass
    class Trip(AvroModel):
        transport: typing.Union[Car, Bus]

        class Meta:
            dacite_config = {
                "strict_unions_match": True,
                "strict": True,
            }


    data = {"driver": "Marcos", "total": 10}
    bus = Bus.parse_obj(data=data)
    serialized_val = Trip(transport=bus).serialize()

    print(Trip.deserialize(serialized_val, create_instance=False))
    # >>> {"transport": {"driver": "Marcos", "total": 10}}

    instance = Trip.deserialize(serialized_val)
    print(instance.transport)  # It is a Bus and not a Car!!!
    # >>> Bus(driver='Marcos', total=10)
    ```
(This script is complete, it should run "as is")
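To see why rejecting unknown keys changes which union member is picked, here is a minimal standard-library sketch. It is not how `dacite` is implemented: the helpers `build` and `match_union` are hypothetical, and the classes are plain dataclasses rather than `AvroModel` subclasses. It only assumes that union members are tried in declaration order and that, in the lenient mode, extra keys are silently ignored.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Sequence, Type


@dataclass
class Car:
    total: int


@dataclass
class Bus:
    driver: str
    total: int


def build(cls: Type[Any], data: Dict[str, Any], strict: bool) -> Any:
    """Hypothetical helper: build `cls` from `data`.

    In strict mode, keys that are not fields of `cls` are an error;
    otherwise they are dropped.
    """
    names = {f.name for f in fields(cls)}
    if strict and set(data) - names:
        raise ValueError(f"unexpected keys for {cls.__name__}")
    return cls(**{k: v for k, v in data.items() if k in names})


def match_union(candidates: Sequence[Type[Any]], data: Dict[str, Any], strict: bool) -> Any:
    """Hypothetical helper: return the first candidate that can be built."""
    for cls in candidates:
        try:
            return build(cls, data, strict)
        except (TypeError, ValueError):
            continue
    raise ValueError("no union member matches the data")


data = {"driver": "Marcos", "total": 10}

# Lenient: Car is tried first and "driver" is ignored, so Car wins.
loose_result = match_union([Car, Bus], data, strict=False)

# Strict: Car is rejected because of the extra "driver" key, so Bus wins.
strict_result = match_union([Car, Bus], data, strict=True)
```

In the lenient case the result is a `Car`, mirroring the surprising output in the "Default dacite configuration" tab. In the strict case only `Bus` accepts every key.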
Python classes that inherit from `AvroModel` have a `validate` method. This method validates whether the instance data matches the schema that it represents, for example:
import json
from dataclasses import dataclass

import pytest
from fastavro.validation import ValidationError

from dataclasses_avroschema import AvroModel


@dataclass
class User(AvroModel):
    name: str
    age: int
    has_pets: bool
    money: float
    encoded: bytes


# this creates a proper instance
user_instance = User(
    name="a name",
    age=10,
    has_pets=True,
    money=0,
    encoded=b'hi',
)
assert user_instance.validate()

# set 1 to the name attribute and the fastavro validation should fail
# This is possible because dataclasses do not restrict attribute types,
# but this changes when using pydantic
user_instance.name = 1
with pytest.raises(ValidationError) as exc:
    assert user_instance.validate()

assert json.loads(str(exc.value)) == ["User.name is <1> of type <class 'int'> expected string"]
(This script is complete, it should run "as is")
Sometimes you have a dictionary and you want to create an instance without creating the nested objects yourself. This library follows the same approach as `pydantic` with the `parse_obj` method. This also applies to `pydantic.AvroBaseModel`.
from dataclasses import dataclass
import typing

from dataclasses_avroschema import AvroModel


@dataclass
class Address(AvroModel):
    "An Address"
    street: str
    street_number: int


@dataclass
class User(AvroModel):
    "User with multiple Address"
    name: str
    age: int
    addresses: typing.List[Address]


data_user = {
    "name": "john",
    "age": 20,
    "addresses": [{
        "street": "test",
        "street_number": 10,
    }],
}

user = User.parse_obj(data=data_user)
assert type(user.addresses[0]) is Address
(This script is complete, it should run "as is")
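The idea of building nested objects from a plain dictionary can be sketched with the standard library alone. The helpers `from_dict` and `convert` below are hypothetical and cover only nested dataclasses and lists. The library itself delegates this work to `dacite`, so treat this as an illustration of the concept rather than its implementation.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, List, get_args, get_origin, get_type_hints


@dataclass
class Address:
    street: str
    street_number: int


@dataclass
class User:
    name: str
    age: int
    addresses: List[Address]


def convert(tp: Any, value: Any) -> Any:
    """Hypothetical helper: convert `value` according to the annotation `tp`."""
    if is_dataclass(tp) and isinstance(value, dict):
        return from_dict(tp, value)
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return [convert(item_type, item) for item in value]
    return value  # plain values are passed through unchanged


def from_dict(cls: Any, data: Dict[str, Any]) -> Any:
    """Hypothetical helper: build the dataclass `cls`, recursing into nested types."""
    hints = get_type_hints(cls)
    kwargs = {
        f.name: convert(hints[f.name], data[f.name])
        for f in fields(cls)
        if f.name in data
    }
    return cls(**kwargs)


data_user = {
    "name": "john",
    "age": 20,
    "addresses": [{"street": "test", "street_number": 10}],
}

user = from_dict(User, data_user)
```

After the call, `user.addresses[0]` is an `Address` instance rather than a dictionary, which is the behaviour the `parse_obj` example relies on.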
You can use inheritance so that you do not have to repeat the same code. Be aware that parent classes might have attributes with default values, which can cause `TypeError: non-default argument` errors.

!!! hint
    With Python 3.10, it is now possible to do it natively with dataclasses.
    Python 3.10 added the `kw_only` attribute to dataclasses (similar to `attrs`). It lets you mark fields as keyword-only,
    so they are placed at the end of the generated `__init__` and no longer cause an inheritance problem.
import json
from dataclasses import dataclass

from dataclasses_avroschema import AvroModel


@dataclass
class Parent(AvroModel):
    name: str
    age: int


@dataclass
class Child(Parent):
    has_pets: bool
    money: float
    encoded: bytes


@dataclass
class Child2(Parent, AvroModel):
    has_pets: bool
    money: float
    encoded: bytes

    class Meta:
        schema_doc = False


Child2.avro_schema()

'{
    "type": "record",
    "name": "Child2",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "long"},
        {"name": "has_pets", "type": "boolean"},
        {"name": "money", "type": "double"},
        {"name": "encoded", "type": "bytes"}
    ]
}'

# Both children inherit the same fields from Parent
child_schema = json.loads(Child.avro_schema())
child_2_schema = json.loads(Child2.avro_schema())
assert child_schema["fields"] == child_2_schema["fields"]
(This script is complete, it should run "as is")
Schema generation follows the order in which the attributes are declared in the Python dataclass. But what if you want to change the field order and place a field with a default value before a required one? A plain Python dataclass does not allow this, because required fields must be declared before optional ones. Another IMPORTANT use case is a schema generated by a third party (one we do not control) in which fields with default values are declared before the required ones. In both cases the `field_order` property helps.

The previous examples use the following class:
import dataclasses

from dataclasses_avroschema import AvroModel


@dataclasses.dataclass
class User(AvroModel):
    "My User Class"
    name: str
    age: int
    has_pets: bool = False
    money: float = 100.3
which represents the schema:

{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "long"},
        {"name": "has_pets", "type": "boolean", "default": false},
        {"name": "money", "type": "double", "default": 100.3}
    ],
    "doc": "My User Class"
}
We want the field `has_pets` to appear at the beginning of the schema. To do this, use the `field_order` property in the `Meta` class:
import dataclasses

from dataclasses_avroschema import AvroModel


@dataclasses.dataclass
class User(AvroModel):
    "My User Class"
    name: str
    age: int
    has_pets: bool = False
    money: float = 100.3

    class Meta:
        field_order = ["has_pets",]
which represents the schema:

{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "has_pets", "type": "boolean", "default": false},
        {"name": "name", "type": "string"},
        {"name": "age", "type": "long"},
        {"name": "money", "type": "double", "default": 100.3}
    ],
    "doc": "My User Class"
}
!!! warning
    Schemas with the same fields but in a different order are NOT the same schema. In Avro the field order is important.
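Judging from the example output, the fields listed in `field_order` come first and the remaining fields keep their declaration order. The sketch below captures that rule with a hypothetical helper, `apply_field_order`. It is an assumption inferred from the example, not the library's code.

```python
from typing import List


def apply_field_order(declared: List[str], field_order: List[str]) -> List[str]:
    """Hypothetical helper: listed fields first, the rest in declaration order."""
    listed = [name for name in field_order if name in declared]
    rest = [name for name in declared if name not in listed]
    return listed + rest


declared = ["name", "age", "has_pets", "money"]
reordered = apply_field_order(declared, ["has_pets"])

# The two field lists hold the same names but are not equal,
# just as the two schemas are not the same schema.
same_schema = reordered == declared
```

Here `reordered` is `["has_pets", "name", "age", "money"]`, matching the field order of the schema shown above.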
You can exclude fields from the schema using the `Meta.exclude` attribute. This is helpful when some fields are not serializable.
import dataclasses

from dataclasses_avroschema import AvroModel


@dataclasses.dataclass
class User(AvroModel):
    "An User"
    name: str
    age: int
    last_name: str = "Bond"

    class Meta:
        namespace = "test.com.ar/user/v1"
        aliases = [
            "User",
            "My favorite User",
        ]
        exclude = [
            "last_name",
        ]
which represents the schema without the field `last_name`:

{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "long"}
    ],
    "doc": "An User",
    "namespace": "test.com.ar/user/v1",
    "aliases": ["User", "My favorite User"]
}
!!! warning
    If a required field is excluded from the schema, deserialization will FAIL because no default value is provided for it.
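The reason behind this warning can be reproduced with plain dataclasses. In this sketch `StrictUser` and `LenientUser` are made-up classes, and `decoded` stands in for a payload decoded with a schema that lacks the excluded field. The exact error raised by the library may differ.

```python
from dataclasses import dataclass


@dataclass
class LenientUser:
    name: str
    last_name: str = "Bond"  # excluded from the schema, but has a default


@dataclass
class StrictUser:
    name: str
    age: int  # imagine this required field was excluded from the schema


# Payload decoded with a schema that contains only "name".
decoded = {"name": "James"}

# The excluded field falls back to its default value.
lenient = LenientUser(**decoded)

# The excluded field is required, so the instance cannot be built.
try:
    StrictUser(**decoded)
    failed = False
except TypeError:
    failed = True
```

Only exclude fields that declare a default value, so that instances can still be rebuilt from serialized data.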