avro-kotlin/avro4k


Introduction

Avro4k (or Avro for Kotlin) is a library that brings the Avro serialization format to Kotlin. It is built on kotlinx-serialization, Kotlin's reflection-less serialization library.

Here are the main features:

  • Full avro support, including logical types, unions, recursive types, and schema evolution ✅
  • Encode and decode anything to and from binary format, and also in generic data 🧰
  • Generate schemas based on your values and data classes 📝
  • Customize the generated schemas and encoded data with annotations 👷
  • Fast as it is reflection-less 🚀
  • Simple API to get started quickly, also with native support of java.time, BigDecimal, BigInteger and UUID classes 🥇
  • Relaxed matching for easy schema evolution as it natively adapts compatible types 🌀

Warning

Important: As of today, avro4k is only available for the JVM platform, and theoretically for the Android platform (as the Apache Avro library is already Android-ready).
If you would like support for js/wasm/native platforms, please put a 👍 on this issue

Quick start

Basic

Example:
package myapp

import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.*

@Serializable
data class Project(val name: String, val language: String)

fun main() {
    // Generating schemas
    val schema = Avro.schema<Project>()
    println(schema.toString()) // {"type":"record","name":"Project","namespace":"myapp","fields":[{"name":"name","type":"string"},{"name":"language","type":"string"}]}

    // Serializing objects
    val data = Project("kotlinx.serialization", "Kotlin")
    val bytes = Avro.encodeToByteArray(data)

    // Deserializing objects
    val obj = Avro.decodeFromByteArray<Project>(bytes)
    println(obj) // Project(name=kotlinx.serialization, language=Kotlin)
}
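
The bytes produced above carry no field names or type tags. Per the Avro specification, a record is encoded as its field values in schema order, and a string is a zigzag variable-length integer (the UTF-8 length) followed by the UTF-8 bytes. The sketch below reproduces that layout by hand using only the Kotlin/JVM standard library. It illustrates the specification and is not avro4k's implementation; writeAvroLong, writeAvroString and encodeProjectSketch are hypothetical names introduced here.

```kotlin
import java.io.ByteArrayOutputStream

// Zigzag + variable-length encoding that the Avro spec uses for int and long values.
fun writeAvroLong(out: ByteArrayOutputStream, value: Long) {
    // Zigzag maps small magnitudes (positive or negative) to small unsigned numbers.
    var n = (value shl 1) xor (value shr 63)
    while ((n and 0x7FL.inv()) != 0L) {
        // 7 payload bits per byte; the high bit signals that more bytes follow.
        out.write(((n and 0x7FL) or 0x80L).toInt())
        n = n ushr 7
    }
    out.write(n.toInt())
}

// An Avro string is its UTF-8 byte length (as a long) followed by the UTF-8 bytes.
fun writeAvroString(out: ByteArrayOutputStream, value: String) {
    val utf8 = value.toByteArray(Charsets.UTF_8)
    writeAvroLong(out, utf8.size.toLong())
    out.write(utf8, 0, utf8.size)
}

// A record with two string fields is just the two strings back to back.
fun encodeProjectSketch(name: String, language: String): ByteArray {
    val out = ByteArrayOutputStream()
    writeAvroString(out, name)
    writeAvroString(out, language)
    return out.toByteArray()
}
```

Under these rules, encodeProjectSketch("kotlinx.serialization", "Kotlin") yields 29 bytes: one length byte plus 21 characters, then one length byte plus 6 characters. Because nothing in the payload describes the schema, the reader has to know the writer schema by other means.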

Single object

Avro4k provides a way to encode and decode single objects with the AvroSingleObject class. This encoding prefixes the binary data with the schema fingerprint, so the reader can identify the writer schema. The downside is that you need to provide a schema registry to resolve the schema from the fingerprint. This format suits payloads sent through message brokers like Kafka or RabbitMQ, as it is the most compact schema-aware format.

Example:
package myapp

import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.*
import org.apache.avro.SchemaNormalization

@Serializable
data class Project(val name: String, val language: String)

fun main() {
    val schema = Avro.schema<Project>()
    val schemasByFingerprint = mapOf(SchemaNormalization.parsingFingerprint64(schema) to schema)
    val singleObjectInstance = AvroSingleObject { schemasByFingerprint[it] }

    // Serializing objects
    val data = Project("kotlinx.serialization", "Kotlin")
    val bytes = singleObjectInstance.encodeToByteArray(data)

    // Deserializing objects
    val obj = singleObjectInstance.decodeFromByteArray<Project>(bytes)
    println(obj) // Project(name=kotlinx.serialization, language=Kotlin)
}

For more details, check in the avro spec the single object encoding.
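
To make the prefix concrete: the Avro specification defines a single-object payload as a two-byte marker (C3 01), the 8-byte little-endian CRC-64-AVRO fingerprint of the writer schema, and then the Avro binary body. The sketch below splits such a payload using only the JVM standard library. It is an illustration of the wire layout, not avro4k code; SingleObjectParts and splitSingleObject are hypothetical names.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical holder for the two meaningful parts of a single-object payload.
class SingleObjectParts(val fingerprint: Long, val payload: ByteArray)

fun splitSingleObject(bytes: ByteArray): SingleObjectParts {
    require(bytes.size >= 10) { "Too short to hold the marker and the fingerprint" }
    require(bytes[0] == 0xC3.toByte() && bytes[1] == 0x01.toByte()) {
        "Missing the C3 01 single-object marker"
    }
    // The 8 bytes after the marker are the schema fingerprint, little-endian.
    val fingerprint = ByteBuffer.wrap(bytes, 2, 8).order(ByteOrder.LITTLE_ENDIAN).getLong()
    // Everything after the 10-byte header is the regular Avro binary encoding.
    return SingleObjectParts(fingerprint, bytes.copyOfRange(10, bytes.size))
}
```

The extracted fingerprint is the kind of key a lookup such as schemasByFingerprint[it] receives, which is why the decoder needs access to every writer schema it may encounter.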

Object container

Avro4k provides a way to encode and decode object container files (also known as data files) with the AvroObjectContainerFile class. This encoding prefixes the binary data with the full schema, so the reader knows the writer schema. This format is well suited to storing many objects long-term in a single file.

For more details, check the object container files section of the avro spec.

Important notes

  • Avro4k is built on top of the Apache Avro library, so all schema validation is done by it
  • All members annotated with @ExperimentalSerializationApi are experimental and may change in future releases without notice, so please check the release notes for any needed migration

Setup

Gradle Kotlin DSL
plugins {
    kotlin("jvm") version kotlinVersion
    kotlin("plugin.serialization") version kotlinVersion
}

dependencies {
    implementation("com.github.avro-kotlin.avro4k:avro4k-core:$avro4kVersion")
}

Gradle Groovy DSL
plugins {
    id 'org.jetbrains.kotlin.multiplatform' version kotlinVersion
    id 'org.jetbrains.kotlin.plugin.serialization' version kotlinVersion
}

dependencies {
    implementation "com.github.avro-kotlin.avro4k:avro4k-core:$avro4kVersion"
}

Maven

Add the serialization plugin to the Kotlin compiler plugins:

<build>
    <plugins>
        <plugin>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-plugin</artifactId>
            <version>${kotlin.version}</version>
            <executions>
                <execution>
                    <id>compile</id>
                    <phase>compile</phase>
                    <goals>
                        <goal>compile</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <compilerPlugins>
                    <plugin>kotlinx-serialization</plugin>
                </compilerPlugins>
            </configuration>
            <dependencies>
                <dependency>
                    <groupId>org.jetbrains.kotlin</groupId>
                    <artifactId>kotlin-maven-serialization</artifactId>
                    <version>${kotlin.version}</version>
                </dependency>
            </dependencies>
        </plugin>
    </plugins>
</build>

Add the avro4k dependency:

<dependency>
    <groupId>com.github.avro-kotlin.avro4k</groupId>
    <artifactId>avro4k-core</artifactId>
    <version>${avro4k.version}</version>
</dependency>

How to generate schemas

Writing schemas manually or using the Java-based SchemaBuilder can be tedious. kotlinx-serialization simplifies this by generating the corresponding descriptors for us, which makes it easy to generate avro schemas without any reflection. It also provides native compatibility with data classes (including open and sealed classes), inline classes, any collection, arrays, enums, and primitive values.

Note

For more information about the avro schema, please refer to the avro specification

To allow generating a schema for a specific class, you need to annotate it with @Serializable:

@Serializable
data class Ingredient(val name: String, val sugar: Double)

@Serializable
data class Pizza(val name: String, val ingredients: List<Ingredient>, val topping: Ingredient?, val vegetarian: Boolean)

Then you can generate the schema using the Avro.schema function:

val schema = Avro.schema<Pizza>()
println(schema.toString(true))

The generated schema will look as follows:

{
    "type": "record",
    "name": "Pizza",
    "namespace": "com.github.avrokotlin.avro4k.example",
    "fields": [
        {
            "name": "name",
            "type": "string"
        },
        {
            "name": "ingredients",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "Ingredient",
                    "fields": [
                        {
                            "name": "name",
                            "type": "string"
                        },
                        {
                            "name": "sugar",
                            "type": "double"
                        }
                    ]
                }
            }
        },
        {
            "name": "topping",
            "type": [
                "null",
                {
                    "type": "record",
                    "name": "Ingredient"
                }
            ],
            "default": null
        },
        {
            "name": "vegetarian",
            "type": "boolean"
        }
    ]
}

If you need to configure your Avro instance, you need to create your own instance of Avro with the wanted configuration, and then use it to generate the schema:

val yourAvroInstance = Avro {
    // your configuration
}
yourAvroInstance.schema<Pizza>()

Usage

Types matrix

| Kotlin type | Avro reader type | Compatible avro writer type | Avro logical type | Note / Serializer class |
|---|---|---|---|---|
| Boolean | boolean | string | | |
| Byte, Short, Int | int | long, float, double, string | | |
| Long | long | int, float, double, string | | |
| Float | float | double, string | | |
| Double | double | float, string | | |
| Char | int | string | | The value serialized is the char code. When reading from a string, requires exactly 1 char |
| String | string | bytes, fixed | | |
| ByteArray | bytes | string, fixed | | |
| Map<*, *> | map | | | The map key must be string-able. Mainly everything is string-able except null and composite types (collection, data classes) |
| out Collection<*> | array | | | |
| data class | record | | | |
| enum class | enum | | | |
| Any field with @AvroFixed | fixed | bytes, string | | You can only annotate fields that are compatible with bytes or string, otherwise it throws an error |
| java.math.BigDecimal | bytes | int, long, float, double, string | decimal | By default, the scale is 2 and the precision 8. To change it, annotate the field with @AvroDecimal |
| java.math.BigDecimal | string | | | To use it, register the serializer com.github.avrokotlin.avro4k.serializer.BigDecimalAsStringSerializer. @AvroDecimal is ignored in that case |
| java.util.UUID | string | | uuid | To use it, just annotate the field with @Contextual |
| java.net.URL | string | | | To use it, just annotate the field with @Contextual |
| java.math.BigInteger | string | int, long, float, double | | To use it, just annotate the field with @Contextual |
| java.time.LocalDate | int | long, string | date | To use it, just annotate the field with @Contextual |
| java.time.Instant | long | string | timestamp-millis | To use it, just annotate the field with @Contextual |
| java.time.Instant | long | string | timestamp-micros | To use it, register the serializer com.github.avrokotlin.avro4k.serializer.InstantToMicroSerializer |
| java.time.LocalDateTime | long | string | timestamp-millis | To use it, just annotate the field with @Contextual |
| java.time.LocalTime | int | long, string | time-millis | To use it, just annotate the field with @Contextual |

Note

For more details, check the built-in classes in kotlinx-serialization

Add documentation to a schema

You may want to add documentation to a schema to provide more information about a field or a named type (only RECORD and ENUM for the moment).

Warning

Do not use @org.apache.avro.reflect.AvroDoc as this annotation is not visible to Avro4k.

import com.github.avrokotlin.avro4k.AvroDoc

@Serializable
@AvroDoc("This is a record documentation")
data class MyData(
    @AvroDoc("This is a field documentation")
    val myField: String
)

@Serializable
@AvroDoc("This is an enum documentation")
enum class MyEnum {
    A,
    B
}

Note

This impacts only the schema generation.

Support additional non-serializable types

When looking at the types matrix, you can see some of them natively supported by Avro4k, but some others are not. Also, your own types may not be serializable.

To fix it, you need to create a custom serializer that will handle the serialization and deserialization of the value, and provide a descriptor.

Note

This impacts the serialization and the deserialization. It can also impact the schema generation if the serializer is providing a custom logical type or a custom schema through the descriptor.

Write your own serializer

To create a custom serializer, you need to implement the KSerializer interface and override the serialize and deserialize functions. Also, you'll need to provide a descriptor that includes the @AvroLogicalType annotation.

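Create a generic serializer that doesn't need specific Avro features

import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder

object YourTypeSerializer : KSerializer<YourType> {
    // the value is represented as a plain string
    override val descriptor: SerialDescriptor = PrimitiveSerialDescriptor("YourType", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: YourType) {
        encoder.encodeString(value.toString())
    }

    override fun deserialize(decoder: Decoder): YourType {
        return YourType.fromString(decoder.decodeString())
    }
}

Create a serializer that needs Avro features like getting the schema or encoding bytes and fixed types
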
object YourTypeSerializer : AvroSerializer<YourType> {
    // the descriptor that will be used to generate the schema
    override val descriptor: SerialDescriptor = PrimitiveSerialDescriptor("YourType", PrimitiveKind.STRING)

    override fun serializeAvro(encoder: AvroEncoder, value: YourType) {
        encoder.currentWriterSchema // you can access the current writer schema
        encoder.encodeString(value.toString())
    }

    override fun deserializeAvro(decoder: AvroDecoder): YourType {
        decoder.currentWriterSchema // you can access the current writer schema
        return YourType.fromString(decoder.decodeString())
    }

    override fun serializeGeneric(encoder: Encoder, value: YourType) {
        // you may want to implement this function if you also want to use the serializer outside of Avro4k
        encoder.encodeString(value.toString())
    }

    override fun deserializeGeneric(decoder: Decoder): YourType {
        // you may want to implement this function if you also want to use the serializer outside of Avro4k
        return YourType.fromString(decoder.decodeString())
    }
}

Register the serializer globally

You first need to configure your Avro instance with the wanted serializer instance:

import kotlinx.serialization.modules.SerializersModule
import kotlinx.serialization.modules.contextual

val myCustomizedAvroInstance = Avro {
    serializersModule = SerializersModule {
        // give the object serializer instance
        contextual(YourTypeSerializerObject)
        // or instantiate it if it's a class and not an object
        contextual(YourTypeSerializerClass())
    }
}

Then just annotate the field with @Contextual:

@Serializable
data class MyData(
    @Contextual val myField: YourType
)

Register the serializer just for a field

@Serializable
data class MyData(
    @Serializable(with = YourTypeSerializer::class) val myField: YourType
)

Changing record's field name

By default, field names are the original name of the kotlin fields in the data classes.

Note

This impacts the schema generation, the serialization and the deserialization of the field.

Individual field name change

To change a field name, annotate it with @SerialName:

@Serializable
data class MyData(
    @SerialName("custom_field_name") val myField: String
)

Note

@SerialName will still be handled by the naming strategy

Field naming strategy (overall change)

To apply a naming strategy to all fields, you need to set the fieldNamingStrategy in the Avro configuration.

Note

This is only applicable for RECORD fields, and not for ENUM symbols.

There are 3 built-in strategies:

  • NoOp (default): keeps the original kotlin field name
  • SnakeCase: converts the original kotlin field name to snake_case with underscores before each uppercase letter
  • PascalCase: upper-case the first letter of the original kotlin field name

First, create your own instance of Avro with the wanted naming strategy:

val myCustomizedAvroInstance = Avro {
    fieldNamingStrategy = FieldNamingStrategy.Builtins.SnakeCase
}

Then, use this instance to generate the schema or encode/decode data:

package my.package

@Serializable
data class MyData(val myField: String)

val schema = myCustomizedAvroInstance.schema<MyData>() // {...,"fields":[{"name":"my_field",...}]}
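
As a rough illustration of what the SnakeCase and PascalCase descriptions above mean, here is a standard-library-only sketch of the two transformations. These are not avro4k's implementations: toSnakeCaseSketch and toPascalCaseSketch are hypothetical helpers, and the handling of edge cases (a leading uppercase letter, consecutive uppercase letters) is an assumption that may differ from the library.

```kotlin
// Insert an underscore before each uppercase letter and lowercase it.
// Assumption: no underscore is emitted when the very first character is uppercase.
fun toSnakeCaseSketch(fieldName: String): String = buildString {
    for (c in fieldName) {
        if (c.isUpperCase()) {
            if (isNotEmpty()) append('_')
            append(c.lowercaseChar())
        } else {
            append(c)
        }
    }
}

// Upper-case only the first letter, leaving the rest untouched.
fun toPascalCaseSketch(fieldName: String): String =
    fieldName.replaceFirstChar { it.uppercaseChar() }
```

With these helpers, "myField" becomes "my_field" and "MyField" respectively, matching the my_field name shown in the generated schema above.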

Set a default field value

While reading avro binary data, a field may be missing (the kotlin field exists but is absent from the avro binary data). In that case Avro4k fails, as it cannot construct the kotlin type without the missing field value.

Note

By default, all nullable fields are optional as a default: null is automatically added to the schema (check this section to opt out from this default behavior).

@AvroDefault

To avoid this error, you can set a default value for a field by annotating it with @AvroDefault:

import com.github.avrokotlin.avro4k.AvroDefault

@Serializable
data class MyData(
    @AvroDefault("default value") val stringField: String,
    @AvroDefault("42") val intField: Int?,
    @AvroDefault("""{"stringField":"custom value"}""") val nestedType: MyData? = null
)

Note

This impacts only the schema generation and the deserialization of the field, and not the serialization.

Warning

Do not use @org.apache.avro.reflect.AvroDefault as this annotation is not visible to Avro4k.

kotlin default value

You can also set a kotlin default value, but this default won't be present in the generated schema as Avro4k is not able to retrieve it:

@Serializable
data class MyData(
    val stringField: String = "default value",
    val intField: Int? = 42,
)

This impacts only the deserialization of the field, and not the serialization or the schema generation.

Add aliases

To read data written with different schemas, or to write to different schemas, you can add aliases to a named type (record, enum) or to a field by annotating it with @AvroAlias. Each alias may be either the full name of the aliased type or only its name.

Avro spec link

Note

Aliases are not affected by the naming strategy, so if you need them to follow it, write the aliases with the naming strategy already applied.

import com.github.avrokotlin.avro4k.AvroAlias

@Serializable
@AvroAlias("full.name.RecordName", "JustOtherRecordName")
data class MyData(
    @AvroAlias("anotherFieldName", "old_field_name") val myField: String
)

Note

This impacts the schema generation, the serialization and the deserialization.

Warning

Do not use @org.apache.avro.reflect.AvroAlias as this annotation is not visible to Avro4k.

Add metadata to a schema (custom properties)

You can add custom properties to a schema to have additional metadata on a type. To do so, you can annotate the data class or field with @AvroProp. The value can be a regular string or any json content:

@Serializable
@AvroProp("custom_string_property", "The default non-json value")
@AvroProp("custom_int_property", "42")
@AvroProp("custom_json_property", """{"key":"value"}""")
data class MyData(
    @AvroProp("custom_field_property", "Also working on fields")
    val myField: String
)

Note

This impacts only the schema generation. For more details, check the avro specification.

Warning

Do not use @org.apache.avro.reflect.AvroMeta as this annotation is not visible to Avro4k.

Change scale and precision for decimal logical type

By default, the scale is 2 and the precision 8. To change it, annotate the field with @AvroDecimal:

@Serializable
data class MyData(
    @AvroDecimal(scale = 4, precision = 10) val myField: BigDecimal
)

Note

This impacts the schema generation, the serialization and the deserialization.
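
For context on what scale and precision control: the Avro specification stores a decimal as the big-endian two's-complement bytes of its unscaled integer, with the scale kept in the schema rather than in the data. The sketch below shows that mapping with java.math only. It is an illustration of the specification, not avro4k's code; decimalToAvroBytes and avroBytesToDecimal are hypothetical names, and rejecting values that would need rounding is an assumption made here, not necessarily what the library does.

```kotlin
import java.math.BigDecimal
import java.math.BigInteger
import java.math.RoundingMode

fun decimalToAvroBytes(value: BigDecimal, scale: Int, precision: Int): ByteArray {
    // UNNECESSARY throws ArithmeticException if digits would be lost (assumed policy).
    val scaled = value.setScale(scale, RoundingMode.UNNECESSARY)
    require(scaled.precision() <= precision) {
        "$value needs ${scaled.precision()} digits, but the schema allows $precision"
    }
    // BigInteger.toByteArray() is already big-endian two's complement.
    return scaled.unscaledValue().toByteArray()
}

fun avroBytesToDecimal(bytes: ByteArray, scale: Int): BigDecimal =
    BigDecimal(BigInteger(bytes), scale)
```

For example, 12.34 with scale 2 has the unscaled value 1234, which is stored as the two bytes 04 D2. With the default scale of 2 and precision of 8, the largest representable magnitude is 999999.99.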

Change enum values' name

By default, enum symbols are exactly the name of the enum values in the enum classes. To change this default, you need to annotate enum values with @SerialName.

@Serializable
enum class MyEnum {
    @SerialName("CUSTOM_NAME")
    A,
    B,
    C
}

Note

This impacts the schema generation, the serialization and the deserialization.

Set enum default

When the reader schema differs from the writer schema, the written enum symbol may be unknown to the reader, which triggers an error. To avoid this error, you can set a default symbol for an enum by annotating the expected fallback with @AvroEnumDefault.

@Serializable
enum class MyEnum {
    A,

    @AvroEnumDefault
    B,

    C
}

Note

This impacts the schema generation, the serialization and the deserialization.
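
The fallback follows the Avro schema resolution rule for enums: a written symbol unknown to the reader resolves to the reader's default if one is declared, and is an error otherwise. A minimal standard-library sketch of that rule, where resolveEnumSymbol is a hypothetical helper and not part of avro4k:

```kotlin
// Resolve a symbol found in the written data against the reader's enum definition.
fun resolveEnumSymbol(
    writtenSymbol: String,
    readerSymbols: Set<String>,
    readerDefault: String?,
): String = when {
    writtenSymbol in readerSymbols -> writtenSymbol // known symbol: use it as-is
    readerDefault != null -> readerDefault          // unknown symbol: fall back to the default
    else -> throw IllegalArgumentException(
        "Unknown enum symbol '$writtenSymbol' and no default is declared"
    )
}
```

With the MyEnum declaration above, a written symbol "D" would resolve to "B", while without @AvroEnumDefault the same data would fail to decode.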

Change type name (RECORD and ENUM)

RECORD and ENUM types in Avro have a name and a namespace (composing a full-name like namespace.name). By default, the name is the name of the class/enum and the namespace is the package name. To change this default, you need to annotate data classes and enums with @SerialName.

Warning

@SerialName is redefining the full-name of the annotated class or enum, so you must repeat the name or the namespace if you only need to change the namespace or the name respectively.

Note

This impacts the schema generation, the serialization and the deserialization.

Changing the name while keeping the namespace

package my.package

@Serializable
@SerialName("my.package.MyRecord")
data class MyData(val myField: String)

Changing the namespace while keeping the name

package my.package

@Serializable
@SerialName("custom.namespace.MyData")
data class MyData(val myField: String)

Changing the name and the namespace

package my.package

@Serializable
@SerialName("custom.namespace.MyRecord")
data class MyData(val myField: String)
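
The three cases above rely on how Avro reads a full name: everything before the last dot is the namespace, and the rest is the name. The sketch below applies that rule with the standard library only. AvroFullName and splitFullName are hypothetical names used for illustration and are not avro4k API.

```kotlin
// Hypothetical holder for the two parts of an Avro full name.
data class AvroFullName(val namespace: String?, val name: String)

// Split on the last dot: the left part is the namespace, the right part is the name.
fun splitFullName(serialName: String): AvroFullName {
    val lastDot = serialName.lastIndexOf('.')
    return if (lastDot < 0) {
        AvroFullName(namespace = null, name = serialName)
    } else {
        AvroFullName(serialName.substring(0, lastDot), serialName.substring(lastDot + 1))
    }
}
```

Here splitFullName("custom.namespace.MyRecord") gives the namespace custom.namespace and the name MyRecord, whereas a value without any dot carries no namespace at all. That is why the warning above asks you to repeat the part you want to keep.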

Changing the namespace of all nested named type(s)

Sometimes, when using classes from other packages or libraries, you may want to change the namespace of a nested named type. This is done by annotating the field with @AvroNamespaceOverride.

import kotlinx.serialization.Serializable
import com.github.avrokotlin.avro4k.AvroNamespaceOverride

@Serializable
data class MyData(
    @AvroNamespaceOverride("new.namespace") val myField: NestedRecord
)

// ...
package external.package.name

@Serializable
data class NestedRecord(val field: String)

Note

This impacts the schema generation, the serialization and the deserialization.

Change type name (FIXED only)

Warning

For the moment, it is not possible to manually change the namespace or the name of a FIXED type as the type name is coming from the field name and the namespace from the enclosing data class package.

Set a custom logical type

To create a custom logical type, you need to create a serializer that will handle the serialization and deserialization of the value, and provide a descriptor that includes the @AvroLogicalType annotation. Additionally, you need to register your serializer.

Warning

When this issue is released, this section will be updated as the implementation will change.

TODO when kotlinx-serialization released the version to unwrap nullable descriptor

Skip a kotlin field

To skip a field during encoding, annotate it with @kotlinx.serialization.Transient. Note that you need to provide a default value for the field, as the field is also discarded entirely during decoding (IntelliJ should raise a warning).

import kotlinx.serialization.Serializable
import kotlinx.serialization.Transient

@Serializable
data class Foo(val a: String, @Transient val b: String = "default value")

Note

This impacts the schema generation, the serialization and the deserialization.

Disable implicit default: null for nullable fields

By default, Avro4k makes your nullable fields optional (it puts default: null on every nullable field that has no other explicit default). You can opt out of this feature by setting implicitNulls to false in the Avro configuration:

Avro {
    implicitNulls = false
}

Note

This impacts the schema generation, the serialization and the deserialization.

Nullable fields, optional fields and compatibility

With avro, you can have nullable fields and optional fields, which are taken into account for compatibility checking when using the schema registry.

But if you want to remove a nullable field that is not optional, depending on the compatibility mode, it may not be compatible because of the missing default value.

  • What is an optional field?

An optional field is a field that has a default value, like an int with a default of -1.

  • What is a nullable field?

A nullable field is a field that contains a null type in its type union, but it is not an optional field unless you set its default value to null.

So to mark a nullable field as optional and ease avro contract evolution with regard to compatibility checks, set its default to null.
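
The distinction can be summed up with a small model. This is a standard-library sketch in which FieldSketch and readerCanFillMissing are hypothetical names, not avro4k or Apache Avro API. It only encodes the two definitions above and the resolution rule that a field missing from the written data can be read only if the reader declares a default for it.

```kotlin
// Hypothetical model of a record field: its type union and whether a default is declared.
data class FieldSketch(val name: String, val typeUnion: List<String>, val hasDefault: Boolean) {
    // Nullable: "null" is one of the types in the union.
    val isNullable: Boolean get() = "null" in typeUnion

    // Optional: a default value is declared, whatever the type.
    val isOptional: Boolean get() = hasDefault
}

// A reader can fill in a field that is missing from the written data only when it has a default.
fun readerCanFillMissing(field: FieldSketch): Boolean = field.isOptional
```

A field declared as ["null", "string"] without a default is nullable but not optional, so readerCanFillMissing returns false for it. An int field with a default of -1 is optional without being nullable.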

Known problems

  • Kotlin 1.7.20 up to 1.8.10 cannot properly compile @SerialInfo-Annotations on enums (see Kotlin/kotlinx.serialization#2121). This is fixed with kotlin 1.8.20. So if you are planning to use any of avro4k's annotations on enum types, please make sure that you are using kotlin >= 1.8.20.

Contributions

Contributions to avro4k are always welcome. Good ways to contribute include:

  • Raising bugs and feature requests
  • Fixing bugs and enhancing the API
  • Improving the performance of avro4k
  • Adding documentation