Releases: avro-kotlin/avro4k

v2.1.0

17 Sep 08:10
712292d

What's Changed

  • feat: Allow writing object-container files from blocking and async/suspend contexts by @Chuckame in #257
    Experimental breaking change: in AvroObjectContainer, encodeToStream (including its extensions) has been replaced by openWriter, which returns a writer used to append elements to the container instead of taking a Sequence, which made writing from coroutines difficult. Don't forget to close the writer to flush the buffered elements (see the sketch below).
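
A minimal sketch of the new writer-based API; the openWriter extension taking the target OutputStream and the writeValue method on the returned writer are assumptions, check the actual signatures in the docs:

import com.github.avrokotlin.avro4k.AvroObjectContainer
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import java.io.File

@Serializable
data class Event(val id: Long, val name: String)

@OptIn(ExperimentalSerializationApi::class)
fun writeEvents(events: List<Event>, file: File) {
    file.outputStream().use { output ->
        // openWriter replaces encodeToStream: elements are appended one by one,
        // from blocking or suspending code, instead of passing a whole Sequence
        AvroObjectContainer.openWriter<Event>(output).use { writer ->
            events.forEach { writer.writeValue(it) }
        } // closing the writer flushes the buffered elements
    }
}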

Full Changelog: v2.0.0...v2.1.0

v2.0.0

17 Jul 08:23
6e1d82a

Introduction of v2

Avro4k was created back in 2019. Over the past 5 years, a lot of great work has been done around Avro generic records and schema generation.

Recently, kotlinx-serialization and Kotlin shipped big releases, improving many things (features, performance, better APIs). The Json API of kotlinx-serialization offers a great developer experience, so we tried to replicate its simplicity.

A big focus has also been put on making Avro4k more lenient, to simplify developers' lives and improve adoption.

I hope this major release will make Avro easier to use, even more so in pure Kotlin 🚀

As a side note, we may implement our own plugins to generate data classes and schemas, so stay tuned!

Highlights and Breaking changes

Performances & benchmark

Long story short: building a strictly comparable benchmark is complicated, as v2 adds a lot of features and fixes compared to v1.

The following benchmark is therefore not fully representative, as it does not compare all the features.

We will compare an easy use case: encoding and decoding a simple data class containing all the primitive types, a String and a ByteArray:

@Serializable
data class SimpleDataClass(
    val bool: Boolean,
    val byte: Byte,
    val short: Short,
    val int: Int,
    val long: Long,
    val float: Float,
    val double: Double,
    val string: String,
    val bytes: ByteArray,
)

The benchmark has been executed on a MacBook Air M2 in a single-threaded environment.

Avro4k v2 (binary) is MUCH faster than v1 (generic records), and is now also faster than Jackson and standard Apache Avro (using reflection). It has not yet been tested against SpecificRecord.

Encoding Performance

Version                                    Encoding (ops/s)   Relative Difference (%)
Avro4k v1 (generic records)                109 327            0%
Jackson                                    134 774            +23%
Avro4k v2 (generic records)                190 365            +74%
Apache Avro ReflectData (direct binary)    332 438            +204%
Avro4k v2 (direct binary)                  459 751            +321% 🚀

Decoding Performance

Version                                    Decoding (ops/s)   Relative Difference (%)
Avro4k v1 (generic records)                67 825             0%
Jackson                                    71 146             +5%
Avro4k v2 (generic records)                114 511            +69%
Apache Avro ReflectData (direct binary)    151 287            +123%
Avro4k v2 (direct binary)                  174 063            +157% 🚀

Migration guide

As there are a lot of changed APIs, classes, packages and more, here is a migration guide. Don't hesitate to file an issue if something is missing!

Needs Kotlin 2.0.0 and kotlinx.serialization 1.7.0

You need at least Kotlin 2.0.0 and kotlinx.serialization 1.7.0 to use Avro4k v2.0.0+ (the version matrix is in the README), as there are breaking changes in the kotlinx-serialization compiler plugin and library (released in tandem with the Kotlin version).

More information here: kotlinx-serialization v1.7.0
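
For reference, a minimal Gradle (Kotlin DSL) sketch of the required setup; the dependency coordinates below are an assumption, check the README for the authoritative ones and the full version matrix:

// build.gradle.kts
plugins {
    kotlin("jvm") version "2.0.0"
    // the serialization compiler plugin is released in tandem with the Kotlin version
    kotlin("plugin.serialization") version "2.0.0"
}

dependencies {
    // assumed avro4k coordinates, see the README
    implementation("com.github.avro-kotlin.avro4k:avro4k-core:2.0.0")
}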

ExperimentalSerializationApi

Since the API has deeply changed, all the new functions, properties, classes and annotations marked with ExperimentalSerializationApi will show you a warning, as they could change at any moment. Those annotated members will be un-annotated after a few releases once they have proven stable 🪨

You may see a lot of ExperimentalSerializationApi warnings, as everything has been reworked. The common APIs should stabilize faster, so they could be un-annotated in the next minor release; the more complex or less used APIs may be un-annotated later.

To suppress this warning, you can opt in to the experimental serialization API. It is advised not to opt in globally through compiler arguments, to avoid surprises when using experimental stuff 😅
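
For instance, a targeted opt-in on a single function keeps the warning visible everywhere else (this is the standard kotlinx-serialization mechanism, nothing avro4k-specific):

import kotlinx.serialization.ExperimentalSerializationApi

// Opt in only where experimental avro4k members are actually used,
// instead of passing -opt-in globally to the compiler arguments.
@OptIn(ExperimentalSerializationApi::class)
fun encodeWithExperimentalApi() {
    // ... calls to experimental avro4k APIs go here
}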

Warning

Any removal of an API marked with ExperimentalSerializationApi won't be considered a breaking change with regard to the semver standard: given a version A.B.C, only the minor number B will be incremented, not the major number A.

Direct binary serialization

Before, serializing Avro with Avro4k went through a generic step: data classes were first converted to generic records, and that generic data was then passed to the Apache Avro library.

Now, encoding to and decoding from binary is done directly, which improves performance a lot (see the Performances & benchmark section).
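
A minimal round-trip sketch of the direct binary path, with no intermediate generic record:

import com.github.avrokotlin.avro4k.Avro
import kotlinx.serialization.Serializable

@Serializable
data class User(val id: Int, val name: String)

fun roundTrip(): User {
    val user = User(1, "Alice")
    // encodes straight to raw avro binary, no GenericRecord step involved
    val bytes: ByteArray = Avro.encodeToByteArray(user)
    return Avro.decodeFromByteArray<User>(bytes)
}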

Note

We still support generic data serialization as long as it is needed for Kafka schema registry serialization (a dedicated avro4k module is planned), but it may be removed in the future to simplify the library, as it is not really serialization but rather a conversion.

Support anything to encode and decode at root level

Before, we were only able to encode and decode GenericRecord. No primitives, no arrays, no value classes, just generic records.

Now, there is no need to wrap your value in a record: you can serialize nearly everything and generate the corresponding schema!

This includes any data class, enum, sealed interface or class, value class, primitive value or contextual serializer 🚀
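
A short sketch of root-level serialization without any wrapping record; the reified Avro.schema<T>() helper used here is assumed from the new schema-generation API:

import com.github.avrokotlin.avro4k.Avro
import kotlinx.serialization.Serializable

@JvmInline
@Serializable
value class UserId(val value: Long)

fun rootLevelExample() {
    // generate a schema for a value class directly: a plain "long" schema, no record wrapper
    val schema = Avro.schema<UserId>()
    println(schema)

    // encode and decode a plain value at the root level
    val bytes = Avro.encodeToByteArray(UserId(42))
    val decoded = Avro.decodeFromByteArray<UserId>(bytes)
    println(decoded)
}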

Totally new API

The previous API required a good understanding of how to use it, especially when playing with InputStream and OutputStream.

There are now different entry points for different purposes:

  • Avro: the main entrypoint to generate schemas and to encode and decode in the avro format. This is the pure raw avro format, without anything else around it.
  • AvroObjectContainer: the entrypoint to encode avro object container files, following the official spec, using Avro for each value's serialization.
  • AvroSingleObject: the entrypoint to encode a single object prefixed with its schema fingerprint, following the official spec, also using Avro for the value's serialization.

Warning

Avro.encodeToByteArray now encodes pure binary avro. If you still need to encode in the object container format as in v1 (the Data format), you have to use AvroObjectContainer.

Implicit nulls by default

Previously, decoding failed when a nullable field was missing from the writer schema.

Now, null is decoded for all nullable fields missing from the writer schema, instead of failing. To opt out of this feature, configure your Avro instance with implicitNulls = false.

It is enabled by default to simplify the use of Avro4k and make it more lenient, for better adoption.

Implicit empty maps, collections and arrays by default

Previously, decoding failed when a map or collection-like field was missing from the writer schema.

Now, an empty collection is decoded instead of failing (an empty map, list, array or set depending on the field type). To opt out of this feature, configure your Avro instance with implicitEmptyCollections = false.

It is enabled by default to simplify the use of Avro4k and make it more lenient, for better adoption.
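
If you prefer the strict v1 behaviour for both cases, the features can be switched off when building the Avro instance:

import com.github.avrokotlin.avro4k.Avro

// Strict instance: decoding fails again when a nullable or collection-like
// field is missing from the writer schema, as it did in v1.
val strictAvro = Avro {
    implicitNulls = false
    implicitEmptyCollections = false
}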

Lenient

The Apache Avro library is strict about types and strictly follows the Avro spec: for example, a float written as a float can only be read back as a float or a double.

Avro4k pushes leniency further: a float can be written and read as a float, a double, a string, an int or a long in Avro.

A type matrix is available in the README.
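
As an illustration, a record written with a Float field can be read back into a Double field. This sketch assumes a decodeFromByteArray overload accepting the writer schema; see the README type matrix for the exact allowed combinations:

import com.github.avrokotlin.avro4k.Avro
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable

@Serializable
data class Measurement(val value: Float)

@Serializable
@SerialName("Measurement") // same record name as the writer side
data class MeasurementV2(val value: Double)

fun lenientRead(): MeasurementV2 {
    val writerSchema = Avro.schema<Measurement>()
    val bytes = Avro.encodeToByteArray(Measurement(1.5f))
    // the float written by the writer is leniently decoded into a Double field
    return Avro.decodeFromByteArray<MeasurementV2>(writerSchema, bytes)
}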

No more reflection

There is absolutely no more reflection, which allows you to target Android or GraalVM AOT native compilation (not tested, but it should work, let us know!).

Unified & cleaned annotations

  • AvroJsonProp has been merged into AvroProp: json content is automatically detected, so any non-json content is handled as a string
  • AvroAliases has been merged into AvroAlias: it now takes a vararg so you can pass as many aliases as you want with a single annotation
  • AvroInline has been removed in favor of Kotlin's native value class
  • AvroEnumDefault is now applied directly on the default enum member
  • ScalePrecision has been renamed to AvroDecimal to keep and unify a common prefix. Also, the decimal's scale and precision no longer have defaults
  • AvroNamespace and AvroName have been replaced by the native kotlinx-serialization SerialName annotation
  • AvroStringable has been added to easily force a field type to be inferred as a string (this works for all the primitive types and the built-in logical types)
  • AvroFixed now only applies to compatible types (ByteArray, String, decimal logical type); annotating other types just does nothing (several of these annotations are illustrated in the sketch after this list)
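
A small sketch combining several of the annotations listed above; the class, field names and prop values are purely illustrative:

import com.github.avrokotlin.avro4k.AvroAlias
import com.github.avrokotlin.avro4k.AvroFixed
import com.github.avrokotlin.avro4k.AvroProp
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable

@Serializable
@SerialName("a.custom.namespace.Payment") // replaces the old AvroName + AvroNamespace pair
data class Payment(
    @AvroAlias("oldAmount", "amountCents") // several aliases through the single vararg annotation
    val amount: Long,
    @AvroProp("pii", "true") // non-json content is kept as a plain string prop
    val cardHolder: String,
    @AvroFixed(16) // only meaningful on compatible types such as ByteArray
    val token: ByteArray,
)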

Only ByteArray is now handled as BYTES

Previously, all byte-collection-like types were handled as BYTES.

Now, only ByteArray is handled as BYTES, and the other byte-collection-like types...

v2.0.0-RC7

11 Jul 16:42
eda4375
Pre-release

What's Changed

Full Changelog: v2.0.0-RC6...v2.0.0-RC7

v2.0.0-RC6

25 Jun 21:59
23814d2
Pre-release

What's Changed

Full Changelog: v2.0.0-RC5...v2.0.0-RC6

v2.0.0-RC5

24 Jun 04:10
9df5c94
Pre-release

What's Changed

  • feat: Allow adding props to a given type using value classes by @Chuckame in #219
  • deps: Use non-RC version of kotlinx-serialization by @Chuckame in #221
  • deps: Upgrade plugins, trying to fix publication failing by @Chuckame in #222

Full Changelog: v2.0.0-RC2...v2.0.0-RC5

v2.0.0-RC2

29 May 19:54
2040d25
Pre-release

Introduction of v2

Avro4k was created back in 2019. Over the past 5 years, a lot of great work has been done around Avro generic records and schema generation.

Recently, kotlinx-serialization and Kotlin shipped big releases, improving many things (features, performance, better APIs). The Json API of kotlinx-serialization offers a great developer experience, so we tried to replicate its simplicity.

A big focus has also been put on making Avro4k more lenient, to simplify developers' lives and improve adoption.

I hope this major release will make Avro easier to use, even more so in pure Kotlin 🚀

As a side note, we may implement our own plugins to generate data classes and schemas, so stay tuned!

Highlights and Breaking changes

Needs Kotlin 2.0.0 and kotlinx.serialization 1.7.0-RC

You need at least Kotlin 2.0.0 and kotlinx.serialization 1.7.0-RC to use Avro4k v2 (the version matrix is in the README), as there are breaking changes in the kotlinx-serialization compiler plugin and library (released in tandem with the Kotlin version).

More information here: kotlinx-serialization v1.7.0-RC

ExperimentalSerializationApi

Since the API has deeply changed, all the new functions, properties, classes and annotations marked with ExperimentalSerializationApi will show you a warning, as they could change at any moment. Those annotated members will be un-annotated after a few releases once they have proven stable 🪨

To suppress this warning, you can opt in to the experimental serialization API. It is advised not to opt in globally through compiler arguments, to avoid surprises when using experimental stuff 😅

Direct binary serialization

Before, serializing Avro with Avro4k went through a generic step: data classes were first converted to generic records, and that generic data was then passed to the Apache Avro library.

Now, encoding to and decoding from binary is done directly, which improves performance a lot (see the Performances & benchmark section).

Note

We still support generic data serialization as long as it is needed for Kafka schema registry serialization (a dedicated avro4k module is planned), but it will be removed in the future to simplify the library, as it is not really serialization but rather a conversion.

Support anything to encode and decode at root level

Now, there is no need to wrap your value in a record: you can serialize nearly everything and generate the corresponding schema!

This includes any data class, enum, sealed interface, value class, primitive or contextual value 🚀

Totally new API

The previous API required a good understanding of how to use it, especially when playing with InputStream and OutputStream.

There are now different entry points for different purposes:

  • Avro: the main entrypoint to generate schemas and to encode and decode in the avro format. This is the pure raw avro format, without anything else around it.
  • AvroObjectContainerFile: the entrypoint to encode avro object container files, following the official spec, using Avro for each value's serialization.
  • AvroSingleObject: the entrypoint to encode a single object prefixed with its schema fingerprint, following the official spec, also using Avro for the value's serialization.

Here are some examples of the changes:

Pure avro serialization (no specific format, no prefix, no magic byte, just pure avro binary):

// Previously
val bytes = Avro.default.encodeToByteArray(TheDataClass.serializer(), TheDataClass(...))
Avro.default.decodeFromByteArray(TheDataClass.serializer(), bytes)

// Now
val bytes = Avro.encodeToByteArray(TheDataClass(...))
Avro.decodeFromByteArray<TheDataClass>(bytes)

Generic data serialization (convert a kotlin data class to a GenericRecord to then be handled by a `GenericDatumWriter` in avro):

// Previously
val genericRecord: GenericRecord = Avro.default.toRecord(TheDataClass.serializer(), TheDataClass(...))
Avro.default.fromRecord(TheDataClass.serializer(), genericRecord)

// Now
val genericData: Any? = Avro.encodeToGenericData(TheDataClass(...))
Avro.decodeFromGenericData<TheDataClass>(genericData)

Configure the `Avro` instance:

// Previously
val avro = Avro(
    AvroConfiguration(
        namingStrategy = SnakeCaseNamingStrategy,
        implicitNulls = true,
    ),
    SerializersModule {
         contextual(CustomSerializer())
    }
)

// Now
val avro = Avro {
    fieldNamingStrategy = FieldNamingStrategy.SnakeCase
    implicitNulls = true
    serializersModule = SerializersModule {
         contextual(CustomSerializer())
    }
}

Changing the name of a record:

// Previously
@AvroName("TheName")
@AvroNamespace("a.custom.namespace")
data class TheDataClass(...)

// Now
@SerialName("a.custom.namespace.TheName")
data class TheDataClass(...)

Writing an avro object container file with a custom field naming strategy:

// Previously
Files.newOutputStream(Path("/your/file.avro")).use { outputStream ->
    Avro(AvroConfiguration(namingStrategy = SnakeCaseNamingStrategy))
        .openOutputStream(TheDataClass.serializer()) { encodeFormat = AvroEncodeFormat.Data(CodecFactory.snappyCodec()) }
        .to(outputStream)
        .write(TheDataClass(...))
        .write(TheDataClass(...))
        .write(TheDataClass(...))
        .close()
}


// Now
val dataSequence = sequenceOf(
    TheDataClass(...),
    TheDataClass(...),
    TheDataClass(...),
)
val avro = Avro { fieldNamingStrategy = FieldNamingStrategy.SnakeCase }
Files.newOutputStream(Path("/your/file.avro")).use { outputStream ->
    AvroObjectContainerFile(avro)
        .encodeToStream(dataSequence, outputStream) {
            codec(CodecFactory.snappyCodec())
            // you can also add your metadata !
            metadata("myProp", 1234L)
            metadata("a string metadata", "hello")
        }
}

Warning

Migration guide: WIP

Implicit nulls by default

Previously, decoding failed when nothing could be decoded for a nullable field.

Now, null is decoded instead of failing. To opt out of this feature, configure your Avro instance with implicitNulls = false.

It is enabled by default to simplify the use of Avro4k and make it more lenient, for better adoption.

Lenient

The Apache Avro library is strict about types and strictly follows the Avro spec: for example, a float written as a float can only be read back as a float or a double.

Avro4k pushes leniency further: a float can be written and read as a float, a double, a string, an int or a long in Avro.

A type matrix is available in the README.

No more reflection

There is absolutely no more reflection, which allows using Avro4k on Android or with GraalVM AOT native compilation (needs kotlinx-serialization 1.7.0).

Unified & cleaned annotations

Some numbers: 4 of the 12 annotations have been removed!

  • AvroJsonProp has been merged into AvroProp: json content is automatically detected, so any non-json content is handled as a string
  • AvroAliases has been merged into AvroAlias: it now takes a vararg so you can pass as many aliases as you want with a single annotation
  • AvroInline has been removed in favor of Kotlin's native value class
  • AvroEnumDefault is now applied directly on the default enum member
  • ScalePrecision has been renamed to AvroDecimal to keep a common prefix
  • AvroNamespace and AvroName have been replaced by the native kotlinx-serialization SerialName annotation
  • AvroNamespaceOverride has been created to allow replacing the namespace of a field schema (⚠️ this annotation is not stable and can disappear at any moment)

Caching

All schemas are cached using a WeakIdentityHashMap, so the GC can evict cache entries when available memory is low.

Some other internal expensive parts are also cached for quicker encoding and decoding.

Performances & benchmark

Warning

WIP

What's Changed

  • fix: Assume kotlin.Pair as a normal data class instead of an union by @Chuckame in #174
  • feat!: No more reflection and customizable logical types by @Chuckame in #175
  • feat: Add support for decoding with avro aliases by @Chuckame in #177
  • Generalize encoding/decoding tests (#168) by @Chuckame in #179
  • chore: Add spotless with ktlint + editorconfig by @Chuckame in #180
  • feat: Support kotlin's value classes by @Chuckame in #183
  • feat: Revamp naming strategy and related annotations by @Chuckame in #182
  • feat: Merge ScalePrecision to AvroDecimalLogicalType by @Chuckame in #191
  • chore: Upgrade github actions and use standard gradle actions by @Chuckame in #192
  • feat: revamp the schema generation by @Chuckame in #190
  • feat: New Avro entrypoint by @Chuckame in...

v1.10.1

26 Apr 16:02
1e6d6b3

What's Changed

  • fix(annotations): Set the @Language value to JSON by @Chuckame in #157
  • feat(aliases): Merge AvroAliases annotation to AvroAlias by @Chuckame in #156
  • Updated apache org.apache.avro:avro to resolve CVE-2023-39410 by @TNijman1990 in #172
  • chore(build): Replace buildSrc by gradle's versionCatalogs by @Chuckame in #173
  • Added support for decoding with avro alias by @trdw in #171
  • Generalize encoding/decoding tests by @thake in #168
  • fix: Allow encoding null array items or null map values by @Chuckame in #197

New Contributors

Full Changelog: v1.10.0...v1.10.1

v1.10.0

25 Aug 20:23
18dd189

What's Changed

New Contributors

Full Changelog: v1.9.0...v1.10.0

v1.9.0

18 Aug 13:59
a1352b5

What's Changed

  • feat: Set default to null when the field type is nullable (activable by the configuration) by @Chuckame in #140
  • Bump snappy-java version by @AdamBlance in #146

New Contributors

Full Changelog: v1.8.0...v1.9.0

v1.8.0

06 Jun 05:03
f6a7c0c
Merge pull request #132 from Chuckame/fix/value-class-map

fix: Allow value classes and primitive schemas generation