
Commit

feat: Add implicitEmptyCollections configuration
Chuckame committed Jul 15, 2024
1 parent eda4375 commit 999a90c
Showing 92 changed files with 513 additions and 1,409 deletions.
55 changes: 54 additions & 1 deletion Migrating-from-v1.md
@@ -25,10 +25,63 @@ data class TheDataClass(
)

// Now
// ... Nothing, as it is the default behavior!
data class TheDataClass(
    // ... Nothing, as it is the default behavior!
    val field: String?
)

// Or
val avro = Avro { implicitNulls = false }
data class TheDataClass(
    @AvroDefault("null")
    val field: String?
)
```

## Set a field default value to empty array

```kotlin
// Previously
data class TheDataClass(
    @AvroDefault("[]")
    val field: List<String>
)

// Now
data class TheDataClass(
    // ... Nothing, as it is the default behavior!
    val field: List<String>
)

// Or
val avro = Avro { implicitEmptyCollections = false }
data class TheDataClass(
    @AvroDefault("[]")
    val field: List<String>
)
```

## Set a field default value to empty map

```kotlin
// Previously
data class TheDataClass(
    @AvroDefault("{}")
    val field: Map<String, String>
)

// Now
data class TheDataClass(
    // ... Nothing, as it is the default behavior!
    val field: Map<String, String>
)

// Or
val avro = Avro { implicitEmptyCollections = false }
data class TheDataClass(
    @AvroDefault("{}")
    val field: Map<String, String>
)
```
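The three migration examples above rely on defaults that are now implied by the configuration. As a rough illustration only, the sketch below models which JSON `default` a generated field would get for a given field shape and configuration. `Shape`, `Flags` and `implicitDefaultJson` are hypothetical names, not avro4k API; the sketch assumes `implicitNulls` takes precedence for nullable fields and deliberately does not model nullable collections when `implicitNulls` is off.

```kotlin
// Hypothetical model, not avro4k API: the implicit "default" added to a
// generated record field for a given field shape and configuration.
enum class Shape { SCALAR, ARRAY, MAP }

data class Flags(
    val implicitNulls: Boolean = true,
    val implicitEmptyCollections: Boolean = true,
)

/** Returns the JSON default literal, or null when no implicit default is added. */
fun implicitDefaultJson(shape: Shape, nullable: Boolean, flags: Flags = Flags()): String? =
    when {
        // Assumption: nullable fields only ever get the implicit `null` default here
        nullable -> if (flags.implicitNulls) "null" else null
        flags.implicitEmptyCollections && shape == Shape.ARRAY -> "[]"
        flags.implicitEmptyCollections && shape == Shape.MAP -> "{}"
        // Plain non-nullable fields never get an implicit default
        else -> null
    }
```

With both flags disabled, the model returns no implicit default at all, which is why the "Or" variants above spell out `@AvroDefault` explicitly.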

## generic data serialization
58 changes: 38 additions & 20 deletions README.md
@@ -15,8 +15,8 @@ Here are the main features:
- **Encode and decode** anything to and from binary format, and also in generic data :toolbox:
- **Generate schemas** based on your values and data classes :pencil:
- **Customize** the generated schemas and encoded data with annotations :construction_worker:
- **Fast** as it is reflection-less :rocket:
- **Simple API** to get started quickly, also with native support of `java.time`, `BigDecimal`, `BigInteger` and `UUID` classes :1st_place_medal:
- **Fast** as it is reflection-less :rocket: (check the benchmarks [here](benchmark/README.md#results))
- **Simple API** to get started quickly, also with native support of java standard classes like `UUID`, `BigDecimal`, `BigInteger` and `java.time` module :1st_place_medal:
- **Relaxed matching** for easy schema evolution as it natively [adapts compatible types](#types-matrix) :cyclone:

> [!WARNING]
@@ -325,6 +325,37 @@ yourAvroInstance.schema<Pizza>()

# Usage

## Customizing the configuration

By default, `Avro` is configured with the following behavior:
- `implicitNulls`: nullable fields are decoded as `null` when the writer record's schema does not contain the field.
- `implicitEmptyCollections`: map and collection fields are decoded as empty when the writer record's schema does not contain the field.
  - If `implicitNulls` is also enabled, it takes precedence: a missing nullable collection is decoded as `null` rather than as an empty collection.
- `validateSerialization`: the schema is not validated when encoding or decoding data, so serializing with a custom serializer could lead to unexpected behavior. Be careful with your custom serializers. More details [in this section](#set-a-custom-schema).
- `fieldNamingStrategy`: record field names are the original Kotlin field names. To change this, [check this section](#changing-records-field-name).
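The decision made for a missing field can be pictured with the hypothetical model below. It follows the order of the checks in the `RecordResolver` change further down in this commit (implicit `null` first, then empty array, then empty map), but it leaves out `@AvroDefault` and Kotlin default values, which are checked before these. All names are illustrative, not avro4k API.

```kotlin
// Hypothetical model, not avro4k API: the value decoded for a reader field
// that is absent from the writer schema and has no @AvroDefault / Kotlin default.
enum class Shape { SCALAR, ARRAY, MAP }

data class MissingField(val name: String, val shape: Shape, val nullable: Boolean)

data class Flags(
    val implicitNulls: Boolean = true,
    val implicitEmptyCollections: Boolean = true,
)

fun valueForMissingField(field: MissingField, flags: Flags = Flags()): Any? =
    when {
        // Checked first, so a missing nullable collection becomes null
        flags.implicitNulls && field.nullable -> null
        // Reached for arrays (nullable or not) once implicitNulls did not apply
        flags.implicitEmptyCollections && field.shape == Shape.ARRAY -> emptyList<Any>()
        flags.implicitEmptyCollections && field.shape == Shape.MAP -> emptyMap<String, Any>()
        // Stands in for the decoding failure on a missing field
        else -> error("Reader field '${field.name}' has no value in the writer schema")
    }
```

Turning `implicitNulls` off in this model makes a missing nullable list decode as an empty list instead of `null`, while a missing non-nullable scalar always fails.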

Each time you call a method directly on the `Avro` object, the default configuration is used implicitly. For example:

```kotlin
Avro.encodeToByteArray(MyData("value"))
Avro.decodeFromByteArray<MyData>(bytes)
Avro.schema<MyData>()
```

To change the default behavior, create your own instance of `Avro` with the desired configuration:

```kotlin
val yourAvroInstance = Avro {
    fieldNamingStrategy = FieldNamingStrategy.Builtins.SnakeCase
    implicitNulls = false
    implicitEmptyCollections = false
    validateSerialization = true
}
yourAvroInstance.encodeToByteArray(MyData("value"))
yourAvroInstance.decodeFromByteArray<MyData>(bytes)
yourAvroInstance.schema<MyData>()
```

## Types matrix

| Kotlin type | Generated schema type | Other compatible writer types | Compatible logical type | Note / Serializer class |
@@ -529,6 +560,7 @@ There is 3 built-ins strategies:
- `NoOp` (default): keeps the original kotlin field name
- `SnakeCase`: converts the original kotlin field name to snake_case with underscores before each uppercase letter
- `PascalCase`: upper-cases the first letter of the original kotlin field name
- If you need more, please [file an issue](https://github.com/avro-kotlin/avro4k/issues/new/choose)
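The sketch below is a simplified version of the two documented transformations, written from the descriptions above rather than from avro4k's source. `toSnakeCase` and `toPascalCase` are hypothetical helpers. The snake_case version assumes no underscore is added before a leading uppercase letter and does not special-case acronyms, so `userID` becomes `user_i_d`.

```kotlin
// Simplified sketch of the documented built-in strategies; not avro4k's code.

/** Adds an underscore before each uppercase letter (except a leading one) and lower-cases it. */
fun toSnakeCase(name: String): String = buildString {
    name.forEachIndexed { index, char ->
        if (char.isUpperCase()) {
            if (index > 0) append('_')
            append(char.lowercaseChar())
        } else {
            append(char)
        }
    }
}

/** Upper-cases only the first letter, leaving the rest untouched. */
fun toPascalCase(name: String): String = name.replaceFirstChar { it.uppercaseChar() }
```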

First, create your own instance of `Avro` with the wanted naming strategy:

@@ -556,9 +588,9 @@ val schema = myCustomizedAvroInstance.schema<MyData>() // {...,"fields":[{"name"
When reading Avro binary data, a field can be missing (the Kotlin field exists but is absent from the Avro binary data). In that case Avro4k fails, as it cannot construct the Kotlin
type without the missing field value.

> [!NOTE]
> By default, all nullable fields are optional as a `default: null` is automatically added to the schema ([check this section](#disable-implicit-default-null-for-nullable-fields)
> to opt out from this default behavior).
By default:
- nullable fields are optional and `default: null` is automatically added to the field ([check this section](#customizing-the-configuration) to opt out from this default behavior).
- non-nullable array and map fields are optional and `default: []` or `default: {}` is automatically added to the field ([check this section](#customizing-the-configuration) to opt out from this default behavior).

### @AvroDefault

@@ -593,7 +625,7 @@ data class MyData(
)
```

> This impacts only the deserialization of the field, and not the serialization or deserialization.
> This impacts only the deserialization of the field, and not the serialization or the schema generation.
## Add aliases

@@ -775,20 +807,6 @@ data class Foo(val a: String, @Transient val b: String = "default value")
> [!NOTE]
> This impacts the schema generation, the serialization and the deserialization.
## Disable implicit `default: null` for nullable fields

By default, Avro4k makes nullable fields optional (it puts `default: null` on every nullable field that has no other explicit default).
You can opt out of this feature by setting `implicitNulls` to `false` in the `Avro` configuration:

```kotlin
Avro {
    implicitNulls = false
}
```

> [!NOTE]
> This impacts the schema generation, the serialization and the deserialization.
## Force a field to be a `string` type

You can force a field (or the value class' property) to have its inferred schema as a `string` type by annotating it with `@AvroString`.
13 changes: 8 additions & 5 deletions api/avro4k-core.api
@@ -27,28 +27,31 @@ public synthetic class com/github/avrokotlin/avro4k/AvroAlias$Impl : com/github/
}

public final class com/github/avrokotlin/avro4k/AvroBuilder {
public final fun build ()Lcom/github/avrokotlin/avro4k/AvroConfiguration;
public final fun getFieldNamingStrategy ()Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;
public final fun getImplicitEmptyCollections ()Z
public final fun getImplicitNulls ()Z
public final fun getSerializersModule ()Lkotlinx/serialization/modules/SerializersModule;
public final fun getValidateSerialization ()Z
public final fun setFieldNamingStrategy (Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;)V
public final fun setImplicitEmptyCollections (Z)V
public final fun setImplicitNulls (Z)V
public final fun setSerializersModule (Lkotlinx/serialization/modules/SerializersModule;)V
public final fun setValidateSerialization (Z)V
}

public final class com/github/avrokotlin/avro4k/AvroConfiguration {
public fun <init> ()V
public fun <init> (Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;ZZ)V
public synthetic fun <init> (Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;ZZILkotlin/jvm/internal/DefaultConstructorMarker;)V
public fun <init> (Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;ZZZ)V
public synthetic fun <init> (Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;ZZZILkotlin/jvm/internal/DefaultConstructorMarker;)V
public final fun component1 ()Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;
public final fun component2 ()Z
public final fun component3 ()Z
public final fun copy (Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;ZZ)Lcom/github/avrokotlin/avro4k/AvroConfiguration;
public static synthetic fun copy$default (Lcom/github/avrokotlin/avro4k/AvroConfiguration;Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;ZZILjava/lang/Object;)Lcom/github/avrokotlin/avro4k/AvroConfiguration;
public final fun component4 ()Z
public final fun copy (Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;ZZZ)Lcom/github/avrokotlin/avro4k/AvroConfiguration;
public static synthetic fun copy$default (Lcom/github/avrokotlin/avro4k/AvroConfiguration;Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;ZZZILjava/lang/Object;)Lcom/github/avrokotlin/avro4k/AvroConfiguration;
public fun equals (Ljava/lang/Object;)Z
public final fun getFieldNamingStrategy ()Lcom/github/avrokotlin/avro4k/FieldNamingStrategy;
public final fun getImplicitEmptyCollections ()Z
public final fun getImplicitNulls ()Z
public final fun getValidateSerialization ()Z
public fun hashCode ()I
18 changes: 9 additions & 9 deletions benchmark/README.md
@@ -25,15 +25,15 @@ Each benchmark is executed with the following configuration:
Computer: Macbook air M2

```
Benchmark Mode Cnt Score Error Units Relative Difference (%)
Avro4kBenchmark.read thrpt 5 21443.935 ± 2215.328 ops/s 0.00%
ApacheAvroReflectBenchmark.read thrpt 5 19803.543 ± 485.869 ops/s -7.64%
Avro4kGenericWithApacheAvroBenchmark.read thrpt 5 8836.787 ± 404.874 ops/s -58.79%
Avro4kBenchmark.write thrpt 5 50565.556 ± 849.344 ops/s 0.00%
ApacheAvroReflectBenchmark.write thrpt 5 46872.768 ± 2406.622 ops/s -7.30%
JacksonAvroBenchmark.write thrpt 5 32349.182 ± 10105.111 ops/s -36.01%
Avro4kGenericWithApacheAvroBenchmark.write thrpt 5 27471.887 ± 315.498 ops/s -45.67%
Benchmark Mode Cnt Score Error Units Relative Difference (%)
Avro4kBenchmark.read thrpt 5 22306.113 ± 208.516 ops/s 0.00%
ApacheAvroReflectBenchmark.read thrpt 5 21048.047 ± 3974.761 ops/s -5.65%
Avro4kGenericWithApacheAvroBenchmark.read thrpt 5 8366.754 ± 975.268 ops/s -62.49%
Avro4kBenchmark.write thrpt 5 54307.187 ± 789.593 ops/s 0.00%
ApacheAvroReflectBenchmark.write thrpt 5 48056.580 ± 2290.755 ops/s -11.52%
JacksonAvroBenchmark.write thrpt 5 36193.366 ± 1124.036 ops/s -33.34%
Avro4kGenericWithApacheAvroBenchmark.write thrpt 5 28268.377 ± 117.031 ops/s -47.96%
```

> [!WARNING]
6 changes: 5 additions & 1 deletion src/main/kotlin/com/github/avrokotlin/avro4k/Avro.kt
@@ -107,14 +107,18 @@ public class AvroBuilder internal constructor(avro: Avro) {
@ExperimentalSerializationApi
public var implicitNulls: Boolean = avro.configuration.implicitNulls

@ExperimentalSerializationApi
public var implicitEmptyCollections: Boolean = avro.configuration.implicitEmptyCollections

@ExperimentalSerializationApi
public var validateSerialization: Boolean = avro.configuration.validateSerialization
public var serializersModule: SerializersModule = EmptySerializersModule()

public fun build(): AvroConfiguration =
internal fun build(): AvroConfiguration =
AvroConfiguration(
fieldNamingStrategy = fieldNamingStrategy,
implicitNulls = implicitNulls,
implicitEmptyCollections = implicitEmptyCollections,
validateSerialization = validateSerialization
)
}
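The `AvroBuilder` change above follows a common Kotlin builder pattern: each mutable builder property starts from an existing configuration (`avro.configuration`), the user's lambda mutates them, and a now `internal` `build()` produces the immutable configuration. The standalone sketch below shows that pattern with hypothetical `Config`, `ConfigBuilder` and `config` names; it is not avro4k's code.

```kotlin
// Hypothetical, simplified model of the builder pattern behind Avro { ... }.
data class Config(
    val implicitNulls: Boolean = true,
    val implicitEmptyCollections: Boolean = true,
    val validateSerialization: Boolean = false,
)

class ConfigBuilder(from: Config) {
    // Each property starts from the configuration being derived from
    var implicitNulls: Boolean = from.implicitNulls
    var implicitEmptyCollections: Boolean = from.implicitEmptyCollections
    var validateSerialization: Boolean = from.validateSerialization

    // Module-internal, mirroring AvroBuilder.build() after this commit
    internal fun build(): Config =
        Config(
            implicitNulls = implicitNulls,
            implicitEmptyCollections = implicitEmptyCollections,
            validateSerialization = validateSerialization,
        )
}

/** Derives a new configuration from [from], applying the changes made in [block]. */
fun config(from: Config = Config(), block: ConfigBuilder.() -> Unit): Config =
    ConfigBuilder(from).apply(block).build()
```

Because the builder copies its starting values, settings made on a derived configuration are kept unless the lambda overrides them.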
@@ -18,6 +18,15 @@ public data class AvroConfiguration(
*/
@ExperimentalSerializationApi
val implicitNulls: Boolean = true,
/**
* Enabled by default. Array and map fields without an explicit default value are decoded as an empty array or map when the value is missing. It also adds `"default": []` for arrays or `"default": {}` for maps to those fields when generating schemas with avro4k.
*
* If `implicitNulls` is also `true`, it takes precedence: a missing nullable array or map is decoded as `null`.
*
* When set to `false`, decoding fails if an array or map field without an explicit or Kotlin default value is missing from the writer schema (unless `implicitNulls` applies).
*/
@ExperimentalSerializationApi
val implicitEmptyCollections: Boolean = true,
/**
* **To be removed when binary support is stable.**
*
@@ -139,36 +139,46 @@ internal class RecordResolver(
if (visited) return@forEachIndexed

val readerDefaultAnnotation = classDescriptor.findElementAnnotation<AvroDefault>(elementIndex)
// TODO try to fallback on the default value of the writer schema field if no readerDefaultAnnotation
val readerField = readerSchema.fields[elementIndex]

decodingSteps +=
if (readerDefaultAnnotation != null) {
val elementSchema = readerSchema.fields[elementIndex].schema()
DecodingStep.GetDefaultValue(
elementIndex = elementIndex,
schema = elementSchema,
defaultValue = readerDefaultAnnotation.parseValueToGenericData(elementSchema)
schema = readerField.schema(),
defaultValue = readerDefaultAnnotation.parseValueToGenericData(readerField.schema())
)
} else if (classDescriptor.isElementOptional(elementIndex)) {
DecodingStep.IgnoreOptionalElement(elementIndex)
} else if (avro.configuration.implicitNulls &&
(
classDescriptor.getElementDescriptor(elementIndex).isNullable ||
classDescriptor.getElementDescriptor(elementIndex).isInline && classDescriptor.getElementDescriptor(elementIndex).getElementDescriptor(0).isNullable
)
) {
} else if (avro.configuration.implicitNulls && readerField.schema().isNullable) {
DecodingStep.GetDefaultValue(
elementIndex = elementIndex,
schema = NULL_SCHEMA,
schema = readerField.schema().asSchemaList().first { it.type === Schema.Type.NULL },
defaultValue = null
)
} else if (avro.configuration.implicitEmptyCollections && readerField.schema().isTypeOf(Schema.Type.ARRAY)) {
DecodingStep.GetDefaultValue(
elementIndex = elementIndex,
schema = readerField.schema().asSchemaList().first { it.type === Schema.Type.ARRAY },
defaultValue = emptyList<Any>()
)
} else if (avro.configuration.implicitEmptyCollections && readerField.schema().isTypeOf(Schema.Type.MAP)) {
DecodingStep.GetDefaultValue(
elementIndex = elementIndex,
schema = readerField.schema().asSchemaList().first { it.type === Schema.Type.MAP },
defaultValue = emptyMap<String, Any>()
)
} else {
DecodingStep.MissingElementValueFailure(elementIndex)
}
}
return decodingSteps.toTypedArray()
}

private fun Schema.isTypeOf(expectedType: Schema.Type): Boolean {
return asSchemaList().any { it.type === expectedType }
}

private fun computeEncodingSteps(
classDescriptor: SerialDescriptor,
writerSchema: Schema,
@@ -265,7 +275,9 @@ internal sealed interface DecodingStep {

/**
* The element is present in the class descriptor but not in the writer schema, so the default value is used.
* Also, if the [com.github.avrokotlin.avro4k.AvroConfiguration.implicitNulls] is enabled, the default value is `null`.
* Also:
* - if the [com.github.avrokotlin.avro4k.AvroConfiguration.implicitNulls] is enabled, the default value is `null`.
* - if the [com.github.avrokotlin.avro4k.AvroConfiguration.implicitEmptyCollections] is enabled, the default value is an empty array or map.
*/
data class GetDefaultValue(
override val elementIndex: Int,
@@ -404,6 +416,4 @@ private fun Schema.resolveUnion(
throw SerializationException("Union type does not contain one of ${expectedTypes.asList()}, unable to convert default value '$value' for schema $this")
}
return types[index]
}

private val NULL_SCHEMA = Schema.create(Schema.Type.NULL)
}
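In the `RecordResolver` change above, the implicit default now carries the matching branch of the reader field's schema (the first `NULL`, `ARRAY` or `MAP` branch of a union) instead of a standalone `NULL_SCHEMA`. The hypothetical model below shows that branch selection, treating a schema as its list of union branches (a non-union schema being a one-element list). `AvroType` and `implicitDefaultBranch` are illustrative names only.

```kotlin
// Hypothetical model, not avro4k API: a field schema as its list of union
// branches (a non-union schema is a one-element list).
enum class AvroType { NULL, STRING, ARRAY, MAP }

// Same idea as the private Schema.isTypeOf helper: does any branch match?
fun List<AvroType>.isTypeOf(expected: AvroType): Boolean = any { it == expected }

/** Returns the branch used for the implicit default, or null if none applies. */
fun implicitDefaultBranch(
    branches: List<AvroType>,
    implicitNulls: Boolean = true,
    implicitEmptyCollections: Boolean = true,
): AvroType? =
    when {
        implicitNulls && branches.isTypeOf(AvroType.NULL) -> AvroType.NULL
        implicitEmptyCollections && branches.isTypeOf(AvroType.ARRAY) -> AvroType.ARRAY
        implicitEmptyCollections && branches.isTypeOf(AvroType.MAP) -> AvroType.MAP
        else -> null // no implicit default applies
    }
```

For a `["null", array]` union, the `NULL` branch wins while `implicitNulls` is on, and the `ARRAY` branch is used once it is off.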
@@ -3,6 +3,7 @@ package com.github.avrokotlin.avro4k.internal.decoder.direct
import com.github.avrokotlin.avro4k.Avro
import com.github.avrokotlin.avro4k.internal.DecodingStep
import com.github.avrokotlin.avro4k.internal.decoder.generic.AvroValueGenericDecoder
import com.github.avrokotlin.avro4k.internal.nonNullSerialName
import kotlinx.serialization.DeserializationStrategy
import kotlinx.serialization.SerializationException
import kotlinx.serialization.builtins.ByteArraySerializer
@@ -14,13 +15,13 @@ import org.apache.avro.generic.GenericFixed
import org.apache.avro.io.Decoder

internal class RecordDirectDecoder(
recordSchema: Schema,
private val writerRecordSchema: Schema,
descriptor: SerialDescriptor,
avro: Avro,
binaryDecoder: org.apache.avro.io.Decoder,
) : AbstractAvroDirectDecoder(avro, binaryDecoder) {
// from descriptor element index to schema field. The missing fields are at the end to decode the default values
private val classDescriptor = avro.recordResolver.resolveFields(recordSchema, descriptor)
private val classDescriptor = avro.recordResolver.resolveFields(writerRecordSchema, descriptor)
private lateinit var currentDecodingStep: DecodingStep.ValidatedDecodingStep
private var nextDecodingStepIndex = 0

@@ -40,7 +41,11 @@ internal class RecordDirectDecoder(

is DecodingStep.SkipWriterField -> binaryDecoder.skip(field.schema)
is DecodingStep.MissingElementValueFailure -> {
throw SerializationException("No writer schema field matching element index ${field.elementIndex} in descriptor $descriptor")
throw SerializationException(
"Reader field '${descriptor.nonNullSerialName}.${descriptor.getElementName(
field.elementIndex
)}' has no corresponding field in writer schema $writerRecordSchema"
)
}

is DecodingStep.DeserializeWriterField -> {