-
Notifications
You must be signed in to change notification settings - Fork 663
Avro producer #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avro producer #28
Conversation
This includes some substantial refactoring in order to support multiple formats with different producer types, API endpoints that use different content types to target different producers with different serializers, internal support for translating the raw JSON data that is initially decoded into Avro GenericRecords that can be used with the Avro serializer, and support for passing schemas into requests and returning schema ID information with responses.
|
@nehanarkhede Do you want to review this? The reasoning behind some of the design is sometimes subtle -- only discovered after trying to implement it other, simpler ways. Tried to leave notes where the design isn't necessarily intuitive. This patch is pretty big mostly because handling different input formats and the translation from JSON -> GenericRecord turned out to be kind of tricky. I think the consumer side should be simpler. |
|
Blocked by confluentinc/schema-registry#94 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this API is unused. Do you have an expected usage for this API in a future patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I think I lost one of my changes during rebase. Fixed (including fix to AvroProducerTest to verify the offsets this is used to set).
This provides the producer side of the Avro patch. This turned out to be tricky to get right because of a number of semi-conflicting goals. Getting reuse between the input entity classes, including for both producing to topcis and partitions, using a single set of producer code that supports all the formats, handling the two formats required to support Avro (the original JSON input + GenericRecord and translating between them at the right time), and avoiding complete duplication of test code was difficult to balance. Most of the more unusual looking classes (e.g. SchemaHolder, ProduceRecordBase) are due to trying to balance this.
I think the main implementation eventually ended up working out OK even if it is a bit more convoluted than I would have liked. I am less happy with the tests, where there is more duplication of code than I would like. I think we could probably get more reuse, but I was also concerned with that code becoming too confusing if it used too many generic helper methods that could be shared between the binary and Avro producers. Suggestions for how to improve that code would be welcome.