-
Notifications
You must be signed in to change notification settings - Fork 917
Add support for fastavro when reader schema isn't provided. #773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for fastavro when reader schema isn't provided. #773
Conversation
It looks like @arthurandres hasn't signed our Contributor License Agreement, yet.
You can read and sign our full Contributor License Agreement here. Once you've signed reply with Appreciation of efforts, clabot |
[clabot:check] |
@confluentinc It looks like @arthurandres just signed our Contributor License Agreement. 👍 Always at your service, clabot |
@edenhill, could you have a look at this PR? I've tried to write tests but I'm unable to set up my working environment correctly. I find the I've ran |
@arthurandres we are currently in the process of wrapping up a refactor of the avro module which completely removes apache avro in favor of first class FastAvro support. This eliminates the clunky fallback logic in the serializer and cuts down on the amount of cache lookups which speeds things up considerably. Functionally this will be a backwards compatible meaning the reader schema functionality will be present. The pr isn't open yet because there are some generic serialization api discussions underway so we can add protobuf and json support in an upcoming release as well. I have a few things to update from our API discussion then I'll push the PR. n the meantime please do feel free to poke around what exists today. https://github.com/confluentinc/confluent-kafka-python/tree/FirstClassFastAvro/confluent_kafka/avro |
@rnpridgeon , thanks for the update. |
Is it possible to merge this in and make a release for now while you're kicking around the new API? We haven't been able to serialize Decimal types for ages and only today after many hours of debugging did we notice that this was because |
If it's really an issue you can monkey patch |
@rnpridgeon It looks like the first class fastavro branch was merged in #787 but it doesn't actually fix the bug this PR addresses. Any update on when we could expect that bugfix to land in a release? |
Is this line causing the issue:
@dylrich it shouldn't be too hard in the interim for you to create your own Deserializer and provide it when creating the consuer. |
I've monkey patched all of our code as you suggested up above, but I'd really like to delete all of that nonsense given that this is a super simple bugfix. |
Causing CI to break due to the "unknown repository" - This needs to be closed to get around that. If this change is still required, please re-do it. |
Related issue: #616
This is causing issues for people who want to use fastavro but don't want to provide a reader schema.
It's not just a performance issue, as records are different when using fastavro (for example logical date time are represented as python
datetime
in fastavro, instead ofint
).I've tested it on my local environment but it'd be better if we can write unit test for this.
Also it'd be good to get rid of this try catch block that surrounds this statement and swallows important errors. When fastavro isn't installed but doens't work, it should be explicit.