
Add support for fastavro when reader schema isn't provided. #773


Closed
wants to merge 1 commit into from

Conversation

0x26res commented Jan 30, 2020

Related issue: #616

This is causing issues for people who want to use fastavro but don't want to provide a reader schema.

It's not just a performance issue, as records come out differently when using fastavro (for example, logical date-time values are represented as Python datetime objects in fastavro, instead of int).

I've tested it in my local environment, but it'd be better if we could write a unit test for this.

Also, it'd be good to get rid of the try/except block that surrounds this statement and swallows important errors. When fastavro is installed but doesn't work, the failure should be explicit.
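
For illustration, a minimal sketch of the behaviour this change is after: decode with fastavro whether or not a reader schema is supplied, instead of letting the missing reader schema raise and silently falling back to apache avro. The helper below is hypothetical (it is not the actual diff); only fastavro's parse_schema and schemaless_reader are real library calls.

```python
# Minimal sketch (not the actual patch): decode with fastavro even when no
# reader schema is provided, by passing None through to schemaless_reader.
from io import BytesIO
from fastavro import parse_schema, schemaless_reader


def make_fastavro_decoder(writer_schema_dict, reader_schema_dict=None):
    """Hypothetical helper: return a decoder usable with or without a reader schema."""
    writer_schema = parse_schema(writer_schema_dict)
    # Only parse the reader schema when the caller actually supplied one;
    # otherwise fastavro decodes with the writer schema alone.
    reader_schema = parse_schema(reader_schema_dict) if reader_schema_dict else None

    def decode(payload_bytes):
        return schemaless_reader(BytesIO(payload_bytes), writer_schema, reader_schema)

    return decode
```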

ghost commented Jan 30, 2020

It looks like @arthurandres hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence. (Wikipedia)

You can read and sign our full Contributor License Agreement here.

Once you've signed reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

0x26res requested a review from edenhill January 30, 2020 10:42
0x26res (Author) commented Jan 30, 2020

[clabot:check]

ghost commented Jan 30, 2020

@confluentinc It looks like @arthurandres just signed our Contributor License Agreement. 👍

Always at your service,

clabot

0x26res (Author) commented Feb 6, 2020

@edenhill, could you have a look at this PR?

I've tried to write tests but I'm unable to set up my working environment correctly. I find the DEVELOPER.md a bit lacking on the subject.

I've run python setup.py install successfully, but I still get "No module named 'confluent_kafka.cimpl'" when running ./test/run.sh unit.

rnpridgeon (Contributor) commented
@arthurandres we are currently in the process of wrapping up a refactor of the avro module which completely removes apache avro in favor of first-class FastAvro support. This eliminates the clunky fallback logic in the serializer and cuts down on the number of cache lookups, which speeds things up considerably. Functionally this will be backwards compatible, meaning the reader schema functionality will still be present.

The PR isn't open yet because there are some generic serialization API discussions underway so we can add Protobuf and JSON support in an upcoming release as well. I have a few things to update from our API discussion, then I'll push the PR. In the meantime please do feel free to poke around what exists today.

https://github.com/confluentinc/confluent-kafka-python/tree/FirstClassFastAvro/confluent_kafka/avro

0x26res (Author) commented Feb 13, 2020

@rnpridgeon , thanks for the update.

dylrich (Contributor) commented Mar 13, 2020

Is it possible to merge this in and make a release for now while you're kicking around the new API? We haven't been able to serialize Decimal types for ages and only today after many hours of debugging did we notice that this was because fastavro was never actually being used post confluent_kafka 1.0.0. We use the registry and don't want to pass in reader schemas, and the silent fallback to Apache avro (because of the bug this PR fixes) fooled us.

0x26res (Author) commented Mar 15, 2020

> Is it possible to merge this in and make a release for now while you're kicking around the new API? We haven't been able to serialize Decimal types for ages and only today after many hours of debugging did we notice that this was because fastavro was never actually being used post confluent_kafka 1.0.0. We use the registry and don't want to pass in reader schemas, and the silent fallback to Apache avro (because of the bug this PR fixes) fooled us.

If it's really an issue you can monkey patch MessageSerializer._get_decoder_func. We've resorted to using this in the interim and it works well for us, though it's not pretty.
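
For anyone reaching for the same workaround, a hedged sketch of such a monkey patch is below. The attribute names on MessageSerializer (registry_client, id_to_decoder_func), the shape of the schema object returned by the registry client, and the _get_decoder_func signature are assumptions based on the pre-1.4 avro module and may differ between versions, so check them against your installed code before using this.

```python
# Hedged sketch of an interim monkey patch; internal names are assumptions
# from the pre-1.4 confluent_kafka.avro module and should be verified locally.
from confluent_kafka.avro.serializer.message_serializer import MessageSerializer
from fastavro import parse_schema, schemaless_reader


def _fastavro_get_decoder_func(self, schema_id, payload, is_key=False):
    # Reuse a cached decoder for this schema id if one exists.
    if schema_id in self.id_to_decoder_func:
        return self.id_to_decoder_func[schema_id]

    # Fetch the writer schema from the registry and parse it once for fastavro.
    writer_schema_obj = self.registry_client.get_by_id(schema_id)
    fast_writer_schema = parse_schema(writer_schema_obj.to_json())

    def decoder(p):
        # p is a file-like object already positioned past the 5-byte header.
        return schemaless_reader(p, fast_writer_schema)

    self.id_to_decoder_func[schema_id] = decoder
    return decoder


MessageSerializer._get_decoder_func = _fastavro_get_decoder_func
```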

dylrich (Contributor) commented Apr 9, 2020

@rnpridgeon It looks like the first class fastavro branch was merged in #787 but it doesn't actually fix the bug this PR addresses. Any update on when we could expect that bugfix to land in a release?

0x26res (Author) commented Apr 10, 2020

Is this line causing the issue?

self._reader_schema = parse_schema(loads(schema_str))

@dylrich it shouldn't be too hard in the interim for you to create your own Deserializer and provide it when creating the consumer.
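
To sketch that suggestion under stated assumptions: the post-#787 serialization API exposes a Deserializer base class (confluent_kafka.serialization) and a SchemaRegistryClient whose get_schema returns the registered schema, and the value bytes follow the standard Confluent wire format (one magic byte, a 4-byte big-endian schema id, then the Avro binary body). The class below is a hypothetical stand-in, not part of the library.

```python
# Hypothetical interim deserializer (not part of confluent-kafka-python):
# decode the Confluent wire format with fastavro using only the writer schema.
import json
import struct
from io import BytesIO

from confluent_kafka.serialization import Deserializer
from fastavro import parse_schema, schemaless_reader


class WriterSchemaAvroDeserializer(Deserializer):
    """Decode Avro values using only the writer schema fetched from the registry."""

    def __init__(self, schema_registry_client):
        self._client = schema_registry_client
        self._parsed = {}  # schema_id -> parsed fastavro schema

    def __call__(self, value, ctx=None):
        if value is None:
            return None
        if len(value) < 5 or value[0] != 0:
            raise ValueError("Unexpected magic byte in Avro message")
        schema_id = struct.unpack(">I", value[1:5])[0]
        if schema_id not in self._parsed:
            registered = self._client.get_schema(schema_id)
            self._parsed[schema_id] = parse_schema(json.loads(registered.schema_str))
        return schemaless_reader(BytesIO(value[5:]), self._parsed[schema_id])
```

Such a class would then be passed as the 'value.deserializer' configuration of a DeserializingConsumer, assuming the consumer classes introduced with the 1.4 API.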

dylrich (Contributor) commented Apr 10, 2020

I've monkey-patched all of our code as you suggested above, but I'd really like to delete all of that nonsense, given that this is a super simple bugfix.

andrewegel commented
This is causing CI to break due to the "unknown repository" error, so it needs to be closed to get around that. If this change is still required, please re-submit it.

andrewegel closed this Jan 6, 2021