Use schemaless reader to handle complex schema #251
Conversation
It looks like @fpietka hasn't signed our Contributor License Agreement yet.
You can read and sign our full Contributor License Agreement here. Once you've signed, reply with `[clabot:check]`. Appreciation of efforts, clabot
[clabot:check]
@confluentinc It looks like @fpietka just signed our Contributor License Agreement. 👍 Always at your service, clabot
@fpietka Can you explain why this would be correct? `schemaless_reader` doesn't look like it is intended to be public API (in fact, it looks like we should be using `from fastavro import load`). And `read_data` invokes `schemaless_reader`, so why would omitting the other stuff `read_data` does be correct?
I'll try to explain how I understood it works. Avro messages in Kafka carry a special header with the schema ID (https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html#wire-format) instead of an embedded schema, which means the file-oriented reader entry points can't be used directly on them.

In my case, I was getting data from Debezium. Binlogs have a `before` and an `after` key which are identical. In the schema, the `after` key was a named reference to the record type defined in the `before` key. When we call `read_data()` directly, it fails with a `KeyError` because that named type was never registered. Since here we bypass the regular Avro file header, calling `schemaless_reader()` lets fastavro acquaint the full schema first, so named references resolve before decoding.

I hope this clarifies my intention in this PR.
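For illustration, a minimal sketch of the wire format described above. The function name `split_confluent_frame` and the sample bytes are my own, not from this PR; only the header layout comes from the linked Schema Registry docs:

```python
import io
import struct

def split_confluent_frame(payload: bytes):
    """Split a Confluent-framed Avro message into (schema_id, avro_body).

    Wire format (per the Schema Registry docs linked above):
      byte 0     : magic byte, always 0
      bytes 1-4  : 4-byte big-endian schema ID
      bytes 5... : Avro binary body, *without* an embedded schema
    """
    if len(payload) < 5:
        raise ValueError("message too short to contain a wire-format header")
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != 0:
        raise ValueError("unknown magic byte: %r" % magic)
    return schema_id, io.BytesIO(payload[5:])

# Example frame: magic byte 0, schema ID 42, then an arbitrary Avro body.
schema_id, body = split_confluent_frame(b"\x00\x00\x00\x00\x2a" + b"\x02hi")
```

The remaining `body` stream is what would then be handed to a decoder along with the schema fetched for `schema_id`; since that stream has no embedded schema, a schemaless decode path is needed.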
I thought I might try to provide a little context, as I added the code this PR touches.
…th Cython as well
I changed the import because it wasn't working with Cython. Now this works for both.
Ok, finally had a chance to look at this again. I don't totally understand fastavro's APIs, but I think I understand what the change does.
I bumped into an issue lately trying to decode payloads coming from Debezium (MySQL binlogs). Fastavro couldn't be used because of a `KeyError` exception in the library. I discussed the issue here and found out `read_data()` wasn't able to decode Avro messages with named types in their schema. Here I replaced it with `schemaless_reader()`, which acquaints the schema first so it is able to decode it: https://github.com/tebeka/fastavro/blob/master/fastavro/reader.py#L596-L607