Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding avro schema #1917

Merged
merged 12 commits into from
Jun 11, 2018
Merged

adding avro schema #1917

merged 12 commits into from
Jun 11, 2018

Conversation

jerrypeng
Copy link
Contributor

No description provided.

@jerrypeng jerrypeng requested review from sijie and srkukarni June 5, 2018 23:13
@jerrypeng
Copy link
Contributor Author

@mgodave please review

@jerrypeng jerrypeng self-assigned this Jun 5, 2018
@jerrypeng jerrypeng added this to the 2.1.0-incubating milestone Jun 5, 2018
@jerrypeng jerrypeng requested a review from merlimat June 5, 2018 23:16
Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for Avro it would be good to have schema validation since it explicitely supports that.

<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.8.2</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version should be defined in top-level pom

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@jerrypeng
Copy link
Contributor Author

@merlimat if you try a to deserialize a object with not the expected schema, an exception will be thrown

@jerrypeng
Copy link
Contributor Author

retest this please

@merlimat
Copy link
Contributor

merlimat commented Jun 6, 2018

@merlimat if you try a to deserialize a object with not the expected schema, an exception will be thrown

It needs to prevent from creating a producer if the topic has a different schema. Otherwise it's not a "schema" but just a "serde"

@sijie
Copy link
Member

sijie commented Jun 6, 2018

@mgodave please review it.

@sijie
Copy link
Member

sijie commented Jun 6, 2018

At least for Avro it would be good to have schema validation since it explicitely supports that.

I don't think there is a path to do a validation for now, unless I am not wrong. @mgodave ?

I think we should do the validation in a separate PR, more I think we need to think validation along with how SQL use schema registry.

@merlimat
Copy link
Contributor

merlimat commented Jun 6, 2018

I don't think there is a path to do a validation for now, unless I am not wrong. @mgodave ?

I think there should be some kind of validation already for JSON. Also we need to validate that the schema is there. eg: topic is set to Avro, a producer cannot connect and send "bytes".

Avro has a way to extract schema info. That is what should be stored in the registry. Otherwise the whole "registry" part is completely useless.

I think we should do the validation in a separate PR, more I think we need to think validation along with how SQL use schema registry.

Yes, if we don't store the "type" info, we cannot getting that back and discover how the data looks like

@sijie
Copy link
Member

sijie commented Jun 6, 2018

Yes, if we don't store the "type" info, we cannot getting that back and discover how the data looks like

I think this PR already store the type info, no? It does same work as json schema. or I am missing something here?

@merlimat
Copy link
Contributor

merlimat commented Jun 6, 2018

I think this PR already store the type info, no? It does same work as json schema. or I am missing something here?

Sure, though I'm not sure where the validation is being done. We should have few unit tests to verify the behavior. eg what happens when :

  • The topic is json and trying to publish avro
  • The avro schema for producer is not compatibile with existing avro schema
  • The avro schema represents an "allowed" evolution of schema
  • ..

@sijie
Copy link
Member

sijie commented Jun 6, 2018

@mgodave can you comment on this question? and do we have the test coverage?

@merlimat
Copy link
Contributor

merlimat commented Jun 6, 2018

@sijie I think that in any case, the validation would have to be done in server side, comparing the schema from producer and the latest schema from registry

@merlimat
Copy link
Contributor

merlimat commented Jun 6, 2018

I mean: Avro specific validation on the server side

@sijie
Copy link
Member

sijie commented Jun 6, 2018

Avro specific validation on the server side

I image there should be a validation method in the Schema instance. so when broker get the schema info and retrieve the schema instance, it can use that schema instance to do validation.

the validation should be part of Schema interface or a class in the schema package, and called in broker. but I think we are missing that, that's why I was suggesting us doing this in a separate PR.

@mgodave
Copy link
Contributor

mgodave commented Jun 6, 2018

Implement SchemaCompatibilityCheck and add it to the config for AVRO

@mgodave
Copy link
Contributor

mgodave commented Jun 6, 2018

by default everything is "compatible", adding the checker allows you to customize that behavior.

@jerrypeng
Copy link
Contributor Author

@merlimat @mgodave I have added the avro schema check. Though from testing, I can't seem to get the schema check to be invoked. Not sure if there is a bug or I am doing some wrong

@jerrypeng
Copy link
Contributor Author

retest this please

Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change LGTM. when the question on the tests is resolved it should be good to go

@jerrypeng
Copy link
Contributor Author

@merlimat I have shaded the avro dependences. Please check if I got them all

@@ -86,6 +86,14 @@
<include>org.apache.httpcomponents:httpclient</include>
<include>commons-logging:commons-logging</include>
<include>org.apache.httpcomponents:httpcore</include>
<include>org.apache.avro:avro</include>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jerrypeng there's also pulsar-client-kafka :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup forgot you said that one as well

Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@merlimat merlimat merged commit 2dae33d into apache:master Jun 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants