[FEATURE] Support TableSchema Catalog to manage table schema (schema registry) #5230

Open
coolderli opened this issue Oct 23, 2024 · 4 comments
Labels: feature (New feature or request)


@coolderli · Collaborator

Describe the feature

For Kafka topics and Filesets, we may need a table schema to deserialize the data. We can manage external schema registries in Gravitino for this purpose.

Motivation

  • Manage table schemas in Gravitino.
  • Use SQL to read a Kafka topic or Fileset once a table schema is bound to it.

Describe the solution

We can introduce a TableSchemaCatalog to manage TableSchema entities.

We can bind a table schema, identified as catalog.schema.table-schema, to a topic or fileset when needed, so the table schema can be fetched from the external schema registry. We can also add a schema registry managed by Gravitino, so the table schema can be saved directly to the Gravitino metastore.
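A rough sketch of the binding idea, assuming Java 16+ and in-memory maps. `TableSchemaIdent`, `TableSchemaCatalog`, `SchemaBindings` and the example names (`schema_catalog.orders.order_event`, `kafka_catalog.default.orders_topic`) are hypothetical and only illustrate the proposal; they are not existing Gravitino APIs. It parses a `catalog.schema.table-schema` name, registers a table schema, binds it to a topic, and resolves the topic's schema back through the catalog.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types are existing Gravitino APIs.
public class TableSchemaBindingSketch {

    // Three-level name of a table schema: catalog.schema.table-schema
    record TableSchemaIdent(String catalog, String schema, String name) {
        static TableSchemaIdent parse(String dotted) {
            String[] parts = dotted.split("\\.");
            if (parts.length != 3) {
                throw new IllegalArgumentException(
                    "expected catalog.schema.table-schema, got: " + dotted);
            }
            return new TableSchemaIdent(parts[0], parts[1], parts[2]);
        }

        @Override
        public String toString() {
            return catalog + "." + schema + "." + name;
        }
    }

    record Column(String name, String type) {}

    record TableSchema(TableSchemaIdent ident, List<Column> columns) {}

    // Hypothetical catalog holding table schemas (in memory, for illustration).
    static class TableSchemaCatalog {
        private final Map<TableSchemaIdent, TableSchema> schemas = new HashMap<>();

        void register(TableSchema ts) {
            schemas.put(ts.ident(), ts);
        }

        Optional<TableSchema> load(TableSchemaIdent ident) {
            return Optional.ofNullable(schemas.get(ident));
        }
    }

    // Hypothetical binding table: data object name (topic or fileset) -> table schema name.
    static class SchemaBindings {
        private final Map<String, TableSchemaIdent> bindings = new HashMap<>();

        void bind(String dataObject, String tableSchemaName) {
            bindings.put(dataObject, TableSchemaIdent.parse(tableSchemaName));
        }

        Optional<TableSchemaIdent> lookup(String dataObject) {
            return Optional.ofNullable(bindings.get(dataObject));
        }
    }

    public static void main(String[] args) {
        TableSchemaCatalog catalog = new TableSchemaCatalog();
        catalog.register(new TableSchema(
            TableSchemaIdent.parse("schema_catalog.orders.order_event"),
            List.of(new Column("id", "bigint"), new Column("amount", "decimal(10,2)"))));

        SchemaBindings bindings = new SchemaBindings();
        bindings.bind("kafka_catalog.default.orders_topic", "schema_catalog.orders.order_event");

        // Resolve the topic's bound table schema through the catalog.
        TableSchema resolved = bindings.lookup("kafka_catalog.default.orders_topic")
            .flatMap(catalog::load)
            .orElseThrow();
        // prints "schema_catalog.orders.order_event -> 2 columns"
        System.out.println(resolved.ident() + " -> " + resolved.columns().size() + " columns");
    }
}
```

Keeping the binding as a plain name reference (rather than embedding the columns in the topic or fileset) means several data objects can share one table schema, and the schema can be updated in one place.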

[Image: img_v3_02fu_4b0b7b27-c424-4a5b-a0fc-42c6fb54318l]

Additional context

No response

@coolderli added the feature (New feature or request) label on Oct 23, 2024
@coolderli · Collaborator, Author

@jerryshao @shaofengshi @caican00 @xloya @lw-yang What do you think? Any other thoughts about this?

@xloya · Collaborator

xloya commented Oct 24, 2024

I think it is a good idea to manage Schema as a resource at the same level as Table/Fileset/Messaging. This way, we can distinguish between a Managed Schema (whose data types are defined by Gravitino) and an External Schema (whose data types come from an existing external Schema Registry or another system). Then, for resources that require a specific Schema (such as some Filesets), we can bind a Schema to them. When fetching Fileset metadata, we would also fetch the bound Schema and use it in clients.
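A minimal sketch of the Managed/External split, assuming Java 17 (sealed interfaces, `instanceof` patterns). `SchemaSource`, `Managed`, `External` and `RegistryClient` are hypothetical names, not Gravitino or schema-registry APIs; a real External source would call an actual registry, which is stubbed here with a lambda.

```java
import java.util.List;

// Hypothetical sketch of Managed vs External schemas; not an existing Gravitino API.
public class SchemaSourceSketch {

    record Column(String name, String type) {}

    // Where a table schema's column definitions come from.
    sealed interface SchemaSource permits Managed, External {}

    // Managed: columns are stored in Gravitino, using Gravitino data types.
    record Managed(List<Column> columns) implements SchemaSource {}

    // External: only a reference is stored; the definition lives in an external registry.
    record External(String registryUri, String subject, int version) implements SchemaSource {}

    // Hypothetical client for an external schema registry.
    interface RegistryClient {
        List<Column> fetch(String registryUri, String subject, int version);
    }

    // Resolve the columns of a schema, regardless of where it is stored.
    static List<Column> resolve(SchemaSource source, RegistryClient client) {
        if (source instanceof Managed m) {
            return m.columns();
        }
        if (source instanceof External e) {
            return client.fetch(e.registryUri(), e.subject(), e.version());
        }
        throw new IllegalArgumentException("unknown schema source: " + source);
    }

    public static void main(String[] args) {
        // Stub registry standing in for a real schema registry.
        RegistryClient stub = (uri, subject, version) ->
            List.of(new Column("id", "bigint"), new Column("payload", "string"));

        SchemaSource managed = new Managed(List.of(new Column("id", "bigint")));
        SchemaSource external = new External("http://registry.example:8081", "orders-value", 3);

        System.out.println(resolve(managed, stub).size());  // 1
        System.out.println(resolve(external, stub).size()); // 2
    }
}
```

A client reading a Fileset would only call `resolve`, so it does not need to know whether the bound Schema is managed by Gravitino or lives in an external registry.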

@jerryshao · Contributor

jerryshao commented Oct 24, 2024

It's a bit strange for "schema" to be an entity. Conceptually, an entity maps to a data object, whereas a "schema" is bound to an entity. We should think more about how to support this scenario.

@coolderli · Collaborator, Author

@jerryshao Yes. But managing the 'schema' in Gravitino makes binding easier: we can locate a schema by catalog.schema.table-schema, which makes it straightforward to bind it to a data object.
