[FLINK-33045] Make it possible to disable auto-registering schema in Schema Registry #26662

Open
wants to merge 7 commits into base: master
@@ -280,6 +280,13 @@ Format 参数
<td>String</td>
<td>The URL of the Confluent Schema Registry to fetch/register schemas.</td>
</tr>
<tr>
<td><h5>auto.register.schemas</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>Whether to automatically register schemas with the Confluent Schema Registry if they don't exist. When set to <code>false</code>, schemas must be manually registered in the Schema Registry before being used. When set to <code>true</code>, schemas will be automatically registered during serialization if they don't already exist. The default value is <code>true</code>.</td>
Contributor:

Did we document anywhere in which scenarios we auto-register schemas, for example when reading from or writing to a table?

Contributor:

@fapaul I guess saying serialization implies that we are creating an outbound flow with the schema id, e.g. writing to a sink. We could document this more explicitly.

</tr>
</tbody>
</table>

8 changes: 8 additions & 0 deletions docs/content/docs/connectors/table/formats/avro-confluent.md
@@ -287,6 +287,14 @@ Format Options
<td>String</td>
<td>The URL of the Confluent Schema Registry to fetch/register schemas.</td>
</tr>
<tr>
<td><h5>auto.register.schemas</h5></td>
<td>optional</td>
<td>yes</td>
Contributor:

Why is this line here but not in the Chinese version?

Contributor Author:

Because I don't know Chinese :) So I'll follow the process that's at https://flink.apache.org/how-to-contribute/contribute-documentation/#chinese-documentation-translation after this PR is merged

Contributor (@davidradl, Jun 12, 2025):

I had assumed we would just add it here, since the "optional" above is not translated, i.e. add <td>yes</td> as per the note in the link you sent. But it sounds like you have this in hand.

<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>Whether to automatically register schemas with the Confluent Schema Registry if they don't exist. When set to <code>false</code>, schemas must be manually registered in the Schema Registry before being used. When set to <code>true</code>, schemas will be automatically registered during serialization if they don't already exist. The default value is <code>true</code>.</td>
</tr>
</tbody>
</table>
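As noted in the review thread above, auto-registration is relevant on the serialization (write/sink) path. For illustration only, here is a minimal sketch of disabling it for a Kafka sink table via the Table API; the kafka connector options and the avro-confluent.-prefixed key follow the existing avro-confluent format documentation and are assumptions, not part of this PR's diff:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DisableAutoRegisterSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Sink table writing Avro records to Kafka. With auto-registration
        // disabled, the schema for the subject must already exist in the
        // Confluent Schema Registry before the job writes any records.
        tEnv.executeSql(
                "CREATE TABLE user_sink (\n"
                        + "  name STRING,\n"
                        + "  favoriteNumber STRING,\n"
                        + "  favoriteColor STRING,\n"
                        + "  eventType STRING\n"
                        + ") WITH (\n"
                        + "  'connector' = 'kafka',\n"
                        + "  'topic' = 'users',\n"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                        + "  'format' = 'avro-confluent',\n"
                        + "  'avro-confluent.url' = 'http://localhost:8081',\n"
                        // Assumed full key: format prefix + the option name documented above.
                        + "  'avro-confluent.auto.register.schemas' = 'false'\n"
                        + ")");
    }
}
```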

157 changes: 108 additions & 49 deletions flink-end-to-end-tests/flink-confluent-schema-registry/pom.xml
@@ -43,101 +43,160 @@ under the License.
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java</artifactId>
<artifactId>flink-table-common</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka -->

<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka</artifactId>
<version>3.0.0-1.17</version>
<artifactId>flink-table-api-java-bridge</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
<!-- Make sure that Shaded Guava matches the one used in the flink-connector-kafka,
or remove when FLINK-32462 is resolved -->

<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-shaded-guava</artifactId>
<version>30.1.1-jre-16.1</version>
<artifactId>flink-end-to-end-tests-common</artifactId>
<version>${project.version}</version>
</dependency>

<!-- This enables the WebUI during tests. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-avro</artifactId>
<artifactId>flink-runtime-web</artifactId>
<version>${project.version}</version>
<scope>test</scope>
</dependency>

<!-- The following dependencies are for connector/format sql-jars that
we copy using the maven-dependency-plugin. When extending the test
to cover more connectors/formats, add a dependency here and an entry
to the dependency-plugin configuration below.
This ensures that all modules we actually need (as defined by the
dependency-plugin configuration) are built before this module. -->
<dependency>
<!-- Used by maven-dependency-plugin -->
<groupId>org.apache.flink</groupId>
<artifactId>flink-sql-connector-kafka</artifactId>
<version>4.0.0-2.0</version>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-avro-confluent-registry</artifactId>
<artifactId>flink-sql-avro-confluent-registry</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>3.9.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>kafka</artifactId>
<scope>test</scope>
</dependency>

<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-avro-serializer</artifactId>
<version>7.2.2</version>
Contributor:

AFAIK the serializer is versioned similarly to Kafka; 7.2.2 should correspond to Kafka 3.2. Can we upgrade the serializer to 7.9.0 to be in line with the used Kafka version 3.9.0?

Contributor Author:

Yes, good one!

<scope>test</scope>
</dependency>

<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.11.3</version>
<scope>test</scope>
</dependency>

<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<scope>test</scope>
</dependency>

</dependencies>

<dependencyManagement>
<dependencies>
<dependency>
<!-- Pick an arbitrary version here to satisfy the enforcer-plugin,
as we neither access nor package the kafka dependencies -->
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>3.9.0</version>
</dependency>
</dependencies>
</dependencyManagement>

<build>
<plugins>
<!-- Build toolbox jar. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
<id>TestAvroConsumerConfluent</id>
<!-- Use a special execution id to be ignored by license/optional checks -->
<id>e2e-dependencies</id>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<finalName>TestAvroConsumerConfluent</finalName>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>org.apache.flink.schema.registry.test.TestAvroConsumerConfluent</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.avro</groupId>
<artifactId>avro-maven-plugin</artifactId>
<version>${avro.version}</version>
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>schema</goal>
</goals>
<configuration>
<sourceDirectory>${project.basedir}/src/main/resources/avro/</sourceDirectory>
<outputDirectory>${project.basedir}/target/generated-sources/</outputDirectory>
<fieldVisibility>PRIVATE</fieldVisibility>
<includes>
<include>**/*.avsc</include>
</includes>
<finalName>SqlToolbox</finalName>
</configuration>
</execution>
</executions>
</plugin>

<!-- Copy SQL jars into dedicated "sql-jars" directory. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-enforcer-plugin</artifactId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>dependency-convergence</id>
<id>copy</id>
<phase>package</phase>
<goals>
<goal>enforce</goal>
<goal>copy</goal>
</goals>
<configuration>
<skip>true</skip>
<outputDirectory>${project.build.directory}/sql-jars</outputDirectory>
<!-- List of currently provided SQL jars.
When extending this list please also add a dependency
for the respective module. -->
<artifactItems>
<artifactItem>
<groupId>org.apache.flink</groupId>
<artifactId>flink-sql-connector-kafka</artifactId>
<version>4.0.0-2.0</version>
<type>jar</type>
</artifactItem>
<artifactItem>
<groupId>org.apache.flink</groupId>
<artifactId>flink-test-utils</artifactId>
<version>${project.version}</version>
<type>jar</type>
</artifactItem>
<artifactItem>
<groupId>org.apache.flink</groupId>
<artifactId>flink-sql-avro-confluent-registry</artifactId>
<version>${project.version}</version>
<type>jar</type>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-checkstyle-plugin</artifactId>
<configuration>
<excludes>**/example/avro/*</excludes>
</configuration>
</plugin>
</plugins>
</build>
</project>
@@ -15,13 +15,14 @@
* limitations under the License.
*/

{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string", "default": ""},
{"name": "favoriteNumber", "type": "string", "default": ""},
{"name": "favoriteColor", "type": "string", "default": ""},
{"name": "eventType","type": {"name": "EventType","type": "enum", "symbols": ["meeting"] }}
]
}
{
"namespace": "org.apache.flink.avro.generated",
"type": "record",
"name": "record",
"fields": [
{"name": "name", "type": ["null", "string"], "default": null},
Contributor:

Nit: why did you change the type used in the example? Is this related to registration?

Contributor Author (@MartijnVisser, Jun 20, 2025):

Disabling auto registration only means that Flink won't try to register the schema in the Schema Registry during every run. However, the schema that has been registered in the Schema Registry by an external service must still either be exactly what Flink would have registered (so with that specific namespace), or be what the user has provided via the avro-confluent.schema table property.
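To make that second case concrete, here is an illustrative sketch (not part of the PR) of pinning the schema explicitly via the avro-confluent.schema property while keeping auto-registration off, so Flink serializes with exactly the schema an external service has already registered. The schema string mirrors the record defined in this .avsc; the remaining option keys are assumptions based on the existing format documentation:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ExplicitSchemaSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // The Avro schema that was registered externally; it mirrors the
        // record defined in the .avsc file in this PR.
        String avroSchema =
                "{\"namespace\": \"org.apache.flink.avro.generated\", \"type\": \"record\", \"name\": \"record\", \"fields\": ["
                        + "{\"name\": \"name\", \"type\": [\"null\", \"string\"], \"default\": null},"
                        + "{\"name\": \"favoriteNumber\", \"type\": [\"null\", \"string\"], \"default\": null},"
                        + "{\"name\": \"favoriteColor\", \"type\": [\"null\", \"string\"], \"default\": null},"
                        + "{\"name\": \"eventType\", \"type\": [\"null\", \"string\"], \"default\": null}]}";

        // Pin the schema and keep auto-registration disabled, so serialization
        // uses exactly the schema that is already in the registry.
        tEnv.executeSql(
                "CREATE TABLE user_sink (\n"
                        + "  name STRING,\n"
                        + "  favoriteNumber STRING,\n"
                        + "  favoriteColor STRING,\n"
                        + "  eventType STRING\n"
                        + ") WITH (\n"
                        + "  'connector' = 'kafka',\n"
                        + "  'topic' = 'users',\n"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                        + "  'format' = 'avro-confluent',\n"
                        + "  'avro-confluent.url' = 'http://localhost:8081',\n"
                        + "  'avro-confluent.schema' = '" + avroSchema + "',\n"
                        + "  'avro-confluent.auto.register.schemas' = 'false'\n"
                        + ")");
    }
}
```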

{"name": "favoriteNumber", "type": ["null", "string"], "default": null},
{"name": "favoriteColor", "type": ["null", "string"], "default": null},
{"name": "eventType", "type": ["null", "string"], "default": null}
]
}
@@ -0,0 +1,28 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

{
"namespace": "org.apache.flink.avro.generated",
"type": "record",
"name": "record",
"fields": [
{"name": "name", "type": ["null", "string"], "default": null},
{"name": "favoriteNumber", "type": ["null", "string"], "default": null},
{"name": "favoriteColor", "type": ["null", "string"], "default": null},
{"name": "eventType", "type": ["null", "string"], "default": null}
]
}