Commit ee47797

adamjshook authored and hashhar committed
Document Protobuf Any support in Kafka connector
1 parent 5ebbd2e commit ee47797

1 file changed: +56 -2 lines changed

docs/src/main/sphinx/connector/kafka.rst

Lines changed: 56 additions & 2 deletions
@@ -106,6 +106,8 @@ Property name Description
 ``kafka.hide-internal-columns``                         Controls whether internal columns are part of the table schema or not.
 ``kafka.internal-column-prefix``                        Prefix for internal columns, defaults to ``_``
 ``kafka.messages-per-split``                            Number of messages that are processed by each Trino split; defaults to ``100000``.
+``kafka.protobuf-any-support-enabled``                  Enable support for encoding Protobuf ``any`` types to ``JSON`` by setting the property to ``true``;
+                                                        defaults to ``false``.
 ``kafka.timestamp-upper-bound-force-push-down-enabled`` Controls if upper bound timestamp pushdown is enabled for topics using ``CreateTime`` mode.
 ``kafka.security-protocol``                             Security protocol for connection to Kafka cluster; defaults to ``PLAINTEXT``.
 ``kafka.ssl.keystore.location``                         Location of the keystore file.
@@ -438,7 +440,7 @@ table description supplier are:
 * New tables can be defined without a cluster restart.
 * Schema updates are detected automatically.
 * There is no need to define tables manually.
-* Some Protobuf specific types like ``oneof`` are supported and mapped to JSON.
+* Some Protobuf specific types like ``oneof`` and ``any`` are supported and mapped to JSON.
 
 When using Protobuf decoder with the Confluent table description supplier, some
 additional steps are necessary. For details, refer to :ref:`kafka-requirements`.
@@ -1453,8 +1455,61 @@ Trino data type Allowed Protobuf data type
 ``ARRAY``                             Protobuf type with ``repeated`` field
 ``MAP``                               ``Map``
 ``TIMESTAMP``                         ``Timestamp``, predefined in ``timestamp.proto``
+``JSON``                              ``oneof`` (Confluent table supplier only), ``Any``
 ===================================== =======================================
 
+any
++++
+
+Message types with an `Any <https://protobuf.dev/programming-guides/proto3/#any>`_
+field contain an arbitrary serialized message as bytes and a type URL to resolve
+that message's type with a scheme of ``file://``, ``http://``, or ``https://``.
+The connector reads the contents of the URL to create the type descriptor
+for the ``Any`` message and convert the message to JSON. This behavior is enabled
+by setting ``kafka.protobuf-any-support-enabled`` to ``true``.
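
As a minimal sketch, the property can be set in the Kafka catalog properties
file. The file name ``etc/catalog/kafka.properties`` and the ``connector.name``
line reflect a typical Trino catalog layout and are assumptions, not part of
this commit:

.. code-block:: text

    # etc/catalog/kafka.properties (assumed file name)
    connector.name=kafka
    # Decode Protobuf Any fields to JSON; disabled by default
    kafka.protobuf-any-support-enabled=true
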
+
+The descriptors for each distinct URL are cached for performance reasons, and
+any modifications made to the type returned by the URL require a restart of
+Trino.
+
+For example, given the following Protobuf schema which defines ``MyMessage``
+with three columns:
+
+.. code-block:: text
+
+    syntax = "proto3";
+
+    message MyMessage {
+      string stringColumn = 1;
+      uint32 integerColumn = 2;
+      uint64 longColumn = 3;
+    }
+
+And a separate schema that uses an ``Any`` field, which holds a packed message
+of the above type and a valid type URL:
+
+.. code-block:: text
+
+    syntax = "proto3";
+
+    import "google/protobuf/any.proto";
+
+    message schema {
+      google.protobuf.Any any_message = 1;
+    }
+
+The corresponding Trino column is named ``any_message`` of type ``JSON``
+containing a JSON-serialized representation of the Protobuf message:
+
+.. code-block:: text
+
+    {
+      "@type":"file:///path/to/schemas/MyMessage",
+      "integerColumn":1,
+      "longColumn":"493857959588286460",
+      "stringColumn":"Trino"
+    }
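
To read individual fields from the resulting ``JSON`` column, a query along
these lines could be used. This is a sketch only: the table name
``kafka.default.my_topic`` is hypothetical, and ``json_extract_scalar`` is the
general Trino JSON function, not something introduced by this commit:

.. code-block:: text

    -- Hypothetical catalog, schema, and table names
    SELECT json_extract_scalar(any_message, '$.stringColumn') AS string_column
    FROM kafka.default.my_topic;
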
+
 Protobuf schema evolution
 +++++++++++++++++++++++++
 
@@ -1481,7 +1536,6 @@ The schema evolution behavior is as follows:
 Protobuf limitations
 ++++++++++++++++++++
 
-* Protobuf specific types like ``any``, ``oneof`` are not supported.
 * Protobuf Timestamp has a nanosecond precision but Trino supports
   decoding/encoding at microsecond precision.
 