
Commit a4c715a

Describe collections format in README.md
1 parent 4249cc9 commit a4c715a

1 file changed, +21 -1 lines changed


README.md

Lines changed: 21 additions & 1 deletion
@@ -26,7 +26,6 @@ The connector has the following limitations:
- Only row-level operations are produced (`INSERT`, `UPDATE`, `DELETE`):
  - Partition deletes - those changes are ignored
  - Row range deletes - those changes are ignored
- ~~No support for collection types (`LIST`, `SET`, `MAP`) and `UDT` - columns with those types are omitted from generated messages~~
- No support for preimage and postimage - changes only contain those columns that were modified, not the entire row before/after the change. More information [here](#cell-representation).

## Connector installation
@@ -192,6 +191,27 @@ If the operation did not modify the `v` column, the data event will contain the

See `UPDATE` example for full data change event's value.

#### Collections
The connector supports both frozen and non-frozen collections.
Frozen collections are represented as follows (these values are stored in the "Cell" mentioned above):
- `List` and `Set` of type T are represented as `Schema.array(T)`. In the JSON format, this is also an array.
- `Map` with key type K and value type V is represented as `Schema.map(K, V)`. In JSON, this is an array (not object!) of 2-element arrays (first element is key, second is value).
- `UDT` is represented as a struct. In JSON, this is an object.
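
The frozen format above can be sketched as a tiny JSON renderer in plain Java. This is a hypothetical illustration, not the connector's converter: `FrozenJson` and its `Udt` helper are invented names, Java lists, sets and maps stand in for Kafka Connect array and map values, and string escaping is deliberately minimal.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: renders frozen collection values using the JSON shapes listed above.
final class FrozenJson {

    /** Hypothetical stand-in for a UDT value: field name -> field value, in declaration order. */
    static final class Udt {
        final Map<String, Object> fields = new LinkedHashMap<>();

        Udt with(String name, Object value) {
            fields.put(name, value);
            return this;
        }
    }

    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Udt) {
            // UDT -> JSON object keyed by field name.
            sb.append('{');
            String sep = "";
            for (Map.Entry<String, Object> e : ((Udt) value).fields.entrySet()) {
                sb.append(sep);
                writeString(e.getKey(), sb);
                sb.append(':');
                write(e.getValue(), sb);
                sep = ",";
            }
            sb.append('}');
        } else if (value instanceof Map) {
            // Map -> array of 2-element [key, value] arrays (not a JSON object).
            sb.append('[');
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(sep).append('[');
                write(e.getKey(), sb);
                sb.append(',');
                write(e.getValue(), sb);
                sb.append(']');
                sep = ",";
            }
            sb.append(']');
        } else if (value instanceof Collection) {
            // List and Set -> JSON array.
            sb.append('[');
            String sep = "";
            for (Object element : (Collection<?>) value) {
                sb.append(sep);
                write(element, sb);
                sep = ",";
            }
            sb.append(']');
        } else if (value instanceof String) {
            writeString((String) value, sb);
        } else {
            sb.append(value); // numbers and booleans are printed as-is
        }
    }

    // Minimal escaping (quotes and backslashes only); enough for this sketch.
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new LinkedHashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        System.out.println(toJson(List.of(1, 2, 3))); // [1,2,3]
        System.out.println(toJson(map));              // [[1,"a"],[2,"b"]]
        System.out.println(toJson(new Udt().with("x", 1).with("tags", List.of("t")))); // {"x":1,"tags":["t"]}
    }
}
```

The map branch is checked after the UDT branch so that the hypothetical `Udt` wrapper is always rendered as an object, while every other map becomes a list of key/value pairs.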

Non-frozen collections are more complicated. The `scylla.collections.mode` configuration property defines which representation is used. Currently, only `delta` mode is supported; more modes (e.g. preimage/postimage) may be added in the future.
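
The setting is an ordinary connector property. The Java sketch below is hypothetical: `CollectionsModeCheck` and `isSupportedMode` are invented for this example and are not connector code. It loads a properties excerpt and checks that the configured mode is `delta`, the only mode the paragraph above lists as supported.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical pre-flight check written for this example; not part of the connector.
final class CollectionsModeCheck {
    static final String KEY = "scylla.collections.mode";

    // Excerpt of a connector .properties file (all other required settings omitted).
    static final String EXAMPLE = KEY + "=delta\n";

    /** True only if the property is explicitly set to "delta"; an unset property returns false here. */
    static boolean isSupportedMode(Properties config) {
        return "delta".equals(config.getProperty(KEY));
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        config.load(new StringReader(EXAMPLE));
        System.out.println(isSupportedMode(config)); // true
    }
}
```

The check only looks at an explicit value; what the connector does when the property is left unset is not covered by this sketch.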

##### Non-frozen collections: delta mode
Each non-frozen collection column is represented as a struct with two fields, `mode` and `elements`. This struct is stored in the "Cell" described previously.
`mode` can be:
- `MODIFY` - elements were added or deleted.
- `OVERWRITE` - the entire content of the collection was removed and new elements were added. If no elements were added (meaning the collection was simply removed), this mode is not used; instead, the whole struct (stored in the `value` field of the "Cell" struct, as mentioned previously) is null.
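
A consumer that keeps a local copy of a non-frozen collection could apply these modes roughly as follows. This is a sketch under stated assumptions: `MapDelta`, `Mode` and `Change` are hypothetical Java models of the `{mode, elements}` struct, shown for a `Map` column (whose removed entries are marked by null values), and `apply` is only meant to be called for a non-null "Cell", since a null "Cell" means the column was not changed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the delta-mode struct for a non-frozen MAP column.
final class MapDelta {
    enum Mode { MODIFY, OVERWRITE }

    static final class Change<K, V> {
        final Mode mode;
        final Map<K, V> elements; // a null value marks a removed entry

        Change(Mode mode, Map<K, V> elements) {
            this.mode = mode;
            this.elements = elements;
        }
    }

    /**
     * Returns the new collection content. `change` is the Cell's value: null means the whole
     * collection was removed (OVERWRITE is not used in that case).
     */
    static <K, V> Map<K, V> apply(Map<K, V> current, Change<K, V> change) {
        Map<K, V> result = new LinkedHashMap<>();
        if (change == null) {
            return result; // collection removed -> empty
        }
        // MODIFY edits the existing content; OVERWRITE starts from an empty collection.
        if (change.mode == Mode.MODIFY && current != null) {
            result.putAll(current);
        }
        for (Map.Entry<K, V> e : change.elements.entrySet()) {
            if (e.getValue() == null) {
                result.remove(e.getKey());
            } else {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new LinkedHashMap<>();
        state.put("a", 1);
        state.put("b", 2);
        Map<String, Integer> elements = new LinkedHashMap<>();
        elements.put("b", null); // remove "b"
        elements.put("c", 3);    // add "c"
        System.out.println(apply(state, new Change<>(Mode.MODIFY, elements)));           // {a=1, c=3}
        System.out.println(apply(state, new Change<>(Mode.OVERWRITE, Map.of("z", 9)))); // {z=9}
        System.out.println(apply(state, null));                                          // {}
    }
}
```

`apply` copies into a new map instead of mutating `current`, so the same local state can be replayed against several events, as the usage in `main` does.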

The type of the `elements` field depends on the collection type:
- For a `Set` of type T, it is `Schema.map(T, Schema.BOOLEAN_SCHEMA)`. The boolean value signals whether the element was added to (true) or removed from (false) the set.
- For a `List` of type T, it is `Schema.map(Schema.STRING_SCHEMA, T)`. The key of this map is a timeuuid (encoded as a string), as described in https://docs.scylladb.com/using-scylla/cdc/cdc-advanced-types/#lists. Removed elements are marked by a null value.
- For a `Map` with key type K and value type V, it is `Schema.map(K, V)` (the same as for a frozen map). Removed elements are marked by a null value.
- For a `UDT`, it is a struct representing the UDT, but slightly different from a frozen UDT: each field of this struct is a "Cell" (a struct with a single field, `value`). "Cell" is used the same way as for columns: a null field means that the UDT field wasn't changed, a "Cell" with a null `value` means that the field was removed, and a "Cell" with a non-null `value` means that the field was overwritten.
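
The per-type rules above can be sketched in plain Java for the `MODIFY` case. Everything here is hypothetical: `ElementDeltas` and its nested `Cell` class are invented models, with Java collections standing in for Kafka Connect values. Rebuilding list order by sorting on the timestamp embedded in each timeuuid key is an assumption about how a consumer might do it; keys with equal timestamps are not ordered specially.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical helpers applying delta-mode `elements` (MODIFY) to a consumer's local state.
final class ElementDeltas {

    /** SET: elements maps each element to true (added) or false (removed). */
    static <T> Set<T> applySet(Set<T> current, Map<T, Boolean> elements) {
        Set<T> result = new LinkedHashSet<>(current);
        for (Map.Entry<T, Boolean> e : elements.entrySet()) {
            if (Boolean.TRUE.equals(e.getValue())) {
                result.add(e.getKey());
            } else {
                result.remove(e.getKey());
            }
        }
        return result;
    }

    /** LIST: elements is keyed by timeuuid strings; a null value marks a removed element. */
    static <T> Map<String, T> applyList(Map<String, T> current, Map<String, T> elements) {
        Map<String, T> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, T> e : elements.entrySet()) {
            if (e.getValue() == null) {
                result.remove(e.getKey());
            } else {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    /** Rebuilds list order by sorting on the timestamp inside each version-1 UUID key. */
    static <T> List<T> listValues(Map<String, T> cells) {
        List<Map.Entry<String, T>> entries = new ArrayList<>(cells.entrySet());
        entries.sort(Comparator.comparingLong(e -> UUID.fromString(e.getKey()).timestamp()));
        List<T> values = new ArrayList<>();
        for (Map.Entry<String, T> e : entries) {
            values.add(e.getValue());
        }
        return values;
    }

    /** Hypothetical model of a "Cell": a struct with a single field, `value`. */
    static final class Cell {
        final Object value;

        Cell(Object value) {
            this.value = value;
        }
    }

    /** UDT: null Cell = unchanged; Cell with null value = removed (set to null); else overwritten. */
    static Map<String, Object> applyUdt(Map<String, Object> current, Map<String, Cell> delta) {
        Map<String, Object> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, Cell> e : delta.entrySet()) {
            Cell cell = e.getValue();
            if (cell != null) {
                result.put(e.getKey(), cell.value); // a null value records the removal
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(applySet(Set.of("a"), Map.of("a", false, "b", true))); // [b]
        String t1 = "00000001-0000-1000-8000-000000000000"; // made-up version-1 UUIDs
        String t2 = "00000002-0000-1000-8000-000000000000";
        Map<String, String> cells = applyList(Map.of(t2, "second"), Map.of(t1, "first"));
        System.out.println(listValues(cells)); // [first, second]
    }
}
```

`Map` columns need no helper here: their `elements` follow the same put/remove-on-null rule as `applyList`, just with the map's own keys.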
#### ScyllaExtractNewState transformer
The connector provides one single message transformation (SMT), `ScyllaExtractNewState` (class: `com.scylladb.cdc.debezium.connector.transforms.ScyllaExtractNewState`).
This SMT works exactly like `io.debezium.transforms.ExtractNewRecordState` (in fact, it calls it underneath), but it also flattens the structure by extracting values from the aforementioned single-field structures.
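
The flattening step can be modeled in a few lines of Java. This is a hypothetical model, not the SMT's source: `FlattenCells` and its `Cell` class are invented, a Java map stands in for the Kafka Connect struct, and emitting null for an unmodified column (a null "Cell") is an assumption. In this model an unmodified column and a column set to null both flatten to null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the flattening ScyllaExtractNewState adds on top of ExtractNewRecordState.
final class FlattenCells {

    /** Model of a single-field "Cell" struct. */
    static final class Cell {
        final Object value;

        Cell(Object value) {
            this.value = value;
        }
    }

    /** Takes the envelope's `after` row (returns null if there is none) and unwraps each Cell. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> extractNewState(Map<String, Object> envelope) {
        Map<String, Cell> after = (Map<String, Cell>) envelope.get("after");
        if (after == null) {
            return null;
        }
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Cell> e : after.entrySet()) {
            Cell cell = e.getValue();
            // Assumption: an unmodified column (null Cell) is emitted as null.
            flat.put(e.getKey(), cell == null ? null : cell.value);
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, Cell> after = new LinkedHashMap<>();
        after.put("v1", new Cell(1)); // modified column
        after.put("v2", null);        // column not modified
        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("after", after);
        System.out.println(extractNewState(envelope)); // {v1=1, v2=null}
    }
}
```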
