Commit 1be2c0c

kbatuigas, Feediver1, paulohtb6 authored
Include kafka_internal usage in Manage Disk Space (#1257)
Co-authored-by: Joyce Fee <102751339+Feediver1@users.noreply.github.com>
Co-authored-by: Paulo Borges <paulohtb6@gmail.com>
1 parent 296afaf commit 1be2c0c

File tree: 3 files changed, +67 -29 lines

modules/develop/pages/transactions.adoc

Lines changed: 46 additions & 28 deletions
@@ -6,9 +6,11 @@
 
 Redpanda supports Apache Kafka®-compatible transaction semantics and APIs. For example, you can fetch messages starting from the last consumed offset and transactionally process them one by one, updating the last consumed offset and producing events at the same time.
 
+A transaction can span partitions from different topics, and a topic can be deleted while there are active transactions on one or more of its partitions. In-flight transactions can detect deletion events, remove the deleted partitions (and related messages) from the transaction scope, and commit changes to the remaining partitions.
+
 If a producer is sending multiple messages to the same or different partitions, and network connectivity or broker failure cause the transaction to fail, then it's guaranteed that either all messages are written to the partitions or none. This is important for applications that require strict guarantees, like financial services transactions.
 
-Transactions guarantee both exactly-once semantics (EOS) and atomicity.
+Transactions guarantee both exactly-once semantics (EOS) and atomicity:
 
 * EOS helps developers avoid the anomalies of at-most-once processing (with potential lost events) and at-least-once processing (with potential duplicated events). Redpanda supports EOS when transactions are used in combination with xref:develop:produce-data/idempotent-producers.adoc[idempotent producers].
 * Atomicity additionally commits a set of messages across partitions as a unit: either all messages are committed or none. Encapsulated data received or sent across multiple topics in a single operation can only succeed or fail globally.
@@ -23,31 +25,18 @@ endif::[]
 
 == Use transactions
 
-By default, the `enable_transactions` cluster configuration property is set to true. However, in the following use cases, clients must explicitly use the Transactions API to perform operations within a transaction.
-
-The required `transactional.id` property acts as a producer identity. It enables reliability semantics that span multiple producer sessions by allowing the client to guarantee that all transactions issued by the client with the same ID have completed prior to starting any new transactions.
-
-The two primary use cases for transactions are:
+By default, the config_ref:enable_transactions,true,properties/cluster-properties[`enable_transactions`] cluster configuration property is set to true. However, in the following use cases, clients must explicitly use the Transactions API to perform operations within a transaction:
 
 * xref:develop:transactions.adoc#atomic-publishing-of-multiple-messages[Atomic (all or nothing) publishing of multiple messages]
 * xref:develop:transactions.adoc#exactly-once-stream-processing[Exactly-once stream processing]
 
-=== Transaction usage tips
-
-* A transaction can span partitions from different topics, and a topic can be deleted while there are active transactions on one or more of its partitions. In-flight transactions can detect deletion events, remove the deleted partitions (and related messages) from the transaction scope, and commit changes to the remaining partitions.
-* Ongoing transactions can prevent consumers from advancing. To avoid this, don't set transaction timeout (`transaction.timeout.ms` in Java client) to high values: the higher the timeout, the longer consumers may be blocked. By default, it's about a minute, but it's a client setting that depends on the client.
-ifndef::env-cloud[]
-* When running transactional workloads from clients, tune xref:reference:cluster-properties#max_transactions_per_coordinator[`max_transactions_per_coordinator`] to the number of active transactions that you expect your clients to run at any given time (if your client transaction IDs are not reused). The total number of transactions in the cluster at any one time is `max_transactions_per_coordinator * transaction_coordinator_partitions` (default is 50). When the threshold is exceeded, Redpanda terminates old sessions. If an idle producer corresponding to the terminated session wakes up and produces, its batches are rejected with the message `invalid producer epoch` or `invalid_producer_id_mapping`, depending on where it is in the transaction execution phase.
-Be aware that if you keep the default as 50 and your clients create a new ID for every transaction, the total continues to accumulate, which bloats memory.
-* When upgrading a self-managed deployment, make sure to use maintenance mode with a glossterm:rolling upgrade[].
-
-endif::[]
+When you use transactions, you must set the https://kafka.apache.org/documentation/#producerconfigs_transactional.id[`transactional.id`^] property in the producer configuration. This property uniquely identifies the producer and enables reliable semantics across multiple producer sessions. It ensures that all transactions issued by a given producer are completed before any new transactions are started.
 
 === Atomic publishing of multiple messages
 
-With its event sourcing microservice architecture, a banking IT system illustrates the necessity for transactions well. A bank has multiple branches, and each branch is an independent microservice that manages its own non-intersecting set of accounts. Each branch keeps its own ledger, which is represented as a Redpanda partition. When a branch representing a microservice starts, it replays its ledger to reconstruct the actual state.
+A banking IT system with an event-sourcing microservice architecture illustrates why transactions are necessary. In this system, each bank branch is implemented as an independent microservice that manages its own distinct set of accounts. Every branch maintains its own transaction history, stored as a Redpanda partition. When a branch starts, it replays the transaction history to reconstruct its current state.
 
-Financial transactions (money transfers) require the following guarantees:
+Financial transactions such as money transfers require the following guarantees:
 
 * A sender can't withdraw more than the account withdrawal limit.
 * A recipient receives exactly the same amount sent.
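The `transactional.id` requirement described in this hunk can be illustrated with a minimal transactional producer. This is an editorial sketch, not part of the commit: the broker address (`localhost:9092`), the topic names (`ledger-a`, `ledger-b`), and the ID `branch-ledger-1` are hypothetical placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TxProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        // A stable transactional.id identifies this producer across sessions;
        // initTransactions() fences any older session using the same ID.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "branch-ledger-1");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            // Both writes commit atomically, or neither does.
            producer.send(new ProducerRecord<>("ledger-a", "acct-1", "-100"));
            producer.send(new ProducerRecord<>("ledger-b", "acct-2", "+100"));
            producer.commitTransaction();
        }
    }
}
```

Running this sketch requires a reachable Redpanda or Kafka broker; it is a configuration illustration rather than a standalone program.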
@@ -61,6 +50,9 @@ Things get more complex with cross-branch financial transactions, because they i
 
 Redpanda natively supports transactions, so it's possible to atomically update several ledgers at the same time. For example:
 
+.Show multi-ledger transaction example:
+[%collapsible]
+====
 [,java]
 ----
 Properties props = new Properties();
@@ -131,6 +123,7 @@ while (true) {
   // TIP: notify the initiator of a transaction about the success
 }
 ----
+====
 
 When a transaction fails before a `commitTransaction` attempt completes, you can assume that it is not executed. When a transaction fails after a `commitTransaction` attempt completes, the true transaction status is unknown. Redpanda only guarantees that there isn't a partial result: either the transaction is committed and complete, or it is fully rolled back.
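The commit-failure semantics described in this hunk are commonly handled with the Kafka Java client's standard exception pattern. The following is an editorial fragment, not part of the commit; it assumes `producer` is an initialized transactional `KafkaProducer` and `record` is a pending `ProducerRecord`:

```java
// Fragment only: assumes a configured transactional producer in scope.
try {
    producer.beginTransaction();
    producer.send(record);
    producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // Unrecoverable: another session took over this transactional.id (or a
    // fatal error occurred), so close and recreate the producer.
    producer.close();
} catch (KafkaException e) {
    // Recoverable: abort rolls the transaction back, so it is safe to retry.
    producer.abortTransaction();
}
```

If the failure happens after `commitTransaction` returns an ambiguous result (for example, a timeout), the application must check the true outcome itself, as the surrounding text notes.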

@@ -154,16 +147,19 @@ Postgresql -> topic(1) transform topic(2) -> warehouse
 
 The transformation reads a record from `topic(1)`, processes it, and writes it to `topic(2)`. Without transactions, an intermittent error can cause a message to be lost or processed several times. With transactions, Redpanda guarantees exactly-once semantics. For example:
 
+.Show exactly-once processing example:
+[%collapsible]
+====
 [,java]
 ----
 var source = "source-topic";
-var target = "target-topic"
+var target = "target-topic";
 
 Properties pprops = new Properties();
 pprops.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "...");
 pprops.put(ProducerConfig.ACKS_CONFIG, "all");
 pprops.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
-pprops.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.newUUID());
+pprops.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.randomUUID().toString());
 
 Properties cprops = new Properties();
 cprops.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "...");
@@ -260,19 +256,41 @@ while (true) {
   }
 }
 ----
+====
 
-Different transactions require different approaches to handling failures within the application. Consider the approaches to failed or timed-out transactions in the provided use cases.
-
-* Publishing of multiple messages: The request came from outside the system, and it is the application's responsibility to discover the true status of a timed-out transaction. (This example doesn't use consumer groups to distribute partitions between consumers.)
-* Exactly-once streaming (consume-transform-loop): This is a closed system. Upon re-initialization of the consumer and producer, the system automatically discovers the moment it was interrupted and continues from that place. Additionally, this automatically scales by the number of partitions. Run another instance of the application, and it starts processing its share of partitions in the source topic.
-
-== Enabling exactly-once processing
+==== Exactly-once processing configuration requirements
 
-The default configuration of Redpanda supports exactly-once processing. Preserving this capability requires maintaining the following settings:
+Redpanda's default configuration supports exactly-once processing. To preserve this capability, ensure the following settings are maintained:
 
 * `enable_idempotence = true`
 * `enable_transactions = true`
-* `transaction_coordinator_delete_retention_ms >= transactional_id_expiration_ms`
+* `transaction_coordinator_delete_retention_ms` is greater than or equal to `transactional_id_expiration_ms`
+
+== Best practices
+
+To help avoid common pitfalls and optimize performance, consider the following when configuring transactional workloads in Redpanda:
+
+* If a consumer is configured to use the read_committed isolation level, it can only process successfully committed transactions. As a result, an ongoing transaction with a large timeout that becomes stuck could prevent the consumer from processing other committed transactions.
++
+To avoid this, don't set the transaction timeout client setting (`transaction.timeout.ms` in the Kafka Java client implementation) to a value that is too high. The longer the timeout, the longer consumers may be blocked.
+ifndef::env-cloud[]
+* When running transactional workloads from clients, tune xref:reference:cluster-properties#max_transactions_per_coordinator[`max_transactions_per_coordinator`] to match the number of concurrent transactions your clients run (if your client transaction IDs are not reused).
++
+The total number of transactions allowed in the cluster at any time is determined by `max_transactions_per_coordinator * transaction_coordinator_partitions` (default is 50 partitions). When the limit is exceeded, Redpanda terminates old sessions. If an idle producer corresponding to a terminated session becomes active and tries to produce again, Redpanda rejects its batches with an `invalid producer epoch` or `invalid_producer_id_mapping` error, depending on where it is in the transaction execution phase.
++
+Be aware that if you keep `transaction_coordinator_partitions` at the default of 50 and your clients create a new ID for every transaction, the total continues to accumulate, which bloats memory.
+* Transactional metadata is stored in the internal topic `kafka_internal/tx`. Over time, this topic can consume disk space. You can manage its disk usage by tuning the `transaction_coordinator_delete_retention_ms` and `transactional_id_expiration_ms` cluster properties.
++
+See also: xref:manage:cluster-maintenance/disk-utilization.adoc#manage-transaction-coordinator-disk-usage[Manage Disk Space]
+* When upgrading a self-managed deployment, make sure to use maintenance mode with a glossterm:rolling upgrade[].
+endif::[]
+
+== Handle transaction failures
+
+Different transactions require different approaches to handling failures within the application. Consider the approaches to failed or timed-out transactions in the provided use cases:
+
+* Publishing of multiple messages: The request came from outside the system, and it is the application's responsibility to discover the true status of a timed-out transaction. (This example doesn't use consumer groups to distribute partitions between consumers.)
+* Exactly-once streaming (consume-transform-loop): This is a closed system. Upon re-initialization of the consumer and producer, the system automatically discovers the moment it was interrupted and continues from that place. Additionally, this automatically scales by the number of partitions. Run another instance of the application, and it starts processing its share of partitions in the source topic.
 
 == Transactions with compacted segments

modules/manage/pages/cluster-maintenance/disk-utilization.adoc

Lines changed: 18 additions & 0 deletions
@@ -163,6 +163,24 @@ For periodic offset expiration, set the retention duration of consumer group off
 
 Redpanda supports group offset deletion with the Kafka OffsetDelete API through rpk with the xref:reference:rpk/rpk-group/rpk-group-offset-delete.adoc[`rpk group offset-delete`] command. The offset delete API provides finer control over culling consumer offsets. For example, it enables the manual removal of offsets that are tracked by Redpanda within the `__consumer_offsets` topic. The offsets requested to be removed will be removed only if either the group in question is in a dead state, or the partitions being deleted have no active subscriptions.
 
+== Manage transaction coordinator disk usage
+
+Redpanda uses the internal topic `kafka_internal/tx` to store transaction metadata for exactly-once and transactional producers. The log files contain all historical transactions, both committed and currently open ones. Over time, this topic can consume excessive disk space in niche use cases that generate a large number of transactional sessions.
+
+You can manage the disk usage of `kafka_internal/tx` by tuning the following cluster properties:
+
+* config_ref:transaction_coordinator_delete_retention_ms,true,properties/cluster-properties[`transaction_coordinator_delete_retention_ms`]. Default: `604800000` (7 days).
+* config_ref:transactional_id_expiration_ms,true,properties/cluster-properties[`transactional_id_expiration_ms`]. Default: `604800000` (7 days).
+
+To mitigate unbounded growth of `kafka_internal/tx` disk usage and manage its storage consumption more effectively, <<monitor-disk-space,monitor your storage metrics>> and lower the values of the relevant properties as needed.
+
+To adjust these properties, run:
+
+[,bash]
+----
+rpk cluster config set transaction_coordinator_delete_retention_ms=<milliseconds> transactional_id_expiration_ms=<milliseconds>
+----
+
 == Configure segment size
 
 The `log_segment_size` property specifies the size of each log segment within the partition. Redpanda closes segments after they exceed this size and messages begin filling a new segment.

modules/reference/pages/properties/cluster-properties.adoc

Lines changed: 3 additions & 1 deletion
@@ -6259,7 +6259,9 @@ Cleanup policy for a transaction coordinator topic.
 
 === transaction_coordinator_delete_retention_ms
 
-Delete segments older than this age. To ensure transaction state is retained as long as the longest-running transaction, make sure this is no less than <<transactional_id_expiration_ms,`transactional_id_expiration_ms`>>.
+Delete segments older than this age. To ensure transaction state is retained for as long as the longest-running transaction, make sure this is greater than or equal to <<transactional_id_expiration_ms,`transactional_id_expiration_ms`>>.
+
+For example, if your typical transactions run for one hour, consider setting both `transaction_coordinator_delete_retention_ms` and `transactional_id_expiration_ms` to at least 3600000 (one hour), or a little over.
 
 *Unit:* milliseconds
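The invariant this property description states (retention greater than or equal to expiration) can be sanity-checked before applying new values. A minimal editorial sketch, with the one-hour figure taken from the example above:

```java
public class RetentionCheck {
    // True when coordinator segment retention keeps transaction state at least
    // as long as transactional IDs may remain alive.
    static boolean retentionCoversExpiration(long deleteRetentionMs, long idExpirationMs) {
        return deleteRetentionMs >= idExpirationMs;
    }

    public static void main(String[] args) {
        long oneHourMs = 3_600_000L; // 3600000 ms, as in the property example
        System.out.println(retentionCoversExpiration(oneHourMs, oneHourMs));     // prints true
        System.out.println(retentionCoversExpiration(oneHourMs - 1, oneHourMs)); // prints false
    }
}
```

Such a check could run in deployment tooling before issuing the `rpk cluster config set` command shown in the disk-utilization page above.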
