
Update readme v1.0.0 mk #59

Merged (10 commits, Jun 4, 2018)

Conversation
Conversation

@klippx (Contributor) commented May 16, 2018

Comments for @tulios

  • How does one implement a custom decoder for consuming?
  • Elaborate on the statement "It will also provide more utility functions to give your code more flexibility" for eachBatch
  • In `resolveOffset` I understand that the default behaviour is that processed messages within a batch will be committed in case of errors, while the rest will remain unprocessed. But it is not clear to me why I would want to change this. What is the use case? It only adds confusion for me.
  • Seek: you don't need to await consumer#run - can you elaborate why this is the case?
  • Pause & Resume: I removed mentions of "unsupported by the library"
  • Pause & Resume: The example is very weird. Maybe it's fine, but it's not the way people use topics. Maybe use one topic and pause because a dependency is down, which seems like a more useful scenario?

README.md Outdated

Some use cases can be optimized by dealing with batches rather than single messages. This handler will feed your function batches and some utility functions to give your code more flexibility. Be aware that using `eachBatch` is considered a more advanced use case since you will have to understand how session timeouts and heartbeats are connected. All resolved offsets will be automatically committed after the function is executed.
In order to process huge volumes of messages in a responsive manner, you need to consider the `eachBatch` API. Dealing with batches rather than single messages reduces network traffic and the communication overhead with the broker, allowing your consumer group to eat away at your partition lag orders of magnitude faster than `eachMessage`.
Collaborator
This isn't really true though. eachMessage is essentially a default implementation of eachBatch. We still consume a batch at a time from the brokers, and the offset is only resolved, not committed, on each message.

A situation where you might want to use eachBatch is when you need to do something with all messages at once. For example, maybe you want to send some data from each message to a remote batch API, so that for N messages you call the remote API once.
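A minimal sketch of that scenario. The `consumer` is a KafkaJS consumer created elsewhere and `sendToBatchApi` is a hypothetical helper, not part of KafkaJS; both are injected as parameters so the handler can be exercised without a broker:

```javascript
// Sketch: forward a whole batch to a remote batch API in one call.
// `consumer` is a KafkaJS consumer created elsewhere; `sendToBatchApi`
// is a hypothetical helper that accepts an array of message values.
async function runBatchForwarder(consumer, sendToBatchApi) {
  await consumer.run({
    eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
      // One remote call for N messages instead of N calls
      await sendToBatchApi(batch.messages.map((m) => m.value.toString()))

      // Mark every message as processed and keep the session alive
      for (const message of batch.messages) {
        resolveOffset(message.offset)
      }
      await heartbeat()
    },
  })
}
```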

@Nevon (Collaborator) commented May 16, 2018

The example is very weird. Maybe it's fine, but it's not the way people use topics. Maybe use one topic and pause because a dependency is down, which seems like a more useful scenario?

We actually do something similar, where we expose an HTTP endpoint to pause/resume. When I wrote the example, I just didn't want to make assumptions about how the pause/resume would be invoked, so invoking it via a Kafka message seemed appropriate. 😅

If you have an idea for a clearer example, I'm all for it. I just can't think of one that wouldn't involve more non-KafkaJS related cruft. For example, if we take the example of your dependency responding with 429:

```javascript
await consumer.connect()
await consumer.subscribe({ topic: 'jobs' })

await consumer.run({
  eachMessage: async ({ topic, message }) => {
    try {
      await sendToDependency(message)
    } catch (e) {
      if (e instanceof TooManyRequestsError) {
        consumer.pause([{ topic }])
        setTimeout(() => consumer.resume([{ topic }]), e.retryAfter * 1000)
      }

      throw e
    }
  },
})
```

````diff
@@ -116,9 +115,37 @@ new Kafka({
 })
 ```

-#### <a name="setup-client-default-retry"></a> Default Retry
+### <a name="configuration-default-retry"></a> Default Retry
````
Owner

I like the detailed explanation but can we move it to the bottom? I think the beginning of the readme should provide a quick setup and the most common use case, detailed information such as the custom logger or how the retry mechanism works could live further down. WDYT?

Contributor Author

definitely, good point, it clutters things up too much


In order to pause and resume consuming from one or more topics, the `Consumer` provides the methods `pause` and `resume`. Note that pausing a topic means that it won't be fetched in the next cycle. You may still receive messages for the topic within the current batch.

Calling `pause` with a topic that the consumer is not subscribed to is a no-op; calling `resume` with a topic that is not paused is also a no-op.

Example: A situation where this could be useful is when an external dependency used by the consumer is under too much load. Here we want to `pause` consumption from a topic when this happens, and after a predefined interval we `resume` again:
Owner

❤️

@klippx klippx force-pushed the update-readme-v1.0.0-mk branch from cd22f99 to d591e37 on May 31, 2018 at 07:42
@tulios (Owner) commented Jun 4, 2018

@klippx

How does one implement a custom decoder for consuming?

The codec works for both consumers and producers; the consumer will detect that the message is compressed with some codec and it will use the appropriate decompress function to decompress it. If the codec is implemented in KafkaJS, it'll just work.

Elaborate on the statement "It will also provide more utility functions to give your code more flexibility" for eachBatch

The `eachBatch` functions receive 3 helpers besides the batch: `resolveOffset`, `heartbeat`, and `isRunning`.

In resolveOffset I understand that the default behavior is that processed messages within a batch will be committed in case of errors, while the rest will remain unprocessed. But it is not clear to me why I would like to change this. What is the use case? It is only adding confusion for me.

When we shut down the consumer, it will wait for the full batch to be processed before it exits. If you use `isRunning` to stop processing messages, the consumer will automatically commit the last offset, skipping the messages you didn't consume. If you make sure you call `resolveOffset` and disable the auto-resolve, you can quickly shut down the consumer without losing or skipping any messages.
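A sketch of that shutdown-safe pattern, with the consumer injected so it can be exercised without a broker (`handleMessage` is a hypothetical per-message processor, not part of KafkaJS):

```javascript
// Shutdown-safe batch processing: disable auto-resolve and only mark
// offsets for messages that were actually processed. On shutdown,
// isRunning() turns false and unresolved messages are redelivered.
async function runWithManualResolve(consumer, handleMessage) {
  await consumer.run({
    eachBatchAutoResolve: false,
    eachBatch: async ({ batch, resolveOffset, heartbeat, isRunning }) => {
      for (const message of batch.messages) {
        if (!isRunning()) break // stop cleanly; the rest stays unresolved
        await handleMessage(message)
        resolveOffset(message.offset) // only mark what was processed
      }
      await heartbeat()
    },
  })
}
```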

Seek: you don't need to await consumer#run - can you elaborate why this is the case?

It's possible to optionally await `consumer.run`; this makes sure your promise resolves when the consumer successfully receives its first batch. This isn't necessary, since this operation will happen multiple times throughout the app lifecycle, but it gives you an extra check. If you call `seek` after awaiting `run`, it means the consumer received the first batch and then performed the seek operation. To make sure your consumers start from the seek definition, you have to call `run` without `await`.

The reason for all of that is that we need to initialize the consumer before we invoke the seek operation and run is the best place to initialize the consumer.
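The ordering above can be sketched as follows, with the consumer injected so the call order can be checked without a broker (`startFromOffset` is a hypothetical wrapper, not a KafkaJS function):

```javascript
// Start consuming from a specific offset. run() is deliberately NOT
// awaited: it initializes the consumer, and seek() then repositions it
// before the first batch is processed.
function startFromOffset(consumer, { topic, partition, offset }, eachMessage) {
  consumer.run({ eachMessage })
  consumer.seek({ topic, partition, offset })
}
```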

@tulios tulios merged commit 7be1ff0 into tulios:master Jun 4, 2018