Make stage_size & stage computation thread safe#2734
Make stage_size & stage computation thread safe#2734repeatedly merged 2 commits intofluent:masterfrom
Conversation
bf1c94c to
4383919
Compare
Signed-off-by: Harish Nelakurthi <harish.nelakurthi@illumio.com>
4383919 to
a1731f1
Compare
|
Thanks!. We will check issue and patch later. |
Signed-off-by: Harish Nelakurthi <harish.nelakurthi@illumio.com>
64a72d5 to
065751f
Compare
|
@repeatedly Can we get this merged into the upcoming release if it looks good as this is causing a critical issue of data loss. We have tested this in large scale environment and the fix seems to work. Thanks! |
ganmacs
left a comment
There was a problem hiding this comment.
Sorry for the delay. I added a comment.
| # but this block **may** run multiple times from write_step_by_step and previous write may be rollbacked | ||
| # So we should be counting the stage_size only for the last successful write | ||
| # | ||
| staged_bytesizes_by_chunk[chunk] = adding_bytesize |
There was a problem hiding this comment.
I think the following situation can happen. what do you think of this?
staged_bytesizes_by_chunkstores chunk1- rollback happens in write_step_by_step (ShouldRetry raises)
- another thread euqueues chunk1 before this thread enters into
fluentd/lib/fluent/plugin/buffer.rb
Line 673 in 065751f
- create new chunk(chunk2) in
fluentd/lib/fluent/plugin/buffer.rb
Line 663 in 065751f
staged_bytesizes_by_chunkstores chunk2 (still remtains chunk1 instaged_bytesizes_by_chunk)
There was a problem hiding this comment.
@ganmacs Steps (3) can't happen because enqueue_chunk can't get a lock and waits until the lock is released by the thread which is writing to the chunk.
The lock is acquired here: https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/buffer.rb#L279
And released here:
https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/buffer.rb#L319
So enqueue_chunk can happen only after the commit is done.
Hope this helps.
There was a problem hiding this comment.
Make sense. You're right. other thread can get the chunk but can't enter
fluentd/lib/fluent/plugin/buffer.rb
Line 674 in 065751f
| # but this block **may** run multiple times from write_step_by_step and previous write may be rollbacked | ||
| # So we should be counting the stage_size only for the last successful write | ||
| # | ||
| staged_bytesizes_by_chunk[chunk] = adding_bytesize |
There was a problem hiding this comment.
Make sense. You're right. other thread can get the chunk but can't enter
fluentd/lib/fluent/plugin/buffer.rb
Line 674 in 065751f
|
Thanks! |
Which issue(s) this PR fixes:
#2712
Fixes #
What this PR does / why we need it:
When writing chunks to buffer, the
stage_sizecomputation can be thread-unsafe as mentioned in the bug which can result inBufferOverflowErrorseven though the buffer is not actually full. This happened on a live long running fluentd. The methodbuffer.writeis supposed to write to a chunk atmost once and hencestage_sizeshould be added for a chunk only once but due towrite_step_by_stepthe block which updates the value ofstaged_bytesizemay be called multiple times and the chunk can be rollbacked but the value ofstaged_bytesizeis not reverted causing thestage_sizeto be more than the actual value.This PR addresses this issue.
Docs Changes:
Release Note: