@tonysun83 tonysun83 commented Sep 8, 2018

The base for this release should be j7137/j7138, which should use base commit
17aec1f3339f864eb08869d6235753f0fdd28259

FogBugzId: 110017

Two items are included in this patch:

  1. A fix for a race condition that surfaces in Erlang 20 environments
  2. A new config option for disabling off-heap messages

History for Cherry-Pick:

git cherry-pick 2195eb352952e66494ec520e920733808e028c0d^..ee3c4b5d2e58e35a11e6995d47580554a7d7aca8
[release-candidate-110017 df0496c03] Fix couch_server:terminate/2
 Author: Paul J. Davis <paul.joseph.davis@gmail.com>
 Date: Wed Sep 5 16:24:17 2018 -0500
 1 file changed, 5 insertions(+), 1 deletion(-)
[release-candidate-110017 953c27905] Reproduce race condition in couch_server
 Author: Paul J. Davis <paul.joseph.davis@gmail.com>
 Date: Wed Sep 5 16:25:41 2018 -0500
 1 file changed, 173 insertions(+)
[release-candidate-110017 9af00bef2] Fix couch_server concurrency error
 Author: Paul J. Davis <paul.joseph.davis@gmail.com>
 Date: Thu Sep 6 11:01:07 2018 -0500
 2 files changed, 21 insertions(+), 11 deletions(-)
[release-candidate-110017 9f914a401] Allow disabling off-heap messages
 Author: Nick Vatamaniuc <vatamane@apache.org>
 Date: Thu Sep 6 17:31:34 2018 -0400
 6 files changed, 17 insertions(+), 12 deletions(-)

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

davisp and others added 4 commits September 7, 2018 18:53
If couch_server terminates while there is an active open_async process
it will throw a function_clause exception because `couch_db:get_pid/1`
will fail due to the `#entry.db` member being undefined. The simple fix is
to filter those entries out.

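
The filtering idea can be modeled outside Erlang. The following Python sketch is not the CouchDB code: the `Entry` class and the `pids_to_stop` function are hypothetical stand-ins for the `#entry{}` record and the cleanup done in `couch_server:terminate/2`, and `None` stands in for `undefined`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a row in the couch_dbs table."""
    name: str
    db: Optional[str] = None  # None models `undefined`: open_async has not finished


def pids_to_stop(entries: List[Entry]) -> List[str]:
    """Return handles only for entries whose database finished opening."""
    # Entries still being opened have no db handle, so they are skipped
    # instead of being passed to a function that cannot handle them.
    return [e.db for e in entries if e.db is not None]


entries = [Entry("db1", db="pid-1"), Entry("db2"), Entry("db3", db="pid-3")]
print(pids_to_stop(entries))
```

In this model, the entry for `db2` is still opening and is simply left out of the shutdown list.
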
A rather uncommon bug found in production. Will write more as this is
just for show and tell.

For now this test case just demonstrates the issue that was discovered.
A fix is still being pondered.

It's possible that a busy couch_server and a specific ordering and timing
of events can end up with an open_async message in the mailbox while a
new and unrelated open_async process is spawned. This change just ensures
that if we encounter any old messages in the mailbox, we ignore
them.

The underlying issue here is that a delete request clears out the state
in our couch_dbs ets table while not clearing out state in the message
queue. In some fairly specific circumstances this leads to the message
in the mailbox satisfying an ets entry for a newer open_async
process. This change just includes a match on the opener process.
Anything unmatched came before the current open_async request, which
means it should be ignored.

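
The sequence described above (open, queued reply, delete, re-open) can be illustrated with a toy model. This is a sketch in Python under stated assumptions, not the actual couch_server implementation: the `Server` class, its `table` and `mailbox` attributes, and the integer opener ids are all hypothetical stand-ins for the ets table, the Erlang process mailbox, and the opener pid.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for a couch_dbs row while a database is opening."""
    opener: int               # id of the open_async process that owns this entry
    db: Optional[str] = None  # filled in once the open completes


class Server:
    """Toy model of a server with a table and a mailbox (not couch_server itself)."""

    def __init__(self) -> None:
        self.table: Dict[str, Entry] = {}
        # Each message is (db name, opener id, result).
        self.mailbox: Deque[Tuple[str, int, str]] = deque()
        self._next_opener = 0

    def start_open(self, name: str) -> int:
        """Record a new pending open and return the id of its opener."""
        self._next_opener += 1
        self.table[name] = Entry(opener=self._next_opener)
        return self._next_opener

    def delete(self, name: str) -> None:
        # Clears the table entry but leaves any queued message in the mailbox.
        self.table.pop(name, None)

    def handle_next(self) -> bool:
        """Process one message; return False if it was stale and ignored."""
        name, opener, result = self.mailbox.popleft()
        entry = self.table.get(name)
        if entry is None or entry.opener != opener:
            return False  # message predates the current open request
        entry.db = result
        return True


server = Server()
old = server.start_open("db1")
server.mailbox.append(("db1", old, "handle-from-old-opener"))  # queued, not yet handled
server.delete("db1")
new = server.start_open("db1")
stale_handled = server.handle_next()
print(stale_handled, server.table["db1"].db)

server.mailbox.append(("db1", new, "handle-from-new-opener"))
fresh_handled = server.handle_next()
print(fresh_handled, server.table["db1"].db)
```

Without the comparison on `entry.opener`, the stale message would be accepted as the result of the newer open request, which is the failure mode the commit message describes.
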
Off-heap messages are an Erlang 19 feature:

http://erlang.org/doc/man/erlang.html#process_flag_message_queue_data

It is advisable to use that setting for processes which expect to receive a
lot of messages. CouchDB sets it for couch_server, couch_log_server and a bunch
of others as well.

In some cases the off-heap behavior could alter the timing of message receives
and expose subtle bugs that have been lurking in the code for years, or could
slightly reduce performance, so as a safety measure allow disabling it.

@jiangphcn

+1

@jiangphcn jiangphcn left a comment

LGTM

@tonysun83 tonysun83 merged commit 0006925 into release-110017 Sep 10, 2018
@tonysun83 tonysun83 deleted the release-candidate-110017 branch September 12, 2018 06:07
garrensmith pushed a commit that referenced this pull request Jul 21, 2020