Commit aae623b

Don't wait indefinitely for replication jobs to stop
Previously we used `gen_server:stop/3` with an `infinity` timeout. We have observed that jobs can get stuck waiting on network requests, so they may take arbitrarily long to process the shutdown request (and call their `terminate/2` callback), which can block the replicator scheduler. To fix this, add a 5 second timeout to the stop call and then forcibly kill the process.
1 parent 8f2c253 commit aae623b

1 file changed, 3 additions and 1 deletion
src/couch_replicator/src/couch_replicator_scheduler_job.erl

Lines changed: 3 additions & 1 deletion
@@ -47,6 +47,7 @@
 -define(LOWEST_SEQ, 0).
 -define(DEFAULT_CHECKPOINT_INTERVAL, 30000).
 -define(STARTUP_JITTER_DEFAULT, 5000).
+-define(STOP_TIMEOUT_MSEC, 5000).
 
 -record(rep_state, {
     rep_details,
@@ -110,7 +111,8 @@ stop(Pid) when is_pid(Pid) ->
     % won't return ok but exit the calling process, usually the scheduler, so
     % we guard against that. See:
     % www.erlang.org/doc/apps/stdlib/gen_server.html#stop/3
-    catch gen_server:stop(Pid, shutdown, infinity),
+    catch gen_server:stop(Pid, shutdown, ?STOP_TIMEOUT_MSEC),
+    exit(Pid, kill),
     receive
         {'DOWN', Ref, _, _, Reason} -> Reason
     end,
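
The behavior of the patched stop path can be tried in isolation with a small standalone module. This is only a sketch and not code from the repository: the module name `stop_timeout_demo`, the 200 ms timeout, and the deliberately blocked worker are all hypothetical stand-ins for a replication job stuck on a network request. It assumes the caller monitors the process before stopping it, as the `receive` on `'DOWN'` in the diff suggests.

```erlang
-module(stop_timeout_demo).
-behaviour(gen_server).

-export([run/0, stop/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% Hypothetical short timeout so the demo finishes quickly; the commit uses 5000.
-define(STOP_TIMEOUT_MSEC, 200).

%% Same shape as the patched stop/1: bounded stop, then an unconditional kill.
stop(Pid) when is_pid(Pid) ->
    Ref = erlang:monitor(process, Pid),
    %% gen_server:stop/3 exits the caller with `timeout` if the process does
    %% not terminate in time, hence the catch.
    catch gen_server:stop(Pid, shutdown, ?STOP_TIMEOUT_MSEC),
    %% Sending an exit signal to an already dead pid has no effect.
    exit(Pid, kill),
    receive
        {'DOWN', Ref, _, _, Reason} -> Reason
    end.

%% Stops one responsive worker and one blocked worker and returns both
%% exit reasons as seen by the monitor.
run() ->
    {ok, Responsive} = gen_server:start(?MODULE, idle, []),
    {ok, Blocked} = gen_server:start(?MODULE, stuck, []),
    {stop(Responsive), stop(Blocked)}.

init(idle) ->
    {ok, idle};
init(stuck) ->
    %% Queue a message that makes the worker block forever in handle_info/2.
    self() ! block,
    {ok, stuck}.

handle_call(_Request, _From, State) -> {reply, ok, State}.

handle_cast(_Msg, State) -> {noreply, State}.

handle_info(block, State) ->
    %% Stand-in for a job stuck waiting on a network request.
    timer:sleep(infinity),
    {noreply, State}.

terminate(_Reason, _State) -> ok.
```

Calling `stop_timeout_demo:run()` returns `{shutdown, killed}`: the responsive worker exits with the requested `shutdown` reason, while the blocked one never handles the stop request and is removed by the kill signal. Note that a killed process does not run its `terminate/2` callback, so any cleanup there is skipped for stuck jobs.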
