Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix:Race condition on instance variable assignment #1154

Merged
merged 1 commit into from
Aug 31, 2020

Conversation

marcotc
Copy link
Member

@marcotc marcotc commented Aug 21, 2020

This PR fixes most recent CI timeouts, specially the ones seen for JRuby:

Failures:

  1) Datadog::Workers::Polling when included into a worker #perform by default 
     Failure/Error: raise StandardError, 'Wait time exhausted!' if attempts <= 0
     
     StandardError:
       Wait time exhausted!
     # ./spec/support/synchronization_helpers.rb:41:in `block in try_wait_until'
     # ./spec/support/synchronization_helpers.rb:36:in `try_wait_until'
     # ./spec/ddtrace/workers/polling_spec.rb:27:in `block in <main>'

These timeouts happen because probing worker.run_loop? can have the non-thread-safe side-effect of setting @run_loop = false:

def run_async?
  @run_async = false unless instance_variable_defined?(:@run_async)
  @run_async == true
end

If the check for instance_variable_defined?(:@run_async) is executed then the Ruby VM context switches to:

def perform_loop
  @run_loop = true

  loop do
    # ...
    return unless run_loop? || work_pending?
  end
end

which sets it to true: @run_loop = true.

Then context switches back and executes the assignment part of:

@run_async = false unless instance_variable_defined?(:@run_async)

@run_async will be set to false prematurely, effectively terminating the worker.

One alternative solution to the changes in this PR is to wrap #run_async? in a mutex, but this is not necessary, as the changes introduced here respect strict Happened-before semantics: we wait until instance_variable_defined?(:@run_async) returns true for us to ever evaluate @run_async, and assigning a value to @run_async is guaranteed to happen before instance_variable_defined?(:@run_async) returns true, so we now only read it when it's guaranteed to be available.
Also we stop assigning a value @run_async, which means this method does not have to worry about read/write race conditions that it could cause.

I've applied this fix to all instances of such pattern I could find in our code base.

@marcotc marcotc added the dev/testing Involves testing processes (e.g. RSpec) label Aug 21, 2020
@marcotc marcotc requested a review from a team August 21, 2020 21:29
@marcotc marcotc self-assigned this Aug 21, 2020
@marcotc
Copy link
Member Author

marcotc commented Aug 28, 2020

Pinging @DataDog/apm-ruby, as builds continue to fail (slowing productivity) due to reasons solved by this PR.

@marcotc marcotc merged commit 3bb7f74 into master Aug 31, 2020
@marcotc marcotc deleted the fix/concurrency-assignment branch August 31, 2020 16:05
@marcotc marcotc added this to the 0.40.0 milestone Sep 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev/testing Involves testing processes (e.g. RSpec)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants