Resque workers start hanging with activated ddtrace

We are currently rolling out Datadog across our applications. However, we have run into a fairly severe issue that we believe we have tracked down to the `ddtrace` gem.

### Observed behaviour

Resque workers eventually end up deadlocked: the worker process is alive, but the job it is currently processing never finishes.
Subjectively, this happens sooner for workers that process lots of jobs (their queue is always full) than for workers with less to do.

`strace` tells us that the deadlocked processes are waiting on a user-space mutex. One example:

```
$ strace -p 8806
Process 8806 attached - interrupt to quit
futex(0x7fc970d543c0, FUTEX_WAIT_PRIVATE, 2, NULL
```
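The `strace` output only shows that the process is blocked on a futex, not which Ruby code is waiting. A minimal sketch of how one could get Ruby-level backtraces from a stuck worker is shown below. The helper name `dump_thread_backtraces` and the choice of the `TTIN` signal are assumptions for illustration, not part of our setup or of `ddtrace`. If the process is stuck inside native code, the Ruby-level handler may never get to run, and a native debugger would be needed instead.

```ruby
# Hypothetical diagnostic helper (not part of ddtrace or resque).
# Writes the status and backtrace of every live thread to +io+.
def dump_thread_backtraces(io = $stderr)
  Thread.list.each do |thread|
    io.puts "Thread #{thread.object_id} status=#{thread.status.inspect}"
    # Thread#backtrace returns nil for dead threads.
    (thread.backtrace || ['<no backtrace available>']).each do |frame|
      io.puts "    #{frame}"
    end
  end
  io.flush
end

# Assumption: TTIN is not used for anything else in this process.
# Only plain IO is used in the handler, since mutex-based code
# (e.g. Logger) is not allowed inside a trap context.
Signal.trap('TTIN') { dump_thread_backtraces }
```

With this loaded in the worker, `kill -TTIN <pid>` would print the backtraces to the worker's stderr, provided the Ruby VM can still service signals.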

When we remove the `ddtrace` gem and its configuration from our project, we don't get any deadlocking resque workers.

So far we have observed this behaviour **only on Resque workers**. Our web servers have not had any timeouts whatsoever.

### Environment

These are the versions of a few gems in that project:
```
ddtrace (0.12.1)
rails (3.2.22.5)
redis (3.3.3)
resque (1.27.4)

ruby 2.2.4
```

These are the specs for the application where we observed this issue. I will try to reproduce it in another environment with more recent Ruby/Rails versions.

Edit: We have since reproduced the error in two more environments:

```
ddtrace (0.12.1)
ruby 2.2.4
rails (4.2.7.1)
redis (3.3.5)
resque (1.27.4)
```
```
ddtrace (0.13.0)
ruby 2.5.0
rails (5.1.6)
redis (4.0.1)
resque (1.27.4)
```

This is our datadog initializer:

```ruby
yml_config = YAML.safe_load(File.read("#{Rails.root}/config/datadog.yml"))
                 .fetch(Rails.env)
                 .symbolize_keys

service = ->(name) { "#{yml_config[:service_name]}-#{name}" }

Datadog.configure do |c|
  c.tracer enabled: yml_config[:enabled]
  c.use :rails, service_name: yml_config[:service_name]
  c.use :graphql, service_name: service.call('graphql'),
                  schemas: [KaeuferportalSchema]
  c.use :http, service_name: service.call('external')
  c.use :redis, service_name: service.call('redis')

  # Note: The error also occurs with the line below being deleted
  c.use :resque, service_name: service.call('resque'), workers: [ApplicationJob]
end
```

### Bisecting

We updated `ddtrace` from `0.12.0` to `0.12.1` when we encountered the issue, so both versions are affected for us.

Before the problems started, we already had `ddtrace` running on the affected application, but in version `0.11.4` and without explicitly activating any integration (so only Rails and whatever it loads automatically were active). In that state we did **not have any problems**.

Thanks for your support. Let me know if you need any further information.
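Since the working setup differed from the failing one in both the gem version and the set of activated integrations, one way to narrow this down further would be to toggle integrations individually. The following is only a sketch: the environment variable `DD_BISECT_INTEGRATIONS`, the constant `KNOWN_INTEGRATIONS`, and the helper `enabled_integrations` are hypothetical names, not `ddtrace` settings.

```ruby
# Hypothetical bisecting helper; the integration names are the ones
# from the initializer above.
KNOWN_INTEGRATIONS = %i[rails graphql http redis resque].freeze

# Returns the integrations to activate. DD_BISECT_INTEGRATIONS is an
# assumed, comma-separated env var; when it is unset or blank, all
# known integrations stay active. Unknown names are ignored.
def enabled_integrations(env = ENV)
  raw = env['DD_BISECT_INTEGRATIONS'].to_s.strip
  return KNOWN_INTEGRATIONS if raw.empty?

  requested = raw.split(',').map { |name| name.strip.to_sym }
  KNOWN_INTEGRATIONS & requested
end

# Possible use inside the initializer (left as comments here):
#
#   active = enabled_integrations
#   Datadog.configure do |c|
#     c.use :redis, service_name: service.call('redis') if active.include?(:redis)
#     # ... same guard for the other integrations
#   end
```

Starting a worker with, for example, `DD_BISECT_INTEGRATIONS=rails,redis` would then activate only those two integrations.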

Resque workers start hanging with activated ddtrace #466

NobodysNightmare opened on Jun 18, 2018
