Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure traces sent before process ends #195

Merged
merged 10 commits into from
Oct 2, 2017
Merged

Conversation

ryplo
Copy link
Contributor

@ryplo ryplo commented Sep 22, 2017

Overview

Fix for a problem found when implementing the Resque integration - the Resque process would end before the thread could finish sending the traces. This PR is to implement a shutdown method similar to that of the Python client: https://github.com/DataDog/dd-trace-py/blob/master/ddtrace/writer.py#L104-L121

Major changes

To ensure trace sending is completed before a process exits, call the shutdown method after the span should be finished.

Datadog.tracer.trace('web.request', service: 'my-web-site') do |span|
  do_something
end

Datadog.tracer.shutdown!

You can pass it in a tracer or the method will just use the default Datadog.tracer:
def self.shutdown(tracer = Datadog.tracer)

User impact

Trace sending completion is ensured for short-running processes

@palazzem palazzem added this to the 0.9.0 milestone Sep 22, 2017
@palazzem palazzem added the core Involves Datadog core libraries label Sep 22, 2017
@ryplo ryplo added the do-not-merge/WIP Not ready for merge label Sep 22, 2017
@@ -18,6 +18,8 @@ module Datadog
# of these function calls and sub-requests would be encapsulated within a single trace.
# rubocop:disable Metrics/ClassLength
class Tracer
DEFAULT_TIMEOUT = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we move this one in the AsyncTransport class?

@ryplo ryplo force-pushed the rachel/implement-shutdown branch 2 times, most recently from 045040d to 4cb2199 Compare September 25, 2017 21:31
@palazzem palazzem removed the do-not-merge/WIP Not ready for merge label Sep 26, 2017
Copy link
Contributor

@palazzem palazzem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is good to me! Good job! The only thing we're missing is just a couple of comments in our new methods to describe better the API. I'd say also to add an example in the tracer.shutdown! so that people know how to use it:

# script.rb

Datadog.tracer.trace('operation_name', service='rake_tasks') do |span|
  span.set_tag('task.name', 'script')
end

Datadog.tracer.shutdown!

@@ -57,6 +57,11 @@ def self.debug_logging
log.level == Logger::DEBUG
end

def shutdown!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'd say to write some documentation about this method:

# Shorthand that calls the `shutdown!` method of a registered worker.
# It's useful to ensure that the Trace Buffer is properly flushed before 
# shutting down the application.

@@ -87,6 +92,20 @@ def stop
@run = false
end

def shutdown!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should write a couple of lines to document what this method is doing. I'll let you describe this (i.e. "this method closes all available queues and wait till the trace buffer and service are flushed").

palazzem
palazzem previously approved these changes Sep 26, 2017
Copy link
Contributor

@palazzem palazzem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good to me! great job! let's apply this improvement to our Resque integration

@ryplo ryplo changed the base branch from master to rachel/services-flush September 27, 2017 19:29
@ryplo ryplo changed the base branch from rachel/services-flush to master September 27, 2017 21:28
@ryplo ryplo force-pushed the rachel/implement-shutdown branch 3 times, most recently from f04f617 to 770366d Compare September 28, 2017 23:25
@ryplo ryplo added the do-not-merge/WIP Not ready for merge label Sep 29, 2017
Copy link
Contributor

@palazzem palazzem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question about the shutdown!; if I'm not wrong, we may simplify it a bit removing all the timeout / sleep block

@shutting_down = true
@trace_buffer.close
@service_buffer.close
sleep(0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need any sleep at all? if we close both buffers and we stop the worker, technically we should just join no? looking here:

while @run
callback_traces
callback_services
sleep(@flush_interval) if @run
end

both callback_traces (and callback_services):

  • empties the buffer
  • sends the buffers

So if we join we simply wait that the flushing thread exits the while, so for sure buffers are empty AND sent. The join has a timeout of 5 seconds, that is x5 times our flushing interval so it's reasonable high to ensure that traces are flushed.

Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryplo since this approach works for Resque, we can merge the PR and make a new one to improve that code block.

@palazzem palazzem removed the do-not-merge/WIP Not ready for merge label Oct 2, 2017
Copy link
Contributor

@palazzem palazzem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to merge the PR! Thanks for porting the shutdown! method!

@shutting_down = true
@trace_buffer.close
@service_buffer.close
sleep(0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryplo since this approach works for Resque, we can merge the PR and make a new one to improve that code block.

@palazzem palazzem merged commit 1a610f7 into master Oct 2, 2017
@palazzem palazzem deleted the rachel/implement-shutdown branch October 2, 2017 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants