Shutdown statsbeat after failure threshold is met #1127
not state.get_statsbeat_initial_success():
    # If ingestion threshold during statsbeat initialization is reached, return back code to shut it down
    if _statsbeat_failed_to_ingest():
        return -2
It would be nice to use some kind of constant instead of the raw value here. -2 is the shutdown signal, but looking at the code I have no clue what -1 means.
-1 is the exception signal for telemetry in general. -2 is the shutdown signal for the statsbeat exporter only. I agree a constant would be better, but that would probably require a refactor of all the return signals, which I would prefer to leave to a different PR.
Well, using these kinds of numbers instead of enumerators or constants is usually considered pretty bad practice in other languages, since it makes the code harder for other developers to understand and maintain. Maybe this is the way to go in Python; just my two cents.
Is -2 introduced in this PR? If so, how much work would it take to refactor?
@hectorhdzg @heyams
Created a new issue to track this refactor: #1128
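For illustration, the constant-based approach requested in this thread could look roughly like the sketch below. It assumes only the two codes described above (-1 for a general telemetry exception, -2 for statsbeat shutdown). The enum name, member names, and helper are hypothetical and are not taken from the opencensus codebase.

```python
from enum import IntEnum


class TransmitResult(IntEnum):
    """Hypothetical named return codes; not part of the actual library."""
    EXCEPTION = -1           # general telemetry exception signal
    STATSBEAT_SHUTDOWN = -2  # shutdown signal for the statsbeat exporter only


def should_shutdown(result):
    # IntEnum members compare equal to plain ints, so callers that still
    # return a raw -2 would keep working while the codes are migrated.
    return result == TransmitResult.STATSBEAT_SHUTDOWN
```

Because `IntEnum` values are still integers, a refactor along these lines could be introduced gradually without changing the values that existing comparisons depend on.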
@@ -143,6 +155,17 @@ def _transmit(self, envelopes):
        data = json.loads(text)
    except Exception:
        pass

    if self._is_stats_exporter() and \
            not state.get_statsbeat_shutdown() and \
I can see you check whether shutdown was called in several places. Is the exporter process for Statsbeat expected to keep running after shutdown?
This is for very specific race conditions in which multiple threads could be accessing the same piece of "check if we need to shutdown" logic. It also serves as a good sanity check to prevent any statsbeat logic from executing if the statsbeat exporter is already shut down.
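A minimal sketch of the kind of guard described here, assuming the shutdown flag is shared between threads and protected by a lock. The class, its attributes, and `shutdown_once` are hypothetical stand-ins; only the name `get_statsbeat_shutdown` comes from the diff, and the real `state` module may be implemented differently.

```python
import threading


class StatsbeatState:
    """Hypothetical stand-in for the shared statsbeat state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._shutdown = False
        self.teardown_runs = 0

    def get_statsbeat_shutdown(self):
        with self._lock:
            return self._shutdown

    def shutdown_once(self):
        """Run teardown at most once, even with concurrent callers."""
        with self._lock:
            if self._shutdown:
                return False  # another thread got here first
            self._shutdown = True
            self.teardown_runs += 1  # placeholder for real teardown work
            return True
```

Checking and setting the flag under the same lock is what keeps two threads from both deciding that they are the one responsible for shutting down.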
@lzchen you're tagging the wrong helen.
contrib/opencensus-ext-azure/opencensus/ext/azure/common/transport.py
@@ -71,6 +71,11 @@ def export_metrics(self, metrics):
    for batch in batched_envelopes:
        batch = self.apply_telemetry_processors(batch)
        result = self._transmit(batch)
        # If statsbeat exporter and received signal to shutdown
        if self._is_stats_exporter() and result == -2:
_statsbeat_failed_to_ingest above can return the counter and here just check if the counter is >= 3. -2 seems so random.
_statsbeat_failed_to_ingest is a private function used only within transport to handle the count as well as to determine whether the threshold is reached. Returning the result code back to the exporter is by design. I agree the codes are a bit random (-1, -2, etc.), but changing them can be part of a different PR. See my response here as well.
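The division of responsibility described here can be sketched as follows: the transport side owns the failure count and the threshold check, and the caller only sees a result code. The threshold of 3 is taken from the PR description; the function names without the leading underscore, the `transmit` stand-in, and the choice of 0 and -1 for the other outcomes are assumptions made for this example and do not mirror the real transport code.

```python
import threading

FAILURE_THRESHOLD = 3  # assumed from the "3 attempts" in the PR description
_count_lock = threading.Lock()
_failure_count = 0


def statsbeat_failed_to_ingest():
    """Record one failure; return True once the threshold is reached."""
    global _failure_count
    with _count_lock:
        _failure_count += 1
        return _failure_count >= FAILURE_THRESHOLD


def transmit(ok):
    """Simplified stand-in for the transport call (not the real API)."""
    if ok:
        return 0  # assumed success code for this sketch
    if statsbeat_failed_to_ingest():
        return -2  # shutdown signal for the statsbeat exporter
    return -1  # assumed code for a failure below the threshold
```

With this split, the exporter never needs to know the threshold; it only checks whether the returned code is the shutdown signal.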
LGTM
Following specs
Similar to Node.js, retry only occurs on successes (200), so shutdown occurs only when 3 attempts are reached.
@hectorhdzg @heyams