Skip to content
This repository was archived by the owner on Oct 3, 2023. It is now read-only.

Clarify Record semantics#159

Merged
semistrict merged 5 commits intocensus-instrumentation:masterfrom
semistrict:clarify-record-semantics
Sep 7, 2018
Merged

Clarify Record semantics#159
semistrict merged 5 commits intocensus-instrumentation:masterfrom
semistrict:clarify-record-semantics

Conversation

@semistrict
Copy link
Copy Markdown
Contributor

I'd like to propose that we clarify the semantics of record in the following way to (a) make it clear what happens if overloaded, and (b) allow us to implement a more scalable view updating mechanism by not guaranteeing ordering.

stats/Record.md Outdated
may not take effect right away. No guarantees are necessarily provided about the order in
which Record calls are processed.

Record calls should never block. In case of overload (e.g. views cannot be updated fast enough),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not start this complicated argument. I would prefer not to have this last sentence in the specs.

First sentence is great.

@semistrict
Copy link
Copy Markdown
Contributor Author

@bogdandrutu changed wording from should to may

note that the statement about dropping data is logically entailed by the statement about reordering: it's a special case where some record calls are reordered until after the program has terminated :-)

@bogdandrutu
Copy link
Copy Markdown
Contributor

@Ramonza I think your comment refers specifically to overload:
"In particular, in case of overload (e.g. views cannot be updated fast enough), Record may drop data."

I think allowing to drop records during the high traffic may generate wrong results (e.g. your 99%tile will not be that accurate during that time, and arguably that is the moment when you need them the most if you offer any SLA). An option may be to do uniform sampling which may give you better results.

I think we should focus on making the library fast enough before we start dropping events. Do we have an use-case where we need to drop events?

@Ramonza I think dropping records that are recorded after the library is shutdown is perfectly fine.

@semistrict
Copy link
Copy Markdown
Contributor Author

See the reference PR for Go. It's a choice between dropping events or blocking the caller of Record. Currently we block the caller. I would prefer to drop events. Are you saying we should rather block the caller?

stats/Record.md Outdated
may not take effect right away. No guarantees are necessarily provided about the order in
which Record calls are processed.

In particular, in case of overload (e.g. views cannot be updated fast enough), Record may drop data.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not have this sentence in the current PR and if needed open an issue to discuss sampling/dropping records policies.

@codefromthecrypt
Copy link
Copy Markdown

one good thing we can try is in language like this differentiate stats from tracing. ex

we guard stats against accidental sampling (ex we don't drop recording on overload)... blah 99% during load mess up stats.. This is different than tracing which is lossy by design (explicit sampling)

doesn't have to happen here just seeding the thought

@codefromthecrypt
Copy link
Copy Markdown

codefromthecrypt commented Aug 25, 2018 via email

@bogdandrutu
Copy link
Copy Markdown
Contributor

@adriancole agreed that something about having a way to shutdown is a good idea. At least mention that if implemented async then the implementation must offer a shutdown mechanism that application owner should use when shutdown the app.

@semistrict
Copy link
Copy Markdown
Contributor Author

@bogdandrutu @adriancole updated with details about a Flush call. I thought that Flush would be better than Shutdown since it seems like the goal here is really to all the data has been written not to ensure that future data is not ever written (like a Close/Shutdown method would be expected to do).

stats/Record.md Outdated
}
```

## Flush
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This semantic should be added for trace as well. You can just open an issue for that or do the change, up to you.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that the library should provide a top-level opencensus.Flush() that flushes everything?

If so, I agree. But I think this then needs to be documented somewhere else.

stats/Record.md Outdated

## Flush

All libraries should provide a Flush function that, once it returns, guarantees that all
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add a statement about the fact that this is not recommended to be called be applications (can add overhead) on general use and would be better to leave the library to schedule the flush calls if possible for performance reasons.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean that the library should install a shutdown hook or similar. In Go there is nothing like "atexit" or ShutdownHook - the application needs to call Flush explicitly before main returns.

In any case, IMHO a library should never install shutdown hooks, even if they are available in the language.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I want to discourage people to call this randomly (I finished one request let's call flush) and to rely on the library that stats will be exported every X seconds normally.

In some specific cases this may be needed for example when run on AppEngine or Lambdas etc but not for normal applications running in containers/VMs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean people should call it randomly. I mean it should be called at the end of the program (at the end of the main function).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be clearly documented. Because based on Google experience people are calling this in their application in random stages.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative which would prevent that would be to provide a Shutdown function that actually prevents any traces or stats from being written after it is called (would probably need to log errors if more traces or stats were recorded).

@semistrict semistrict force-pushed the clarify-record-semantics branch from e049cfd to e86dbd9 Compare September 7, 2018 05:23
@semistrict
Copy link
Copy Markdown
Contributor Author

Flush will be addressed by #152

@semistrict semistrict merged commit 8e1be27 into census-instrumentation:master Sep 7, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants