Clarify Record semantics#159
Conversation
stats/Record.md
Outdated
| may not take effect right away. No guarantees are necessarily provided about the order in | ||
| which Record calls are processed. | ||
|
|
||
| Record calls should never block. In case of overload (e.g. views cannot be updated fast enough), |
There was a problem hiding this comment.
Let's not start this complicated argument. I would prefer not to have this last sentence in the specs.
First sentence is great.
|
@bogdandrutu changed wording from should to may note that the statement about dropping data is logically entailed by the statement about reordering: it's a special case where some record calls are reordered until after the program has terminated :-) |
|
@Ramonza I think your comment refers specifically to overload: I think allowing to drop records during the high traffic may generate wrong results (e.g. your 99%tile will not be that accurate during that time, and arguably that is the moment when you need them the most if you offer any SLA). An option may be to do uniform sampling which may give you better results. I think we should focus on making the library fast enough before we start dropping events. Do we have an use-case where we need to drop events? @Ramonza I think dropping records that are recorded after the library is shutdown is perfectly fine. |
|
See the reference PR for Go. It's a choice between dropping events or blocking the caller of Record. Currently we block the caller. I would prefer to drop events. Are you saying we should rather block the caller? |
stats/Record.md
Outdated
| may not take effect right away. No guarantees are necessarily provided about the order in | ||
| which Record calls are processed. | ||
|
|
||
| In particular, in case of overload (e.g. views cannot be updated fast enough), Record may drop data. |
There was a problem hiding this comment.
Lets not have this sentence in the current PR and if needed open an issue to discuss sampling/dropping records policies.
|
one good thing we can try is in language like this differentiate stats from tracing. ex we guard stats against accidental sampling (ex we don't drop recording on overload)... blah 99% during load mess up stats.. This is different than tracing which is lossy by design (explicit sampling) doesn't have to happen here just seeding the thought |
|
ps I dont know if it should be a spec issue but it is annoyingly common
that people trying libraries use asynchronous exporting but quit without
closing. this leads to people thinking things are broke and the answer is
commonly shut down hook in nature. different concern I know but honestly
not sure to go with it.
…On Thu, 23 Aug 2018, 05:49 Bogdan Drutu, ***@***.***> wrote:
@Ramonza <https://github.com/ramonza> I think your comment refers
specifically to overload:
"In particular, in case of overload (e.g. views cannot be updated fast
enough), Record may drop data."
I think allowing to drop records during the high traffic may generate
wrong results (e.g. your 99%tile will not be that accurate during that
time, and arguably that is the moment when you need them the most if you
offer any SLA). An option may be to do uniform sampling which may give you
better results.
I think we should focus on making the library fast enough before we start
dropping events. Do we have an use-case where we need to drop events?
@Ramonza <https://github.com/ramonza> I think dropping records that are
recorded after the library is shutdown is perfectly fine.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#159 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD618RPn-8Buw5nLw72G4NSUROnctV2ks5uTdIGgaJpZM4WAuFN>
.
|
|
@adriancole agreed that something about having a way to shutdown is a good idea. At least mention that if implemented async then the implementation must offer a shutdown mechanism that application owner should use when shutdown the app. |
22fdb27 to
30c0a65
Compare
|
@bogdandrutu @adriancole updated with details about a Flush call. I thought that Flush would be better than Shutdown since it seems like the goal here is really to all the data has been written not to ensure that future data is not ever written (like a Close/Shutdown method would be expected to do). |
stats/Record.md
Outdated
| } | ||
| ``` | ||
|
|
||
| ## Flush |
There was a problem hiding this comment.
This semantic should be added for trace as well. You can just open an issue for that or do the change, up to you.
There was a problem hiding this comment.
Are you suggesting that the library should provide a top-level opencensus.Flush() that flushes everything?
If so, I agree. But I think this then needs to be documented somewhere else.
stats/Record.md
Outdated
|
|
||
| ## Flush | ||
|
|
||
| All libraries should provide a Flush function that, once it returns, guarantees that all |
There was a problem hiding this comment.
Also add a statement about the fact that this is not recommended to be called be applications (can add overhead) on general use and would be better to leave the library to schedule the flush calls if possible for performance reasons.
There was a problem hiding this comment.
I assume you mean that the library should install a shutdown hook or similar. In Go there is nothing like "atexit" or ShutdownHook - the application needs to call Flush explicitly before main returns.
In any case, IMHO a library should never install shutdown hooks, even if they are available in the language.
There was a problem hiding this comment.
No, I want to discourage people to call this randomly (I finished one request let's call flush) and to rely on the library that stats will be exported every X seconds normally.
In some specific cases this may be needed for example when run on AppEngine or Lambdas etc but not for normal applications running in containers/VMs.
There was a problem hiding this comment.
I don't mean people should call it randomly. I mean it should be called at the end of the program (at the end of the main function).
There was a problem hiding this comment.
This should be clearly documented. Because based on Google experience people are calling this in their application in random stages.
There was a problem hiding this comment.
The alternative which would prevent that would be to provide a Shutdown function that actually prevents any traces or stats from being written after it is called (would probably need to log errors if more traces or stats were recorded).
e049cfd to
e86dbd9
Compare
|
Flush will be addressed by #152 |
I'd like to propose that we clarify the semantics of record in the following way to (a) make it clear what happens if overloaded, and (b) allow us to implement a more scalable view updating mechanism by not guaranteeing ordering.