Clarify Record semantics by semistrict · Pull Request #159 · census-instrumentation/opencensus-specs

semistrict · 2018-08-16T23:13:38Z

I'd like to propose that we clarify the semantics of record in the following way to (a) make it clear what happens if overloaded, and (b) allow us to implement a more scalable view updating mechanism by not guaranteeing ordering.

bogdandrutu · 2018-08-22T18:50:54Z

stats/Record.md

+may not take effect right away. No guarantees are necessarily provided about the order in
+which Record calls are processed.
+
+Record calls should never block. In case of overload (e.g. views cannot be updated fast enough),


Let's not start this complicated argument. I would prefer not to have this last sentence in the specs.

First sentence is great.

semistrict · 2018-08-22T21:27:08Z

@bogdandrutu changed wording from should to may

note that the statement about dropping data is logically entailed by the statement about reordering: it's a special case where some record calls are reordered until after the program has terminated :-)

bogdandrutu · 2018-08-22T21:49:58Z

@Ramonza I think your comment refers specifically to overload:
"In particular, in case of overload (e.g. views cannot be updated fast enough), Record may drop data."

I think allowing to drop records during the high traffic may generate wrong results (e.g. your 99%tile will not be that accurate during that time, and arguably that is the moment when you need them the most if you offer any SLA). An option may be to do uniform sampling which may give you better results.

I think we should focus on making the library fast enough before we start dropping events. Do we have an use-case where we need to drop events?

@Ramonza I think dropping records that are recorded after the library is shutdown is perfectly fine.

semistrict · 2018-08-22T21:52:18Z

See the reference PR for Go. It's a choice between dropping events or blocking the caller of Record. Currently we block the caller. I would prefer to drop events. Are you saying we should rather block the caller?

bogdandrutu · 2018-08-24T20:36:26Z

stats/Record.md

+may not take effect right away. No guarantees are necessarily provided about the order in
+which Record calls are processed.
+
+In particular, in case of overload (e.g. views cannot be updated fast enough), Record may drop data.


Lets not have this sentence in the current PR and if needed open an issue to discuss sampling/dropping records policies.

codefromthecrypt · 2018-08-24T23:46:52Z

one good thing we can try is in language like this differentiate stats from tracing. ex

we guard stats against accidental sampling (ex we don't drop recording on overload)... blah 99% during load mess up stats.. This is different than tracing which is lossy by design (explicit sampling)

doesn't have to happen here just seeding the thought

codefromthecrypt · 2018-08-25T03:13:50Z

ps I dont know if it should be a spec issue but it is annoyingly common that people trying libraries use asynchronous exporting but quit without closing. this leads to people thinking things are broke and the answer is commonly shut down hook in nature. different concern I know but honestly not sure to go with it.

…

On Thu, 23 Aug 2018, 05:49 Bogdan Drutu, ***@***.***> wrote: @Ramonza <https://github.com/ramonza> I think your comment refers specifically to overload: "In particular, in case of overload (e.g. views cannot be updated fast enough), Record may drop data." I think allowing to drop records during the high traffic may generate wrong results (e.g. your 99%tile will not be that accurate during that time, and arguably that is the moment when you need them the most if you offer any SLA). An option may be to do uniform sampling which may give you better results. I think we should focus on making the library fast enough before we start dropping events. Do we have an use-case where we need to drop events? @Ramonza <https://github.com/ramonza> I think dropping records that are recorded after the library is shutdown is perfectly fine. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#159 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAD618RPn-8Buw5nLw72G4NSUROnctV2ks5uTdIGgaJpZM4WAuFN> .

bogdandrutu · 2018-08-25T15:41:35Z

@adriancole agreed that something about having a way to shutdown is a good idea. At least mention that if implemented async then the implementation must offer a shutdown mechanism that application owner should use when shutdown the app.

semistrict · 2018-08-28T09:37:18Z

@bogdandrutu @adriancole updated with details about a Flush call. I thought that Flush would be better than Shutdown since it seems like the goal here is really to all the data has been written not to ensure that future data is not ever written (like a Close/Shutdown method would be expected to do).

bogdandrutu · 2018-08-29T18:57:48Z

stats/Record.md

 }
 ```
+
+## Flush


This semantic should be added for trace as well. You can just open an issue for that or do the change, up to you.

Are you suggesting that the library should provide a top-level opencensus.Flush() that flushes everything?

If so, I agree. But I think this then needs to be documented somewhere else.

bogdandrutu · 2018-08-29T18:59:35Z

stats/Record.md

+
+## Flush
+
+All libraries should provide a Flush function that, once it returns, guarantees that all


Also add a statement about the fact that this is not recommended to be called be applications (can add overhead) on general use and would be better to leave the library to schedule the flush calls if possible for performance reasons.

I assume you mean that the library should install a shutdown hook or similar. In Go there is nothing like "atexit" or ShutdownHook - the application needs to call Flush explicitly before main returns.

In any case, IMHO a library should never install shutdown hooks, even if they are available in the language.

No, I want to discourage people to call this randomly (I finished one request let's call flush) and to rely on the library that stats will be exported every X seconds normally.

In some specific cases this may be needed for example when run on AppEngine or Lambdas etc but not for normal applications running in containers/VMs.

I don't mean people should call it randomly. I mean it should be called at the end of the program (at the end of the main function).

This should be clearly documented. Because based on Google experience people are calling this in their application in random stages.

The alternative which would prevent that would be to provide a Shutdown function that actually prevents any traces or stats from being written after it is called (would probably need to log errors if more traces or stats were recorded).

semistrict · 2018-09-07T05:24:46Z

Flush will be addressed by #152

semistrict requested a review from bogdandrutu August 16, 2018 23:13

semistrict mentioned this pull request Aug 22, 2018

Record should not block if the work queue is full census-instrumentation/opencensus-go#874

Closed

bogdandrutu reviewed Aug 22, 2018

View reviewed changes

bogdandrutu approved these changes Aug 24, 2018

View reviewed changes

bogdandrutu mentioned this pull request Aug 25, 2018

Stats: flush data to exporters #152

Open

semistrict force-pushed the clarify-record-semantics branch from 22fdb27 to 30c0a65 Compare August 28, 2018 09:34

bogdandrutu approved these changes Aug 29, 2018

View reviewed changes

Ramon Nogueira added 4 commits September 7, 2018 07:23

Clarify Record semantics

a25316d

reword

6205e4d

Remove note about dropping data, add Flush

63e27c8

Remove Flush

e86dbd9

semistrict force-pushed the clarify-record-semantics branch from e049cfd to e86dbd9 Compare September 7, 2018 05:23

Merge branch 'master' into clarify-record-semantics

29607fd

semistrict merged commit 8e1be27 into census-instrumentation:master Sep 7, 2018

semistrict mentioned this pull request Sep 21, 2018

stats/view: implement Flush census-instrumentation/opencensus-go#922

Open


		## Flush

		All libraries should provide a Flush function that, once it returns, guarantees that all

Conversation

semistrict commented Aug 16, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

semistrict commented Aug 22, 2018

Uh oh!

bogdandrutu commented Aug 22, 2018

Uh oh!

semistrict commented Aug 22, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codefromthecrypt commented Aug 24, 2018

Uh oh!

codefromthecrypt commented Aug 25, 2018 via email

Uh oh!

bogdandrutu commented Aug 25, 2018

Uh oh!

semistrict commented Aug 28, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

semistrict commented Sep 7, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants