Skip to content
This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Time all tasks #361

Merged
merged 7 commits into from
Dec 5, 2018
Merged

Time all tasks #361

merged 7 commits into from
Dec 5, 2018

Conversation

shemnon
Copy link
Contributor

@shemnon shemnon commented Dec 4, 2018

PR description

This is fairly high touch consisting of 3 things:

  • Moving to Prometheus's Summary for timers
    • Timing at .2, .5, .8, .9, .99, and 1.0 (1.0 actually gets max I believe)
  • Timing all abstract EthTasks
  • The vast bulk of the changes: plumbing the timing context everywhere we need it

A sample of the metrics can be seen here: https://gist.github.com/shemnon/13d1f57858f8b8f8ec0edf8cb4e3d0ab

I'm going to investigate alternatives to dragging the metric all across all those constructors.

This is fairly high touch consisting of 3 things:
* Moving to Prometheus's Summary for timers
  * Timing at .2, .5, .8, .9, .99, and 1.0 (1.0 actually gets max I believe)
* Timing all abstract EthTasks
* The bulk of the changes: plumbing the timing context everywhere we need it
@shemnon shemnon added the work in progress Work on this pull request is ongoing label Dec 4, 2018
Copy link
Contributor

@ajsutton ajsutton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general just a few small suggestions.

public void executeTaskTimed() {
final OperationTimer.TimingContext timingContext = taskTimer.startTimer();
executeTask();
taskTimeInSec = timingContext.stopTimer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use the try (final OperationTimer.TimingContext timingContext = taskTimer.startTimer()) or an explicit final block here to ensure the timer completes even if the task fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the final block because close doesn't return a time and we want to keep that for logging at higher levels.

count,
skip,
reverse,
labels -> () -> () -> 0.0D);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this a constant somewhere? It's a very cryptic line to have scattered around the code base so much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made one in NoOpMetricsSystem

@@ -17,7 +17,8 @@
RPC("rpc"),
JVM("jvm", false),
PROCESS("process", false),
BLOCKCHAIN("blockchain");
BLOCKCHAIN("blockchain"),
TASKS("tasks");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TASKS feels too generic as a category - is this more SYNCHRONIZER?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synchronizer is similarly opaque to me. How about EthScheduler?

new Observation(RPC, "request", null, asList("method", "0.99")),
new Observation(RPC, "request", null, asList("method", "1.0")),
new Observation(RPC, "request_sum", null, singletonList("method")),
new Observation(RPC, "request_count", null, singletonList("method")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For histograms we deliberately converted the _sum and _count metrics to labels instead of completely separate metrics. It comes out much nicer in the JSON-RPC that way. The logic is in convertHistogramSampleNamesToLabels and is probably very similar to what summaries would require.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I left the histogram branch in place in case we ever come back to it.

@shemnon shemnon mentioned this pull request Dec 4, 2018
* Change from TASKS to ETH_SCHEDULER
* Wrap timer read in a finally block
* different debug_metrics output for summary that is closer to what
  we had for histograms
* constant for lambda magic - 'labels -> () -> () -> 0'
@shemnon shemnon removed the work in progress Work on this pull request is ongoing label Dec 4, 2018
Copy link
Contributor

@ajsutton ajsutton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@shemnon shemnon merged commit 17a4e88 into PegaSysEng:master Dec 5, 2018
@shemnon shemnon deleted the metrics branch December 6, 2018 04:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants