Conversation

dashpole
Collaborator

@dashpole dashpole commented Oct 23, 2017

This is my take on implementing #1247.
This PR does the following:

  1. Adds an UpdateStats field to v2 request options. Setting this to true causes cAdvisor to update stats for that container before returning.
  2. Adds an OnDemandHousekeeping function to the containerData object, which allows housekeeping to be triggered before the interval.
  3. Removes the check that skipped reporting instantaneous CPU usage when the time delta between samples was too small. That check could result in missing instantaneous CPU usage when OnDemandHousekeeping is triggered right after a periodic collection.

There is still only a single thread which calls housekeepingTick().
The UpdateStats field is part of the v2 API in cAdvisor, but will be used only for node-level metrics in Kubernetes. See kubernetes/kubernetes@master...dashpole:on_demand_metrics for how this would be enabled in Kubernetes.

@dashpole
Collaborator Author

cc @derekwaynecarr

@@ -234,6 +234,8 @@ type RequestOptions struct {
Count int `json:"count"`
// Whether to include stats for child subcontainers.
Recursive bool `json:"recursive"`
// Whether to update stats before returning
UpdateStats bool `json:"update_stats"`
Contributor

I wonder whether it would be more useful to specify this as MaxAge, and update the stats if they are older. 0 means always update, and something else (-1?) means never update.

Collaborator Author

I like that; it's clearer. UpdateStats seemed vague, but I couldn't think of anything better. I have no plans to do anything other than trigger an update, mainly for responding when memcg thresholds are crossed. Do you think MaxAge is still worth doing if we won't use it in k8s?

Contributor

Yeah, I think it's still preferable. It's definitely useful outside of k8s, but even for the k8s usecase, there is probably some max age you could set for the memcg trigger (e.g. 10ms). It means that if a burst of requests came in simultaneously (for whatever reason), they can reuse the cached results.

Collaborator Author

Ok, I changed it to MaxAgeMS *int32, although I was wondering if just using time.Duration directly would be better?

for {
houseKeepingTimer := time.NewTimer(c.nextHousekeepingInterval())
Contributor

nit: create the timer outside the loop, and use Reset here.

Collaborator Author

done

// notify the calling function that housekeeping has completed
close(finishedChan)
case <-houseKeepingTimer.C:
c.housekeepingTick(longHousekeeping)
Contributor

drain the finishedChan after this step, i.e.

```go
drain:
	for {
		select {
		case finishedChan := <-c.onDemandChan:
			close(finishedChan)
		default:
			// a bare break here would only exit the select, not the loop
			break drain
		}
	}
```

Collaborator Author

done, although I don't think we want a for loop around it.

Contributor

why not? If there are multiple routines waiting for new stats and we just computed new stats, we should unblock them?

Collaborator Author

done

@zSys-bot

zSys-bot commented Nov 7, 2017

Build finished.


@dashpole
Collaborator Author

FYI, I plan to cut cadvisor v0.28.1, which will correspond to k8s v1.9.0 this Friday, 11/17.


@@ -236,6 +236,9 @@ type RequestOptions struct {
Count int `json:"count"`
// Whether to include stats for child subcontainers.
Recursive bool `json:"recursive"`
// Update stats if they are older than MaxAge
// nil indicates no update, and 0 will always trigger an update.
MaxAge *time.Duration `json:"max_age"`
Collaborator Author

Duration is just a wrapper around int64, so this should be safe for the API.
https://golang.org/src/time/time.go?s=21095:21114#L610

@dashpole
Collaborator Author

Unit testing is complete. I had to plumb through a clock. It should be fine to use apimachinery, right?

}
waitForHousekeeping.Done()
}
}
Contributor

Correct me if I'm wrong, but I think you can get away with not mocking this:

```go
onDemandCache := make([]chan struct{}, 0, numConcurrentCalls)
for i := 0; i < numConcurrentCalls; i++ {
	go func() {
		c.OnDemandHousekeeping(0 * time.Second)
		waitForHousekeeping.Done()
	}()
	// Wait for work to be queued
	onDemandCache = append(onDemandCache, <-c.onDemandChan)
}
// Requeue work:
for _, ch := range onDemandCache {
	c.onDemandChan <- ch
}
```

Collaborator Author

I can switch to that.

Collaborator Author

done

@dashpole
Collaborator Author

I tested this in kubernetes/kubernetes and verified that it passes the summary API test as well

@tallclair
Contributor

Thanks for the tests. We've tried to avoid dependencies on Kubernetes to avoid the circular dependency, but I think it's ok in this case since it's a small utility package and it's only used for testing.


@dashpole
Collaborator Author

Ok, it should be ready

waitForHousekeeping.Done()
}()
// Wait for work to be queued
onDemandCache <- <-cd.onDemandChan
Contributor

nit: I'd prefer a slice for the cache. It makes the requeue more straightforward.

Collaborator Author

Yeah, but the double arrow looks really cool. Done


Contributor

@tallclair tallclair left a comment

/lgtm

thanks!

@dashpole dashpole merged commit 17dcf1c into google:master Nov 20, 2017
@dashpole dashpole deleted the on_demand_metrics branch November 20, 2017 23:06
@derekwaynecarr
Contributor

@dashpole --- this is really awesome.

@derekwaynecarr
Contributor

fyi @sjenning xmas came early!
