Allow for cgroups in MEMORY_SIZE detection #2345
Conversation
Based on running the two jobs at 7544 and 7545 on a host that was showing problems, it is now doing a better job of restricting itself. The load on the host is around 15-17 when running on two 8-core docker containers, and the RAM usage of each is about 2GB out of the 8GB they have allocated. This contrasts with a load of over 100 and all 8GB in use before this patch was applied.
(Branch updated from 16ff43f to 5df1afa)
Taking back to draft as
Signed-off-by: Stewart X Addison <sxa@redhat.com>
Updated to use a slightly messy stripping of the last 5 characters from the cgroup memory limit value. Final (hopefully!) testing of that change is underway.
[For historic reference, the previous version of this PR that didn't work well used a count of entries in
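To make the "stripping of the last 5 characters" above concrete, here is a minimal sketch of what that kind of rough conversion could look like; the file path and variable names are assumptions for illustration rather than the actual code in this PR:

```bash
#!/bin/bash
# Hypothetical illustration: trim the last five digits of the cgroup v1 memory
# limit, which is a cheap integer division by 100000. Names are assumptions.
LIMIT_FILE=/sys/fs/cgroup/memory/memory.limit_in_bytes
if [ -r "${LIMIT_FILE}" ]; then
  LIMIT_BYTES=$(cat "${LIMIT_FILE}")
  # On an unrestricted host this file holds a huge value (close to 2^63), so a
  # real implementation would still want to sanity-cap against physical RAM.
  MEMORY_SIZE=${LIMIT_BYTES:0:-5}   # drop the last 5 digits
  echo "Approximate memory limit: ${MEMORY_SIZE}"
fi
```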
@sxa Sorry, but what is this? Are you sure you'll need to do anything in terms of cgroups CPU quotas? As an initial shot I'd only factor in the cgroup (v1!) memory limit; tests will likely run better with it in place. It's probably what you are observing now anyway. If you really need CPU too, this is a bit of a can of worms, as it might get CPU-limited in a variety of ways. This will get even more fun once you add cgroups v2 hosts and docker containers into the mix.
I'm not sure of anything at the moment :-) I was just trying to find something that would stop the jobs failing due to being overloaded by detecting significantly more resources than the containers had, and this was the first option that seemed to work. This close to the JDK16 GA I'm keen to get something in that will help, even if we change it to something more suitable later. I fully admit that the quick look I had last night didn't give me a good answer on how to get a suitable figure, so I'm very much open to other options 😁 I'll give it a shot with just the memory limit today and see if that works. It's certainly a far more definite thing than the CPU values I'm picking up.
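For reference, the cgroup (v1) memory limit suggested above is straightforward to read; the sketch below also shows the cgroups v2 location for comparison and hints at why CPU limits are messier. This is illustrative only, not part of the change:

```bash
#!/bin/bash
# Read the container memory limit, preferring cgroup v1 and falling back to v2.
if [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
  MEM_LIMIT=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)   # cgroup v1, bytes
elif [ -r /sys/fs/cgroup/memory.max ]; then
  MEM_LIMIT=$(cat /sys/fs/cgroup/memory.max)                     # cgroup v2, bytes or "max"
fi
echo "Memory limit: ${MEM_LIMIT:-unknown}"

# CPU is harder: v1 can limit via cpu.cfs_quota_us/cpu.cfs_period_us, cpu.shares
# or cpuset.cpus, v2 uses cpu.max, and several of these can mean "unlimited".
```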
This patch seems OK.
Looks to be working - the container I'm testing with is sitting at less than its capacity with just the memory check :-)
Signed-off-by: Stewart X Addison <sxa@redhat.com>
Thanks @sxa!
This is a bit unpleasant, but if we're in a restricted docker container we need to restrict the number of CPUs and the amount of RAM selected, otherwise the scripts will mis-detect the physical machine's resources, overwhelm the container, and probably exhaust the container's RAM. While it won't affect other containers, it will likely cause lots of timeouts as the container tries to deal with all the process contention, resulting in potentially intermittent, hard-to-diagnose test failures. I'm not sure how many Linux systems won't have /sys/fs/cgroup, but I've put some detection in to cover the case where it doesn't exist (it should fall back to the same values as before).
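As a rough sketch of the behaviour described above (assuming MEMORY_SIZE is wanted in megabytes and that a CPU count would be derived the same way; this is not the exact code in the change):

```bash
#!/bin/bash
# Prefer the container's cgroup v1 limits; fall back to the host values
# (the previous behaviour) when the cgroup files are not present.
if [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
  MEMORY_SIZE=$(( $(cat /sys/fs/cgroup/memory/memory.limit_in_bytes) / 1048576 ))  # MB
else
  MEMORY_SIZE=$(free -m | awk '/^Mem:/ {print $2}')   # whole-machine value, as before
fi

QUOTA_FILE=/sys/fs/cgroup/cpu/cpu.cfs_quota_us
PERIOD_FILE=/sys/fs/cgroup/cpu/cpu.cfs_period_us
if [ -r "${QUOTA_FILE}" ] && [ "$(cat "${QUOTA_FILE}")" -gt 0 ]; then
  # Integer division; a real script would guard against 0 for fractional-CPU limits.
  CPU_COUNT=$(( $(cat "${QUOTA_FILE}") / $(cat "${PERIOD_FILE}") ))
else
  CPU_COUNT=$(nproc)                                   # whole-machine value, as before
fi

echo "MEMORY_SIZE=${MEMORY_SIZE} MB, CPU_COUNT=${CPU_COUNT}"
```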
This is an enhancement to #1427 and will hopefully resolve adoptium/infrastructure#2002 (Also Ref: General container issue #2138)
I'm going to run some more testing on this, but it seems to do the right thing in several of the checks I've done.
Signed-off-by: Stewart X Addison <sxa@redhat.com>