Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set NSS_SDB_USE_CACHE=no to avoid memory growth #1716

Merged
merged 1 commit into from
Sep 12, 2019

Conversation

jordansissel
Copy link
Contributor

@jordansissel jordansissel commented Sep 11, 2019

On affected systems[1], the readinessProbe costs 5mb of memory (file system cache) per invocation which is observed in some cases[2] to never be reclaimed. This is a bug in the kernel and/or kubernetes, but in the meantime, this env var can help prevent the readinessProbe from causing memory usage growth and eventually causing Kubernetes to evict pods.

[1] Certain versions of CentOS and GKE are known affected.
[2] It is observed in production that for pods without a strict memory limit, the container's file system cache (dentry) is not reclaimed appropriately, causing the kubelet to believe the pod is using far more memory than is actually being used by processes in the pod.

References: #1635

This PR is a proposal to work around a bug in the kernel/kubernetes. The reason I propose it is that this issue took 5 working days to investigate[3] and eventually find a workaround. I would spare others this same fate by having the workaround enabled by default ;)

[3] The investigation traveled the gradient from elasticsearch, to kubernetes, to the linux kernel, and all things in between. It was an interesting but ultimately unenjoyable adventure given I really didn't want to be spending a week digging into the kernel.

@jordansissel
Copy link
Contributor Author

We have this deployed by setting env vars in one production cluster (ECK 0.9.0):

        containers: 
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms5g -Xmx5g
          # Curl in the readinessProbe triggers a kernel memory accounting bug(feature?)
          # So we disable it that part of curl.
          - name: NSS_SDB_USE_CACHE
            value: "no"

The impact of this change is observed in the following charts:

Memory usage in a one-week period where I was engaged with this exciting issue. Note the massive memory growth (caused by readinessProbe/curl) and high plateau (far above the 'memory request'). The right-side of the chart shows the beginning of my test where I set the NSS_SDB_USE_CACHE=no env var.

image

If we zoom into a 24 hour period just before I enabled this ENV var, we can see that memory usage is quite stable:

image

@jordansissel jordansissel force-pushed the issue/disable-curl-cache branch from 708a864 to b1366d6 Compare September 11, 2019 22:28
@jordansissel
Copy link
Contributor Author

If there's 👍 on this proposed implementation, I can write tests. Let me know whatcha like :)

@sebgl
Copy link
Contributor

sebgl commented Sep 12, 2019

Awesome ❤️
We probably need to update unit tests accordingly - something the team can do if you're busy @jordansissel.

@jordansissel
Copy link
Contributor Author

need to update unit tests accordingly

Ahh, env vars need to be sorted (I see WithEnv sorts them, but the expected := ... in the test suite doesn't sort them.). I pushed a new commit which should pass this particular failure (does locally). Waiting on CI now.

On affected systems[1], the readinessProbe costs 5mb of memory per invocation which is observed in some cases[2] to never be reclaimed. This is a bug in the kernel and/or kubernetes, but in the meantime, this env var can help prevent the readinessProbe from causing memory usage growth and eventually causing Kubernetes to evict pods.

[1] Certain versions of CentOS and GKE are known affected.
[2] It is observed in production that for pods without a strict memory `limit`, the container's file system cache (dentry) is not reclaimed appropriately, causing the kubelet to believe the pod is using far more memory than is actually being used by processes in the pod.

Fix unit tests to sort env vars so that comparisons will succeed.

References: #1635
@jordansissel jordansissel force-pushed the issue/disable-curl-cache branch from 95a6ee9 to c25a794 Compare September 12, 2019 19:03
@@ -180,6 +180,13 @@ func TestBuildPodTemplateSpec(t *testing.T) {
},
}

// pods built with BuildPodTemplateSpec sort env vars, so our expected result must be sorted as well.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 My initial idea was to use go-spew (which also might make the error output below easier to read), but that will only sort map keys

Copy link
Contributor

@anyasabo anyasabo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding and including the issue links in the comments

@jordansissel
Copy link
Contributor Author

The elasticsearch-ci/docs check is stalled (or dead?) due to problems on elasticsearch-ci. The infra team is aware and working on the problem.

@jordansissel jordansissel merged commit d083077 into master Sep 12, 2019
@jordansissel
Copy link
Contributor Author

CI finally finished (<3 to our infra team who've been working on stabilizing Jenkins resources much of today)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants