Merge pull request #29060 from ehashman/swap-blog
1.22 feature blog for alpha swap support
k8s-ci-robot authored Aug 9, 2021
2 parents 9c7c238 + 39e39c0 commit 3e4fc78
Showing 1 changed file with 142 additions and 0 deletions.
142 changes: 142 additions & 0 deletions content/en/blog/_posts/2021-08-09-alpha-swap-support.md
---
layout: blog
title: 'New in Kubernetes v1.22: alpha support for using swap memory'
date: 2021-08-09
slug: run-nodes-with-swap-alpha
---

**Author:** Elana Hashman (Red Hat)

The 1.22 release introduced alpha support for configuring swap memory usage for
Kubernetes workloads on a per-node basis.

In prior releases, Kubernetes did not support the use of swap memory on Linux,
as it is difficult to provide guarantees and account for pod memory utilization
when swap is involved. As part of Kubernetes' earlier design, swap support was
considered out of scope, and a kubelet would by default fail to start if swap
was detected on a node.

However, there are a number of [use cases](https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#user-stories)
that would benefit from Kubernetes nodes supporting swap, including improved
node stability, better support for applications with high memory overhead but
smaller working sets, the use of memory-constrained devices, and memory
flexibility.

Hence, over the past two releases, [SIG Node](https://github.com/kubernetes/community/tree/master/sig-node#readme) has
been working to gather appropriate use cases and feedback, and propose a design
for adding swap support to nodes in a controlled, predictable manner so that
Kubernetes users can perform testing and provide data to continue building
cluster capabilities on top of swap. The alpha graduation of swap memory
support for nodes is our first milestone towards this goal!

## How does it work?

There are a number of possible ways that one could envision swap use on a node.
To keep the scope manageable for this initial implementation, when swap is
already provisioned and available on a node, [we have proposed](https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#proposal)
that the kubelet be configurable such that:

- It can start with swap on.
- It will direct the Container Runtime Interface to allocate zero swap memory
to Kubernetes workloads by default.
- You can configure the kubelet to specify swap utilization for the entire
node.

Swap configuration on a node is exposed to a cluster admin via the
[`memorySwap` field in the KubeletConfiguration](/docs/reference/config-api/kubelet-config.v1beta1/).
As a cluster administrator, you can specify the node's behaviour in the
presence of swap memory by setting `memorySwap.swapBehavior`.

This is possible through the addition of a `memory_swap_limit_in_bytes` field
to the container runtime interface (CRI). The kubelet's config will control how
much swap memory the kubelet instructs the container runtime to allocate to
each container via the CRI. The container runtime will then write the swap
settings to the container level cgroup.

## How do I use it?

On a node where swap memory is already provisioned, you can enable Kubernetes
use of swap by enabling the `NodeSwap` feature gate on the kubelet and setting
either the `failSwapOn` [configuration setting](/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration)
or the `--fail-swap-on` command line flag to false.
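
As a minimal sketch, those two settings could look like this in a kubelet configuration file. It assumes the node uses the `kubelet.config.k8s.io/v1beta1` configuration API and omits every other setting your file may already contain:

```yaml
# Minimal sketch of a kubelet configuration file; other settings omitted.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  NodeSwap: true    # alpha feature gate, disabled unless set explicitly
failSwapOn: false   # let the kubelet start on a node with swap enabled
```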

You can also optionally configure `memorySwap.swapBehavior` in order to
specify how a node will use swap memory. For example,

```yaml
memorySwap:
  swapBehavior: LimitedSwap
```

The available configuration options for `swapBehavior` are:

- `LimitedSwap` (default): Kubernetes workloads are limited in how much swap
they can use. Workloads on the node not managed by Kubernetes can still swap.
- `UnlimitedSwap`: Kubernetes workloads can use as much swap memory as they
request, up to the system limit.

If configuration for `memorySwap` is not specified and the feature gate is
enabled, by default the kubelet will apply the same behaviour as the
`LimitedSwap` setting.
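
To opt in to the other mode instead, a complete configuration might look like the sketch below. It again assumes the `kubelet.config.k8s.io/v1beta1` API and leaves out unrelated settings:

```yaml
# Sketch: a node whose Kubernetes workloads may use swap up to the system limit.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  NodeSwap: true
failSwapOn: false
memorySwap:
  swapBehavior: UnlimitedSwap   # omit this block to get LimitedSwap behaviour
```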

The behaviour of the `LimitedSwap` setting depends on whether the node is
running with v1 or v2 of control groups (also known as "cgroups"):

- **cgroups v1:** Kubernetes workloads can use any combination of memory and
swap, up to the pod's memory limit, if set.
- **cgroups v2:** Kubernetes workloads cannot use swap memory.
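
To make the two cases concrete, consider a hypothetical Pod with a memory limit. The Pod name, container name, and image below are placeholders chosen for illustration; only the `resources.limits.memory` field matters here:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: swap-demo                      # hypothetical name
spec:
  containers:
  - name: app                          # hypothetical container name
    image: registry.example/app:1.0    # placeholder image
    resources:
      limits:
        # With LimitedSwap on a cgroups v1 node, this container's memory and
        # swap usage combined is capped at 512Mi. On a cgroups v2 node, the
        # container cannot use swap at all.
        memory: 512Mi
```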

### Caveats

Having swap available on a system reduces predictability. Swap's performance is
worse than regular memory, sometimes by many orders of magnitude, which can
cause unexpected performance regressions. Furthermore, swap changes a system's
behaviour under memory pressure, and applications cannot directly control what
portions of their memory usage are swapped out. Since enabling swap permits
greater memory usage for workloads in Kubernetes that cannot be predictably
accounted for, it also increases the risk of noisy neighbours and unexpected
packing configurations, as the scheduler cannot account for swap memory usage.

The performance of a node with swap memory enabled depends on the underlying
physical storage. When swap memory is in use, performance will be significantly
worse in an environment constrained in I/O operations per second (IOPS), such
as a cloud VM with I/O throttling, than on faster storage media like
solid-state drives or NVMe.

Hence, we do not recommend the use of swap for certain performance-constrained
workloads or environments. Cluster administrators and developers should
benchmark their nodes and applications before using swap in production
scenarios, and [we need your help](#how-do-i-get-involved) with that!

## Looking ahead

The Kubernetes 1.22 release introduces alpha support for swap memory on nodes,
and we will continue to work towards beta graduation in the 1.23 release. This
will include:

* Adding support for controlling swap consumption at the Pod level via cgroups.
    * This will include the ability to set a system-reserved quantity of swap
      from what kubelet detects on the host.
* Determining a set of metrics for node QoS in order to evaluate the
  performance and stability of nodes with and without swap enabled.
* Collecting feedback from test use cases.
    * We will consider introducing new configuration modes for swap, such as a
      node-wide swap limit for workloads.

## How can I learn more?

You can review the current [documentation](https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory)
on the Kubernetes website.

For more information, and to assist with testing and provide feedback, please
see [KEP-2400](https://github.com/kubernetes/enhancements/issues/2400) and its
[design proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md).

## How do I get involved?

Your feedback is always welcome! SIG Node [meets regularly](https://github.com/kubernetes/community/tree/master/sig-node#meetings)
and [can be reached](https://github.com/kubernetes/community/tree/master/sig-node#contact)
via [Slack](https://slack.k8s.io/) (channel **#sig-node**), or the SIG's
[mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node).
Feel free to reach out to me, Elana Hashman (**@ehashman** on Slack and GitHub)
if you'd like to help.
