Disable xattrs if layer has no xattrs or opaque directories #1021

Kern-- · 2024-01-04T22:47:35Z

Issue #, if available:

Description of changes:
This is a rebase + update to #718

Original change

The original insight is that GetXAttr and ListXAttr are the most expensive FUSE operations, accounting for significant SOCI overhead in some use cases, but most layers don't actually contain any xattrs. If we can disable xattrs in these cases, then we can get better runtime performance with SOCI.

Since image layers are immutable, if the layer tarball doesn't contain any xattrs, then we can disable xattrs for the FUSE mount and the kernel will stop issuing the calls to the snapshotter.

Updates

Fix for opaque directories

The original insight is mostly correct, but it doesn't handle OveralyFS opaque directories. To delete a file in OverlayFS, you can either create a whiteout file in a high layer (.wh.$ORIGINAL_FILE_NAME) or you can implicitly delete all files in a directory by returning an opaque xattr (trusted.overlay.opaque=y). The image spec encodes opaque xattrs into the layer as a special file ($DIR/.wh..wh..opq) instead of directly putting the xattr into the tarball, and the snapshotter translates that back into an xattr at runtime. With the original change xattrs would be disabled so the opaque file wouldn't be converted back into an xattr which means that the files would be visible.

This change updates the logic for determining whether to disable xattrs to include checking for opaque directories

`--xattr-optimization` -> `--optimizations xattr`

Following a suggestion on the original PR, I renamed the flag to --optimizations so that we can experiment with optimizations in the future. I'm on the fence on whether there should be a concept of "experimental optimizations" and "stable optimizations". I can also imagine that this xattrs optimization eventually just becomes on-by-default and disappears from the CLI entirely. Thoughts on this are welcome.

Everything is now "Disable Xattrs"

In the original PR, the discussions were back and forth on whether we should use variations of "Has xattrs", "No xattrs", or "Disable Xattrs". Since now it also includes whether the layer has whiteouts, I opted to stick to "Disable Xattrs" everywhere since it's not directly related to whether xattrs exist in the layer, but whether xattrs should be enabled for OverlayFS. This also has the nice property that the default false ignores the optimization as pointed out in the original PR.

Testing performed:

make check && make test && make integration

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Signed-off-by: Scott Buckfelder <buckscot@amazon.com> Co-authored-by: Kern Walster <walster@amazon.com>

austinvazquez

Thanks for the detailed description and connecting the all the dots. Change LGTM. I am pretty aligned with the --optimizations approach but it would be nice if you could add docs/optimizations.md with a short writeup on the disable xattr optimization and the state of the optimization. i.e. experimental and disabled by default.

This change updates the xattr optimizaion by: 1) Properly handling opaque directories 2) Changing the CLI flag to `--optimizations xattr` 3) Changing the label to `disable-xattrs` Signed-off-by: Kern Walster <walster@amazon.com>

sondavidb

Current changes LGTM, really like the new structure to allow us to do more experimental optimizations in the future.

Seconding Austin's comment on the documentation so that users have clarity over what exactly these optimizations do.

I can also imagine that this xattrs optimization eventually just becomes on-by-default and disappears from the CLI entirely.

I think we should have discussions/guidelines on how we decide if features are experimental or not. To me, this change seems like it should be a default behavior, since I view it as a raw performance increase for the majority of users, and inconsequential for the others. I'd like to know if you had a particularly strong reason for making this experimental/opt-out by default, and from there assess how we want to evaluate future features.

For now though, we should definitely keep it as is, since it would justify the new Optimizations type.

cmd/soci/commands/create.go

soci/soci_index.go

sondavidb · 2024-03-21T00:29:31Z

Did some analysis on this and wanted to report my findings:

TL;DR is that this seems to deliver some very promising performance increases.

The images were tested by first starting up the SOCI snapshotter, waiting 15 seconds, pulling and running the container, then exiting 40 seconds after initially starting up the snapshotter, then waiting another 15 seconds before killing the process.

The Prometheus metrics pictures are slightly edited just to show the exact values at each time frame, instead of having to send 4 separate pictures. FlameGraphs are taken from the time window 20-40 seconds after starting the snapshotter.

The first image that I tested was an alpine image with the fio package, which makes many kernel calls. The layer itself only has two lazy-loaded layers. One had xattrs while the other didnt.

Prometheus metrics (first one before change, second one after change)

![image](https://github.com/awslabs/soci-snapshotter/assets/55555210/64456a6b-3bda-4f3d-bcb8-895969c41910)

Flamegraphs (first one before change, second one after change)

![image](https://github.com/awslabs/soci-snapshotter/assets/55555210/8fb994d8-a3c4-4ed2-8b7f-3a478fae33c3) ![image](https://github.com/awslabs/soci-snapshotter/assets/55555210/a8f099cf-f4d0-4086-93cc-8ca8c56d755c)

We can see that in the Prometheus metrics, total CPU usage was cut by over 33%. In the FlameGraphs, we can see the server loop uses also has over 33% less CPU usage.

The other test was run with a rabbitmq image. Same was applied. This one had three lazy-loaded layers, and none had xattrs.

Prometheus metrics (first one before change, second one after change)

![image](https://github.com/awslabs/soci-snapshotter/assets/55555210/cc7a9a72-5b77-4ce5-a605-e53f9f8adfe8)

Flamegraphs (first one before change, second one after change)

![image](https://github.com/awslabs/soci-snapshotter/assets/55555210/d2bb72c5-484a-4105-8b5e-f3026d048900) ![image](https://github.com/awslabs/soci-snapshotter/assets/55555210/b650fc9f-966a-4ef2-8b3f-7221d110daa5)

Here we see total CPU usage down by about 45% in the Flamegraphs, and nearly 50% in the Prometheus metrics.

sondavidb · 2024-03-21T21:32:47Z

One final note (for another PR, not this one) — I think that this should be default behavior. As I mentioned above, I don't see a use case where this degrades performance, but also would like to have this structure in place for future experimental optimizations we decide to implement.

For DisableXAttrs, we should probably add a flag/config value to enable/disable this behavior during runtime, as well as log whenever DisableXAttrs is enabled for each layer. This is to ensure we have visibility on whether or not this functionality is enabled.

Unsure if I should open a new PR or just ship this and add this as a followup, though.

added xattr flag when creating soci index

eb627c4

Signed-off-by: Scott Buckfelder <buckscot@amazon.com> Co-authored-by: Kern Walster <walster@amazon.com>

austinvazquez previously approved these changes Jan 5, 2024

View reviewed changes

Fix xattr optimization

de43387

This change updates the xattr optimizaion by: 1) Properly handling opaque directories 2) Changing the CLI flag to `--optimizations xattr` 3) Changing the label to `disable-xattrs` Signed-off-by: Kern Walster <walster@amazon.com>

Kern-- dismissed austinvazquez’s stale review via de43387 January 8, 2024 23:26

Kern-- force-pushed the disable-xattrs branch from 9b5c8bd to de43387 Compare January 8, 2024 23:26

Kern-- marked this pull request as ready for review January 8, 2024 23:28

Kern-- requested a review from a team as a code owner January 8, 2024 23:28

sondavidb reviewed Jan 10, 2024

View reviewed changes

cmd/soci/commands/create.go Show resolved Hide resolved

cmd/soci/commands/create.go Show resolved Hide resolved

turan18 reviewed Jan 11, 2024

View reviewed changes

soci/soci_index.go Show resolved Hide resolved

sondavidb approved these changes Mar 21, 2024

View reviewed changes

henry118 approved these changes Mar 22, 2024

View reviewed changes

sondavidb merged commit 715ac46 into awslabs:main Mar 22, 2024
4 checks passed

austinvazquez mentioned this pull request Mar 22, 2024

added xattr flag when creating soci index #718

Closed

sondavidb mentioned this pull request Mar 23, 2024

Disable xattrs by default #1133

Merged

sondavidb mentioned this pull request Apr 1, 2024

[FEATURE] Investigate GetXAttr performance #971

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disable xattrs if layer has no xattrs or opaque directories #1021

Disable xattrs if layer has no xattrs or opaque directories #1021

Kern-- commented Jan 4, 2024

austinvazquez left a comment

sondavidb left a comment

sondavidb commented Mar 21, 2024 •

edited

Loading

sondavidb commented Mar 21, 2024

Disable xattrs if layer has no xattrs or opaque directories #1021

Disable xattrs if layer has no xattrs or opaque directories #1021

Conversation

Kern-- commented Jan 4, 2024

Original change

Updates

Fix for opaque directories

--xattr-optimization -> --optimizations xattr

Everything is now "Disable Xattrs"

austinvazquez left a comment

Choose a reason for hiding this comment

sondavidb left a comment

Choose a reason for hiding this comment

sondavidb commented Mar 21, 2024 • edited Loading

sondavidb commented Mar 21, 2024

`--xattr-optimization` -> `--optimizations xattr`

sondavidb commented Mar 21, 2024 •

edited

Loading