
@spredolac

Motivation
If we already have a huge, empty page, it may be more efficient to donate it to another HPA shard than to purge it and fault it in again. This also helps somewhat with virtual address space, since a shard gets an already-allocated hpdata object from the pool, if one is available, before creating another one.
On some internal workloads we saw memory improvements, and on some we saw small CPU improvements when using the pool.

What
Two commits are plain refactors: one moves the central-related code out of hpa.c, and another moves some utility functions into a separate module. The last commit adds a simple page pool, which is effectively two mutex-guarded lists (purged and non-purged empty folios) that each shard can borrow from.
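
For a rough picture of that shape of pool, here is a minimal sketch under assumptions of my own: the names (`page_pool_t`, `page_pool_put`, `page_pool_get`) are hypothetical, a plain pthread mutex stands in for jemalloc's internal locking, and the patch's actual `hpa_pool_t` is not this code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Intrusive list node standing in for an empty hugepage's metadata. */
typedef struct page_node_s page_node_t;
struct page_node_s {
	page_node_t *next;
};

/*
 * Two mutex-guarded lists: empty pages still backed by memory, and
 * pages that have already been purged back to the OS. The mutex is
 * assumed to be initialized (e.g. with PTHREAD_MUTEX_INITIALIZER).
 */
typedef struct page_pool_s {
	pthread_mutex_t mtx;
	page_node_t *nonpurged;
	page_node_t *purged;
} page_pool_t;

/* Donate an empty page; the caller says whether it was already purged. */
static void
page_pool_put(page_pool_t *pool, page_node_t *page, bool purged) {
	pthread_mutex_lock(&pool->mtx);
	page_node_t **list = purged ? &pool->purged : &pool->nonpurged;
	page->next = *list;
	*list = page;
	pthread_mutex_unlock(&pool->mtx);
}

/* Borrow a page, preferring non-purged ones so no re-fault is needed. */
static page_node_t *
page_pool_get(page_pool_t *pool) {
	pthread_mutex_lock(&pool->mtx);
	page_node_t **list = (pool->nonpurged != NULL)
	    ? &pool->nonpurged : &pool->purged;
	page_node_t *page = *list;
	if (page != NULL) {
		*list = page->next;
	}
	pthread_mutex_unlock(&pool->mtx);
	return page;
}
```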

Testing
Added some unit tests, and also tested in production using internal workloads.

return ps;
}

/*
@spredolac (Author):

This explains why certain things are needed here; in the following commit, this comment is no longer necessary.

typedef struct hpa_pool_s hpa_pool_t;
struct hpa_pool_s {
/*
* Pool of empty huge pages to be shared between shards that are
@spredolac (Author):

Will fix this formatting

@guangli-dai

Discussed offline; concluding the discussion here:

  1. The two refactoring commits look good. They isolate the different modules of the HPA code more cleanly. We want to take the refactoring commits in first.
  2. Adding a central pool is surely what we want in the long term. However, when to donate empty 2MB units to the pool needs more discussion. If we donate as soon as a unit becomes empty, we can theoretically reduce fragmentation, but contention on the central pool also grows rapidly. We want to use bpf to understand how frequent such contention is, and will likely go with time-based demotion in this case (one possible shape is sketched after this list). In short, the pool part needs more experiments and discussion to make the CPU-memory trade-off more efficient as well as configurable.
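
For illustration only, here is a sketch of what time-based demotion could look like; the names, the `DONATE_GRACE_NS` threshold, and the clock handling are all hypothetical, since the discussion above explicitly leaves this design open.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical grace period before an empty hugepage is demoted to the
 * central pool; this would be the configurable knob mentioned above,
 * not a value from the patch.
 */
#define DONATE_GRACE_NS (10 * 1000 * 1000) /* 10 ms */

typedef struct hugepage_s {
	uint64_t empty_since_ns; /* 0 while the page has live allocations */
} hugepage_t;

/*
 * Donate only once the page has stayed empty past the grace period, so
 * pages that refill quickly never touch the central pool's mutex.
 */
static bool
should_donate(const hugepage_t *hp, uint64_t now_ns) {
	return hp->empty_since_ns != 0
	    && now_ns - hp->empty_since_ns >= DONATE_GRACE_NS;
}
```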
