Remove equivalence cache from the scheduler code base #71013

Closed
@bsalamat

Description

What would you like to be added:
Remove equivalence cache from the scheduler code base.

Why is this needed:
The equivalence cache (eCache) was added to the scheduler to speed up the evaluation of predicate functions. It stores predicate results for pods and, as long as the conditions of a node do not change, reuses the cached results for pods that have the same scheduling requirements.
On paper this should have improved scheduler performance significantly, but in practice it slowed the scheduler down in many common scenarios. The cause turned out to be lock contention, plus the cost of accessing a three-level cache, which was sometimes slower than running the predicates themselves.
We then tried to optimize the locking mechanism. That improved performance over the previous implementation, but for pods without complex scheduling requirements the scheduler was still slower than it was without eCache. eCache did improve performance for pods with inter-pod affinity/anti-affinity, but that was before we added further optimizations that improved affinity/anti-affinity performance by 5x. So the performance improvement for affinity/anti-affinity is much smaller now, although it is still significant enough to consider having eCache. However, eCache complicates our code base considerably, and invalidating it on various events has turned out to be error prone and makes building some of the new scheduling features harder. For example, ensuring that dynamic volume binding works with eCache proved to be non-trivial. As a result, we have decided to remove the current implementation of eCache.
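
For readers unfamiliar with the structure described above, here is a minimal sketch of what a three-level predicate cache guarded by a single lock might look like. It is not the scheduler's actual code: the type and function names (`equivalenceCache`, `lookup`, `store`, `invalidate`), the key layout (node name, then predicate name, then equivalence hash), and the example values are assumptions made for illustration. It shows why every lookup pays for a lock acquisition and three map accesses, and why each cluster event needs a matching invalidation call.

```go
package main

import (
	"fmt"
	"sync"
)

// predicateResult is the cached outcome of one predicate for one equivalence class.
type predicateResult struct {
	fit bool
}

// equivalenceCache is a hypothetical three-level map:
// node name -> predicate name -> equivalence hash -> cached result.
type equivalenceCache struct {
	mu    sync.RWMutex
	nodes map[string]map[string]map[uint64]predicateResult
}

func newEquivalenceCache() *equivalenceCache {
	return &equivalenceCache{
		nodes: make(map[string]map[string]map[uint64]predicateResult),
	}
}

// lookup takes the read lock and walks all three levels.
// Indexing a missing (nil) inner map is safe in Go and yields a miss.
func (c *equivalenceCache) lookup(node, predicate string, equivHash uint64) (predicateResult, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	r, ok := c.nodes[node][predicate][equivHash]
	return r, ok
}

// store takes the write lock and creates intermediate levels on demand.
func (c *equivalenceCache) store(node, predicate string, equivHash uint64, r predicateResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	preds, ok := c.nodes[node]
	if !ok {
		preds = make(map[string]map[uint64]predicateResult)
		c.nodes[node] = preds
	}
	classes, ok := preds[predicate]
	if !ok {
		classes = make(map[uint64]predicateResult)
		preds[predicate] = classes
	}
	classes[equivHash] = r
}

// invalidate drops every cached result for one predicate on one node.
// A caller must remember to invoke it for each event that can change the result.
func (c *equivalenceCache) invalidate(node, predicate string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.nodes[node], predicate) // deleting from a nil map is a no-op
}

func main() {
	c := newEquivalenceCache()
	// Node name, predicate name, and hash are illustrative values only.
	c.store("node-a", "PodFitsResources", 42, predicateResult{fit: true})

	_, hit := c.lookup("node-a", "PodFitsResources", 42)
	fmt.Println("hit before invalidation:", hit)

	c.invalidate("node-a", "PodFitsResources")
	_, hit = c.lookup("node-a", "PodFitsResources", 42)
	fmt.Println("hit after invalidation:", hit)
}
```

In this sketch all scheduling goroutines share one lock, so a cheap predicate can easily cost less to recompute than to look up under contention, which is the effect described above.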

Our plan is to redesign the equivalence cache around a different mechanism, one that ensures the scheduler does not keep retrying to schedule a large number of equivalent pods after it finds one of them unschedulable. When one pod is determined to be unschedulable, all other equivalent pods will be unschedulable as well, so the scheduler can save CPU cycles and try other pods instead.
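
The redesign is not specified in this issue, so the following is only a sketch of the idea under stated assumptions: each pod carries a precomputed equivalence hash, and the scheduler keeps a set of hashes it has already found unschedulable against the current cluster state. All names here (`pod`, `equivHash`, `unschedulableClasses`, `markUnschedulable`, `shouldSkip`, `reset`) are hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// pod is a stand-in for a pending pod; equivHash is assumed to be equal
// for pods with identical scheduling requirements.
type pod struct {
	name      string
	equivHash uint64
}

// unschedulableClasses remembers which equivalence classes failed
// scheduling against the current cluster state.
type unschedulableClasses struct {
	seen map[uint64]struct{}
}

func newUnschedulableClasses() *unschedulableClasses {
	return &unschedulableClasses{seen: make(map[uint64]struct{})}
}

// markUnschedulable records that p's equivalence class could not be placed.
func (u *unschedulableClasses) markUnschedulable(p pod) {
	u.seen[p.equivHash] = struct{}{}
}

// shouldSkip reports whether an equivalent pod has already failed.
func (u *unschedulableClasses) shouldSkip(p pod) bool {
	_, ok := u.seen[p.equivHash]
	return ok
}

// reset forgets everything; a caller would invoke it when the cluster
// changes in a way that could make previously failed pods schedulable.
func (u *unschedulableClasses) reset() {
	u.seen = make(map[uint64]struct{})
}

func main() {
	u := newUnschedulableClasses()
	a := pod{name: "web-1", equivHash: 7}
	b := pod{name: "web-2", equivHash: 7} // same requirements as web-1
	c := pod{name: "db-1", equivHash: 9}

	u.markUnschedulable(a) // web-1 failed to schedule

	fmt.Println("skip web-2:", u.shouldSkip(b))
	fmt.Println("skip db-1:", u.shouldSkip(c))

	u.reset() // e.g. a node was added
	fmt.Println("skip web-2 after reset:", u.shouldSkip(b))
}
```

Compared with caching per-node predicate results, this keeps only one bit of state per equivalence class and needs a single coarse invalidation, at the price of giving no speedup for pods that are schedulable.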

/kind cleanup

/sig scheduling

Labels

kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
