
Support interpodaffinity and podtopologyspread #61

Merged · 1 commit merged into kubewharf:dev/pod-affinity on Oct 31, 2024

Conversation

@slipegg commented Aug 6, 2024

I am a participant in the godel-scheduler project of OSPP (Open Source Promotion Plan). I have added the InterPodAffinity and PodTopologySpread plugins and their related test files to godel-scheduler as required.

Considering the overall structure of the godel-scheduler project:

I have completed testing the basic functions locally. If there are any areas that need improvement, I would appreciate your suggestions.

@CLAassistant commented Aug 6, 2024

CLA assistant check
All committers have signed the CLA.

// `maxSkew-1` is added to the score so that differences between topology
// domains get watered down, controlling the tolerance of the score to skews.
func scoreForCount(cnt int64, maxSkew int32, tpWeight float64) float64 {
	return float64(cnt)*tpWeight + float64(maxSkew-1)
}
Contributor:

What are the parameters maxSkew and tpWeight for? What will be the difference if these two parameters are removed (i.e. score = cnt)?

@slipegg (Author) replied Sep 24, 2024:

This is the original k8s calculation. I am not entirely sure why it is computed this way, but my understanding is as follows:

  • maxSkew: the maximum imbalance allowed by the constraint; maxSkew = (maximum number of pods in any topology domain) - (minimum number of pods in any topology domain)
  • tpWeight: a weight derived from the number of filtered nodes; tpWeight = log(len(filterNodes) + 2)
  • cnt: the number of matched pods

In this formula, the float64(cnt)*tpWeight term means that, for the same matched pod count, the score leans toward the topology domain backed by more nodes. The +float64(maxSkew-1) term is a small offset: it only partially reflects the difference between constraints with different maxSkew values, and it also makes scheduling more stable when very few pods have been placed so far.
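
For illustration, here is a small, self-contained Go sketch of how these pieces can fit together. The topologyNormalizingWeight helper follows the upstream Kubernetes idea of weighting a domain by log(number of filtered nodes + 2); the helper name and the example values are assumptions made for this sketch, not godel-scheduler's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// topologyNormalizingWeight is an illustrative stand-in for tpWeight:
// log(filteredNodeCount + 2). The +2 keeps the weight positive and softens
// the effect for domains backed by very few nodes.
func topologyNormalizingWeight(filteredNodeCount int) float64 {
	return math.Log(float64(filteredNodeCount + 2))
}

// scoreForCount mirrors the snippet under review: the matched pod count scaled
// by the topology weight, plus maxSkew-1 as a small constant offset.
func scoreForCount(cnt int64, maxSkew int32, tpWeight float64) float64 {
	return float64(cnt)*tpWeight + float64(maxSkew-1)
}

func main() {
	// Two domains with the same matched pod count: the domain backed by more
	// filtered nodes gets the larger weight and therefore the larger raw score.
	small := scoreForCount(3, 1, topologyNormalizingWeight(5))
	large := scoreForCount(3, 1, topologyNormalizingWeight(50))
	fmt.Printf("small domain: %.2f, large domain: %.2f\n", small, large)
}
```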

@slipegg (Author) commented Oct 14, 2024

I manually tested the InterPodAffinity and PodTopologySpread scheduling plugins, and both passed verification. The documentation (in Chinese) is available here: https://xeh61jru4u.feishu.cn/docx/NwrMdx3Hjoi47fxXJf4cXNNOnof.

	return framework.NewStatus(framework.Error, err.Error())
}

state := utils.GetPreFilterState(pod, nodeInfos, constraints)
Contributor:

There is an uncommon case: the pod can be scheduled only after preempting victims. In that case, the removal of the victims should be reflected in the state, just as the AddPod and RemovePod functions of the InterPodAffinity/PodTopologySpread plugins do.

Author:

Does this mean I need to add a RemoveVictim function? How does this differ from the existing RemovePod function?
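
For illustration only, the kind of state adjustment described in the review comment above could look like the following simplified sketch. The preFilterState layout, its fields, and the removeVictim helper are stand-ins invented for this example; they are not godel-scheduler's actual types or the plugin's real RemovePod signature.

```go
package example

import v1 "k8s.io/api/core/v1"

// topologyPair identifies one topology domain, e.g. {"topology.kubernetes.io/zone", "zone-a"}.
type topologyPair struct {
	key   string
	value string
}

// preFilterState is a simplified stand-in: matching pod counts per topology domain.
type preFilterState struct {
	counts map[topologyPair]int
}

// removeVictim decrements the count of the domain the victim currently occupies,
// mirroring what a RemovePod-style hook does during normal scheduling, so that a
// pod which only fits after preemption is checked against the post-preemption state.
func (s *preFilterState) removeVictim(victim *v1.Pod, victimNodeLabels map[string]string, topologyKey string) {
	_ = victim // a real implementation would also re-check the constraint's label selector against the victim
	pair := topologyPair{key: topologyKey, value: victimNodeLabels[topologyKey]}
	if c, ok := s.counts[pair]; ok && c > 0 {
		s.counts[pair] = c - 1
	}
}
```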

@rookie0080 (Contributor) commented:

> I manually tested the InterPodAffinity and PodTopologySpread scheduling plugins, and both passed verification. The documentation (in Chinese) is available here: https://xeh61jru4u.feishu.cn/docx/NwrMdx3Hjoi47fxXJf4cXNNOnof.

Please also attach the benchmark testing doc for other reviewers.

@slipegg (Author) commented Oct 28, 2024

> I manually tested the InterPodAffinity and PodTopologySpread scheduling plugins, and both passed verification. The documentation (in Chinese) is available here: https://xeh61jru4u.feishu.cn/docx/NwrMdx3Hjoi47fxXJf4cXNNOnof.
>
> Please also attach the benchmark testing doc for other reviewers.

I used kind + kwok to simulate 5,000 nodes and tested the performance of the two scheduling plugins, InterPodAffinity and PodTopologySpread. The detailed test document (in Chinese) is here: https://xeh61jru4u.feishu.cn/docx/JhNIdmyiZo19Ibxns2ucin0OnZd

The test results are as follows:

| Constraint Type | Case | Maximum e2e Scheduling Speed (pods/s) |
| --- | --- | --- |
| Filter (Hard Constraints) | MaxSkew of 1 for topology uniform distribution at region level | 151 |
| Filter (Hard Constraints) | Inter-pod affinity constraints at region level | 107 |
| Filter (Hard Constraints) | Inter-pod anti-affinity constraints at region level | 99.5 |
| Filter (Hard Constraints) | Inter-pod anti-affinity constraints with existing Pods at region level | 123 |
| Score (Soft Constraints) | PodTopologySpread with MaxSkew of 1 for topology uniform distribution at region level | 383 (Flattened Dispatcher) |
| Score (Soft Constraints) | Inter-pod affinity constraints at region level | 336 (Flattened Dispatcher) |
| Score (Soft Constraints) | Inter-pod anti-affinity constraints at region level | 342 (Flattened Dispatcher) |
| Score (Soft Constraints) | Inter-pod anti-affinity constraints with existing Pods at region level | 314 (Flattened Dispatcher) |


func (pl *PodTopologySpreadCheck) getTopologyCondition(pod *v1.Pod) (*TopologySpreadCondition, error) {
	constraints := []podtopologyspreadScheduler.TopologySpreadConstraint{}
	allNodes, err := pl.frameworkHandle.SharedInformerFactory().Core().V1().Nodes().Lister().List(labels.Everything())
Author:

I now understand that a node can be an NMNode, meaning it is managed by the node manager. So when listing all nodes here, we should fetch NodeInfo rather than only the nodes managed by the kubelet. However, I am not sure how to get all NodeInfo through frameworkHandle; I have not found any relevant code so far. If you can tell me how to get it, that would be very helpful.

Author:

Ohhh~ I found that I can get NMNodes via BinderFrameworkHandle.CRDClientSet. I will fix the related code in the binder.

test/e2e/scheduling/hard_constraints.go (review comments resolved)
@slipegg force-pushed the develop branch 2 times, most recently from a6196c9 to d1a9442 on October 28, 2024 at 12:45
@binacs (Member) left a comment:

Great work, but there are some comments

@@ -34,6 +34,7 @@ import (
 	commoncache "github.com/kubewharf/godel-scheduler/pkg/common/cache"
 	commonstore "github.com/kubewharf/godel-scheduler/pkg/common/store"
 	framework "github.com/kubewharf/godel-scheduler/pkg/framework/api"
+	schedulerCache "github.com/kubewharf/godel-scheduler/pkg/scheduler/cache"
Member:

We shouldn't import scheduler cache in binder.

Move NodeSlice to util pkg if needed.

Author:

OK, I have moved NodeSlice to the public pkg/common/store.

@@ -217,3 +227,7 @@ func (cache *binderCache) FindStore(storeName commonstore.StoreName) commonstore
 	defer cache.mu.RUnlock()
 	return cache.CommonStoresSwitch.Find(storeName)
 }
 
+func (cache *binderCache) List() []framework.NodeInfo {
+	return append(cache.nodeSlices.InPartitionNodeSlice.Nodes(), cache.nodeSlices.OutOfPartitionNodeSlice.Nodes()...)
Member:

Need lock.

BTW it's dangerous to expose all of the nodes.

Author:

Yes, I have added a lock.

Because the binder needs all NodeInfo when performing topology checks for InterPodAffinity and PodTopologySpread, I added this function. Is there a safer approach?
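
For reference, a read-locked List could look like the following self-contained sketch. The types below are simplified stand-ins for binderCache and its node slices, invented for this example rather than taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeSlice is a simplified stand-in for the in-/out-of-partition node slices.
type nodeSlice struct{ names []string }

func (s *nodeSlice) Nodes() []string { return s.names }

// binderCache is a simplified stand-in that carries the RWMutex the reviewer asked for.
type binderCache struct {
	mu             sync.RWMutex
	inPartition    nodeSlice
	outOfPartition nodeSlice
}

// List takes a read lock and returns a freshly allocated slice, so callers never
// hold a reference to the cache's internal backing slices while other goroutines mutate them.
func (c *binderCache) List() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	in, out := c.inPartition.Nodes(), c.outOfPartition.Nodes()
	all := make([]string, 0, len(in)+len(out))
	all = append(all, in...)
	return append(all, out...)
}

func main() {
	c := &binderCache{
		inPartition:    nodeSlice{names: []string{"node-a"}},
		outOfPartition: nodeSlice{names: []string{"node-b"}},
	}
	fmt.Println(c.List()) // [node-a node-b]
}
```

Returning a fresh slice on every call also softens the "exposing all nodes" concern somewhat, since callers cannot mutate the cache's own backing slices through the returned value.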

@@ -67,7 +69,10 @@ func DefaultUnitQueueSortFunc() framework.UnitLessFunc {
 func NewBasePlugins(victimsCheckingPlugins []*framework.VictimCheckingPluginCollectionSpec) *apis.BinderPluginCollection {
 	// TODO add some default plugins later
 	basicPlugins := apis.BinderPluginCollection{
-		CheckTopology: []string{},
+		CheckTopology: []string{
Member:

nit: add these plugins on demand. We don't need to prepare data when the incoming Pod doesn't have cross-node constraints.

Author:

For InterPodAffinity, anti-affinity declared by existing pods can affect the pod being checked, so the check may still be needed even when the incoming pod declares no affinity terms, and it is hard to determine in advance whether InterPodAffinity is required. The approach here is to enable the InterPodAffinity and PodTopologySpread checks by default; inside each check we first verify whether the pod actually carries the relevant constraint and return early if it does not.
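
To illustrate the early-return idea (not the PR's actual code; the helper name and placement are invented for this sketch), a guard on the incoming pod could look like this:

```go
package binderplugins

import (
	v1 "k8s.io/api/core/v1"
)

// podDeclaresCrossNodeConstraints reports whether the incoming pod itself carries
// topology spread or pod (anti-)affinity terms. For InterPodAffinity this alone is
// not enough to skip the plugin, because anti-affinity declared by already-running
// pods can still reject the incoming pod; that is why the check stays enabled by
// default and only short-circuits inside the plugin when nothing applies.
func podDeclaresCrossNodeConstraints(pod *v1.Pod) bool {
	if len(pod.Spec.TopologySpreadConstraints) > 0 {
		return true
	}
	aff := pod.Spec.Affinity
	return aff != nil && (aff.PodAffinity != nil || aff.PodAntiAffinity != nil)
}
```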

if status != nil {
	return status
}
// topologyLabels := nodeInfo.GetNodeLabels(podLauncher)
Member:

Cleanup

Author:

Done

	}, nil
}

// func (pl *InterPodAffinity) getNodesWithSameTopologyLabels(topologyLabels map[string]string) ([]framework.NodeInfo, error) {
Member:

Cleanup

Author:

Done

@@ -311,7 +311,7 @@ func (m *PodInfoMaintainer) ReservedPodsNum() int {
 func (m *PodInfoMaintainer) GetPods() []*PodInfo {
 	pods := make([]*PodInfo, 0, m.bePodsMayBePreempted.Len()+m.gtPodsMayBePreempted.Len()+len(m.neverBePreempted))
 	rangeSplay := func(s splay.Splay) {
-		s.Range(func(so splay.StoredObj) {
+		s.RangeNoOrder(func(so splay.StoredObj) {
Member:

Do not modify the underlying basic implementation for specific plugins.

Author:

OK, I have reverted this change.

The reason I modified this part before was that the InterPodAffinity and PodTopologySpread plugins need to fetch a NodeInfo's pods frequently, and the original design collected all pods via a DFS traversal, which performed poorly because of the heavy recursion.

In the future I may try a more elegant way to optimize the performance of GetPods in a separate PR.

@@ -38,126 +36,6 @@ func makePriority(priority int32) *int32 {
 	return &priority
 }
 
-func TestPodInfoMaintainer_GetPods(t *testing.T) {
Member:

Why delete this?

Author:

I have reverted this change.

I deleted the test because I had changed GetPods to return pods unordered for better performance, and this test requires an ordered result.
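
For reference, if GetPods were ever made unordered again, the test could be kept by making its assertion order-insensitive. A minimal sketch, using a simplified stand-in for PodInfo since the real type lives in godel-scheduler:

```go
package example

import (
	"reflect"
	"sort"
	"testing"

	v1 "k8s.io/api/core/v1"
)

// PodInfo is a simplified stand-in; the real type is assumed to expose its *v1.Pod.
type PodInfo struct {
	Pod *v1.Pod
}

// podNames extracts a sorted list of pod names so that comparisons ignore order.
func podNames(infos []*PodInfo) []string {
	names := make([]string, 0, len(infos))
	for _, pi := range infos {
		names = append(names, pi.Pod.Name)
	}
	sort.Strings(names)
	return names
}

// assertSamePods fails the test when the two pod sets differ, regardless of order.
func assertSamePods(t *testing.T, got, want []*PodInfo) {
	t.Helper()
	if g, w := podNames(got), podNames(want); !reflect.DeepEqual(g, w) {
		t.Errorf("GetPods() returned %v, want %v", g, w)
	}
}
```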

}

for _, node := range allV1Nodes {
	nodeInfo := frameworkHandle.GetNodeInfo(node.Name)
Member:

NodeInfo may be nil

Author:

This function is no longer used. I have deleted it.

nodeInfoMap := map[string]framework.NodeInfo{}
for _, podLauncher := range podutil.PodLanucherTypes {
	if podLauncher == podutil.Kubelet && frameworkHandle.SharedInformerFactory() != nil {
		allV1Nodes, err := frameworkHandle.SharedInformerFactory().Core().V1().Nodes().Lister().List(labels.Everything())
Member:

I don't understand, why not schedule pods based on Cache?

cc @rookie0080

Author:

Ohhh~ this was my oversight. This is the previous implementation and the function is no longer used; I have switched to using the cache. I will delete this function.

Author:

This function is no longer used. I have deleted it.

@slipegg changed the base branch from main to dev/pod-affinity on October 29, 2024 at 12:23
@binacs (Member) left a comment:

Just a comment. Otherwise lgtm


var GlobalNodeInfoPlaceHolder = framework.NewNodeInfo()

type NodeSlices struct {
Member:

Please move this code to pkg/framework/api/nodeinfo_hashslice.go.

The data structure is not bound to the CommonStore mechanism.

Author:

OK, I have moved it to pkg/framework/api/nodeinfo_hashslice.go.

@binacs (Member) commented Oct 30, 2024

lgtm, please squash the commits

cc @rookie0080

@NickrenREN merged commit cb8907b into kubewharf:dev/pod-affinity on Oct 31, 2024
1 check passed