TAS: Compute the topology assignments #3256

Merged (3 commits) on Oct 22, 2024

@mimowo mimowo commented Oct 17, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Part of #2724

Special notes for your reviewer:

I'm not setting any release note here as it is already set on the API PR.

This PR will be followed up with the following (based on the prototype in #3218):

  • Adding scheduling gates for the Pods (DONE)
  • TopologyUngater to ungate the pods (DONE)
  • e2e tests (prototyped)
  • validation PRs
  • bugfixes when tested e2e at bigger scale

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 17, 2024
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 17, 2024
@mimowo mimowo changed the title TAS: Compute the topology assignments WIP: TAS: Compute the topology assignments Oct 17, 2024
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 17, 2024

netlify bot commented Oct 17, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 915d1de
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67178aa926fe560008fb8c32

@mimowo mimowo force-pushed the tas-compute-assignment branch 2 times, most recently from 735b626 to 27bbbab Compare October 17, 2024 14:21
Contributor Author

mimowo commented Oct 17, 2024

/assign @PBundyra @gabesaba

@mimowo mimowo changed the title WIP: TAS: Compute the topology assignments TAS: Compute the topology assignments Oct 17, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2024
Contributor Author

mimowo commented Oct 17, 2024

/cc @tenzen-y

@PBundyra
Contributor

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 18, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 18, 2024
Contributor Author

mimowo commented Oct 18, 2024

/cc @alculquicondor

Contributor

@alculquicondor alculquicondor left a comment

Some initial comments.

pkg/cache/cache.go (outdated, resolved)
pkg/cache/cache.go (outdated, resolved)
pkg/cache/clusterqueue.go (outdated, resolved)
pkg/cache/clusterqueue.go (outdated, resolved)
@@ -76,6 +78,10 @@ func (s *Snapshot) Log(log logr.Logger) {
}

func (c *Cache) Snapshot() Snapshot {
Contributor

I suppose you plan to remove this function in a follow up?

Contributor Author

@mimowo mimowo Oct 21, 2024

Yes, I intended to follow up. I could do it in this PR too, but it is used in a bunch of tests, so I didn't want to increase the size of the diff which is already large.

Member

Let's keep this thread open so that we don't forget to clean up this function in a follow-up.

Contributor Author

I cleaned up the Snapshot function in this PR, last commit: cd977aa

pkg/cache/tas_flavor_snapshot.go (outdated, resolved)
pkg/cache/tas_flavor_snapshot.go (outdated, resolved)
pkg/cache/tas_flavor_snapshot.go (resolved)
pkg/cache/tas_flavor_snapshot.go (outdated, resolved)
pkg/cache/tas_flavor_snapshot.go (outdated, resolved)
@mimowo mimowo force-pushed the tas-compute-assignment branch 2 times, most recently from d5ca699 to 8807023 Compare October 21, 2024 06:36

// domain holds the static information about placement of a topology
// domain in the hierarchy of topology domains.
type domain struct {
Contributor Author

FYI: I have renamed it once again. I prefer plain domain because it is specific in this context (unlike the earlier names, info and node), yet short (compared to the earlier topologyDomainNode).
cc @gabesaba

Contributor Author

mimowo commented Oct 22, 2024

/test pull-kueue-test-multikueue-e2e-main
Infra flake unrelated to the PR:
go: sigs.k8s.io/kind@v0.24.0: Get "https://proxy.golang.org/sigs.k8s.io/kind/@v/v0.24.0.info": net/http: TLS handshake timeout

Contributor Author

mimowo commented Oct 22, 2024

FYI: as mentioned under the issue, I opened a spreadsheet to keep track of the follow-ups.

Member

@tenzen-y tenzen-y left a comment

Thank you for this great implementation!
This is an awesome feature! Thank you!

Note that I left some trivial comments. Let's address those in a follow-up.

/lgtm
/approve

pkg/cache/tas_flavor.go (resolved)
test/integration/tas/tas_job_test.go (resolved)
test/integration/tas/tas_job_test.go (resolved)
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 22, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: d1b5d6eebf128345d06fa257eb856a90e592f1e6

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tenzen-y
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 22, 2024
@k8s-ci-robot k8s-ci-robot merged commit 0afb766 into kubernetes-sigs:main Oct 22, 2024
15 of 16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Oct 22, 2024
pkg/cache/tas_flavor_snapshot.go (resolved)
pkg/cache/tas_flavor_snapshot.go (resolved)
pkg/cache/tas_flavor_snapshot.go (resolved)
PBundyra

This comment was marked as outdated.

Contributor

@PBundyra PBundyra left a comment

These are my final nits, thank you!

pkg/cache/tas_cache_test.go (resolved)
test/integration/tas/tas_job_test.go (resolved)
test/integration/tas/tas_job_test.go (resolved)
test/integration/tas/tas_job_test.go (resolved)
@PBundyra
Contributor

LGTM, great work!
Thank you

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.