Skip to content

Conversation

@mitdesai
Copy link
Contributor

What is this PR for?

Added a new Rest Endpoint for getting the scheduling order - These come in handy when debugging issues in scheduling.

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-3121

How should this be tested?

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There is breaking changes for older versions.
  • - It needs documentation.

@codecov
Copy link

codecov bot commented Oct 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.83%. Comparing base (e41a426) to head (28895fa).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1042      +/-   ##
==========================================
+ Coverage   81.76%   81.83%   +0.06%     
==========================================
  Files         103      103              
  Lines       13535    13557      +22     
==========================================
+ Hits        11067    11094      +27     
+ Misses       2208     2203       -5     
  Partials      260      260              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@pbacsko
Copy link
Contributor

pbacsko commented Nov 7, 2025

@mitdesai please take care of the linter problem. You can check locally by running "make lint".

@mitdesai mitdesai force-pushed the YUNIKORN-3121 branch 2 times, most recently from 9b91b8e to f76bad8 Compare November 7, 2025 19:37
Copy link
Contributor

@manirajv06 manirajv06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it helps.

I propose orderLog or something like that similar to allocationLog to capture the order of events. We can come up with list of constant or message to be logged for each important step. As and when each step pass through, log the corresponding constant/message to the orderLog from the originating place as opposed to simulating the code given in this PR as it does not require us to update both the places. Finally, it can be dumped in the state dump. Not really sure this is something needs to be exposed through REST API.

In addition, we can also capture these kind of cases as well going forward.

@mitdesai
Copy link
Contributor Author

I like the idea of in-place updates and maintaining the orderLog. But I still feel that it should be a separate endpoint even though we put this information in the state dump.

State dumps are huge. And when we are debugging a scenario where some pods are not getting scheduled and we just need to know the order of what is getting scheduled, it is quicker and less expensive to get selective information.

@manirajv06
Copy link
Contributor

State dump is the place where we keep all these diagnostic info in general. It is just a matter of looking for "orderLog" string in the dump :)

@mitdesai mitdesai force-pushed the YUNIKORN-3121 branch 3 times, most recently from de98649 to 3b47b63 Compare November 19, 2025 16:23
// Build a map of queue -> applications with pending requests
queueAppMap := make(map[string][]string)

for _, app := range pc.applications {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map containing applications does not guarantee specific order. Since we are looking for scheduling order at any given moment, I would rather log into data structure which strictly maintains order based on the insertion order so that it can consumed as is in the state dump. I can see Queue.TryAllocate as the best place to push the application into the above mentioned slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, Good Catch. Not sure how I missed that. Let me fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants