-
-
Notifications
You must be signed in to change notification settings - Fork 9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-67237] BuildTrigger waits until the dependency graph has been updated #6101
Conversation
It will definitely need to be added to the changelog. Seem to be rather a bug than an enhancement |
I am assuming that it is my responsibility to fix the failing check. Unfortunately, I was unable to reproduce the error message both on my Windows and Linux machine and therefore I can not prepare a fix. I suspect that the problem might be related to control characters. When doing a fresh git checkout on my Linux machine, I can not see anything wrong with it (e.g.: all line endings of a freshly checked out Jenkins.java are LF). |
The build log tells you what to do. Search 'fix' |
The problem is that the build log of my Linux machine and the build log from ci.jenkins.io differ: On my machine, the problem occurs in the file I have attached a minimal log file that I generated by executing I am aware that I can fix the format violation by running I could fix the format violation on my machine and commit the change, but I ideally want to fix all format violations in one commit. Should I run |
try it with Java 8, spotless will use a newer version on Java 11 |
That took me some time to figure out. Between creating my fork and today, new format rules were introduced in the following PRs: Since my fork only had the old I solved the problem by copying the pom.xml over from the current master branch. I then saw identical format violations on my machine and ci.jenkins.io and could fix them. Changelog entry:When triggering a new build while the build graph is currently being re-computed, jenkins waits for the re-computation to finish. This gurantees that recently updated build triggers are executed. |
To fix the test failure, I first tried to replicate it on my Windows machine. Unfortunately, the test does not fail on my machine :/ Strangely enough other tests fail on my machine that do not fail on ci.jenkion.io. It seems that jenkins can not dispose the test environment and I don't know how my changes could have caused that. Is it possible that disposing the test environment is flaky? Maybe a dumb question, but I want to make sure before investigating further. |
I further looked into the possible causes for the flakiness of the tests. Sometimes the log files cannot be deleted after the test has been run which causes the test to fail. I will explain how this can occur by using In this test, the jenkins instance is shut down while a build is running. Sometimes (this is the reason why the test is flaky) the job is in the cleanup phase ( Due to my changes, the The reason why I could not replicate this problem on my local machine is that it is sensitive to timing. My machine is either too slow or too fast compared to ci.jenkins.io. I can think of three possible solutions:
I believe option 3 might be the best solution, but I need feedback on how to proceed. Two things made it hard for me to make progress on this issue: I followed both https://wiki.jenkins.io/JENKINS/Building-Jenkins.html and https://wiki.jenkins.io/JENKINS/Setting-up-Eclipse-to-build-Jenkins.html but to no success. When executing a unit-test from eclipse I get a I helped myself by starting the test from mvn like so |
Can't really help on your questions, but:
IntelliJ also fails to run Jenkins core tests in general for me at least. I use the same approach you did with |
…ng shutdown to avoid exceptions during JenkinsRule.after
Okay, I added a new method to This method cancels an ongoing dependency graph calculation and thus the flaky tests are once more stable. Canceling dependency graph calculation during shutdown is unproblematic since they are re-calculated during startup anyway. This PR is good to go from my POV. |
@jenkinsci/core-pr-reviewers |
Can I support the review process in any way? |
} | ||
|
||
|
||
public DependencyGraph createNewDependencyGraph() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably just inline this call into its callers, we don't want to add this new public API since no one would find it useful. If you need it for testing could you make it package scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't make it package scope. The relevant unit-test is located in jenkins-test, because I need JenkinsRule which is only available in jenkins-test.
By modifying the unit-test, I could do without the method.
Thanks for your feedback.
@@ -3917,6 +3921,20 @@ private void _cleanUpReleaseAllLoggers(List<Throwable> errors) { | |||
} | |||
} | |||
|
|||
private void _cleanUpCancelDependencyGraphCalculation() { | |||
LOGGER.log(Level.FINE, "Canceling internal dependency graph calculation"); | |||
if (scheduledFutureDependencyGraph != null && !scheduledFutureDependencyGraph.isDone()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have to syncronize the entire block because scheduledFutureDependency graph can become null in between the nullcheck and the isDone call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed it. Thank you for your feedback :)
|
||
// Mark that we finished calculating the dependency graph | ||
synchronized (dependencyGraphLock) { | ||
calculatingFutureDependencyGraph = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should not null this out here, we don't nullcheck this for any scheduling purpose.
Not nulling this will allow the above code to probably be simplified to
if (calculatingFutureDependencyGraph != null) {
calculatingFutureDependencyGraph.get();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am now making sure that calculatingFutureDependencyGraph never becomes null. This made this null check obsolete and some other checks became obsolete as well and I deleted them. Thank you for your feedback.
…atingFutureDependencyGraph never becomes null. Fixed a synchronization bug in _cleanUpCancelDependencyGraphCalculation(). Slightly altered unit-test, because the behavior of Jenkins.getFutureDependencyGraph() changed when no graph has been built beforehand.
…and subsequently altered the corresponding unit-test
I think the Windows Build failed, because the Windows machine is overwhelmed. I came to this conclusion, because the Windows build on other PRs also failed. |
@res0nance you able to take another look? |
@jglick you know anything about this area? |
Not much, and I am not inclined to learn. At one point I had thought to port |
…onger lock during re-calculation of dependency graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR is now ready for merge, after ~24 hours, we will merge it if there's no negative feedback. Thanks! |
I am afraid this may have caused the regression noted in JENKINS-68189. |
I'm guessing that this is because Maybe if we call the async version with a get this will update the |
Whatever the case, standard procedure would be for this PR to be reverted promptly, and then a separate PR filed as time permits to reintroduce the change along with a fix for the regression and a new functional test covering the broken scenario. |
I second that. I thought I had enough time today to look at the issue, but didn't. When I have time I will both contribute a new test and a fix. |
In the absence of a fix and a new test, I plan to revert this PR on Friday. |
Agreed we should just revert this |
I was able to reproduce the bug on my machine and I am working now on both the test case and a fix. The merge should be reverted because the new bug is much more severe than the bug that I fixed. Steps to reproduce the bug:
This bug does occur even if only one node is used. Analysis:res0nance already found the cause of the bug. After restarting Jenkins, rebuildDependencyGraphAsync() is never triggered and thus the calculatingFutureDependencyGraph remains EMPTY until the dependency graph has been re-computed asynchronously. |
…ph has been updated (jenkinsci#6101)" https://issues.jenkins.io/browse/JENKINS-68189 reports that downstream jobs are no longer executed in Jenkins 2.341. https://issues.jenkins.io/browse/JENKINS-67237 reports that Downstream jobs are not guaranteed to build when the job configs have been updated recently. Comments in jenkinsci#6101 recommend that this pull request be reverted while a fix is being developed. This change is the revert of the pull request. The revert was generated by pressing the "revert" button on the GitHub interface, then using the resulting commit in this pull request. This reverts commit 292382f.
Revert "[JENKINS-67237] `BuildTrigger` waits until the dependency graph has been updated (#6101)"
…ency graph has been updated (jenkinsci#6101)"" This reverts commit f3a32be.
See JENKINS-67237.
Proposed changelog entries
Proposed upgrade guidelines
N/A
Submitter checklist
Proposed changelog entries
section only if there are breaking changes or other changes which may require extra steps from users during the upgradeDesired reviewers
@mention
Maintainer checklist
Before the changes are marked as
ready-for-merge
:Proposed changelog entries
are correctupgrade-guide-needed
label is set and there is aProposed upgrade guidelines
section in the PR title. (example)lts-candidate
to be considered (see query).