[JENKINS-67237] BuildTrigger waits until the dependency graph has been updated #6101

Si-So · 2021-12-23T10:51:15Z

See JENKINS-67237.

Proposed changelog entries

When triggering a new build while the build graph is currently being re-computed, jenkins waits for the re-computation to finish. This guarantees that recently updated build triggers are executed.

Proposed upgrade guidelines

N/A

Submitter checklist

(If applicable) Jira issue is well described
Changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developer, depending on the change). Examples
- Fill-in the Proposed changelog entries section only if there are breaking changes or other changes which may require extra steps from users during the upgrade
Appropriate autotests or explanation to why this change has no tests
For dependency updates: links to external changelogs and, if possible, full diffs

Desired reviewers

@mention

Maintainer checklist

Before the changes are marked as ready-for-merge:

There are at least 2 approvals for the pull request and no outstanding requests for change
Conversations in the pull request are over OR it is explicit that a reviewer does not block the change
Changelog entries in the PR title and/or Proposed changelog entries are correct
Proper changelog labels are set so that the changelog can be generated automatically
If the change needs additional upgrade steps from users, upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the PR title. (example)
If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

…ependency graph

…ation

oleg-nenashev · 2021-12-24T00:45:05Z

It will definitely need to be added to the changelog. Seem to be rather a bug than an enhancement

Si-So · 2022-01-04T13:01:06Z

I am assuming that it is my responsibility to fix the failing check. Unfortunately, I was unable to reproduce the error message both on my Windows and Linux machine and therefore I can not prepare a fix.
Here's my setup:
JDK 11
maven 3.8.4
maven-checkstyle-plugin:3.1.2

I suspect that the problem might be related to control characters. When doing a fresh git checkout on my Linux machine, I can not see anything wrong with it (e.g.: all line endings of a freshly checked out Jenkins.java are LF).

timja · 2022-01-04T13:57:04Z

The build log tells you what to do. Search 'fix'

Si-So · 2022-01-04T15:47:26Z

The problem is that the build log of my Linux machine and the build log from ci.jenkins.io differ:
On ci.jenkins.io the problem occurs in the file Jenkins.java in the reactor "Jenkins core".

On my machine, the problem occurs in the file JenkinsFutureDependencyGraphTest.java in the reactor "Tests for Jenkins core".

I have attached a minimal log file that I generated by executing mvn com.diffplug.spotless:spotless-maven-plugin:3.1.2:check. Just to be sure, I also attached a full log-file which I generated by executing mvn --batch-mode --show-version --errors --no-transfer-progress -Pdebug -Pjapicmp --update-snapshots -Dmaven.test.failure.ignore -Dspotbugs.failOnError=false -Dcheckstyle.failOnViolation=false -Dset.changelist help:evaluate -Dexpression=changelist --log-file ../maven-log.txt -DskipTests clean install

I am aware that I can fix the format violation by running mvn spotless:apply, but since my machine complains about a different format violation than ci.jenkins.io, I assume that the error will persist. I believe ci.jenkins.io did not find the format violations that my machine found, because it aborted the build earlier.

I could fix the format violation on my machine and commit the change, but I ideally want to fix all format violations in one commit.

Should I run mvn spotless:apply locally, create a commit and push it, even though I do not believe this will fix the format violation found by ci.jenkins.io?

maven-full-log.txt
maven-short-log.txt

timja · 2022-01-04T20:50:30Z

try it with Java 8, spotless will use a newer version on Java 11

Si-So · 2022-01-06T11:49:49Z

That took me some time to figure out.

Between creating my fork and today, new format rules were introduced in the following PRs:
#6149
#6147

Since my fork only had the old pom.xml, my machines did not know about the new format rules, whereas ci.jenkins.io knew.

I solved the problem by copying the pom.xml over from the current master branch. I then saw identical format violations on my machine and ci.jenkins.io and could fix them.

Changelog entry:

When triggering a new build while the build graph is currently being re-computed, jenkins waits for the re-computation to finish. This gurantees that recently updated build triggers are executed.

Si-So · 2022-01-07T14:02:33Z

To fix the test failure, I first tried to replicate it on my Windows machine. Unfortunately, the test does not fail on my machine :/

Strangely enough other tests fail on my machine that do not fail on ci.jenkion.io.

It seems that jenkins can not dispose the test environment and I don't know how my changes could have caused that.

Is it possible that disposing the test environment is flaky? Maybe a dumb question, but I want to make sure before investigating further.

Si-So · 2022-02-09T16:27:17Z

I further looked into the possible causes for the flakiness of the tests. Sometimes the log files cannot be deleted after the test has been run which causes the test to fail. I will explain how this can occur by using hudson.model.QueueTest.persistenceBlockedItem as an example.

In this test, the jenkins instance is shut down while a build is running. Sometimes (this is the reason why the test is flaky) the job is in the cleanup phase (AbstractBuildExecution.cleanUp) in which the possible downstream projects are determined (BuildTrigger.execute).

Due to my changes, the BuildTrigger.execute method can now take much longer than before since it is waiting for the calculation of the dependency graph to finish (the calculation of the dependency graph was started by the creation of a job earlier in this test). In BuildTrigger.execute a write to the log file occurs. When the shutdown occurs during writing to the log file, the JenkinsRule cannot delete the log file (since it is still being written to) and the test fails.

The reason why I could not replicate this problem on my local machine is that it is sensitive to timing. My machine is either too slow or too fast compared to ci.jenkins.io.

I can think of three possible solutions:

Modify the flaky tests. This can be done in multiple ways. One is to wait until the dependency graph calculation has finished (I fixed one of the flaky tests in this way in my last commit).
Modify the JenkinsRule.after method in such a way that it waits until the log file is no longer being written to.
Change the business logic of either BuildTrigger.execute or Jenkins.cleanUp: One solution would be to wait in Jenkins.cleanUp to cancel all dependency graph calculations that are still ongoing.

I believe option 3 might be the best solution, but I need feedback on how to proceed.

Two things made it hard for me to make progress on this issue:
a) I cannot reproduce the problem on my machine. I think this is intrinsic to the problem and cannot be fixed :'(
b) I cannot start Unit-Test from eclipse. I think this can be fixed, but I don't know how. Help is greatly appreciated.

I followed both https://wiki.jenkins.io/JENKINS/Building-Jenkins.html and https://wiki.jenkins.io/JENKINS/Setting-up-Eclipse-to-build-Jenkins.html but to no success. When executing a unit-test from eclipse I get a java.lang.ClassNotFoundException: hudson.model.QueueTest exception.

I helped myself by starting the test from mvn like so mvn -Dmaven.surefire.debug -Dtest=QueueTest#persistenceBlockedItem test and then connecting via remote debugging. Another annoying thing that happens when I go this route is that eclipse is confused by the presence of two files named QueueTest and always picks the wrong one when halting at a breakpoint...

timja · 2022-02-09T16:52:53Z

Can't really help on your questions, but:

I helped myself by starting the test from mvn like so mvn -Dmaven.surefire.debug -Dtest=QueueTest#persistenceBlockedItem test and then connecting via remote debugging. Another annoying thing that happens when I go this route is that eclipse is confused by the presence of two files named QueueTest and always picks the wrong one when halting at a breakpoint...

IntelliJ also fails to run Jenkins core tests in general for me at least. I use the same approach you did with -Dmaven.surefire.debug and mvn test, although not hit the other issue you mentioned.

…ng shutdown to avoid exceptions during JenkinsRule.after

Si-So · 2022-02-22T09:16:25Z

Okay, I added a new method to Jenkins called _cleanUpCancelDependencyGraphCalculation().

This method cancels an ongoing dependency graph calculation and thus the flaky tests are once more stable. Canceling dependency graph calculation during shutdown is unproblematic since they are re-calculated during startup anyway.

This PR is good to go from my POV.

Si-So · 2022-02-24T08:24:17Z

@jenkinsci/core-pr-reviewers

Si-So · 2022-03-18T15:16:03Z

Can I support the review process in any way?

res0nance · 2022-03-20T04:39:33Z

core/src/main/java/jenkins/model/Jenkins.java

+    }
+
+
+    public DependencyGraph createNewDependencyGraph() {


Probably just inline this call into its callers, we don't want to add this new public API since no one would find it useful. If you need it for testing could you make it package scope?

I couldn't make it package scope. The relevant unit-test is located in jenkins-test, because I need JenkinsRule which is only available in jenkins-test.

By modifying the unit-test, I could do without the method.

Thanks for your feedback.

res0nance · 2022-03-20T04:48:07Z

core/src/main/java/jenkins/model/Jenkins.java

@@ -3917,6 +3921,20 @@ private void _cleanUpReleaseAllLoggers(List<Throwable> errors) {
        }
    }

+    private void _cleanUpCancelDependencyGraphCalculation() {
+        LOGGER.log(Level.FINE, "Canceling internal dependency graph calculation");
+        if (scheduledFutureDependencyGraph != null && !scheduledFutureDependencyGraph.isDone()) {


I think you have to syncronize the entire block because scheduledFutureDependency graph can become null in between the nullcheck and the isDone call

I fixed it. Thank you for your feedback :)

res0nance · 2022-03-20T04:55:45Z

core/src/main/java/jenkins/model/Jenkins.java

+
+            // Mark that we finished calculating the dependency graph
+            synchronized (dependencyGraphLock) {
+                calculatingFutureDependencyGraph = null;


We probably should not null this out here, we don't nullcheck this for any scheduling purpose.

Not nulling this will allow the above code to probably be simplified to

if (calculatingFutureDependencyGraph != null) { calculatingFutureDependencyGraph.get(); }

I am now making sure that calculatingFutureDependencyGraph never becomes null. This made this null check obsolete and some other checks became obsolete as well and I deleted them. Thank you for your feedback.

…atingFutureDependencyGraph never becomes null. Fixed a synchronization bug in _cleanUpCancelDependencyGraphCalculation(). Slightly altered unit-test, because the behavior of Jenkins.getFutureDependencyGraph() changed when no graph has been built beforehand.

…and subsequently altered the corresponding unit-test

Si-So · 2022-03-24T08:40:49Z

I think the Windows Build failed, because the Windows machine is overwhelmed. I came to this conclusion, because the Windows build on other PRs also failed.

timja · 2022-03-27T08:08:03Z

@res0nance you able to take another look?

timja · 2022-03-27T08:09:25Z

@jglick you know anything about this area?

jglick · 2022-03-27T15:06:52Z

you know anything about this area?

Not much, and I am not inclined to learn. At one point I had thought to port DependencyGraph to Pipeline (JENKINS-29913) but it never seemed pressing. The DependencyGraph system is rather complex—something that probably ought to have been done in a plugin, but would be difficult to split now due to old API design choices. Many (though certainly not all) use cases can be expressed more clearly and directly in Pipeline using control flow constructs within a single job with multiple stages; or by having an orchestration job use build steps to launch other jobs at specific times under defined conditions. For the cases where dependency relationships between independent and autonomous jobs are most appropriate, another option is to use triggers based on external events (such as deployments to artifact repositories); or domain-specific features more sensitive to the nuances of particular build systems, like https://github.com/jenkinsci/pipeline-maven-plugin#feature-trigger-downstream.

core/src/main/java/jenkins/model/Jenkins.java

…onger lock during re-calculation of dependency graph.

res0nance

LGTM!

timja · 2022-03-28T12:29:58Z

This PR is now ready for merge, after ~24 hours, we will merge it if there's no negative feedback.

Thanks!

basil · 2022-04-05T16:00:14Z

I am afraid this may have caused the regression noted in JENKINS-68189.

res0nance · 2022-04-06T13:24:31Z

I'm guessing that this is because rebuildDependencyGraph is called here
https://github.com/Si-So/jenkins/blob/a83e40d3cceab702b8024710adcd29d4cac7f1f0/core/src/main/java/jenkins/model/Jenkins.java#L3434

Maybe if we call the async version with a get this will update the calculatingFutureDependencyGraph which at the start holds EMPTY

jglick · 2022-04-06T14:11:10Z

Whatever the case, standard procedure would be for this PR to be reverted promptly, and then a separate PR filed as time permits to reintroduce the change along with a fix for the regression and a new functional test covering the broken scenario.

Si-So · 2022-04-06T14:24:48Z

I second that. I thought I had enough time today to look at the issue, but didn't.

When I have time I will both contribute a new test and a fix.

basil · 2022-04-06T14:28:30Z

In the absence of a fix and a new test, I plan to revert this PR on Friday.

res0nance · 2022-04-06T14:47:43Z

Whatever the case, standard procedure would be for this PR to be reverted promptly, and then a separate PR filed as time permits to reintroduce the change along with a fix for the regression and a new functional test covering the broken scenario.

Agreed we should just revert this

Si-So · 2022-04-07T09:54:46Z

I was able to reproduce the bug on my machine and I am working now on both the test case and a fix. The merge should be reverted because the new bug is much more severe than the bug that I fixed.

Steps to reproduce the bug:

Start a Jenkins instance.
Create Job_A.
Create Job_B. Job_B is triggered by Job_A.
Restart Jenkins.
A build of Job_B is no longer triggered after Job_A has been built.

This bug does occur even if only one node is used.

Analysis:

res0nance already found the cause of the bug. After restarting Jenkins, rebuildDependencyGraphAsync() is never triggered and thus the calculatingFutureDependencyGraph remains EMPTY until the dependency graph has been re-computed asynchronously.

…ph has been updated (#6101)" This reverts commit 292382f.

…ph has been updated (jenkinsci#6101)" https://issues.jenkins.io/browse/JENKINS-68189 reports that downstream jobs are no longer executed in Jenkins 2.341. https://issues.jenkins.io/browse/JENKINS-67237 reports that Downstream jobs are not guaranteed to build when the job configs have been updated recently. Comments in jenkinsci#6101 recommend that this pull request be reverted while a fix is being developed. This change is the revert of the pull request. The revert was generated by pressing the "revert" button on the GitHub interface, then using the resulting commit in this pull request. This reverts commit 292382f.

Revert "[JENKINS-67237] `BuildTrigger` waits until the dependency graph has been updated (#6101)"

…ency graph has been updated (jenkinsci#6101)"" This reverts commit f3a32be.

Si-So added 3 commits December 23, 2021 11:33

[JENKINS-67237] Added a method in Jenkins to return the most recent d…

34e3663

…ependency graph

[JENKINS-67237] Created test cases for future dependency graph calcul…

8f5e1e6

…ation

[JENKINS-67237] Renamed variables

8d6d55e

oleg-nenashev requested review from a team and oleg-nenashev December 24, 2021 00:43

oleg-nenashev added the bug For changelog: Minor bug. Will be listed after features label Dec 24, 2021

[JENKINS-67237] Fixed format violations

fceb3cc

[JENKINS-67237] Fixed another format violation

e967f2d

timja changed the title ~~[Jenkins 67237] BuildTrigger waits until the dependency graph has been updated~~ [JENKINS-67237] BuildTrigger waits until the dependency graph has been updated Jan 6, 2022

[JENKINS-67237] Attempt to fix a broken test

525c438

[JENKINS-67237]: Dependency graph calculations are now cancelled duri…

ff0f97b

…ng shutdown to avoid exceptions during JenkinsRule.after

timja requested a review from a team February 24, 2022 08:45

res0nance reviewed Mar 20, 2022

View reviewed changes

Si-So added 2 commits March 23, 2022 09:35

[JENKINS-67237]: Removed unnecessary public method from Jenkins.java …

9af42ce

…and subsequently altered the corresponding unit-test

Si-So requested a review from res0nance March 24, 2022 08:37

timja requested a review from jglick March 27, 2022 08:09

jglick removed their request for review March 27, 2022 14:56

res0nance reviewed Mar 27, 2022

View reviewed changes

core/src/main/java/jenkins/model/Jenkins.java Outdated Show resolved Hide resolved

core/src/main/java/jenkins/model/Jenkins.java Outdated Show resolved Hide resolved

[JENKINS-67237]: scheduleCalculationOfFutureDependencyGraph does no l…

a83e40d

…onger lock during re-calculation of dependency graph.

Si-So requested a review from res0nance March 28, 2022 07:42

res0nance approved these changes Mar 28, 2022

View reviewed changes

timja approved these changes Mar 28, 2022

View reviewed changes

timja added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Mar 28, 2022

basil approved these changes Mar 29, 2022

View reviewed changes

basil merged commit 292382f into jenkinsci:master Mar 29, 2022

MarkEWaite added a commit that referenced this pull request Apr 7, 2022

Revert "[JENKINS-67237] BuildTrigger waits until the dependency gra…

64133be

…ph has been updated (#6101)" This reverts commit 292382f.

MarkEWaite mentioned this pull request Apr 7, 2022

Revert "BuildTrigger waits until the dependency graph has been updated (#6101)" #6447

Merged

11 tasks

MarkEWaite added a commit that referenced this pull request Apr 8, 2022

Merge pull request #6447 from MarkEWaite/revert-6101-JENKINS-67237

c46f968

Revert "[JENKINS-67237] `BuildTrigger` waits until the dependency graph has been updated (#6101)"

Si-So added a commit to Si-So/jenkins that referenced this pull request Apr 8, 2022

Revert "Revert "[JENKINS-67237] BuildTrigger waits until the depend…

ce4641c

…ency graph has been updated (jenkinsci#6101)"" This reverts commit f3a32be.

Si-So mentioned this pull request Apr 8, 2022

[JENKINS-67237] BuildTrigger waits until the dependency graph has been updated - 2nd attempt #6450

Merged

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[JENKINS-67237] BuildTrigger waits until the dependency graph has been updated #6101

[JENKINS-67237] BuildTrigger waits until the dependency graph has been updated #6101

Si-So commented Dec 23, 2021 •

edited by MarkEWaite

Loading

oleg-nenashev commented Dec 24, 2021

Si-So commented Jan 4, 2022

timja commented Jan 4, 2022

Si-So commented Jan 4, 2022

timja commented Jan 4, 2022

Si-So commented Jan 6, 2022

Si-So commented Jan 7, 2022

Si-So commented Feb 9, 2022

timja commented Feb 9, 2022

Si-So commented Feb 22, 2022

Si-So commented Feb 24, 2022

Si-So commented Mar 18, 2022

res0nance Mar 20, 2022

Si-So Mar 23, 2022

res0nance Mar 20, 2022

Si-So Mar 23, 2022

res0nance Mar 20, 2022

Si-So Mar 23, 2022

Si-So commented Mar 24, 2022

timja commented Mar 27, 2022

timja commented Mar 27, 2022

jglick commented Mar 27, 2022

res0nance left a comment

timja commented Mar 28, 2022

basil commented Apr 5, 2022

res0nance commented Apr 6, 2022

jglick commented Apr 6, 2022

Si-So commented Apr 6, 2022

basil commented Apr 6, 2022

res0nance commented Apr 6, 2022

Si-So commented Apr 7, 2022

[JENKINS-67237] BuildTrigger waits until the dependency graph has been updated #6101

[JENKINS-67237] BuildTrigger waits until the dependency graph has been updated #6101

Conversation

Si-So commented Dec 23, 2021 • edited by MarkEWaite Loading

Proposed changelog entries

Proposed upgrade guidelines

Submitter checklist

Desired reviewers

Maintainer checklist

oleg-nenashev commented Dec 24, 2021

Si-So commented Jan 4, 2022

timja commented Jan 4, 2022

Si-So commented Jan 4, 2022

timja commented Jan 4, 2022

Si-So commented Jan 6, 2022

Changelog entry:

Si-So commented Jan 7, 2022

Si-So commented Feb 9, 2022

timja commented Feb 9, 2022

Si-So commented Feb 22, 2022

Si-So commented Feb 24, 2022

Si-So commented Mar 18, 2022

res0nance Mar 20, 2022

Choose a reason for hiding this comment

Si-So Mar 23, 2022

Choose a reason for hiding this comment

res0nance Mar 20, 2022

Choose a reason for hiding this comment

Si-So Mar 23, 2022

Choose a reason for hiding this comment

res0nance Mar 20, 2022

Choose a reason for hiding this comment

Si-So Mar 23, 2022

Choose a reason for hiding this comment

Si-So commented Mar 24, 2022

timja commented Mar 27, 2022

timja commented Mar 27, 2022

jglick commented Mar 27, 2022

res0nance left a comment

Choose a reason for hiding this comment

timja commented Mar 28, 2022

basil commented Apr 5, 2022

res0nance commented Apr 6, 2022

jglick commented Apr 6, 2022

Si-So commented Apr 6, 2022

basil commented Apr 6, 2022

res0nance commented Apr 6, 2022

Si-So commented Apr 7, 2022

Steps to reproduce the bug:

Analysis:

Si-So commented Dec 23, 2021 •

edited by MarkEWaite

Loading