Conversation

@kbuntrock (Contributor)

Description of this pull request

https://issues.apache.org/jira/projects/MBUILDCACHE/issues/MBUILDCACHE-67

  • Fix: an artifact restoration error was not triggering the regular project build as expected
  • Fix: the non-lazy artifact restoration was in fact partially lazy
  • Added an IT test covering this

Draft until PR-MBUILDCACHE-64 is merged, since I want to factorize a function in the IT test (see the related TODO).

MR checklist

Following this checklist to help us incorporate your
contribution quickly and easily:

  • Make sure there is a MBUILDCACHE JIRA issue filed
    for the change (usually before you start working on it). Trivial changes like typos do not
    require a JIRA issue. Your pull request should address just this issue, without
    pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [MBUILDCACHE-XXX] - Fixes bug in ApproximateQuantiles,
    where you replace MBUILDCACHE-XXX with the appropriate JIRA issue. Best practice
    is to use the JIRA issue title in the pull request title and in the first line of the
    commit message.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will
    be performed on your pull request automatically.
  • You have run the Core IT successfully.

If your pull request is about ~20 lines of code, you don't need to sign an
Individual Contributor License Agreement. If you are unsure,
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@kbuntrock force-pushed the MBUILDCACHE-67-artefact-restoration-errors branch from 3f48f0d to 943e47d on August 3, 2023 21:30
@kbuntrock force-pushed the MBUILDCACHE-67-artefact-restoration-errors branch from 943e47d to 4d53a0a on August 3, 2023 21:36
@olamy added the "bug" label on Aug 4, 2023
restored &= restoreProject(result, mojoExecutions, mojoExecutionRunner, cacheConfig);
} else {
}
if (!restored) {
@maximilian-novikov-db (Contributor) commented Aug 18, 2023

Perhaps we need to execute all mojos in this case, including those of the clean phase, to remove any "garbage" which could be left after an unsuccessful restore.

A contributor commented:

Good point. Restoring in the tmp + mv seems like a more robust approach.

@kbuntrock (Contributor, Author) commented Aug 19, 2023

Good idea @maximilian-novikov-db.

@AlexanderAshitkin: do you mean we should change the way we restore artifacts from remote caches? I'm more inclined toward the "second clean phase" idea, since it covers any restoration problem (in case "clean" was requested in the build). Further changes could be done in another PR, I guess.

@AlexanderAshitkin (Contributor) commented Aug 21, 2023

That is debatable:

  • The clean phase has been completed by this time, and repeating it sounds slightly off to me.
  • Keeping partially restored artifacts is error-prone, and clean might not even be requested.
  • Invoking an ad-hoc clean might wipe out other cached data and might be undesirable.

I see two options here: leave it as is (and rely on the user-requested clean), or implement the mv logic to minimize the risk of corrupted cache data in the target dir.

@maximilian-novikov-db (Contributor) commented Aug 21, 2023

  • Keeping partially restored artifacts is error-prone, and clean might not even be requested.

All mojos except clean are executed again if it falls back to the "normal" build. I'd say keeping any partially restored data is harmful and may lead to unpredictable results.

A contributor commented:

I suggest removing the filter here:

|| lifecyclePhasesHelper.isLaterPhaseThanClean(mojoExecution.getLifecyclePhase())) {

and not adding clean if it's not requested by the user.

@kbuntrock (Author) replied:

I agree that seeing a second clean phase seems a bit "off", but I can be at peace with it since it is an abnormal edge case (and we will write a beautiful log message to explain it. :p)

Trying to summarize the exchange, to check whether we have a consensus on the situation where the cache fails the restoration:

  • The build was requested without the clean phase: we do not change the current behaviour.
  • The build was requested with a clean phase: an extra clean phase will be executed before the current behaviour (= the whole build is executed).

We change the generic log message "Cannot restore project artifacts, continuing with non cached build" to:
"Cannot restore project artifacts, continuing with non cached build (clean phase will be re-executed if requested)".

And in another PR, the remote download could be enhanced by doing it in two steps: downloading into a temp directory, then moving it, reducing the risk of leftover files.
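The fallback behaviour summarized above can be sketched as a small standalone program. All names here are illustrative stand-ins I made up for this sketch, not the actual extension code:

```java
import java.util.List;

public class RestoreFallbackSketch {
    // Hypothetical stand-ins for the real Maven machinery; illustrative only.
    static final StringBuilder LOG = new StringBuilder();

    static boolean restoreProject() {
        return false; // simulate a failed cache restoration
    }

    static void runPhase(String phase) {
        LOG.append(phase).append(' ');
    }

    public static void main(String[] args) {
        boolean cleanRequested = true; // e.g. the user ran "mvn clean install"
        if (!restoreProject()) {
            // Re-execute clean only when the user asked for it, so any
            // partially restored files are wiped before the fallback build.
            if (cleanRequested) {
                runPhase("clean");
            }
            // Then run the whole regular build.
            for (String phase : List.of("compile", "test", "package")) {
                runPhase(phase);
            }
        }
        System.out.println(LOG.toString().trim());
    }
}
```

The key design point is that clean is re-executed only when the user requested it, so a plain `mvn install` never gets a surprise implicit clean.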

@AlexanderAshitkin (Contributor) commented Aug 23, 2023

Some clarifications about the current implementation:

  • Logic of rerunning on failures is already within the BuildCacheMojosExecutionStrategy#restoreProject:
if (!restored) {
    // cleanup partial state
    project.getArtifact().setFile(null);
    project.getArtifact().setResolved(false);
    mojoListener.remove(project);
    // build as usual
    for (MojoExecution mojoExecution : cachedSegment) { <<< rebuilding everything
        mojoExecutionRunner.run(mojoExecution);
    }
}
  • The !restored condition should be read as "at least part of the project was rebuilt, not restored."
  • It doesn't automatically mean an error. It means something was rebuilt (at least by design). A 100% successful completion of restoreProject should still return false if it took a non-cached path and the build state differs from the cached one.

So, all in all, it feels like we might need fixes within restoreProject:

  1. Implement a proper clean phase.
  2. Add a verification step to the "forced" steps, to force a cache update on detected inconsistencies.
  3. Make sure that in case of additional executions ("post-cached" or similar), we return false to update the cache.

@kbuntrock (Author) replied:

@AlexanderAshitkin: Sorry, I did not have time to investigate this during the past week; I hope I'll have a bit more time for this one.

@kbuntrock (Contributor, Author) commented Aug 30, 2023

Alright, I finally had some time to work on what you pointed out, @AlexanderAshitkin: the regular build after a failed attempt to use the cache was handled differently depending on the situation.
I will add comments on some lines of this commit to make it easier to review. But here are some global points:

  • I moved the check of cached mojo consistency before artifact restoration. There are two benefits: this step is faster than restoration over the network, so if we can't use the cache because of mojo parameters, it's better to know before downloading artifacts. And secondly, there is nothing left to roll back, since setting the project's artifacts is done at the very end of the restoration.
  • I kept the regular build execution outside the restoreProject function.
  • I execute an extra clean phase only if two conditions are met: the clean phase was requested in the build, and files have potentially been restored into the project directory (which does not happen if we don't restore generated resources or "outputArtifacts", or if an error occurs before that point).

I'm removing the draft status on this PR since I no longer want to link it with my other PR. The duplicated 3-line function used in the IT test can be removed in a very small third PR.

if (!cacheConfig.isLazyRestore()) {
downloadTask.run();
try {
downloadTask.get();
@AlexanderAshitkin (Contributor) commented Aug 18, 2023

Hi Kevin. Please elaborate on the benefits of implementing this, because the drawbacks are apparent: cached artifacts not used in the actual build might now fail the cache (exotic classifiers, etc.). If that starts happening, it is a more serious issue than a delayed error. The change has its own merits, but the default behavior should be the most reasonable one considering all the cases.

@kbuntrock (Contributor, Author) commented Aug 19, 2023

Hello Alexander, nice to meet you. :)

As I understand the "lazy restore" functionality, there are two approaches, each with its pros and cons.

Non-lazy restore (the default): immediate restoration of artifacts and other build outputs.
Build stability is the main concern. If the restoration fails, the build is executed normally. (I'm puzzled by the part where you say it can fail the build; to me, the intent is the exact opposite.) Con: we might restore artifacts not used further in another module's build.

Lazy restore: restoration of artifacts is done only when they are used. It can save some bandwidth and speed up the build when used with a remote cache. (The pros with a local cache are not so clear to me.) Con: if the cache is corrupted or not accessible and an artifact must be used as a dependency in another module, the build is doomed and will fail.

Actually, the non-lazy restore does not really work as expected: the restore is started (the download is launched immediately when using a remote cache), but we don't check the result straight away. It might be far too late when we discover that the artifact was not restored properly.

Am I missing something in the picture?
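The trade-off between the two strategies can be shown in a standalone sketch, under my own assumptions (this is not the extension's API; `fetchFromCache` is a made-up stand-in for a cache download):

```java
import java.util.function.Supplier;

public class LazyVsEagerRestore {
    // Hypothetical stand-in for downloading an artifact from a cache.
    static String fetchFromCache(boolean corrupted) {
        if (corrupted) throw new IllegalStateException("cache entry corrupted");
        return "artifact.jar";
    }

    public static void main(String[] args) {
        // Eager restore against a corrupted cache: the failure is known up
        // front, so the build can still fall back to a normal execution.
        boolean eagerFailed = false;
        try {
            fetchFromCache(true);
        } catch (IllegalStateException e) {
            eagerFailed = true;
        }
        System.out.println("eager failure detected up front: " + eagerFailed);

        // Lazy restore: nothing happens until the artifact is actually
        // needed, possibly much later, in a downstream module.
        Supplier<String> lazy = () -> fetchFromCache(true);
        boolean lazyFailedLate = false;
        try {
            lazy.get(); // first use, deep into the build
        } catch (IllegalStateException e) {
            lazyFailedLate = true;
        }
        System.out.println("lazy failure detected at first use: " + lazyFailedLate);
    }
}
```

Both strategies hit the same failure; the difference is only when it surfaces, and whether a fallback build is still possible at that point.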

@AlexanderAshitkin (Contributor) commented Aug 21, 2023

Hi. The main question is whether or not to fail the cache when unused (in the reactor) artifacts cannot be downloaded. I'm trying to understand what the benefits of failing the cache are. Does it solve any practical issues? Because in terms of the cache hit rate, this change brings a regression.

@maximilian-novikov-db (Contributor) commented:

@kbuntrock, could you please move this change to a separate PR?

@kbuntrock (Contributor, Author) commented Aug 22, 2023

@maximilian-novikov-db: I would rather not, since I consider this part of the bug I am fixing.

Here are some other arguments.

The documentation states the following on the "Performance" page:

Use a lazy restore

By default, the cache tries to restore all artifacts for a project preemptively. Lazy restore could give significant time by avoiding requesting and downloading unnecessary artifacts from the cache. It is beneficial when small changes are a dominating build pattern. Use command line flag:

-Dmaven.build.cache.lazyRestore=true

In cache corruption situations, the lazy cache cannot support fallback to normal execution. It will fail instead. To heal the corrupted cache, manually remove corrupted cache entries or force cache rewrite.

This piece of documentation implies that, contrary to the lazy restore, the pre-emptive default restore supports fallback. This is actually not the case for artifacts, the most important build outputs.
Meanwhile, restoration of generated sources and other attached outputs works as expected: any cache corruption triggers a regular build.
I have a strong feeling we are simply in the presence of an implementation oversight, which I'm fixing here: calling run on a FutureTask performs the restoration but does not throw any exception upon failure. The failure has to be checked via the get method.

@AlexanderAshitkin, to expand a bit more on this question:

Does it solve any practical issues?

Yes. Currently, if a corrupted or non-downloadable artifact is used in a dependent module while the "lazy restore" functionality is not in use, the build will fail.
Worst case before this fix: the build fails.
Worst case after this fix: we do a regular build.

It seems pretty important that one should be able to choose stability over speed.
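The FutureTask behaviour at the heart of this fix is easy to verify in isolation. This standalone snippet (not the extension's code) shows run() swallowing the failure that only get() surfaces:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {
    public static void main(String[] args) {
        // A task that fails, standing in for a failed artifact download.
        FutureTask<String> downloadTask = new FutureTask<>(() -> {
            throw new IllegalStateException("download failed");
        });

        // run() executes the task; the exception is captured internally,
        // so from the caller's perspective nothing appears to go wrong.
        downloadTask.run();

        // Only get() surfaces the failure, wrapped in an ExecutionException.
        boolean failureDetected = false;
        try {
            downloadTask.get();
        } catch (ExecutionException e) {
            failureDetected = true; // cause is the IllegalStateException above
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("failureDetected=" + failureDetected);
    }
}
```

This is exactly why calling run() alone on the download task cannot trigger the fallback: the result must be checked via get() to notice the failure at all.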

@AlexanderAshitkin (Contributor) commented Aug 22, 2023

I agree that keeping the fallback is important. Thanks for the clarifications! We can add such optimization in the future by tracking artifact usage downstream.

.adjustArchiveArtifactVersion(project, originalVersion, artifactFile)
.toFile();
});
if (!cacheConfig.isLazyRestore()) {
A contributor commented:

Looks like a more relevant name for the original intent is lazyDownload.

@kbuntrock (Author) replied:

"LazyRestore" seems fine to me. I would rather replace "download" with "restore", since there is no downloading involved when working with a local cache. But it's not something I will do in this PR. :P

@kbuntrock marked this pull request as ready for review on August 30, 2023 19:39
* CacheState
*/
public enum CacheState {
NOT_INITIALIZED,
@kbuntrock (Author) commented:

This enum value was not used.

return projectExecutions.get(project);
}

public void remove(MavenProject project) {
@kbuntrock (Author) commented:

If I understood correctly, this was used to "reset" the spying on the project's mojo executions after a "rollback" from using the cache to a regular build execution.
I no longer need this function, so I deleted it. But I could use your careful review of it, @maximilian-novikov-db (I see your name on the oldest commit of the class :p).

@AlexanderAshitkin (Contributor) left a review comment:

It looks good to me. I added a few comments about potential optimizations, but those enhancements are out of scope.

LOGGER.debug("Execute mojos post cache segment");
List<MojoExecution> postCachedSegment =
lifecyclePhasesHelper.getPostCachedSegment(project, mojoExecutions, build);
for (MojoExecution mojoExecution : postCachedSegment) {
@AlexanderAshitkin (Contributor) commented Aug 30, 2023

@kbuntrock, it is out of scope, but please note two edge cases here:

  • A post-cached segment is possible if the build was initially cached at, say, the package phase and the current build runs a later phase, say install. In that case, the best behavior is to update the cached build with the more complete version, which won't happen in the SUCCESS case; maybe we need a separate status.
  • The same goes for forced executions: even if the execution is forced, the mojo configuration may not be consistent with the cached one. If it is not consistent, it could be desirable to update the cache. We should skip updating the cache on discrepancies only if the configuration prohibits it.

@kbuntrock (Author) commented:
Hello there,

A little bump for this old-timer PR. If I'm correct, I have fixed every mandatory request.

Any other thoughts?

@olamy olamy changed the title [MBUILDCACHE-67] Bugfix artefact restoration error not handled properly [MBUILDCACHE-67] Bugfix artifact restoration error not handled properly Oct 12, 2023
@olamy olamy merged commit e85610d into apache:master Oct 12, 2023
@cowwoc (Contributor) commented Feb 1, 2025

I think I found a possible regression of this issue. Please take a look at https://issues.apache.org/jira/browse/MNG-8546

@jira-importer commented:

Resolve #330

(1 similar comment)
