[MBUILDCACHE-67] Bugfix artifact restoration error not handled properly #92
Conversation
…t build + every restoration was a lazy restore, even if configured otherwise.
restored &= restoreProject(result, mojoExecutions, mojoExecutionRunner, cacheConfig);
} else {
}
if (!restored) {
Perhaps we need to execute all mojos in this case, including the Clean phase, to remove any "garbage" which could be left after an unsuccessful restore.
Good point. Restoring in the tmp + mv seems like a more robust approach.
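The tmp + mv idea can be sketched as follows. This is a hypothetical, self-contained illustration (not the plugin's actual restore code): the payload is written to a temporary file in the target's directory first, then moved into place, so a failed transfer never leaves a partial artifact behind.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpThenMoveSketch {
    /** Write the payload to a temp file first, then atomically move it into place. */
    static void restore(byte[] payload, Path target) throws IOException {
        // Temp file in the same directory, so an atomic rename is possible on most filesystems.
        Path tmp = Files.createTempFile(target.getParent(), "restore-", ".tmp");
        try {
            Files.write(tmp, payload);
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // cleans up only if the move did not happen
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cache-demo");
        restore(new byte[] {1, 2, 3}, dir.resolve("artifact.jar"));
        System.out.println(Files.size(dir.resolve("artifact.jar"))); // 3
    }
}
```

If the write or the move fails, the target directory is untouched; only the temp file is cleaned up.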
Good idea @maximilian-novikov-db.
@AlexanderAshitkin: do you mean we should change the way we restore artefacts from remote caches? I'm more inclined towards the "second clean phase" idea since it covers any restoration problem (in case "clean" was requested in the build). Further changes could be done in another PR.
That is discussable:
- The `clean` phase has been completed by this time, and repeating it sounds slightly off to me.
- Keeping partially restored artifacts is error-prone, and `clean` might not even be requested.
- Invoking an ad-hoc `clean` might wipe out other cached data and might be undesirable.

I see 2 options here: leave it as is (and rely on the user-requested clean), or implement mv logic to minimize the risk of corrupted cache data in the target dir.
> Keeping partially restored artifacts is error-prone, and `clean` might not even be requested.

All mojos except clean are executed again if it falls back to the "normal" build. I'd say keeping any partially restored data is harmful and may lead to unpredictable results.
I suggest removing the filter here (line 137 in 4d53a0a):

|| lifecyclePhasesHelper.isLaterPhaseThanClean(mojoExecution.getLifecyclePhase())) {

and not adding Clean if it's not requested by the user.
I agree that seeing a second clean phase seems a bit "off", but I could be at peace with it since it is an abnormal edge case (and we will write a beautiful log to explain it. :p)
Trying to summarize the exchange in order to check whether we have a consensus on the situation where the cache fails the restoration:
- The build was requested without a clean phase: we do not change the actual behaviour.
- The build was requested with a clean phase: an extra clean phase will be executed before the actual behaviour (= the whole build is executed).

We change the generic log "Cannot restore project artifacts, continuing with non cached build" to:
"Cannot restore project artifacts, continuing with non cached build (clean phase will be re-executed if requested)".
And in another PR, the remote download could be enhanced by doing it in two steps: downloading into a temp directory + moving it, reducing the risk of leftover files.
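The consensus above boils down to a small decision rule. A hypothetical sketch (names illustrative, not the plugin's API):

```java
public class FallbackDecisionSketch {
    /** What to run after a failed cache restoration, per the consensus above. */
    static String onRestoreFailure(boolean cleanRequested) {
        if (cleanRequested) {
            // Re-run clean first to wipe any partially restored files.
            return "clean + full build";
        }
        // Without a requested clean, behaviour is unchanged: just build normally.
        return "full build";
    }

    public static void main(String[] args) {
        System.out.println(onRestoreFailure(true));  // clean + full build
        System.out.println(onRestoreFailure(false)); // full build
    }
}
```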
Some clarifications about the current implementation:

- The logic of rerunning on failures is already within `BuildCacheMojosExecutionStrategy#restoreProject`:

if (!restored) {
    // cleanup partial state
    project.getArtifact().setFile(null);
    project.getArtifact().setResolved(false);
    mojoListener.remove(project);
    // build as usual
    for (MojoExecution mojoExecution : cachedSegment) { // <<< rebuilding everything
        mojoExecutionRunner.run(mojoExecution);
    }
}

- The `!restored` condition should be read as "at least part of the project was rebuilt, not restored." It doesn't automatically mean an error; it means something was rebuilt (at least in the design). A 100% successful completion of `restoreProject` should still return `false` if it took a non-cached path and the build state differs from the cached one.

So, in all, it feels like we might need fixes within `restoreProject`:
- implement a proper clean phase;
- add a verification step to the "forced" steps, to force a cache update on detected inconsistencies;
- make sure that in case of additional executions ("post-cached" or similar), we return `false` to update the cache.
@AlexanderAshitkin: Sorry, I did not have time to investigate this during the past week; I hope I'll have a bit more time for this one.
Alright, I finally had some time to work on what you pointed out @AlexanderAshitkin: a regular build after a failed attempt to use the cache was handled differently depending on the situation.
I will add comments on some lines of this commit to help you review it effectively. But here are some global points:
- I moved the check of cached mojo consistency before artifact restoration. There are two benefits. First, this step is faster than restoration over the network, so if we can't use the cache because of mojo parameters, better to know it before downloading artifacts. Secondly, there is nothing more to roll back, since setting the project's artifacts is done at the very end of the restoration.
- I kept the regular build execution outside the "restoreProject" function.
- I execute an extra clean phase only if two conditions are met: the clean phase was requested in the build + there are potentially files restored in the project directory (this does not happen if we don't restore generated resources or "outputArtifacts", or if an error occurs before).

I'm removing the draft status on this PR since I no longer want to link it with my other PR. The duplicated 3-line function used in the IT test can be removed in a very small 3rd PR.
if (!cacheConfig.isLazyRestore()) {
    downloadTask.run();
    try {
        downloadTask.get();
Hi Kevin. Please elaborate: what are the benefits of implementing this? The drawbacks are apparent: cached artifacts not used in the actual build might now fail the cache (exotic classifiers, etc.). If that starts happening, it is a more serious issue than a delayed error. The change has its own merits, but the default behavior should be the most reasonable one considering all the cases.
Hello Alexander, nice to meet you. :)
As I understand the "lazy restore" functionality, there are two approaches, each with its pros and cons.
- Non-lazy restore (the default): immediate restoration of artefacts and other build outputs. Build stability is the main concern: if the restoration fails, the build is executed normally. (I'm puzzled by the part where you say it can fail the build; for me it's the exact opposite intent.) Con: we might restore stuff not used further in another module's build.
- Lazy restore: restoration of artefacts is done only when they are used. It can save some bandwidth and speed up the build when used with a remote cache. (The pros with a local cache are not so clear to me.) Con: if the cache is corrupted/not accessible and an artefact must be used as a dependency in another module, the build is doomed and will fail.

Actually, the non-lazy restore does not really work as expected: the restore is started (the download is launched immediately when using a remote cache), but we don't check the result straight away. It might be far too late when we discover the artefact was not restored properly.
Am I missing something in the picture?
Hi. The main question is whether or not to fail the cache when unused (in reactor) artifacts cannot be downloaded. I'm trying to understand what the benefits of failing the cache are. Does it solve any practical issues? Because in terms of the cache hit rate, this change brings a regression.
@kbuntrock, could you please move this change to a separate PR?
@maximilian-novikov-db: I would rather not, since I consider this part of the bug I am willing to fix.
Here are some other arguments. The documentation states the following on the "performance" page:

> Use a lazy restore
> By default, the cache tries to restore all artifacts for a project preemptively. Lazy restore could save significant time by avoiding requesting and downloading unnecessary artifacts from the cache. It is beneficial when small changes are a dominating build pattern. Use command line flag: `-Dmaven.build.cache.lazyRestore=true`. In cache corruption situations, the lazy cache cannot support fallback to normal execution. It will fail instead. To heal the corrupted cache, manually remove corrupted cache entries or force cache rewrite.

Implied by this piece of documentation: contrary to the lazy restore, the preemptive default restore option supports fallback. This is actually not the case for artifacts, the most important build outputs. Meanwhile, restoration of generated sources and other attached outputs works as expected: any cache corruption triggers a regular build.
I have a strong feeling we are simply looking at an implementation oversight, which I'm fixing here: calling "run" on a FutureTask performs the restoration but does not throw any exception upon failure. The failure has to be checked via the "get" method.

@AlexanderAshitkin, to complete this question a bit more:

> Does it solve any practical issues?

Yes. If a corrupted/non-downloadable artefact is used in a dependent module while the "lazy restore" functionality is not enabled, the build will fail.
Worst case before this fix: the build fails.
Worst case after this fix: we do a regular build.
It seems pretty important that one should be able to choose stability over rapidity.
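The FutureTask behavior at the heart of this fix can be demonstrated in isolation. A minimal, self-contained illustration (not the plugin's code): `run()` captures the task's exception silently, and only `get()` surfaces it, wrapped in an `ExecutionException`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class FutureTaskPitfall {
    public static void main(String[] args) throws InterruptedException {
        FutureTask<Void> downloadTask = new FutureTask<>(() -> {
            throw new IllegalStateException("simulated download failure");
        });

        // run() executes the task inline; the exception is captured, not thrown:
        downloadTask.run();
        System.out.println("run() returned normally");

        // The failure only surfaces through get(), wrapped in an ExecutionException:
        try {
            downloadTask.get();
        } catch (ExecutionException e) {
            System.out.println("detected: " + e.getCause().getMessage());
        }
    }
}
```

So code that calls `run()` without a matching `get()` silently swallows a failed restoration, which matches the oversight described above.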
I agree that keeping the fallback is important. Thanks for the clarifications! We can add such an optimization in the future by tracking artifact usage downstream.
        .adjustArchiveArtifactVersion(project, originalVersion, artifactFile)
        .toFile();
});
if (!cacheConfig.isLazyRestore()) {
Looks like a more relevant name for the original intent is `lazyDownload`.
"LazyRestore" seems fine to me. I would instead replace "download" by "restore" since there is no downloading involved when working with a local cache. But it's not something I will do in this PR. :P
…oration error or cached mojo inconsistency
 * CacheState
 */
public enum CacheState {
    NOT_INITIALIZED,
This enum value was not used.
    return projectExecutions.get(project);
}

public void remove(MavenProject project) {
If I understood correctly, this was used to "reset" the project's mojo execution spying after a "rollback" from using the cache to a regular build execution.
I no longer need this function, so I deleted it. But I could use your careful review on it @maximilian-novikov-db (I see your name on the oldest commit of the class :p).
AlexanderAshitkin left a comment
It looks good to me. I added a few comments about potential optimizations, but those enhancements are out of scope.
LOGGER.debug("Execute mojos post cache segment");
List<MojoExecution> postCachedSegment =
        lifecyclePhasesHelper.getPostCachedSegment(project, mojoExecutions, build);
for (MojoExecution mojoExecution : postCachedSegment) {
@kbuntrock, it is out of scope, but please notice two edge cases here:
- A post-cached segment is possible if the build was initially cached at, say, the `package` phase and the current build runs a later phase, say `install`. In that case, the best behavior is to update the cached build with the more complete version, which won't happen in the `SUCCESS` case; maybe we need a separate status.
- The same goes for forced executions: even if the execution is forced, the mojo configuration could be inconsistent with the cached one. If it is not consistent, it could be desirable to update the cache. We should update the cache on discrepancies unless the configs prohibit it.
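These two edge cases suggest a small cache-update decision rule. A hypothetical sketch (names illustrative, not the plugin's API):

```java
public class CacheUpdateSketch {
    /** Should the cached build be replaced after this build completes? */
    static boolean shouldUpdateCache(boolean postCachedSegmentRan,
                                     boolean forcedConfigMatchesCache,
                                     boolean updatesAllowedByConfig) {
        if (!updatesAllowedByConfig) {
            return false; // configs prohibit rewriting the cache
        }
        // Update if a later phase ran (e.g. install after a cached package build),
        // or if a forced execution diverged from the cached configuration.
        return postCachedSegmentRan || !forcedConfigMatchesCache;
    }

    public static void main(String[] args) {
        System.out.println(shouldUpdateCache(true, true, true));  // true
        System.out.println(shouldUpdateCache(false, true, true)); // false
    }
}
```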
Hello there, a little up on this old-timer PR. If I'm correct, I fixed every mandatory request. Any other thoughts?

I think I found a possible regression of this issue. Please take a look at https://issues.apache.org/jira/browse/MNG-8546

Resolve #330
Description of this pull request
https://issues.apache.org/jira/projects/MBUILDCACHE/issues/MBUILDCACHE-67
Draft until PR-MBUILDCACHE-64 is merged since I want to factorize a function in the IT test (see the related todo).
MR checklist

Following this checklist helps us incorporate your contribution quickly and easily:

- Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
- Format the pull request title like `[MBUILDCACHE-XXX] - Fixes bug in ApproximateQuantiles`, where you replace `MBUILDCACHE-XXX` with the appropriate JIRA issue. Best practice is to use the JIRA issue title in the pull request title and in the first line of the commit message.
- Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
- If your pull request is about ~20 lines of code you don't need to sign an Individual Contributor License Agreement; if you are unsure, please ask on the developers list.

To make clear that you license your contribution under the Apache License Version 2.0, January 2004, you have to acknowledge this by using the following check-box:

- [x] I hereby declare this contribution to be licensed under the Apache License Version 2.0, January 2004

In any other case, please file an Apache Individual Contributor License Agreement.