Conversation

@kbuntrock (Contributor)

Description of this pull request

https://issues.apache.org/jira/projects/MBUILDCACHE/issues/MBUILDCACHE-67

  • Fix: an artifact restoration error was not triggering the regular project build as expected
  • Fix: the non-lazy artifact restoration was in fact partially lazy
  • Added an IT test covering this

Draft until PR-MBUILDCACHE-64 is merged, since I want to factorize a function in the IT test (see the related TODO).

MR checklist

Following this checklist to help us incorporate your
contribution quickly and easily:

  • Make sure there is a MBUILDCACHE JIRA issue filed
    for the change (usually before you start working on it). Trivial changes like typos do not
    require a JIRA issue. Your pull request should address just this issue, without
    pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [MBUILDCACHE-XXX] - Fixes bug in ApproximateQuantiles,
    where you replace MBUILDCACHE-XXX with the appropriate JIRA issue. Best practice
    is to use the JIRA issue title in the pull request title and in the first line of the
    commit message.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will
    be performed on your pull request automatically.
  • You have run the Core IT successfully.

If your pull request is about ~20 lines of code, you don't need to sign an
Individual Contributor License Agreement. If you are unsure,
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@kbuntrock force-pushed the MBUILDCACHE-67-artefact-restoration-errors branch from 3f48f0d to 943e47d on August 3, 2023 21:30
@kbuntrock force-pushed the MBUILDCACHE-67-artefact-restoration-errors branch from 943e47d to 4d53a0a on August 3, 2023 21:36
@olamy added the "bug" label on Aug 4, 2023
restored &= restoreProject(result, mojoExecutions, mojoExecutionRunner, cacheConfig);
} else {
}
if (!restored) {
@maximilian-novikov-db (Contributor) commented Aug 18, 2023

Perhaps we need to execute all mojos in this case, including those of the clean phase, to remove any "garbage" which could be left after an unsuccessful restore.

A contributor commented:

Good point. Restoring in the tmp + mv seems like a more robust approach.

@kbuntrock (Contributor, Author) commented Aug 19, 2023

Good idea @maximilian-novikov-db.

@AlexanderAshitkin: do you mean we should change the way we restore artifacts from remote caches? I'm more inclined toward the "second clean phase" idea, since it covers any restoration problem (in case "clean" was requested in the build). Further changes could be done in another PR, I guess.

@AlexanderAshitkin (Contributor) commented Aug 21, 2023

That is debatable:

  • The clean phase has been completed by this time, and repeating it sounds slightly off to me.
  • Keeping partially restored artifacts is error-prone, and clean might not even be requested.
  • Invoking an ad-hoc clean might wipe out other cached data and might be undesirable.

I see two options here: leave it as is (and rely on the user-requested clean), or implement the mv logic to minimize the risk of corrupted cache data in the target dir.

@maximilian-novikov-db (Contributor) commented Aug 21, 2023

  • Keeping partially restored artifacts is error-prone, and clean might not even be requested.

All mojos except clean are executed again if it falls back to the "normal" build. I'd say keeping any partially restored data is harmful and may lead to unpredictable results.

A contributor commented:

I suggest removing the filter here:

|| lifecyclePhasesHelper.isLaterPhaseThanClean(mojoExecution.getLifecyclePhase())) {

and not adding clean if it's not requested by the user.

@kbuntrock (Author) replied:

I agree that seeing a second clean phase seems a bit "off", but I can be at peace with it since it is an abnormal edge case (and we will write a beautiful log message to explain it. :p)

Trying to summarize the exchange, to check whether we have a consensus on the situation where the cache fails the restoration:

  • The build was requested without the clean phase: we do not change the current behaviour.
  • The build was requested with a clean phase: an extra clean phase will be executed before the current behaviour (= the whole build is executed).

We change the generic log message "Cannot restore project artifacts, continuing with non cached build" to:
"Cannot restore project artifacts, continuing with non cached build (clean phase will be re-executed if requested)".

And in another PR, the remote download could be enhanced by doing it in two steps: downloading into a temp directory, then moving it, reducing the risk of leftover files.
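The fallback behaviour summarized above can be sketched as a small standalone program. All names here are illustrative stand-ins I made up for this sketch, not the actual extension code:

```java
import java.util.List;

public class RestoreFallbackSketch {
    // Hypothetical stand-ins for the real Maven machinery; illustrative only.
    static final StringBuilder LOG = new StringBuilder();

    static boolean restoreProject() {
        return false; // simulate a failed cache restoration
    }

    static void runPhase(String phase) {
        LOG.append(phase).append(' ');
    }

    public static void main(String[] args) {
        boolean cleanRequested = true; // e.g. the user ran "mvn clean install"
        if (!restoreProject()) {
            // Re-execute clean only when the user asked for it, so any
            // partially restored files are wiped before the fallback build.
            if (cleanRequested) {
                runPhase("clean");
            }
            // Then run the whole regular build.
            for (String phase : List.of("compile", "test", "package")) {
                runPhase(phase);
            }
        }
        System.out.println(LOG.toString().trim());
    }
}
```

The key design point is that clean is re-executed only when the user requested it, so a plain `mvn install` never gets a surprise implicit clean.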

@AlexanderAshitkin (Contributor) commented Aug 23, 2023

Some clarifications about the current implementation:

  • Logic of rerunning on failures is already within the BuildCacheMojosExecutionStrategy#restoreProject:
if (!restored) {
    // cleanup partial state
    project.getArtifact().setFile(null);
    project.getArtifact().setResolved(false);
    mojoListener.remove(project);
    // build as usual
    for (MojoExecution mojoExecution : cachedSegment) { <<< rebuilding everything
        mojoExecutionRunner.run(mojoExecution);
    }
}
  • The !restored condition should be read as "at least part of the project was rebuilt, not restored."
  • It doesn't automatically mean an error. It means something was rebuilt (at least by design). A 100% successful completion of restoreProject should still return false if it took a non-cached path and the build state differs from the cached one.

So, all in all, it feels like we might need fixes within restoreProject:

  1. Implement a proper clean phase.
  2. Add a verification step to the "forced" steps, to force a cache update on detected inconsistencies.
  3. Make sure that in case of additional executions ("post-cached" or similar), we return false to update the cache.

@kbuntrock (Author) replied:

@AlexanderAshitkin: Sorry, I did not have time to investigate this during the past week; I hope I'll have a bit more time for this one.

@kbuntrock (Contributor, Author) commented Aug 30, 2023

Alright, I finally had some time to work on what you pointed out, @AlexanderAshitkin: the regular build after a failed attempt to use the cache was handled differently depending on the situation.
I will add comments on some lines of this commit to make it easier to review. But here are some global points:

  • I moved the check of cached mojo consistency before artifact restoration. There are two benefits: this step is faster than restoration over the network, so if we can't use the cache because of mojo parameters, it's better to know before downloading artifacts. And secondly, there is nothing left to roll back, since setting the project's artifacts is done at the very end of the restoration.
  • I kept the regular build execution outside the restoreProject function.
  • I execute an extra clean phase only if two conditions are met: the clean phase was requested in the build, and files have potentially been restored into the project directory (which does not happen if we don't restore generated resources or "outputArtifacts", or if an error occurs before that point).

I'm removing the draft status on this PR since I no longer want to link it with my other PR. The duplicated 3-line function used in the IT test can be removed in a very small third PR.

if (!cacheConfig.isLazyRestore()) {
downloadTask.run();
try {
downloadTask.get();
@AlexanderAshitkin (Contributor) commented Aug 18, 2023

Hi Kevin. Please elaborate on the benefits of implementing this, because the drawbacks are apparent: cached artifacts not used in the actual build might now fail the cache (exotic classifiers, etc.). If that starts happening, it is a more serious issue than a delayed error. The change has its own merits, but the default behavior should be the most reasonable one considering all the cases.

@kbuntrock (Contributor, Author) commented Aug 19, 2023

Hello Alexander, nice to meet you. :)

As I understand the "lazy restore" functionality, there are two approaches, each with its pros and cons.

Non-lazy restore (the default): immediate restoration of artifacts and other build outputs.
Build stability is the main concern. If the restoration fails, the build is executed normally. (I'm puzzled by the part where you say it can fail the build; to me, the intent is the exact opposite.) Con: we might restore artifacts not used further in another module's build.

Lazy restore: restoration of artifacts is done only when they are used. It can save some bandwidth and speed up the build when used with a remote cache. (The pros with a local cache are not so clear to me.) Con: if the cache is corrupted or not accessible and an artifact must be used as a dependency in another module, the build is doomed and will fail.

Actually, the non-lazy restore does not really work as expected: the restore is started (the download is launched immediately when using a remote cache), but we don't check the result straight away. It might be far too late when we discover that the artifact was not restored properly.

Am I missing something in the picture?
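The trade-off between the two strategies can be shown in a standalone sketch, under my own assumptions (this is not the extension's API; `fetchFromCache` is a made-up stand-in for a cache download):

```java
import java.util.function.Supplier;

public class LazyVsEagerRestore {
    // Hypothetical stand-in for downloading an artifact from a cache.
    static String fetchFromCache(boolean corrupted) {
        if (corrupted) throw new IllegalStateException("cache entry corrupted");
        return "artifact.jar";
    }

    public static void main(String[] args) {
        // Eager restore against a corrupted cache: the failure is known up
        // front, so the build can still fall back to a normal execution.
        boolean eagerFailed = false;
        try {
            fetchFromCache(true);
        } catch (IllegalStateException e) {
            eagerFailed = true;
        }
        System.out.println("eager failure detected up front: " + eagerFailed);

        // Lazy restore: nothing happens until the artifact is actually
        // needed, possibly much later, in a downstream module.
        Supplier<String> lazy = () -> fetchFromCache(true);
        boolean lazyFailedLate = false;
        try {
            lazy.get(); // first use, deep into the build
        } catch (IllegalStateException e) {
            lazyFailedLate = true;
        }
        System.out.println("lazy failure detected at first use: " + lazyFailedLate);
    }
}
```

Both strategies hit the same failure; the difference is only when it surfaces, and whether a fallback build is still possible at that point.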

@AlexanderAshitkin (Contributor) commented Aug 21, 2023

Hi. The main question is whether or not to fail the cache when unused (in the reactor) artifacts cannot be downloaded. I'm trying to understand what the benefits of failing the cache are. Does it solve any practical issues? Because in terms of the cache hit rate, this change brings a regression.

@maximilian-novikov-db (Contributor) commented:

@kbuntrock, could you please move this change to a separate PR?

@kbuntrock (Contributor, Author) commented Aug 22, 2023

@maximilian-novikov-db: I would rather not, since I consider this part of the bug I am fixing.

Here are some other arguments.

The documentation states the following on the "Performance" page:

Use a lazy restore

By default, the cache tries to restore all artifacts for a project preemptively. Lazy restore could give significant time by avoiding requesting and downloading unnecessary artifacts from the cache. It is beneficial when small changes are a dominating build pattern. Use command line flag:

-Dmaven.build.cache.lazyRestore=true

In cache corruption situations, the lazy cache cannot support fallback to normal execution. It will fail instead. To heal the corrupted cache, manually remove corrupted cache entries or force cache rewrite.

This piece of documentation implies that, contrary to the lazy restore, the pre-emptive default restore supports fallback. This is actually not the case for artifacts, the most important build outputs.
Meanwhile, restoration of generated sources and other attached outputs works as expected: any cache corruption triggers a regular build.
I have a strong feeling we are simply in the presence of an implementation oversight, which I'm fixing here: calling run on a FutureTask performs the restoration but does not throw any exception upon failure. The failure has to be checked via the get method.

@AlexanderAshitkin, to expand a bit more on this question:

Does it solve any practical issues?

Yes. Currently, if a corrupted or non-downloadable artifact is used in a dependent module while the "lazy restore" functionality is not in use, the build will fail.
Worst case before this fix: the build fails.
Worst case after this fix: we do a regular build.

It seems pretty important that one should be able to choose stability over speed.
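The FutureTask behaviour at the heart of this fix is easy to verify in isolation. This standalone snippet (not the extension's code) shows run() swallowing the failure that only get() surfaces:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {
    public static void main(String[] args) {
        // A task that fails, standing in for a failed artifact download.
        FutureTask<String> downloadTask = new FutureTask<>(() -> {
            throw new IllegalStateException("download failed");
        });

        // run() executes the task; the exception is captured internally,
        // so from the caller's perspective nothing appears to go wrong.
        downloadTask.run();

        // Only get() surfaces the failure, wrapped in an ExecutionException.
        boolean failureDetected = false;
        try {
            downloadTask.get();
        } catch (ExecutionException e) {
            failureDetected = true; // cause is the IllegalStateException above
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("failureDetected=" + failureDetected);
    }
}
```

This is exactly why calling run() alone on the download task cannot trigger the fallback: the result must be checked via get() to notice the failure at all.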

@AlexanderAshitkin (Contributor) commented Aug 22, 2023

I agree that keeping the fallback is important. Thanks for the clarifications! We can add such optimization in the future by tracking artifact usage downstream.

.adjustArchiveArtifactVersion(project, originalVersion, artifactFile)
.toFile();
});
if (!cacheConfig.isLazyRestore()) {
A contributor commented:

Looks like a more relevant name for the original intent is lazyDownload.

@kbuntrock (Author) replied:

"LazyRestore" seems fine to me. I would rather replace "download" with "restore", since there is no downloading involved when working with a local cache. But it's not something I will do in this PR. :P

@kbuntrock marked this pull request as ready for review on August 30, 2023 19:39
* CacheState
*/
public enum CacheState {
NOT_INITIALIZED,
@kbuntrock (Author) commented:

This enum value was not used.

return projectExecutions.get(project);
}

public void remove(MavenProject project) {
@kbuntrock (Author) commented:

If I understood correctly, this was used to "reset" the spying on the project's mojo executions after a "rollback" from using the cache to a regular build execution.
I no longer need this function, so I deleted it. But I could use your careful review of it, @maximilian-novikov-db (I see your name on the oldest commit of the class :p).

@AlexanderAshitkin (Contributor) left a review comment:

It looks good to me. I added a few comments about potential optimizations, but those enhancements are out of scope.

LOGGER.debug("Execute mojos post cache segment");
List<MojoExecution> postCachedSegment =
lifecyclePhasesHelper.getPostCachedSegment(project, mojoExecutions, build);
for (MojoExecution mojoExecution : postCachedSegment) {
@AlexanderAshitkin (Contributor) commented Aug 30, 2023

@kbuntrock, it is out of scope, but please note two edge cases here:

  • A post-cached segment is possible if the build was initially cached at, say, the package phase and the current build runs a later phase, say install. In that case, the best behavior is to update the cached build with the more complete version, which won't happen in the SUCCESS case; maybe we need a separate status.
  • The same goes for forced executions: even if the execution is forced, the mojo configuration may not be consistent with the cached one. If it is not consistent, it could be desirable to update the cache. We should skip updating the cache on discrepancies only if the configuration prohibits it.

@kbuntrock (Author) commented:
Hello there,

A little bump for this old-timer PR. If I'm correct, I have fixed every mandatory request.

Any other thoughts?

@olamy olamy changed the title [MBUILDCACHE-67] Bugfix artefact restoration error not handled properly [MBUILDCACHE-67] Bugfix artifact restoration error not handled properly Oct 12, 2023
@olamy olamy merged commit e85610d into apache:master Oct 12, 2023
@cowwoc (Contributor) commented Feb 1, 2025

I think I found a possible regression of this issue. Please take a look at https://issues.apache.org/jira/browse/MNG-8546

@jira-importer commented:

Resolve #330

(1 similar comment)
