Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move BES upload check earlier, to prevent calling FindMissingBlobs #16999

Conversation

brentleyjones
Copy link
Contributor

Before this change, every file referenced in the BEP would have a FindMissingBlobs call, which can be very expensive with hundreds of outputs. This could make --experimental_remote_build_event_upload=minimal a performance regression compared to --experimental_build_event_upload_strategy=local.

Before this change, every file referenced in the BEP would have a `FindMissingBlobs` call, which can be very expensive with hundreds of outputs. This could make `--experimental_remote_build_event_upload=minimal` a performance regression compared to `--experimental_build_event_upload_strategy=local`.
@brentleyjones brentleyjones requested a review from a team as a code owner December 12, 2022 20:46
@brentleyjones
Copy link
Contributor Author

@coeuvre @tjgq

@brentleyjones
Copy link
Contributor Author

brentleyjones commented Dec 12, 2022

Side comment: it would be nice if those FindMissingBlobs calls could be consolidated. Thousands of calls in the span of a couple seconds is pretty extreme.

@brentleyjones
Copy link
Contributor Author

If this gets in quickly, I think it should go in 6.0 as it only affects code that uses --experimental_remote_build_event_upload=minimal, which is new to the release, and it also makes the flag more usable by the people that are most likely to be reaching for it. If it can't get into 6.0, then at least 6.1.

Copy link
Member

@coeuvre coeuvre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

You are right, if Bazel shouldn't upload these files, it shouldn't call FindMissingBlobs for them neither.

@@ -236,7 +237,9 @@ private Single<List<PathMetadata>> queryRemoteCache(
List<PathMetadata> filesToQuery = new ArrayList<>();
Set<Digest> digestsToQuery = new HashSet<>();
for (PathMetadata path : paths) {
if (shouldQuery(path)) {
if (!shouldUploadBasedOnUploadMode(path)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this check into shouldQuery. Tests are failing because we didn't add path into knownRemotePaths below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I didn't put it into shouldQuery, was because the path might not be remote. So not sure on the answer to this.

@coeuvre
Copy link
Member

coeuvre commented Dec 13, 2022

Ha, the reason we call FindMissingBlobs is, according to #16299 (comment), to convert Paths for files that are already uploaded to the remote cache.

@brentleyjones
Copy link
Contributor Author

brentleyjones commented Dec 13, 2022

So yeah, we want the paths to be converted, if we uploaded them in this build, not if they were uploaded ever. Do you recommend a different way to accomplish this (or is this the correct way to get that behavior)? This PR was mainly a bug report with a suggested fix. I'm fine with you opening a PR with a proper fix and closing out this PR.

@coeuvre
Copy link
Member

coeuvre commented Dec 13, 2022

we want the paths to be converted, if we uploaded them in this build, not if they were uploaded ever.

We can know if one file is uploaded in this build, but we can't know if the file is available in the remote cache (because it is uploaded before) without using FindMissingBlobs.

If we solely relies on whether the file is uploaded in this build to determine whether to convert the path, there could be an issue that the paths for previously uploaded output are represented as local path in this build.

We could instead always use bytestream:// for the paths, but then BES could point to some files missing from remote cache (because Bazel didn't upload them). I guess it's fine because remote cache could delete blobs anyway. WDYT?

@coeuvre
Copy link
Member

coeuvre commented Dec 13, 2022

OTOH, we can always set uri to file://. There is a digest field in the File message which can be used as a pointer to CAS.

@brentleyjones
Copy link
Contributor Author

We could instead always use bytestream:// for the paths, but then BES could point to some files missing from remote cache (because Bazel didn't upload them). I guess it's fine because remote cache could delete blobs anyway. WDYT?

I like this one.

@brentleyjones
Copy link
Contributor Author

Either way, @coeuvre can you submit a new PR with the changes/test fixes? This was more a bug report than a fix, so I don't need us to go back and forth on my PR. Thanks!

@coeuvre
Copy link
Member

coeuvre commented Dec 14, 2022

yes, working on it.

ShreeM01 added a commit that referenced this pull request Jan 30, 2023
When using `--experimental_remote_build_event_upload=minimal`, instead of querying remote cache to decide whether convert local path of a file to `bytestream://`, it now always use `bytestream://` for BEP referenced files.

Closes #16999.

Closes #17025.

PiperOrigin-RevId: 502351401
Change-Id: Ic858367ffaf3c2a2c20db88ada85fbbe1d93aae8

Co-authored-by: Chi Wang <chiwang@google.com>
@brentleyjones brentleyjones deleted the bj/move-bes-upload-check-earlier-to-prevent-calling-findmissingblobs branch January 30, 2023 16:54
hvadehra pushed a commit that referenced this pull request Feb 14, 2023
When using `--experimental_remote_build_event_upload=minimal`, instead of querying remote cache to decide whether convert local path of a file to `bytestream://`, it now always use `bytestream://` for BEP referenced files.

Closes #16999.

Closes #17025.

PiperOrigin-RevId: 502351401
Change-Id: Ic858367ffaf3c2a2c20db88ada85fbbe1d93aae8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-response Awaiting a response from the author team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants