HADOOP-16644. Do a HEAD after a PUT to get the modtime. #1627

steveloughran
Contributor

@steveloughran steveloughran commented Oct 9, 2019

WiP: no tests. What would a test look like? Best to use a mock
to fix the remote time so that it always differs slightly from the local time.
Or we make the clock of the S3A FS patchable, which is potentially
the most flexible option.
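
A minimal sketch of the patchable-clock idea, assuming a hypothetical `PatchableClockSketch` class with a `setClock()` test hook. Neither name exists in S3AFileSystem; the point is only that a test can pin the local clock at a known offset from a simulated remote time.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

/** Hypothetical holder for a replaceable clock; not part of S3AFileSystem. */
public class PatchableClockSketch {
  // Defaults to the system clock; tests replace it.
  private Clock clock = Clock.systemUTC();

  /** Test hook (assumed name): swap in a fixed or offset clock. */
  public void setClock(Clock clock) {
    this.clock = clock;
  }

  /** The local timestamp that would be recorded as the modtime after a PUT. */
  public long localModTime() {
    return clock.millis();
  }

  public static void main(String[] args) {
    PatchableClockSketch fs = new PatchableClockSketch();
    Instant remote = Instant.parse("2019-10-09T14:12:00Z");
    // Pin the local clock 3 seconds behind the simulated remote time.
    fs.setClock(Clock.fixed(remote.minus(Duration.ofSeconds(3)), ZoneOffset.UTC));
    long skew = remote.toEpochMilli() - fs.localModTime();
    System.out.println("skew ms = " + skew); // prints "skew ms = 3000"
  }
}
```

With the clock injectable, a test can assert that the status returned after a write carries the remote time rather than the skewed local one.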

It's unfortunate that we need to do this. Having the modtime included in the PUT response, or
extracting it if it is already there, would work best, especially as that would guarantee a
consistent read on update, which an unversioned HEAD does not.

Change-Id: I2c99752647f522991b1f89dd9c43f3a2e9b98bf5

@steveloughran steveloughran added the enhancement, fs/s3, and work in progress labels Oct 9, 2019
@steveloughran
Contributor Author

Update: this is the wrong approach. The PUT response does include the modtime in the metadata.

@steveloughran steveloughran changed the title from "HADOOP-16444. Do a HEAD after a PUT to get the modtime." to "HADOOP-16644. Do a HEAD after a PUT to get the modtime." Oct 9, 2019
@steveloughran steveloughran force-pushed the s3/HADOOP-16644-HEAD-after-PUT branch from 5b27b86 to 679fa3d Compare October 9, 2019 14:12
@apache apache deleted a comment from hadoop-yetus Oct 10, 2019
@apache apache deleted a comment from hadoop-yetus Oct 10, 2019
@steveloughran
Contributor Author

I've lifted the changes to the StatusProbes enum into #1601; that one is ready to go in, after which I'll have to fix this one up.

Proposed:

  • The caller passes down the PutResult; if this is non-null, it is used as the source of the timestamp.
  • For multipart uploads we only do the HEAD, setting the version and etag, and update the timestamp if the result comes back for the consistent version. If we get a different file, do not retry; just carry on.
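
The proposal above could be sketched roughly as follows. Everything here is hypothetical: `PutResult`, `HeadResult` and `modTimeAfterWrite()` are stand-ins, not the real S3A or AWS SDK types, and the sketch assumes (as the proposal does) that the put result carries a timestamp.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types are the real S3A or AWS SDK classes. */
public class ModTimeResolutionSketch {

  /** Stand-in for a PUT result, assumed here to carry a server timestamp. */
  public static final class PutResult {
    final long modTime;
    public PutResult(long modTime) {
      this.modTime = modTime;
    }
  }

  /** Stand-in for the fields of a HEAD response that matter here. */
  public static final class HeadResult {
    final String etag;
    final String versionId;
    final long modTime;
    public HeadResult(String etag, String versionId, long modTime) {
      this.etag = etag;
      this.versionId = versionId;
      this.modTime = modTime;
    }
  }

  /**
   * Pick the modtime to report after a write.
   * @param put result of a simple PUT, or null for a multipart upload
   * @param etag etag of the object just written
   * @param versionId version just written; may be null on unversioned buckets
   * @param head issues a single HEAD request; never retried
   * @param localTime fallback timestamp from the local clock
   */
  public static long modTimeAfterWrite(PutResult put, String etag, String versionId,
      Supplier<HeadResult> head, long localTime) {
    if (put != null) {
      return put.modTime; // simple PUT: use its timestamp directly
    }
    HeadResult h = head.get(); // multipart: one HEAD, no retry
    boolean sameObject = h != null
        && Objects.equals(etag, h.etag)
        && Objects.equals(versionId, h.versionId);
    // A different file came back: carry on with the local time.
    return sameObject ? h.modTime : localTime;
  }

  public static void main(String[] args) {
    HeadResult head = new HeadResult("etag-1", "v1", 2000L);
    System.out.println(modTimeAfterWrite(new PutResult(1500L), "etag-1", "v1", () -> head, 1000L)); // 1500
    System.out.println(modTimeAfterWrite(null, "etag-1", "v1", () -> head, 1000L)); // 2000
    System.out.println(modTimeAfterWrite(null, "etag-2", "v1", () -> head, 1000L)); // 1000
  }
}
```

Passing the HEAD call in as a `Supplier` keeps the sketch testable without a live store; the real code would issue the request through the S3 client.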

@bgaborg bgaborg self-requested a review November 19, 2019 16:42
@steveloughran
Contributor Author

Update: the put result doesn't include a timestamp. Unless we do a HEAD every time, this is in trouble.

I'm going to close the patch as is. If we were to revisit this, the HEAD would be needed for every file, so I'd make it an option for the special cases (such as YARN job submission) where a timestamp mismatch is a blocker.

@steveloughran steveloughran deleted the s3/HADOOP-16644-HEAD-after-PUT branch October 15, 2021 19:45