-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HDDS-1228. Chunk Scanner Checkpoints #1622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/label ozone |
|
💔 -1 overall
This message was automatically generated. |
|
@arp7 please review |
|
|
||
| private String checksum; | ||
| public static final Charset CHARSET_ENCODING = Charset.forName("UTF-8"); | ||
| private Long dataScanTimestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this a Java Optional. Then instead of null we can check for Optional.absent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also can you add a comment stating what the number means? Is it Unix epoch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments. I will address these and update the pull request in the new repo.
| metrics.incNumUnHealthyContainers(); | ||
| controller.markContainerUnhealthy( | ||
| c.getContainerData().getContainerID()); | ||
| controller.markContainerUnhealthy(containerId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also call logScanCompleted and updateDataScanTimestamp in the failure path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid this for two reasons:
- The full scan includes a scan of the metadata, too, and the failure may be due to metadata problem. Eg. if the
.containerfile is missing or invalid etc. In that case we cannot update the timestamp in the file. - Unhealthy containers are skipped during further iterations, so the timestamp would not make much difference anyway.
arp7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments. The change looks pretty good to me overall.
|
Thank you very much to open this pull request. During the weekend the Ozone source code has been moved out from apache/hadoop repository to apache/hadoop-ozone repository. This git commits are rewritten, but the branch of this pull request is also transformed (state of Saturday morning), you can use the new, migrated branch to recreate this pull request. Your pull request is important for us: Can you please re-create your pull request in the new repository? 1. Create a new fork of https://github.com/apache/hadoop-ozone 2. Clone it and have both your fork and the apache repo as remotes: 3. Fetch your migrated branch and push it to your fork. 4. And create the new pull request on the new repository. https://github.com/apache/hadoop-ozone/compare/master...adoroszlai:HDDS-1228?expand=1 If you need more information, please check this wiki page or contact with me (my github user name + apache.org). Thank you, and sorry for the inconvenience. |
|
Moved to apache/ozone#7 |
What changes were proposed in this pull request?
Save timestamp of last successful data scan for each container (in the
.containerfile). After a datanode restart, resume data scanning with the container that was least recently scanned.Newly closed containers have no timestamp and are thus scanned first during the next iteration. This will be changed in HDDS-1369, which proposes to scan newly closed containers immediately.
https://issues.apache.org/jira/browse/HDDS-1228
How was this patch tested?
Created and closed containers. Restarted datanode while scanning was in progress. Verified that after the restart, scanner resumed from the container where it was interrupted.
Also tested upgrade from Ozone 0.4.0. (Downgrade does not work, see HDDS-2268.)