
Conversation

@adoroszlai
Contributor

What changes were proposed in this pull request?

Save timestamp of last successful data scan for each container (in the .container file). After a datanode restart, resume data scanning with the container that was least recently scanned.

Newly closed containers have no timestamp and are thus scanned first during the next iteration. This will be changed in HDDS-1369, which proposes to scan newly closed containers immediately.
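
A minimal sketch of the resulting scan order (illustration only; ContainerInfo and sortForScanning are hypothetical names, not from the patch). Containers with no recorded timestamp sort first, then the least recently scanned:

    import java.time.Instant;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    class ContainerInfo {
      final long containerId;
      // Empty means the container has never been data-scanned.
      final Optional<Instant> lastDataScanTime;

      ContainerInfo(long containerId, Optional<Instant> lastDataScanTime) {
        this.containerId = containerId;
        this.lastDataScanTime = lastDataScanTime;
      }
    }

    class ScanOrder {
      // Never-scanned containers map to Instant.MIN and sort first; the rest
      // follow in least-recently-scanned order, so a restarted datanode
      // resumes roughly where the interrupted iteration left off.
      static void sortForScanning(List<ContainerInfo> containers) {
        containers.sort(Comparator.comparing(
            c -> c.lastDataScanTime.orElse(Instant.MIN)));
      }
    }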

https://issues.apache.org/jira/browse/HDDS-1228

How was this patch tested?

Created and closed containers, then restarted the datanode while scanning was in progress. Verified that after the restart, the scanner resumed from the container where it was interrupted.

datanode_1  | STARTUP_MSG: Starting HddsDatanodeService
datanode_1  | 2019-10-08 19:37:07 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned never
datanode_1  | 2019-10-08 19:37:07 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:37:07.570Z
datanode_1  | 2019-10-08 19:37:07 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 0 minutes. Number of iterations (since the data-node restart) : 1, Number of containers scanned in this iteration : 1, Number of unhealthy containers found in this iteration : 0
datanode_1  | 2019-10-08 19:37:17 DEBUG ContainerDataScanner:148 - Scanning container 2, last scanned never
datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:155 - Completed scan of container 2 at 2019-10-08T19:38:57.402Z
datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned at 2019-10-08T19:37:07.570Z
datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:38:57.443Z
datanode_1  | 2019-10-08 19:38:57 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 1 minutes. Number of iterations (since the data-node restart) : 2, Number of containers scanned in this iteration : 2, Number of unhealthy containers found in this iteration : 0
datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:148 - Scanning container 3, last scanned never
datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:155 - Completed scan of container 3 at 2019-10-08T19:39:02.402Z
datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:148 - Scanning container 4, last scanned never
datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:155 - Completed scan of container 4 at 2019-10-08T19:39:02.430Z
datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:148 - Scanning container 5, last scanned never
datanode_1  | 2019-10-08 19:39:11 ERROR HddsDatanodeService:75 - RECEIVED SIGNAL 15: SIGTERM
datanode_1  | STARTUP_MSG: Starting HddsDatanodeService
datanode_1  | 2019-10-08 19:39:22 DEBUG ContainerDataScanner:148 - Scanning container 5, last scanned never
datanode_1  | 2019-10-08 19:40:18 DEBUG ContainerDataScanner:155 - Completed scan of container 5 at 2019-10-08T19:40:18.268Z
datanode_1  | 2019-10-08 19:40:18 DEBUG ContainerDataScanner:148 - Scanning container 6, last scanned never
datanode_1  | 2019-10-08 19:40:31 DEBUG ContainerDataScanner:155 - Completed scan of container 6 at 2019-10-08T19:40:31.735Z
datanode_1  | 2019-10-08 19:40:31 DEBUG ContainerDataScanner:148 - Scanning container 2, last scanned at 2019-10-08T19:38:57.402Z
datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:155 - Completed scan of container 2 at 2019-10-08T19:42:12.128Z
datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned at 2019-10-08T19:38:57.443Z
datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:42:12.140Z
datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:148 - Scanning container 3, last scanned at 2019-10-08T19:39:02.402Z
datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:155 - Completed scan of container 3 at 2019-10-08T19:42:16.629Z
datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:148 - Scanning container 4, last scanned at 2019-10-08T19:39:02.430Z
datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:155 - Completed scan of container 4 at 2019-10-08T19:42:16.669Z
datanode_1  | 2019-10-08 19:42:16 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 2 minutes. Number of iterations (since the data-node restart) : 1, Number of containers scanned in this iteration : 6, Number of unhealthy containers found in this iteration : 0

Also tested upgrade from Ozone 0.4.0. (Downgrade does not work, see HDDS-2268.)

@adoroszlai
Contributor Author

/label ozone

@elek elek added the ozone label Oct 8, 2019
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 728 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
0 mvndep 24 Maven dependency ordering for branch
-1 mvninstall 31 hadoop-hdds in trunk failed.
-1 mvninstall 36 hadoop-ozone in trunk failed.
-1 compile 21 hadoop-hdds in trunk failed.
-1 compile 16 hadoop-ozone in trunk failed.
+1 checkstyle 61 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 857 branch has no errors when building and testing our client artifacts.
-1 javadoc 22 hadoop-hdds in trunk failed.
-1 javadoc 20 hadoop-ozone in trunk failed.
0 spotbugs 958 Used deprecated FindBugs config; considering switching to SpotBugs.
-1 findbugs 34 hadoop-hdds in trunk failed.
-1 findbugs 20 hadoop-ozone in trunk failed.
_ Patch Compile Tests _
0 mvndep 18 Maven dependency ordering for patch
-1 mvninstall 34 hadoop-hdds in the patch failed.
-1 mvninstall 37 hadoop-ozone in the patch failed.
-1 compile 24 hadoop-hdds in the patch failed.
-1 compile 19 hadoop-ozone in the patch failed.
-1 javac 24 hadoop-hdds in the patch failed.
-1 javac 19 hadoop-ozone in the patch failed.
+1 checkstyle 57 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 717 patch has no errors when building and testing our client artifacts.
-1 javadoc 23 hadoop-hdds in the patch failed.
-1 javadoc 20 hadoop-ozone in the patch failed.
-1 findbugs 32 hadoop-hdds in the patch failed.
-1 findbugs 20 hadoop-ozone in the patch failed.
_ Other Tests _
-1 unit 28 hadoop-hdds in the patch failed.
-1 unit 27 hadoop-ozone in the patch failed.
+1 asflicense 33 The patch does not generate ASF License warnings.
3093 (total runtime in seconds)
Subsystem Report/Notes
Docker Client=19.03.3 Server=19.03.3 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/Dockerfile
GITHUB PR #1622
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux a3505a42bf7e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 87d9f36
Default Java 1.8.0_222
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-mvninstall-hadoop-hdds.txt
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-mvninstall-hadoop-ozone.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-compile-hadoop-hdds.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-compile-hadoop-ozone.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-javadoc-hadoop-hdds.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-javadoc-hadoop-ozone.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-findbugs-hadoop-hdds.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-findbugs-hadoop-ozone.txt
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-mvninstall-hadoop-hdds.txt
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-mvninstall-hadoop-ozone.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-compile-hadoop-hdds.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-compile-hadoop-ozone.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-compile-hadoop-hdds.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-compile-hadoop-ozone.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-javadoc-hadoop-hdds.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-javadoc-hadoop-ozone.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-findbugs-hadoop-hdds.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-findbugs-hadoop-ozone.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-unit-hadoop-hdds.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/testReport/
Max. process+thread count 440 (vs. ulimit of 5500)
modules C: hadoop-hdds/common hadoop-hdds/container-service U: hadoop-hdds
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/console
versions git=2.7.4 maven=3.3.9
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@adoroszlai
Contributor Author

@arp7 please review

@arp7 arp7 self-requested a review October 10, 2019 14:30

  private String checksum;
  public static final Charset CHARSET_ENCODING = Charset.forName("UTF-8");
+ private Long dataScanTimestamp;
Contributor


Can you make this a Java Optional? Then instead of null we can check for Optional.absent.

Contributor


Also can you add a comment stating what the number means? Is it Unix epoch?

Contributor Author


Thanks for the comments. I will address these and update the pull request in the new repo.
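
For illustration only (not the final patch): addressing both comments might look roughly like this, with hypothetical accessor names. Note that Optional.absent() is the Guava spelling; java.util.Optional uses Optional.empty():

    import java.time.Instant;
    import java.util.Optional;

    class ContainerDataSketch {
      // Time of the last successful data scan, or empty if this container
      // has never been scanned. Assumed here to be persisted in the
      // .container file as milliseconds since the Unix epoch.
      private Optional<Instant> dataScanTimestamp = Optional.empty();

      public Optional<Instant> lastDataScanTime() {
        return dataScanTimestamp;
      }

      public void updateDataScanTime(Instant time) {
        dataScanTimestamp = Optional.ofNullable(time);
      }
    }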

  metrics.incNumUnHealthyContainers();
- controller.markContainerUnhealthy(
-     c.getContainerData().getContainerID());
+ controller.markContainerUnhealthy(containerId);
Contributor


We should also call logScanCompleted and updateDataScanTimestamp in the failure path.

Contributor Author


I would avoid this for two reasons:

  1. The full scan includes a scan of the metadata, too, and the failure may be due to a metadata problem, e.g. if the .container file is missing or invalid. In that case we cannot update the timestamp in the file.
  2. Unhealthy containers are skipped during further iterations, so the timestamp would not make much difference anyway (see the sketch below).
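
A sketch of the success-path-only update described above; Container, scanMetaData, and scanData are hypothetical stand-ins, while updateDataScanTimestamp and markContainerUnhealthy are the calls mentioned in this thread:

    import java.io.IOException;
    import java.time.Instant;

    class ScanFlowSketch {
      interface Container {
        boolean scanMetaData(); // may fail if the .container file is bad
        boolean scanData();     // the full data scan
      }
      interface Controller {
        void updateDataScanTimestamp(long containerId, Instant at)
            throws IOException;
        void markContainerUnhealthy(long containerId) throws IOException;
      }
      interface Metrics {
        void incNumUnHealthyContainers();
      }

      // The timestamp is persisted only after a fully successful scan; on
      // failure the container is just marked unhealthy, because it will be
      // skipped in later iterations and its .container file may itself be
      // the thing that is broken.
      static void scan(Container c, long id, Controller ctrl, Metrics m)
          throws IOException {
        if (c.scanMetaData() && c.scanData()) {
          ctrl.updateDataScanTimestamp(id, Instant.now());
        } else {
          m.incNumUnHealthyContainers();
          ctrl.markContainerUnhealthy(id);
        }
      }
    }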

Contributor

@arp7 arp7 left a comment


Minor comments. The change looks pretty good to me overall.

@elek
Member

elek commented Oct 13, 2019

Thank you very much for opening this pull request.

Over the weekend the Ozone source code was moved out of the apache/hadoop repository to the apache/hadoop-ozone repository.

The git commits were rewritten, but the branch of this pull request was also migrated (state of Saturday morning), so you can use the new, migrated branch to recreate this pull request.

Your pull request is important to us: can you please re-create it in the new repository?

1. Create a new fork of https://github.com/apache/hadoop-ozone

2. Clone it and have both your fork and the apache repo as remotes:

    git clone git@github.com:adoroszlai/hadoop-ozone.git
    cd hadoop-ozone
    git remote add apache git@github.com:apache/hadoop-ozone.git
    git fetch apache

3. Fetch your migrated branch and push it to your fork.

    git checkout -b HDDS-1228 apache/HDDS-1228
    git push origin HDDS-1228

4. And create the new pull request on the new repository.

https://github.com/apache/hadoop-ozone/compare/master...adoroszlai:HDDS-1228?expand=1

If you need more information, please check this wiki page or contact me (my GitHub user name + apache.org).

Thank you, and sorry for the inconvenience.

@adoroszlai
Contributor Author

Moved to apache/ozone#7

@adoroszlai adoroszlai closed this Oct 13, 2019