
Conversation

@adoroszlai
Contributor

What changes were proposed in this pull request?

Save timestamp of last successful data scan for each container (in the .container file). After a datanode restart, resume data scanning with the container that was least recently scanned.

Newly closed containers have no timestamp and are thus scanned first during the next iteration. This will be changed in HDDS-1369, which proposes to scan newly closed containers immediately.
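
A minimal sketch of the resulting scan order (illustration only; ContainerInfo and sortForScanning are hypothetical names, not from the patch). Containers with no recorded timestamp sort first, then the least recently scanned:

    import java.time.Instant;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    class ContainerInfo {
      final long containerId;
      // Empty means the container has never been data-scanned.
      final Optional<Instant> lastDataScanTime;

      ContainerInfo(long containerId, Optional<Instant> lastDataScanTime) {
        this.containerId = containerId;
        this.lastDataScanTime = lastDataScanTime;
      }
    }

    class ScanOrder {
      // Never-scanned containers map to Instant.MIN and sort first; the rest
      // follow in least-recently-scanned order, so a restarted datanode
      // resumes roughly where the interrupted iteration left off.
      static void sortForScanning(List<ContainerInfo> containers) {
        containers.sort(Comparator.comparing(
            c -> c.lastDataScanTime.orElse(Instant.MIN)));
      }
    }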

https://issues.apache.org/jira/browse/HDDS-1228

How was this patch tested?

Created and closed containers, then restarted the datanode while scanning was in progress. Verified that after the restart, the scanner resumed from the container where it was interrupted.

datanode_1  | STARTUP_MSG: Starting HddsDatanodeService
datanode_1  | 2019-10-08 19:37:07 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned never
datanode_1  | 2019-10-08 19:37:07 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:37:07.570Z
datanode_1  | 2019-10-08 19:37:07 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 0 minutes. Number of iterations (since the data-node restart) : 1, Number of containers scanned in this iteration : 1, Number of unhealthy containers found in this iteration : 0
datanode_1  | 2019-10-08 19:37:17 DEBUG ContainerDataScanner:148 - Scanning container 2, last scanned never
datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:155 - Completed scan of container 2 at 2019-10-08T19:38:57.402Z
datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned at 2019-10-08T19:37:07.570Z
datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:38:57.443Z
datanode_1  | 2019-10-08 19:38:57 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 1 minutes. Number of iterations (since the data-node restart) : 2, Number of containers scanned in this iteration : 2, Number of unhealthy containers found in this iteration : 0
datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:148 - Scanning container 3, last scanned never
datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:155 - Completed scan of container 3 at 2019-10-08T19:39:02.402Z
datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:148 - Scanning container 4, last scanned never
datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:155 - Completed scan of container 4 at 2019-10-08T19:39:02.430Z
datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:148 - Scanning container 5, last scanned never
datanode_1  | 2019-10-08 19:39:11 ERROR HddsDatanodeService:75 - RECEIVED SIGNAL 15: SIGTERM
datanode_1  | STARTUP_MSG: Starting HddsDatanodeService
datanode_1  | 2019-10-08 19:39:22 DEBUG ContainerDataScanner:148 - Scanning container 5, last scanned never
datanode_1  | 2019-10-08 19:40:18 DEBUG ContainerDataScanner:155 - Completed scan of container 5 at 2019-10-08T19:40:18.268Z
datanode_1  | 2019-10-08 19:40:18 DEBUG ContainerDataScanner:148 - Scanning container 6, last scanned never
datanode_1  | 2019-10-08 19:40:31 DEBUG ContainerDataScanner:155 - Completed scan of container 6 at 2019-10-08T19:40:31.735Z
datanode_1  | 2019-10-08 19:40:31 DEBUG ContainerDataScanner:148 - Scanning container 2, last scanned at 2019-10-08T19:38:57.402Z
datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:155 - Completed scan of container 2 at 2019-10-08T19:42:12.128Z
datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned at 2019-10-08T19:38:57.443Z
datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:42:12.140Z
datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:148 - Scanning container 3, last scanned at 2019-10-08T19:39:02.402Z
datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:155 - Completed scan of container 3 at 2019-10-08T19:42:16.629Z
datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:148 - Scanning container 4, last scanned at 2019-10-08T19:39:02.430Z
datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:155 - Completed scan of container 4 at 2019-10-08T19:42:16.669Z
datanode_1  | 2019-10-08 19:42:16 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 2 minutes. Number of iterations (since the data-node restart) : 1, Number of containers scanned in this iteration : 6, Number of unhealthy containers found in this iteration : 0

Also tested upgrade from Ozone 0.4.0. (Downgrade does not work, see HDDS-2268.)

@adoroszlai
Contributor Author

/label ozone

@elek elek added the ozone label Oct 8, 2019
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
0 reexec 728 Docker mode activated.
_ Prechecks _
+1 dupname 0 No case conflicting files found.
+1 @author 0 The patch does not contain any @author tags.
+1 test4tests 0 The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
0 mvndep 24 Maven dependency ordering for branch
-1 mvninstall 31 hadoop-hdds in trunk failed.
-1 mvninstall 36 hadoop-ozone in trunk failed.
-1 compile 21 hadoop-hdds in trunk failed.
-1 compile 16 hadoop-ozone in trunk failed.
+1 checkstyle 61 trunk passed
+1 mvnsite 0 trunk passed
+1 shadedclient 857 branch has no errors when building and testing our client artifacts.
-1 javadoc 22 hadoop-hdds in trunk failed.
-1 javadoc 20 hadoop-ozone in trunk failed.
0 spotbugs 958 Used deprecated FindBugs config; considering switching to SpotBugs.
-1 findbugs 34 hadoop-hdds in trunk failed.
-1 findbugs 20 hadoop-ozone in trunk failed.
_ Patch Compile Tests _
0 mvndep 18 Maven dependency ordering for patch
-1 mvninstall 34 hadoop-hdds in the patch failed.
-1 mvninstall 37 hadoop-ozone in the patch failed.
-1 compile 24 hadoop-hdds in the patch failed.
-1 compile 19 hadoop-ozone in the patch failed.
-1 javac 24 hadoop-hdds in the patch failed.
-1 javac 19 hadoop-ozone in the patch failed.
+1 checkstyle 57 the patch passed
+1 mvnsite 0 the patch passed
+1 whitespace 0 The patch has no whitespace issues.
+1 shadedclient 717 patch has no errors when building and testing our client artifacts.
-1 javadoc 23 hadoop-hdds in the patch failed.
-1 javadoc 20 hadoop-ozone in the patch failed.
-1 findbugs 32 hadoop-hdds in the patch failed.
-1 findbugs 20 hadoop-ozone in the patch failed.
_ Other Tests _
-1 unit 28 hadoop-hdds in the patch failed.
-1 unit 27 hadoop-ozone in the patch failed.
+1 asflicense 33 The patch does not generate ASF License warnings.
3093 (total runtime in seconds)
Subsystem Report/Notes
Docker Client=19.03.3 Server=19.03.3 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/Dockerfile
GITHUB PR #1622
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux a3505a42bf7e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 87d9f36
Default Java 1.8.0_222
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-mvninstall-hadoop-hdds.txt
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-mvninstall-hadoop-ozone.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-compile-hadoop-hdds.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-compile-hadoop-ozone.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-javadoc-hadoop-hdds.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-javadoc-hadoop-ozone.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-findbugs-hadoop-hdds.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/branch-findbugs-hadoop-ozone.txt
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-mvninstall-hadoop-hdds.txt
mvninstall https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-mvninstall-hadoop-ozone.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-compile-hadoop-hdds.txt
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-compile-hadoop-ozone.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-compile-hadoop-hdds.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-compile-hadoop-ozone.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-javadoc-hadoop-hdds.txt
javadoc https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-javadoc-hadoop-ozone.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-findbugs-hadoop-hdds.txt
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-findbugs-hadoop-ozone.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-unit-hadoop-hdds.txt
unit https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/artifact/out/patch-unit-hadoop-ozone.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/testReport/
Max. process+thread count 440 (vs. ulimit of 5500)
modules C: hadoop-hdds/common hadoop-hdds/container-service U: hadoop-hdds
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-1622/1/console
versions git=2.7.4 maven=3.3.9
Powered by Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

@adoroszlai
Contributor Author

@arp7 please review

@arp7 arp7 self-requested a review October 10, 2019 14:30

  private String checksum;
  public static final Charset CHARSET_ENCODING = Charset.forName("UTF-8");
+ private Long dataScanTimestamp;
Contributor


Can you make this a Java Optional? Then instead of null we can check for Optional.absent.

Contributor


Also can you add a comment stating what the number means? Is it Unix epoch?

Contributor Author


Thanks for the comments. I will address these and update the pull request in the new repo.
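
For illustration only (not the final patch): addressing both comments might look roughly like this, with hypothetical accessor names. Note that Optional.absent() is the Guava spelling; java.util.Optional uses Optional.empty():

    import java.time.Instant;
    import java.util.Optional;

    class ContainerDataSketch {
      // Time of the last successful data scan, or empty if this container
      // has never been scanned. Assumed here to be persisted in the
      // .container file as milliseconds since the Unix epoch.
      private Optional<Instant> dataScanTimestamp = Optional.empty();

      public Optional<Instant> lastDataScanTime() {
        return dataScanTimestamp;
      }

      public void updateDataScanTime(Instant time) {
        dataScanTimestamp = Optional.ofNullable(time);
      }
    }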

  metrics.incNumUnHealthyContainers();
- controller.markContainerUnhealthy(
-     c.getContainerData().getContainerID());
+ controller.markContainerUnhealthy(containerId);
Contributor


We should also call logScanCompleted and updateDataScanTimestamp in the failure path.

Contributor Author


I would avoid this for two reasons:

  1. The full scan includes a scan of the metadata, too, and the failure may be due to a metadata problem, e.g. if the .container file is missing or invalid. In that case we cannot update the timestamp in the file.
  2. Unhealthy containers are skipped during further iterations, so the timestamp would not make much difference anyway (see the sketch below).
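
A sketch of the success-path-only update described above; Container, scanMetaData, and scanData are hypothetical stand-ins, while updateDataScanTimestamp and markContainerUnhealthy are the calls mentioned in this thread:

    import java.io.IOException;
    import java.time.Instant;

    class ScanFlowSketch {
      interface Container {
        boolean scanMetaData(); // may fail if the .container file is bad
        boolean scanData();     // the full data scan
      }
      interface Controller {
        void updateDataScanTimestamp(long containerId, Instant at)
            throws IOException;
        void markContainerUnhealthy(long containerId) throws IOException;
      }
      interface Metrics {
        void incNumUnHealthyContainers();
      }

      // The timestamp is persisted only after a fully successful scan; on
      // failure the container is just marked unhealthy, because it will be
      // skipped in later iterations and its .container file may itself be
      // the thing that is broken.
      static void scan(Container c, long id, Controller ctrl, Metrics m)
          throws IOException {
        if (c.scanMetaData() && c.scanData()) {
          ctrl.updateDataScanTimestamp(id, Instant.now());
        } else {
          m.incNumUnHealthyContainers();
          ctrl.markContainerUnhealthy(id);
        }
      }
    }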

Contributor

@arp7 arp7 left a comment


Minor comments. The change looks pretty good to me overall.

@elek
Member

elek commented Oct 13, 2019

Thank you very much for opening this pull request.

Over the weekend the Ozone source code was moved out of the apache/hadoop repository to the apache/hadoop-ozone repository.

The git commits were rewritten, but the branch of this pull request was also migrated (state of Saturday morning), so you can use the new, migrated branch to recreate this pull request.

Your pull request is important to us: can you please re-create it in the new repository?

1. Create a new fork of https://github.com/apache/hadoop-ozone

2. Clone it and have both your fork and the apache repo as remotes:

    git clone git@github.com:adoroszlai/hadoop-ozone.git
    cd hadoop-ozone
    git remote add apache git@github.com:apache/hadoop-ozone.git
    git fetch apache

3. Fetch your migrated branch and push it to your fork.

    git checkout -b HDDS-1228 apache/HDDS-1228
    git push origin HDDS-1228

4. And create the new pull request on the new repository.

https://github.com/apache/hadoop-ozone/compare/master...adoroszlai:HDDS-1228?expand=1

If you need more information, please check this wiki page or contact me (my GitHub user name + apache.org).

Thank you, and sorry for the inconvenience.

@adoroszlai
Contributor Author

Moved to apache/ozone#7

@adoroszlai adoroszlai closed this Oct 13, 2019