
HDDS-919. Enable prometheus endpoints for Ozone datanodes #502

Merged · 7 commits · Mar 5, 2019

Conversation

@elek (Member) commented Feb 19, 2019

HDDS-846 provides a new metric endpoint which publishes the available Hadoop metrics in prometheus friendly format with a new servlet.

Unfortunately, it is enabled only on the SCM/OM side. It would be great to enable it on the Ozone/HDDS datanodes as well, on the web server of the HDDS REST endpoint.

See: https://issues.apache.org/jira/browse/HDDS-919
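The core job of such a servlet is converting Hadoop metric names into the Prometheus text format, whose metric names are lowercase snake_case. A minimal sketch of that naming conversion, using a hypothetical helper class (the actual HDDS-846 implementation may differ):

```java
// Hypothetical sketch of the CamelCase -> snake_case conversion a
// Prometheus-friendly metrics servlet performs; class and method
// names are illustrative, not the actual HDDS-846 code.
public class PrometheusNaming {

  /** Combine record and metric name into a Prometheus-style key. */
  public static String toPrometheusName(String recordName, String metricName) {
    String combined = recordName + "_" + metricName;
    // Split on lower->upper and acronym boundaries, then join with '_'.
    String[] parts = combined.split(
        "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");
    return String.join("_", parts)
        .toLowerCase()
        .replaceAll("[^a-z0-9_]", "_");
  }

  public static void main(String[] args) {
    // e.g. JvmMetrics / MemNonHeapUsedM -> jvm_metrics_mem_non_heap_used_m
    System.out.println(toPrometheusName("JvmMetrics", "MemNonHeapUsedM"));
  }
}
```

With this naming in place, the servlet only has to append the numeric value after each converted name to produce a scrapeable `/prom` payload.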

@bharatviswa504 (Contributor) commented Feb 22, 2019

@elek
Can you rebase this PR?

@@ -37,6 +37,10 @@ http://maven.apache.org/xsd/maven-4.0.0.xsd">
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdds-server-framework</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
A contributor commented on the diff:

I think this is duplicated, as lines 37–38 above already add this dependency.

<property>
<name>hdds.datanode.https-bind-host</name>
<value>0.0.0.0</value>
<tag>OZONE, MANAGEMENT</tag>
A contributor commented on the diff:

For the HTTPS keys, can we add the SECURITY tag as well?
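As an illustration of the suggestion (a sketch only; the exact tag set in the merged patch may differ), the HTTPS-related property would carry the extra tag like this:

```xml
<property>
  <name>hdds.datanode.https-bind-host</name>
  <value>0.0.0.0</value>
  <!-- SECURITY added per this review; HDDS and MANAGEMENT per the
       tag changes elek describes in his follow-up comment -->
  <tag>HDDS, MANAGEMENT, SECURITY</tag>
</property>
```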

@elek (Member, author) commented Feb 26, 2019

Thanks for the review, @bharatviswa504.

  • I added the SECURITY tag to the Kerberos/security config keys.
  • I added the HDDS tag (instead of OZONE) to all the config keys, as the datanode is an HDDS component (especially if we remove the REST server later).
  • I added the MANAGEMENT tag to all the config keys, as this endpoint is only for management.
  • And I rebased the patch.

@hadoop-yetus commented:

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|------|-----------|---------|---------|
| 0 | reexec | 0 | Docker mode activated. |
| -1 | patch | 6 | #502 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

| Subsystem | Report/Notes |
|-----------|--------------|
| GITHUB PR | #502 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/2/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated.

@hadoop-yetus commented:

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|------|-----------|---------|---------|
| 0 | reexec | 23 | Docker mode activated. |
| | | | _ Prechecks _ |
| 0 | yamllint | 0 | yamllint was not available. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ trunk Compile Tests _ |
| 0 | mvndep | 77 | Maven dependency ordering for branch |
| +1 | mvninstall | 1359 | trunk passed |
| +1 | compile | 1136 | trunk passed |
| +1 | checkstyle | 228 | trunk passed |
| -1 | mvnsite | 40 | dist in trunk failed. |
| +1 | shadedclient | 1215 | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-ozone/dist |
| +1 | findbugs | 159 | trunk passed |
| +1 | javadoc | 119 | trunk passed |
| | | | _ Patch Compile Tests _ |
| 0 | mvndep | 24 | Maven dependency ordering for patch |
| -1 | mvninstall | 19 | dist in the patch failed. |
| +1 | compile | 1108 | the patch passed |
| +1 | javac | 1108 | the patch passed |
| +1 | checkstyle | 277 | the patch passed |
| -1 | mvnsite | 31 | dist in the patch failed. |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 4 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 783 | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-ozone/dist |
| +1 | findbugs | 169 | the patch passed |
| +1 | javadoc | 101 | the patch passed |
| | | | _ Other Tests _ |
| -1 | unit | 95 | common in the patch failed. |
| -1 | unit | 103 | container-service in the patch failed. |
| -1 | unit | 33 | dist in the patch failed. |
| +1 | asflicense | 47 | The patch does not generate ASF License warnings. |
| | | 7338 | |

| Reason | Tests |
|--------|-------|
| Failed junit tests | hadoop.hdds.security.x509.certificate.client.TestDefaultCertificateClient |
| | hadoop.ozone.container.common.TestDatanodeStateMachine |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/artifact/out/Dockerfile |
| GITHUB PR | #502 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml yamllint |
| uname | Linux e776ef3585a4 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 59ba355 |
| maven version | Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/artifact/out/branch-mvnsite-hadoop-ozone_dist.txt |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/artifact/out/patch-mvninstall-hadoop-ozone_dist.txt |
| mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/artifact/out/patch-mvnsite-hadoop-ozone_dist.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/artifact/out/patch-unit-hadoop-hdds_common.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/artifact/out/patch-unit-hadoop-hdds_container-service.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/artifact/out/patch-unit-hadoop-ozone_dist.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/testReport/ |
| Max. process+thread count | 385 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/common hadoop-hdds/container-service hadoop-ozone/dist U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/1/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated.

@bharatviswa504 (Contributor) commented:

Thank you @elek for addressing the comments.
There are many additional changes in ozone-default.xml that are not related to this patch. Can we do them as part of a separate Jira, since they do not belong here?

@elek (Member, author) commented Feb 27, 2019

> Thank you @elek for addressing the comments. There are many additional changes in ozone-default.xml that are not related to this patch. Can we do them as part of a separate Jira, since they do not belong here?

Thanks for the warning @bharatviswa504. It's a rebase error; they shouldn't be there. Let me rebase the patch and remove the unrelated formatting changes...

@hadoop-yetus commented:

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|------|-----------|---------|---------|
| 0 | reexec | 0 | Docker mode activated. |
| -1 | patch | 6 | #502 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

| Subsystem | Report/Notes |
|-----------|--------------|
| GITHUB PR | #502 |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-502/3/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated.

@bharatviswa504 (Contributor) commented Feb 27, 2019

Thank you @elek for the update.

One minor comment: we don't need the change in hadoop-hdds/container-service/pom.xml, as we already have those dependencies at lines 36–39. You can take care of this during commit.

<dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdds-server-framework</artifactId>
    </dependency>

@elek (Member, author) commented Feb 28, 2019

> One minor comment: we don't need the change in hadoop-hdds/container-service/pom.xml.

Oops, thanks. I removed it.

@bharatviswa504 (Contributor) commented:
Thank You @elek for the update.
+1 LGTM (pending jenkins).

@bharatviswa504 (Contributor) commented:

Hi @elek,
When I was planning to commit, I just saw some test failures.
Also, in MiniOzoneClusterImpl, in configureHddsDatanodes() we need to set this port address to 0, because when multiple datanodes start on localhost, starting the HTTP server will fail.

I think this patch needs some more work, see below error.

2019-02-28 20:08:24,593 INFO  hdfs.DFSUtil (DFSUtil.java:httpServerTemplateForNNAndJN(1641)) - Starting Web-server for hddsDatanode at: http://0.0.0.0:9882
2019-02-28 20:08:24,594 ERROR ozone.HddsDatanodeService (HddsDatanodeService.java:start(189)) - HttpServer failed to start.
java.io.FileNotFoundException: webapps/hddsDatanode not found in CLASSPATH
	at org.apache.hadoop.http.HttpServer2.getWebAppsPath(HttpServer2.java:1070)
	at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:536)
	at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:119)
	at org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:433)
	at org.apache.hadoop.hdds.server.BaseHttpServer.<init>(BaseHttpServer.java:90)
	at org.apache.hadoop.ozone.HddsDatanodeHttpServer.<init>(HddsDatanodeHttpServer.java:34)
	at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:186)
	at org.apache.hadoop.ozone.MiniOzoneClusterImpl.lambda$startHddsDatanodes$2(MiniOzoneClusterImpl.java:367)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at org.apache.hadoop.ozone.MiniOzoneClusterImpl.startHddsDatanodes(MiniOzoneClusterImpl.java:367)
	at org.apache.hadoop.ozone.om.TestScmChillMode.init(TestScmChillMode.java:99)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
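The fix suggested above for the mini-cluster, setting the HTTP port to 0 in configureHddsDatanodes(), relies on standard ephemeral-port binding: the OS hands each server a distinct free port, so several datanodes can share localhost. A small self-contained illustration using plain java.net (not the actual MiniOzoneClusterImpl code):

```java
import java.net.ServerSocket;

public class EphemeralPortDemo {
  public static void main(String[] args) throws Exception {
    // Port 0 asks the OS for any free port; two servers bound this
    // way are guaranteed distinct ports while both are open. This is
    // why a mini-cluster test should configure the datanode HTTP
    // address with port 0 instead of the fixed default 9882.
    try (ServerSocket first = new ServerSocket(0);
         ServerSocket second = new ServerSocket(0)) {
      System.out.println(first.getLocalPort() != second.getLocalPort());
      // prints true
    }
  }
}
```

Note that the FileNotFoundException in the trace above is a separate problem: HttpServer2 resolves the `webapps/hddsDatanode` directory from the classpath, so the test classpath must actually contain that directory.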

@elek force-pushed the HDDS-919 branch 2 times, most recently from 29f4d60 to 5aff6c9, on March 1, 2019, 16:30
@elek (Member, author) commented Mar 1, 2019

Shame on me. I created the .keep file locally, but it was ignored by the .gitignore file, so it was never pushed. Now that's also fixed.

@apache apache deleted a comment from hadoop-yetus Mar 4, 2019
@bharatviswa504 (Contributor) commented:

+1 LGTM.
I don't think test failures are related to this patch.

@ajayydv (Contributor) commented Mar 5, 2019

@elek, it seems some of the test failures are related. Could you please take a look?

@bharatviswa504 (Contributor) commented Mar 5, 2019

I don't think the test failures are caused by this patch; not sure if I am missing something here.

public static final int HDDS_DATANODE_HTTPS_BIND_PORT_DEFAULT = 9883;
public static final String
HDDS_DATANODE_HTTP_KERBEROS_PRINCIPAL_KEY =
"hdds.datanode.http.kerberos.principal";
A contributor commented on the diff:

Can we just reuse the corresponding HDFS key here?

Another contributor replied:

@arp7 I think the idea is to allow our HDDS plugin to have a separate identity.

@ajayydv (Contributor) commented Mar 5, 2019

@bharatviswa504 you are right; all of the failed tests passed locally except TestFailureHandlingByClient#testBlockWritesWithDnFailures, which seems to fail even on trunk. We can work on it in a separate Jira.

@ajayydv ajayydv merged commit 7f636b4 into apache:trunk Mar 5, 2019
asfgit pushed a commit that referenced this pull request Mar 5, 2019
Pull request from Elek, Márton. (#502)

(cherry picked from commit 7f636b4)