Skip to content

HADOOP-17424. Replace HTrace with No-Op tracer #2645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 1, 2021

Conversation

smengcl
Copy link
Contributor

@smengcl smengcl commented Jan 25, 2021

https://issues.apache.org/jira/browse/HADOOP-17424

Previous PR at #2632. The CI itself seems buggy, was complaining about a file I've already removed from the project. Filing this new PR to see if this solves that problem.

@smengcl
Copy link
Contributor Author

smengcl commented Jan 25, 2021

@hadoop-yetus

This comment has been minimized.

@hadoop-yetus

This comment has been minimized.

@iwasakims
Copy link
Member

iwasakims commented Jan 26, 2021

The failure of TestDFSOutputStream#testCongestionBackoff is related.

[ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.594 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSOutputStream
[ERROR] testCongestionBackoff(org.apache.hadoop.hdfs.TestDFSOutputStream)  Time elapsed: 2.71 s  <<< FAILURE!
java.lang.AssertionError
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:832)
        at org.apache.hadoop.hdfs.TestDFSOutputStream.testCongestionBackoff(TestDFSOutputStream.java:299)

The NPE is on DataStreamer.java:702. DFSPacket#getTraceParents should be fixed?

java.lang.NullPointerException
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:702)
        at org.apache.hadoop.hdfs.TestDFSOutputStream.testCongestionBackoff(TestDFSOutputStream.java:299)

@iwasakims
Copy link
Member

TestBalancerWithHANameNodes and TestRollingUpgrade passed on my local.

@iwasakims
Copy link
Member

@smengcl I'm +1 to commit this after the test failure and findbugs warnings are fixed since this touches many files. We can address nits and documentation fix in follow-up JIRAs. Tracing does not affect usual code paths if not enabled.

  • mvn clean install -DskipTests -Pnative -Pdist worked.
  • The output of mvn dependency:tree looks good.

@smengcl
Copy link
Contributor Author

smengcl commented Jan 26, 2021

The failure of TestDFSOutputStream#testCongestionBackoff is related.

[ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.594 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSOutputStream
[ERROR] testCongestionBackoff(org.apache.hadoop.hdfs.TestDFSOutputStream)  Time elapsed: 2.71 s  <<< FAILURE!
java.lang.AssertionError
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:832)
        at org.apache.hadoop.hdfs.TestDFSOutputStream.testCongestionBackoff(TestDFSOutputStream.java:299)

The NPE is on DataStreamer.java:702. DFSPacket#getTraceParents should be fixed?

java.lang.NullPointerException
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:702)
        at org.apache.hadoop.hdfs.TestDFSOutputStream.testCongestionBackoff(TestDFSOutputStream.java:299)

Thanks @iwasakims for reviewing this. Yes they definitely look related. Will fix them.

@Sushmasree-28
Copy link
Contributor

@smengcl @iwasakims Thank you for the work here
I have taken your latest patch, compiled and trying to verify trace feature.
Just wanted to confirm, As per the PR, TraceAdminProtocols have been removed. Will these TraceAdminProtocol be handled in any follow-up JIRAs or will be removed permanently ?

@smengcl
Copy link
Contributor Author

smengcl commented Jan 27, 2021

@smengcl @iwasakims Thank you for the work here
I have taken your latest patch, compiled and trying to verify trace feature.
Just wanted to confirm, As per the PR, TraceAdminProtocols have been removed. Will these TraceAdminProtocol be handled in any follow-up JIRAs or will be removed permanently ?

Hi @Sushmasree-28 , thanks for taking your time to check this PR.

The current plan it to ditch the TraceAdminProtocol completely. As I previously (tentatively) implemented OpenTracing in #1846 , I don't find the legacy interface useful anymore (unless we want to maintain compatibility with HTrace -- which has become a potential security hazard -- and hence this PR). So I would just remove it here. Let me know if you have more concerns about it.

Change-Id: Ie00973147d99c41027642933364c03dbe8db4d8e
Change-Id: I3c57043354f063a9ef8ce9b149dde1a0500f82e2
@hadoop-yetus

This comment has been minimized.

Change-Id: Iece8222a67796768a7f1f016df3df867bd589ae4
@hadoop-yetus

This comment has been minimized.

@Sushmasree-28
Copy link
Contributor

@smengcl @iwasakims Thank you for the work here
I have taken your latest patch, compiled and trying to verify trace feature.
Just wanted to confirm, As per the PR, TraceAdminProtocols have been removed. Will these TraceAdminProtocol be handled in any follow-up JIRAs or will be removed permanently ?

Hi @Sushmasree-28 , thanks for taking your time to check this PR.

The current plan it to ditch the TraceAdminProtocol completely. As I previously (tentatively) implemented OpenTracing in #1846 , I don't find the legacy interface useful anymore (unless we want to maintain compatibility with HTrace -- which has become a potential security hazard -- and hence this PR). So I would just remove it here. Let me know if you have more concerns about it.

@smengcl , OK, thanks for the update regarding TraceAdminProtocol.

@Sushmasree-28
Copy link
Contributor

@smengcl We are also trying to replace Htrace with OpenTracing. Just curious to know, apart from replacing HTrace with No-Op tracer are you parallelly working on E2E tracing feature with OpenTracing as well?

@smengcl
Copy link
Contributor Author

smengcl commented Jan 29, 2021

@smengcl We are also trying to replace Htrace with OpenTracing. Just curious to know, apart from replacing HTrace with No-Op tracer are you parallelly working on E2E tracing feature with OpenTracing as well?

Yes as I mentioned #1846 was the POC patch for OpenTracing.

Now that OpenTelemetry has replaced OpenTracing, we should be looking at using OpenTelemetry instead. I haven't diven deep into the difference but I'm expecting the latter's APIs to be mostly compatible with its predecessor.

Change-Id: Iafb0db502aa352885d841a18b96fb8bd59230445
@smengcl
Copy link
Contributor Author

smengcl commented Jan 29, 2021

Update: TestBalancerWithHANameNodes passed locally. Addressed findbugs. Should expect the next CI to pass that check.

@smengcl smengcl marked this pull request as ready for review January 29, 2021 18:00
@smengcl smengcl self-assigned this Jan 29, 2021
@hadoop-yetus

This comment has been minimized.

@iwasakims
Copy link
Member

@smengcl LGTM overall. Could you address the findbugs warning? TestErasureCodingPoliciesWithRandomECPolicy and TestStandbyInProgressTail are not related.

Change-Id: I6b391862afe208fa78bf1de53644c889ab447fe1
@smengcl
Copy link
Contributor Author

smengcl commented Jan 31, 2021

@smengcl LGTM overall. Could you address the findbugs warning? TestErasureCodingPoliciesWithRandomECPolicy and TestStandbyInProgressTail are not related.

Done. Thanks!

@hadoop-yetus

This comment has been minimized.

@smengcl
Copy link
Contributor Author

smengcl commented Jan 31, 2021

findbugs passed after the last commit. TestNamenodeCapacityReport#testXceiverCount and TestEditLog are unrelated (ran and passed locally).

@smengcl
Copy link
Contributor Author

smengcl commented Jan 31, 2021

Just realized that when rebasing the patch some unwanted changes (minor license header diff, whitespace) is introduced. Will post a commit to amend that.

Change-Id: If8044adadac127b32154f6f5294311e92a46f63b
Change-Id: I0d5884d4e2cb6ee87307b81761c2f7b2d51a8884
@hadoop-yetus

This comment has been minimized.

Copy link
Member

@iwasakims iwasakims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@iwasakims iwasakims merged commit 1a205cc into apache:trunk Feb 1, 2021
@hadoop-yetus

This comment has been minimized.

@smengcl
Copy link
Contributor Author

smengcl commented Feb 1, 2021

Thanks @steveloughran @iwasakims @Sushmasree-28 for reviewing this patch. And thanks @iwasakims for merging this.

iwasakims pushed a commit to iwasakims/hadoop that referenced this pull request Oct 11, 2021
(cherry picked from commit 1a205cc)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants