HDFS-16266. Add remote port information to HDFS audit log #3538

Merged: 2 commits merged into apache:trunk on Nov 4, 2021

Conversation

@tomscut (Contributor) commented Oct 10, 2021

JIRA: HDFS-16266

In our production environment, we occasionally encounter a problem where a user submits an abnormal computation task that triggers a sudden flood of requests, driving the NameNode's queueTime and processingTime very high and creating a large backlog of tasks.

We usually locate and kill the specific Spark, Flink, or MapReduce tasks based on metrics and audit logs. Currently, the IP and UGI are recorded in the audit log, but there is no port information, so it is sometimes difficult to locate the specific process. Therefore, I propose that we add the port information to the audit log so that we can easily track the upstream process.

Some projects, such as HBase and Alluxio, already include port information in their audit logs. I think it is also necessary to add port information to the HDFS audit log.

Before: [screenshot: before-hdfs-audit-log]

After: [screenshot: hdfs-audit-log]
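
For readers viewing this without the screenshots, here is an illustrative sketch of the change proposed at this point in the discussion (field values are made up; the real output is in the screenshots above):

    Before: ... allowed=true ugi=user1 ip=/192.168.1.100 cmd=listStatus src=/user/user1 dst=null perm=null proto=rpc
    After:  ... allowed=true ugi=user1 ip=/192.168.1.100:39654 cmd=listStatus src=/user/user1 dst=null perm=null proto=rpc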


@ferhui (Contributor) commented Oct 11, 2021

Hi @tomscut, is it necessary to add the port to the audit log? I am afraid this could have a big impact, because many users parse the audit log as a table, and this may change the format.

@tomscut (Contributor, Author) commented Oct 11, 2021

Hi @tomscut, is it necessary to add the port to the audit log? I am afraid this could have a big impact, because many users parse the audit log as a table, and this may change the format.

Thanks @ferhui for your review.

I have similar concerns, so I started a discussion in the Hadoop community, hoping to make members aware of this change and gather suggestions.

However, adding the port only modifies the content of the existing IP field, which has little impact on the overall layout. In our cluster, we parse the audit log with Vector and send the data to Kafka, and that pipeline is unaffected.

@tomscut (Contributor, Author) commented Oct 12, 2021

Hi @jojochuang @tasanuma @goiri @Hexiaoqiao, could you please also take a look at this? Do you think it is reasonable to add the remote port to the ip field, like ip=/0.0.0.0:1234? We could make it optional, as @iwasakims suggested. Thank you very much.

@aajisaka (Member) commented:

We could make it optional, as @iwasakims suggested. Thank you very much.

+1 to make it optional.

@tomscut (Contributor, Author) commented Oct 15, 2021 via email

@ferhui (Contributor) commented Oct 19, 2021

It's good that the port is optional.

@tomscut (Contributor, Author) commented Oct 19, 2021

It's good that the port is optional.

Thanks @ferhui for your comment. I made it configurable and committed just now.
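
For context, a hedged sketch of how the switch might be enabled programmatically; the property name dfs.namenode.audit.log.with.remote.port is an assumption based on this discussion and is not confirmed in the thread, so check hdfs-default.xml of your release for the actual key:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class EnableRemotePortAuditSketch {
      public static void main(String[] args) {
        Configuration conf = new HdfsConfiguration();
        // Assumed property name; the feature is expected to be off by default.
        conf.setBoolean("dfs.namenode.audit.log.with.remote.port", true);
        System.out.println("remote-port audit logging enabled: "
            + conf.getBoolean("dfs.namenode.audit.log.with.remote.port", false));
      }
    }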

@tomscut requested a review from @aajisaka on October 19, 2021 07:16

@tomscut (Contributor, Author) commented Oct 20, 2021

Failed junit tests | hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS

This failed unit test is unrelated to the change.

@tomscut (Contributor, Author) commented Oct 20, 2021

Hi @yakirgb @iwasakims @aajisaka @tasanuma @ayushtkn @ferhui, I made it configurable. Please help review the code. Thank you very much.

@tasanuma (Member) commented:

Thanks for updating the PR, @tomscut.

  • I discussed with my colleagues, and they suggested that adding a new port field, rather than expanding the existing IP field, would have less impact on users who analyze the audit logs. What do you think?
  • After HDFS-13293, the Router forwards the client IP via CallerContext. How about adding the client-side port to the CallerContext as well? Maybe we can consider it in another JIRA.

@tomscut (Contributor, Author) commented Oct 21, 2021

Thanks for updating the PR, @tomscut.

  • I discussed with my colleagues, and they suggested that adding a new port field, rather than expanding the existing IP field, would have less impact on users who analyze the audit logs. What do you think?
  • After HDFS-13293, the Router forwards the client IP via CallerContext. How about adding the client-side port to the CallerContext as well? Maybe we can consider it in another JIRA.

Thanks @tasanuma and your colleagues for the good advice, and sorry for the late reply.

I discussed with my colleagues, and they suggested that adding a new port field, rather than expanding the existing IP field, would have less impact on users who analyze the audit logs. What do you think?
I think it would be nice to put the port in a separate field, but since adding the port to the IP field is already optional, I'm a little unsure which way is more appropriate. I'd like to ask a few other committers to look at this and give some suggestions. Either way, I will update the PR promptly.
@aajisaka @iwasakims @ayushtkn @ferhui @Hexiaoqiao @goiri @jojochuang Could you please take a look at this and give some suggestions? Thank you very much!

After [HDFS-13293](https://issues.apache.org/jira/browse/HDFS-13293), the Router forwards the client IP via CallerContext. How about adding the client-side port to the CallerContext as well? Maybe we can consider it in another JIRA.
I would like to open a new JIRA to do this. Thank you for pointing this out.

@ayushtkn (Member) commented Oct 22, 2021

I haven't gone through the entire discussion/code, just the question of whether we should modify the existing field or add a new one. Technically both are correct, and I don't see any serious issue with either (not thinking too deeply). But I feel it might be a little easier for parsers to adapt if there were a new field, rather than having to figure out whether the existing field has a port or not. Just my thoughts; I am OK with whichever way most people agree on.
Anyway, whatever we do should be optional and guarded by a config.

@jojochuang (Contributor) commented:

The API is declared Public, Evolving. If it stays in Hadoop 3.4.0 I am fine with it.

We used to have an audit logger (Cloudera Navigator) that extends the AuditLogger interface. But we've moved away from that.

Performance:
It would have a slight performance penalty because every audit log op will always convert the InetAddress to a string, regardless of whether the audit logger is off (audit log level = debug, or dfs.namenode.audit.log.debug.cmdlist contains the excluded op). It's probably acceptable since the audit is logged outside of the namenode lock.

CallerContext:
The caller context is probably a better option when you want to do fine-grained post-mortem anyway. Maybe we can modify the caller context to attach the remote port so that it doesn't break API compatibility. Just a thought.
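
As a side note on the conversion cost and format mentioned above, here is a minimal, self-contained illustration (not code from this PR) of why the audit log's ip field carries a leading slash and what appending the remote port, as originally proposed, would look like; the address and port values are made up:

    import java.net.InetAddress;

    public class AuditIpFormatDemo {
      public static void main(String[] args) throws Exception {
        // InetAddress.toString() returns "hostname/literal-IP"; with no resolved
        // hostname the result is "/192.168.1.100", hence the ip=/... form in the audit log.
        InetAddress addr = InetAddress.getByAddress(new byte[] {(byte) 192, (byte) 168, 1, 100});
        System.out.println("ip=" + addr);                     // ip=/192.168.1.100

        int remotePort = 39654;                               // illustrative value
        System.out.println("ip=" + addr + ":" + remotePort);  // ip=/192.168.1.100:39654
      }
    }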

@tomscut (Contributor, Author) commented Oct 26, 2021

The API is declared Public, Evolving. If it stays in Hadoop 3.4.0 I am fine with it.

We used to have an audit logger (Cloudera Navigator) that extends the AuditLogger interface. But we've moved away from that.

Performance: It would have a slight performance penalty because every audit log op will always convert the InetAddress to a string, regardless of whether the audit logger is off (audit log level = debug, or dfs.namenode.audit.log.debug.cmdlist contains the excluded op). It's probably acceptable since the audit is logged outside of the namenode lock.

CallerContext: The caller context is probably a better option when you want to do fine-grained post-mortem anyway. Maybe we can modify the caller context to attach the remote port so that it doesn't break API compatibility. Just a thought.

I haven't gone through the entire discussion/code, just the question of whether we should modify the existing field or add a new one. Technically both are correct, and I don't see any serious issue with either (not thinking too deeply). But I feel it might be a little easier for parsers to adapt if there were a new field, rather than having to figure out whether the existing field has a port or not. Just my thoughts; I am OK with whichever way most people agree on. Anyway, whatever we do should be optional and guarded by a config.

Thanks @ayushtkn for your comments and suggestions.

@tomscut (Contributor, Author) commented Oct 26, 2021

The API is declared Public, Evolving. If it stays in Hadoop 3.4.0 I am fine with it.

We used to have an audit logger (Cloudera Navigator) that extends the AuditLogger interface. But we've moved away from that.

Performance: It would have a slight performance penalty because every audit log op will always convert the InetAddress to a string, regardless of whether the audit logger is off (audit log level = debug, or dfs.namenode.audit.log.debug.cmdlist contains the excluded op). It's probably acceptable since the audit is logged outside of the namenode lock.

CallerContext: The caller context is probably a better option when you want to do fine-grained post-mortem anyway. Maybe we can modify the caller context to attach the remote port so that it doesn't break API compatibility. Just a thought.

Thanks @jojochuang for your careful consideration and advice.

I think it's a very good idea to add the remote port to the CallerContext; this will not affect the compatibility @tasanuma mentioned. After the user enables the CallerContext, we add clientPort to the CallerContext, similar to how the Router sets clientIp in the CallerContext.
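
To make that concrete, here is a rough sketch (not this PR's actual code) of how a caller could attach clientPort to the CallerContext, following the comma/colon layout visible in the audit-log sample later in this thread; the class name and values are illustrative:

    import org.apache.hadoop.ipc.CallerContext;

    public class CallerContextPortSketch {
      public static void main(String[] args) {
        String clientIp = "1.1.1.1";   // illustrative values
        int clientPort = 33070;

        // Build a context of the form "CLI,clientIp:1.1.1.1,clientPort:33070".
        // Builder(String) is the simplest constructor; newer branches may offer
        // additional append helpers.
        CallerContext context = new CallerContext.Builder(
            "CLI,clientIp:" + clientIp + ",clientPort:" + clientPort).build();

        // Make it the current context so the NameNode can include it in the
        // audit log's callerContext field.
        CallerContext.setCurrent(context);
        System.out.println(CallerContext.getCurrent().getContext());
      }
    }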

@tomscut (Contributor, Author) commented Oct 26, 2021

Previously, I didn't think appending the port to the IP field would have too much impact as long as the feature was configurable, but users might still need to change their parsing rules.

Considering compatibility, @tasanuma suggests adding a separate field, and @jojochuang suggests putting the port in the CallerContext. If we put the port in the CallerContext, it will not affect field parsing. The content of the CallerContext is also dynamic, which is more flexible.

Thank you all for your advice and help. I will update the PR later.


@tomscut (Contributor, Author) commented Oct 27, 2021

Hi @tasanuma @jojochuang @ayushtkn @ferhui, could you please take a look at this PR again? Thank you very much.

The code went through multiple revisions, which introduced some incidental changes (removing some whitespace).

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 5s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 13m 27s Maven dependency ordering for branch
+1 💚 mvninstall 21m 40s trunk passed
+1 💚 compile 22m 29s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 19m 15s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 3m 47s trunk passed
+1 💚 mvnsite 3m 14s trunk passed
+1 💚 javadoc 2m 18s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 3m 24s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 43s trunk passed
+1 💚 shadedclient 22m 34s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 2m 13s the patch passed
+1 💚 compile 21m 22s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 21m 22s the patch passed
+1 💚 compile 19m 50s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 19m 50s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 49s root: The patch generated 0 new + 606 unchanged - 2 fixed = 606 total (was 608)
+1 💚 mvnsite 3m 10s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 2m 16s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 3m 23s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 6m 10s the patch passed
+1 💚 shadedclient 22m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 58s hadoop-common in the patch passed.
+1 💚 unit 388m 27s hadoop-hdfs in the patch passed.
+1 💚 asflicense 1m 12s The patch does not generate ASF License warnings.
612m 49s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3538/10/artifact/out/Dockerfile
GITHUB PR #3538
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 63dfc831ddaf 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / cc82c7c
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3538/10/testReport/
Max. process+thread count 3304 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3538/10/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@jojochuang (Contributor) left a review comment:

lgtm +1

@tomscut (Contributor, Author) commented Oct 29, 2021

lgtm +1

Thanks @jojochuang for your comment.

@tomscut (Contributor, Author) commented Oct 29, 2021

Hi @tasanuma @aajisaka @ferhui @ayushtkn, could you also take a look? Thanks a lot.

@tasanuma (Member) commented:

Thanks for updating it, @tomscut.
I tried it with my RBF cluster. There is a client server (1.1.1.1), a DFS Router (2.2.2.2), and a NameNode. When a client sends a request to the Router, the NameNode logs the following.

INFO FSNamesystem.audit: allowed=true   ugi=tasanuma ip=/2.2.2.2       cmd=listStatus  src=/user/tasanuma      dst=null        perm=null       proto=rpc       callerContext=CLI,clientIp:1.1.1.1,clientPort:33070

In this case, clientIp:1.1.1.1 is the IP of the client server, but clientPort:33070 is the port of the DFS Router (2.2.2.2), not that of the client server. It would be confusing for users.

@tasanuma (Member) commented:

How about using the word "port" instead of "clientPort" here, and adding "clientPort" as the actual client-server port for RBF in another JIRA?

@tomscut (Contributor, Author) commented Oct 29, 2021

Thanks for updating it, @tomscut. I tried it with my RBF cluster. There is a client server (1.1.1.1), a DFS Router (2.2.2.2), and a NameNode. When a client sends a request to the Router, the NameNode logs the following.

INFO FSNamesystem.audit: allowed=true   ugi=tasanuma ip=/2.2.2.2       cmd=listStatus  src=/user/tasanuma      dst=null        perm=null       proto=rpc       callerContext=CLI,clientIp:1.1.1.1,clientPort:33070

In this case, clientIp:1.1.1.1 is the IP of the client server, but clientPort:33070 is the port of the DFS Router (2.2.2.2), not that of the client server. It would be confusing for users.

Thank you @tasanuma very much for your test. Because the Router acts as a client of the NameNode, the Router's port is indeed recorded as the clientPort in this case.

@tomscut (Contributor, Author) commented Oct 30, 2021

How about using the word "port" instead of "clientPort" here, and adding "clientPort" as the actual client-server port for RBF in another JIRA?

Thank you @tasanuma for your advice.

Based on your suggestion, I have an idea like this:

We keep the clientPort field as it is for now.

  1. When a client sends a request directly to the NameNode, clientPort records the port of the real client.

  2. When the client sends a request to the Router, which forwards it to the NameNode, we set the clientPort (the real client port) in the CallerContext on the Router side. Before the NameNode prints the audit log, if the CallerContext already contains a clientPort field, we do not set the port again. (I plan to do this in another PR.)

In both cases, only one clientPort field remains in the CallerContext, and it holds the actual client port. What do you think? Looking forward to your comments. Thank you.
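
A minimal sketch of the check described in step 2, assuming a NameNode-side helper that only adds clientPort when the incoming CallerContext does not already carry one; the helper name, constant, and values are illustrative rather than this PR's actual code:

    import org.apache.hadoop.ipc.CallerContext;

    public final class ClientPortContextSketch {
      private static final String CLIENT_PORT_KEY = "clientPort:"; // illustrative constant

      // Returns a caller context with exactly one clientPort entry: the forwarded
      // one if present (RBF case), otherwise the remote port observed by the
      // NameNode (direct-client case).
      static CallerContext withClientPort(CallerContext incoming, int remotePort) {
        if (incoming != null && incoming.getContext() != null
            && incoming.getContext().contains(CLIENT_PORT_KEY)) {
          // The Router (or another proxy) already set the real client port.
          return incoming;
        }
        String base = (incoming == null || incoming.getContext() == null)
            ? "" : incoming.getContext() + ",";
        return new CallerContext.Builder(base + CLIENT_PORT_KEY + remotePort).build();
      }

      public static void main(String[] args) {
        CallerContext fromRouter =
            new CallerContext.Builder("CLI,clientIp:1.1.1.1,clientPort:33070").build();
        System.out.println(withClientPort(fromRouter, 55555).getContext());
        // -> CLI,clientIp:1.1.1.1,clientPort:33070 (forwarded port wins)
        System.out.println(withClientPort(null, 55555).getContext());
        // -> clientPort:55555 (direct client: the NameNode-observed port is used)
      }
    }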

@tasanuma (Member) commented Nov 1, 2021

@tomscut Thanks for your thoughts. That makes sense to me.

@tasanuma (Member) left a review comment:

LGTM.

@tomscut (Contributor, Author) commented Nov 1, 2021

@tomscut Thanks for your thoughts. That makes sense to me.

Thanks @tasanuma for your reply and review.

@tomscut (Contributor, Author) commented Nov 2, 2021

Hi @aajisaka, could you please review this again? Thanks a lot.

@aajisaka (Member) left a review comment:

+1. Thank you @tomscut, @jojochuang, and @tasanuma for updating the patch and for the discussion.

In this case, clientIp:1.1.1.1 is the IP of the client server, but clientPort:33070 is the port of the DFS Router (2.2.2.2), not that of the client server. It would be confusing for users.

I suppose the name clientIp should be more specific, such as routerClientIp. And then we can add the router client port as routerClientPort.

@tomscut (Contributor, Author) commented Nov 2, 2021

routerClientIp

Thanks @aajisaka for your review and comment. I will make the changes in the other JIRA if necessary, because this might involve the CallerContext from the Router. What do you think? Thank you.

@aajisaka (Member) commented Nov 2, 2021

I will make the changes in the other JIRA if necessary, because this might involve the CallerContext from the Router. What do you think? Thank you.

Agreed. Let's discuss in a separate JIRA.

@tomscut (Contributor, Author) commented Nov 2, 2021

I will make the changes in the other JIRA if necessary, because this might involve the CallerContext from the Router. What do you think? Thank you.

Agreed. Let's discuss in a separate JIRA.

Thanks @aajisaka for your quick reply.

@tomscut (Contributor, Author) commented Nov 3, 2021

Hi @tasanuma @jojochuang @aajisaka, could you please help merge this PR? I will open a new JIRA based on it. Thanks a lot.

@tasanuma (Member) commented Nov 4, 2021

There are three +1s, so I'd like to merge it.

@tasanuma merged commit 359b03c into apache:trunk on Nov 4, 2021
@tasanuma (Member) commented Nov 4, 2021

Thanks for your contribution, @tomscut. Thanks all for your discussion and reviews.

@tomscut (Contributor, Author) commented Nov 4, 2021

Thanks @tasanuma for the merge. Thanks all for your reviews and suggestions.

omalley pushed a commit that referenced this pull request Mar 14, 2022
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
Cherry-picked from 359b03c by Owen O'Malley
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
LiuGuH pushed a commit to LiuGuH/hadoop that referenced this pull request Mar 26, 2024
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
NyteKnight pushed a commit to NyteKnight/hadoop that referenced this pull request Jun 25, 2024
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
Cherry-picked from 359b03c by Owen O'Malley