HDFS-16266. Add remote port information to HDFS audit log #3538

Merged: 2 commits merged into apache:trunk on Nov 4, 2021

Conversation

@tomscut (Contributor) commented Oct 10, 2021

JIRA: HDFS-16266

In our production environment, we occasionally encounter a problem where a user submits an abnormal computation task that triggers a sudden flood of requests, driving the NameNode's queueTime and processingTime very high and creating a large backlog of tasks.

We usually locate and kill the specific Spark, Flink, or MapReduce tasks based on metrics and audit logs. Currently, the IP and UGI are recorded in the audit log, but there is no port information, so it is sometimes difficult to locate the specific process. Therefore, I propose that we add the port information to the audit log so that we can easily track the upstream process.

Some projects, such as HBase and Alluxio, already include port information in their audit logs. I think it is also necessary to add port information to the HDFS audit log.

Before: [screenshot: before-hdfs-audit-log]

After: [screenshot: hdfs-audit-log]
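
For readers viewing this without the screenshots, here is an illustrative sketch of the change proposed at this point in the discussion (field values are made up; the real output is in the screenshots above):

    Before: ... allowed=true ugi=user1 ip=/192.168.1.100 cmd=listStatus src=/user/user1 dst=null perm=null proto=rpc
    After:  ... allowed=true ugi=user1 ip=/192.168.1.100:39654 cmd=listStatus src=/user/user1 dst=null perm=null proto=rpc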


@ferhui (Contributor) commented Oct 11, 2021

Hi @tomscut, is it necessary to add the port to the audit log? I am afraid this could have a big impact, because many users parse the audit log as a table, and this may change the format.

@tomscut (Contributor, Author) commented Oct 11, 2021

Hi @tomscut, is it necessary to add the port to the audit log? I am afraid this could have a big impact, because many users parse the audit log as a table, and this may change the format.

Thanks @ferhui for your review.

I have similar concerns, so I started a discussion in the Hadoop community, hoping to make members aware of this change and gather suggestions.

However, adding the port only modifies the content of the existing IP field, which has little impact on the overall layout. In our cluster, we parse the audit log with Vector and send the data to Kafka, and that pipeline is unaffected.

@tomscut (Contributor, Author) commented Oct 12, 2021

Hi @jojochuang @tasanuma @goiri @Hexiaoqiao, could you please also take a look at this? Do you think it is reasonable to add the remote port to the ip field, like ip=/0.0.0.0:1234? We could make it optional, as @iwasakims suggested. Thank you very much.

@aajisaka (Member) commented:

We could make it optional, as @iwasakims suggested. Thank you very much.

+1 to make it optional.

@tomscut (Contributor, Author) commented Oct 15, 2021 via email

@ferhui (Contributor) commented Oct 19, 2021

It's good that the port is optional.

@tomscut (Contributor, Author) commented Oct 19, 2021

It's good that the port is optional.

Thanks @ferhui for your comment. I made it configurable and committed just now.
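
For context, a hedged sketch of how the switch might be enabled programmatically; the property name dfs.namenode.audit.log.with.remote.port is an assumption based on this discussion and is not confirmed in the thread, so check hdfs-default.xml of your release for the actual key:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class EnableRemotePortAuditSketch {
      public static void main(String[] args) {
        Configuration conf = new HdfsConfiguration();
        // Assumed property name; the feature is expected to be off by default.
        conf.setBoolean("dfs.namenode.audit.log.with.remote.port", true);
        System.out.println("remote-port audit logging enabled: "
            + conf.getBoolean("dfs.namenode.audit.log.with.remote.port", false));
      }
    }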

@tomscut requested a review from @aajisaka on October 19, 2021 07:16

@tomscut (Contributor, Author) commented Oct 20, 2021

Failed junit tests | hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS

This failed unit test is unrelated to the change.

@tomscut (Contributor, Author) commented Oct 20, 2021

Hi @yakirgb @iwasakims @aajisaka @tasanuma @ayushtkn @ferhui, I made it configurable. Please help review the code. Thank you very much.

@tasanuma (Member) commented:

Thanks for updating the PR, @tomscut.

  • I discussed with my colleagues, and they suggested that adding a new port field, rather than expanding the existing IP field, would have less impact on users who analyze the audit logs. What do you think?
  • After HDFS-13293, the Router forwards the client IP via CallerContext. How about adding the client-side port to the CallerContext as well? Maybe we can consider it in another JIRA.

@tomscut (Contributor, Author) commented Oct 21, 2021

Thanks for updating the PR, @tomscut.

  • I discussed with my colleagues, and they suggested that adding a new port field, rather than expanding the existing IP field, would have less impact on users who analyze the audit logs. What do you think?
  • After HDFS-13293, the Router forwards the client IP via CallerContext. How about adding the client-side port to the CallerContext as well? Maybe we can consider it in another JIRA.

Thanks @tasanuma and your colleagues for the good advice, and sorry for the late reply.

I discussed with my colleagues, and they suggested that adding a new port field, rather than expanding the existing IP field, would have less impact on users who analyze the audit logs. What do you think?
I think it would be nice to put the port in a separate field, but since adding the port to the IP field is already optional, I'm a little unsure which way is more appropriate. I'd like to ask a few other committers to look at this and give some suggestions. Either way, I will update the PR promptly.
@aajisaka @iwasakims @ayushtkn @ferhui @Hexiaoqiao @goiri @jojochuang Could you please take a look at this and give some suggestions? Thank you very much!

After [HDFS-13293](https://issues.apache.org/jira/browse/HDFS-13293), the Router forwards the client IP via CallerContext. How about adding the client-side port to the CallerContext as well? Maybe we can consider it in another JIRA.
I would like to open a new JIRA to do this. Thank you for pointing this out.

@ayushtkn (Member) commented Oct 22, 2021

I haven't gone through the entire discussion/code, just the question of whether we should modify the existing field or add a new one. Technically both are correct, and I don't see any serious issue with either (not thinking too deeply). But I feel it might be a little easier for parsers to adapt if there were a new field, rather than having to figure out whether the existing field has a port or not. Just my thoughts; I am OK with whichever way most people agree on.
Anyway, whatever we do should be optional and guarded by a config.

@jojochuang (Contributor) commented:

The API is declared Public, Evolving. If it stays in Hadoop 3.4.0 I am fine with it.

We used to have an audit logger (Cloudera Navigator) that extends the AuditLogger interface. But we've moved away from that.

Performance:
It would have a slight performance penalty because every audit log op will always convert the InetAddress to a string, regardless of whether the audit logger is off (audit log level = debug, or dfs.namenode.audit.log.debug.cmdlist contains the excluded op). It's probably acceptable since the audit is logged outside of the namenode lock.

CallerContext:
The caller context is probably a better option when you want to do fine-grained post-mortem anyway. Maybe we can modify the caller context to attach the remote port so that it doesn't break API compatibility. Just a thought.
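
As a side note on the conversion cost and format mentioned above, here is a minimal, self-contained illustration (not code from this PR) of why the audit log's ip field carries a leading slash and what appending the remote port, as originally proposed, would look like; the address and port values are made up:

    import java.net.InetAddress;

    public class AuditIpFormatDemo {
      public static void main(String[] args) throws Exception {
        // InetAddress.toString() returns "hostname/literal-IP"; with no resolved
        // hostname the result is "/192.168.1.100", hence the ip=/... form in the audit log.
        InetAddress addr = InetAddress.getByAddress(new byte[] {(byte) 192, (byte) 168, 1, 100});
        System.out.println("ip=" + addr);                     // ip=/192.168.1.100

        int remotePort = 39654;                               // illustrative value
        System.out.println("ip=" + addr + ":" + remotePort);  // ip=/192.168.1.100:39654
      }
    }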

@tomscut (Contributor, Author) commented Oct 26, 2021

The API is declared Public, Evolving. If it stays in Hadoop 3.4.0 I am fine with it.

We used to have an audit logger (Cloudera Navigator) that extends the AuditLogger interface. But we've moved away from that.

Performance: It would have a slight performance penalty because every audit log op will always convert the InetAddress to a string, regardless of whether the audit logger is off (audit log level = debug, or dfs.namenode.audit.log.debug.cmdlist contains the excluded op). It's probably acceptable since the audit is logged outside of the namenode lock.

CallerContext: The caller context is probably a better option when you want to do fine-grained post-mortem anyway. Maybe we can modify the caller context to attach the remote port so that it doesn't break API compatibility. Just a thought.

I haven't gone through the entire discussion/code, just the question of whether we should modify the existing field or add a new one. Technically both are correct, and I don't see any serious issue with either (not thinking too deeply). But I feel it might be a little easier for parsers to adapt if there were a new field, rather than having to figure out whether the existing field has a port or not. Just my thoughts; I am OK with whichever way most people agree on. Anyway, whatever we do should be optional and guarded by a config.

Thanks @ayushtkn for your comments and suggestions.

@tomscut (Contributor, Author) commented Oct 26, 2021

The API is declared Public, Evolving. If it stays in Hadoop 3.4.0 I am fine with it.

We used to have an audit logger (Cloudera Navigator) that extends the AuditLogger interface. But we've moved away from that.

Performance: It would have a slight performance penalty because every audit log op will always convert the InetAddress to a string, regardless of whether the audit logger is off (audit log level = debug, or dfs.namenode.audit.log.debug.cmdlist contains the excluded op). It's probably acceptable since the audit is logged outside of the namenode lock.

CallerContext: The caller context is probably a better option when you want to do fine-grained post-mortem anyway. Maybe we can modify the caller context to attach the remote port so that it doesn't break API compatibility. Just a thought.

Thanks @jojochuang for your careful consideration and advice.

I think it's a very good idea to add the remote port to the CallerContext; this will not affect the compatibility @tasanuma mentioned. After the user enables the CallerContext, we add clientPort to the CallerContext, similar to how the Router sets clientIp in the CallerContext.
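
To make that concrete, here is a rough sketch (not this PR's actual code) of how a caller could attach clientPort to the CallerContext, following the comma/colon layout visible in the audit-log sample later in this thread; the class name and values are illustrative:

    import org.apache.hadoop.ipc.CallerContext;

    public class CallerContextPortSketch {
      public static void main(String[] args) {
        String clientIp = "1.1.1.1";   // illustrative values
        int clientPort = 33070;

        // Build a context of the form "CLI,clientIp:1.1.1.1,clientPort:33070".
        // Builder(String) is the simplest constructor; newer branches may offer
        // additional append helpers.
        CallerContext context = new CallerContext.Builder(
            "CLI,clientIp:" + clientIp + ",clientPort:" + clientPort).build();

        // Make it the current context so the NameNode can include it in the
        // audit log's callerContext field.
        CallerContext.setCurrent(context);
        System.out.println(CallerContext.getCurrent().getContext());
      }
    }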

@tomscut (Contributor, Author) commented Oct 26, 2021

Previously, I didn't think appending the port to the IP field would have too much impact as long as the feature was configurable, but users might still need to change their parsing rules.

Considering compatibility, @tasanuma suggests adding a separate field, and @jojochuang suggests putting the port in the CallerContext. If we put the port in the CallerContext, it will not affect field parsing. The content of the CallerContext is also dynamic, which is more flexible.

Thank you all for your advice and help. I will update the PR later.


@tomscut (Contributor, Author) commented Oct 27, 2021

Hi @tasanuma @jojochuang @ayushtkn @ferhui, could you please take a look at this PR again? Thank you very much.

The code went through multiple revisions, which introduced some incidental changes (removing some whitespace).

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 5s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 13m 27s Maven dependency ordering for branch
+1 💚 mvninstall 21m 40s trunk passed
+1 💚 compile 22m 29s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 19m 15s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 3m 47s trunk passed
+1 💚 mvnsite 3m 14s trunk passed
+1 💚 javadoc 2m 18s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 3m 24s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 5m 43s trunk passed
+1 💚 shadedclient 22m 34s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 2m 13s the patch passed
+1 💚 compile 21m 22s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 21m 22s the patch passed
+1 💚 compile 19m 50s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 19m 50s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 3m 49s root: The patch generated 0 new + 606 unchanged - 2 fixed = 606 total (was 608)
+1 💚 mvnsite 3m 10s the patch passed
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 javadoc 2m 16s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 3m 23s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 6m 10s the patch passed
+1 💚 shadedclient 22m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 58s hadoop-common in the patch passed.
+1 💚 unit 388m 27s hadoop-hdfs in the patch passed.
+1 💚 asflicense 1m 12s The patch does not generate ASF License warnings.
612m 49s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3538/10/artifact/out/Dockerfile
GITHUB PR #3538
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 63dfc831ddaf 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / cc82c7c
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3538/10/testReport/
Max. process+thread count 3304 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3538/10/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@jojochuang (Contributor) left a review comment:

lgtm +1

@tomscut (Contributor, Author) commented Oct 29, 2021

lgtm +1

Thanks @jojochuang for your comment.

@tomscut (Contributor, Author) commented Oct 29, 2021

Hi @tasanuma @aajisaka @ferhui @ayushtkn, could you also take a look? Thanks a lot.

@tasanuma (Member) commented:

Thanks for updating it, @tomscut.
I tried it with my RBF cluster. There is a client server (1.1.1.1), a DFS Router (2.2.2.2), and a NameNode. When a client sends a request to the Router, the NameNode logs the following.

INFO FSNamesystem.audit: allowed=true   ugi=tasanuma ip=/2.2.2.2       cmd=listStatus  src=/user/tasanuma      dst=null        perm=null       proto=rpc       callerContext=CLI,clientIp:1.1.1.1,clientPort:33070

In this case, clientIp:1.1.1.1 is the IP of the client server, but clientPort:33070 is the port of the DFS Router (2.2.2.2), not that of the client server. It would be confusing for users.

@tasanuma (Member) commented:

How about using the word "port" instead of "clientPort" here, and adding "clientPort" as the actual client-server port for RBF in another JIRA?

@tomscut (Contributor, Author) commented Oct 29, 2021

Thanks for updating it, @tomscut. I tried it with my RBF cluster. There is a client server (1.1.1.1), a DFS Router (2.2.2.2), and a NameNode. When a client sends a request to the Router, the NameNode logs the following.

INFO FSNamesystem.audit: allowed=true   ugi=tasanuma ip=/2.2.2.2       cmd=listStatus  src=/user/tasanuma      dst=null        perm=null       proto=rpc       callerContext=CLI,clientIp:1.1.1.1,clientPort:33070

In this case, clientIp:1.1.1.1 is the IP of the client server, but clientPort:33070 is the port of the DFS Router (2.2.2.2), not that of the client server. It would be confusing for users.

Thank you @tasanuma very much for your test. Because the Router acts as a client of the NameNode, the Router's port is indeed recorded as the clientPort in this case.

@tomscut (Contributor, Author) commented Oct 30, 2021

How about using the word "port" instead of "clientPort" here, and adding "clientPort" as the actual client-server port for RBF in another JIRA?

Thank you @tasanuma for your advice.

Based on your suggestion, I have an idea like this:

We keep the clientPort field as it is for now.

  1. When a client sends a request directly to the NameNode, clientPort records the port of the real client.

  2. When the client sends a request to the Router, which forwards it to the NameNode, we set the clientPort (the real client port) in the CallerContext on the Router side. Before the NameNode prints the audit log, if the CallerContext already contains a clientPort field, we do not set the port again. (I plan to do this in another PR.)

In both cases, only one clientPort field remains in the CallerContext, and it holds the actual client port. What do you think? Looking forward to your comments. Thank you.
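
A minimal sketch of the check described in step 2, assuming a NameNode-side helper that only adds clientPort when the incoming CallerContext does not already carry one; the helper name, constant, and values are illustrative rather than this PR's actual code:

    import org.apache.hadoop.ipc.CallerContext;

    public final class ClientPortContextSketch {
      private static final String CLIENT_PORT_KEY = "clientPort:"; // illustrative constant

      // Returns a caller context with exactly one clientPort entry: the forwarded
      // one if present (RBF case), otherwise the remote port observed by the
      // NameNode (direct-client case).
      static CallerContext withClientPort(CallerContext incoming, int remotePort) {
        if (incoming != null && incoming.getContext() != null
            && incoming.getContext().contains(CLIENT_PORT_KEY)) {
          // The Router (or another proxy) already set the real client port.
          return incoming;
        }
        String base = (incoming == null || incoming.getContext() == null)
            ? "" : incoming.getContext() + ",";
        return new CallerContext.Builder(base + CLIENT_PORT_KEY + remotePort).build();
      }

      public static void main(String[] args) {
        CallerContext fromRouter =
            new CallerContext.Builder("CLI,clientIp:1.1.1.1,clientPort:33070").build();
        System.out.println(withClientPort(fromRouter, 55555).getContext());
        // -> CLI,clientIp:1.1.1.1,clientPort:33070 (forwarded port wins)
        System.out.println(withClientPort(null, 55555).getContext());
        // -> clientPort:55555 (direct client: the NameNode-observed port is used)
      }
    }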

@tasanuma (Member) commented Nov 1, 2021

@tomscut Thanks for your thoughts. That makes sense to me.

@tasanuma (Member) left a review comment:

LGTM.

@tomscut (Contributor, Author) commented Nov 1, 2021

@tomscut Thanks for your thoughts. That makes sense to me.

Thanks @tasanuma for your reply and review.

@tomscut (Contributor, Author) commented Nov 2, 2021

Hi @aajisaka, could you please review this again? Thanks a lot.

@aajisaka (Member) left a review comment:

+1. Thank you @tomscut, @jojochuang, and @tasanuma for updating the patch and for the discussion.

In this case, clientIp:1.1.1.1 is the IP of the client server, but clientPort:33070 is the port of the DFS Router (2.2.2.2), not that of the client server. It would be confusing for users.

I suppose the name clientIp should be more specific, such as routerClientIp. And then we can add the router client port as routerClientPort.

@tomscut (Contributor, Author) commented Nov 2, 2021

routerClientIp

Thanks @aajisaka for your review and comment. I will make the changes in the other JIRA if necessary, because this might involve the CallerContext from the Router. What do you think? Thank you.

@aajisaka (Member) commented Nov 2, 2021

I will make the changes in the other JIRA if necessary, because this might involve the CallerContext from the Router. What do you think? Thank you.

Agreed. Let's discuss in a separate JIRA.

@tomscut (Contributor, Author) commented Nov 2, 2021

I will make the changes in the other JIRA if necessary, because this might involve the CallerContext from the Router. What do you think? Thank you.

Agreed. Let's discuss in a separate JIRA.

Thanks @aajisaka for your quick reply.

@tomscut (Contributor, Author) commented Nov 3, 2021

Hi @tasanuma @jojochuang @aajisaka, could you please help merge this PR? I will open a new JIRA based on it. Thanks a lot.

@tasanuma (Member) commented Nov 4, 2021

There are three +1s, so I'd like to merge it.

@tasanuma merged commit 359b03c into apache:trunk on Nov 4, 2021
@tasanuma (Member) commented Nov 4, 2021

Thanks for your contribution, @tomscut. Thanks all for your discussion and reviews.

@tomscut (Contributor, Author) commented Nov 4, 2021

Thanks @tasanuma for the merge. Thanks all for your reviews and suggestions.

omalley pushed a commit that referenced this pull request Mar 14, 2022
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
Cherry-picked from 359b03c by Owen O'Malley
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
LiuGuH pushed a commit to LiuGuH/hadoop that referenced this pull request Mar 26, 2024
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
NyteKnight pushed a commit to NyteKnight/hadoop that referenced this pull request Jun 25, 2024
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
Cherry-picked from 359b03c by Owen O'Malley