WIP: 1110: Help recover in DOS attack #1311

Open: gtmills wants to merge 2 commits into base: 1110

gtmills (Contributor) commented Jun 5, 2025

The upstream changes are https://gerrit.openbmc.org/c/openbmc/bmcweb/+/80819 and https://gerrit.openbmc.org/c/openbmc/bmcweb/+/80839; both have merged. The DOS attack we are helping clear up is coming from the HMC. This PR contains 2 commits:

  1. Fix DOS attack scenario

Problem: When 201 connections were opened in parallel to the BMC and then closed immediately without sending a proper close_notify alert, the bmcweb server took several minutes to close the sockets. All 200 TCP sockets were in the CLOSE_WAIT state.

The journal log shows the line below:
[CRITICAL http_connection.hpp:213] 0x29d1ef0Max connection count exceeded.

New connections could not be made:

$ curl -k -H "X-Auth-Token:$bmc_token" -X GET https://${BMC_IP}/redfish/v1/AccountService/Accounts
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:2443

Fix: The bmcweb server failed to detect the end of stream at the TCP (SSL/TLS) layer. Added a check that detects the end of stream and then closes the connection and socket.
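For readers less familiar with why a missed end of stream strands sockets in CLOSE_WAIT, here is a minimal Python sketch of the general idea. It is not the bmcweb change itself (that is C++ and lives in the upstream Gerrit reviews); `read_until_eof` is a hypothetical helper name, and the sketch assumes a plain blocking socket rather than a TLS stream.

```python
import socket


def read_until_eof(conn: socket.socket, bufsize: int = 4096) -> int:
    """Read from conn until the peer closes, then close our side too.

    Returns the number of bytes received before the end of stream.
    """
    total = 0
    try:
        while True:
            chunk = conn.recv(bufsize)
            if not chunk:
                # recv() returning b"" means the peer has closed its side
                # (end of stream). For a TCP socket, the kernel keeps the
                # connection in CLOSE_WAIT until we close our side as well.
                break
            total += len(chunk)
    except OSError:
        # A reset or other socket error also ends the connection.
        pass
    finally:
        # Always release the descriptor so the connection slot is freed.
        conn.close()
    return total
```

The point of the sketch is the `if not chunk` branch: a server that never treats the empty read as "peer is gone" keeps the descriptor open and keeps counting it against its connection limit.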

Change-Id: I1c277db0b774d33c656b4a2b1bd14f3575535bec

  2. Do hard close if client disobeys protocol

There are cases where bmcweb might close the connection due to a violation of
the protocol. Currently these closes are done gracefully, under the assumption
that a client might attempt to recover. But this opens us up to
potentially leaving sockets open for far longer than we intend if the
client is completely gone, due to a disconnect or explicitly closing the
socket hard.

In cases where we get a protocol error, shut down the socket hard, rather
than attempt to do things "correctly".
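The difference between the two close paths can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the code from the commit: `close_connection`, `protocol_error`, and `grace_seconds` are hypothetical names, and a plain socket stands in for the TLS stream, so the graceful branch uses a TCP half-close where a TLS server would send close_notify and wait for the peer's reply.

```python
import socket


def close_connection(sock: socket.socket, protocol_error: bool,
                     grace_seconds: float = 5.0) -> None:
    """Close sock, skipping the graceful handshake after a protocol error."""
    if protocol_error:
        # Hard close: release the descriptor immediately and do not wait
        # for a peer that may already be gone.
        sock.close()
        return
    try:
        # Graceful close: announce that we are done writing, then give the
        # peer a bounded amount of time to finish and close its side.
        sock.shutdown(socket.SHUT_WR)
        sock.settimeout(grace_seconds)
        while sock.recv(4096):
            pass
    except OSError:
        # Timeout or socket error: stop waiting for the peer.
        pass
    finally:
        sock.close()
```

The graceful branch is the one that can hold a socket open while waiting on a client that will never answer, which is why the hard branch is preferable once the client has already broken the protocol.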

Tested:

I tested this MR using a script that opened 5,000 parallel
connections to the BMC and closed them immediately without
properly sending a close_notify alert.
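The test script itself is not attached to this PR. A rough sketch of what such a script might look like is below; everything in it (the function names, the `use_tls` switch, the timeout) is an assumption for illustration. It relies on the behaviour, as I understand it, that closing a Python `SSLSocket` with `close()` drops the TCP connection without sending close_notify, whereas `unwrap()` would perform the TLS shutdown.

```python
import socket
import ssl


def _insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks, like curl -k."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def open_and_abandon(host: str, port: int, count: int,
                     use_tls: bool = True, timeout: float = 5.0) -> int:
    """Hold count connections open at once, then drop them all abruptly.

    Returns how many connections were established.
    """
    context = _insecure_context() if use_tls else None
    conns = []
    for _ in range(count):
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            if context is not None:
                sock = context.wrap_socket(sock, server_hostname=host)
            conns.append(sock)
        except OSError:
            # The server refused or dropped this connection; keep going.
            continue
    for sock in conns:
        # close() tears down the TCP connection directly. Unlike unwrap(),
        # it does not send a TLS close_notify alert first.
        sock.close()
    return len(conns)
```

Something like `open_and_abandon(bmc_ip, 443, 5000)` would approximate the scenario described above, with `bmc_ip` and the port as placeholders. The local file descriptor limit on the client will likely need to be raised for that many simultaneous sockets.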

Observations:

The BMC became unresponsive for 30-40 seconds before recovering.

After recovery, it took approximately 90 seconds to close all
connections in QEMU.
On real hardware, connection closure times may be slightly higher
(though still within expected parameters).

Conclusion:
This behavior aligns with expectations.

After 90 seconds, observed that:

1) No sockets were in the CLOSE_WAIT state.

2) New connections could be made.

curl -k -H "X-Auth-Token:$bmc_token" https://${ip}/redfish/v1/AccountService/Accounts
{
  "@odata.id": "/redfish/v1/AccountService/Accounts",
  "@odata.type": "#ManagerAccountCollection.ManagerAccountCollection",
  "Description": "BMC User Accounts",
  "Members": [
    {
      "@odata.id": "/redfish/v1/AccountService/Accounts/root"
    }
  ],
  "Members@odata.count": 1,
  "Name": "Accounts Collection"
}
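
The "no sockets in CLOSE_WAIT" observation can be checked on the BMC in several ways. One possible approach, sketched here as an assumption rather than the method actually used in the test, is to count rows with state code `08` (CLOSE_WAIT) in the text of `/proc/net/tcp` or `/proc/net/tcp6` on Linux. `count_close_wait` is a hypothetical helper name.

```python
from typing import Optional

CLOSE_WAIT = "08"  # TCP state code for CLOSE_WAIT in /proc/net/tcp


def count_close_wait(table: str, local_port: Optional[int] = None) -> int:
    """Count CLOSE_WAIT rows in text read from /proc/net/tcp or tcp6.

    If local_port is given, only sockets bound to that local port count.
    """
    count = 0
    for line in table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) < 4 or fields[3] != CLOSE_WAIT:
            continue
        # local_address has the form HEXIP:HEXPORT.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port is None or port == local_port:
            count += 1
    return count
```

To use it, read the file contents and pass them in, for example `count_close_wait(open("/proc/net/tcp6").read(), local_port=443)`. The port number is a placeholder; which file applies depends on whether the server listens on an IPv4 or IPv6 socket.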

Change-Id: I6ab4347efd8fda9ae86bfbb8575666ad3eabe88c
Signed-off-by: Ed Tanous <etanous@nvidia.com>
Signed-off-by: Chandramohan Harkude <chandramohan.harkude@gmail.com>

gtmills changed the title from "1110: help in dos attack" to "1110: Help recover in DOS attack" on Jun 5, 2025

baemyung (Contributor) left a comment

It looks good to me

gtmills changed the title from "1110: Help recover in DOS attack" to "WIP: 1110: Help recover in DOS attack" on Jun 9, 2025

gtmills (Contributor, Author) commented Jun 10, 2025

Putting into 1120 at #1310 and will run with it for a while
