WIP: 1110: Help recover in DOS attack #1311

gtmills · 2025-06-05T18:20:55Z

The upstream changes are https://gerrit.openbmc.org/c/openbmc/bmcweb/+/80819 and https://gerrit.openbmc.org/c/openbmc/bmcweb/+/80839; both have merged. The DOS attack we are helping to clear up is coming from the HMC. This PR has 2 commits:

Fix DOS attack scenario

Problem: When 201 connections were made to the BMC in parallel and then closed immediately without sending a close_notify alert, the bmcweb server took several minutes to close the sockets. All 200 TCP sockets were in the CLOSE_WAIT state.

The journal log shows the following line:
[CRITICAL http_connection.hpp:213] 0x29d1ef0Max connection count exceeded.

New connections could not be made:

$ curl -k -H "X-Auth-Token:$bmc_token" -X GET https://${BMC_IP}/redfish/v1/AccountService/Accounts
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 127.0.0.1:2443

Fix: The bmcweb server failed to detect the end of stream at the TCP (SSL/TLS)
layer, so a check was added that detects the end of stream and closes the
connection and socket.
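
To illustrate the idea behind this fix, here is a minimal Python sketch of a read loop that treats end of stream as a signal to close the local socket. It is not the bmcweb change itself (bmcweb is C++), and the function name `drain_until_eof` is hypothetical. The point it shows is that once the peer has closed, the socket stays in CLOSE_WAIT until the local side also closes it.

```python
import socket


def drain_until_eof(conn: socket.socket, bufsize: int = 4096) -> int:
    """Read from conn until the peer closes, then close our side as well.

    Returns the number of bytes received before end of stream.
    """
    total = 0
    try:
        while True:
            chunk = conn.recv(bufsize)
            if not chunk:
                # recv() returning b"" means the peer has closed its side
                # (end of stream); there is nothing more to wait for.
                break
            total += len(chunk)
    finally:
        # Closing here is what releases the socket; without it the
        # connection would linger in CLOSE_WAIT.
        conn.close()
    return total
```

A server that skips the end-of-stream check keeps counting such connections against its connection limit, which would match the "Max connection count exceeded" symptom above.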

Change-Id: I1c277db0b774d33c656b4a2b1bd14f3575535bec

Do hard close if client disobeys protocol

There are cases where bmcweb might close the connection due to a violation of
the protocol. Currently these closes are done gracefully, under the assumption
that a client might attempt to recover. But this opens us up to
potentially leaving sockets open for far longer than we intend if the
client is completely gone, due to a disconnect or explicitly closing the
socket hard.

In cases where we get a protocol error, shut down the socket hard, rather
than attempt to do things "correctly".
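
The difference between the two close paths can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the bmcweb code: `close_connection`, `protocol_error`, and `timeout` are hypothetical names, and a plain socket stands in for the TLS stream.

```python
import socket


def close_connection(conn: socket.socket, protocol_error: bool,
                     timeout: float = 1.0) -> str:
    """Close conn, either hard or gracefully, and report which path ran."""
    if protocol_error:
        # Hard close: release the socket right away and do not wait for
        # the peer, which may already be gone.
        conn.close()
        return "hard"

    # Graceful close: announce that we are done sending, then give the
    # peer a bounded amount of time to finish its side.
    try:
        conn.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # the peer may already have torn the connection down
    conn.settimeout(timeout)
    try:
        while conn.recv(4096):
            pass  # discard anything the peer still sends
    except OSError:
        pass  # timeout or reset: stop waiting
    conn.close()
    return "graceful"
```

The graceful path depends on the peer cooperating, which is exactly what cannot be assumed after a protocol violation.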

Tested:

I tested this MR using a script that opened 5,000 connections to the BMC in
parallel and closed them immediately without sending a close_notify alert.
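
The test script itself is not included here. A minimal sketch of what such a script might look like, assuming Python and its standard `ssl` module, is shown below. The names `open_and_abandon` and `flood`, the worker count, and the timeout are all hypothetical. Certificate checks are disabled to mirror `curl -k`.

```python
import socket
import ssl
from concurrent.futures import ThreadPoolExecutor


def open_and_abandon(ctx: ssl.SSLContext, host: str, port: int,
                     timeout: float = 5.0) -> bool:
    """Complete a TLS handshake, then drop the connection abruptly.

    Returns True if the handshake succeeded, False otherwise.
    """
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    try:
        tls = ctx.wrap_socket(raw, server_hostname=host)
    except OSError:
        raw.close()
        return False
    # close() drops the socket without calling unwrap(), so no
    # close_notify alert is sent to the server.
    tls.close()
    return True


def flood(host: str, port: int, count: int, workers: int = 200):
    """Open `count` connections in parallel; return (opened, failed)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False          # same effect as curl -k
    ctx.verify_mode = ssl.CERT_NONE
    pool_size = max(1, min(workers, count))
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(
            lambda _: open_and_abandon(ctx, host, port), range(count)))
    opened = sum(results)
    return opened, count - opened
```

A call such as `flood(bmc_host, bmc_port, 5000)` would approximate the scenario described, with `bmc_host` and `bmc_port` supplied by the tester.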

Observations:

The BMC became unresponsive for 30-40 seconds before recovering.

After recovery, it took approximately 90 seconds to close all
connections in QEMU.
On real hardware, connection closure times may be slightly higher
(though still within expected parameters).

Conclusion:
This behavior aligns with expectations.

After 90 seconds, I observed that:

1) No sockets were in the CLOSE_WAIT state.

2) New connections could be made.

curl -k -H "X-Auth-Token:$bmc_token" https://${ip}/redfish/v1/AccountService/Accounts
{
  "@odata.id": "/redfish/v1/AccountService/Accounts",
  "@odata.type": "#ManagerAccountCollection.ManagerAccountCollection",
  "Description": "BMC User Accounts",
  "Members": [
    {
      "@odata.id": "/redfish/v1/AccountService/Accounts/root"
    }
  ],
  "Members@odata.count": 1,
  "Name": "Accounts Collection"
}
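
The description does not say how the CLOSE_WAIT check was performed. One possible way on Linux, sketched here as an assumption, is to count entries in `/proc/net/tcp` whose state field is `08`, the kernel's code for CLOSE_WAIT. The helper name `count_close_wait` is hypothetical.

```python
CLOSE_WAIT = "08"  # TCP state code used in /proc/net/tcp on Linux


def count_close_wait(proc_net_tcp_text: str) -> int:
    """Count sockets in CLOSE_WAIT in the text of /proc/net/tcp."""
    count = 0
    # The first line is a column header; each following line is one socket.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count
```

On the BMC this could be fed with the contents of `/proc/net/tcp` (and `/proc/net/tcp6` for IPv6 sockets); a result of 0 corresponds to observation 1 above.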

Change-Id: I6ab4347efd8fda9ae86bfbb8575666ad3eabe88c
Signed-off-by: Ed Tanous <etanous@nvidia.com>
Signed-off-by: Chandramohan Harkude <chandramohan.harkude@gmail.com>


baemyung

It looks good to me


gtmills · 2025-06-10T20:01:25Z

Putting into 1120 at #1310 and will run with it for a while

gtmills changed the title ~~1110: help in dos attack~~ 1110: Help recover in DOS attack Jun 5, 2025

baemyung approved these changes Jun 6, 2025


gtmills changed the title ~~1110: Help recover in DOS attack~~ WIP: 1110: Help recover in DOS attack Jun 9, 2025

gtmills mentioned this pull request Jun 10, 2025

1120: Help recover in DOS attack #1310

Merged
