Periodic OOM incidents on Testnet storage nodes #1319
Comments
Observations:
Possible reproductions:
Virtual memory consumption is bigger than what the Go runtime reports, so maybe some direct OS allocations (outside the Go heap) cause the memory growth.
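This is not neofs-node code, just a minimal diagnostic sketch of how such a gap can be observed: compare the Go runtime's own accounting from `runtime.ReadMemStats` with the resident and virtual sizes the kernel reports in `/proc/self/status`. A large difference between `VmSize`/`VmRSS` and the Go numbers points at memory obtained outside the Go heap (cgo, mmap'd files, and similar).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
)

func main() {
	// What the Go runtime itself accounts for.
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("Go heap in use: %d MiB, obtained from OS by Go: %d MiB\n",
		ms.HeapInuse>>20, ms.Sys>>20)

	// What the kernel reports for the whole process (Linux only).
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return // not Linux, nothing more to show
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "VmRSS:") || strings.HasPrefix(line, "VmSize:") {
			fmt.Println(line) // resident and virtual size as seen by the OS
		}
	}
}
```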
To create an OOM condition in dev-env:
On my machine it consistently fails in the middle of the second object put.
I have conducted some experiments (1 GB of memory on each node, as described in the previous post):
When we put an object into the blobstor (https://github.com/nspcc-dev/neofs-node/blob/master/pkg/local_object_storage/blobstor/put.go#L38), we do the following steps:
So if the object size is close to the maximum (64 MiB), we allocate an additional 64 MiB at step 2 and yet another 64 MiB at step 3 (actually, if the object is non-compressible, we allocate twice, because the initial capacity of the destination buffer is equal to the object size). This explains the sudden increase in memory consumption right before the OOM.
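A minimal, self-contained sketch of this allocation pattern (this is not the actual blobstor code; the `maxObjectSize` constant and the use of the `github.com/klauspost/compress/zstd` encoder are assumptions made for illustration): a put of a near-maximum object can transiently hold roughly three object-sized buffers, namely the payload itself, the marshaled copy, and the compression destination whose capacity is pre-sized to the object length.

```go
package main

import (
	"fmt"
	"math/rand"

	"github.com/klauspost/compress/zstd"
)

// Illustrative constant mirroring the maximum object size mentioned above.
const maxObjectSize = 64 << 20

func main() {
	// The object payload already held in memory when Put is called.
	// Random bytes emulate a non-compressible object.
	payload := make([]byte, maxObjectSize)
	rand.Read(payload)

	// "Step 2" of the sketch: marshaling produces a second buffer of roughly
	// the same size as the object (emulated here with a plain copy).
	marshaled := append([]byte(nil), payload...)

	// "Step 3" of the sketch: compression writes into yet another buffer whose
	// initial capacity equals the object size, so a third ~64 MiB chunk is
	// allocated up front; for non-compressible data the output stays that big.
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		panic(err)
	}
	compressed := enc.EncodeAll(marshaled, make([]byte, 0, len(marshaled)))

	fmt.Printf("payload: %d MiB, marshaled copy: %d MiB, compressed copy: %d MiB\n",
		len(payload)>>20, len(marshaled)>>20, len(compressed)>>20)
}
```

Reusing marshal and compression buffers, or streaming the payload instead of holding full copies, would lower this peak; the sketch only demonstrates why the spike is on the order of two to three object sizes.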
Was fixed.
Some NeoFS Testnet storage nodes (nagisa, ai, yu) are periodically killed by an OOM signal from the OS. All these nodes have ~2 GB RAM. We need to detect the reason and try to prevent it.
Possible reasons:
Observations also show that memory consumption sometimes grows almost simultaneously on different nodes, which can hint either at an external load spike on the containers or at global event processing.