Restriction on MaxRequestSize and maxWALEntrySize #14114
Thanks @ahrtr for putting this together. I think you show a deep understanding of the problem, and you indeed recognise that it is a bug that needs to be resolved. Regarding the proposed solution, I think it is fine; I just want to make sure that when you say …
Yes, the dynamic value is …
Thank you for continuing with this task. The proposal looks good to me. It keeps flexibility and keeps some security limitations at the same time. As for the ServerOption, maybe your concern could be solved through hard config compatibility checks. In general it sounds safe for the server to refuse to start because of conflicting config parameters. Moreover, we're talking about new features, so no backward compatibility will be broken.
@ahrtr we have a customer upgrading from etcd 3.4 to 3.5, now dealing with this message:
I assume this is due to the backport in #14127. Is there an easy mitigation the customer can take to get out of this issue?
It looks like the last record in the WAL file is corrupted. Please work around the issue using #15066 for now; I will deliver a formal fix later. Sorry for the inconvenience, and thank you for raising the issue.
A workaround for now is to manually cut the corrupted record. |
This should also work.
Thanks. I can also share the WAL with you if you need to dig any deeper.
Sure, please do, so that I can double-confirm that the PR indeed fixes the issue.
Can I ping you on the Kubernetes Slack under @ Benjamin Wang?
Confirmed the issue.
etcd built-in gRPC services
Users can configure the MaxRequestSize using the flag --max-request-bytes, and its default value is 1.5MB. When users configure a value greater than 10MB, etcd generates a warning message. The existing logic on --max-request-bytes looks good to me, and we shouldn't set any hard limit on it; we just need to keep it as it is for now.

But there is a hard limit (10MB) on the maxWALEntrySize. If users configure a value > 10MB for --max-request-bytes, then etcd can receive and process requests > 10MB, but when etcd restarts later it will fail to decode the WAL entry, see decoder.go#L88. This is definitely a bug, see also issues/14025. We can't just remove the max size limitation on the WAL entry, because etcd may run out of memory in case the WAL entry is somehow corrupted and its size is a huge value. The proposed solution for this issue is to replace the hard-coded limitation of 10MB with a dynamic value, which is the remaining size of the current WAL file.

Let's work through an example. Assuming the reading offset is 800 bytes and the current WAL file size is 64MB, the max size of the current WAL entry should be (64MB - 800).

Currently the maxSendBytes is hard-coded as math.MaxInt32. We may add a configuration item MaxSendMsgSize into embed.Config, defaulting to math.MaxInt32.
.Users registered gRPC services
One related PR is 14066. @ptabor proposed to add an additional option, see below.

Just as I mentioned in issuecomment-1155881593 and issuecomment-1155894742, the good side of the proposal is more flexibility, but the downside is more chance to make mistakes. When users configure a different value for the max RecvMsgSize or MaxSendMsgSize via the grpcAdditionalServerOptions, it may overwrite (or be overwritten by) the values configured by --max-request-bytes.

So I suggest not adding the grpcAdditionalServerOptions for now, until we receive valid use cases. If users want to change the max send/receive values, they can just use the configuration items for the etcd built-in gRPC services.

cc @ptabor @serathius @spzala @biosvs @algobardo @xiang90 @rleungx