Possible WAL corruption with loss of un-fsynced writes: max entry size limit exceeded
#14098
Comments
Is this issue the same as #14102?
Possibly! However, since they're different kinds of faults and yield different etcd crashes, I suspect they're different issues. I also wouldn't be surprised if #14102 turns out to encompass a half-dozen different issues, just based on the number of distinct ways I've seen it fail so far.
This is also related to #14025.
One WAL entry's size is 13563782407139376 bytes (see log below). That is about 13,563 TB, which obviously isn't correct.
There are two possible reasons:
I think the best thing to do for now is to let etcd fail to start in this situation (data files, including the WAL file, are corrupted), which is exactly the current behavior. In the future, we may deliver a solution to recover the data files from a point in time.
So we've traced this behavior to (we think) an issue with lazyfs: truncation filled with ASCII '0' characters (0x30), rather than 0x00. Etcd's WAL reader scanned for 0x00 to determine the end of the file, and in this case got 0x30 and... maybe interpreted those as a part of the size field? I'm not exactly sure what the correct behavior here is, filesystem-wise (perhaps @devzizu could chime in?), but for the time being we've replaced truncated bytes with 0x00, and that seems to have eliminated this particular crash. Instead, we get a new kind of crash! Here's an example:
Hey!
That's right, any filesystem should return null bytes (0x00) on read operations for the truncated region of a file (when truncation increases the size). My apologies for the LazyFS bug: as @aphyr said, I was writing 0x30 (ASCII '0') instead of 0x00, because that helped me debug at the time. I also thought it wouldn't be a huge deal, because I assumed that applications relied on some kind of max readable offset. Feel free to report bugs or ask me anything about LazyFS, it will be a pleasure to help!
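The 0x30 fill described above lines up with the entry size reported earlier in the thread. The following is a minimal sketch, not etcd's actual decoder: it assumes the WAL frame header is an 8-byte little-endian length field whose low 56 bits carry the record size (an assumption about the WAL format that this thread does not confirm), and `decode_record_size` and `read_extended_tail` are hypothetical helper names. Under those assumptions, a correctly zero-filled extension decodes to a length of 0, while eight ASCII '0' bytes decode to exactly 13563782407139376.

```python
import os
import struct
import tempfile

# Assumption: the low 56 bits of the 64-bit length field hold the record size.
RECORD_SIZE_MASK = (1 << 56) - 1


def decode_record_size(header: bytes) -> int:
    """Hypothetical helper: decode a record size from an 8-byte frame header."""
    (length_field,) = struct.unpack("<Q", header)  # little-endian is assumed
    return length_field & RECORD_SIZE_MASK


def read_extended_tail(initial: bytes, new_size: int) -> bytes:
    """Grow a file with truncate() and return the bytes past the old end."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(initial)
        path = f.name
    try:
        os.truncate(path, new_size)  # extending truncation
        with open(path, "rb") as f:
            f.seek(len(initial))
            return f.read()
    finally:
        os.remove(path)


tail = read_extended_tail(b"record", 6 + 8)
zero_fill_size = decode_record_size(tail)       # filesystem zero-fills the gap
ascii_fill_size = decode_record_size(b"0" * 8)  # the 0x30 fill described above

print(tail)             # b'\x00\x00\x00\x00\x00\x00\x00\x00'
print(zero_fill_size)   # 0
print(ascii_fill_size)  # 13563782407139376
```

Since 13563782407139376 equals 0x30303030303030, the oversized entry is what seven bytes of ASCII '0' look like when read as an integer, which fits the explanation that the reader treated the fill bytes as part of the size field.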
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Closing per #14098 (comment)
I'm getting "error starting etcd: wal: max entry size limit" in sensu-go. What is the solution for this? Thank you.
What happened?
The lazyfs filesystem lets us simulate the effects of a power failure by losing writes which were not explicitly fsync'ed to disk. When we run etcd 3.5.3 on lazyfs, killing etcd and then losing un-fsynced writes can reliably put etcd into an unbootable state. Every time we try to start the node, it complains:
We're still sanding bugs off of lazyfs, so it's possible this might be an issue in the filesystem itself. That said, this might also point to a problem with how etcd writes WAL files, so I'd like to check and see if this looks plausible to y'all. I know there have been some issues with data file corruption on process crash in the past; this approach might help find more bugs like that!
I've attached a full test run from Jepsen, which includes tarballs of the data directories for each node. Take a look at n1/ as an example: 20220607T150758.000-0400.zip.
This happens both with and without --experimental-initial-corrupt-check.
What did you expect to happen?
I expect that etcd ought to start up without crashing, even if we lose un-fsynced writes.
How can we reproduce it (as minimally and precisely as possible)?
Check out https://github.com/jepsen-io/etcd at adfc820826a947625c94d836b4017b4eaac7064d, and run:
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
etcd --enable-v2 --log-outputs stderr --logger zap --name n1 --listen-peer-urls http://192.168.122.101:2380 --listen-client-urls http://192.168.122.101:2379 --advertise-client-urls http://192.168.122.101:2379 --initial-cluster-state new --initial-advertise-peer-urls http://192.168.122.101:2380 --initial-cluster n1=http://192.168.122.101:2380,n2=http://192.168.122.102:2380,n3=http://192.168.122.103:2380,n4=http://192.168.122.104:2380,n5=http://192.168.122.105:2380 --snapshot-count 100 --experimental-initial-corrupt-check
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output