Creation time is added to wal records #17233
base: main
Conversation
Hi @KosovGrigorii. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: Grigorii Kosov <kosov.gr@gmail.com> Signed-off-by: KosovGrigorii <72564996+KosovGrigorii@users.noreply.github.com>
…o WAL records. The WAL test was changed due to the changes in the WAL structure: in TestRepairWriteTearLast I changed the number of expected entries from 40 to 29, because one WAL record now takes more space, so fewer records remain after truncating a WAL file. Signed-off-by: Grigorii Kosov <kosov.gr@gmail.com> Signed-off-by: KosovGrigorii <72564996+KosovGrigorii@users.noreply.github.com>
08bc5da to 3dc1e4f
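For context, a minimal Go sketch of what a WAL record with a creation-time field might look like; the `Record` struct and `CreatedAt` field here are illustrative stand-ins, not the actual walpb.Record change in this PR:

```go
package main

import (
	"fmt"
	"time"
)

// Record mirrors the rough shape of an etcd WAL record, with a
// hypothetical CreatedAt field added (unix nanoseconds, set at write
// time). This is an illustrative sketch, not the walpb.Record proto.
type Record struct {
	Type      int64
	Crc       uint32
	Data      []byte
	CreatedAt int64 // hypothetical creation time, unix nanoseconds
}

func newRecord(typ int64, data []byte) Record {
	return Record{
		Type:      typ,
		Data:      data,
		CreatedAt: time.Now().UnixNano(),
	}
}

func main() {
	r := newRecord(1, []byte("entry"))
	fmt.Printf("type=%d created=%s\n", r.Type, time.Unix(0, r.CreatedAt))
}
```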
/cc @vivekpatani |
@siyuanfoundation: GitHub didn't allow me to request PR reviews from the following users: vivekpatani. Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: /cc @vivekpatani
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @wenjiaswe |
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: KosovGrigorii, x4m. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. |
/cc @ahrtr Hi! We're currently working on the SPQR project, which is about PostgreSQL sharding. We store our metadata in etcd and already know how to restore shards to a specific point in time using WAL-G. But there is no PITR support for etcd, so this PR is very important to us. Could you please approve us to run the GitHub Workflows? |
/ok-to-test |
Bump |
Please propose a design for how the time will be used and how you will reconcile clock drift in a multi-member scenario. Note that WAL writing time can be arbitrarily delayed, as entries are written by members independently. You can have a node down for maintenance for a week, and after it rejoins the cluster it writes its WAL entries with timestamps a week later than the other nodes. That's why etcd doesn't have any notion of saving a timestamp anywhere. It never depends on absolute time, only on time periods (a 1s request timeout or a 10s TTL). Adding it back needs to be done with proper consideration for a distributed system like etcd. |
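To make that failure mode concrete, a minimal Go sketch (the `entry` type and field names are illustrative, not etcd code) of two members persisting the same raft index with wall-clock timestamps a week apart:

```go
package main

import (
	"fmt"
	"time"
)

// entry pairs a raft index with the wall-clock time at which one member
// happened to persist it. Names here are illustrative, not etcd types.
type entry struct {
	index     uint64
	persisted time.Time
}

func main() {
	now := time.Now()
	// Member A persists index 100 right away.
	a := entry{index: 100, persisted: now}
	// Member B was down for maintenance and persists the same index a week later.
	b := entry{index: 100, persisted: now.Add(7 * 24 * time.Hour)}
	fmt.Printf("same index %d, wall-clock timestamps differ by %s\n",
		a.index, b.persisted.Sub(a.persisted))
}
```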
@serathius thank you for your inquiry. When restoring a database from a backup to a point in time, we do not need to coordinate the nodes of a fault-tolerance group. We need just one node, restoring it to a specific point and leaving the rest of the history aside.
Indeed. A response time of almost a year is completely novel to us. The developers who were working on the project have already changed teams. We will find someone, of course. |
@ahrtr consider the symmetric point of view: etcd stores data in the WAL; bbolt is just a cache for the case when no crashes happened.
PITR is just a recovery constrained by some point in time. That's how PITR works in Oracle, Postgres, MS-SQL, Cloudberry, and many other databases.
Do you mean "the etcd project doesn't see any value in PITR"? |
Those databases are all single-master systems where the master decides the time. etcd explicitly avoids trusting the clock of any member and doesn't use absolute time. I'm not saying that you could not solve the problem, but it doesn't seem like something that could be easily integrated into etcd. |
@serathius it is not a problem for PITR at all; just use any source of time.
No, that's not true. We have different timelines, we have different primaries in different shards, and of course we have different primaries at different moments. In a Postgres HA installation, every WAL record carries a time recorded according to the clock of the current primary. But you can easily promote a standby at any given moment, and history is then recorded according to the clock of the new primary.
In MySQL things are much more interesting, because you have a vector clock plus GTIDs: every node writes its own binlogs, and nodes slice the logs differently. So we record history with an overlap, i.e., when the primary switches over to a secondary, there is some tail of committed transactions on the old primary that is also available on the secondary; we deduplicate them using GTIDs and use the earliest timestamp from the different logs.
Leaderless replication is much easier w.r.t. PITR. It is not even sharded, so we do not have to deal with cross-shard consistency as we do in MongoDB, Greenplum, and Cloudberry.
Please describe a specific problem, not the generic "etcd explicitly avoids trusting clocks", because for point-in-time recovery you need time anyway. Time is obviously non-monotonic and you cannot trust it for event ordering within the database. But for matching events with the real world it's perfectly fine. You can trust input from your user like "yesterday my data was OK" or "I dropped important information 1 hour ago", no matter what time source you use for determining where to stop recovery. |
To prevent developers from misusing timestamps we can truncate them to, say, minutes. Anyone who tries to order something with such a timestamp will quickly discover that it's impossible. And for PITR it's OK if you can specify a recovery minute, not a millisecond.
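A minimal sketch of the suggested minute-level truncation, using Go's standard library time.Truncate:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	t := time.Now()
	// Truncate to whole minutes: too coarse to order individual records,
	// but precise enough to choose a recovery point.
	coarse := t.Truncate(time.Minute)
	fmt.Println("raw:   ", t)
	fmt.Println("stored:", coarse)
}
```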
This might not be true. Only committed WAL records are guaranteed to be safe. For example, if you recover to a WAL record that wasn't committed, you might end up committing a failed client write.
There is no value to etcd itself in adding a timestamp to the WAL file. Backup & restore is important. Also, I did not say PITR isn't important; I was saying I do not see how to easily do point-in-time recovery using WAL files/data. Usually PITR depends on a base backup (a full backup, i.e. snapshots) + incremental logs (i.e. WAL logs). But in etcd, the V2 snapshot files (*.snap files) do not contain any real k/v data, and we are deprecating them. All data is stored in the bbolt db, but the relationship between the bbolt data and the WAL entries is not straightforward. It isn't clear how you will implement PITR. Are you going to do periodic backups of the bbolt db & WAL files, and select the right one that falls into the time range when doing PITR? |
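A hedged Go sketch of the backup-selection flow that question describes: pick the latest base backup at or before the target time, then replay only the WAL entries created up to it. The `backup` and `walEntry` types, `pickBase`, and `entriesUpTo` are hypothetical illustrations, not etcd APIs:

```go
package main

import (
	"fmt"
	"time"
)

// backup and walEntry are hypothetical types used only to illustrate
// the selection logic; they are not etcd APIs.
type backup struct {
	taken time.Time
	path  string
}

type walEntry struct {
	index   uint64
	created time.Time
}

// pickBase returns the most recent backup taken at or before target.
func pickBase(backups []backup, target time.Time) (backup, bool) {
	var best backup
	found := false
	for _, b := range backups {
		if !b.taken.After(target) && (!found || b.taken.After(best.taken)) {
			best, found = b, true
		}
	}
	return best, found
}

// entriesUpTo keeps only the WAL entries created at or before target,
// i.e. the slice of history to replay on top of the base backup.
func entriesUpTo(entries []walEntry, target time.Time) []walEntry {
	var out []walEntry
	for _, e := range entries {
		if !e.created.After(target) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	target := time.Now().Add(-time.Hour)
	backups := []backup{{taken: target.Add(-30 * time.Minute), path: "db.bak"}}
	if base, ok := pickBase(backups, target); ok {
		fmt.Println("restore from:", base.path)
	}
	fmt.Println("entries to replay:", entriesUpTo(nil, target))
}
```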
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
This PR resolves #16962