fix: prevent retention service creating orphaned shard files #24530
Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are then pruned from the meta store based on age, this leaves orphaned shard files on disk. The retention service will not attempt to remove those obsolete shard files afterwards, because the meta store no longer knows about them. This can cause excessive disk space usage for some users. This change corrects that by requiring that shard files be deleted before the shards can be removed from the meta store. fixes: #24529
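The ordering described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `ShardStore`, `MetaStore`, `DropShard`, `pruneShards`, and the fake implementations are hypothetical names introduced here, and only `DeleteShard` appears in the diff under review.

```go
package main

import "errors"

// ShardStore and MetaStore are hypothetical stand-ins for the on-disk
// store and the meta store; they are not the real InfluxDB interfaces.
type ShardStore interface {
	DeleteShard(id uint64) error
}

type MetaStore interface {
	DropShard(id uint64) error
}

// pruneShards deletes each expired shard's files first and drops the shard
// from the meta store only after the files are gone. Shards whose deletion
// or drop fails are returned so a later pass can retry them; the meta store
// still knows about them, so they cannot become orphaned.
func pruneShards(store ShardStore, meta MetaStore, expired []uint64) (retry []uint64) {
	for _, id := range expired {
		if err := store.DeleteShard(id); err != nil {
			retry = append(retry, id)
			continue // keep the meta entry so the shard stays visible
		}
		if err := meta.DropShard(id); err != nil {
			retry = append(retry, id)
		}
	}
	return retry
}

// fakeStore fails deletion for the shard IDs listed in failing (assumption,
// used only to exercise the sketch).
type fakeStore struct {
	failing map[uint64]bool
	deleted []uint64
}

func (s *fakeStore) DeleteShard(id uint64) error {
	if s.failing[id] {
		return errors.New("shard busy")
	}
	s.deleted = append(s.deleted, id)
	return nil
}

// fakeMeta records which shard IDs were dropped.
type fakeMeta struct {
	dropped []uint64
}

func (m *fakeMeta) DropShard(id uint64) error {
	m.dropped = append(m.dropped, id)
	return nil
}
```

With this ordering, a shard whose files could not be deleted is still listed in the meta store, so the next retention pass sees it again instead of leaving its files behind.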
This is complex enough that I think I need a walk-through. Sorry that I am not fully grasping how this works.
What I am missing is how the scenario occurs in which the retention service leaves shards on disk but deletes them from the meta data, and how these changes fix that scenario. You explained the scenario to me once, but I don't remember enough details to walk through it with the new code.
A few comment typos, and a suggestion to simplify a little code.
One more change requested, an additional log message.
for _, id := range s.TSDBStore.ShardIDs() {
	if info, ok := deletedShardIDs[id]; ok {
		delete(deletedShardIDs, id)
		if err := s.TSDBStore.DeleteShard(id); err != nil {
One more change: what do you think about printing one more message before calling DeleteShard:
attempting deletion of shard X, database Y, retention policy Z
We have no visibility into why the retention service hangs for so long, and this might give us more clues; we can look at the size and cardinality of a shard that prints the message and then never prints a success or failure message.
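The logging suggestion could look roughly like the sketch below. It is an illustration only: `shardInfo`, `deleteWithLogging`, `errShardBusy`, and the `logf` callback are hypothetical names, and the actual service may use a different logger and different message wording.

```go
package main

import (
	"errors"
	"fmt"
)

// shardInfo is a hypothetical record of where a shard lives.
type shardInfo struct {
	Database        string
	RetentionPolicy string
}

// errShardBusy is a hypothetical error used only to demonstrate the failure path.
var errShardBusy = errors.New("shard busy")

// deleteWithLogging logs an "attempting" message before calling deleteShard
// and an outcome message afterwards. If the call hangs, the log shows an
// attempt with no matching success or failure line, which identifies the
// shard to inspect.
func deleteWithLogging(id uint64, info shardInfo, deleteShard func(uint64) error, logf func(string)) error {
	logf(fmt.Sprintf("attempting deletion of shard %d, database %s, retention policy %s",
		id, info.Database, info.RetentionPolicy))
	if err := deleteShard(id); err != nil {
		logf(fmt.Sprintf("failed to delete shard %d: %v", id, err))
		return err
	}
	logf(fmt.Sprintf("deleted shard %d", id))
	return nil
}
```

Passing the logger as a plain callback keeps the sketch independent of any particular logging library.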
Yay!
Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are pruned based on age, this leaves orphaned shard files on the disk. The retention service will then not attempt to remove the obsolete shard files because the meta store does not know about them. This can cause excessive disk space usage for some users. This corrects that by requiring shards files be deleted before they can be removed from the meta store. Backport via clean cherry-pick of #24530. fixes: #24543 (cherry picked from commit 7bd3f89)
Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are pruned based on age, this leaves orphaned shard files on the disk. The retention service will then not attempt to remove the obsolete shard files because the meta store does not know about them. This can cause excessive disk space usage for some users. This corrects that by requiring shards files be deleted before they can be removed from the meta store. fixes: #24529 (cherry picked from commit 7bd3f89) closes #24545
…#24547) Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are pruned based on age, this leaves orphaned shard files on the disk. The retention service will then not attempt to remove the obsolete shard files because the meta store does not know about them. This can cause excessive disk space usage for some users. This corrects that by requiring shards files be deleted before they can be removed from the meta store. fixes: #24529 (cherry picked from commit 7bd3f89) closes #24545 Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com>
…#24547) Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are pruned based on age, this leaves orphaned shard files on the disk. The retention service will then not attempt to remove the obsolete shard files because the meta store does not know about them. This can cause excessive disk space usage for some users. This corrects that by requiring shards files be deleted before they can be removed from the meta store. fixes: #24529 (cherry picked from commit 7bd3f89) closes #24545 Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com> (cherry picked from commit 0dc48b1) closes #24546
…#24547) (#24548) Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are pruned based on age, this leaves orphaned shard files on the disk. The retention service will then not attempt to remove the obsolete shard files because the meta store does not know about them. This can cause excessive disk space usage for some users. This corrects that by requiring shards files be deleted before they can be removed from the meta store. fixes: #24529 (cherry picked from commit 7bd3f89) closes #24545 Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com> (cherry picked from commit 0dc48b1) closes #24546
…#24547) Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are pruned based on age, this leaves orphaned shard files on the disk. The retention service will then not attempt to remove the obsolete shard files because the meta store does not know about them. This can cause excessive disk space usage for some users. This corrects that by requiring shards files be deleted before they can be removed from the meta store. closes: #25116 (cherry picked from commit 7bd3f89) closes #24545 Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com> (cherry picked from commit 0dc48b1)
…ata#24530) * fix: prevent retention service creating orphaned shard files Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are pruned based on age, this leaves orphaned shard files on the disk. The retention service will then not attempt to remove the obsolete shard files because the meta store does not know about them. This can cause excessive disk space usage for some users. This corrects that by requiring shards files be deleted before they can be removed from the meta store. fixes: influxdata#24529
Under certain circumstances, the retention service can fail to delete shards from the store in a timely manner. When the shard groups are pruned based on age, this leaves orphaned shard files on the disk. The retention service will then not attempt to remove the obsolete shard files because the meta store does not know about them. This can cause excessive disk space usage for some users.
This corrects that by requiring shards files be deleted before they can be removed from the meta store.
Closes #24529