-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not blindly delete duplicate schedules #389
Conversation
Because there are no unique constraints on schedule models (except for SolarSchedule), schedules tables can contain duplicates, those duplicate schedules should not be blindly deleted when encountered because they can be linked to existing tasks and would cause the tasks to be also deleted (on delete cascade). Instead we now just return the first duplicate found, just to avoid creating further duplicates. This fixes issue celery#322.
can you check if this PR https://github.com/celery/django-celery-beat/pull/269/files is somewhat related? |
It is related in the sense that it would prevent future duplicates, the idea of adding unique constraints is good, but there should be a warning in the changelog because it's not fully backwards compatible, existing code may try to create duplicate schedules (which currently works) and get a Furthermore, because existing databases can already contain duplicate schedules, the IMO you can merge my PR first (the bug is really critical in the sense that it deletes existing tasks !), release a bugfix version, and add constraints on a future major version. (If the PR is reworked to only add constraints, the tests on my PR will have to be removed.) |
@arnau126 can you check this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hope return cls.objects.filter(**spec).first() won't create any regression as bug it is bug fix
Frankly, it can't be worse than it is now, I got tasks that disappeared on my production environment and it's really difficult to repair when you have hundreds of tasks. Also, since the |
just waiting for another eye for cross-checking. will merge soon after that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
This improves over the existing code for sure. 👍
What I am wondering is:
- Why are we using
cls()
overcls.objects.create()
so that those functions sometimes return model instances that are persisted already and sometimes return instances that are not yet persisted in the database. Is that a bug or a feature? - It seems like we effectively have a tailor-made
get_or_create
here. Why not use what Django offers and simplify most of this code away for good? - The added test cases are effectively 3 times the same test. Personally, I would use
parameterized.expand
and a single parametrized test to avoid duplication, but it's just my two cents and I'm aware there there are multiple schools on that subject.
Best, Sebastian
t/unit/test_models.py
Outdated
# create 2 duplicates schedules | ||
sched1 = CrontabSchedule.objects.create(hour="4") | ||
CrontabSchedule.objects.create(hour="4") | ||
self.assertEqual(CrontabSchedule.objects.count(), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea:
I think adding .filter(hour="4")
here and in line 93 would be a cheap way to make this test more robust to any potential cross-test effects. It would help fight future flakiness. What do you think?
PS: This applies to the other two tests as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
t/unit/test_models.py
Outdated
def test_duplicate_schedules(self): | ||
# Duplicates cannot be tested for solar schedules because of the | ||
# unique constraints in the SolarSchedule model | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of adding dead code, to be honest. There are other things that we cannot test for either, right? Does this code really add value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See next comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked all. Which one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the constraint is here from the very beginning (ec46367) we can indeed drop the MultipleObjectsReturned block (as well as the test method of course)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the test would be dropped as well, excellent. 👍
django_celery_beat/models.py
Outdated
except MultipleObjectsReturned: | ||
cls.objects.filter(**spec).delete() | ||
return cls(**spec) | ||
# unique_together constraint should not permit reaching this code, | ||
# but just in case, we return the first schedule found | ||
return cls.objects.filter(**spec).first() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of avoiding dead code, why not drop these lines altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the constraint is here from the very beginning (ec46367) we can indeed drop the MultipleObjectsReturned block (as well as the test method of course)
- removed MultipleObjectsReturned code block for solar schedules as unique_togther constraint safely prevents duplicates - removed related test method
Hi, I've updated the PR.
I asked myself the same question regarding why the schedule isn't saved, but keep in mind I don't want to break anything with this bug fix PR. Improving the code should be the role of another PR IMHO.
I won't take the responsability of adding yet another requirement just for that, instead I've factorized the code in a mixin class. Thanks for your review, regards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@izimobil thanks for taking this further! 👍 🙏
Travis CI taking ages to start builds as usual… 🤷♂️ |
@auvipy I think you can safely merge this one, I can't believe no one else came across the dramatic impacts of previous code ! |
caught by cold flu. but reviewing and merging it tomorrow for sure |
All the best! |
Because there are no unique constraints on schedule models (except for
SolarSchedule), schedules tables can contain duplicates, those
duplicate schedules should not be blindly deleted when encountered
because they can be linked to existing tasks and would cause the tasks
to be also deleted (on delete cascade).
Instead we now just return the first duplicate found, just to avoid
creating further duplicates.
This fixes issue #322.