Remove race condition - Fixes 2143 #2203

Merged on Aug 17, 2017 (5 commits)

Conversation

josenavas
Contributor

Fixes #2143

Adds a new internal command: release_validators. The validators themselves no longer try to free up the parent job. Instead, release_validators actively polls until all the validator jobs have completed. It is automatically submitted to the queue after the validators have been submitted.
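
The polling idea described above could look roughly like the sketch below. This is not the code from this PR: the function signature, the `get_statuses` and `release_parent` callables, and the `poll_interval` parameter are all hypothetical names introduced for illustration. The only detail taken from the discussion is that a finished validator is in either the 'waiting' (success) or 'error' state.

```python
import time

# Statuses a validator is in once it has finished, per the discussion in
# this PR: 'waiting' means success, 'error' means failure.
FINISHED = frozenset({'waiting', 'error'})


def release_validators(get_statuses, release_parent,
                       poll_interval=1.0, sleep=time.sleep):
    """Poll until every validator has finished, then release the parent once.

    get_statuses   -- hypothetical callable returning the current validator
                      statuses as an iterable of strings
    release_parent -- hypothetical callable invoked exactly once with the
                      final list of statuses
    """
    while True:
        statuses = list(get_statuses())
        if all(status in FINISHED for status in statuses):
            break
        # At least one validator is still running: wait and check again.
        sleep(poll_interval)
    release_parent(statuses)
    return statuses
```

Because a single job is responsible for releasing the parent, the validators never compete with each other to do it, which is presumably how the race is avoided.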

@codecov-io

codecov-io commented Aug 8, 2017

Codecov Report

Merging #2203 into master will decrease coverage by 0.02%.
The diff coverage is 79.06%.

@@            Coverage Diff             @@
##           master    #2203      +/-   ##
==========================================
- Coverage   92.04%   92.01%   -0.03%     
==========================================
  Files         191      191              
  Lines       19515    19543      +28     
==========================================
+ Hits        17963    17983      +20     
- Misses       1552     1560       +8

Impacted Files                          Coverage Δ
qiita_db/test/test_processing_job.py    99.8% <100%> (ø) ⬆️
qiita_db/test/test_artifact.py          99.84% <100%> (ø) ⬆️
qiita_db/private.py                     24.32% <33.33%> (-0.68%) ⬇️
qiita_db/processing_job.py              91.75% <79.16%> (-0.82%) ⬇️
qiita_db/test/test_analysis.py          98.34% <0%> (+0.33%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a058102...9c3786e.

@antgonza
Member

antgonza commented Aug 8, 2017

@josenavas, just FYI, the PR actually has failed tests.

@josenavas
Contributor Author

Ready for review!

-AND processing_job_status != %s"""
-qdb.sql_connection.TRN.add(sql, [self.id, 'waiting'])
+AND processing_job_status NOT IN %s"""
+sql_args = [self.id, ('waiting', 'error')]
Member

Based on the description, it sounds like this should only be 'waiting', right? If that is incorrect, what happens with errored jobs?

Contributor Author

I will add a comment explaining this, but basically we want to wait here until all validators have completed. A completed validator job can be in one of two states: 'waiting' (success) or 'error' (failure).
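
To illustrate why both states appear in the query, here is a minimal, self-contained sketch using an in-memory SQLite database. The `validator_jobs` table, its columns other than `processing_job_status`, and the `count_unfinished_validators` helper are assumptions made for this example and are not Qiita's schema or API. SQLite also needs one `?` placeholder per value, unlike the `NOT IN %s` form with a tuple argument shown in the diff.

```python
import sqlite3


def count_unfinished_validators(conn, parent_id):
    """Count validators of a parent job that have not finished yet.

    A validator counts as finished when its status is 'waiting' (success)
    or 'error' (failure), so both are excluded from the count.
    """
    sql = """SELECT COUNT(1)
             FROM validator_jobs
             WHERE parent_id = ?
               AND processing_job_status NOT IN (?, ?)"""
    (count,) = conn.execute(sql, (parent_id, 'waiting', 'error')).fetchone()
    return count


# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE validator_jobs (
                    job_id TEXT,
                    parent_id TEXT,
                    processing_job_status TEXT)""")
conn.executemany(
    "INSERT INTO validator_jobs VALUES (?, ?, ?)",
    [('v1', 'p1', 'waiting'),    # finished successfully
     ('v2', 'p1', 'error'),      # finished with an error
     ('v3', 'p2', 'running')])   # still in progress
```

With only `!= 'waiting'` as the condition, the errored validator `v2` would be counted as unfinished indefinitely, so a poller waiting for the count to reach zero would never release parent `p1`.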

@@ -476,7 +483,8 @@ def test_complete_success(self):
 obsjobs = set(self._get_all_job_ids())
-self.assertEqual(len(obsjobs), len(alljobs) + 1)
+# The complete submits the release validators job
Member

Could you add an explanation of why + 2?

Contributor Author

Done

Contributor Author

@josenavas left a comment

Comments addressed. I would, however, hold off on merging, as I would like to deploy this in the test environment first to make sure that everything works as expected; I could only perform limited testing on my machine.

@josenavas changed the base branch from master to dev on August 16, 2017 17:15
@josenavas
Contributor Author

I deployed this in the test environment and executed ~10 deblur jobs. All of them succeeded and I haven't seen the race condition occurring again. I'm pretty confident that the issue has been solved because conceptually the race condition has been removed.

@antgonza merged commit 24b7d8e into qiita-spots:dev on Aug 17, 2017