Fix job creation race condition #3646

Open
ayushgupta704 wants to merge 1 commit into intelowlproject:develop from ayushgupta704:fix-job-creation-race-condition

@ayushgupta704
Contributor

Description

This PR fixes the django-treebeard race condition reported in #3639, where concurrent Celery workers creating root jobs at the same time could lead to “ghost” roots and silent data loss.
While testing, I also noticed that nested job creation (parent.add_child()) was completely unprotected. In cases like parallel pivots or file drops, this could result in IntegrityError crashes.

  • Added a database-level UniqueConstraint on Job.path so Postgres prevents duplicate entries.
  • Wrapped both add_root and add_child in a retry loop (up to 10 attempts) with exponential backoff to handle collisions
    gracefully.
  • On retries for child creation, the parent is refreshed from the database to ensure the latest numchild value is used.
  • Added a fix_split_brain_jobs command to safely clean up any existing ghost jobs before applying the strict constraint.
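
The interaction between the unique constraint and the retry loop can be sketched in isolation. The following is a minimal, self-contained illustration, not the code in this PR: it uses the standard-library `sqlite3` module instead of Postgres and django-treebeard, and the table layout, the path format, and the helper names (`make_db`, `next_root_path`, `create_root_with_retry`) are all hypothetical. It only assumes what the bullets above state: a unique `path`, up to 10 attempts, exponential backoff, and re-reading state before each retry.

```python
import random
import sqlite3
import time

MAX_ATTEMPTS = 10  # mirrors the "up to 10 attempts" in the description


def make_db():
    """Create an in-memory table whose UNIQUE column stands in for Job.path."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE)"
    )
    return conn


def next_root_path(conn):
    """Derive the next root path from current state (hypothetical format).

    Under concurrency this read can be stale, which is the source of the race.
    """
    (count,) = conn.execute("SELECT COUNT(*) FROM job").fetchone()
    return f"{count + 1:04d}"


def create_root_with_retry(conn, compute_path=next_root_path, base_delay=0.001):
    """Insert a root row, retrying with exponential backoff on a path collision."""
    for attempt in range(MAX_ATTEMPTS):
        # Re-read state on every attempt so a retry sees rows committed by others.
        path = compute_path(conn)
        try:
            with conn:  # commits on success, rolls back if the insert fails
                conn.execute("INSERT INTO job (path) VALUES (?)", (path,))
            return path
        except sqlite3.IntegrityError:
            if attempt == MAX_ATTEMPTS - 1:
                raise  # give up after the last attempt
            # Exponential backoff plus a little jitter to spread out retries.
            time.sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
```

Without the unique constraint, the second writer's insert would succeed and leave two rows sharing one path. With it, the collision surfaces as an `IntegrityError` that the loop can recover from by recomputing the path.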

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue).

Checklist

  • I have read and understood the rules about how to Contribute to this project
  • The pull request is for the branch develop
  • I have inserted the copyright banner at the start of the file: `# This file is a part of IntelOwl https://github.com/intelowlproject/IntelOwl` `# See the file 'LICENSE' for copying permission.`

@ayushgupta704 ayushgupta704 marked this pull request as draft April 9, 2026 17:08
@ayushgupta704 ayushgupta704 force-pushed the fix-job-creation-race-condition branch from e03341c to 53555d0 Compare April 10, 2026 05:21
@ayushgupta704 ayushgupta704 force-pushed the fix-job-creation-race-condition branch from 53555d0 to bc820c7 Compare April 10, 2026 05:23
@ayushgupta704 ayushgupta704 marked this pull request as ready for review April 10, 2026 09:00

ayushgupta704 commented Apr 10, 2026

Hi @mlodic,
I've fixed the job creation race condition.
I've tested this under concurrent conditions and it's working as expected now: no more ghost roots during parallel job creation.

Attaching test proof:
docker exec intelowl_uwsgi python manage.py test tests.api_app.test_views --keepdb (also tested parallel root and child job creation locally)

Screen.Recording.2026-04-10.130747.mp4
Screen.Recording.2026-04-10.135138.mp4

Everything seems to be running smoothly. Let me know if there’s anything else you’d like me to check.
This is ready for review.
Thanks.
