Skip to content

Fix running processes count endpoint (issue #1258)#1384

Open
torkashvand wants to merge 2 commits intomainfrom
fix/running-processes-count-1258
Open

Fix running processes count endpoint (issue #1258)#1384
torkashvand wants to merge 2 commits intomainfrom
fix/running-processes-count-1258

Conversation

@torkashvand
Copy link
Contributor

Replace flawed increment/decrement counter with actual database queries.

The previous implementation used engine_settings.running_processes which was incremented when processes started and decremented when they finished. However, this counter could drift over time due to:

  • Failed transactions that don't commit the decrement
  • Crashes or exceptions that prevent the finally block from executing
  • Race conditions in concurrent environments

This fix:

  • Adds get_actual_running_processes_count() function that queries the actual count from ProcessTable where last_status == 'running'
  • Updates generate_engine_global_status() to use the actual count
  • Updates generate_engine_status_response() to override the flawed counter with the actual count
  • Updates test to create actual running processes instead of manipulating the counter

Fixes: #1258

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 9, 2026

Merging this PR will not alter performance

✅ 13 untouched benchmarks


Comparing fix/running-processes-count-1258 (0689aff) with main (c2567dd)

Open in CodSpeed

@torkashvand torkashvand force-pushed the fix/running-processes-count-1258 branch 3 times, most recently from 099cf3f to 0d97328 Compare February 9, 2026 10:12
@sentry
Copy link

sentry bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 92.66055% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.43%. Comparing base (e65febd) to head (0689aff).
⚠️ Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
orchestrator/services/worker_status_monitor.py 91.07% 4 Missing and 1 partial ⚠️
orchestrator/services/processes.py 94.44% 0 Missing and 2 partials ⚠️
orchestrator/services/settings.py 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1384      +/-   ##
==========================================
+ Coverage   80.33%   80.43%   +0.10%     
==========================================
  Files         275      276       +1     
  Lines       13854    13977     +123     
  Branches     1369     1377       +8     
==========================================
+ Hits        11129    11242     +113     
- Misses       2410     2418       +8     
- Partials      315      317       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Replace flawed increment/decrement counter with actual database queries.

The previous implementation used engine_settings.running_processes which
was incremented when processes started and decremented when they finished.
However, this counter could drift over time due to:
- Failed transactions that don't commit the decrement
- Crashes or exceptions that prevent the finally block from executing
- Race conditions in concurrent environments

This fix:
- Adds get_actual_running_processes_count() function that queries the
  actual count from ProcessTable where last_status == 'running'
- Updates generate_engine_global_status() to use the actual count
- Updates generate_engine_status_response() to override the flawed
  counter with the actual count
- Updates test to create actual running processes instead of
  manipulating the counter

Fixes: #1258
@torkashvand torkashvand force-pushed the fix/running-processes-count-1258 branch from 0d97328 to 975ced7 Compare February 9, 2026 10:34
  - Add WorkerStatusMonitor with periodic background updates
  - Track actual running jobs via ThreadPool active job counter
  - Replace database-based counting with worker inspection
  - Add thread-safe singleton with double-checked locking
  - Use threading.Event for clean shutdown handling
  - Make update interval configurable (WORKER_STATUS_INTERVAL setting)
  - Add refresh_once() for deterministic testing
  - Refactor active job tracking with context manager to reduce complexity.
@torkashvand torkashvand force-pushed the fix/running-processes-count-1258 branch from 994371d to 0689aff Compare February 16, 2026 09:43
Copy link
Contributor

@Mark90 Mark90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice :)

Left a few questions, but LGTM!

self.number_of_workers_online = getattr(thread_pool, "_max_workers", -1)
self.number_of_queued_jobs = thread_pool._work_queue.qsize() if hasattr(thread_pool, "_work_queue") else 0
self.number_of_running_jobs = len(getattr(thread_pool, "_threads", []))
self.number_of_running_jobs = get_active_threadpool_jobs_count()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this was indeed broken, didn't know that. Thanks for fixing this as well :)


_workflow_executor = None
_active_threadpool_jobs = 0
_active_jobs_lock = __import__("threading").Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it not possible to import it the usual way?

"""Get the global WorkerStatusMonitor instance.

Thread-safe singleton pattern with double-checked locking.
Restarts the monitor if it was previously stopped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When could this be needed?

Comment on lines +42 to +43
with self._lock:
self._running_jobs_count = count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning a new value is atomic, so I don't think this requires locking

@Mark90 Mark90 requested a review from tjeerddie February 16, 2026 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Fix the running processes count endpoint

3 participants