Description
Actions executed via the transport client would fork to the LISTENER
threadpool to complete their listener to avoid problems with running too much work on a transport thread. In contrast, actions executed internally via NodeClient
don't fork by default. We removed this forking behaviour entirely in #53049 as part of the removal of the transport client.
Since then we've seen at least a couple of issues where internal actions end up inadvertently doing nontrivial work on a transport thread:
- [ML] Improve performance of job stats searches for data counts, model size stats and timing stats #82255 (see linked support case from there)
- Deprecation info API puts long-running work on the transport_worker thread #86764
I think this is trappy, there's not even anywhere to specify which thread to use to complete the listener so it's understandable if a developer might think it's going to be completed somewhere safe, perhaps on the same threadpool as was used to call the action in the first place.
I think we should at least require an executor
parameter when executing an action via the client so that callers must make a conscious decision about whether to fork for the response listener or not. Alternatively we could just make them all fork by default (to where? reinstate the LISTENER
threadpool?) and allow an opt-in non-forking API for performance-critical cases.