
Galaxy unable to delete jobs submitted through TES Pulsar library #153

Open
@micoleaoo

Description

You can delete datasets through the UI, but they still linger in the database, resulting in a nonstop cycle of these logs:

Galaxy log:

Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]: galaxy.jobs.handler DEBUG 2025-04-11 07:57:24,426 [pN:handler_0,p:875355,tN:JobHandlerStopQueue.monitor_thread] Stopping job 260 in pulsar_tes runner
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]: galaxy.jobs.runners.pulsar DEBUG 2025-04-11 07:57:24,438 [pN:handler_0,p:875355,tN:JobHandlerStopQueue.monitor_thread] Attempt remote Pulsar kill of job with url pulsar_tes and id 67f52f266dd4e6cf81e81e87
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]: galaxy.jobs.handler ERROR 2025-04-11 07:57:24,467 [pN:handler_0,p:875355,tN:JobHandlerStopQueue.monitor_thread] Exception in monitor_step
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]: Traceback (most recent call last):
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/venv/lib/python3.11/site-packages/requests/models.py", line 974, in json
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     return complexjson.loads(self.text, **kwargs)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     return _default_decoder.decode(s)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     raise JSONDecodeError("Expecting value", s, err.value) from None
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]: During handling of the above exception, another exception occurred:
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]: Traceback (most recent call last):
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/server/lib/galaxy/jobs/handler.py", line 1086, in __monitor
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     self.__monitor_step()
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/server/lib/galaxy/jobs/handler.py", line 1118, in __monitor_step
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     self._check_jobs(session, jobs_to_check)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/server/lib/galaxy/jobs/handler.py", line 1188, in _check_jobs
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     self.dispatcher.stop(job, job_wrapper)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/server/lib/galaxy/jobs/handler.py", line 1267, in stop
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     self.job_runners[runner_name].stop_job(job_wrapper)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/server/lib/galaxy/jobs/runners/pulsar.py", line 773, in stop_job
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     client.kill()
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/venv/lib/python3.11/site-packages/pulsar/client/client.py", line 745, in kill
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     self._tes_client.cancel_task(self.job_id)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/venv/lib/python3.11/site-packages/pydantictes/api.py", line 45, in cancel_task
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     return TesCancelTaskResponse(**response.json())
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:                                    ^^^^^^^^^^^^^^^
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:   File "/srv/galaxy/venv/lib/python3.11/site-packages/requests/models.py", line 978, in json
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]:     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875355]: requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875356]: galaxy.jobs.handler DEBUG 2025-04-11 07:57:24,689 [pN:handler_1,p:875356,tN:JobHandlerStopQueue.monitor_thread] Stopping job 258 in pulsar_tes runner
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875356]: galaxy.jobs.runners.pulsar DEBUG 2025-04-11 07:57:24,693 [pN:handler_1,p:875356,tN:JobHandlerStopQueue.monitor_thread] Attempt remote Pulsar kill of job with url pulsar_tes and id 67f52cf35172e48cefc64499
Apr 11 07:57:24 galaxy-qa-nd-2 galaxyctl[875356]: galaxy.jobs.handler ERROR 2025-04-11 07:57:24,715 [pN:handler_1,p:875356,tN:JobHandlerStopQueue.monitor_thread] Exception in monitor_step

TESP log:

tesp-api       | 2025-04-11 08:03:10.812 | INFO     | uvicorn.protocols.http.h11_impl:send:431 - 147.251.245.115:37333 - "POST /v1/tasks/67f52cf35172e48cefc64499%3Acancel HTTP/1.1" 200
tesp-api       | 2025-04-11 08:03:10.820 | DEBUG    | asyncio.selector_events:_read_ready__on_eof:885 - <_SelectorSocketTransport fd=12 read=polling write=<idle, bufsize=0>> received EOF
tesp-api       | 2025-04-11 08:03:11.426 | DEBUG    | asyncio.selector_events:_accept_connection:161 - <Server sockets=(<asyncio.TransportSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 8080)>,)> got a new connection from ('147.251.245.115', 21065): <socket.socket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.16.4', 8080), raddr=('147.251.245.115', 21065)>
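Read together, the two logs suggest that TESP answers the cancel request with HTTP 200 while the response body is empty or otherwise not valid JSON, so `response.json()` in `cancel_task` raises and Galaxy retries the stop on every monitor cycle. Below is a minimal sketch of a more tolerant way to interpret such a response. `parse_cancel_body` is a hypothetical helper, not part of pydantictes, Pulsar, or Galaxy, and treating a non-JSON success body as an empty result is an assumption about what the client should do, not confirmed behaviour of any of those libraries.

```python
import json
from typing import Any, Dict


def parse_cancel_body(status_code: int, body: str) -> Dict[str, Any]:
    """Hypothetical helper: interpret a TES cancel response tolerantly.

    Returns the decoded JSON object, or {} when the server reported
    success but sent an empty or non-JSON body.
    """
    if not 200 <= status_code < 300:
        # A real failure should still surface to the caller.
        raise RuntimeError(f"cancel failed with HTTP {status_code}")
    if not body.strip():
        # Success with an empty body: treat it as an empty result object.
        return {}
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        # Success with a non-JSON body: the cancel was accepted, so do not
        # turn a parsing problem into a failed stop.
        return {}
    # Anything other than a JSON object is also treated as an empty result.
    return parsed if isinstance(parsed, dict) else {}
```

With something like this in the client, the 200 seen in the TESP log would count as a successful cancel instead of raising inside `stop_job`. Whether the fix belongs in the client or in the TESP response is a separate question.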

Temporary fix:
Manually change the status of the affected jobs from deleting to deleted in the Galaxy SQL database.
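For anyone applying the same workaround, here is a rough sketch of the update. It assumes the jobs live in a table named `job` with `id` and `state` columns, which is an assumption about the Galaxy schema that should be checked against your installation. `mark_stuck_jobs_deleted` is a hypothetical helper, and the standard-library `sqlite3` module only stands in for the real database connection (a production driver may use a different placeholder style than `?`).

```python
import sqlite3
from typing import Iterable


def mark_stuck_jobs_deleted(conn: sqlite3.Connection, job_ids: Iterable[int]) -> int:
    """Hypothetical helper: flip the given jobs from 'deleting' to 'deleted'.

    Only rows that are currently in the 'deleting' state are touched.
    Returns the number of rows that were updated.
    """
    ids = list(job_ids)
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    cur = conn.execute(
        # Table and column names are assumed, see the note above.
        f"UPDATE job SET state = 'deleted' "
        f"WHERE state = 'deleting' AND id IN ({placeholders})",
        ids,
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # In-memory stand-in for the Galaxy database, using the job ids from the log.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT)")
    demo.executemany(
        "INSERT INTO job VALUES (?, ?)",
        [(258, "deleting"), (260, "deleting"), (261, "running")],
    )
    print(mark_stuck_jobs_deleted(demo, [258, 260, 261]))
```

Restricting the update to rows already in the deleting state keeps the workaround from touching jobs that are still running or queued.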
