[BUG] [3007.0] Channels are closed while listened to #66568

Open
AppCrashExpress opened this issue May 21, 2024 · 2 comments
Labels: Bug (broken, incorrect, or confusing behavior), needs-triage

Description

Hello!

We are currently trying to update SaltStack to 3007.0.

This issue was tested on commit 31c9d0d.

When using the TCP transport to run something like salt-call grains.get fqdn, the logs will contain two types of errors:

  • When TCP PubChannel is closed:

    2024-05-21 20:57:18,572 [salt.channel.client:279 ][ERROR   ][410050] Caught exception in PubChannel connect callback AttributeError("'NoneType' object has no attribute 'fire_event'")
    Traceback (most recent call last):
      File "/opt/salt/channel/client.py", line 542, in connect_callback
        self.event.fire_event({"master": self.opts["master"]}, "__master_connected")
        ^^^^^^^^^^^^^^^^^^^^^
    
  • When TCP ReqChannel (the one wrapped in SyncWrapper) is closed:

    2024-05-21 20:57:19,893 [salt.channel.client:279 ][DEBUG   ][410050] Closing AsyncReqChannel instance
    2024-05-21 20:57:19,895 [salt.loader.lazy :279 ][PROFILE ][410050] LazyLoader utils deepcopy with deepcopy took 0.001188
    2024-05-21 20:57:19,913 [asyncio          :1821][ERROR   ][410050] Task was destroyed but it is pending!
    task: <Task pending name='Task-7' coro=<RequestClient._stream_return() done, defined at /opt/salt/transport/tcp.py:1756> wait_for=<Future finished exception=StreamClosedError('Stream is closed')>>
    

Setup
This is a shortened configuration, since we use a patched custom installation and I'm not sure what I'm allowed to show:

master:
  - salt-master.example.org

ipv6: true
transport: tcp 
random_master: True

log_level: debug

Should you find it insufficient, please let me know.

  • on-prem machine
  • VM (Virtualbox, KVM, etc. please specify)
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD
  • classic packaging
  • onedir packaging
  • used bootstrap to install

Steps to Reproduce the behavior

  1. Install salt-master 3007 on a remote machine
  2. Install salt-minion 3007 locally
  3. Run salt-call -l info grains.get fqdn on the local machine

Running it, you should see something akin to:

[ERROR   ] Caught exception in PubChannel connect callback AttributeError("'NoneType' object has no attribute 'fire_event'")
Traceback (most recent call last):
  File "/opt/salt/channel/client.py", line 542, in connect_callback
    self.event.fire_event({"master": self.opts["master"]}, "__master_connected")
    ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'fire_event'
...
local:
    salt-minion.example.org
Task was destroyed but it is pending!
task: <Task pending name='Task-7' coro=<RequestClient._stream_return() running at /opt/salt/transport/tcp.py:1761> wait_for=<Future finished exception=StreamClosedError('Stream is closed')>>

Expected behavior
Closing channels and transports should properly wait on underlying streams before closing them fully.

Screenshots
Probably inapplicable.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3007.0
 
Python Version:
        Python: 3.12.3 (main, May 13 2024, 10:19:24) [Clang 16.0.6 ]
 
Dependency Versions:
          cffi: 1.16.0
      cherrypy: Not Installed
      dateutil: 2.9.0.post0
     docker-py: 7.0.0
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.4
       libgit2: Not Installed
  looseversion: 1.3.0
      M2Crypto: 0.38.0
          Mako: 1.3.3
       msgpack: 1.0.8
  msgpack-pure: Not Installed
  mysql-python: 1.4.6
     packaging: 21.3
     pycparser: 2.22
      pycrypto: Not Installed
  pycryptodome: Not Installed
        pygit2: Not Installed
  python-gnupg: 0.5.2
        PyYAML: 5.4.1
         PyZMQ: 25.1.2
        relenv: Not Installed
         smmap: 5.0.1
       timelib: 0.3.0
       Tornado: 6.4
           ZMQ: 4.1.2
 
Salt Package Information:
  Package Type: Not Installed
 
System Versions:
          dist: ubuntu 22.04 jammy
        locale: utf-8
       machine: x86_64
       release: 5.15.0-30-generic
        system: Linux
       version: Ubuntu 22.04 jammy

Additional context

I was able to trace the lines of code where the errors happen.

For PubChannel, it happens in SMinion during master evaluation. Closing the channel both closes the underlying transport and destroys the event queue, as seen here: https://github.com/saltstack/salt/blob/v3007.0/salt/channel/client.py#L463-L470
However, connect_callback and, by extension, self.event.fire_event still seem to run, so fire_event ends up being called on None.
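
To make the race concrete, here is a minimal self-contained sketch (the class and attribute names are illustrative stand-ins, not the actual salt.channel.client code): close() drops the event object while the connect callback can still run, and a None check of the kind shown is one way to avoid the AttributeError.

    class PubChannelLike:
        """Illustrative stand-in for the pub channel; not Salt's actual class."""

        def __init__(self, event, opts):
            self.event = event
            self.opts = opts

        def connect_callback(self, result):
            # Without this guard, a channel closed in the meantime leaves
            # self.event as None and the call below raises
            # "'NoneType' object has no attribute 'fire_event'".
            if self.event is None:
                return
            self.event.fire_event({"master": self.opts["master"]}, "__master_connected")

        def close(self):
            # Closing destroys the event object, as described above.
            if self.event is not None:
                self.event.destroy()
                self.event = None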

Much the same thing happens with SyncWrapper: as seen in RemotePillar, it both calls the close method on AsyncReqChannel and destroys the io_loop passed to it. However, as seen in the error, the underlying TCP transport fails to process the StreamClosedError in time, leaving it as a pending Task in a closed io_loop.
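
For comparison, here is a minimal asyncio sketch of the shutdown ordering the error points at (RequestClientLike and its attributes are illustrative, not Salt's RequestClient): the background receive task is awaited inside close(), before the event loop is torn down, so it can never be left pending.

    import asyncio


    class RequestClientLike:
        """Stand-in for the TCP RequestClient; only the shutdown pattern matters here."""

        def __init__(self):
            self._queue = asyncio.Queue()
            self._task = None

        async def connect(self):
            # Comparable in spirit to RequestClient._stream_return(): a background
            # task that keeps reading until the stream is closed.
            self._task = asyncio.create_task(self._stream_return())

        async def _stream_return(self):
            while True:
                msg = await self._queue.get()
                if msg is None:  # sentinel standing in for StreamClosedError
                    return

        async def close(self):
            # Wake the reader and *await* the task before the event loop goes away.
            # Destroying the io_loop while the task is still pending is what
            # produces "Task was destroyed but it is pending!".
            await self._queue.put(None)
            if self._task is not None:
                await self._task
                self._task = None


    async def main():
        client = RequestClientLike()
        await client.connect()
        await client.close()


    asyncio.run(main())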

Technically it also happens inside the TCP transport itself, but a try/except wrapper simply catches the StreamClosedError and emits it as a debug message.
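
A rough sketch of that behavior, assuming the receive loop looks roughly like this (illustrative only, not the actual salt.transport.tcp code):

    import logging

    from tornado.iostream import StreamClosedError

    log = logging.getLogger(__name__)


    async def _stream_return(stream, handler):
        # Illustrative receive loop: a closed stream ends the loop quietly,
        # surfacing only as a debug-level message.
        try:
            while True:
                data = await stream.read_bytes(4096, partial=True)
                handler(data)
        except StreamClosedError as exc:
            log.debug("Stream closed while reading: %s", exc)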

AppCrashExpress added the Bug (broken, incorrect, or confusing behavior) and needs-triage labels on May 21, 2024
AppCrashExpress (Author) commented:

Also, a separate issue that doesn't really deserve a full write-up (in my eyes): the TCP RequestClient needs to set self._closing = True to avoid TransportWarning: Unclosed transport! in the logs.
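
For illustration only (the class and warning below are stand-ins, not Salt's actual RequestClient or TransportWarning): a close() that never records its own closed state leaves a later check, such as __del__, believing the transport was never closed.

    import warnings


    class TransportWarning(Warning):
        """Stand-in for the warning category mentioned in the logs."""


    class RequestClientLike:
        def __init__(self):
            self._closing = False

        def close(self):
            # Setting the flag here is the suggestion above: without it,
            # __del__ still reports "Unclosed transport!".
            self._closing = True

        def __del__(self):
            if not self._closing:
                warnings.warn("Unclosed transport!", TransportWarning)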

AppCrashExpress changed the title from "[BUG] Channels are closed while listened to in 3007.0" to "[BUG] [3007.0] Channels are closed while listened to" on Jun 13, 2024

Sxderp commented Jul 26, 2024

Yeah, this causes issues with the TCP transport. Returns and jobs aren't working properly. There's some TCP close loop or something of the sort. This is even more problematic in that the internal messaging is hard-coded to tcp [1].

Unfortunately I can't figure out how to fix /this/ issue. It's beyond me.

[1] Probably because zeromq was broken (#66751) and they chose to just disable zeromq.
