Skip to content

Clarify asyncio API in a backward compatible manner #92679

Closed as not planned
Closed as not planned
@arhadthedev

Description

@arhadthedev

While studying asyncio to add the TLS shutdown in a non-hacky way, I noticed that the transport/protocol concept is quite misrepresented.

The documentation uses terms colliding with the ISO model. It leaves the impression that asyncio.BaseTransport is related to the transport layer (TCP/UDP), and asyncio.BaseProtocol is related to the application layer (HTTP, FTP).

However, the source code reveals that such an assumption is incorrect and the transport and the protocol are facing sides of links in the chain that starts from a data source outwards:

<<<--------------------------------------------------------------------------------------------<<<

(another I/O API)       (inter-API wrapper)      (for a user,
┌──────────────┐                                  pretends to be the transport)
│ StreamReader │<-+  ┌────────────────────────┐   ┌───────────────────────┐
└──────────────┘  |  │                        │<->│ _SSLProtocolTransport │
                  +->│  StreamReaderProtocol  │   └───────────┐┄┄┄┄┄┄┄┄┄┄┄└─┐   ┌───────────⸾───
┌──────────────┐  |  │                        │               │ SSLProtocol │<->│ Transport ⸾ Unix socket
│ StreamWriter │<-+  └────────────────────────┘               └─────────────┘   └───────────⸾───
└──────────────┘                                    (for a socket,
                                                    pretends to be the protocol)
    (daemon I/O processor)
                                  ┌──────────────────────────┐     ┌──────────────────────────⸾───
┌───────────────────────────┐     │ SubprocessStreamProtocol │     │ _UnixSubprocessTransport ⸾
│                           │     ├──────────────────────────┤     ├──────────────────────────⸾ subprocess
│ _SendfileFallbackProtocol │ <-> ├──────────────────────────┤ <-> ├──────────────────────────⸾  module
│                           │     │ + get_pipe_transport     │     │ + _start                 ⸾
└───────────────────────────┘     │ + get_returncode         │     └──────────────────────────⸾───
                                  │ + pipe_connection_lost   │
                                  │ + pipe_data_received     │
                                  │   ...                    │
                                  └──────────────────────────┘

As a result, I have an idea of doing some improvements and I'd like to get an opinion on whether anything should be changed or postponed to not break the world:

  • Merge _SSLProtocolTransport class into SSLProtocol (need a feedback before a PR)

    The TLS codec is implemented as two tightly coupled classes-interfaces of the same socketless BIO (OpenSSL Binary I/O) object. Moreover, _SSLProtocolTransport is a thin do-nothing wrapper with methods like this:

    def get_extra_info(self, name, default=None):
        """Get optional transport information."""
        return self._ssl_protocol._get_extra_info(name, default)
    
    def set_protocol(self, protocol):
        self._ssl_protocol._set_app_protocol(protocol)
    
    def get_protocol(self):
        return self._ssl_protocol._app_protocol

    Backward compatibility: the public SSLProtocol doesn't change semantics or composition of already existing methods. For third-party hacks that add TLS start/shutdown to StreamWriter, we can permanently soft deprecate _SSLProtocolTransport and replace all of its content with the redirecting __setitem__/__getitem__. The permadeprecation is required because SSL switch-on/off is a common use case (STARSSL and others) so we can be sure that many private hacks exist here, both homebrew and enterprise.

  • Extend https://docs.python.org/3/library/asyncio-protocol.html with a howto (need a feedback before a PR)

    Currently, a concept of the transport/protocol looks as a byside internal detail that leaks through loop methods. However, this concept is powerful. A programmer can spawn a new socket/process transport from an event loop and start threading arbitrary amount of modules onto it.

    For now, the only such a module ("transprotocol") is SSLProtocol but there can be a stream compressor, a base64-cator, or a verifier/monitor/scanner. There is a huge potential of and for third party PyPI packages in a field of stream data processing.

  • Extend loop.start_tls docs with an explanation on parameters (gh-92679: Clarify asyncio.loop.start_tls parameters #92682)

    The method requires transport and protocol parameters but gives no explanation what they're used for (to insert the TLS "transportocol" between them).

  • Allow transports to have multiple outputs (need a feedback before a PR)

    Transports like *SubprocessTransport naturally have multiple outputs, one for each standard stream. Currently, they're provided via SubprocessStreamProtocol with extra methods. However, we can officially recognize that any transport can have one anonymous or many named outputs and allow a user to connect any random protocol to any individual output.

    The named outputs can be presented as a plain dictionary with keys documented for each class. For example, SubprocessTransport could look like this:

    class UnixSubprocessTransport(...):
    
        def __init__(...)
            # [...]
            self.outputs = {stdin: ..., stdout: ..., stderr: ...}
            # [...]

    Backward compatibility: single-output transports continue to behave the same so the compatibility is preserved.

  • Allow protocols to have multiple inputs (need a feedback before a PR)

    With multiple outputs, there can be a need to merge them for further links of the chain. For example, SubprocessTransport's stdout and stderr.

    Backward compatibility: the same as above.

  • Generalize SSLProtocol (need a feedback before a PR)

    Currently, each transport defines a private _start_tls_compatible field used by loop.start_ssl. However, the asyncio module uses in-memory OpenSSL BIO (Binary I/O) that inherently allows to connect any transport and feed any data.

    Also, a programmer could create and insert SSLProtocol into whatever place of the chain they want. Since the class works passively, waiting for events on both sides, it isn't bound to the loop so does not need a special loop.start_tls method.

    Backward compatibility: the loop.start_tls method may be left as a wrapper.

  • Break the reference loop (need a feedback before a PR)

    Currently protocols and transports that face each other hold mutual strong links. As a result, to destroy the whole chain a programmer needs to manually traverse it and break all links, or delegate it to the loop.

    However, a chain has no sence if there is no consumer. This allows to either replace strong Transport → Interface references with the weak ones, or transfer to a model where a transport provides self.on_data = lambda: 0-like connection points, and a protocol connects/disconnects these points to itself by itself. As a result, a transport feeds data into the void hoping that there will be a listner (or listeners).

    Backward compatibility: [help needed]


Conclusion: the changes above will turn asyncio into a powerful tool of asynchronous dataflow processing. Maybe (a little of pipe dream) there'll be a pandas-grade library on this topic.


cc @1st1 @asvetlov as asyncio experts.

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.12bugs and security fixesdocsDocumentation in the Doc dirtopic-asynciotype-featureA feature request or enhancement

    Projects

    • Status

      Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions