Description
While studying asyncio to add TLS shutdown in a non-hacky way, I noticed that the transport/protocol concept is quite misrepresented.
The documentation uses terms that collide with the ISO/OSI model. This leaves the impression that asyncio.BaseTransport is related to the transport layer (TCP/UDP) and asyncio.BaseProtocol to the application layer (HTTP, FTP).
However, the source code shows that this assumption is incorrect: "transport" and "protocol" are merely the two facing sides of adjacent links in a chain that runs outwards from a data source (right to left in the diagrams below):
<<<--------------------------------------------------------------------------------------------<<<
(another I/O API)    (inter-API wrapper)          (for a user,
┌──────────────┐                                   pretends to be the transport)
│ StreamReader │<-+  ┌────────────────────────┐   ┌───────────────────────┐
└──────────────┘  |  │                        │<->│ _SSLProtocolTransport │
                  +->│  StreamReaderProtocol  │   └───────────┐┄┄┄┄┄┄┄┄┄┄┄└─┐   ┌───────────⸾───
┌──────────────┐  |  │                        │               │ SSLProtocol │<->│ Transport ⸾ Unix socket
│ StreamWriter │<-+  └────────────────────────┘               └─────────────┘   └───────────⸾───
└──────────────┘                                              (for a socket,
                                                               pretends to be the protocol)

   (daemon I/O processor)
                                  ┌──────────────────────────┐     ┌──────────────────────────⸾───
┌───────────────────────────┐     │ SubprocessStreamProtocol │     │ _UnixSubprocessTransport ⸾
│                           │     ├──────────────────────────┤     ├──────────────────────────⸾ subprocess
│ _SendfileFallbackProtocol │ <-> ├──────────────────────────┤ <-> ├──────────────────────────⸾ module
│                           │     │ + get_pipe_transport     │     │ + _start                 ⸾
└───────────────────────────┘     │ + get_returncode         │     └──────────────────────────⸾───
                                  │ + pipe_connection_lost   │
                                  │ + pipe_data_received     │
                                  │ ...                      │
                                  └──────────────────────────┘
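The SSLProtocol/_SSLProtocolTransport pair shown above is the template for a middle link: it is a protocol for the transport below it and a transport for the protocol above it. Below is a minimal sketch of the same pattern applied to a toy base64 codec placed between a socket transport and an application protocol. Base64Link and Collector are hypothetical names, the newline-delimited base64 framing is an assumption made only for this illustration, and socket.socketpair() stands in for a real network connection; loop.create_connection(sock=...), loop.sock_recv and loop.sock_sendall are the only asyncio APIs relied on.

```python
import asyncio
import base64
import socket


class Base64Link(asyncio.Protocol, asyncio.Transport):
    """Hypothetical middle link (a "transprotocol"), modelled on SSLProtocol:
    a protocol for the transport below it, a transport for the protocol above it."""

    def __init__(self, app_protocol):
        super().__init__()  # resolves to BaseTransport.__init__ (extra info dict)
        self._app = app_protocol  # the protocol above this link
        self._lower = None        # the transport below this link
        self._buffer = b""

    # --- protocol side: events coming from the lower transport ---
    def connection_made(self, transport):
        self._lower = transport
        self._app.connection_made(self)  # the application sees this link as its transport

    def data_received(self, data):
        self._buffer += data
        # Decode only complete, newline-terminated base64 records (toy framing).
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            if line:
                self._app.data_received(base64.b64decode(line))

    def eof_received(self):
        return self._app.eof_received()

    def connection_lost(self, exc):
        self._app.connection_lost(exc)

    # --- transport side: calls coming from the application protocol ---
    def write(self, data):
        self._lower.write(base64.b64encode(data) + b"\n")

    def close(self):
        self._lower.close()

    def is_closing(self):
        return self._lower.is_closing()

    def get_extra_info(self, name, default=None):
        return self._lower.get_extra_info(name, default)

    def set_protocol(self, protocol):
        self._app = protocol

    def get_protocol(self):
        return self._app


class Collector(asyncio.Protocol):
    """Application protocol that records the first decoded message."""

    def __init__(self):
        self.transport = None
        self.received = asyncio.get_running_loop().create_future()

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        if not self.received.done():
            self.received.set_result(data)


async def main():
    loop = asyncio.get_running_loop()
    app_sock, peer_sock = socket.socketpair()
    peer_sock.setblocking(False)
    app = Collector()
    _, link = await loop.create_connection(lambda: Base64Link(app), sock=app_sock)

    # Outgoing: the application writes plain bytes, the socket carries base64.
    app.transport.write(b"hello")
    wire = b""
    while not wire.endswith(b"\n"):
        chunk = await loop.sock_recv(peer_sock, 1024)
        if not chunk:
            break
        wire += chunk

    # Incoming: the peer sends base64, the application receives plain bytes.
    await loop.sock_sendall(peer_sock, base64.b64encode(b"world") + b"\n")
    received = await app.received

    link.close()
    peer_sock.close()
    return wire, received


if __name__ == "__main__":
    print(asyncio.run(main()))  # (b'aGVsbG8=\n', b'world')
```

Because the link implements both interfaces in one object, neither the socket transport nor the application protocol needs to know it is there.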
As a result, I have some ideas for improvements, and I'd like to get opinions on whether any of them should be made or postponed so as not to break the world:
- Merge the _SSLProtocolTransport class into SSLProtocol (needs feedback before a PR)
  The TLS codec is implemented as two tightly coupled classes that are both interfaces to the same socketless BIO (OpenSSL Binary I/O) object. Moreover, _SSLProtocolTransport is a thin do-nothing wrapper with methods like these:

      def get_extra_info(self, name, default=None):
          """Get optional transport information."""
          return self._ssl_protocol._get_extra_info(name, default)

      def set_protocol(self, protocol):
          self._ssl_protocol._set_app_protocol(protocol)

      def get_protocol(self):
          return self._ssl_protocol._app_protocol

  Backward compatibility: the public SSLProtocol doesn't change the semantics or composition of its existing methods. For third-party hacks that add TLS start/shutdown to StreamWriter, we can permanently soft-deprecate _SSLProtocolTransport and replace all of its content with attribute access redirected via __getattr__/__setattr__. The permanent deprecation is required because switching SSL on and off is a common use case (STARTTLS and others), so we can be sure that many private hacks exist here, both homebrew and enterprise.
- Extend https://docs.python.org/3/library/asyncio-protocol.html with a howto (needs feedback before a PR)
  Currently, the transport/protocol concept looks like a side internal detail that leaks through loop methods. However, this concept is powerful: a programmer can spawn a new socket or process transport from an event loop and thread an arbitrary number of modules onto it. For now, the only such module (a "transprotocol") is SSLProtocol, but there could be a stream compressor, a base64 encoder/decoder, or a verifier/monitor/scanner. There is huge potential for third-party PyPI packages in the field of stream data processing.
- Extend the loop.start_tls docs with an explanation of its parameters (gh-92679: Clarify asyncio.loop.start_tls parameters #92682)
  The method requires transport and protocol parameters but gives no explanation of what they're used for (the TLS "transprotocol" is inserted between them).
- Allow transports to have multiple outputs (needs feedback before a PR)
  Transports like *SubprocessTransport naturally have multiple outputs, one for each standard stream. Currently, they're provided via SubprocessStreamProtocol with extra methods. However, we can officially recognize that any transport can have one anonymous output or many named ones, and allow a user to connect any protocol to any individual output. The named outputs can be presented as a plain dictionary with keys documented for each class. For example, SubprocessTransport could look like this:

      class UnixSubprocessTransport(...):
          def __init__(...):
              # [...]
              self.outputs = {"stdin": ..., "stdout": ..., "stderr": ...}
              # [...]

  Backward compatibility: single-output transports continue to behave the same, so compatibility is preserved.
- Allow protocols to have multiple inputs (needs feedback before a PR)
  With multiple outputs, there may be a need to merge them for further links of the chain, for example, SubprocessTransport's stdout and stderr.
  Backward compatibility: the same as above.
- Generalize SSLProtocol (needs feedback before a PR)
  Currently, each transport defines a private _start_tls_compatible field used by loop.start_tls. However, the asyncio module uses an in-memory OpenSSL BIO (Binary I/O), which inherently allows connecting any transport and feeding it any data.
  Also, a programmer could create an SSLProtocol and insert it at whatever place in the chain they want. Since the class works passively, waiting for events on both sides, it isn't bound to the loop and so does not need a special loop.start_tls method.
  Backward compatibility: the loop.start_tls method may be left as a wrapper.
- Break the reference loop (needs feedback before a PR)
  Currently, protocols and transports that face each other hold strong references to each other. As a result, to destroy the whole chain a programmer needs to traverse it manually and break all the links, or delegate this to the loop.
  However, a chain makes no sense if there is no consumer. This allows us either to replace strong Transport → Interface references with weak ones, or to move to a model where a transport provides self.on_data = lambda: 0-like connection points, and a protocol itself connects to these points and disconnects from them. As a result, a transport feeds data into the void, hoping that there will be a listener (or listeners).
  Backward compatibility: [help needed]
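For the proposal to merge _SSLProtocolTransport into SSLProtocol, here is a minimal sketch of the backward-compatibility shim. MergedSSLProtocol and LegacySSLProtocolTransport are hypothetical stand-ins, not the actual asyncio.sslproto classes: the first owns the transport-facing methods directly, and the second forwards every attribute read and write to it through __getattr__/__setattr__. No DeprecationWarning is emitted, on the assumption that a soft deprecation is documentation-only.

```python
class MergedSSLProtocol:
    """Hypothetical stand-in for SSLProtocol after the merge: the
    transport-facing methods live on the protocol object itself."""

    def __init__(self, app_protocol=None):
        self._app_protocol = app_protocol
        self._extra = {}

    def get_extra_info(self, name, default=None):
        """Get optional transport information."""
        return self._extra.get(name, default)

    def set_protocol(self, protocol):
        self._app_protocol = protocol

    def get_protocol(self):
        return self._app_protocol


class LegacySSLProtocolTransport:
    """Hypothetical soft-deprecated shim kept for third-party code that still
    holds a reference to the old wrapper object."""

    def __init__(self, ssl_protocol):
        # Bypass our own __setattr__ so the target is stored on the shim itself.
        object.__setattr__(self, "_ssl_protocol", ssl_protocol)

    def __getattr__(self, name):
        # Only called for names not found on the shim, i.e. everything but _ssl_protocol.
        return getattr(self._ssl_protocol, name)

    def __setattr__(self, name, value):
        # Writes (including private-attribute hacks) land on the merged object.
        setattr(self._ssl_protocol, name, value)


if __name__ == "__main__":
    merged = MergedSSLProtocol()
    legacy = LegacySSLProtocolTransport(merged)
    legacy.set_protocol("app")    # old call path, forwarded to the merged object
    print(merged.get_protocol())  # app
```

Since __getattr__ is consulted only when normal lookup fails, the single attribute stored on the shim itself is never forwarded, which avoids infinite recursion.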
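For the proposal to give transports several named outputs, a minimal sketch of one possible shape of the outputs dictionary follows. Output, FakeSubprocessTransport and its _pipe_data method are hypothetical; a real transport would drive feed() from its event-loop I/O callbacks, and only stdout and stderr are modelled here for brevity. The point is that each output accepts an ordinary asyncio.Protocol, with no subprocess-specific methods required.

```python
import asyncio


class Output:
    """Hypothetical named output slot of a transport."""

    def __init__(self, name):
        self.name = name
        self.protocol = None

    def connect(self, protocol):
        # Any protocol can be attached to any individual output.
        self.protocol = protocol

    def feed(self, data):
        # With nobody connected, the data is simply dropped.
        if self.protocol is not None:
            self.protocol.data_received(data)


class FakeSubprocessTransport:
    """Hypothetical transport exposing documented, named outputs."""

    def __init__(self):
        # Only stdout and stderr are modelled here, for brevity.
        self.outputs = {name: Output(name) for name in ("stdout", "stderr")}

    def _pipe_data(self, name, data):
        # A real transport would call this from its event-loop I/O callbacks.
        self.outputs[name].feed(data)


class Collect(asyncio.Protocol):
    """An ordinary protocol; it needs no subprocess-specific methods."""

    def __init__(self):
        self.chunks = []

    def data_received(self, data):
        self.chunks.append(data)


if __name__ == "__main__":
    transport = FakeSubprocessTransport()
    out, err = Collect(), Collect()
    transport.outputs["stdout"].connect(out)
    transport.outputs["stderr"].connect(err)
    transport._pipe_data("stdout", b"result\n")
    transport._pipe_data("stderr", b"warning\n")
    print(out.chunks, err.chunks)  # [b'result\n'] [b'warning\n']
```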
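For the proposal to let protocols have multiple inputs, the closest existing mechanism is asyncio.SubprocessProtocol: its pipe_data_received(fd, data) method already receives several transport outputs in one protocol, told apart by file descriptor number. The sketch below uses this current API (not the proposed one) to merge a child process's stdout and stderr into a single tagged list. MergingProtocol and run_child are illustrative names; data from different pipes may arrive in any order.

```python
import asyncio
import subprocess
import sys

TAGS = {1: "stdout", 2: "stderr"}


class MergingProtocol(asyncio.SubprocessProtocol):
    """One protocol fed by two transport outputs (the child's stdout and stderr)."""

    def __init__(self, done):
        self.merged = []  # (tag, bytes) pairs in arrival order
        self._done = done

    def pipe_data_received(self, fd, data):
        # Existing extra-method API: the fd number says which output the data came from.
        self.merged.append((TAGS[fd], data))

    def connection_lost(self, exc):
        # Called once the process has exited and all its pipes are closed.
        if not self._done.done():
            self._done.set_result(None)


async def run_child():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    code = "import sys; sys.stdout.write('out'); sys.stderr.write('err')"
    transport, protocol = await loop.subprocess_exec(
        lambda: MergingProtocol(done),
        sys.executable, "-c", code,
        stdin=subprocess.DEVNULL,  # no stdin pipe is needed for this demo
    )
    try:
        await done
    finally:
        transport.close()
    return protocol.merged


if __name__ == "__main__":
    for tag, data in asyncio.run(run_child()):
        print(tag, data)
```

Waiting for connection_lost rather than process_exited matters here: the process may exit before the last bytes have been read from its pipes.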
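For the proposal to break the reference loop, here is a minimal sketch of the connection-point model, assuming weak references on the producer side. The producer (transport side) keeps only weakref.WeakMethod references to its listeners, while the consumer (protocol side) holds the only strong link, pointing back at the producer. Source and Printer are hypothetical names. Once the last consumer is gone, nothing keeps it alive, and the producer simply feeds data into the void.

```python
import gc
import weakref


class Source:
    """Hypothetical transport-side connection point: it keeps only weak
    references to listeners, so it never keeps a consumer alive."""

    def __init__(self):
        self._listeners = []

    def connect(self, bound_method):
        # WeakMethod is needed: a plain weakref to a bound method dies immediately.
        self._listeners.append(weakref.WeakMethod(bound_method))

    def feed(self, data):
        alive = []
        for ref in self._listeners:
            callback = ref()
            if callback is not None:  # the listener still exists
                callback(data)
                alive.append(ref)
        self._listeners = alive  # forget dead listeners
        return len(alive)        # number of listeners that got the data


class Printer:
    """Hypothetical consumer (protocol side) that connects itself to a source."""

    def __init__(self, source):
        self.source = source  # the only strong link: consumer -> producer
        self.received = []
        source.connect(self.on_data)

    def on_data(self, data):
        self.received.append(data)


if __name__ == "__main__":
    source = Source()
    consumer = Printer(source)
    print(source.feed(b"a"))  # 1
    del consumer
    gc.collect()  # not needed in CPython (there is no cycle), harmless elsewhere
    print(source.feed(b"b"))  # 0
```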
Conclusion: the changes above would turn asyncio into a powerful tool for asynchronous dataflow processing. Maybe (a bit of a pipe dream) there will one day be a pandas-grade library on this topic.
Metadata
Status: Done