@lla-dane lla-dane commented Dec 1, 2025

Comment on lines 582 to 585
public_addrs = [f"/ip4/13.126.88.127/tcp/{port}/p2p/{host.get_id()}"]

server_id, bearer = http_peer_id_auth(host.get_private_key(), key_auth, public_addrs)

@lla-dane lla-dane Dec 4, 2025

Auto-TLS only works with publicly routable multiaddrs, so the IP used here, 13.126.88.127, belongs to the AWS EC2 instance that I used for testing this PR.
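
To make the "publicly routable" requirement concrete, here is a minimal sketch that checks the IPv4 component of a multiaddr string with the standard-library `ipaddress` module. The helper name `is_public_ip4_multiaddr` is hypothetical and is not part of py-libp2p; it also only handles `/ip4/...` addresses, which is an assumption made for brevity.

```python
import ipaddress


def is_public_ip4_multiaddr(addr: str) -> bool:
    """Return True if an /ip4/... multiaddr string carries a globally routable IP.

    Hypothetical helper for illustration; it parses the string by hand and
    does not use the multiaddr library.
    """
    parts = addr.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "ip4":
        return False
    try:
        ip = ipaddress.ip_address(parts[1])
    except ValueError:
        return False
    # is_global is False for private, loopback, link-local and similar ranges.
    return ip.is_global


if __name__ == "__main__":
    for candidate in (
        "/ip4/13.126.88.127/tcp/9000",  # EC2 public address from this comment
        "/ip4/172.31.35.71/tcp/9000",   # VPC-internal address of the same host
        "/ip4/127.0.0.1/tcp/9000",      # loopback
    ):
        print(candidate, is_public_ip4_multiaddr(candidate))
```

Only the first address would be suitable for `public_addrs`; the other two are the kind of addresses a listener typically reports for its local interfaces.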

lla-dane commented Dec 4, 2025

Presently, the auto-tls code runs correctly up to fetching the DNS-01 challenge from the ACME servers and completing the peer-ID authentication with the AUTO-TLS broker (registration.libp2p.direct). After that, an error comes up when the AUTO-TLS broker tries to dial in to our node. The logs look like this:

image

As seen in the last lines of these logs, the inbound connection is failing.
So I ran tcpdump to see what was going on, specifically to inspect the multistream-select handshake coming from the broker node. The capture looked like this:

image

As seen from this particular packet log:

18:07:50.937274 ens5  In  IP (tos 0x0, ttl 57, id 48811, offset 0, flags [DF], proto TCP (6), length 84)
    ec2-18-188-47-119.us-east-2.compute.amazonaws.com.43378 > ip-172-31-35-71.ap-south-1.compute.internal.9000: Flags [P.], cksum 0xb207 (correct), seq 1:33, ack 1, win 71, options [nop,nop,TS val 768382321 ecr 3436111510], length 32
	0x0000:  4500 0054 beab 4000 3906 715f 12bc 2f77  E..T..@.9.q_../w
	0x0010:  ac1f 2347 a972 2328 d891 3c5e 57f2 4f65  ..#G.r#(..<^W.Oe
	0x0020:  8018 0047 b207 0000 0101 080a 2dcc 9571  ...G........-..q
	0x0030:  ccce e696 132f 6d75 6c74 6973 7472 6561  ...../multistrea
	0x0040:  6d2f 312e 302e 300a 0b2f 746c 732f 312e  m/1.0.0../tls/1.
	0x0050:  302e 300a                                0.0. 

The inbound connection was negotiating /tls/1.0.0 instead of /noise, so I guess this is why the connection was rejected.
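
For anyone who wants to read the capture without squinting at the hex, here is a small sketch that decodes the 32-byte TCP payload from the packet above. It assumes the usual multistream-select framing: an unsigned-varint length prefix followed by a newline-terminated protocol string. The function name is hypothetical.

```python
def decode_multistream_frames(payload: bytes) -> list[str]:
    """Split a byte string into multistream-select messages.

    Each message is assumed to be <uvarint length><text>\n.
    """
    frames: list[str] = []
    i = 0
    while i < len(payload):
        # Decode the unsigned varint length prefix.
        length = 0
        shift = 0
        while True:
            if i >= len(payload):
                raise ValueError("truncated varint")
            byte = payload[i]
            i += 1
            length |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        chunk = payload[i : i + length]
        if len(chunk) != length:
            raise ValueError("truncated message")
        i += length
        frames.append(chunk.decode("utf-8").rstrip("\n"))
    return frames


# TCP payload copied from the tcpdump output above (offset 0x34 onward).
PAYLOAD = bytes.fromhex(
    "132f6d756c746973747265616d2f312e302e300a"  # 0x13 + "/multistream/1.0.0\n"
    "0b2f746c732f312e302e300a"                  # 0x0b + "/tls/1.0.0\n"
)

if __name__ == "__main__":
    print(decode_multistream_frames(PAYLOAD))
```

The two decoded messages are the multistream header and the first security proposal, sent together in a single segment.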

@seetadev

seetadev commented Dec 8, 2025

@lla-dane: Hi Abhinav. Fantastic progress on the autotls module.

Thank you so much for sharing the details. Appreciate it.

I wanted to ask whether you found the fix in trio.py.

Please also resolve the CI/CD issues whenever you get a chance.

lla-dane and others added 2 commits December 12, 2025 13:49
- Enhanced get_remote_address() in TrioTCPStream with address caching
  and defensive checks to handle socket state transitions gracefully
- Fixed Ed25519PublicKey initialization to use from_bytes() method
- Added proper type annotation for server_id: ID | None
- Added None check for hostname before passing to ClientInitiatedHandshake
- Removed unused variables (commented with explanations for future use)
- Removed dead code (unused function calls with hardcoded port)
- Removed debug print statements in favor of proper logging
- Fixed code formatting, import ordering, and line length violations

This resolves the get_remote_address() exception that was occurring
when the Auto-TLS broker dials back into the node.

Fixes issue reported in PR libp2p#1072 comments.
acul71 commented Dec 13, 2025

Fixes for Auto-TLS PR #1072

This commit addresses the get_remote_address() exception and resolves all linting/type checking issues.

🔧 Main Fix: Enhanced get_remote_address() Implementation

Problem: When the Auto-TLS broker dialed back, get_remote_address() was throwing exceptions, causing connection failures.

Solution: Enhanced the method with address caching, defensive checks, and improved error handling.

Key Changes in libp2p/io/trio.py:

# Added caching to handle socket state transitions
_cached_remote_address: tuple[str, int] | None

def get_remote_address(self) -> tuple[str, int] | None:
    # Return cached value if available
    if self._cached_remote_address is not None:
        return self._cached_remote_address
    
    # Defensive checks before accessing socket
    if not hasattr(self.stream, "socket"):
        logger.debug("SocketStream has no 'socket' attribute")
        return None
    
    socket = self.stream.socket
    if socket is None:
        logger.debug("Socket is None")
        return None
    
    # Get and cache remote address
    remote_addr = socket.getpeername()
    # ... validation and caching logic ...
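
As a rough illustration of the caching idea outside trio, here is a self-contained sketch built on the standard-library `socket` module. `CachedPeerStream` is a hypothetical stand-in and is not the `TrioTCPStream` class changed in this PR; the point is only that `getpeername()` can raise `OSError` once the socket is closed or not connected, and that a cached value survives that transition.

```python
from __future__ import annotations

import socket


class CachedPeerStream:
    """Hypothetical wrapper, for illustration only."""

    def __init__(self, sock: socket.socket | None) -> None:
        self._sock = sock
        self._cached_remote_address: tuple[str, int] | None = None

    def get_remote_address(self) -> tuple[str, int] | None:
        # Return the cached value if we already resolved the peer once.
        if self._cached_remote_address is not None:
            return self._cached_remote_address
        if self._sock is None:
            return None
        try:
            host, port = self._sock.getpeername()[:2]
        except OSError:
            # Not connected, reset by the peer, or already closed locally.
            return None
        self._cached_remote_address = (host, port)
        return self._cached_remote_address


def demo() -> tuple[tuple[str, int] | None, tuple[str, int] | None, bool]:
    """Connect over loopback, then query the address before and after close."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    stream = CachedPeerStream(conn)
    before = stream.get_remote_address()
    matches_client = before == client.getsockname()
    conn.close()  # getpeername() would now fail, but the cache still answers
    after = stream.get_remote_address()

    client.close()
    listener.close()
    return before, after, matches_client


if __name__ == "__main__":
    print(demo())
```

Note that caching only helps if the address was read at least once while the connection was still up; a socket that is already disconnected on first access still yields `None`.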

🐛 Type Error Fixes

  1. Ed25519PublicKey initialization - Use from_bytes() instead of direct constructor
  2. server_id type annotation - Added self.server_id: ID | None = None
  3. hostname None check - Added validation before passing to ClientInitiatedHandshake

🧹 Code Quality

  • Removed unused variables (commented with explanations)
  • Removed dead code (get_available_interfaces(8000) calls)
  • Removed debug print() statements
  • Fixed code formatting and import ordering

✅ All Checks Pass

  • ✅ Linting: All ruff checks pass
  • ✅ Type checking: All pyrefly errors resolved
  • ✅ Tests: All 1744 tests pass
  • ✅ Pre-commit hooks: All pass

This should resolve the broker dial-back issue. Ready for testing! 🚀

acul71 and others added 5 commits December 13, 2025 04:15
Add examples.autotls to the examples.rst toctree to resolve
the documentation build warning about the document not being
included in any toctree.

Add the auto-generated examples.autotls.rst file to the repository
so that ReadTheDocs can find it when building the documentation.
This file is generated by sphinx-apidoc and is referenced in the
examples.rst toctree.

Comment on lines 51 to 105
    async def negotiate(
        self,
        communicator: IMultiselectCommunicator,
        negotiate_timeout: int = DEFAULT_NEGOTIATE_TIMEOUT,
    ) -> tuple[TProtocol | None, StreamHandlerFn | None]:
        """
        Negotiate performs protocol selection.

        :param stream: stream to negotiate on
        :param negotiate_timeout: timeout for negotiation
        :return: selected protocol name, handler function
        :raise MultiselectError: raised when negotiation failed
        """
        try:
            with trio.fail_after(negotiate_timeout):
                await self.handshake(communicator)

                while True:
                    try:
                        print("\nNEGOTIATE LOOP")
                        command = await communicator.read()
                        print("COMMAND: ", command)
                    except MultiselectCommunicatorError as error:
                        print("ERROR IN NEGOTIATE READ")
                        raise MultiselectError() from error

                    if command == "ls":
                        supported_protocols = [
                            p for p in self.handlers.keys() if p is not None
                        ]
                        response = "\n".join(supported_protocols) + "\n"

                        try:
                            await communicator.write(response)
                        except MultiselectCommunicatorError as error:
                            raise MultiselectError() from error

                    else:
                        protocol_to_check = None if not command else TProtocol(command)
                        if protocol_to_check in self.handlers:
                            try:
                                await communicator.write(command)
                            except MultiselectCommunicatorError as error:
                                raise MultiselectError() from error

                            return protocol_to_check, self.handlers[protocol_to_check]
                        try:
                            await communicator.write(PROTOCOL_NOT_FOUND_MSG)
                            print("PROTOCOL NOT IN HANDLERS: ", command)

                        except MultiselectCommunicatorError as error:
                            print("ERROR IN NEGOTIATE WRITE")
                            raise MultiselectError() from error

                raise MultiselectError("Negotiation failed: no matching protocol")

@lla-dane lla-dane Dec 13, 2025

Debugged further and found an issue happening here:

These are the logs:
Image

As we can see, the broker wrote /tls/1.0.0 and we wrote back na because we did not have a handler for TLS. After this, the loop should have continued and the broker should have proposed another security option, but instead we got a read error.
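
The expected and observed behaviours can be contrasted with a tiny in-memory model of the listener side of the loop. This is a simplified sketch, not the py-libp2p `Multiselect` class: the dialer's proposals are a plain list, and the dialer hanging up is modelled as the list running out. All names are hypothetical.

```python
class PeerClosedError(Exception):
    """Raised when the dialer stops sending before a protocol is agreed."""


def negotiate(proposals: list[str], handlers: set[str]) -> tuple[str, list[str]]:
    """Listener side of a simplified multistream-select negotiation.

    Returns the agreed protocol and the replies the listener wrote.
    """
    replies: list[str] = []
    for command in proposals:
        if command in handlers:
            replies.append(command)  # echo the protocol back to accept it
            return command, replies
        replies.append("na")  # reject, then wait for the next proposal
    # Nothing left to read: the dialer went away after our last "na".
    raise PeerClosedError(f"dialer closed after replies {replies}")


if __name__ == "__main__":
    handlers = {"/noise"}

    # A dialer that falls back to another security protocol after "na".
    print(negotiate(["/tls/1.0.0", "/noise"], handlers))

    # A dialer that only offers TLS and then closes the connection.
    try:
        negotiate(["/tls/1.0.0"], handlers)
    except PeerClosedError as error:
        print("read error:", error)
```

In this model the listener behaves identically in both runs; the only difference is whether the dialer sends a second proposal.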

@seetadev @acul71

@lla-dane lla-dane Dec 17, 2025

But this does not happen when I dial in to our Python node from a go-libp2p node.

image

Here the negotiation continued after this log:

NEGOTIATE LOOP
COMMAND:  /tls/1.0.0
PROTOCOL NOT IN HANDLERS:  /tls/1.0.0

But when the Auto-TLS broker dials in, the negotiation stops at this point with a read error. I don't understand why this happens.

@acul71

@lla-dane I can't see the full images. Can you please include the logs as text, with the full commands and their output?
Please also add a clear explanation of how to run the test and what you expect.
I'm confused: sometimes I see echo and sometimes ping. Why?
Thanks

lla-dane commented Dec 19, 2025

Yeah sure @acul71, I will explain everything properly.

In the autotls procedure, the autotls-broker has to dial in to our node (which has to be publicly accessible) and run the identify protocol on it, to verify that our node is real.

Presently, when the autotls-broker dials in to our node, something goes wrong in the multistream-select protocol negotiation.

LOGS:

(.venv) ubuntu@ip-172-31-35-71:~/py-libp2p$ autotls-demo 
Listener ready, listening on:

/ip4/172.31.35.71/tcp/52577/p2p/12D3KooWQHYxsdkXCNZXw3qtuby71PKvj8AqfjBp6y7kVoq9CfmP
/ip4/127.0.0.1/tcp/52577/p2p/12D3KooWQHYxsdkXCNZXw3qtuby71PKvj8AqfjBp6y7kVoq9CfmP

Run this from the same folder in another console:

autotls-demo -d /ip4/172.31.35.71/tcp/52577/p2p/12D3KooWQHYxsdkXCNZXw3qtuby71PKvj8AqfjBp6y7kVoq9CfmP -psk 0 -t tcp

Waiting for incoming connection...

Base36 PeerID: k51qzi5uqu5dljhbovkdodhhbbgiz9q6kd92rw0u24gs8ti28ncbl58mtw5gri

GENERATING RSA-KEY (2048)...
STARTING ACME ACCOUTN CREATION SEQUENCE...

ACCOUNT-URL: https://acme-staging-v02.api.letsencrypt.org/acme/acct/251768963
ORDER-URL:  https://acme-staging-v02.api.letsencrypt.org/acme/order/251768963/29664454213
AUTH-URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz/251768963/20774698483
FINALIZE-URL:  https://acme-staging-v02.api.letsencrypt.org/acme/finalize/251768963/29664454213

GETTING THE DNS-01 CHALLENGE FROM ACME...

CHALL-URL:  https://acme-staging-v02.api.letsencrypt.org/acme/chall/251768963/20774698483/KF7sIg
DNS-TOKEN:  zKPMtnHad1scCwuE1qOUdrEFESt0jDL0Y8VeEUhuVC8
JWK-THUMBPRINT:  0u519mEYgNxqfvo8pPIaT1blGieK3Yw-8HKLCQm8zRQ
KEY-AUTH:  sUFBRPBv-L4GWkM9lyzkcEB14ucoCOhvFQKQqHoqYxA

INITIATION PEER-ID AUTHENTICATION WITH AUTO-TLS BROKER...

 {'User-Agent': 'py-libp2p/example/autotls', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'libp2p-PeerID opaque="_g6BGCUng3W7suhCmo7HxtP7BTUhPcAvftVUlmh4ZFV7ImNsaWVudC1wdWJsaWMta2V5IjoiQ0FFU0lOYjZnVkNsVFExa2dISjdzK1crWDRmaDhLdUVNWUJyTEU0VEdTV2ZBdmdlIiwiY2hhbGxlbmdlLWNsaWVudCI6IjNMS2hIbmtHVmtVV2d1eEtwOTBNWURaUkdMVHl1akI2al9nWXRxX1V2V0k9IiwiaG9zdG5hbWUiOiJyZWdpc3RyYXRpb24ubGlicDJwLmRpcmVjdCIsImNyZWF0ZWQtdGltZSI6IjIwMjUtMTItMTlUMDM6MTU6MDUuMzIxMzU3ODExWiJ9", sig="UhgmGw1e_qjnl4o4vBRfwlpUmEO6Ttdg4oedG5CAYkV9Unnp7QsHAQB5I_PuCDgSSMyQvm63Lzf-1VGGcP3JCQ=="', 'Content-Length': '160'}

 {"value": "sUFBRPBv-L4GWkM9lyzkcEB14ucoCOhvFQKQqHoqYxA", "addresses": ["/ip4/13.126.88.127/tcp/52577/p2p/12D3KooWQHYxsdkXCNZXw3qtuby71PKvj8AqfjBp6y7kVoq9CfmP"]}

SERVER_PEER_ID:  12D3KooWAtWdWqQkWFqkWMix92dTmG6mEC781isvRfarhneMSqUy
BEARER TOKEN:  UJ_k2P9S1oqOMAAdNzsfIn-21L0f_pUQ1gVPvTir6397ImlzLXRva2VuIjp0cnVlLCJwZWVyLWlkIjoiMTJEM0tvb1dRSFl4c2RrWENOWlh3M3F0dWJ5NzFQS3ZqOEFxZmpCcDZ5N2tWb3E5Q2ZtUCIsImhvc3RuYW1lIjoicmVnaXN0cmF0aW9uLmxpYnAycC5kaXJlY3QiLCJjcmVhdGVkLXRpbWUiOiIyMDI1LTEyLTE5VDAzOjE1OjA1Ljg1NTg2NzEzMloifQ==

GOT A STREAM
GET-REMOTE-ADDR:  <trio.socket.socket fd=10, family=2, type=1, proto=6, laddr=('172.31.35.71', 52577)>
REOMTE_ADDR:  None
GOT THE STREAM HERE NOW
GOING FOR UPGRADING THE INBOUND RAW CONN...
GOING TO UPGRADE INBOUND CONN
SECURE-INBOUND

NOT THE INITIATOR, SO NEGOTIATING..

NEGOTIATE LOOP
COMMAND:  /tls/1.0.0
PROTOCOL NOT IN HANDLERS:  /tls/1.0.0

NEGOTIATE LOOP
ERROR IN NEGOTIATE READ

INBOUND CONNECTION CAME AND THREW ERROR
failed to upgrade security for peer at /ip4/172.31.35.71/tcp/52577

These are the first logs, produced simply by running the autotls-demo script. The broker dialed in to us at the GOT A STREAM line. After that, the logs suggest that the broker proposed /tls/1.0.0 for the security upgrade and our node wrote back na (the PROTOCOL NOT IN HANDLERS: /tls/1.0.0 line), because it does not have the TLS transport. After that, the connection somehow gets dropped, which shouldn't happen.
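
As background for the DNS-TOKEN, JWK-THUMBPRINT and KEY-AUTH lines in the logs above: in the ACME DNS-01 challenge (RFC 8555, section 8.4), the value to publish is the base64url-encoded SHA-256 digest of `token + "." + thumbprint`. The sketch below shows that computation. Whether the KEY-AUTH value printed by autotls-demo is produced exactly this way is an assumption, and the function name is hypothetical.

```python
import base64
import hashlib


def dns01_txt_value(token: str, jwk_thumbprint: str) -> str:
    """Compute the DNS-01 TXT record value as described in RFC 8555, section 8.4."""
    key_authorization = f"{token}.{jwk_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without padding, as ACME requires.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    # Token and thumbprint copied from the log lines above.
    value = dns01_txt_value(
        "zKPMtnHad1scCwuE1qOUdrEFESt0jDL0Y8VeEUhuVC8",
        "0u519mEYgNxqfvo8pPIaT1blGieK3Yw-8HKLCQm8zRQ",
    )
    print(value)
```

This is the value sent to the broker in the "value" field of the registration request, alongside the public multiaddrs.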

lla-dane commented Dec 19, 2025

Since the p2p-forge autotls-broker repo (https://github.com/ipshipyard/p2p-forge) uses go-libp2p, I dialed in to our node from a go-libp2p node to see what happens during the multistream-select protocol negotiation.

DIALER:

shelby@soiarch ~/Desktop/libp2p/go/go-libp2p/examples/echo ❯ ./echo -l 9000 -d /ip4/192.168.31.130/tcp/37465/p2p/12D3KooWQKT71wATmA8guPXkKvnMMcFDDCz5XPWeKT1JnwKSTbw5
2025/12/19 08:52:29 I am /ip4/127.0.0.1/tcp/9000/p2p/QmZuswjvmoVeVHr8URgoXgkMp5kWgQuj1WqN8cbJtydg9P
2025/12/19 08:52:29 sender opening stream
2025/12/19 08:52:29 failed to negotiate protocol: protocols not supported: [/echo/1.0.0]

LISTENER:

(.venv) shelby@soiarch ~/Desktop/libp2p/py-libp2p ❯ ping-demo 
Listener ready, listening on:

/ip4/192.168.31.130/tcp/37465/p2p/12D3KooWQKT71wATmA8guPXkKvnMMcFDDCz5XPWeKT1JnwKSTbw5
/ip4/192.168.122.1/tcp/37465/p2p/12D3KooWQKT71wATmA8guPXkKvnMMcFDDCz5XPWeKT1JnwKSTbw5
/ip4/127.0.0.1/tcp/37465/p2p/12D3KooWQKT71wATmA8guPXkKvnMMcFDDCz5XPWeKT1JnwKSTbw5

Run this from the same folder in another console:

ping-demo -d /ip4/192.168.31.130/tcp/37465/p2p/12D3KooWQKT71wATmA8guPXkKvnMMcFDDCz5XPWeKT1JnwKSTbw5 -psk 0 -t tcp

Waiting for incoming connection...

GOT A STREAM
GET-REMOTE-ADDR:  <trio.socket.socket fd=11, family=2, type=1, proto=6, laddr=('192.168.31.130', 37465), raddr=('192.168.31.130', 38950)>
REOMTE_ADDR:  ('192.168.31.130', 38950)
GOT THE STREAM HERE NOW
GOING FOR UPGRADING THE INBOUND RAW CONN...
GOING TO UPGRADE INBOUND CONN
SECURE-INBOUND

NOT THE INITIATOR, SO NEGOTIATING..

NEGOTIATE LOOP
COMMAND:  /tls/1.0.0
PROTOCOL NOT IN HANDLERS:  /tls/1.0.0

NEGOTIATE LOOP
COMMAND:  /noise
PROTOCOL:  /noise
TRANSPORT SELECTED
GET-REMOTE-ADDR:  <trio.socket.socket fd=11, family=2, type=1, proto=6, laddr=('192.168.31.130', 37465), raddr=('192.168.31.130', 38950)>
GET-REMOTE-ADDR:  <trio.socket.socket fd=11, family=2, type=1, proto=6, laddr=('192.168.31.130', 37465), raddr=('192.168.31.130', 38950)>

NEGOTIATE LOOP
COMMAND:  /ipfs/id/1.0.0
GET-REMOTE-ADDR:  <trio.socket.socket fd=11, family=2, type=1, proto=6, laddr=('192.168.31.130', 37465), raddr=('192.168.31.130', 38950)>

SOME ADDR:  /ip4/192.168.31.130/tcp/38950

NEGOTIATE LOOP
COMMAND:  /echo/1.0.0
PROTOCOL NOT IN HANDLERS:  /echo/1.0.0

NEGOTIATE LOOP
Error in handle_incoming for peer QmZuswjvmoVeVHr8URgoXgkMp5kWgQuj1WqN8cbJtydg9P: IncompleteReadError: {'requested_count': 2, 'received_count': 0}
ERROR IN NEGOTIATE READ

Purely for debugging, I dialed our py-libp2p node from the echo example of go-libp2p. I just needed to see how the multistream-select protocol negotiation proceeds; the final /echo/1.0.0 failure is expected, because the listener here is ping-demo.
As the LISTENER logs show, the go-libp2p node proposed /tls/1.0.0 and our node again wrote back na, but here the go-libp2p node came back with /noise.
This fallback did not happen when the autotls-broker dialed in, and that is the main issue. We have to make sure that the broker node's dial-in succeeds.
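
To check on the wire what each side sends during this exchange, a small encoder for multistream-select messages can help when comparing against tcpdump output. This sketch assumes the standard framing (unsigned-varint length, then the text, then a newline); the function names are hypothetical and not taken from py-libp2p.

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if value < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_multistream_message(text: str) -> bytes:
    """Frame one multistream-select message: <uvarint len><text>\\n."""
    body = text.encode("utf-8") + b"\n"
    return encode_uvarint(len(body)) + body


if __name__ == "__main__":
    for message in ("/multistream/1.0.0", "/tls/1.0.0", "na", "/noise"):
        print(message, encode_multistream_message(message).hex())
```

The first two encodings, concatenated, reproduce the 32-byte payload captured from the broker earlier in this thread, and the `na` frame is what our node is expected to send in reply.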

@lla-dane
@acul71: For testing, I have DM'd you on Discord the EC2 instance keys and instructions for connecting to the instance. There you can simply run the autotls-demo command in the py-libp2p repo, and everything will run.

Development

Successfully merging this pull request may close these issues.

AutoTLS Support for py-libp2p

3 participants