request_response (at least!) cancels its own dials due to race condition #5996

Open
@MatthiasvB

Summary

I'm building a file-sharing app that uses request_response to transfer the file data.

This fails under several conditions. One of them: I have multiple files queued up, discover the target node, and try to send them all at once.

It seems that starting all of these request_response requests at once ends up cancelling every one of them. It appears to be related to dial peer conditions (see the logs below).

As a workaround, I have to take great care to send all queued files sequentially so that dialing does not break.
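For anyone hitting the same problem, the sequential-sending workaround can be sketched roughly as below. This is only an illustration using the standard library: SequentialQueue, push and finished are hypothetical names and not part of libp2p. It assumes the application calls finished() whenever the in-flight request completes, whether with a response or a failure.

```rust
use std::collections::VecDeque;

/// Hypothetical helper (not part of libp2p): releases queued items one at a
/// time, so that at most one outbound request per peer is in flight.
pub struct SequentialQueue<T> {
    pending: VecDeque<T>,
    in_flight: bool,
}

impl<T> SequentialQueue<T> {
    pub fn new() -> Self {
        Self {
            pending: VecDeque::new(),
            in_flight: false,
        }
    }

    /// Queue an item. If nothing is in flight, the item is handed straight
    /// back and the caller should send it now; otherwise it is stored.
    pub fn push(&mut self, item: T) -> Option<T> {
        if self.in_flight {
            self.pending.push_back(item);
            None
        } else {
            self.in_flight = true;
            Some(item)
        }
    }

    /// Call when the in-flight request has completed (response or failure).
    /// Returns the next item to send, if there is one.
    pub fn finished(&mut self) -> Option<T> {
        let next = self.pending.pop_front();
        self.in_flight = next.is_some();
        next
    }
}
```

With one such queue per target peer, only the first file triggers a dial; the remaining files are sent once a request has already completed over the established connection.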

In the logs, lines such as "Is dialing: false --- Is connected: false" are printed by a patched version of this code.

Expected behavior

I should not have to care about when a request is sent.

Actual behavior

I cannot initiate multiple file transfers using request_response at once if the target peer isn't connected yet, because dialing fails.

Relevant log output

mDNS discovered a new peer: 12D3KooWD9smX6tANTiVamKw9KKLKgBfzq2Hx2fzNUnNHRc4NAFw
Currently pending files: 5
Sending chunk
Sending chunk
Sending chunk
Sending chunk
Sending chunk
Is dialing: false  ---  Is connected: false
Is dialing: true  ---  Is connected: false
Dial peer condition false: DisconnectedAndNotDialing
Is dialing: true  ---  Is connected: false
Dial peer condition false: DisconnectedAndNotDialing
Is dialing: true  ---  Is connected: false
Dial peer condition false: DisconnectedAndNotDialing
Is dialing: true  ---  Is connected: false
Dial peer condition false: DisconnectedAndNotDialing
Outbound request failed to peer 12D3KooWD9smX6tANTiVamKw9KKLKgBfzq2Hx2fzNUnNHRc4NAFw with error: Failed to dial the requested peer
Dial Failure
Outbound request failed to peer 12D3KooWD9smX6tANTiVamKw9KKLKgBfzq2Hx2fzNUnNHRc4NAFw with error: Failed to dial the requested peer
Dial Failure
Outbound request failed to peer 12D3KooWD9smX6tANTiVamKw9KKLKgBfzq2Hx2fzNUnNHRc4NAFw with error: Failed to dial the requested peer
Dial Failure
Outbound request failed to peer 12D3KooWD9smX6tANTiVamKw9KKLKgBfzq2Hx2fzNUnNHRc4NAFw with error: Failed to dial the requested peer
Dial Failure
Outbound request failed to peer 12D3KooWD9smX6tANTiVamKw9KKLKgBfzq2Hx2fzNUnNHRc4NAFw with error: Failed to dial the requested peer
Dial Failure
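
The pattern in the log (one attempt that sees "Is dialing: false", then four that see "Is dialing: true" and are rejected) is consistent with the toy model below. It is a standalone sketch with hypothetical names (PeerState, DialOutcome, burst) and does not use libp2p. It only models a dial guarded by a "disconnected and not dialing" condition. How the four rejections end up failing all five requests is not modeled here.

```rust
/// Toy stand-in for the dialing/connection state of a single peer.
/// Hypothetical; not a libp2p type.
#[derive(Default)]
pub struct PeerState {
    pub dialing: bool,
    pub connected: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DialOutcome {
    /// The condition held and a dial was started.
    Started,
    /// The condition was false, so no new dial was started.
    ConditionFalse,
}

impl PeerState {
    /// Models a dial that is only allowed while the peer is neither
    /// connected nor already being dialed.
    pub fn dial(&mut self) -> DialOutcome {
        if !self.connected && !self.dialing {
            self.dialing = true;
            DialOutcome::Started
        } else {
            DialOutcome::ConditionFalse
        }
    }
}

/// Issue `n` dial attempts back to back, before any connection is established.
pub fn burst(n: usize) -> Vec<DialOutcome> {
    let mut peer = PeerState::default();
    (0..n).map(|_| peer.dial()).collect()
}
```

In this model, burst(5) starts one dial and rejects the other four, matching the four "Dial peer condition false" lines above.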

Possible Solution

No response

Version

0.55.0

Would you like to work on fixing this bug?

No
