Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

Leaking dial errors #119

Closed
Stebalien opened this issue Apr 21, 2019 · 9 comments · Fixed by libp2p/go-libp2p-kad-dht#328
Closed

Leaking dial errors #119

Stebalien opened this issue Apr 21, 2019 · 9 comments · Fixed by libp2p/go-libp2p-kad-dht#328
Assignees
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@Stebalien
Copy link
Member

I'm seeing a massive memory leak leaking errors returned by doDial. Not sure where they're getting stashed, maybe in the dialer somewhere?

@Stebalien Stebalien added the kind/bug A bug in existing code (including security flaws) label Apr 21, 2019
@vyzo
Copy link
Contributor

vyzo commented Apr 21, 2019

Maybe introduced by #115? We do aggregate all errors now.

@vyzo
Copy link
Contributor

vyzo commented Apr 21, 2019

Is there any more detail?

@vyzo
Copy link
Contributor

vyzo commented Apr 21, 2019

Also are there any stuck goroutines precluding the dial from finishing and accumulating all these errors?
I don't see any place where the errors could be stashed, other than the multierror contraption.

@vyzo
Copy link
Contributor

vyzo commented Apr 21, 2019

Maybe stuck in a channel though.

edit: that would correlate well with a stuck goroutine precluding the dial from finishing.

@raulk
Copy link
Member

raulk commented Apr 21, 2019

Yeah, was thinking the same as @vyzo. Could you try with the commit previous to that PR being merged? Not sure if it’s a leak (us unintentionally retaining state) or just a side effect of accumulating all errors from the dial process (especially in the face of addrexplosion).

@vyzo
Copy link
Contributor

vyzo commented Apr 21, 2019

Can you also check for signs of libp2p/go-libp2p-quic-transport#57? We do happen to have an open bug with stuck dials.

@Stebalien
Copy link
Member Author

Maybe introduced by #115? We do aggregate all errors now.

This has almost certainly made it worse. However, we shouldn't be storing these errors anywhere.

Can you also check for signs of libp2p/go-libp2p-quic-transport#57? We do happen to have an open bug with stuck dials.

It's probably not that. This error is allocated after all the transport dials finish so a stuck dial would prevent it.

@raulk
Copy link
Member

raulk commented Apr 22, 2019

It could be the event logging of errors -- the weight of which is now augmented by multierrors. @Stebalien can you generate annotated source for these allocs?

@Stebalien
Copy link
Member Author

I don't have the traces but the buildup came from

return nil, fmt.Errorf("dial attempt failed: %s", err)
.

Stebalien added a commit to libp2p/go-libp2p-kad-dht that referenced this issue Apr 24, 2019
Long-running queries can build up large error sets that we never actually use.
This is exacerbated by libp2p/go-libp2p-swarm#115.

fixes libp2p/go-libp2p-swarm#119
@ghost ghost assigned Stebalien Apr 24, 2019
@ghost ghost added the status/in-progress In progress label Apr 24, 2019
Stebalien added a commit to libp2p/go-libp2p-kad-dht that referenced this issue Apr 24, 2019
Long-running queries can build up large error sets that we never actually use.
This is exacerbated by libp2p/go-libp2p-swarm#115.

fixes libp2p/go-libp2p-swarm#119
Stebalien added a commit that referenced this issue Apr 24, 2019
Instead of storing _every_ error, store at most 32 errors (plus a "too many errors" error).

Helps address #119
@ghost ghost removed the status/in-progress In progress label Apr 24, 2019
Stebalien added a commit that referenced this issue Apr 24, 2019
* Limits the number of dial errors we keep
* Exposes the dial errors
* Avoids string formatting unless we actually need to.

Helps address #119
Stebalien added a commit that referenced this issue Apr 24, 2019
* Limits the number of dial errors we keep
* Exposes the dial errors
* Avoids string formatting unless we actually need to.

Helps address #119
Stebalien added a commit that referenced this issue Apr 24, 2019
* Limits the number of dial errors we keep
* Exposes the dial errors
* Avoids string formatting unless we actually need to.

Helps address #119
aarshkshah1992 pushed a commit to aarshkshah1992/go-libp2p-kad-dht that referenced this issue Aug 11, 2019
Long-running queries can build up large error sets that we never actually use.
This is exacerbated by libp2p/go-libp2p-swarm#115.

fixes libp2p/go-libp2p-swarm#119
marten-seemann pushed a commit to libp2p/go-libp2p that referenced this issue Apr 21, 2022
* Limits the number of dial errors we keep
* Exposes the dial errors
* Avoids string formatting unless we actually need to.

Helps address libp2p/go-libp2p-swarm#119
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants