
memory leak #76

Closed
StefanKarpinski opened this issue Mar 22, 2015 · 27 comments

@StefanKarpinski
Contributor

Example code:

pid = getpid()
# Print label `s` followed by this process's virtual memory size (VSZ, in KB) from `ps`.
vsz(s) = println(s*split(open(readall,`ps -p $pid -o vsz`),"\n")[2])
vsz("Initial VSZ=")

using ZMQ
vsz("After loading ZMQ, my VSZ=")

ctx = Context()
socket = Socket(ctx, PUB)
ZMQ.bind(socket, "ipc:///tmp/testZMQ")

vsz("After setting up ZMQ, my VSZ=")
println("Sending")
for i = 1:10000000
    ZMQ.send(socket, "abcdefghijklmnopqrstuvwxyz")
    if i % 100000 == 0
        println("Sent $i messages")
        println("Length of gc_protect: $(length(ZMQ.gc_protect))")
        vsz("My current VSZ=")
    end
end
vsz("Final VSZ=")

The virtual size keeps growing endlessly.

@ViralBShah
Contributor

Cc: @tanmaykm @amitmurthy

jakebolewski referenced this issue in JuliaLang/julia Mar 23, 2015
close(t::Timer) also works for SingleAsyncWork, so let that method take both.
Further, the close hook for SingleAsyncWork needs to remove it from the preservation dict.
Keno closed this as completed in 791b5d4 Mar 23, 2015
@StefanKarpinski
Contributor Author

Seems to be fixed. Thank you, @Keno!

@tkelman
Contributor

tkelman commented Mar 24, 2015

Was this a problem with both 0.3 and 0.4?

@ViralBShah
Contributor

Yes.

@tkelman
Contributor

tkelman commented Mar 24, 2015

And after 791b5d4 in the package, 0.3 still leaks memory?

@ViralBShah
Contributor

@StefanKarpinski knows the details best about what had to be done on 0.3. Let's wait for him to chime in.

@tkelman
Contributor

tkelman commented Mar 24, 2015

I'm going to sleep. @staticfloat may be online for a little while, and can do anything necessary with binaries. If we decide to immediately backport the corresponding Julia commit and re-tag, I'd personally be in favor of leaving the 0.3.7 tag in place since who knows how many people have fetched it by now, and just go straight to 0.3.8.

@ViralBShah
Contributor

Yes, it can certainly wait a week or two for 0.3.8.

@Keno
Contributor

Keno commented Mar 24, 2015

I should have fixed this on both 0.3 and 0.4.

@StefanKarpinski
Contributor Author

So the original example still leaks memory – just much more slowly than before. Looking into the cause.

@ViralBShah
Contributor

Cc @tanmaykm, since you are also a heavy user of this package...

@nkottary

This seems to be a bug upstream.

@tanmaykm
Contributor

I can see the leak even with just opening and closing sockets.

test code: https://gist.github.com/tanmaykm/8352059108c6b34f5ecf

@nkottary

I see that leaks are still present even when the context is closed after calling doopenclose in the above script. Calling zmq_unbind for each bind prevents these; I've added it here.

This does not fix the bug.
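For reference, the workaround being described – explicitly unbinding each bound endpoint before closing the socket – would look roughly like the sketch below. It is hypothetical: the socket.data field and the "libzmq" library name are assumptions about internals, not a documented ZMQ.jl API.

# Hypothetical sketch: call zmq_unbind for every endpoint that was bound, then
# close the socket. `socket.data` and "libzmq" are assumptions, not ZMQ.jl API.
function unbind_and_close(socket, endpoints)
    for ep in endpoints
        rc = ccall((:zmq_unbind, "libzmq"), Cint, (Ptr{Cvoid}, Cstring), socket.data, ep)
        rc == 0 || println("zmq_unbind failed for ", ep)
    end
    close(socket)
end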

@StefanKarpinski
Contributor Author

The original script to reproduce this leaks for a different reason – the calls to readall – but these scripts also show a memory leak:

[screenshot, 2016-02-26: plot showing memory usage growing over the run]

Julia Version 0.4.0
Commit 0ff703b* (2015-10-08 06:20 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY SANDYBRIDGE)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

StefanKarpinski changed the title from "apparent memory leak" to "memory leak" Feb 26, 2016
@StefanKarpinski
Contributor Author

Similar leakage on OS X:

[screenshot, 2016-02-26: plot showing similar memory growth on OS X]

Julia Version 0.4.4-pre+26
Commit 386d77b (2016-01-29 21:53 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) M-5Y71 CPU @ 1.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

@StefanKarpinski
Contributor Author

The only operation in the loop in this script is ZMQ.send(socket, "abcd"), so there's a leak in the code that creates the ZMQ message object and sends it. It seems highly dubious that ZMQ's own send code has a memory leak, so I'm guessing this is about how we are creating message objects.

@yuyichao
Contributor

I suspect it's due to the finalizer.
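For illustration, the kind of pattern being suspected looks roughly like the sketch below – a message wrapper whose native buffer is released only by a finalizer. This is not ZMQ.jl's actual code; ToyMessage and its fields are made up for the example.

# Illustrative only (not ZMQ.jl's implementation): a per-send message wrapper
# whose native buffer is freed by a finalizer instead of eagerly after the send.
mutable struct ToyMessage
    buf::Ptr{UInt8}
    function ToyMessage(data::String)
        p = convert(Ptr{UInt8}, Libc.malloc(sizeof(data)))
        GC.@preserve data unsafe_copyto!(p, pointer(data), sizeof(data))
        msg = new(p)
        # Freed only when the GC collects `msg`; in a tight send loop the
        # collector can lag far behind the allocation rate.
        finalizer(m -> Libc.free(m.buf), msg)
        return msg
    end
end

msg = ToyMessage("abcdefghijklmnopqrstuvwxyz")  # buffer lives until the finalizer runs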

@StefanKarpinski
Contributor Author

Ah, good thought, @yuyichao!

@yuyichao
Contributor

I'm trying to rebase and fix JuliaLang/julia#13995 now ...

@yuyichao
Contributor

Hmm, it seems that you are plotting the virtual address space size? That's not the most useful measure, since you are mostly measuring the 8 GB GC memory pool. It also suggests that the leak is not in the GC pool objects...
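For comparison, a sketch of the same helper tracking resident set size instead (reusing pid from the original script and the era's readall, and assuming the local ps supports the rss keyword):

# Report resident set size (RSS, in KB) instead of virtual size, which excludes
# the large reserved-but-untouched GC address space.
rss(s) = println(s*split(open(readall,`ps -p $pid -o rss`),"\n")[2])
rss("Current RSS=")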

@StefanKarpinski
Contributor Author

That's a fair point and I'm happy to measure something else, but this does reflect the impact of the program from the system's perspective – and it keeps using more and more resources while doing a very trivial loop.

@yuyichao
Contributor

I agree; I just mean that the reason for the leak is a little strange, since it's apparently not JuliaLang/julia#13993 and isn't really fixed by JuliaLang/julia#13995.

@amitmurthy
Contributor

The leak may be in libuv – JuliaLang/julia#13529 is probably due to a libuv issue as well.

@Keno
Contributor

Keno commented Mar 1, 2016

Has anybody run massif on this?
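For reference, a typical way to do that would be something like the sketch below, where repro.jl is a placeholder for a reduced reproduction script; Julia under Valgrind may need extra options or a debug build.

# Run a reduced reproduction script under Valgrind's massif heap profiler,
# then render the snapshot report. Assumes valgrind and ms_print are installed.
run(`valgrind --tool=massif --massif-out-file=massif.out julia --startup-file=no repro.jl`)
run(`ms_print massif.out`)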

@joelfrederico
Contributor

Hey all. Looking into this, finalizers are generally not advised; the recommendation is to use do-block syntax instead: https://docs.julialang.org/en/latest/manual/functions/#Do-Block-Syntax-for-Function-Arguments-1. This is recommended by Tim Holy: JuliaLang/julia#11207 (comment)

I'm thinking that we should not rely on finalizers. The issue is that lifetimes in Julia aren't strictly managed by scopes; things are garbage-collected, so a finalizer may run long after a scope closes, whenever the GC gets around to it. That's just how resource management is done in Julia. So while memory can be left to Julia's GC, sockets, contexts, and messages shouldn't be, because they could hang around until the GC deigns to release them, and that results in resource leaks – specifically open threads and memory. This requires a bit of a redesign, obviously.
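A minimal sketch of that do-block style, assuming a scoped wrapper that ZMQ.jl does not currently provide (with_pub_socket is made up for the example):

# Hypothetical sketch: tie the socket's lifetime to a scope instead of a finalizer.
# `with_pub_socket` is not part of ZMQ.jl; it only illustrates the pattern.
using ZMQ

function with_pub_socket(f, ctx, endpoint)
    socket = Socket(ctx, PUB)
    try
        ZMQ.bind(socket, endpoint)
        return f(socket)
    finally
        close(socket)   # released deterministically when the block exits
    end
end

with_pub_socket(Context(), "ipc:///tmp/testZMQ") do socket
    ZMQ.send(socket, "abcdefghijklmnopqrstuvwxyz")
end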

I can do it (actually I already did it in my own clone of ZMQ.jl). Would people be interested in this? The nice bit is that it really only involves removing a lot of code from ZMQ.jl, simplifying interfaces. It also makes the Julia bindings more in line with both ZMQ and Julia paradigms.

I am thinking we can remove the Julia bindings that relate to ZMQ contexts entirely. AFAICT there really is only one use case for having more than one ZMQ context: ZMQ being imported in multiple places. That can be accomplished by a global variable holding the context handle. Julia takes care of each import having its own global, and then we can hide contexts from the user nearly altogether. (If users really want to control contexts, we can make some package-level functions for that.)
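And a minimal sketch of the single-global-context idea; the _CONTEXT and context() names are illustrative, not ZMQ.jl's public API:

# Hypothetical sketch: one package-level context, created lazily on first use.
using ZMQ

const _CONTEXT = Ref{Union{Context,Nothing}}(nothing)

function context()
    if _CONTEXT[] === nothing
        _CONTEXT[] = Context()
    end
    return _CONTEXT[]
end

# Callers never touch a Context directly:
socket = Socket(context(), PUB)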

Thoughts?

ViralBShah mentioned this issue May 13, 2024
@JamesWrigley
Member

For the record, I cannot reproduce this on Julia 1.11 and the latest master of ZMQ.jl using an updated version of Stefan's scripts: https://gist.github.com/JamesWrigley/e85eba789719c818912067f5aa43ec57

On my machine:

julia> using DelimitedFiles

julia> data = readdlm("vsz.csv", ',', Float64, '\n');

julia> using UnicodePlots

julia> lineplot(data[:, 2], data[:, 3] / 1e6; width=60, xlabel="Step number", ylabel="Memory usage [MB]", title="Memory usage vs time")

[image: UnicodePlots lineplot "Memory usage vs time" – memory usage (MB) vs step number]

(pasting an image because Github doesn't format the plot nicely)

I think we can close this, if no one else can reproduce it.
