Connections still not closing (branch-0.11) #131
Comments
I haven't used it in Spark Streaming yet. Could you try it in a normal Spark app and check whether it closes the connection? #118 fixes the problem of the ActorSystem getting stuck on application exit. Based on your description, maybe you can have a look at this file: it reuses old connections as long as they are not busy.
Hi,
After just over 3 hours of 1-minute Spark Streaming batches (186 batches), I can see 1092 connections to Mongo (~5 per batch). Eventually it hits the Ubuntu open-file limit and stops being able to open connections. Interestingly, it has not caused problems with writing. Perhaps the problem can be diagnosed by comparing how connections are handled in each case.
Have you checked that file yet? I think the problem could be that some connections are kept in the BUSY state, which may be caused by not calling the freeConnection() method on every generated writer. After a quick search, I found a suspicious place in the file:
The newly created MongodbSimpleWriter doesn't appear to call freeConnection(). @pfcoperez, is this a connection leak?
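To make the suspected pattern concrete, here is a small self-contained sketch in plain Scala. ClientPool, fetchConnection and freeConnection are simplified stand-ins for the pool API discussed above, not the actual Spark-MongoDB classes; it only illustrates why a writer that never frees its client forces a new client per batch, which matches the growth seen in the streaming job:

```scala
import scala.collection.mutable

// Hypothetical stand-in for the pool discussed above: clients are BUSY while
// handed out and only become reusable again once freeConnection() is called.
class ClientPool[C](create: () => C) {
  private val free = mutable.Queue.empty[C]
  private val busy = mutable.Set.empty[C]

  def fetchConnection(): C = synchronized {
    val c = if (free.nonEmpty) free.dequeue() else create() // reuse only non-busy clients
    busy += c
    c
  }

  def freeConnection(c: C): Unit = synchronized {
    busy -= c
    free.enqueue(c)
  }

  def openClients: Int = synchronized(free.size + busy.size)
}

object LeakDemo extends App {
  // Safe usage: each "batch" frees its client in a finally block, so it is reused.
  val safePool = new ClientPool[AnyRef](() => new AnyRef)
  (1 to 186).foreach { _ =>
    val c = safePool.fetchConnection()
    try () /* pretend to run the batch's reads here */ finally safePool.freeConnection(c)
  }
  println(s"safe: ${safePool.openClients} client(s) ever created")   // 1

  // Leaky usage: a new writer fetches a client per batch and never frees it.
  val leakyPool = new ClientPool[AnyRef](() => new AnyRef)
  (1 to 186).foreach(_ => leakyPool.fetchConnection())
  println(s"leaky: ${leakyPool.openClients} client(s) ever created") // 186, one per batch
}
```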
Thanks @wuciawe. I think I will have to make do with some hacking in the meantime.
@seddonm1 @wuciawe The currently implemented connection pool has a problem related to the fact that there are no destructors in Java or Scala. If there were, we could guarantee that a connection provided by the pool would be closed by using an RAII-like pattern.

The main problem is that the connection pool hands out Client instances which have to be freed explicitly by calling one of the methods that release the connection. Some Spark hooks are used to call these methods automatically after Spark tasks have finished. However, there are other use cases, as described in your conversation, for which the client is not being freed. The current approach would only work correctly if every possible use of a client were followed by an explicit free.

We've decided to follow a new approach: to imitate the way other well-known JVM resource pools work. That is, the caller passes the pool a task to perform and lets the pool assign it to a connection. Hence, the pool, rather than its client code, is responsible for resource deallocation, which concentrates deallocation in a single point of responsibility.

We've already started changing the pool implementation, but it will take a while. In the meantime, we'll remove the connection pool, thus removing all the connection-leak issues you are finding. I hope this helps.
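As a rough illustration of the task-passing idea described above, here is a minimal loan-pattern pool sketch with made-up names (TaskPool, withConnection); it is not the planned Spark-MongoDB implementation, only a picture of where the single point of deallocation ends up:

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical loan-pattern pool: callers never hold a client themselves.
// They hand the pool a task; the pool picks (or creates) a client, runs the
// task, and is the only piece of code that returns the client afterwards.
class TaskPool[C](create: () => C, close: C => Unit) {
  private val idle = mutable.Queue.empty[C]

  def withConnection[A](task: C => A): A = {
    val client = synchronized {
      if (idle.nonEmpty) idle.dequeue() else create()
    }
    try task(client)                              // the client is only "loaned" for the task
    finally synchronized(idle.enqueue(client))    // single point of deallocation/reuse
  }

  def shutdown(): Unit = synchronized {
    idle.foreach(c => Try(close(c)))
    idle.clear()
  }
}

// Hypothetical usage with a Casbah client (names illustrative only):
//   val pool = new TaskPool[MongoClient](() => MongoClient("servername", 27017), _.close())
//   val docs = pool.withConnection(c => c("myDB")("myCollection").find().toList)
//   pool.shutdown()
```

With this shape, client code cannot forget to free a connection, because it never owns one in the first place.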
Thanks @pfcoperez. I'm glad you acknowledge the issue and have a plan for resolution. Good luck.
Hi, |
Hi, |
Hi @jaminglam |
Hi,
I have just built branch-0.11 with fix #118 applied.
It still appears to be holding connections open. Here is my connection configuration:
MongodbConfigBuilder(Map(Host -> List("servername:27017"), Database -> "myDB", Collection -> "myCollection", ConnectionsTime -> "120000", ConnectionsPerHost -> "10")).build
executed using:
sqlContext.fromMongoDB
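Put together, a minimal sketch of this read path might look like the following. The import paths and the SparkContext/SQLContext setup are assumptions for branch-0.11 and may need adjusting; only the config map and the fromMongoDB call are taken from the snippets above:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.stratio.datasource.mongodb._                 // assumed packages for branch-0.11
import com.stratio.datasource.mongodb.config._
import com.stratio.datasource.mongodb.config.MongodbConfig._

object MongoReadSketch extends App {
  val sc = new SparkContext(new SparkConf().setAppName("mongo-read-sketch").setMaster("local[2]"))
  val sqlContext = new SQLContext(sc)

  // Same options as in the report above.
  val readConfig = MongodbConfigBuilder(Map(
    Host -> List("servername:27017"),
    Database -> "myDB",
    Collection -> "myCollection",
    ConnectionsTime -> "120000",
    ConnectionsPerHost -> "10"
  )).build()

  // One call like this runs per streaming batch; each appears to leave
  // several connections open according to lsof.
  val df = sqlContext.fromMongoDB(readConfig)
  println(df.count())

  sc.stop()
}
```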
If I watch the open files from the Spark Streaming app, I can see more and more connections being opened:
lsof -p 14169 | grep servername:27017
Is there something else that needs to be configured to allow the scheduler to release these connections?