Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive: CachedClientPool connection leak #2649

Closed

Conversation

StefanXiepj
Copy link
Contributor

@StefanXiepj StefanXiepj commented May 28, 2021

What changes were proposed in this pull request?

This pr adds shutdownhook to close the connection in the cache of CachedClientPool before the jvm exit.

Why are the changes needed?

ExpireAfterAccess is 5 minutes, this is too long, if the last access time is much less than 5 minutes, the connections in the cache will leak,we should close it.

@StefanXiepj
Copy link
Contributor Author

@rymurr @kbendick would you mind reviewing this?

@@ -65,6 +65,8 @@ private synchronized void init() {
.removalListener((key, value, cause) -> ((HiveClientPool) value).close())
.build();
}
// cleanup cache before jvm exit, avoid some clients may not be closed.
Runtime.getRuntime().addShutdownHook(new Thread(CachedClientPool::cleanupCache));
Copy link
Contributor

@southernriver southernriver May 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should close the HiveClientPool first~

CachedClientPool.clientPoolCache().getIfPresent(metastoreUri).close();
CachedClientPool.clientPoolCache().invalidateAll();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the downside of not closing the clients when the JVM exits? IMHO the lifetime of the client pool should be tied to the owning resources and be cleaned up through a Closeable interface rather than at system exit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a large number of tasks exit, many connections may not be closed. But as you said, this is not a serious problem.

@@ -65,6 +65,8 @@ private synchronized void init() {
.removalListener((key, value, cause) -> ((HiveClientPool) value).close())
.build();
}
// cleanup cache before jvm exit, avoid some clients may not be closed.
Runtime.getRuntime().addShutdownHook(new Thread(CachedClientPool::cleanupCache));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the downside of not closing the clients when the JVM exits? IMHO the lifetime of the client pool should be tied to the owning resources and be cleaned up through a Closeable interface rather than at system exit

@@ -36,6 +38,10 @@
private volatile int currentSize;
private boolean closed;

private static volatile AtomicInteger openCount = new AtomicInteger(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understand the purpose of these three variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for counting the number of connections,maybe it can be used as a metric in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused on the variables as well.

If I'm not mistaken, it's the total number of connections ever opened and the total number of connections closed? Looking at the declaration, I would have thought that it was the current number of open connections, not the number of connections ever opened.

My only indication that it might be otherwise came from the unit test - https://github.com/apache/iceberg/pull/2649/files#diff-8b2035adec2ca6d247f624aeb6dbb2cc3789bd188541c500de4319b969813758R78-R80

  private void waitingForCleanup() {
     long startTime = System.currentTimeMillis();
     while ((ClientPoolImpl.openCount().get() != ClientPoolImpl.closeCount().get()) 

Maybe you could add a small comment?

Copy link
Contributor Author

@StefanXiepj StefanXiepj May 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your time, yes your right, it's the total number of connections ever opened and the total number of connections closed,just for unit test, maybe the log is also useful.
As @southernriver suggested, closed directly would be better. it is no need to wait.

@@ -36,6 +38,10 @@
private volatile int currentSize;
private boolean closed;

private static volatile AtomicInteger openCount = new AtomicInteger(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused on the variables as well.

If I'm not mistaken, it's the total number of connections ever opened and the total number of connections closed? Looking at the declaration, I would have thought that it was the current number of open connections, not the number of connections ever opened.

My only indication that it might be otherwise came from the unit test - https://github.com/apache/iceberg/pull/2649/files#diff-8b2035adec2ca6d247f624aeb6dbb2cc3789bd188541c500de4319b969813758R78-R80

  private void waitingForCleanup() {
     long startTime = System.currentTimeMillis();
     while ((ClientPoolImpl.openCount().get() != ClientPoolImpl.closeCount().get()) 

Maybe you could add a small comment?

Comment on lines +152 to +157
public static AtomicInteger openCount() {
return openCount;
}

public static AtomicInteger closeCount() {
return closeCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to call .get() in the return statement and just return an Integer?

It seems the only places in the code that these functions are called, .get() is called immediately. It seems to me that calling .get() would provide the benefit of a defensive copy to truly leave these private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks,your suggestion is very helpful, i'll do it.

@StefanXiepj StefanXiepj requested a review from rymurr May 28, 2021 19:47
@StefanXiepj
Copy link
Contributor Author

@rymurr @kbendick @southernriver thanks, i will close this pr.

@southernriver
Copy link
Contributor

@StefanXiepj I reopened this and do some fix in another pr #2785

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants