Optimize client pool #1731
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: yhmo. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Walkthrough
Adds a new generic per-key ClientCache that maintains active and retired client wrappers, tracks per-key QPS, and dynamically borrows/retires clients from a GenericKeyedObjectPool. ClientPool was refactored to use a ConcurrentMap of ClientCache instances with lazy initialization, per-key lifecycle methods (preparePool, clear), typed pool config, per-key metrics/accessors, and updated return/close flows. PoolClientFactory.configForKey changed to void and logging was adjusted. PoolConfig.Builder per-key defaults were tightened. Tests and examples were added/updated to exercise per-key pool preparation, concurrent workloads, and the new metrics/accessors.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ast-grep (0.40.3)
- sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
- sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
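To make the walkthrough concrete, here is a minimal usage sketch of the per-key lifecycle, assembled from the tests and examples reviewed below. The `MilvusClientV2Pool` constructor signature and package locations are assumptions, not verified against the source:

```java
import io.milvus.pool.MilvusClientV2Pool;
import io.milvus.pool.PoolConfig;
import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.client.MilvusClientV2;

public class PoolUsageSketch {
    public static void main(String[] args) throws Exception {
        ConnectConfig connectConfig = ConnectConfig.builder()
                .uri("http://localhost:19530") // placeholder endpoint
                .build();
        PoolConfig poolConfig = PoolConfig.builder()
                .minIdlePerKey(1)
                .maxIdlePerKey(2)
                .maxTotalPerKey(4)
                .build();
        MilvusClientV2Pool pool = new MilvusClientV2Pool(poolConfig, connectConfig);

        String key = "my_db"; // the tests use one key per database
        pool.preparePool(key); // pre-warm: creates minIdlePerKey clients for this key

        MilvusClientV2 client = null;
        try {
            client = pool.getClient(key); // selects the least-loaded active client
            // ... issue search/query requests with the client ...
        } finally {
            if (client != null) {
                pool.returnClient(key, client); // decrements the wrapper's refCount
            }
        }

        System.out.printf("idle=%d active=%d qps=%.1f%n",
                pool.getIdleClientNumber(key),
                pool.getActiveClientNumber(key),
                pool.fetchClientPerSecond(key));
        pool.close(); // stops per-key timers and clears caches before closing the pool
    }
}
```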
Actionable comments posted: 12
🧹 Nitpick comments (6)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)
96-118: Global lock may cause contention across different keys. The `cacheMapLock` is a single lock shared across all keys. When multiple threads initialize different keys concurrently, they will block each other unnecessarily. Consider using a per-key lock pattern or `ConcurrentHashMap.computeIfAbsent()`, which provides atomic lazy initialization.
🔎 Proposed refactor using computeIfAbsent
```diff
 private ClientCache<T> getCache(String key) {
-    ClientCache<T> cache = clientsCache.get(key);
-    if (cache == null) {
-        // If clientsCache doesn't contain this key, there might be multiple threads run into this section.
-        // Although ConcurrentMap.putIfAbsent() is atomic action, we don't intend to allow multiple threads
-        // to create multiple ClientCache objects at this line, so we add a lock here.
-        // Only one thread that first obtains the lock runs into putIfAbsent(), the others will be blocked
-        // and get the object after obtaining the lock.
-        // This section is entered one time for each key, the lock basically doesn't affect performance.
-        cacheMapLock.lock();
-        try {
-            if (!clientsCache.containsKey(key)) {
-                cache = new ClientCache<>(key, clientPool);
-                clientsCache.put(key, cache);
-            } else {
-                cache = clientsCache.get(key);
-            }
-        } finally {
-            cacheMapLock.unlock();
-        }
-    }
-    return cache;
+    return clientsCache.computeIfAbsent(key, k -> new ClientCache<>(k, clientPool));
 }
```
69-76: Inconsistent error handling: logs error but returns silently. If `getCache(key)` returns null (which shouldn't happen given the current implementation), the method logs an error but continues silently. Consider throwing an exception for consistency with other pool operations.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)
3028-3058: Worker class exception handling could mask test failures. The `catch` block prints the exception but doesn't fail the test or track failures. If requests fail, the test will still pass. Consider using `AtomicInteger` to count failures and assert on it after completion.
🔎 Proposed improvement
```diff
+ AtomicInteger failureCount = new AtomicInteger(0);
+
  class Worker implements Runnable {
      private int id = 0;
      public Worker(int id) {
          this.id = id;
      }

      @Override
      public void run() {
          MilvusClientV2 client = null;
          try {
              client = pool.getClient(dummyDb);
              Assertions.assertEquals(dummyDb, client.currentUsedDatabase());
              // ... search logic ...
          } catch (Exception e) {
              System.out.printf("request failed: %s%n", e);
+             failureCount.incrementAndGet();
          } finally {
              pool.returnClient(dummyDb, client);
          }
      }
  }

  // ... after executor termination ...
+ Assertions.assertEquals(0, failureCount.get(), "Some requests failed during the test");
```
3081-3084: Busy-wait loop with 1-second sleep may cause test flakiness. The loop waits indefinitely for active clients to drop to 1. If the pool's QPS-based retirement logic is slow or doesn't trigger, this could hang. Consider adding a timeout.
🔎 Proposed improvement
```diff
+ int maxWaitSeconds = 30;
+ int waited = 0;
  while (pool.getActiveClientNumber(dummyDb) > 1) {
      TimeUnit.SECONDS.sleep(1);
      System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(dummyDb), pool.getActiveClientNumber(dummyDb));
+     waited++;
+     if (waited > maxWaitSeconds) {
+         Assertions.fail("Timed out waiting for clients to be retired");
+     }
  }
```
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)
174-175: Setting thread priority to MAX_PRIORITY in a timer task is unusual. This can starve other threads. The QPS check is lightweight and doesn't need elevated priority. Consider removing this.
🔎 Proposed fix
```diff
  @Override
  public void run() {
-     Thread currentThread = Thread.currentThread();
-     currentThread.setPriority(Thread.MAX_PRIORITY);
-
      checkQPS();
  }
```
189-228: `getClient()` increments `totalCallNumber` but doesn't decrement on failure paths. If `fetchFromPool()` returns null at line 198, `totalCallNumber` was already incremented but no client is returned. This skews QPS calculations. Consider decrementing on failure paths.
🔎 Proposed fix
```diff
  public T getClient() {
      totalCallNumber.incrementAndGet();
      if (activeClientList.isEmpty()) {
          clientListLock.lock();
          try {
              if (activeClientList.isEmpty()) {
                  T client = fetchFromPool();
                  if (client == null) {
+                     totalCallNumber.decrementAndGet();
                      return null;
                  }
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- sdk-core/src/main/java/io/milvus/pool/ClientCache.java
- sdk-core/src/main/java/io/milvus/pool/ClientPool.java
- sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
- sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧬 Code graph analysis (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (5)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (1)
CreateCollectionReq (36-1112)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/LoadCollectionReq.java (1)
LoadCollectionReq (25-227)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
SearchReq (32-482)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Deploy milvus server,build and test
- GitHub Check: Build and test
- GitHub Check: Summary
🔇 Additional comments (5)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
93-93: Good fix: passing the exception as a separate parameter enables proper stack trace logging. This is the correct pattern for SLF4J - passing the exception as the last argument ensures the full stack trace is logged, as the minimal illustration below shows.
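A minimal illustration of the endorsed SLF4J idiom (not project code; the message text here is invented):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggingIdiom {
    private static final Logger logger = LoggerFactory.getLogger(LoggingIdiom.class);

    void handle(String key, Exception e) {
        // Correct: the Throwable as the last argument preserves the full stack trace.
        logger.error("PoolClientFactory key: {} failed to create a client", key, e);
        // Wrong: concatenation logs only e.toString() and drops the stack trace.
        // logger.error("failed to create a client: " + e);
    }
}
```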
39-41: Breaking API change: `configForKey` no longer returns a value. The method signature changed from returning `C` (the previous config) to `void`. While no internal callers in this codebase depend on the return value, external consumers of this public API that relied on the previous return value would be affected.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
2975-2997: Good test setup: creates a dedicated collection for pool testing. The test properly sets up an isolated collection in a dedicated database for testing pool behavior, ensuring test isolation.
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)
41-61: Double initialization: `preparePool()` then `borrowObject()` creates more clients than intended. `clientPool.preparePool(this.key)` already creates `minIdlePerKey` idle clients in the pool. The subsequent loop borrows those same clients, so the pool ends up with 0 idle and `minIdlePerKey` active. If this is intentional (pre-warming the cache), the comment should clarify this. Otherwise, just borrowing would suffice.
274-320: `ClientWrapper.getClient()` increments `refCount` on each call - potential for mismatched counts. If the caller calls `wrapper.getClient()` multiple times without corresponding `returnClient()` calls, the refCount will be incorrect. Consider separating the "get client" and "increment ref" operations, or documenting that `getClient()` should only be called once per borrow. Review all call sites to ensure `getClient()` is called exactly once per borrow and `returnClient()` once per return, as the sketch below shows.
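A sketch of that discipline at the call site, using the pool-level API exercised by the tests in this PR (a pattern suggestion, not verified project code):

```java
// Exactly one getClient() per borrow, one returnClient() per return.
MilvusClientV2 client = pool.getClient(key); // refCount incremented once
try {
    // use the client only for this borrow
} finally {
    if (client != null) {
        pool.returnClient(key, client);      // refCount decremented once
    }
}
```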
Actionable comments posted: 1
♻️ Duplicate comments (12)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
58-60: Log level inconsistency already flagged. This issue was identified in a previous review - the code checks `isDebugEnabled()` but calls `logger.info()`.
80-82: Log level inconsistency already flagged. Same issue as in `create()` - already identified in a previous review.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)
128-133: Silent no-op issue already flagged. This issue was identified in a previous review - when the cache doesn't exist, the client is silently not returned, potentially causing resource leaks.
221-227: Side effect issue already flagged. This issue was identified in a previous review - calling `getCache(key)` creates a new cache with a timer if the key doesn't exist, which is an unexpected side effect for a metrics query method.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3028-3058: Null client handling issue already flagged. The `finally` block at lines 3055-3057 unconditionally calls `pool.returnClient(dummyDb, client)`, which could pass null if `getClient` failed. This was identified in a previous review.
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (7)
19-20: Public mutable static fields already flagged. This issue was identified in a previous review - public mutable static fields are not thread-safe and violate encapsulation.
27-27: Non-daemon Timer already flagged. This issue was identified in a previous review - the Timer should be created as a daemon to prevent blocking JVM shutdown.
88-91: Log level inconsistency already flagged. This issue was identified in a previous review.
120-122: Log level inconsistency already flagged. This issue was identified in a previous review.
140-144: TOCTOU race already flagged. This issue was identified in a previous review - the list can change between `get(maxIndex)` and `remove(maxIndex)`.
156-158: Log level inconsistency already flagged. This issue was identified in a previous review.
265-265: String concatenation in logging already flagged. This issue was identified in a previous review.
🧹 Nitpick comments (1)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
33-39: Consider making the class package-private. The class is declared as `public` but the constructor is `protected`. If this class is intended only for internal use within the pool package, consider making the class package-private to prevent unintended external instantiation or subclassing.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- sdk-core/src/main/java/io/milvus/pool/ClientCache.java
- sdk-core/src/main/java/io/milvus/pool/ClientPool.java
- sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
- sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧬 Code graph analysis (1)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and test
- GitHub Check: Summary
🔇 Additional comments (10)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
133-136: Verify impact of changed default pool sizes on existing users. The default values have changed significantly (mostly reduced):
- minIdlePerKey: 0 → 1
- maxIdlePerKey: 10 → 2
- maxTotalPerKey: 50 → 5
- maxTotal: 1000 → 20
These changes could be breaking for users who rely on default configurations for high-throughput scenarios. Existing applications may experience increased blocking or reduced concurrency unless they explicitly override these values.
Consider documenting this change in release notes as a potential breaking change.
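For users who depended on the old behavior, a hedged sketch of restoring the previous defaults explicitly; the builder setter names are taken from this PR's tests and examples, so verify them against your SDK version:

```java
PoolConfig config = PoolConfig.builder()
        .minIdlePerKey(0)    // previous default
        .maxIdlePerKey(10)   // previous default
        .maxTotalPerKey(50)  // previous default
        .maxTotal(1000)      // previous default
        .build();
```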
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
39-41: LGTM: Simplified `configForKey` signature. The method now returns `void` instead of the previous value, which is appropriate since callers typically don't need the prior configuration. This simplifies the API.
93-93: LGTM: Proper parameterized logging for exceptions. Good improvement - using the exception as a separate parameter enables proper stack trace logging.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)
9-12: LGTM: Thread-safe per-key cache infrastructure. The addition of `ConcurrentHashMap` for `clientsCache` and a fair `ReentrantLock` for cache creation provides proper thread-safety for the per-key caching mechanism.
Also applies to: 20-21, 30-30
65-76: LGTM: Pool preparation with proper error handling. The `preparePool` method correctly delegates to the per-key cache and handles the null case with appropriate error logging.
96-118: LGTM: Thread-safe lazy cache initialization. The double-checked locking pattern with an explicit lock is correctly implemented. The comment accurately explains why `putIfAbsent` alone isn't sufficient—it prevents multiple `ClientCache` instantiations when threads race on a new key. (A generic sketch of the pattern follows.)
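For reference, the generic shape of that pattern, reconstructed from the diffs quoted in this review (simplified, not the verified source):

```java
private ClientCache<T> getCache(String key) {
    ClientCache<T> cache = clientsCache.get(key); // fast path, lock-free
    if (cache == null) {
        cacheMapLock.lock();
        try {
            cache = clientsCache.get(key);        // re-check under the lock
            if (cache == null) {
                cache = new ClientCache<>(key, clientPool);
                clientsCache.put(key, cache);     // only one thread constructs the cache
            }
        } finally {
            cacheMapLock.unlock();
        }
    }
    return cache;
}
```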
153-164: LGTM: Proper cleanup of per-key caches on clear. The implementation correctly stops timers for all caches before clearing the cache map and underlying pool. The inline comment about unreturned clients and UNAVAILABLE errors provides helpful context.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)
2975-2998: LGTM: Comprehensive test setup for pool testing. The test properly creates a dedicated database and collection for pool testing, with data insertion and loading to enable actual query operations.
3060-3089: LGTM: Thorough pool behavior verification. The test properly:
- Uses `ExecutorService` for controlled concurrent execution
- Verifies active/idle client counts after workload completion
- Waits for clients to become idle before final assertions
- Validates pool metrics match expected configuration
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
274-320: LGTM: Thread-safe ClientWrapper implementation. The `ClientWrapper` correctly uses `AtomicInteger` for thread-safe reference counting. The `equals` implementation properly handles comparison with both `ClientWrapper` instances and raw client objects, which is necessary for the `returnClient` lookups.
Actionable comments posted: 2
♻️ Duplicate comments (5)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)
128-133: Duplicate: Silent no-op may leak clients (previously flagged). This issue was flagged in a previous review. When `returnClient` is called for a key with no cache (e.g., after `clear(key)`), the client is never returned to the underlying pool, potentially causing a resource leak. The recommended fix is to return the client directly to `clientPool` when the cache is missing.
221-227: Duplicate: Metrics method has side effect (previously flagged). This issue was flagged in a previous review. Calling `getCache(key)` will create a new `ClientCache` (and start its timer) if the key doesn't exist. A metrics query method should not have this side effect. Use `clientsCache.get(key)` directly instead.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3053-3058: Null client passed to `returnClient` if `getClient` fails. If `pool.getClient(dummyDb)` throws or returns null, the `finally` block will call `pool.returnClient(dummyDb, null)`. Add a null check before returning.
🔎 Proposed fix
```diff
  } catch (Exception e) {
      System.out.printf("request failed: %s%n", e);
  } finally {
-     pool.returnClient(dummyDb, client);
+     if (client != null) {
+         pool.returnClient(dummyDb, client);
+     }
  }
```
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)
150-154: TOCTOU race: the list can change between `get(maxIndex)` and `remove(maxIndex)`. `CopyOnWriteArrayList` creates a new array on mutation, so between getting the wrapper at `maxIndex` and removing it, another thread could have modified the list, causing `IndexOutOfBoundsException` or removing the wrong element.
🔎 Proposed fix using remove(Object)
```diff
  if (maxIndex >= 0) {
-     ClientWrapper<T> wrapper = activeClientList.get(maxIndex);
-     activeClientList.remove(maxIndex);
+     ClientWrapper<T> wrapper = activeClientList.get(maxIndex);
+     if (activeClientList.remove(wrapper)) { // Removes by object reference
+         retireClientList.add(wrapper);
+     }
-     retireClientList.add(wrapper);
  }
```
194-233: Race condition in min-load client selection. The loop at lines 221-226 iterates through `activeClientList` to find the minimum-load wrapper, but between finding the wrapper and calling `wrapper.getClient()` at line 232, the wrapper could be retired by `checkQPS()` running on the scheduler thread. This could result in returning a client that is already in the retire list, though functionally this may be acceptable since the client is still valid. Consider whether this race is acceptable for your use case. If strict consistency is needed, you may want to verify the wrapper is still in `activeClientList` before returning, or accept that occasionally a "retired" client may be returned (which is still functional).
🧹 Nitpick comments (5)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)
30-30: Consider whether a fair lock is necessary. The fair `ReentrantLock` guarantees FIFO acquisition order but has lower throughput than non-fair locks. Since `getCache` is called only once per key and primarily during initialization, fairness may not be critical here. Consider using the default non-fair lock (`new ReentrantLock()`) unless FIFO ordering is a strict requirement.
🔎 Alternative implementation
```diff
- this.cacheMapLock = new ReentrantLock(true);
+ this.cacheMapLock = new ReentrantLock();
```
96-118: Minor inconsistency in getCache implementation. The method uses `clientsCache.containsKey(key)` inside the lock but `clientsCache.get(key)` outside. For consistency and slightly better performance, use `get()` in both places.
🔎 Proposed refactor
```diff
  private ClientCache<T> getCache(String key) {
      ClientCache<T> cache = clientsCache.get(key);
      if (cache == null) {
          cacheMapLock.lock();
          try {
-             if (!clientsCache.containsKey(key)) {
+             cache = clientsCache.get(key);
+             if (cache == null) {
                  cache = new ClientCache<>(key, clientPool);
                  clientsCache.put(key, cache);
-             } else {
-                 cache = clientsCache.get(key);
              }
          } finally {
              cacheMapLock.unlock();
          }
      }
      return cache;
  }
```
69-76: Unnecessary null check. Since `getCache(key)` always returns a non-null `ClientCache` (it creates one if absent), the null check at line 71 is unreachable. Consider removing it or adding a comment explaining it's defensive programming.
🔎 Proposed simplification
```diff
  public void preparePool(String key) {
      ClientCache<T> cache = getCache(key);
-     if (cache == null) {
-         logger.error("preparePool() gets null for key: {}", key);
-         return;
-     }
      cache.preparePool();
  }
```
87-94: Unnecessary null check in getClient. Similar to `preparePool`, since `getCache(key)` never returns null, the null check and error logging at lines 89-91 are unreachable. The method can be simplified.
🔎 Proposed simplification
```diff
  public T getClient(String key) {
      ClientCache<T> cache = getCache(key);
-     if (cache == null) {
-         logger.error("getClient() return null for key: {}", key);
-         return null;
-     }
      return cache.getClient();
  }
```
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
265-273: Exception thrown from `returnToPool` may disrupt the caller. If `returnObject` fails, throwing a `MilvusClientException` here could be unexpected for callers who are just trying to clean up. The client is already retired, so the pool state might be inconsistent. Consider logging the error and continuing rather than propagating the exception.
🔎 Proposed fix
```diff
  private void returnToPool(T grpcClient) {
      try {
          clientPool.returnObject(this.key, grpcClient);
      } catch (Exception e) {
          // the pool might return exception if the key doesn't exist or the grpcClient doesn't belong to this pool
          logger.error("Failed to return client, exception: ", e);
-         throw new MilvusClientException(ErrorCode.CLIENT_ERROR, e);
+         // Don't throw - the client is being cleaned up, throwing here could disrupt callers
      }
  }
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- sdk-core/src/main/java/io/milvus/pool/ClientCache.java
- sdk-core/src/main/java/io/milvus/pool/ClientPool.java
- sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
- sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
- sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (2)
- sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧬 Code graph analysis (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and test
- GitHub Check: Deploy milvus server,build and test
- GitHub Check: Summary
🔇 Additional comments (15)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (3)
139-148: Good cleanup sequence. Calling `clear()` before closing the pool ensures per-key cache timers are stopped and resources are released properly. The comments appropriately document the behavior with unreturned clients.
153-164: LGTM: Proper per-key cleanup. The method correctly stops all cache timers, clears the cache map, and then clears the underlying pool. The cleanup order is appropriate.
171-183: LGTM: Per-key cleanup is correct. The method properly handles per-key cleanup by stopping the timer, removing from the cache map, and clearing the underlying pool for that key.
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (4)
3141-3153: Good test configuration. The pool configuration with `minIdlePerKey=1`, `maxIdlePerKey=2`, and `maxTotalPerKey=4` provides a realistic test scenario. The subsequent call to `preparePool(key)` and the assertion that active clients equal `minIdlePerKey` validates the pre-warming behavior.
3156-3157: Significantly increased test load. The test now executes 20,000 total requests (20 threads × 1000 requests) compared to the previous 100 requests. This better exercises the pool under concurrent load but may increase test execution time; ensure this is acceptable for CI/CD pipelines. A sketch of the workload pattern follows.
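Roughly, the workload looks like the sketch below, assembled from the calls this review mentions; the client type name and the thread/request counts are illustrative:

```java
ExecutorService executor = Executors.newFixedThreadPool(20);
for (int i = 0; i < 20; i++) {
    executor.submit(() -> {
        for (int k = 0; k < 1000; k++) {
            MilvusClient client = null;
            try {
                client = pool.getClient(key);
                client.getVersion(); // the lightweight request the v1 test issues
            } catch (Exception e) {
                System.out.printf("request failed: %s%n", e);
            } finally {
                if (client != null) {
                    pool.returnClient(key, client);
                }
            }
        }
    });
}
executor.shutdown();
executor.awaitTermination(5, TimeUnit.MINUTES); // bounded wait for all workers
```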
3183-3185: LGTM: Pool capacity assertions. The assertions correctly validate that the pool reached maximum capacity (`maxTotalPerKey`) under concurrent load, and the QPS logging provides useful runtime metrics.
3159-3173: The finally block pattern is safe—`ClientCache.returnClient` handles null clients gracefully. When `pool.getClient(key)` fails and returns null, the finally block will call `pool.returnClient(key, null)`. The method iterates through the active and retired client lists and silently ignores the null client if no match is found, so no exception is thrown.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (5)
88-90: LGTM! The new imports for `ExecutorService` and `Executors` are correctly added to support the concurrent pool testing below.
2975-2998: LGTM! Test setup properly creates a collection in the dummy database, inserts test data, and loads the collection before testing the pool functionality.
3004-3011: LGTM! Pool configuration is well-defined with explicit per-key limits. The variable names clearly document the expected pool behavior constraints.
3060-3078: LGTM! The concurrent workload testing with `ExecutorService` is well-structured. The test properly validates:
- Pool respects the `maxTotalPerKey` constraint
- QPS logging occurs periodically
- Proper timeout handling with `awaitTermination`
3081-3092: LGTM! The post-workload validation is thorough:
- Waits for the pool to stabilize after the workload
- Validates idle/active counts against configured limits
- Confirms client database context is preserved
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (3)
17-19: LGTM! The threshold constants are now declared as `final`, addressing the previous thread-safety concern about mutable static fields.
50-70: LGTM! The `preparePool()` method properly initializes the pool and handles exceptions with appropriate logging. The loop correctly populates the active client list with pre-created clients, along the lines of the sketch below.
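A hedged reconstruction of that flow; the config accessor, wrapper constructor, and error codes are assumed from surrounding review text rather than the verified source:

```java
public void preparePool() {
    try {
        clientPool.preparePool(this.key);                 // create minIdlePerKey idle clients
        for (int i = 0; i < config.getMinIdlePerKey(); i++) {
            T client = clientPool.borrowObject(this.key); // move each one into the cache
            activeClientList.add(new ClientWrapper<>(client));
        }
    } catch (Exception e) {
        logger.error("Failed to prepare pool for key: {}", key, e);
        throw new MilvusClientException(ErrorCode.CLIENT_ERROR, e);
    }
}
```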
274-320: LGTM! The `ClientWrapper` inner class is well-designed:
- Uses `AtomicInteger` for thread-safe reference counting
- Properly implements `equals()` to support comparison with both wrapper and client instances
- Clean encapsulation of client lifecycle management
Actionable comments posted: 1
♻️ Duplicate comments (8)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
58-60: Inconsistent log level: checking debug but logging info. This issue was already flagged in a previous review.
80-82: Same log level inconsistency as in `create()`. This issue was already flagged in a previous review.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3055-3057: Null client passed to `returnClient` if `getClient` fails. This issue was already flagged in a previous review.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
126-131: Silent no-op when cache doesn't exist may leak clients. This issue was already flagged in a previous review.
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3187-3194: Add timeout to wait loop. The while loop at line 3187 waits for active clients to drain but has no timeout. If clients fail to become idle due to a bug or resource leak, the test will hang indefinitely.
🔎 Proposed fix with timeout
```diff
- while (pool.getActiveClientNumber(key) > 1) {
+ int maxWaitSeconds = 30;
+ int waited = 0;
+ while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
      TimeUnit.SECONDS.sleep(1);
+     waited++;
      System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
  }
+ Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1,
+     String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
```
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (3)
37-48: Scheduler thread should be daemon to prevent blocking JVM shutdown. The `ThreadFactory` creates non-daemon threads, so if `stopTimer()` is not called before application shutdown, the scheduler thread will keep the JVM running indefinitely. This can cause issues in testing and production environments.
🔎 Proposed fix
```diff
  ThreadFactory threadFactory = new ThreadFactory() {
      @Override
      public Thread newThread(@NotNull Runnable r) {
          Thread t = new Thread(r);
+         t.setDaemon(true);
+         t.setName("ClientCache-QPS-Checker-" + key);
          t.setPriority(Thread.MAX_PRIORITY); // set the highest priority for the timer
          return t;
      }
  };
```
150-154: TOCTOU race: the list can change between `get(maxIndex)` and `remove(maxIndex)`. Between calling `activeClientList.get(maxIndex)` and `activeClientList.remove(maxIndex)`, another thread could modify the list (e.g., `getClient()` adding a new wrapper), potentially causing `IndexOutOfBoundsException` or removing the wrong element.
🔎 Proposed fix using remove by object reference
```diff
  if (maxIndex >= 0) {
      ClientWrapper<T> wrapper = activeClientList.get(maxIndex);
-     activeClientList.remove(maxIndex);
+     if (activeClientList.remove(wrapper)) {
          retireClientList.add(wrapper);
+     } else {
+         logger.warn("ClientCache key: {} failed to retire wrapper, already removed", key);
+     }
  }
```
Using `remove(Object)` removes by identity, which is safer for concurrent access.
219-232: Race condition in min-load client selection. Between finding the minimum-load wrapper (lines 219-226) and calling `wrapper.getClient()` (line 232), the `checkQPS()` method running on the scheduler thread could retire this wrapper and move it to `retireClientList`. This results in returning a client that is being phased out.
🔎 Suggested mitigation
```diff
  int minLoad = Integer.MAX_VALUE;
  ClientWrapper<T> wrapper = null;
  for (ClientWrapper<T> tempWrapper : activeClientList) {
      if (tempWrapper.getRefCount() < minLoad) {
          minLoad = tempWrapper.getRefCount();
          wrapper = tempWrapper;
      }
  }
+ // Verify wrapper is still in active list (checkQPS might have retired it)
+ if (wrapper != null && !activeClientList.contains(wrapper)) {
+     // Wrapper was retired, retry selection
+     if (!activeClientList.isEmpty()) {
+         wrapper = activeClientList.get(0);
+     }
+ }
  if (wrapper == null) {
      // should not be here
      wrapper = activeClientList.get(0);
  }
  return wrapper.getClient();
```
Alternatively, use a lock around both selection and `getClient()` to make the operation atomic, though this may impact performance.
🧹 Nitpick comments (3)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)
3081-3084: Potential infinite loop without a timeout safeguard. If clients are never returned (e.g., due to a bug), this `while` loop will run indefinitely, causing the test to hang. Consider adding a timeout or maximum iteration count.
🔎 Proposed fix
```diff
+ int maxWaitSeconds = 30;
+ int waitedSeconds = 0;
- while (pool.getActiveClientNumber(dummyDb) > 1) {
+ while (pool.getActiveClientNumber(dummyDb) > 1 && waitedSeconds < maxWaitSeconds) {
      TimeUnit.SECONDS.sleep(1);
+     waitedSeconds++;
      System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(dummyDb), pool.getActiveClientNumber(dummyDb));
  }
+ if (waitedSeconds >= maxWaitSeconds) {
+     Assertions.fail("Timed out waiting for active clients to reduce");
+ }
```
3028-3034: Minor: Field `id` shadows the constructor parameter. The field `id` and the constructor parameter share the same name. While this works, using `this.id = id;` explicitly clarifies intent. Consider renaming the parameter or keeping it as-is with the explicit `this.` qualifier.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
21-21: Consider making `cacheMapLock` final. Since the lock is assigned once in the constructor and never reassigned, marking it as `final` would clearly communicate immutability and prevent accidental reassignment.
🔎 Proposed fix
```diff
  private final ConcurrentMap<String, ClientCache<T>> clientsCache = new ConcurrentHashMap<>();
- private Lock cacheMapLock;
+ private final Lock cacheMapLock = new ReentrantLock(true);
```
And remove the assignment in the constructor:
```diff
  protected ClientPool(PoolConfig config, PoolClientFactory<C, T> clientFactory) {
      this.config = config;
      this.clientFactory = clientFactory;
-     this.cacheMapLock = new ReentrantLock(true);
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- sdk-core/src/main/java/io/milvus/pool/ClientCache.java
- sdk-core/src/main/java/io/milvus/pool/ClientPool.java
- sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
- sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
- sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧬 Code graph analysis (2)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and test
- GitHub Check: Deploy milvus server,build and test
- GitHub Check: Summary
🔇 Additional comments (21)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
39-41: LGTM! Simplified method signature. The change from returning the previous config to `void` is a valid simplification, aligning with a "set-only" pattern.
93-93: Good improvement: parameterized exception logging. Using the exception as a separate parameter (`logger.error("...", e)`) instead of concatenation is the correct SLF4J pattern, ensuring the full stack trace is logged properly.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (3)
88-90: LGTM! Added necessary imports for concurrent test execution.
2975-2997: LGTM! Proper test setup for per-key pool testing. The test correctly creates a separate database and collection to exercise the per-key pool functionality.
3004-3011: LGTM! Well-defined pool configuration for testing. The configuration explicitly sets `minIdlePerKey`, `maxIdlePerKey`, and `maxTotalPerKey` to verify pool behavior under concurrent load.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (6)
9-13: LGTM! Appropriate concurrency utilities for per-key caching. The use of `ConcurrentHashMap` and `ReentrantLock` provides correct thread-safe access to the per-key cache.
Also applies to: 20-21
65-74: LGTM! Pre-warming capability for per-key pools. The `preparePool` method allows callers to initialize clients before actual usage, reducing latency for the first `getClient()` call.
94-116: LGTM! Correct double-checked locking pattern. The implementation correctly uses a lock to prevent multiple `ClientCache` instances from being created for the same key. The comment explaining why `putIfAbsent` alone is insufficient is helpful.
137-146: Good documentation of edge-case behavior. The comments explaining what happens if clients are not returned before `clear()` is called are helpful for understanding the expected behavior (clients receive an UNAVAILABLE error and retry).
Also applies to: 151-162, 169-181
219-225: LGTM! Side-effect issue fixed. The `fetchClientPerSecond` method now correctly uses `clientsCache.get(key)` instead of `getCache(key)`, preventing the unintended side effect of creating a new `ClientCache` when querying metrics.
53-55: The signature change from returning `C` to `void` is safe. Only one call site exists (MilvusClientV2DockerTest.java:3093), and it does not capture or use the return value (`pool.removeConfig(dummyDb);`). All callers are compatible with the new void signature.
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (3)
3141-3148: LGTM! Pool configuration is appropriate for testing. The per-key pool configuration values (minIdlePerKey=1, maxIdlePerKey=2, maxTotalPerKey=4) are well-suited for exercising the pool's per-key scaling behavior in a test environment.
3151-3153: Verify the semantics of `getActiveClientNumber(key)` after `preparePool()`. The assertion expects `minIdlePerKey` (1) active clients immediately after `preparePool(key)`. Based on the enriched summary, `preparePool()` creates `minIdlePerKey` clients and adds them to `activeClientList`. However, it's important to confirm:
- Does `getActiveClientNumber(key)` count clients in the `ClientCache.activeClientList`, or does it reflect the underlying pool's active count?
- Are these pre-created clients considered "active" (borrowed from the pool) or "idle" (available in the pool)?
If the clients are borrowed and placed in the cache, they should indeed show as active. Please verify this aligns with the intended pool accounting.
3191-3194: Final assertions look correct assuming proper pool draining. The assertions expect the pool to stabilize at `maxIdlePerKey` (2) idle clients and 1 active client after the concurrent workload completes and clients are retired. This assumes the `ClientCache.checkQPS()` timer has successfully retired and returned excess clients to the pool during the draining loop.
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (7)
18-19: LGTM! Static threshold fields are now immutable. The `THRESHOLD_INCREASE` and `THRESHOLD_DECREASE` fields are now declared as `public static final`, which addresses the thread-safety and encapsulation concerns from previous reviews.
50-70: LGTM! `preparePool()` correctly pre-warms the client cache. The method properly initializes the cache by borrowing `minIdlePerKey` clients from the pool and adding them to the active list. Exception handling is appropriate with proper logging and error propagation.
161-174: LGTM! `returnRetiredClients()` safely returns idle retired clients. The method correctly uses `removeIf` with a predicate that checks `refCount <= 0` before returning clients to the pool. The atomic nature of `removeIf` ensures thread-safe list modification; the sketch below reconstructs the idiom.
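A hedged reconstruction of that idiom; `getRefCount()` comes from the review text, while `rawClient()` is a hypothetical accessor standing in for however the wrapper actually exposes its client:

```java
retireClientList.removeIf(wrapper -> {
    if (wrapper.getRefCount() <= 0) {      // no borrower still holds this client
        returnToPool(wrapper.rawClient()); // hand it back to the keyed pool
        return true;                       // drop it from the retire list
    }
    return false;                          // still referenced, keep it retired
});
```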
235-250: LGTM! `returnClient()` safely decrements reference counts. The method correctly searches both `activeClientList` and `retireClientList` to find the matching wrapper and decrement its `refCount`. The `CopyOnWriteArrayList` ensures thread-safe iteration.
252-273: LGTM! Pool interaction methods have proper error handling. Both `fetchFromPool()` and `returnToPool()` correctly use parameterized logging and wrap pool exceptions in `MilvusClientException`. The size check at line 254 prevents borrowing beyond `MaxTotalPerKey`.
275-277: LGTM! Simple metric accessor. The `fetchClientPerSecond()` method is a straightforward getter for the QPS metric computed by `checkQPS()`.
279-325: LGTM! `ClientWrapper` correctly manages per-client reference counting. The inner class properly encapsulates a client with thread-safe reference counting using `AtomicInteger`. The `equals()` and `hashCode()` implementations correctly handle comparison with both `ClientWrapper` instances and raw client objects, enabling the `returnClient()` lookup pattern sketched below.
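A hedged sketch of that dual-target `equals` (field names assumed); comparing equal to both another wrapper and the raw client is what lets `returnClient()` locate a wrapper by client instance:

```java
@Override
public boolean equals(Object obj) {
    if (obj instanceof ClientWrapper) {
        return this.client == ((ClientWrapper<?>) obj).client; // wrapper vs wrapper
    }
    return this.client == obj; // wrapper vs raw client object
}

@Override
public int hashCode() {
    return System.identityHashCode(client); // consistent with identity-based equals
}
```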
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
118-131: returnClient behavior differs from Javadoc and is silent on unknown keys. The Javadoc claims `returnClient` will throw if the key doesn't exist or the client doesn't belong to the key group, but the implementation now silently no-ops when no `ClientCache` is present. That mismatch can make misuse harder to spot and, if `clear(key)` is called while callers still hold clients, may leave those clients effectively unmanaged. Either update the Javadoc to match the new behavior, or log a warning when `clientsCache.get(key)` returns null so incorrect usage is visible.
Example: log on missing cache
```diff
- public void returnClient(String key, T grpcClient) {
-     ClientCache<T> cache = clientsCache.get(key);
-     if (cache != null) {
-         cache.returnClient(grpcClient);
-     }
- }
+ public void returnClient(String key, T grpcClient) {
+     ClientCache<T> cache = clientsCache.get(key);
+     if (cache != null) {
+         cache.returnClient(grpcClient);
+     } else {
+         logger.warn("returnClient() called for non-existent cache key: {}", key);
+     }
+ }
```
♻️ Duplicate comments (4)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
219-225: fetchClientPerSecond avoids side effects now. Switching `fetchClientPerSecond` to use `clientsCache.get(key)` instead of `getCache(key)` ensures a metrics call does not create a new `ClientCache` or start its timer when none exists. Returning `0.0F` when the cache is absent is a sensible default.
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)
58-60: Align debug guard with log level. You still guard with `logger.isDebugEnabled()` but log at INFO, so the guard and the emitted level disagree. Either log at DEBUG or change the guard to `isInfoEnabled()`.
Proposed adjustment
```diff
- if (logger.isDebugEnabled()) {
-     logger.info("PoolClientFactory key: {} creates a client", key);
- }
+ if (logger.isDebugEnabled()) {
+     logger.debug("PoolClientFactory key: {} creates a client", key);
+ }
@@
- if (logger.isDebugEnabled()) {
-     logger.info("PoolClientFactory key: {} closes a client", key);
- }
+ if (logger.isDebugEnabled()) {
+     logger.debug("PoolClientFactory key: {} closes a client", key);
+ }
```
Also applies to: 80-82
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3183-3195: Add a timeout to the active-drain loop to avoid hanging tests. The loop waiting for `pool.getActiveClientNumber(key) > 1` has no timeout. If the active count never drops (e.g., due to a bug in retire logic), this test can hang indefinitely. Recommend bounding the wait and asserting if active clients fail to drain within a reasonable period.
Example bounded-drain loop
```diff
- while (pool.getActiveClientNumber(key) > 1) {
-     TimeUnit.SECONDS.sleep(1);
-     System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
- }
+ int waited = 0;
+ int maxWaitSeconds = 30;
+ while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
+     TimeUnit.SECONDS.sleep(1);
+     waited++;
+     System.out.printf(
+         "waiting idle %d, active %d%n",
+         pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
+ }
+ Assertions.assertTrue(
+     pool.getActiveClientNumber(key) <= 1,
+     String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
```
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3055-3057: Add null check before returning client to pool.If
pool.getClient(dummyDb)at line 3039 throws an exception or returns null, thefinallyblock will callpool.returnClient(dummyDb, null). Guard against passing null toreturnClient.🔎 Proposed fix
```diff
  } finally {
-     pool.returnClient(dummyDb, client);
+     if (client != null) {
+         pool.returnClient(dummyDb, client);
+     }
  }
```
🧹 Nitpick comments (5)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
50-55: Example changes are behaviorally consistent; consider idiomatic constant naming.
- Using `ServerUri` in both the default and per-DB `ConnectConfig` is consistent and keeps all traffic directed to the same endpoint.
- Tighter pool settings (`maxIdlePerKey(1)`, `maxTotalPerKey(5)`) align with the "optimize pool" objective and make contention more visible in the example.
- The updated total-request calculation with `dbNames.size()` correctly reflects the number of databases rather than assuming a fixed value.
If you'd like to polish style, you might make `ServerUri` a `public static final String SERVER_URI`, but that's cosmetic.
Also applies to: 97-103, 289-299, 341-344
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)
180-222: concurrentSearch logic exercises pool behavior well; consider a bounded wait. The concurrent search flow with a fixed thread pool plus periodic `printPoolState()` is a good way to visualize idle/active counts and QPS. The final loop that waits while `pool.getActiveClientNumber(DemoKey) > 1` has no timeout, though, so if retirement logic regresses the demo could hang. Adding a maximum wait duration with a clear message would keep this example safe to run unattended.
Illustrative bounded-wait pattern
```diff
- // after all requests are done, the active clients will be retired and eventually only one idle client left
- while (pool.getActiveClientNumber(DemoKey) > 1) {
-     TimeUnit.SECONDS.sleep(1);
-     printPoolState();
- }
+ // after all requests are done, the active clients should retire and only one idle client remain
+ int waited = 0;
+ int maxWaitSeconds = 60;
+ while (pool.getActiveClientNumber(DemoKey) > 1 && waited < maxWaitSeconds) {
+     TimeUnit.SECONDS.sleep(1);
+     waited++;
+     printPoolState();
+ }
```
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3155-3175: Threaded workload is representative; optional null handling. The threaded loop that repeatedly acquires a client, calls `getVersion()`, and returns the client stresses the per-key limits as intended. Given `getClient` may return null on timeout, you could defensively check `client != null` before invoking `getVersion()` to make this test more robust under extreme latency, but with the current limits it's unlikely to fail in practice.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
85-92: getClient uses the per-key cache; remove the dead null-cache branch. `getClient` now always goes through `ClientCache.getClient()`, which is consistent with the per-key design. Given `getCache` will always either find or create a cache when `clientPool` is non-null, the `if (cache == null)` branch is effectively unreachable and can be dropped to simplify the method. Any inability to borrow a client should be surfaced from within `ClientCache.getClient()` (e.g., via null or exception).
Possible simplification
```diff
- public T getClient(String key) {
-     ClientCache<T> cache = getCache(key);
-     if (cache == null) {
-         logger.error("getClient() return null for key: {}", key);
-         return null;
-     }
-     return cache.getClient();
- }
+ public T getClient(String key) {
+     return getCache(key).getClient();
+ }
```
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3091-3095: Consider returning the client before closing the pool. The client obtained at line 3091 is not returned to the pool before `pool.close()` is called. While `close()` should handle cleanup, explicitly returning borrowed clients is better practice and makes the test's intent clearer.
🔎 Suggested improvement
```diff
  // get client connect to the dummy db
  MilvusClientV2 dummyClient = pool.getClient(dummyDb);
  Assertions.assertEquals(dummyDb, dummyClient.currentUsedDatabase());
+ pool.returnClient(dummyDb, dummyClient);

  pool.removeConfig(dummyDb);
  Assertions.assertNull(pool.getConfig(dummyDb));

  pool.close();
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
- examples/src/main/java/io/milvus/v2/ClientPoolExample.java
- sdk-core/src/main/java/io/milvus/pool/ClientCache.java
- sdk-core/src/main/java/io/milvus/pool/ClientPool.java
- sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
- sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
- sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (2)
- sdk-core/src/main/java/io/milvus/pool/ClientCache.java
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
Applied to files:
- examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
- sdk-core/src/main/java/io/milvus/pool/ClientPool.java
- sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
- sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧬 Code graph analysis (4)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (8)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
DropCollectionReq (22-120)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/HasCollectionReq.java (1)
HasCollectionReq (22-80)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
InsertResp (25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/QueryResp.java (1)
QueryResp (27-122)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
SearchResp (27-185)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (3)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
SearchReq(32-482)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and test
- GitHub Check: Summary
🔇 Additional comments (10)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
39-41: configForKey side-effect-only API looks correct. Changing `configForKey` to return `void` and only update `configForKeys` matches its actual behavior and avoids implying a transformed config is returned. No further changes needed here.
88-95: Improved exception logging in validateObject. Passing the exception as a separate parameter to `logger.error` is the correct pattern and preserves stack traces; wrapping it in `MilvusClientException` is consistent with the rest of the factory.
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (2)
42-62: Static pool init and basic collection setup look sound. The static initializer builds a `MilvusClientV2Pool` with a conservative `PoolConfig` and wires it into the helper methods. `createCollection`/`insertData` correctly borrow and return clients in `finally` blocks, and the schema/index setup matches the subsequent insert/search patterns.
Also applies to: 64-119
224-235: Main demo flow is coherent. `main` wires together collection creation, two large concurrent search runs, and the final pool close. The ordering is sensible and ensures the pool is closed after use; no additional lifecycle changes seem necessary.
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3137-3153: Per-key pool configuration and preparePool() usage align with the new API. Configuring `minIdlePerKey`, `maxIdlePerKey`, and `maxTotalPerKey` and then calling `preparePool(key)` before issuing requests is exactly how the new per-key cache is intended to be exercised. The assertion against `getActiveClientNumber(key)` verifies that `preparePool` eagerly warms the pool as designed.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (5)
19-33: Per-key cache map and lock initialization look correct. The introduction of `clientsCache` and a fair `ReentrantLock` in the `PoolConfig`/`PoolClientFactory` constructor cleanly sets up the infrastructure for per-key `ClientCache` instances and avoids contention on unrelated keys.
65-75: preparePool delegates correctly to the per-key cache. `preparePool(String key)` simply resolves the `ClientCache` and forwards to `cache.preparePool()`, which is exactly the right layering for pre-warming a specific key's pool. No additional synchronization is needed here given `getCache`'s locking.
94-116: getCache's single-creation logic for ClientCache is sound. The combination of `clientsCache.get(key)` with a lock-guarded `putIfAbsent` style ensures only one `ClientCache` is created per key, while normal fast-path lookups remain lock-free. This matches the single-writer pattern used inside `ClientCache` and avoids unnecessary contention.
137-181: close/clear/clear(key) lifecycle coordination is reasonable. Calling `clear()` before `clientPool.close()` and ensuring each `ClientCache.stopTimer()` is invoked before clearing the underlying pool is a sensible lifecycle sequence. It lets `ClientCache` manage closing any tracked clients (idle or active) while the pool is still available, then tears down pool state. The repeated comment explaining behavior with unreturned clients is helpful and consistent across `close`, `clear`, and `clear(key)`.
183-217: Metric accessors correctly document thread-safety and delegate to the pool. The idle/active count methods are thin pass-throughs to `GenericKeyedObjectPool` and are properly documented as thread-safe. No issues here.
Actionable comments posted: 0
♻️ Duplicate comments (4)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)
58-60: Align log level with the `isDebugEnabled()` guard. Here you guard with `logger.isDebugEnabled()` but log at INFO. Either switch the call to `logger.debug(...)` or change the guard to `logger.isInfoEnabled()` so level and guard are consistent.
Proposed adjustment
```diff
- if (logger.isDebugEnabled()) {
-     logger.info("PoolClientFactory key: {} creates a client", key);
- }
+ if (logger.isDebugEnabled()) {
+     logger.debug("PoolClientFactory key: {} creates a client", key);
+ }
...
- if (logger.isDebugEnabled()) {
-     logger.info("PoolClientFactory key: {} closes a client", key);
- }
+ if (logger.isDebugEnabled()) {
+     logger.debug("PoolClientFactory key: {} closes a client", key);
+ }
```
Also applies to: 80-82
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3187-3194: Add a timeout around the active-client drain loop. The loop waiting for `pool.getActiveClientNumber(key) <= 1` has no timeout. If clients fail to retire due to a regression, this test can hang indefinitely and stall the suite. Recommend bounding the wait and asserting if active clients fail to drain within a reasonable period.
Example bounded wait with assertion
```diff
- while (pool.getActiveClientNumber(key) > 1) {
-     TimeUnit.SECONDS.sleep(1);
-     System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
- }
+ int waited = 0;
+ int maxWaitSeconds = 30;
+ while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
+     TimeUnit.SECONDS.sleep(1);
+     waited++;
+     System.out.printf(
+         "waiting idle %d, active %d%n",
+         pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
+ }
+ Assertions.assertTrue(
+     pool.getActiveClientNumber(key) <= 1,
+     String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
```
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3055-3057: Null client passed to `returnClient` if `getClient` fails. If `pool.getClient(dummyDb)` at line 3039 throws an exception or returns null, the `finally` block will call `pool.returnClient(dummyDb, null)`. Add a null check before returning the client.
🔎 Proposed fix
} finally { - pool.returnClient(dummyDb, client); + if (client != null) { + pool.returnClient(dummyDb, client); + } }sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
131-136: Silent no-op when cache doesn't exist may leak clients.
If `returnClient` is called for a key whose cache has been removed (e.g., after `clear(key)` was called), the client is never returned to the underlying pool. This could lead to resource leaks, especially in scenarios where clients are obtained but the cache is cleared before they are returned.
Consider logging a warning and attempting to return the client directly to the underlying pool as a fallback.
🔎 Proposed fix
```diff
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
+        } else {
+            logger.warn("returnClient() called for non-existent cache key: {}, returning directly to pool", key);
+            try {
+                clientPool.returnObject(key, grpcClient);
+            } catch (Exception e) {
+                logger.error("Failed to return client directly to pool, exception: ", e);
+            }
         }
     }
```
🧹 Nitpick comments (2)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
50-50: Consider using standard Java naming convention for constants.
The field `ServerUri` uses PascalCase, which is inconsistent with Java naming conventions. Static final fields (constants) should use `UPPER_CASE_WITH_UNDERSCORES` (e.g., `SERVER_URI`). While the current naming is consistent with other fields in this example file, following standard conventions improves readability and sets a better example for SDK users.
🔎 Suggested refactor to follow Java conventions
```diff
-    public static String ServerUri = "http://localhost:19530";
+    public static final String SERVER_URI = "http://localhost:19530";
     public static String CollectionName = "java_sdk_example_pool_v2";
     public static String VectorFieldName = "vector";
     public static int DIM = 128;
```
Then update all usages:
```diff
-                .uri(ServerUri)
+                .uri(SERVER_URI)
```
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)
168-239: Consider bounding the wait loops to avoid potential hangs
Both the retry loop in `search()` and the drain loop in `concurrentSearch()` can, in theory, run indefinitely if the pool or retirement logic misbehaves. That's unlikely but can make the demo hang instead of failing fast.
Illustrative bounded wait changes
```diff
     private static void search() {
         MilvusClientV2 client = null;
         try {
-            client = pool.getClient(DemoKey);
-            while (client == null) {
+            client = pool.getClient(DemoKey);
+            int attempts = 0;
+            int maxAttempts = 20; // ~100s total at 5s per attempt
+            while (client == null && attempts < maxAttempts) {
                 try {
                     // getClient() might exceeds the borrowMaxWaitMillis and throw exception if the pool is full
                     // retry to call until it return a client
                     client = pool.getClient(DemoKey);
                 } catch (Exception e) {
                     System.out.printf("Failed to get client, will retry, error: %s%n", e.getMessage());
                 }
+                attempts++;
             }
+            if (client == null) {
+                throw new RuntimeException("Failed to acquire client from pool after retries");
+            }
@@
-        // after all requests are done, the active clients will be retired and eventually only one idle client left
-        while (pool.getActiveClientNumber(DemoKey) > 1) {
-            TimeUnit.SECONDS.sleep(1);
-            printPoolState();
-        }
+        // after all requests are done, the active clients will be retired
+        int waited = 0;
+        int maxWaitSeconds = 300;
+        while (pool.getActiveClientNumber(DemoKey) > 1 && waited < maxWaitSeconds) {
+            TimeUnit.SECONDS.sleep(1);
+            waited++;
+            printPoolState();
+        }
+        if (pool.getActiveClientNumber(DemoKey) > 1) {
+            System.out.printf("Active clients did not drain within %d seconds%n", maxWaitSeconds);
+        }
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
examples/src/main/java/io/milvus/v2/ClientPoolExample.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
Applied to files:
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
🧬 Code graph analysis (5)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (3)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
InsertReq(26-141)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/LoadCollectionReq.java (1)
LoadCollectionReq(25-227)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
ConnectConfig(31-455)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (7)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
ConnectConfig(31-455)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
IndexParam(24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/AddFieldReq.java (1)
AddFieldReq(30-406)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
DropCollectionReq(22-120)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
InsertReq(26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
QueryReq(26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
SearchReq(32-482)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Deploy milvus server,build and test
- GitHub Check: Build and test
- GitHub Check: Summary
🔇 Additional comments (10)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (2)
343-343: Good improvement: Using `dbNames.size()` instead of magic number.
Replacing the hardcoded `3` with `dbNames.size()` makes the code more maintainable and eliminates a potential source of bugs if the number of databases changes.
296-297: The pool configuration already includes a GitHub issue reference (issue #1577) explaining the rationale for these settings. No changes needed.
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
39-41: Per-key config storage change looks good
Making `configForKey` void and simply updating the `ConcurrentMap` keeps the API simple and thread-safe for per-key configuration. No issues from a concurrency or correctness standpoint.
93-94: Improved exception logging in `validateObject`
Passing the exception as a separate parameter to `logger.error` preserves the stack trace and avoids string concatenation. This is the preferred SLF4J style.
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (2)
44-56: Pool initialization and configuration are sensible for the demo
Static initialization of `MilvusClientV2Pool` with a small per-key pool and a 5-second `maxBlockWaitDuration` matches the documented behavior and keeps the example easy to reason about.
64-166: Collection creation and insert flow are clear and pool-safe
The demo cleanly acquires a client from the pool, handles a potential null, and uses STRONG consistency for the row count check after inserts. The overall pattern (borrow → use → `finally` return) is sound for example code.
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3141-3185: Per-key pool configuration and concurrency test are well-structured
Using explicit `minIdlePerKey`, `maxIdlePerKey`, and `maxTotalPerKey`, plus `preparePool(key)` and the concurrent `getVersion()` workload, is a good way to validate the new per-key pool behavior and metrics (`getActiveClientNumber`, `getTotalActiveClientNumber`, `fetchClientPerSecond`).
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
1-334: LGTM! Well-structured per-key client cache implementation.
The implementation addresses previous review feedback and follows a sound concurrency model:
- Thresholds are now final constants
- Single-writer pattern (timer thread only) ensures thread-safe list modifications
- `CopyOnWriteArrayList` enables lock-free iteration for readers
- Ref-counting via `ClientWrapper` protects clients during retirement
- Proper lifecycle management with `preparePool()` and `stopTimer()`
The QPS-based dynamic scaling logic is well-designed with incremental adjustments to prevent over-reaction to transient load changes.
Based on learnings: The single-writer pattern combined with CopyOnWriteArrayList and ref-counting ensures thread-safe client selection without additional locking.
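As a rough illustration of what such incremental QPS-driven scaling can look like, here is a self-contained sketch. The thresholds mirror the constants quoted elsewhere in this review (100/50); the class and method names are hypothetical, not the SDK's actual code:

```java
// Hypothetical QPS-driven scaling decision, intended to run periodically on a
// timer thread. It returns +1 to grow, -1 to shrink, or 0 to hold, so the pool
// is adjusted one step at a time instead of jumping straight to a target size.
class QpsScaler {
    private static final float THRESHOLD_INCREASE = 100.0F; // grow above this QPS
    private static final float THRESHOLD_DECREASE = 50.0F;  // shrink below this QPS

    private long fetchCount = 0L;
    private long lastCheckMillis = System.currentTimeMillis();

    synchronized void recordFetch() {
        fetchCount++; // called on every successful client fetch
    }

    synchronized int checkQps() {
        long now = System.currentTimeMillis();
        float seconds = (now - lastCheckMillis) / 1000.0F;
        float qps = seconds > 0.0F ? fetchCount / seconds : 0.0F;
        fetchCount = 0L;
        lastCheckMillis = now;
        if (qps >= THRESHOLD_INCREASE) return 1;
        if (qps <= THRESHOLD_DECREASE) return -1;
        return 0;
    }
}
```

The incremental return value is what prevents over-reaction: a transient spike grows the pool by at most one client per check interval.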
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)
99-121: Well-implemented double-checked locking for cache creation.
The `getCache` method correctly uses double-checked locking to prevent multiple threads from creating duplicate `ClientCache` instances for the same key. The lock is only held during the initial cache creation, ensuring minimal performance impact since this is a one-time operation per key.
142-186: Proper lifecycle management for cache and pool cleanup.
The `close()` and `clear()` methods correctly handle the lifecycle:
- Timers are stopped before clearing caches to prevent race conditions
- Both per-key and global clear operations are properly implemented
- The comment explaining behavior for unreturned clients is helpful
The order of operations (stop timers → clear caches → clear pool) ensures resources are released cleanly.
Actionable comments posted: 0
♻️ Duplicate comments (3)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3187-3194: Add timeout to prevent test hang.
The wait loop lacks a timeout and could hang indefinitely if active clients fail to drain due to a bug or resource leak. This issue was flagged in previous reviews and remains unresolved.
🔎 Suggested fix with bounded wait
```diff
+        int maxWaitSeconds = 30;
+        int waited = 0;
-        while (pool.getActiveClientNumber(key) > 1) {
+        while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
             TimeUnit.SECONDS.sleep(1);
+            waited++;
             System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
         }
+        Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1,
+                String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
```
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3028-3059: Add null check before returning client to pool.
If `pool.getClient(dummyDb)` returns null (e.g., when the pool is exhausted), the `finally` block will pass null to `pool.returnClient(dummyDb, client)`. While this won't cause an exception in `ClientCache.returnClient()` (it will simply not find a matching wrapper), it's cleaner to guard the call.
🔎 Proposed fix
```diff
         } finally {
-            pool.returnClient(dummyDb, client);
+            if (client != null) {
+                pool.returnClient(dummyDb, client);
+            }
         }
```
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
131-136: Silent no-op in returnClient may leak resources.
If `returnClient(key, grpcClient)` is called for a key that has no cache (e.g., after `clear(key)` was called or before any client was borrowed for that key), the method silently does nothing. The client is never returned to the underlying pool, potentially leaking the resource.
Consider either:
- Logging a warning and attempting to return directly to the underlying pool via `clientPool.returnObject(key, grpcClient)`.
- Throwing an `IllegalStateException` to fail fast and alert the caller to the programming error.
🔎 Proposed fix (option 1 - return to pool)
```diff
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
+        } else {
+            logger.warn("returnClient() called for non-existent cache key: {}, returning directly to pool", key);
+            try {
+                clientPool.returnObject(key, grpcClient);
+            } catch (Exception e) {
+                logger.error("Failed to return client directly to pool for key: {}, exception: ", key, e);
+            }
         }
     }
```
🧹 Nitpick comments (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)
37-48: Verify necessity of MAX_PRIORITY for scheduler thread.
Setting `Thread.MAX_PRIORITY` (line 41) for the QPS-checking timer thread may cause it to preempt other application threads excessively. This priority is typically reserved for critical system tasks. Consider using `Thread.NORM_PRIORITY` or a slightly elevated priority (e.g., `NORM_PRIORITY + 1`) unless high-priority scheduling is definitively required for the QPS monitoring logic.
🔎 Optional adjustment
```diff
     public Thread newThread(@NotNull Runnable r) {
         Thread t = new Thread(r);
-        t.setPriority(Thread.MAX_PRIORITY); // set the highest priority for the timer
+        t.setPriority(Thread.NORM_PRIORITY + 1); // slightly elevated priority for timer
         return t;
     }
```
255-271: Document or resolve TODO about MaxTotal check.
The TODO at line 261 notes that `borrowObject()` may throw "Timeout waiting for idle object" when the global `MaxTotal` limit is reached, even if the per-key `MaxTotalPerKey` limit hasn't been hit. Consider:
- Documenting this behavior in the method's Javadoc or class-level comments so callers understand the failure mode.
- If feasible, adding a pre-check for the global limit (e.g., `clientPool.getNumActive() >= clientPool.getMaxTotal()`) before calling `borrowObject()`, though this may introduce a TOCTOU race; see the sketch after this list.
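For illustration only, such a pre-check could look like the sketch below; the helper is hypothetical, and only the commons-pool2 calls are real. Because of the TOCTOU race, it merely produces a clearer error up front and cannot remove the timeout path:

```java
import org.apache.commons.pool2.impl.GenericKeyedObjectPool;

// Hypothetical helper: fail fast with a descriptive message when the global
// MaxTotal is already exhausted, instead of waiting for a borrow timeout.
final class BorrowHelper {
    private BorrowHelper() {}

    static <T> T tryBorrow(GenericKeyedObjectPool<String, T> pool, String key) throws Exception {
        // getMaxTotal() < 0 means "no global limit" in commons-pool2.
        if (pool.getMaxTotal() >= 0 && pool.getNumActive() >= pool.getMaxTotal()) {
            throw new IllegalStateException("Global maxTotal (" + pool.getMaxTotal()
                    + ") reached; refusing to borrow for key " + key);
        }
        return pool.borrowObject(key); // may still block or time out under a racing borrow
    }
}
```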
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
examples/src/main/java/io/milvus/v2/ClientPoolExample.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
- sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
Applied to files:
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
🧬 Code graph analysis (3)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and test
- GitHub Check: Deploy milvus server,build and test
- GitHub Check: Summary
🔇 Additional comments (4)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
343-343: LGTM: Improved calculation accuracy.
The change from `threadCount * repeatRequests * 3` to `threadCount * repeatRequests * dbNames.size()` correctly derives the count from the actual number of databases rather than a hardcoded value, improving maintainability.
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (2)
128-169: LGTM: Null-safety improved.
The `insertData()` method now includes an explicit null check at lines 133-136, which addresses the potential null return from `pool.getClient()`. This ensures the method handles pool exhaustion gracefully.
171-198: LGTM: Retry loop handles pool exhaustion.
The `search()` method implements a retry loop (lines 175-183) that continues attempting to acquire a client until successful. This is a robust approach to handling pool exhaustion and aligns with the pattern used in `ClientPoolExample`.
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
133-135: Document undocumented reduction in per-key pool defaults.
The default per-key pool sizing has been reduced from the values in version 2.6.5 (maxIdlePerKey: 10→2, maxTotalPerKey: 50→5, minIdlePerKey: 1). This is a breaking change for code expecting the 2.6.5 pool capacity. This reduction is not documented in the CHANGELOG and should be added to release notes. Users relying on implicit defaults or those migrating from 2.6.5 will experience reduced pool capacity and potential blocking without explicit reconfiguration.
Actionable comments posted: 1
♻️ Duplicate comments (4)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3141-3194: Bound the active-client drain loop in testClientPool() to avoid hangs.
The loop that waits for `pool.getActiveClientNumber(key) > 1` has no timeout, so a bug in pool retirement could hang the test indefinitely.
Example bounded wait
```diff
-        while (pool.getActiveClientNumber(key) > 1) {
-            TimeUnit.SECONDS.sleep(1);
-            System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
-        }
+        int maxWaitSeconds = 30;
+        int waited = 0;
+        while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
+            TimeUnit.SECONDS.sleep(1);
+            waited++;
+            System.out.printf("waiting idle %d, active %d%n",
+                    pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
+        }
+        Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1,
+                String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
```
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)
39-41: Fix debug guard vs log level mismatch in create()/destroyObject().
Both `create()` and `destroyObject()` guard on `logger.isDebugEnabled()` but log at INFO, so the guard is ineffective. Switching to DEBUG will make the check meaningful.
Suggested change
```diff
     try {
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} creates a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} creates a client", key);
+        }
@@
     @Override
     public void destroyObject(String key, PooledObject<T> p) throws Exception {
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} closes a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} closes a client", key);
+        }
```
Also applies to: 56-61, 80-82, 93-94
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3055-3057: Null client passed to `returnClient` if `getClient` fails.
If `pool.getClient(dummyDb)` throws or returns null, the finally block will call `pool.returnClient(dummyDb, null)`. Add a null check before returning.
🔎 Proposed fix
```diff
         } finally {
-            pool.returnClient(dummyDb, client);
+            if (client != null) {
+                pool.returnClient(dummyDb, client);
+            }
         }
```
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
131-136: Silent no-op when cache doesn't exist may leak clients.
If `returnClient` is called for a key whose cache has been removed (e.g., after `clear(key)` was called or before any `getClient` call), the client is never returned to the underlying pool. This could lead to resource leaks.
🔎 Proposed fix
```diff
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
+        } else {
+            logger.warn("returnClient() called for non-existent cache key: {}, returning directly to pool", key);
+            try {
+                clientPool.returnObject(key, grpcClient);
+            } catch (Exception e) {
+                logger.error("Failed to return client directly to pool, exception: ", e);
+            }
         }
     }
```
🧹 Nitpick comments (2)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
49-55: Example config now matches per-key sizing and multi-DB workload.
Using `ServerUri` in all `ConnectConfig` builders, tightening `.maxIdlePerKey(1)` / `.maxTotalPerKey(5)`, and multiplying totals by `dbNames.size()` in the final summary correctly reflect the per-database loops.
As a small style tweak for a constant, you could make `ServerUri` `private static final` and rename it to `SERVER_URI`.
Also applies to: 295-301, 341-344
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)
172-199: Simplify search() client acquisition loop.
The initial `pool.getClient(DemoKey)` before the `while (client == null)` is redundant, since the loop immediately re-tries until a client is obtained. You can start with `client = null` and let the loop handle all attempts.
Possible simplification
```diff
-        MilvusClientV2 client = null;
-        try {
-            client = pool.getClient(DemoKey);
-            while (client == null) {
+        MilvusClientV2 client = null;
+        try {
+            while (client == null) {
                 try {
                     // getClient() might return null if it exceeds the borrowMaxWaitMillis when the pool is full.
                     // retry to call until it return a client.
                     client = pool.getClient(DemoKey);
                 } catch (Exception e) {
                     System.out.printf("Failed to get client, will retry, error: %s%n", e.getMessage());
                 }
             }
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
examples/src/main/java/io/milvus/v2/ClientPoolExample.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
Applied to files:
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
🧬 Code graph analysis (2)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and test
- GitHub Check: Summary
🔇 Additional comments (7)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
194-236: getClient() selection logic matches the intended concurrency model.
Using a `CopyOnWriteArrayList` for `activeClientList` with only the scheduler mutating it, combined with per-wrapper ref counts and a lock only for the first client creation, gives a safe, low-contention min-load selection under concurrent access. No extra locking seems needed here. Based on learnings, this matches the single-writer/ref-counted design.
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
133-135: Per-key default limits are coherent with pool behavior.
`minIdlePerKey = 1`, `maxIdlePerKey = 2`, and `maxTotalPerKey = 5` preserve min ≤ maxIdle ≤ maxTotal and match the small per-key pool sizing used in the examples/tests. No changes needed.
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)
44-56: Static pool setup and warmup look good.
The shared `MilvusClientV2Pool` uses conservative per-key limits and `preparePool(DemoKey)` to pre-create the first client, which is a sensible baseline for the demo's concurrent workload.
Also applies to: 61-64
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)
99-121: Lazy initialization with proper double-checked locking.
The per-key cache initialization uses correct double-checked locking to prevent multiple `ClientCache` instances from being created concurrently. The lock ensures only one cache is created per key.
224-230: Metrics query without side effects.
The method correctly uses `clientsCache.get(key)` to retrieve the cache without creating a new one, avoiding unintended side effects from metrics queries.
69-74: Per-key pool preparation reduces first-call latency.
The `preparePool` method correctly initializes the per-key cache and pre-warms the pool with `minIdlePerKey` clients, reducing latency for the first `getClient()` call.
156-186: Clear operations properly stop per-key timers.
The `clear()` and `clear(String key)` methods correctly stop the `ClientCache` timer threads before clearing. The comments appropriately document the edge case where unreturned clients will receive `UNAVAILABLE` errors after pool closure.
4e3836e to eb41d5e
Actionable comments posted: 3
♻️ Duplicate comments (4)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)
58-60: Fix log level mismatch with `isDebugEnabled()` guard
Both in `create()` and `destroyObject()`, you guard with `logger.isDebugEnabled()` but log at INFO, so the guard is ineffective and the message always appears at INFO.
Proposed change
```diff
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} creates a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} creates a client", key);
+        }
 ...
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} closes a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} closes a client", key);
+        }
```
Also applies to: 80-82
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
3028-3057: Guard against returning a null client in `Worker`
If `pool.getClient(dummyDb)` ever returns null or throws before `client` is assigned, the `finally` block still calls `pool.returnClient(dummyDb, client)`, passing null.
Proposed fix
```diff
         } catch (Exception e) {
             System.out.printf("request failed: %s%n", e);
         } finally {
-            pool.returnClient(dummyDb, client);
+            if (client != null) {
+                pool.returnClient(dummyDb, client);
+            }
         }
```
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
17-19: Align QPS threshold comment with constants
The comment still says "Treat qps >= 75 as high, qps <= 50 as low", but `THRESHOLD_INCREASE` is 100 and `THRESHOLD_DECREASE` is 50, and `checkQPS()` compares directly to those constants.
Suggested fix
Either update the comment:
```diff
-    // Treat qps >= 75 as high, qps <= 50 as low
+    // Treat qps >= 100 as high, qps <= 50 as low
```
or, if 75 was the intended cutoff, change `THRESHOLD_INCREASE` accordingly so code and docs match.
Also applies to: 72-82
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
131-138: Handle missing cache keys in `returnClient` to avoid leaking clients
When `returnClient(key, grpcClient)` is invoked and `clientsCache.get(key)` returns null (e.g., after `clear(key)` or `clear()`), the method only logs a warning and never returns the client to the underlying `GenericKeyedObjectPool`. That client then stays outside the pool's lifecycle and is never closed or reused.
More robust `returnClient` behavior
```diff
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
         } else {
-            logger.warn("No such key: {}", key);
+            logger.warn("returnClient() called for non-existent cache key: {}, returning directly to pool", key);
+            if (clientPool != null && !clientPool.isClosed() && grpcClient != null) {
+                try {
+                    clientPool.returnObject(key, grpcClient);
+                } catch (Exception e) {
+                    logger.error("Failed to return client directly to pool for key {}: ", key, e);
+                }
+            }
         }
     }
```
This preserves safety even if callers return a client after a key has been cleared.
Also applies to: 155-169, 176-188
🧹 Nitpick comments (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)
27-31: Consider making `fetchClientPerSecond` volatile for accurate cross-thread reads
`fetchClientPerSecond` is written by the scheduler thread in `checkQPS()` and read by other threads via `fetchClientPerSecond()`. Without `volatile` (or other synchronization), callers may observe stale values.
Low-impact improvement
```diff
-    private float fetchClientPerSecond = 0.0F;
+    private volatile float fetchClientPerSecond = 0.0F;
```
This keeps the metric best-effort but avoids visibility issues with the Java memory model.
Also applies to: 283-285
255-270: Downgrade log level in `fetchFromPool` for expected exhaustion cases
`fetchFromPool()` is an internal helper that may legitimately return null when the pool is exhausted or a borrow times out. Logging every such case at ERROR can be noisy under load.
Suggested adjustment
```diff
-        } catch (Exception e) {
-            // the pool might return timeout exception if it could not get a client in PoolConfig.maxBlockWaitDuration
-            // fetchFromPool() is internal use, return null here, let the caller handle.
-            logger.error("Failed to get client, exception: ", e);
-            return null;
-        }
+        } catch (Exception e) {
+            // The pool might time out if it cannot get a client within PoolConfig.maxBlockWaitDuration.
+            // fetchFromPool() is internal; return null and let the caller decide how to react.
+            logger.warn("Failed to get client from pool for key {}, returning null: ", key, e);
+            return null;
+        }
```
Or even `debug` if timeouts are considered normal in some scenarios.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
examples/src/main/java/io/milvus/v2/ClientPoolExample.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (3)
- examples/src/main/java/io/milvus/v2/ClientPoolExample.java
- sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
- sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
Applied to files:
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
🧬 Code graph analysis (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (13)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
ConnectConfig(31-455)
sdk-core/src/main/java/io/milvus/v2/client/MilvusClientV2.java (1)
MilvusClientV2(74-1213)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
IndexParam(24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/AddFieldReq.java (1)
AddFieldReq(30-406)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (2)
CreateCollectionReq(36-1112)
CollectionSchema(404-529)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
DropCollectionReq(22-120)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/HasCollectionReq.java (1)
HasCollectionReq(22-80)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
InsertReq(26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
QueryReq(26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
SearchReq(32-482)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
InsertResp(25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/QueryResp.java (1)
QueryResp(27-122)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
SearchResp(27-185)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and test
- GitHub Check: Summary
🔇 Additional comments (6)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
39-41: `configForKey` implementation looks fine
Using a `ConcurrentMap` for per-key configs with a simple `put` is appropriate and thread-safe for this use case.
93-94: Good switch to throwable-aware logging in `validateObject`
Passing the exception as a separate parameter (`logger.error("Failed to validate client, exception: ", e);`) aligns with SLF4J best practices and preserves stack traces cleanly.
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)
172-199: Example logic and pool-usage pattern look solid
The `search`, `concurrentSearch`, and `main` methods exercise the new per-key pool API (`preparePool`, QPS metrics, idle/active counts) in a realistic way: clients are always returned in `finally`, metrics are sampled while the executor runs, and the pool is properly closed at the end.
Also applies to: 208-256, 258-272
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
2975-3079: Per-key pool behavior and assertions look appropriate
The test's use of `minIdlePerKey`, `maxIdlePerKey`, `maxTotalPerKey`, `preparePool`, and the per-key/total idle+active counters aligns with the new `ClientCache`/`ClientPool` behavior and validates the intended scaling and retirement semantics for the dummy DB.
Also applies to: 3090-3095
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
194-236: Client selection and ref-counting design looks correct
The `getClient()` logic (min-load selection from `activeClientList`, with per-wrapper ref counting) combined with the single-writer pattern for list mutations and `CopyOnWriteArrayList` iteration gives a clean, lock-free read path for callers, with QPS-based scaling handled by the scheduler. The `ClientWrapper` equality semantics also correctly support matching a raw client instance in `returnClient(...)`.
Based on learnings, this concurrency design for active/retire lists and ref counting is sound; a distilled sketch of the pattern follows this comment.
Also applies to: 287-333
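For readers unfamiliar with the pattern, the sketch below distills it under stated assumptions: all names are illustrative rather than the SDK's actual classes, only a single timer thread calls add(), and any thread may borrow and release:

```java
import java.util.Comparator;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Distilled single-writer + ref-counting pattern: the list is mutated only by
// the timer thread, so readers may iterate the CopyOnWriteArrayList freely;
// the ref count pins a client so retirement can wait until it is released.
class ActiveSet<T> {
    static final class Wrapper<C> {
        final C client;
        final AtomicInteger refCount = new AtomicInteger(0);
        Wrapper(C client) { this.client = client; }
    }

    private final CopyOnWriteArrayList<Wrapper<T>> active = new CopyOnWriteArrayList<>();

    // Timer thread only (single writer).
    void add(T client) { active.add(new Wrapper<>(client)); }

    // Any thread: min-load selection, then pin via the ref count.
    T borrow() {
        Wrapper<T> least = active.stream()
                .min(Comparator.comparingInt(w -> w.refCount.get()))
                .orElseThrow(IllegalStateException::new);
        least.refCount.incrementAndGet();
        return least.client;
    }

    // Any thread: unpin by identity, mirroring returnClient(rawClient).
    void release(T client) {
        for (Wrapper<T> w : active) {
            if (w.client == client) {
                w.refCount.decrementAndGet();
                return;
            }
        }
    }
}
```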
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
65-75: Per-key cache wiring and metrics accessors look correctThe integration with
ClientCache(preparePool,getClient,clear, per-key and total idle/active counts, andfetchClientPerSecond) is coherent and provides a clean per-key abstraction overGenericKeyedObjectPool, with timers properly stopped before clearing/closing the pool.Also applies to: 158-169, 176-188, 196-232
Actionable comments posted: 0
♻️ Duplicate comments (6)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)
55-71: Align debug guard with debug-level logging in `create` and `destroyObject`
`logger.isDebugEnabled()` is used but the guarded calls are `logger.info(...)`, so the guard is ineffective and the messages always log at INFO.
Proposed fix
```diff
     @Override
     public T create(String key) throws Exception {
         try {
-            if (logger.isDebugEnabled()) {
-                logger.info("PoolClientFactory key: {} creates a client", key);
-            }
+            if (logger.isDebugEnabled()) {
+                logger.debug("PoolClientFactory key: {} creates a client", key);
+            }
@@
     @Override
     public void destroyObject(String key, PooledObject<T> p) throws Exception {
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} closes a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} closes a client", key);
+        }
```
Also applies to: 78-85
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3187-3190: Bound the drain-wait loop to avoid hanging the test suite
The loop waiting for `pool.getActiveClientNumber(key) > 1` has no timeout. If active clients never drop (due to a regression), this test will hang indefinitely.
Proposed bounded wait with assertion
```diff
-        while (pool.getActiveClientNumber(key) > 1) {
-            TimeUnit.SECONDS.sleep(1);
-            System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
-        }
+        int maxWaitSeconds = 30;
+        int waited = 0;
+        while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
+            TimeUnit.SECONDS.sleep(1);
+            waited++;
+            System.out.printf("waiting idle %d, active %d%n",
+                    pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
+        }
+        Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1,
+                String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
```
3036-3057: Add null check before returning client in finally block.If
pool.getClient(dummyDb)returns null (e.g., pool exhausted or cache creation failed), the finally block will callpool.returnClient(dummyDb, null). While the currentreturnClientimplementation may handle this gracefully (line 136 in ClientPool.java logs a warning for missing cache), passing null is semantically incorrect and could mask pool issues.🔎 Proposed fix
} finally { - pool.returnClient(dummyDb, client); + if (client != null) { + pool.returnClient(dummyDb, client); + } }
3081-3088: Add timeout to prevent infinite hang in drain loop.The loop waiting for
pool.getActiveClientNumber(dummyDb) > 1has no upper bound. If the pool's retirement logic has a bug and clients don't converge to the expected state, this test will hang indefinitely and block CI/CD pipelines.🔎 Proposed fix
+ long waitStart = System.currentTimeMillis(); while (pool.getActiveClientNumber(dummyDb) > 1) { TimeUnit.SECONDS.sleep(1); System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(dummyDb), pool.getActiveClientNumber(dummyDb)); + if (System.currentTimeMillis() - waitStart > 60_000L) { + Assertions.fail("Pool did not converge to a single active client within 60s. Current state: idle=" + + pool.getIdleClientNumber(dummyDb) + ", active=" + pool.getActiveClientNumber(dummyDb)); + } }sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)
21-21: Initialize `cacheMapLock` at field declaration to prevent NPE.
`cacheMapLock` is uninitialized in the protected no-arg constructor (lines 23-25). Any subclass or code path using this constructor will encounter a NullPointerException when `cacheMapLock.lock()` is invoked at line 108 in `getCache()`. Initialize the lock at declaration:
🔎 Proposed fix
```diff
-    private Lock cacheMapLock;
+    private final Lock cacheMapLock = new ReentrantLock(true);
```
Then remove the redundant initialization from line 30 in the 2-arg constructor.
131-136: Return client to underlying pool when cache is missing.
If `returnClient` is called for a key whose cache was cleared (e.g., via `clear(key)`), the client is neither returned to the pool nor closed, causing a resource leak. While this scenario is unlikely in correct usage, defensive programming suggests falling back to the underlying pool:
🔎 Proposed fix
```diff
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
         } else {
-            logger.warn("No such key: {}", key);
+            logger.warn("No cache for key: {}, returning client directly to pool", key);
+            try {
+                clientPool.returnObject(key, grpcClient);
+            } catch (Exception e) {
+                logger.error("Failed to return client to pool for key: {}", key, e);
+            }
         }
     }
```
🧹 Nitpick comments (5)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
27-31: Consider making `fetchClientPerSecond` volatile for clearer cross-thread visibility
`fetchClientPerSecond` is written from the scheduler thread and read from others without synchronization. Although float writes are atomic and eventual consistency is acceptable for metrics, marking it volatile would guarantee up-to-date reads with negligible cost.
Proposed tweak
```diff
-    private float fetchClientPerSecond = 0.0F;
+    private volatile float fetchClientPerSecond = 0.0F;
```
Also applies to: 95-97, 283-285
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
49-55: Treat `ServerUri` as an actual constant and follow Java naming conventions
`public static String ServerUri` is effectively a constant but mutable and non-idiomatic. Making it `private static final` and renaming to `SERVER_URI` improves clarity and prevents accidental modification.
Proposed change
```diff
-    public static String ServerUri = "http://localhost:19530";
+    private static final String SERVER_URI = "http://localhost:19530";
@@
-                .uri(ServerUri)
+                .uri(SERVER_URI)
@@
-                .uri(ServerUri)
+                .uri(SERVER_URI)
```
Also applies to: 97-101, 290-293
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (2)
172-185: Include the initial `getClient` call in the retry loop in `search()`
Right now the first `pool.getClient(DemoKey)` is outside the retry loop, so an exception there won't be retried or logged like subsequent attempts. Folding that initial call into the `while (client == null)` loop keeps behavior consistent for all attempts.
Proposed refactor
```diff
-        MilvusClientV2 client = null;
-        try {
-            client = pool.getClient(DemoKey);
-            while (client == null) {
+        MilvusClientV2 client = null;
+        try {
+            while (client == null) {
                 try {
                     // getClient() might return null if it exceeds the borrowMaxWaitMillis when the pool is full.
                     // retry to call until it returns a client.
                     client = pool.getClient(DemoKey);
                 } catch (Exception e) {
                     System.out.printf("Failed to get client, will retry, error: %s%n", e.getMessage());
                 }
             }
```
246-253: Optional: add a timeout when waiting for active clients to retire
The loop waiting for `pool.getActiveClientNumber(DemoKey) > 1` has no upper bound, so a bug in retirement logic could make the demo hang. A simple max-wait counter (or time-based limit) with a message would keep behavior predictable.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)
69-74: Consider documenting exception behavior in `preparePool`.
While the null check at line 71 guards against a missing cache, `getCache(key)` (lines 99-121) should never return null under normal circumstances: it creates a cache if absent. If `ClientCache` construction throws an exception, it will propagate to the caller. Consider documenting this behavior (see the sketch below) or adding exception handling if silent failure is preferred.
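A hypothetical Javadoc along these lines would make the contract explicit; the interface exists only so the sketch stands alone, and the wording is illustrative rather than the SDK's actual documentation:

```java
// Documentation sketch only; not the SDK's actual Javadoc.
interface PoolLifecycle {
    /**
     * Pre-creates the per-key cache and warms the pool with minIdlePerKey
     * clients, reducing latency for the first getClient() call on this key.
     *
     * @param key the pool key to warm up
     * @throws RuntimeException if the per-key cache cannot be constructed;
     *         construction failures propagate rather than failing silently
     */
    void preparePool(String key);
}
```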
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
examples/src/main/java/io/milvus/v2/ClientPoolExample.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
Applied to files:
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
🧬 Code graph analysis (5)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (1)
CreateCollectionReq(36-1112)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (9)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig(5-210)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
IndexParam(24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
DropCollectionReq(22-120)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/HasCollectionReq.java (1)
HasCollectionReq(22-80)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
InsertReq(26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
QueryReq(26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
InsertResp(25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/QueryResp.java (1)
QueryResp(27-122)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
SearchResp(27-185)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException(22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Summary
🔇 Additional comments (11)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
194-236: Min-load selection and ref-counting look correct with the single-writer + COW design
The `getClient()` implementation combined with `ClientWrapper` ref-counting is consistent with the single-writer pattern on `activeClientList` and `retireClientList`; iteration over `CopyOnWriteArrayList` without extra locking is safe here. The min-load selection logic is straightforward and appropriate for the expected small active-list size.
Based on learnings, this matches the intended concurrency model.
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
295-301: Updated pool sizing and final accounting look consistent
Using a tighter `PoolConfig` (`maxIdlePerKey(1)`, `maxTotalPerKey(5)`) and computing totals with `dbNames.size()` correctly aligns resource usage and reported operation counts with the per-database workload.
Also applies to: 341-344
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
132-143: Conservative per-key defaults are reasonable and consistent with usage
The new `Builder` defaults (`minIdlePerKey = 1`, `maxIdlePerKey = 2`, `maxTotalPerKey = 5`) match the examples/tests and should reduce memory/connection footprint out of the box while still allowing tuning via setters.
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)
39-53: configForKey as a void setter matches per-key config usage
Changing `configForKey` to a void method that just updates `configForKeys` makes it a straightforward per-key configuration setter and simplifies the API; this aligns with how it's used from the pools.
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3137-3153: Per-key pool configuration and assertions in `testClientPool` look correct
The test's use of `minIdlePerKey`, `maxIdlePerKey`, and `maxTotalPerKey` together with per-key/total active-idle assertions validates the new pool behavior nicely and exercises the adaptive scaling logic under load.
Also applies to: 3183-3185, 3191-3194
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)
3060-3072: LGTM: Proper ExecutorService lifecycle management.
The executor is correctly shut down and awaits termination with a reasonable timeout. The test fails explicitly if tasks don't complete within 100 seconds, preventing CI hangs.
3085-3092: Pool state assertions validate per-key behavior correctly.
The assertions confirm that the pool has converged to the expected state with `maxIdlePerKey` idle clients and 1 active client remaining. This validates the per-key pool lifecycle and retirement logic.
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)
27-46: Constructor properly initializes per-key pool configuration.
The constructor correctly sets up the `GenericKeyedObjectPoolConfig<T>` with all necessary parameters from `PoolConfig`, including per-key limits, eviction settings, and test-on-borrow/return flags.
99-121: Double-checked locking pattern correctly prevents redundant cache creation.
The implementation ensures only one `ClientCache` instance is created per key, even under concurrent access. The lock is held only during initialization, minimizing contention. The pattern is sound once the `cacheMapLock` field initialization issue (line 21) is resolved.
Based on learnings: The `ClientCache` implementation uses CopyOnWriteArrayList with a single-writer pattern (timer thread only) for modifying client lists, ensuring thread-safe client selection in `getClient()` without additional locking.
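Reduced to its essentials, the pattern looks like the sketch below; the names are illustrative rather than the SDK's exact code, and note the lock is initialized at declaration, which also sidesteps the constructor NPE flagged earlier:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Double-checked creation over a ConcurrentMap: the fast path is a lock-free
// get(); the lock is taken only for the one-time creation of each key's value.
class LazyRegistry<V> {
    private final ConcurrentMap<String, V> values = new ConcurrentHashMap<>();
    private final Lock lock = new ReentrantLock(true);

    V getOrCreate(String key, Function<String, V> factory) {
        V value = values.get(key);       // first check, no lock
        if (value == null) {
            lock.lock();
            try {
                value = values.get(key); // second check under the lock
                if (value == null) {
                    value = factory.apply(key);
                    values.put(key, value);
                }
            } finally {
                lock.unlock();
            }
        }
        return value;
    }
}
```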
159-169: Cache cleanup properly stops timers before clearing pool.
The `clear()` method correctly stops all per-key timers before clearing the cache map and underlying pool. This prevents timer threads from attempting to access cleared caches. The comment about unreturned clients receiving UNAVAILABLE errors provides useful context for understanding the cleanup semantics.
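Condensed, that ordering is three steps; the interfaces below are placeholders so the sketch stands alone and are not the SDK's actual types:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Shutdown-ordering sketch: stop per-key timers first, then drop per-key
// state, then tear down the pool, so no timer touches cleared resources.
class ShutdownOrder {
    interface CacheLike { void stopTimer(); }
    interface PoolLike { void close(); }

    private final Map<String, CacheLike> caches = new ConcurrentHashMap<>();
    private final PoolLike pool;

    ShutdownOrder(PoolLike pool) { this.pool = pool; }

    void close() {
        caches.values().forEach(CacheLike::stopTimer); // 1. halt scheduled work
        caches.clear();                                // 2. drop per-key state
        pool.close();                                  // 3. destroy idle clients
    }
}
```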
226-232: Metrics method correctly avoids side effects.
The `fetchClientPerSecond` method properly uses `clientsCache.get(key)` instead of `getCache(key)`, ensuring no cache is created as a side effect of querying metrics. Returning 0.0F for a missing cache is semantically correct for a metrics query.
Signed-off-by: yhmo <yihua.mo@zilliz.com>
Actionable comments posted: 0
♻️ Duplicate comments (5)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
58-60: Inconsistent log level: checking debug but logging info.
This issue was already flagged in a previous review.
80-82: Same log level inconsistency as in `create()`.
This issue was already flagged in a previous review.
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
3187-3190: Wait loop without timeout was previously flagged.
This concern was raised in a previous review suggesting a timeout to prevent indefinite hangs.
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)
3036-3057: Null client can be passed to `returnClient` if `getClient` fails.
If `pool.getClient(dummyDb)` throws an exception or returns null, the `finally` block at line 3056 will call `pool.returnClient(dummyDb, null)`. Add a null check before returning the client.
🔎 Proposed fix
```diff
         } finally {
-            pool.returnClient(dummyDb, client);
+            if (client != null) {
+                pool.returnClient(dummyDb, client);
+            }
         }
```
3081-3088: Add a timeout to the post-workload drain loop.

The loop waiting for pool.getActiveClientNumber(dummyDb) > 1 has no timeout. If the active count doesn't converge to 1 (e.g., due to a bug in retirement logic), this test can hang indefinitely and block CI/CD pipelines.

🔎 Proposed fix

+ long waitStart = System.currentTimeMillis();
  while (pool.getActiveClientNumber(dummyDb) > 1) {
      TimeUnit.SECONDS.sleep(1);
      System.out.printf("waiting idle %d, active %d%n",
              pool.getIdleClientNumber(dummyDb), pool.getActiveClientNumber(dummyDb));
+     if (System.currentTimeMillis() - waitStart > 60_000L) {
+         Assertions.fail("Pool did not converge to a single active client within 60s");
+     }
  }
🧹 Nitpick comments (1)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)
246-252: Add timeout to prevent indefinite waiting.

The while loop at line 249 polls until active clients drain to 1, but has no timeout. If clients fail to become idle due to a bug, the demo will hang indefinitely. Consider adding a bounded wait for robustness.
🔎 Proposed fix with timeout
  // just demo the pool can automatically destroy idle clients, you can directly close the pool without waiting
  // it in practice.
+ int maxWaitSeconds = 60;
+ int waited = 0;
- while (pool.getActiveClientNumber(DemoKey) > 1) {
+ while (pool.getActiveClientNumber(DemoKey) > 1 && waited < maxWaitSeconds) {
      TimeUnit.SECONDS.sleep(1);
+     waited++;
      printPoolState();
  }
+ if (waited >= maxWaitSeconds) {
+     System.out.println("Warning: Timed out waiting for clients to become idle");
+ }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
examples/src/main/java/io/milvus/v2/ClientPoolExample.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
Applied to files:
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
sdk-core/src/main/java/io/milvus/pool/ClientPool.java
sdk-core/src/main/java/io/milvus/pool/ClientCache.java
🧬 Code graph analysis (4)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig (5-210)

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (11)

sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/AddFieldReq.java (1)
AddFieldReq (30-406)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (2)
CreateCollectionReq (36-1112)
CollectionSchema (404-529)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/HasCollectionReq.java (1)
HasCollectionReq (22-80)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
QueryReq (26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
SearchReq (32-482)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
InsertResp (25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/QueryResp.java (1)
QueryResp (27-122)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
SearchResp (27-185)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
MilvusClientException (22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and test
- GitHub Check: Summary
🔇 Additional comments (13)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)
93-93: Good improvement: parameterized logging for exception.

The exception is now passed as a separate parameter rather than concatenated, which follows SLF4J best practices and ensures the full stack trace is logged.

39-41: The method signature change appears safe. All usages of configForKey throughout the codebase (test files, examples, and the delegation in ClientPool.java) call the method without expecting or capturing a return value, making them compatible with the new void return type.

Likely an incorrect or invalid review comment.
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
132-136: Significant reduction in default pool sizing values.

The new defaults are considerably more restrictive:

minIdlePerKey: 0 → 1
maxIdlePerKey: 10 → 2
maxTotalPerKey: 50 → 5

This is a breaking change for users relying on default values. Existing applications using PoolConfig.builder().build() without explicit sizing will now have much smaller pools, which could cause increased latency under load if clients are exhausted.

Consider documenting this change prominently in release notes.
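For applications that relied on the old sizing, pinning the previous values explicitly is a straightforward migration path; the builder method names below are the ones exercised by this PR's tests:

    import io.milvus.pool.PoolConfig;

    public class PinOldDefaults {
        public static void main(String[] args) {
            PoolConfig config = PoolConfig.builder()
                    .minIdlePerKey(0)    // old default; the new default is 1
                    .maxIdlePerKey(10)   // old default; the new default is 2
                    .maxTotalPerKey(50)  // old default; the new default is 5
                    .build();
            System.out.println("pool sized with pre-#1731 defaults: " + config);
        }
    }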
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (3)
50-50: LGTM: Constant naming follows Java conventions.

Renaming serverUri to ServerUri follows Java naming conventions for public static fields used as constants.
295-301: Pool configuration aligns with new defaults.

The example now uses more restrictive pool settings (maxIdlePerKey=1, maxTotalPerKey=5) which align with the updated PoolConfig defaults and demonstrate realistic per-key pool usage.
342-343: Correct timing calculation for per-database operations.

The formula threadCount * repeatRequests * dbNames.size() correctly accounts for operations across all databases, fixing the previous calculation that only counted operations for a single database.
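A one-line illustration of the corrected denominator, with made-up numbers; the variable names mirror the review text, not the example's actual fields:

    final class TimingSketch {
        public static void main(String[] args) {
            int threadCount = 10, repeatRequests = 100, dbCount = 3; // hypothetical workload
            long elapsedMs = 6_000L;                                 // hypothetical wall time
            // 10 * 100 * 3 = 3000 operations; 6000 ms / 3000 = 2.0 ms per operation
            double avgMs = (double) elapsedMs / (threadCount * repeatRequests * dbCount);
            System.out.printf("average latency: %.1f ms%n", avgMs);
        }
    }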
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (4)

43-67: LGTM: Static pool initialization with warmup.

The static block properly initializes the pool with appropriate configuration and calls preparePool(DemoKey) to pre-create clients according to minIdlePerKey, reducing first-call latency. The exception handling correctly wraps checked exceptions in RuntimeException.
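A condensed sketch of that warmup pattern; the MilvusClientV2Pool constructor shape and the endpoint are assumptions here, so treat it as illustrative rather than a copy of the demo:

    import io.milvus.pool.MilvusClientV2Pool;
    import io.milvus.pool.PoolConfig;
    import io.milvus.v2.client.ConnectConfig;

    public class WarmPoolSketch {
        static final String DemoKey = "demo_key";
        static final MilvusClientV2Pool pool;

        static {
            try {
                ConnectConfig connect = ConnectConfig.builder()
                        .uri("http://localhost:19530")   // placeholder endpoint
                        .build();
                PoolConfig config = PoolConfig.builder()
                        .minIdlePerKey(2)
                        .maxTotalPerKey(5)
                        .build();
                pool = new MilvusClientV2Pool(config, connect);
                pool.preparePool(DemoKey);               // pre-creates minIdlePerKey clients for the key
            } catch (Exception e) {
                throw new RuntimeException(e);           // wrap checked exceptions, as the demo does
            }
        }
    }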
69-127: Null client handling in finally block.

The finally block at lines 124-126 calls pool.returnClient(DemoKey, client) even when client could be null (from early return at line 77). Based on previous review discussion, returnClient safely handles null clients, so this is acceptable.
172-199: Retry loop for client acquisition is appropriate.

The retry loop at lines 176-184 handles the case where getClient() returns null or throws an exception when the pool is exhausted, ensuring the demo doesn't fail under heavy load. This pattern is consistent with ClientPoolExample.
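For reference, a hedged sketch of such a retry loop; the retry bound and back-off are arbitrary choices, and getClient/returnClient follow the pool API used in the examples:

    import io.milvus.pool.MilvusClientV2Pool;
    import io.milvus.v2.client.MilvusClientV2;
    import java.util.concurrent.TimeUnit;

    final class RetrySketch {
        static MilvusClientV2 acquire(MilvusClientV2Pool pool, String key) throws InterruptedException {
            MilvusClientV2 client = null;
            for (int i = 0; i < 10 && client == null; i++) {
                try {
                    client = pool.getClient(key);        // may return null or throw when exhausted
                } catch (Exception e) {
                    // fall through and retry after a short pause
                }
                if (client == null) {
                    TimeUnit.MILLISECONDS.sleep(100);    // brief back-off before the next attempt
                }
            }
            return client;                               // caller must returnClient() when non-null
        }
    }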
258-272: Demo main method is well-structured.

The main method demonstrates a complete workflow: collection creation, data insertion, two rounds of concurrent searches with different thread counts, and proper pool cleanup. This provides a good reference for users.
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (3)
3141-3153: LGTM: Per-key pool configuration and preparation.

The test properly configures per-key pool limits and calls preparePool(key) before testing. The assertion at line 3153 correctly verifies that minIdlePerKey clients are pre-created as active clients in the pool.
3159-3172: Client handling in concurrent test loop.

The try-catch-finally pattern correctly handles client acquisition and return. Per previous discussion, returnClient safely handles null clients.
3183-3194: Good validation of pool metrics.

The assertions properly validate:

- Active client count matches maxTotalPerKey after concurrent load
- Total active/idle counts via new aggregate methods
- QPS metric is logged for observability
- Final state shows expected idle and active counts after drain
#1713