
Conversation

@yhmo
Contributor

@yhmo yhmo commented Dec 25, 2025

#1713

  • Core invariant: each database/key is managed by its own ClientCache (sdk-core/src/main/java/io/milvus/pool/ClientCache.java). ClientCache owns activeClientList/retireClientList, runs checkQPS() on a ~1s timer, and enforces underlying GenericKeyedObjectPool limits (maxTotalPerKey/minIdlePerKey) while using THRESHOLD_INCREASE/THRESHOLD_DECREASE to autoscale per-key clients.
  • Logic removed/simplified: global borrow/return and global least-loaded selection were moved into per-key ClientCache; ClientPool now delegates getClient/returnClient to per-key caches (getCache/ClientCache.preparePool), centralizing pre-warm, least-loaded selection (ClientWrapper.refCount), retire/return cycles, and removing the previous duplicated global management. PoolClientFactory.configForKey was simplified to void since its previous return value was unused.
  • Why no data loss or behavior regression: all operations still call the same pool APIs (ClientCache.fetchFromPool → clientPool.borrowObject(key); returnToPool → clientPool.returnObject(key, client); clear()/close() stop per-key timers then clear/close the underlying pool). Concurrency controls remain in place (cacheMapLock for cache creation; clientListLock for first-client creation; CopyOnWriteArrayList + AtomicInteger/AtomicLong for per-client refCounts and counters). Retired wrappers are only returned when refCount <= 0, ensuring in-flight requests are not dropped.
  • New capability: adds a QPS-aware, per-key self-tuning client cache with runtime metrics and control points (ClientPool.preparePool(String), fetchClientPerSecond(String), ClientCache.fetchClientPerSecond()), plus tightened PoolConfig per-key defaults and updated tests/examples exercising per-key pooling and lifecycle.
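For context, here is a minimal usage sketch of the per-key API described above. Class, builder, and accessor names follow the SDK files referenced in this thread; the MilvusClientV2Pool constructor signature, server URI, and database name are assumptions for illustration.

import io.milvus.pool.MilvusClientV2Pool;
import io.milvus.pool.PoolConfig;
import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.client.MilvusClientV2;

public class PerKeyPoolSketch {
    public static void main(String[] args) throws Exception {
        // Sketch only: per-key pool configuration; values mirror the tests in this PR
        PoolConfig poolConfig = PoolConfig.builder()
                .minIdlePerKey(1)    // pre-warmed clients per key
                .maxIdlePerKey(2)    // idle clients kept per key
                .maxTotalPerKey(5)   // hard cap per key
                .build();
        ConnectConfig connectConfig = ConnectConfig.builder()
                .uri("http://localhost:19530") // placeholder endpoint
                .build();
        MilvusClientV2Pool pool = new MilvusClientV2Pool(poolConfig, connectConfig);

        String key = "my_database"; // hypothetical database/key
        pool.preparePool(key);                        // pre-warm minIdlePerKey clients for this key
        MilvusClientV2 client = pool.getClient(key);  // least-loaded active client for this key
        try {
            // ... issue requests with the client ...
        } finally {
            if (client != null) {
                pool.returnClient(key, client);       // decrements the wrapper's refCount
            }
        }
        System.out.println("QPS: " + pool.fetchClientPerSecond(key)
                + ", active: " + pool.getActiveClientNumber(key)
                + ", idle: " + pool.getIdleClientNumber(key));
        pool.close();
    }
}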

@sre-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yhmo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai bot commented Dec 25, 2025

Walkthrough

Adds a new generic per-key ClientCache that maintains active and retired client wrappers, tracks per-key QPS, and dynamically borrows/retires clients from a GenericKeyedObjectPool. ClientPool was refactored to use a ConcurrentMap of ClientCache instances with lazy initialization, per-key lifecycle methods (preparePool, clear), typed pool config, per-key metrics/accessors, and updated return/close flows. PoolClientFactory.configForKey changed to void and logging was adjusted. PoolConfig.Builder per-key defaults were tightened. Tests and examples were added/updated to exercise per-key pool preparation, concurrent workloads, and the new metrics/accessors.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 16.36%, which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'Optimize client pool' is vague and generic, using non-descriptive language that doesn't convey meaningful information about the specific changes implemented. Consider a more specific title like 'Add per-key client caching with QPS-based scaling to client pool' that better describes the main innovation introduced in this changeset.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.3)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (6)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)

96-118: Global lock may cause contention across different keys.

The cacheMapLock is a single lock shared across all keys. When multiple threads initialize different keys concurrently, they will block each other unnecessarily. Consider using a per-key lock pattern or ConcurrentHashMap.computeIfAbsent() which provides atomic lazy initialization.

🔎 Proposed refactor using computeIfAbsent
     private ClientCache<T> getCache(String key) {
-        ClientCache<T> cache = clientsCache.get(key);
-        if (cache == null) {
-            // If clientsCache doesn't contain this key, there might be multiple threads run into this section.
-            // Although ConcurrentMap.putIfAbsent() is atomic action, we don't intend to allow multiple threads
-            // to create multiple ClientCache objects at this line, so we add a lock here.
-            // Only one thread that first obtains the lock runs into putIfAbsent(), the others will be blocked
-            // and get the object after obtaining the lock.
-            // This section is entered one time for each key, the lock basically doesn't affect performance.
-            cacheMapLock.lock();
-            try {
-                if (!clientsCache.containsKey(key)) {
-                    cache = new ClientCache<>(key, clientPool);
-                    clientsCache.put(key, cache);
-                } else {
-                    cache = clientsCache.get(key);
-                }
-            } finally {
-                cacheMapLock.unlock();
-            }
-        }
-        return cache;
+        return clientsCache.computeIfAbsent(key, k -> new ClientCache<>(k, clientPool));
     }

69-76: Inconsistent error handling: logs error but returns silently.

If getCache(key) returns null (which shouldn't happen given current implementation), the method logs an error but continues silently. Consider throwing an exception for consistency with other pool operations.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)

3028-3058: Worker class exception handling could mask test failures.

The catch block prints the exception but doesn't fail the test or track failures. If requests fail, the test will still pass. Consider using AtomicInteger to count failures and assert on it after completion.

🔎 Proposed improvement
+            AtomicInteger failureCount = new AtomicInteger(0);
+
             class Worker implements Runnable {
                 private int id = 0;
 
                 public Worker(int id) {
                     this.id = id;
                 }
 
                 @Override
                 public void run() {
                     MilvusClientV2 client = null;
                     try {
                         client = pool.getClient(dummyDb);
                         Assertions.assertEquals(dummyDb, client.currentUsedDatabase());
                         // ... search logic ...
                     } catch (Exception e) {
                         System.out.printf("request failed: %s%n", e);
+                        failureCount.incrementAndGet();
                     } finally {
                         pool.returnClient(dummyDb, client);
                     }
                 }
             }
             // ... after executor termination ...
+            Assertions.assertEquals(0, failureCount.get(), "Some requests failed during the test");

3081-3084: Busy-wait loop with 1-second sleep may cause test flakiness.

The loop waits indefinitely for active clients to drop to 1. If the pool's QPS-based retirement logic is slow or doesn't trigger, this could hang. Consider adding a timeout.

🔎 Proposed improvement
+            int maxWaitSeconds = 30;
+            int waited = 0;
             while (pool.getActiveClientNumber(dummyDb) > 1) {
                 TimeUnit.SECONDS.sleep(1);
                 System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(dummyDb), pool.getActiveClientNumber(dummyDb));
+                waited++;
+                if (waited > maxWaitSeconds) {
+                    Assertions.fail("Timed out waiting for clients to be retired");
+                }
             }
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)

174-175: Setting thread priority to MAX_PRIORITY in a timer task is unusual.

This can starve other threads. The QPS check is lightweight and doesn't need elevated priority. Consider removing this.

🔎 Proposed fix
             @Override
             public void run() {
-                Thread currentThread = Thread.currentThread();
-                currentThread.setPriority(Thread.MAX_PRIORITY);
-
                 checkQPS();
             }

189-228: getClient() increments totalCallNumber but doesn't decrement on failure paths.

If fetchFromPool() returns null at line 198, totalCallNumber was already incremented but no client is returned. This skews QPS calculations. Consider decrementing on failure paths.

🔎 Proposed fix
     public T getClient() {
         totalCallNumber.incrementAndGet();
         if (activeClientList.isEmpty()) {
             clientListLock.lock();
             try {
                 if (activeClientList.isEmpty()) {
                     T client = fetchFromPool();
                     if (client == null) {
+                        totalCallNumber.decrementAndGet();
                         return null;
                     }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb10051 and e6796d1.

📒 Files selected for processing (4)
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧬 Code graph analysis (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (5)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (1)
  • CreateCollectionReq (36-1112)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/LoadCollectionReq.java (1)
  • LoadCollectionReq (25-227)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
  • ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
  • SearchReq (32-482)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Deploy milvus server,build and test
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (5)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

93-93: Good fix: passing exception as a separate parameter enables proper stack trace logging.

This is the correct pattern for SLF4J - passing the exception as the last argument ensures the full stack trace is logged.


39-41: Breaking API change: configForKey no longer returns a value.

The method signature changed from returning C (previous config) to void. While no internal callers in this codebase depend on the return value, external consumers of this public API that relied on the previous return value would be affected.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

2975-2997: Good test setup: creates dedicated collection for pool testing.

The test properly sets up an isolated collection in a dedicated database for testing pool behavior, ensuring test isolation.

sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)

41-61: Double initialization: preparePool() then borrowObject() creates more clients than intended.

clientPool.preparePool(this.key) already creates minIdlePerKey idle clients in the pool. The subsequent loop borrows those same clients, so the pool ends up with 0 idle and minIdlePerKey active. If this is intentional (pre-warm the cache), the comment should clarify this. Otherwise, just borrowing would suffice.
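If the borrow-only behavior is what's wanted, a sketch of that variant could look like the following; the ClientWrapper constructor and the minIdlePerKey value are assumptions based on this review's context, not the PR's code.

// Sketch: pre-warm the cache by borrowing directly, without calling clientPool.preparePool(key).
// Assumes the ClientCache members described in this review (fetchFromPool, activeClientList)
// plus an assumed minIdlePerKey value and ClientWrapper(T) constructor.
public void preparePool() {
    for (int i = 0; i < minIdlePerKey; i++) {
        T client = fetchFromPool();                    // borrowObject(key) under the hood
        if (client == null) {
            break;                                     // pool exhausted or borrow failed
        }
        activeClientList.add(new ClientWrapper<>(client));
    }
}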


274-320: ClientWrapper.getClient() increments refCount on each call - potential for mismatched counts.

If the caller calls wrapper.getClient() multiple times without corresponding returnClient() calls, the refCount will be incorrect. Consider separating the "get client" and "increment ref" operations, or documenting that getClient() should only be called once per borrow.

Review all call sites to ensure getClient() is called exactly once per borrow and returnClient() once per return.
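One possible shape for that separation, sketched against the ClientWrapper fields described in this review (method names here are illustrative, not the PR's API):

// Sketch: split "read the client" from "acquire a reference" so each borrow is explicit.
// Assumes the AtomicInteger refCount and the wrapped client field described for ClientWrapper.
public T acquire() {           // call exactly once per borrow
    refCount.incrementAndGet();
    return client;
}

public T peek() {              // read-only access; does not touch refCount
    return client;
}

public void release() {        // call exactly once per return
    refCount.decrementAndGet();
}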


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (12)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

58-60: Log level inconsistency already flagged.

This issue was identified in a previous review - the code checks isDebugEnabled() but calls logger.info().


80-82: Log level inconsistency already flagged.

Same issue as in create() - already identified in a previous review.

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)

128-133: Silent no-op issue already flagged.

This issue was identified in a previous review - when the cache doesn't exist, the client is silently not returned, potentially causing resource leaks.


221-227: Side effect issue already flagged.

This issue was identified in a previous review - calling getCache(key) creates a new cache with timer if the key doesn't exist, which is an unexpected side effect for a metrics query method.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3028-3058: Null client handling issue already flagged.

The finally block at lines 3055-3057 unconditionally calls pool.returnClient(dummyDb, client) which could pass null if getClient failed. This was identified in a previous review.

sdk-core/src/main/java/io/milvus/pool/ClientCache.java (7)

19-20: Public mutable static fields already flagged.

This issue was identified in a previous review - public mutable static fields are not thread-safe and violate encapsulation.


27-27: Non-daemon Timer already flagged.

This issue was identified in a previous review - the Timer should be created as daemon to prevent blocking JVM shutdown.


88-91: Log level inconsistency already flagged.

This issue was identified in a previous review.


120-122: Log level inconsistency already flagged.

This issue was identified in a previous review.


140-144: TOCTOU race already flagged.

This issue was identified in a previous review - the list can change between get(maxIndex) and remove(maxIndex).


156-158: Log level inconsistency already flagged.

This issue was identified in a previous review.


265-265: String concatenation in logging already flagged.

This issue was identified in a previous review.

🧹 Nitpick comments (1)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

33-39: Consider making the class package-private.

The class is declared as public but the constructor is protected. If this class is intended only for internal use within the pool package, consider making the class package-private to prevent unintended external instantiation or subclassing.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6796d1 and c08d7b5.

📒 Files selected for processing (5)
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧬 Code graph analysis (1)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (10)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)

133-136: Verify impact of reduced default pool sizes on existing users.

The default values have changed significantly:

  • minIdlePerKey: 0 → 1
  • maxIdlePerKey: 10 → 2
  • maxTotalPerKey: 50 → 5
  • maxTotal: 1000 → 20

These changes could be breaking for users who rely on default configurations for high-throughput scenarios. Existing applications may experience increased blocking or reduced concurrency unless they explicitly override these values.

Consider documenting this change in release notes as a potential breaking change.
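Users who depended on the old defaults can pin them explicitly; a sketch using the PoolConfig builder methods exercised elsewhere in this PR (the maxTotal builder method is assumed to exist alongside the per-key ones, and the values are the previous defaults quoted above):

// Sketch: restore the pre-change limits explicitly instead of relying on the new defaults.
PoolConfig poolConfig = PoolConfig.builder()
        .minIdlePerKey(0)     // previous default
        .maxIdlePerKey(10)    // previous default
        .maxTotalPerKey(50)   // previous default
        .maxTotal(1000)       // previous default
        .build();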

sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

39-41: LGTM: Simplified configForKey signature.

The method now returns void instead of the previous value, which is appropriate since callers typically don't need the prior configuration. This simplifies the API.


93-93: LGTM: Proper parameterized logging for exceptions.

Good improvement - using the exception as a separate parameter enables proper stack trace logging.

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)

9-12: LGTM: Thread-safe per-key cache infrastructure.

The addition of ConcurrentHashMap for clientsCache and a fair ReentrantLock for cache creation provides proper thread-safety for the per-key caching mechanism.

Also applies to: 20-21, 30-30


65-76: LGTM: Pool preparation with proper error handling.

The preparePool method correctly delegates to the per-key cache and handles the null case with appropriate error logging.


96-118: LGTM: Thread-safe lazy cache initialization.

The double-checked locking pattern with explicit lock is correctly implemented. The comment accurately explains why putIfAbsent alone isn't sufficient—it prevents multiple ClientCache instantiations when threads race on a new key.


153-164: LGTM: Proper cleanup of per-key caches on clear.

The implementation correctly stops timers for all caches before clearing the cache map and underlying pool. The inline comment about unreturned clients and UNAVAILABLE errors provides helpful context.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)

2975-2998: LGTM: Comprehensive test setup for pool testing.

The test properly creates a dedicated database and collection for pool testing, with data insertion and loading to enable actual query operations.


3060-3089: LGTM: Thorough pool behavior verification.

The test properly:

  • Uses ExecutorService for controlled concurrent execution
  • Verifies active/idle client counts after workload completion
  • Waits for clients to become idle before final assertions
  • Validates pool metrics match expected configuration
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

274-320: LGTM: Thread-safe ClientWrapper implementation.

The ClientWrapper correctly uses AtomicInteger for thread-safe reference counting. The equals implementation properly handles comparison with both ClientWrapper instances and raw client objects, which is necessary for the returnClient lookups.
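For readers unfamiliar with the pattern, an equals() of the kind described here could look like the sketch below; the actual implementation lives in ClientCache.ClientWrapper and may differ (the client field name is assumed).

// Sketch: identity-based equality against either another wrapper or the raw client it wraps,
// so list lookups in returnClient() can pass a bare client object.
// Note: intentionally asymmetric (a raw client never equals a wrapper), which is acceptable
// for internal lookups but should not leak outside the pool package.
@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (obj instanceof ClientWrapper) {
        return this.client == ((ClientWrapper<?>) obj).client;
    }
    return this.client == obj; // compare directly against a raw client instance
}

@Override
public int hashCode() {
    return System.identityHashCode(client);
}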


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (5)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)

128-133: Duplicate: Silent no-op may leak clients (previously flagged).

This issue was flagged in a previous review. When returnClient is called for a key with no cache (e.g., after clear(key)), the client is never returned to the underlying pool, potentially causing a resource leak. The recommended fix is to return the client directly to clientPool when the cache is missing.


221-227: Duplicate: Metrics method has side effect (previously flagged).

This issue was flagged in a previous review. Calling getCache(key) will create a new ClientCache (and start its timer) if the key doesn't exist. A metrics query method should not have this side effect. Use clientsCache.get(key) directly instead.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3053-3058: Null client passed to returnClient if getClient fails.

If pool.getClient(dummyDb) throws or returns null, the finally block will call pool.returnClient(dummyDb, null). Add a null check before returning.

🔎 Proposed fix
                     } catch (Exception e) {
                         System.out.printf("request failed: %s%n", e);
                     } finally {
-                        pool.returnClient(dummyDb, client);
+                        if (client != null) {
+                            pool.returnClient(dummyDb, client);
+                        }
                     }
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)

150-154: TOCTOU race: list can change between get(maxIndex) and remove(maxIndex).

CopyOnWriteArrayList creates a new array on mutation, so between getting the wrapper at maxIndex and removing it, another thread could have modified the list, causing IndexOutOfBoundsException or removing the wrong element.

🔎 Proposed fix using remove(Object)
             if (maxIndex >= 0) {
                 ClientWrapper<T> wrapper = activeClientList.get(maxIndex);
-                activeClientList.remove(maxIndex);
-                retireClientList.add(wrapper);
+                if (activeClientList.remove(wrapper)) {  // remove by object reference
+                    retireClientList.add(wrapper);
+                }
             }

194-233: Race condition in min-load client selection.

The loop at lines 221-226 iterates through activeClientList to find the minimum-load wrapper, but between finding the wrapper and calling wrapper.getClient() at line 232, the wrapper could be retired by checkQPS() running on the scheduler thread. This could result in returning a client that is already in the retire list, though functionally this may be acceptable since the client is still valid.

Consider whether this race is acceptable for your use case. If strict consistency is needed, you may want to verify the wrapper is still in activeClientList before returning, or accept that occasionally a "retired" client may be returned (which is still functional).

🧹 Nitpick comments (5)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)

30-30: Consider whether fair lock is necessary.

The fair ReentrantLock guarantees FIFO acquisition order but has lower throughput than non-fair locks. Since getCache is called only once per key and primarily during initialization, fairness may not be critical here. Consider using the default non-fair lock (new ReentrantLock()) unless FIFO ordering is a strict requirement.

🔎 Alternative implementation
-        this.cacheMapLock = new ReentrantLock(true);
+        this.cacheMapLock = new ReentrantLock();

96-118: Minor inconsistency in getCache implementation.

The method uses clientsCache.containsKey(key) inside the lock but clientsCache.get(key) outside. For consistency and slightly better performance, use get() in both places.

🔎 Proposed refactor
     private ClientCache<T> getCache(String key) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache == null) {
             cacheMapLock.lock();
             try {
-                if (!clientsCache.containsKey(key)) {
+                cache = clientsCache.get(key);
+                if (cache == null) {
                     cache = new ClientCache<>(key, clientPool);
                     clientsCache.put(key, cache);
-                } else {
-                    cache = clientsCache.get(key);
                 }
             } finally {
                 cacheMapLock.unlock();
             }
         }
         return cache;
     }

69-76: Unnecessary null check.

Since getCache(key) always returns a non-null ClientCache (it creates one if absent), the null check at line 71 is unreachable. Consider removing it or adding a comment explaining it's defensive programming.

🔎 Proposed simplification
     public void preparePool(String key) {
         ClientCache<T> cache = getCache(key);
-        if (cache == null) {
-            logger.error("preparePool() gets null for key: {}", key);
-            return;
-        }
         cache.preparePool();
     }

87-94: Unnecessary null check in getClient.

Similar to preparePool, since getCache(key) never returns null, the null check and error logging at lines 89-91 are unreachable. The method can be simplified.

🔎 Proposed simplification
     public T getClient(String key) {
         ClientCache<T> cache = getCache(key);
-        if (cache == null) {
-            logger.error("getClient() return null for key: {}", key);
-            return null;
-        }
         return cache.getClient();
     }
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

265-273: Exception thrown from returnToPool may disrupt caller.

If returnObject fails, throwing a MilvusClientException here could be unexpected for callers who are just trying to clean up. The client is already retired, so the pool state might be inconsistent. Consider logging the error and continuing rather than propagating the exception.

🔎 Proposed fix
     private void returnToPool(T grpcClient) {
         try {
             clientPool.returnObject(this.key, grpcClient);
         } catch (Exception e) {
             // the pool might return exception if the key doesn't exist or the grpcClient doesn't belong to this pool
             logger.error("Failed to return client, exception: ", e);
-            throw new MilvusClientException(ErrorCode.CLIENT_ERROR, e);
+            // Don't throw - the client is being cleaned up, throwing here could disrupt callers
         }
     }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c08d7b5 and 6dfbb60.

📒 Files selected for processing (6)
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧬 Code graph analysis (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and test
  • GitHub Check: Deploy milvus server,build and test
  • GitHub Check: Summary
🔇 Additional comments (15)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (3)

139-148: Good cleanup sequence.

Calling clear() before closing the pool ensures per-key cache timers are stopped and resources are released properly. The comments appropriately document the behavior with unreturned clients.


153-164: LGTM: Proper per-key cleanup.

The method correctly stops all cache timers, clears the cache map, and then clears the underlying pool. The cleanup order is appropriate.


171-183: LGTM: Per-key cleanup is correct.

The method properly handles per-key cleanup by stopping the timer, removing from the cache map, and clearing the underlying pool for that key.

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (4)

3141-3153: Good test configuration.

The pool configuration with minIdlePerKey=1, maxIdlePerKey=2, and maxTotalPerKey=4 provides a realistic test scenario. The subsequent call to preparePool(key) and assertion that active clients equal minIdlePerKey validates the pre-warming behavior.


3156-3157: Significantly increased test load.

The test now executes 20,000 total requests (20 threads × 1000 requests) compared to the previous 100 requests. This better exercises the pool under concurrent load but may increase test execution time. Ensure this is acceptable for CI/CD pipelines.


3183-3185: LGTM: Pool capacity assertions.

The assertions correctly validate that the pool reached maximum capacity (maxTotalPerKey) under concurrent load, and the QPS logging provides useful runtime metrics.


3159-3173: The finally block pattern is safe—ClientCache.returnClient handles null clients gracefully.

When pool.getClient(key) fails and returns null, the finally block will call pool.returnClient(key, null). The method iterates through active and retired client lists and silently ignores the null client if no match is found, so no exception is thrown.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (5)

88-90: LGTM!

The new imports for ExecutorService and Executors are correctly added to support the concurrent pool testing below.


2975-2998: LGTM!

Test setup properly creates a collection in the dummy database, inserts test data, and loads the collection before testing the pool functionality.


3004-3011: LGTM!

Pool configuration is well-defined with explicit per-key limits. The variable names clearly document the expected pool behavior constraints.


3060-3078: LGTM!

The concurrent workload testing with ExecutorService is well-structured. The test properly validates:

  • Pool respects maxTotalPerKey constraint
  • QPS logging occurs periodically
  • Proper timeout handling with awaitTermination

3081-3092: LGTM!

The post-workload validation is thorough:

  • Waits for pool to stabilize after workload
  • Validates idle/active counts against configured limits
  • Confirms client database context is preserved
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (3)

17-19: LGTM!

The threshold constants are now declared as final, addressing the previous thread-safety concern about mutable static fields.


50-70: LGTM!

The preparePool() method properly initializes the pool and handles exceptions with appropriate logging. The loop correctly populates the active client list with pre-created clients.


279-325: LGTM!

The ClientWrapper inner class is well-designed:

  • Uses AtomicInteger for thread-safe reference counting
  • Properly implements equals() to support comparison with both wrapper and client instances
  • Clean encapsulation of client lifecycle management


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (8)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

58-60: Inconsistent log level: checking debug but logging info.

This issue was already flagged in a previous review.


80-82: Same log level inconsistency as in create().

This issue was already flagged in a previous review.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3055-3057: Null client passed to returnClient if getClient fails.

This issue was already flagged in a previous review.

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

126-131: Silent no-op when cache doesn't exist may leak clients.

This issue was already flagged in a previous review.

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3187-3194: Add timeout to wait loop.

The while loop at Line 3187 waits for active clients to drain but has no timeout. If clients fail to become idle due to a bug or resource leak, the test will hang indefinitely.

🔎 Proposed fix with timeout
-            while (pool.getActiveClientNumber(key) > 1) {
+            int maxWaitSeconds = 30;
+            int waited = 0;
+            while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
                 TimeUnit.SECONDS.sleep(1);
+                waited++;
                 System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
             }
+            Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1, 
+                String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (3)

37-48: Scheduler thread should be daemon to prevent blocking JVM shutdown.

The ThreadFactory creates non-daemon threads, so if stopTimer() is not called before application shutdown, the scheduler thread will keep the JVM running indefinitely. This can cause issues in testing and production environments.

🔎 Proposed fix
         ThreadFactory threadFactory = new ThreadFactory() {
             @Override
             public Thread newThread(@NotNull Runnable r) {
                 Thread t = new Thread(r);
+                t.setDaemon(true);
+                t.setName("ClientCache-QPS-Checker-" + key);
                 t.setPriority(Thread.MAX_PRIORITY); // set the highest priority for the timer
                 return t;
             }
         };

150-154: TOCTOU race: list can change between get(maxIndex) and remove(maxIndex).

Between calling activeClientList.get(maxIndex) and activeClientList.remove(maxIndex), another thread could modify the list (e.g., in getClient() adding a new wrapper), potentially causing IndexOutOfBoundsException or removing the wrong element.

🔎 Proposed fix using remove by object reference
             if (maxIndex >= 0) {
                 ClientWrapper<T> wrapper = activeClientList.get(maxIndex);
-                activeClientList.remove(maxIndex);
-                retireClientList.add(wrapper);
+                if (activeClientList.remove(wrapper)) {
+                    retireClientList.add(wrapper);
+                } else {
+                    logger.warn("ClientCache key: {} failed to retire wrapper, already removed", key);
+                }
             }

Using remove(Object) removes by identity, which is safer for concurrent access.


219-232: Race condition in min-load client selection.

Between finding the minimum-load wrapper (lines 219-226) and calling wrapper.getClient() (line 232), the checkQPS() method running on the scheduler thread could retire this wrapper and move it to retireClientList. This results in returning a client that is being phased out.

🔎 Suggested mitigation
         int minLoad = Integer.MAX_VALUE;
         ClientWrapper<T> wrapper = null;
         for (ClientWrapper<T> tempWrapper : activeClientList) {
             if (tempWrapper.getRefCount() < minLoad) {
                 minLoad = tempWrapper.getRefCount();
                 wrapper = tempWrapper;
             }
         }
+        // Verify wrapper is still in active list (checkQPS might have retired it)
+        if (wrapper != null && !activeClientList.contains(wrapper)) {
+            // Wrapper was retired, retry selection
+            if (!activeClientList.isEmpty()) {
+                wrapper = activeClientList.get(0);
+            }
+        }
         if (wrapper == null) {
             // should not be here
             wrapper = activeClientList.get(0);
         }

         return wrapper.getClient();

Alternatively, use a lock around both selection and getClient() to make the operation atomic, though this may impact performance.

🧹 Nitpick comments (3)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)

3081-3084: Potential infinite loop without a timeout safeguard.

If clients are never returned (e.g., due to a bug), this while loop will run indefinitely, causing the test to hang. Consider adding a timeout or maximum iteration count.

🔎 Proposed fix
+            int maxWaitSeconds = 30;
+            int waitedSeconds = 0;
-            while (pool.getActiveClientNumber(dummyDb) > 1) {
+            while (pool.getActiveClientNumber(dummyDb) > 1 && waitedSeconds < maxWaitSeconds) {
                 TimeUnit.SECONDS.sleep(1);
+                waitedSeconds++;
                 System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(dummyDb), pool.getActiveClientNumber(dummyDb));
             }
+            if (waitedSeconds >= maxWaitSeconds) {
+                Assertions.fail("Timed out waiting for active clients to reduce");
+            }

3028-3034: Minor: Field id shadows constructor parameter.

The field id and the constructor parameter share the same name. While this works, using this.id = id; explicitly clarifies intent. Consider renaming the parameter or keeping it as-is with the explicit this. qualifier.

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

21-21: Consider making cacheMapLock final.

Since the lock is assigned once in the constructor and never reassigned, marking it as final would clearly communicate immutability and prevent accidental reassignment.

🔎 Proposed fix
     private final ConcurrentMap<String, ClientCache<T>> clientsCache = new ConcurrentHashMap<>();
-    private Lock cacheMapLock;
+    private final Lock cacheMapLock = new ReentrantLock(true);

And remove the assignment in the constructor:

     protected ClientPool(PoolConfig config, PoolClientFactory<C, T> clientFactory) {
         this.config = config;
         this.clientFactory = clientFactory;
-        this.cacheMapLock = new ReentrantLock(true);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dfbb60 and 330b587.

📒 Files selected for processing (6)
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧬 Code graph analysis (2)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and test
  • GitHub Check: Deploy milvus server,build and test
  • GitHub Check: Summary
🔇 Additional comments (21)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

39-41: LGTM! Simplified method signature.

The change from returning the previous config to void is a valid simplification, aligning with a "set-only" pattern.


93-93: Good improvement: parameterized exception logging.

Using the exception as a separate parameter (logger.error("...", e)) instead of concatenation is the correct SLF4J pattern, ensuring the full stack trace is logged properly.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (3)

88-90: LGTM! Added necessary imports for concurrent test execution.


2975-2997: LGTM! Proper test setup for per-key pool testing.

The test correctly creates a separate database and collection to exercise the per-key pool functionality.


3004-3011: LGTM! Well-defined pool configuration for testing.

The configuration explicitly sets minIdlePerKey, maxIdlePerKey, and maxTotalPerKey to verify pool behavior under concurrent load.

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (6)

9-13: LGTM! Appropriate concurrency utilities for per-key caching.

The use of ConcurrentHashMap and ReentrantLock provides correct thread-safe access to the per-key cache.

Also applies to: 20-21


65-74: LGTM! Pre-warming capability for per-key pools.

The preparePool method allows callers to initialize clients before actual usage, reducing latency for the first getClient() call.


94-116: LGTM! Correct double-checked locking pattern.

The implementation correctly uses a lock to prevent multiple ClientCache instances from being created for the same key. The comment explaining why putIfAbsent alone is insufficient is helpful.


137-146: Good documentation of edge case behavior.

The comments explaining what happens if clients are not returned before clear() is called are helpful for understanding the expected behavior (clients receive UNAVAILABLE error and retry).

Also applies to: 151-162, 169-181


219-225: LGTM! Side-effect issue fixed.

The fetchClientPerSecond method now correctly uses clientsCache.get(key) instead of getCache(key), preventing the unintended side effect of creating a new ClientCache when querying metrics.


53-55: The signature change from returning C to void is safe. Only one call site exists (MilvusClientV2DockerTest.java:3093), and it does not capture or use the return value (pool.removeConfig(dummyDb);). All callers are compatible with the new void signature.

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (3)

3141-3148: LGTM! Pool configuration is appropriate for testing.

The per-key pool configuration values (minIdlePerKey=1, maxIdlePerKey=2, maxTotalPerKey=4) are well-suited for exercising the pool's per-key scaling behavior in a test environment.


3151-3153: Verify the semantics of getActiveClientNumber(key) after preparePool().

The assertion expects minIdlePerKey (1) active clients immediately after preparePool(key). Based on the enriched summary, preparePool() creates minIdlePerKey clients and adds them to activeClientList. However, it's important to confirm:

  • Does getActiveClientNumber(key) count clients in the ClientCache.activeClientList, or does it reflect the underlying pool's active count?
  • Are these pre-created clients considered "active" (borrowed from the pool) or "idle" (available in the pool)?

If the clients are borrowed and placed in the cache, they should indeed show as active. Please verify this aligns with the intended pool accounting.


3191-3194: Final assertions look correct assuming proper pool draining.

The assertions expect the pool to stabilize at maxIdlePerKey (2) idle clients and 1 active client after the concurrent workload completes and clients are retired. This assumes the ClientCache.checkQPS() timer has successfully retired and returned excess clients to the pool during the draining loop.

sdk-core/src/main/java/io/milvus/pool/ClientCache.java (7)

18-19: LGTM! Static threshold fields are now immutable.

The THRESHOLD_INCREASE and THRESHOLD_DECREASE fields are now declared as public static final, which addresses thread-safety and encapsulation concerns from previous reviews.


50-70: LGTM! preparePool() correctly pre-warms the client cache.

The method properly initializes the cache by borrowing minIdlePerKey clients from the pool and adding them to the active list. Exception handling is appropriate with proper logging and error propagation.


161-174: LGTM! returnRetiredClients() safely returns idle retired clients.

The method correctly uses removeIf with a predicate that checks refCount <= 0 before returning clients to the pool. The atomic nature of removeIf ensures thread-safe list modification.
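The removeIf pattern described here, sketched in isolation (field and method names follow this review; the raw-client accessor is hypothetical and the real method may differ):

// Sketch: return retired clients whose refCount has dropped to zero.
// removeIf on a CopyOnWriteArrayList runs under the list's internal lock, so the whole
// pass is atomic with respect to other mutations of retireClientList.
private void returnRetiredClients() {
    retireClientList.removeIf(wrapper -> {
        if (wrapper.getRefCount() <= 0) {
            returnToPool(wrapper.getRawClient()); // hypothetical accessor for the wrapped client
            return true;                          // drop the wrapper from the retire list
        }
        return false;                             // still referenced by an in-flight request
    });
}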


235-250: LGTM! returnClient() safely decrements reference counts.

The method correctly searches both activeClientList and retireClientList to find the matching wrapper and decrement its refCount. The CopyOnWriteArrayList ensures thread-safe iteration.


252-273: LGTM! Pool interaction methods have proper error handling.

Both fetchFromPool() and returnToPool() correctly use parameterized logging and wrap pool exceptions in MilvusClientException. The size check at line 254 prevents borrowing beyond MaxTotalPerKey.


275-277: LGTM! Simple metric accessor.

The fetchClientPerSecond() method is a straightforward getter for the QPS metric computed by checkQPS().


279-325: LGTM! ClientWrapper correctly manages per-client reference counting.

The inner class properly encapsulates a client with thread-safe reference counting using AtomicInteger. The equals() and hashCode() implementations correctly handle comparison with both ClientWrapper instances and raw client objects, enabling the returnClient() lookup pattern.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

118-131: returnClient behavior differs from Javadoc and is silent on unknown keys

The Javadoc claims returnClient will throw if the key doesn’t exist or the client doesn’t belong to the key group, but the implementation now silently no‑ops when no ClientCache is present. That mismatch can make misuse harder to spot and, if clear(key) is called while callers still hold clients, may leave those clients effectively unmanaged.

Either update the Javadoc to match the new behavior, or log a warning when clientsCache.get(key) returns null so incorrect usage is visible.

Example: log on missing cache
-    public void returnClient(String key, T grpcClient) {
-        ClientCache<T> cache = clientsCache.get(key);
-        if (cache != null) {
-            cache.returnClient(grpcClient);
-        }
-    }
+    public void returnClient(String key, T grpcClient) {
+        ClientCache<T> cache = clientsCache.get(key);
+        if (cache != null) {
+            cache.returnClient(grpcClient);
+        } else {
+            logger.warn("returnClient() called for non-existent cache key: {}", key);
+        }
+    }
♻️ Duplicate comments (4)
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

219-225: fetchClientPerSecond avoids side effects now

Switching fetchClientPerSecond to use clientsCache.get(key) instead of getCache(key) ensures a metrics call does not create a new ClientCache or start its timer when none exists. Returning 0.0F when the cache is absent is a sensible default.

sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)

58-60: Align debug guard with log level

You still guard with logger.isDebugEnabled() but log at INFO, so the guard is ineffective and the message always appears at INFO. Either log at DEBUG or change the guard to isInfoEnabled().

Proposed adjustment
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} creates a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} creates a client", key);
+        }
@@
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} closes a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} closes a client", key);
+        }

Also applies to: 80-82

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3183-3195: Add timeout to active‑drain loop to avoid hanging tests

The loop waiting for pool.getActiveClientNumber(key) > 1 has no timeout. If the active count never drops (e.g., due to a bug in retire logic), this test can hang indefinitely.

Recommend bounding the wait and asserting if active clients fail to drain within a reasonable period.

Example bounded-drain loop
-            while (pool.getActiveClientNumber(key) > 1) {
-                TimeUnit.SECONDS.sleep(1);
-                System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
-            }
+            int waited = 0;
+            int maxWaitSeconds = 30;
+            while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
+                TimeUnit.SECONDS.sleep(1);
+                waited++;
+                System.out.printf(
+                        "waiting idle %d, active %d%n",
+                        pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
+            }
+            Assertions.assertTrue(
+                    pool.getActiveClientNumber(key) <= 1,
+                    String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3055-3057: Add null check before returning client to pool.

If pool.getClient(dummyDb) at line 3039 throws an exception or returns null, the finally block will call pool.returnClient(dummyDb, null). Guard against passing null to returnClient.

🔎 Proposed fix
                     } finally {
-                        pool.returnClient(dummyDb, client);
+                        if (client != null) {
+                            pool.returnClient(dummyDb, client);
+                        }
                     }
🧹 Nitpick comments (5)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)

50-55: Example changes are behaviorally consistent; consider idiomatic constant naming

  • Using ServerUri in both default and per‑DB ConnectConfig is consistent and keeps all traffic directed to the same endpoint.
  • Tighter pool settings (maxIdlePerKey(1), maxTotalPerKey(5)) align with the “optimize pool” objective and make contention more visible in the example.
  • The updated total‑request calculation with dbNames.size() correctly reflects the number of databases rather than assuming a fixed value.

If you’d like to polish style, you might make ServerUri a public static final String SERVER_URI, but that’s cosmetic.

Also applies to: 97-103, 289-299, 341-344

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)

180-222: concurrentSearch logic exercises pool behavior well; consider a bounded wait

The concurrent search flow with a fixed thread pool plus periodic printPoolState() is a good way to visualize idle/active counts and QPS. The final loop that waits while pool.getActiveClientNumber(DemoKey) > 1 has no timeout, though, so if retirement logic regresses the demo could hang.

Adding a maximum wait duration with a clear message would keep this example safe to run unattended.

Illustrative bounded-wait pattern
-            // after all requests are done, the active clients will be retired and eventually only one idle client left
-            while (pool.getActiveClientNumber(DemoKey) > 1) {
-                TimeUnit.SECONDS.sleep(1);
-                printPoolState();
-            }
+            // after all requests are done, the active clients should retire and only one idle client remain
+            int waited = 0;
+            int maxWaitSeconds = 60;
+            while (pool.getActiveClientNumber(DemoKey) > 1 && waited < maxWaitSeconds) {
+                TimeUnit.SECONDS.sleep(1);
+                waited++;
+                printPoolState();
+            }
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3155-3175: Threaded workload is representative; optional null handling

The threaded loop that repeatedly acquires a client, calls getVersion(), and returns the client stresses the per‑key limits as intended. Given getClient may return null on timeout, you could defensively check client != null before invoking getVersion() to make this test more robust under extreme latency, but with the current limits it’s unlikely to fail in practice.

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

85-92: getClient uses per‑key cache; remove dead null‑cache branch

getClient now always goes through ClientCache.getClient(), which is consistent with the per‑key design. Given getCache will always either find or create a cache when clientPool is non‑null, the if (cache == null) branch is effectively unreachable and can be dropped to simplify the method. Any inability to borrow a client should be surfaced from within ClientCache.getClient() (e.g., via null or exception).

Possible simplification
-    public T getClient(String key) {
-        ClientCache<T> cache = getCache(key);
-        if (cache == null) {
-            logger.error("getClient() return null for key: {}", key);
-            return null;
-        }
-        return cache.getClient();
-    }
+    public T getClient(String key) {
+        return getCache(key).getClient();
+    }
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3091-3095: Consider returning the client before closing the pool.

The client obtained at line 3091 is not returned to the pool before pool.close() is called. While close() should handle cleanup, explicitly returning borrowed clients is better practice and makes the test's intent clearer.

🔎 Suggested improvement
             // get client connect to the dummy db
             MilvusClientV2 dummyClient = pool.getClient(dummyDb);
             Assertions.assertEquals(dummyDb, dummyClient.currentUsedDatabase());
+            pool.returnClient(dummyDb, dummyClient);
             pool.removeConfig(dummyDb);
             Assertions.assertNull(pool.getConfig(dummyDb));
             pool.close();
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 330b587 and a6b118e.

📒 Files selected for processing (8)
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • examples/src/main/java/io/milvus/v2/ClientPoolExample.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.

Applied to files:

  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧬 Code graph analysis (4)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (8)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
  • ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
  • IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
  • DropCollectionReq (22-120)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/HasCollectionReq.java (1)
  • HasCollectionReq (22-80)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
  • InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
  • InsertResp (25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/QueryResp.java (1)
  • QueryResp (27-122)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
  • SearchResp (27-185)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (3)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
  • ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
  • SearchReq (32-482)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (10)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

39-41: configForKey side‑effect‑only API looks correct

Changing configForKey to return void and only update configForKeys matches its actual behavior and avoids implying a transformed config is returned. No further changes needed here.
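
For illustration only, a minimal generic stand-in for this side-effect-only style (hypothetical class, not the factory itself):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PerKeyConfigStore<C> {
    private final ConcurrentMap<String, C> configForKeys = new ConcurrentHashMap<>();

    void configForKey(String key, C config) {
        configForKeys.put(key, config); // thread-safe; last write wins for a given key, nothing to return
    }

    C configFor(String key) {
        return configForKeys.get(key);
    }
}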


88-95: Improved exception logging in validateObject

Passing the exception as a separate parameter to logger.error is the correct pattern and preserves stack traces; wrapping it in MilvusClientException is consistent with the rest of the factory.
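
For comparison, a minimal illustration of the two SLF4J call styles (hypothetical class name); only the second preserves the stack trace:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class LoggingStyles {
    private static final Logger logger = LoggerFactory.getLogger(LoggingStyles.class);

    static void onValidateFailure(Exception e) {
        // String concatenation: only e.toString() is logged, the stack trace is lost
        logger.error("Failed to validate client, exception: " + e);
        // Throwable passed as the last argument: SLF4J logs the full stack trace
        logger.error("Failed to validate client, exception: ", e);
    }
}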

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (2)

42-62: Static pool init and basic collection setup look sound

The static initializer builds a MilvusClientV2Pool with a conservative PoolConfig and wires it into the helper methods. createCollection/insertData correctly borrow and return clients in finally blocks, and the schema/index setup matches the subsequent insert/search patterns.

Also applies to: 64-119


224-235: Main demo flow is coherent

main wires together collection creation, two large concurrent search runs, and final pool close. The ordering is sensible and ensures the pool is closed after use; no additional lifecycle changes seem necessary.

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3137-3153: Per‑key pool configuration and preparePool() usage align with new API

Configuring minIdlePerKey, maxIdlePerKey, and maxTotalPerKey and then calling preparePool(key) before issuing requests is exactly how the new per‑key cache is intended to be exercised. The assertion against getActiveClientNumber(key) verifies that preparePool eagerly warms the pool as designed.
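
As a rough usage sketch (not the test itself): the per-key builder fields, preparePool(String), and getActiveClientNumber(String) are the APIs referenced in this PR, while the MilvusClientV2Pool(PoolConfig, ConnectConfig) constructor and the localhost URI are assumptions taken from the examples.

import java.time.Duration;

import io.milvus.pool.MilvusClientV2Pool;
import io.milvus.pool.PoolConfig;
import io.milvus.v2.client.ConnectConfig;

public class PreparePoolSketch {
    public static void main(String[] args) throws Exception {
        PoolConfig poolConfig = PoolConfig.builder()
                .minIdlePerKey(1)                          // clients created eagerly by preparePool(key)
                .maxIdlePerKey(2)
                .maxTotalPerKey(5)
                .maxBlockWaitDuration(Duration.ofSeconds(5L))
                .build();
        ConnectConfig connectConfig = ConnectConfig.builder()
                .uri("http://localhost:19530")             // assumed local server
                .build();
        MilvusClientV2Pool pool = new MilvusClientV2Pool(poolConfig, connectConfig);

        String key = "default";
        pool.preparePool(key);                             // pre-warm this key's cache
        System.out.printf("clients after warm-up: %d%n", pool.getActiveClientNumber(key));
        pool.close();
    }
}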

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (5)

19-33: Per‑key cache map and lock initialization look correct

The introduction of clientsCache and a fair ReentrantLock in the PoolConfig/PoolClientFactory constructor cleanly sets up the infrastructure for per‑key ClientCache instances and avoids contention on unrelated keys.


65-75: preparePool delegates correctly to per‑key cache

preparePool(String key) simply resolves the ClientCache and forwards to cache.preparePool(), which is exactly the right layering for pre‑warming a specific key’s pool. No additional synchronization is needed here given getCache’s locking.


94-116: getCache’s single‑creation logic for ClientCache is sound

The combination of clientsCache.get(key) with a lock‑guarded putIfAbsent style ensures only one ClientCache is created per key, while normal fast‑path lookups remain lock‑free. This matches the single‑writer pattern used inside ClientCache and avoids unnecessary contention.
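
A generic stand-in for that create-once pattern (hypothetical types, not the SDK source) looks roughly like this:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

final class CacheRegistry<T> {
    private final ConcurrentMap<String, T> caches = new ConcurrentHashMap<>();
    private final Lock lock = new ReentrantLock(true); // fair lock, as in the PR

    T getOrCreate(String key, Function<String, T> factory) {
        T cache = caches.get(key);        // lock-free fast path for existing keys
        if (cache != null) {
            return cache;
        }
        lock.lock();                       // slow path: serialize first-time creation
        try {
            cache = caches.get(key);       // re-check after acquiring the lock
            if (cache == null) {
                cache = factory.apply(key);
                caches.put(key, cache);
            }
            return cache;
        } finally {
            lock.unlock();
        }
    }
}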


137-181: close/clear/clear(key) lifecycle coordination is reasonable

Calling clear() before clientPool.close() and ensuring each ClientCache.stopTimer() is invoked before clearing the underlying pool is a sensible lifecycle sequence. It lets ClientCache manage closing any tracked clients (idle or active) while the pool is still available, then tears down pool state. The repeated comment explaining behavior with unreturned clients is helpful and consistent across close, clear, and clear(key).


183-217: Metric accessors correctly document thread‑safety and delegate to pool

The idle/active count methods are thin pass‑throughs to GenericKeyedObjectPool and are properly documented as thread‑safe. No issues here.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)

58-60: Align log level with isDebugEnabled() guard

Here you guard with logger.isDebugEnabled() but log at INFO. Either switch the call to logger.debug(...) or change the guard to logger.isInfoEnabled() so level and guard are consistent.

Proposed adjustment
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} creates a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} creates a client", key);
+        }
...
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} closes a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} closes a client", key);
+        }

Also applies to: 80-82

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3187-3194: Add a timeout around the active-client drain loop

The loop waiting for pool.getActiveClientNumber(key) <= 1 has no timeout. If clients fail to retire due to a regression, this test can hang indefinitely and stall the suite.

Example bounded wait with assertion
-            while (pool.getActiveClientNumber(key) > 1) {
-                TimeUnit.SECONDS.sleep(1);
-                System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
-            }
+            int maxWaitSeconds = 30;
+            int waited = 0;
+            while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
+                TimeUnit.SECONDS.sleep(1);
+                waited++;
+                System.out.printf("waiting idle %d, active %d%n",
+                        pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
+            }
+            Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1,
+                    String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3055-3057: Null client passed to returnClient if getClient fails.

If pool.getClient(dummyDb) at line 3039 throws an exception or returns null, the finally block will call pool.returnClient(dummyDb, null). Add a null check before returning the client.

🔎 Proposed fix
                     } finally {
-                        pool.returnClient(dummyDb, client);
+                        if (client != null) {
+                            pool.returnClient(dummyDb, client);
+                        }
                     }
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

131-136: Silent no-op when cache doesn't exist may leak clients.

If returnClient is called for a key whose cache has been removed (e.g., after clear(key) was called), the client is never returned to the underlying pool. This could lead to resource leaks, especially in scenarios where clients are obtained but the cache is cleared before they are returned.

Consider logging a warning and attempting to return the client directly to the underlying pool as a fallback.

🔎 Proposed fix
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
+        } else {
+            logger.warn("returnClient() called for non-existent cache key: {}, returning directly to pool", key);
+            try {
+                clientPool.returnObject(key, grpcClient);
+            } catch (Exception e) {
+                logger.error("Failed to return client directly to pool, exception: ", e);
+            }
         }
     }
🧹 Nitpick comments (2)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)

50-50: Consider using standard Java naming convention for constants.

The field ServerUri uses PascalCase, which is inconsistent with Java naming conventions. Static final fields (constants) should use UPPER_CASE_WITH_UNDERSCORES (e.g., SERVER_URI). While the current naming is consistent with other fields in this example file, following standard conventions improves readability and sets a better example for SDK users.

🔎 Suggested refactor to follow Java conventions
-    public static String ServerUri = "http://localhost:19530";
+    public static final String SERVER_URI = "http://localhost:19530";
     public static String CollectionName = "java_sdk_example_pool_v2";
     public static String VectorFieldName = "vector";
     public static int DIM = 128;

Then update all usages:

-                    .uri(ServerUri)
+                    .uri(SERVER_URI)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)

168-239: Consider bounding the wait loops to avoid potential hangs

Both the retry loop in search() and the drain loop in concurrentSearch() can, in theory, run indefinitely if the pool or retirement logic misbehaves. That’s unlikely but can make the demo hang instead of failing fast.

Illustrative bounded wait changes
     private static void search() {
         MilvusClientV2 client = null;
         try {
-            client = pool.getClient(DemoKey);
-            while (client == null) {
+            client = pool.getClient(DemoKey);
+            int attempts = 0;
+            int maxAttempts = 20; // ~100s total at 5s per attempt
+            while (client == null && attempts < maxAttempts) {
                 try {
                     // getClient() might exceeds the borrowMaxWaitMillis and throw exception if the pool is full
                     // retry to call until it return a client
                     client = pool.getClient(DemoKey);
                 } catch (Exception e) {
                     System.out.printf("Failed to get client, will retry, error: %s%n", e.getMessage());
                 }
+                attempts++;
             }
+            if (client == null) {
+                throw new RuntimeException("Failed to acquire client from pool after retries");
+            }
@@
-            // after all requests are done, the active clients will be retired and eventually only one idle client left
-            while (pool.getActiveClientNumber(DemoKey) > 1) {
-                TimeUnit.SECONDS.sleep(1);
-                printPoolState();
-            }
+            // after all requests are done, the active clients will be retired
+            int waited = 0;
+            int maxWaitSeconds = 300;
+            while (pool.getActiveClientNumber(DemoKey) > 1 && waited < maxWaitSeconds) {
+                TimeUnit.SECONDS.sleep(1);
+                waited++;
+                printPoolState();
+            }
+            if (pool.getActiveClientNumber(DemoKey) > 1) {
+                System.out.printf("Active clients did not drain within %d seconds%n", maxWaitSeconds);
+            }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6b118e and f5388aa.

📒 Files selected for processing (8)
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • examples/src/main/java/io/milvus/v2/ClientPoolExample.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.

Applied to files:

  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
🧬 Code graph analysis (5)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (3)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
  • InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/LoadCollectionReq.java (1)
  • LoadCollectionReq (25-227)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
  • ConnectConfig (31-455)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (7)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
  • ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
  • IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/AddFieldReq.java (1)
  • AddFieldReq (30-406)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
  • DropCollectionReq (22-120)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
  • InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
  • QueryReq (26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
  • SearchReq (32-482)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Deploy milvus server,build and test
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (10)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (2)

343-343: Good improvement: Using dbNames.size() instead of magic number.

Replacing the hardcoded 3 with dbNames.size() makes the code more maintainable and eliminates a potential source of bugs if the number of databases changes.


296-297: The pool configuration already includes a GitHub issue reference (issue #1577) explaining the rationale for these settings. No changes needed.

sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

39-41: Per-key config storage change looks good

Making configForKey void and simply updating the ConcurrentMap keeps the API simple and thread-safe for per-key configuration. No issues from a concurrency or correctness standpoint.


93-94: Improved exception logging in validateObject

Passing the exception as a separate parameter to logger.error preserves the stack trace and avoids string concatenation. This is the preferred SLF4J style.

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (2)

44-56: Pool initialization and configuration are sensible for the demo

Static initialization of MilvusClientV2Pool with a small per-key pool and a 5‑second maxBlockWaitDuration matches the documented behavior and keeps the example easy to reason about.


64-166: Collection creation and insert flow are clear and pool-safe

The demo cleanly acquires a client from the pool, handles a potential null, and uses STRONG consistency for the row count check after inserts. The overall pattern (borrow → use → finally return) is sound for example code.

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3141-3185: Per-key pool configuration and concurrency test are well-structured

Using explicit minIdlePerKey, maxIdlePerKey, and maxTotalPerKey, plus preparePool(key) and the concurrent getVersion() workload, is a good way to validate the new per-key pool behavior and metrics (getActiveClientNumber, getTotalActiveClientNumber, fetchClientPerSecond).

sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

1-334: LGTM! Well-structured per-key client cache implementation.

The implementation addresses previous review feedback and follows a sound concurrency model:

  • Thresholds are now final constants
  • Single-writer pattern (timer thread only) ensures thread-safe list modifications
  • CopyOnWriteArrayList enables lock-free iteration for readers
  • Ref-counting via ClientWrapper protects clients during retirement
  • Proper lifecycle management with preparePool() and stopTimer()

The QPS-based dynamic scaling logic is well-designed with incremental adjustments to prevent over-reaction to transient load changes.

Based on learnings: The single-writer pattern combined with CopyOnWriteArrayList and ref-counting ensures thread-safe client selection without additional locking.
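
To make that model concrete, here is a stripped-down stand-in for the selection pattern described above (hypothetical class; the real ClientCache adds QPS tracking, retirement, and pool borrowing on top):

import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

final class LeastLoadedSelector<T> {
    private static final class Wrapper<T> {
        final T client;
        final AtomicInteger refCount = new AtomicInteger(0);
        Wrapper(T client) { this.client = client; }
    }

    private final CopyOnWriteArrayList<Wrapper<T>> active = new CopyOnWriteArrayList<>();

    // Called only from the single writer (the timer thread in the real design).
    void add(T client) {
        active.add(new Wrapper<>(client));
    }

    // Readers iterate a snapshot and pick the wrapper with the fewest in-flight requests.
    T acquire() {
        Wrapper<T> best = null;
        for (Wrapper<T> w : active) {
            if (best == null || w.refCount.get() < best.refCount.get()) {
                best = w;
            }
        }
        if (best == null) {
            return null; // no active client yet; the real code creates the first one under a lock
        }
        best.refCount.incrementAndGet();
        return best.client;
    }

    // Decrement so a retired wrapper can be handed back to the pool once refCount <= 0.
    void release(T client) {
        for (Wrapper<T> w : active) {
            if (w.client == client) {
                w.refCount.decrementAndGet();
                return;
            }
        }
    }
}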

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)

99-121: Well-implemented double-checked locking for cache creation.

The getCache method correctly uses double-checked locking to prevent multiple threads from creating duplicate ClientCache instances for the same key. The lock is only held during the initial cache creation, ensuring minimal performance impact since this is a one-time operation per key.


142-186: Proper lifecycle management for cache and pool cleanup.

The close() and clear() methods correctly handle the lifecycle:

  • Timers are stopped before clearing caches to prevent race conditions
  • Both per-key and global clear operations are properly implemented
  • The comment explaining behavior for unreturned clients is helpful

The order of operations (stop timers → clear caches → clear pool) ensures resources are released cleanly.
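
A compact sketch of that teardown order, with a hypothetical per-key cache interface standing in for ClientCache:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.commons.pool2.impl.GenericKeyedObjectPool;

final class TeardownOrder<T> {
    interface KeyCache {
        void stopTimer();   // stop the per-key QPS timer so lists are no longer mutated
        void clear();       // release tracked clients while the pool is still usable
    }

    private final Map<String, KeyCache> caches = new ConcurrentHashMap<>();
    private final GenericKeyedObjectPool<String, T> pool;

    TeardownOrder(GenericKeyedObjectPool<String, T> pool) {
        this.pool = pool;
    }

    void close() {
        caches.values().forEach(KeyCache::stopTimer); // 1. no more timer-driven list changes
        caches.values().forEach(KeyCache::clear);     // 2. return/destroy tracked clients
        caches.clear();
        pool.clear();                                 // 3. destroy idle pooled objects
        pool.close();                                 // 4. tear down pool state
    }
}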


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3187-3194: Add timeout to prevent test hang.

The wait loop lacks a timeout and could hang indefinitely if active clients fail to drain due to a bug or resource leak. This issue was flagged in previous reviews and remains unresolved.

🔎 Suggested fix with bounded wait
+            int maxWaitSeconds = 30;
+            int waited = 0;
-            while (pool.getActiveClientNumber(key) > 1) {
+            while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
                 TimeUnit.SECONDS.sleep(1);
+                waited++;
                 System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
             }
+            Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1, 
+                String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3028-3059: Add null check before returning client to pool.

If pool.getClient(dummyDb) returns null (e.g., when the pool is exhausted), the finally block will pass null to pool.returnClient(dummyDb, client). While this won't cause an exception in ClientCache.returnClient() (it will simply not find a matching wrapper), it's cleaner to guard the call.

🔎 Proposed fix
                     } finally {
-                        pool.returnClient(dummyDb, client);
+                        if (client != null) {
+                            pool.returnClient(dummyDb, client);
+                        }
                     }
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

131-136: Silent no-op in returnClient may leak resources.

If returnClient(key, grpcClient) is called for a key that has no cache (e.g., after clear(key) was called or before any client was borrowed for that key), the method silently does nothing. The client is never returned to the underlying pool, potentially leaking the resource.

Consider either:

  1. Logging a warning and attempting to return directly to the underlying pool via clientPool.returnObject(key, grpcClient).
  2. Throwing an IllegalStateException to fail-fast and alert the caller to the programming error.
🔎 Proposed fix (option 1 - return to pool)
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
+        } else {
+            logger.warn("returnClient() called for non-existent cache key: {}, returning directly to pool", key);
+            try {
+                clientPool.returnObject(key, grpcClient);
+            } catch (Exception e) {
+                logger.error("Failed to return client directly to pool for key: {}, exception: ", key, e);
+            }
         }
     }
🧹 Nitpick comments (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)

37-48: Verify necessity of MAX_PRIORITY for scheduler thread.

Setting Thread.MAX_PRIORITY (line 41) for the QPS-checking timer thread may cause it to preempt other application threads excessively. This priority is typically reserved for critical system tasks. Consider using Thread.NORM_PRIORITY or a slightly elevated priority (e.g., NORM_PRIORITY + 1) unless high-priority scheduling is definitively required for the QPS monitoring logic.

🔎 Optional adjustment
             public Thread newThread(@NotNull Runnable r) {
                 Thread t = new Thread(r);
-                t.setPriority(Thread.MAX_PRIORITY); // set the highest priority for the timer
+                t.setPriority(Thread.NORM_PRIORITY + 1); // slightly elevated priority for timer
                 return t;
             }

255-271: Document or resolve TODO about MaxTotal check.

The TODO at line 261 notes that borrowObject() may throw "Timeout waiting for idle object" when the global MaxTotal limit is reached, even if the per-key MaxTotalPerKey limit hasn't been hit. Consider:

  1. Documenting this behavior in the method's Javadoc or class-level comments so callers understand the failure mode.
  2. If feasible, adding a pre-check for the global limit (e.g., clientPool.getNumActive() >= clientPool.getMaxTotal()) before calling borrowObject(), though this may introduce a TOCTOU race; a sketch of this option follows below.
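If option 2 were pursued, a hypothetical helper (not the SDK code) could gate the borrow on the global counters GenericKeyedObjectPool already exposes; the TOCTOU caveat still applies because another thread can borrow between the check and borrowObject():

import org.apache.commons.pool2.impl.GenericKeyedObjectPool;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class PrecheckBorrow {
    private static final Logger logger = LoggerFactory.getLogger(PrecheckBorrow.class);

    // Returns null instead of letting borrowObject() block and time out when the
    // global maxTotal is already exhausted (a negative maxTotal means "unlimited").
    static <T> T borrowIfBelowGlobalLimit(GenericKeyedObjectPool<String, T> pool, String key) {
        if (pool.getMaxTotal() > 0 && pool.getNumActive() >= pool.getMaxTotal()) {
            logger.warn("Global maxTotal ({}) reached, skip borrowing for key: {}", pool.getMaxTotal(), key);
            return null;
        }
        try {
            return pool.borrowObject(key);
        } catch (Exception e) {
            logger.error("Failed to borrow client for key: {}", key, e);
            return null;
        }
    }
}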
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5388aa and 6ad1e9f.

📒 Files selected for processing (8)
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • examples/src/main/java/io/milvus/v2/ClientPoolExample.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.

Applied to files:

  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
🧬 Code graph analysis (3)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and test
  • GitHub Check: Deploy milvus server,build and test
  • GitHub Check: Summary
🔇 Additional comments (4)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)

343-343: LGTM: Improved calculation accuracy.

The change from threadCount * repeatRequests * 3 to threadCount * repeatRequests * dbNames.size() correctly derives the count from the actual number of databases rather than a hardcoded value, improving maintainability.

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (2)

128-169: LGTM: Null-safety improved.

The insertData() method now includes an explicit null check at lines 133-136, which addresses the potential null return from pool.getClient(). This ensures the method handles pool exhaustion gracefully.


171-198: LGTM: Retry loop handles pool exhaustion.

The search() method implements a retry loop (lines 175-183) that continues attempting to acquire a client until successful. This is a robust approach to handling pool exhaustion and aligns with the pattern used in ClientPoolExample.

sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)

133-135: Document undocumented reduction in per-key pool defaults.

The default per-key pool sizing has been reduced from the values in version 2.6.5 (maxIdlePerKey: 10→2, maxTotalPerKey: 50→5, minIdlePerKey: 1). This is a breaking change for code expecting the 2.6.5 pool capacity. This reduction is not documented in the CHANGELOG and should be added to release notes. Users relying on implicit defaults or those migrating from 2.6.5 will experience reduced pool capacity and potential blocking without explicit reconfiguration.
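
For users affected by the change, the earlier capacity can be restored explicitly; the values below are the 2.6.5 defaults quoted above, applied through the existing builder:

import io.milvus.pool.PoolConfig;

public class LegacyPoolSizing {
    public static void main(String[] args) {
        // Restores the pre-PR per-key capacity; pass this config to the pool constructor.
        PoolConfig legacySizing = PoolConfig.builder()
                .maxIdlePerKey(10)   // 2.6.5 implicit default, reduced to 2 by this PR
                .maxTotalPerKey(50)  // 2.6.5 implicit default, reduced to 5 by this PR
                .minIdlePerKey(1)
                .build();
        System.out.println("legacy sizing built: " + legacySizing);
    }
}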


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3141-3194: Bound the active-client drain loop in testClientPool() to avoid hangs.

The loop that waits for pool.getActiveClientNumber(key) > 1 has no timeout, so a bug in pool retirement could hang the test indefinitely.

Example bounded wait
-            while (pool.getActiveClientNumber(key) > 1) {
-                TimeUnit.SECONDS.sleep(1);
-                System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
-            }
+            int maxWaitSeconds = 30;
+            int waited = 0;
+            while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
+                TimeUnit.SECONDS.sleep(1);
+                waited++;
+                System.out.printf("waiting idle %d, active %d%n",
+                        pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
+            }
+            Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1,
+                    String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)

39-41: Fix debug guard vs log level mismatch in create()/destroyObject().

Both create() and destroyObject() guard on logger.isDebugEnabled() but log at INFO, so the guard is ineffective. Switching to DEBUG will make the check meaningful.

Suggested change
         try {
-            if (logger.isDebugEnabled()) {
-                logger.info("PoolClientFactory key: {} creates a client", key);
-            }
+            if (logger.isDebugEnabled()) {
+                logger.debug("PoolClientFactory key: {} creates a client", key);
+            }
@@
     @Override
     public void destroyObject(String key, PooledObject<T> p) throws Exception {
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} closes a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} closes a client", key);
+        }

Also applies to: 56-61, 80-82, 93-94

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3055-3057: Null client passed to returnClient if getClient fails.

If pool.getClient(dummyDb) throws or returns null, the finally block will call pool.returnClient(dummyDb, null). Add a null check before returning.

🔎 Proposed fix
                     } finally {
-                        pool.returnClient(dummyDb, client);
+                        if (client != null) {
+                            pool.returnClient(dummyDb, client);
+                        }
                     }
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

131-136: Silent no-op when cache doesn't exist may leak clients.

If returnClient is called for a key that has no cache (e.g., after clear(key) was called or before any getClient call), the client is never returned to the underlying pool. This could lead to resource leaks.

🔎 Proposed fix
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
+        } else {
+            logger.warn("returnClient() called for non-existent cache key: {}, returning directly to pool", key);
+            try {
+                clientPool.returnObject(key, grpcClient);
+            } catch (Exception e) {
+                logger.error("Failed to return client directly to pool, exception: ", e);
+            }
         }
     }
🧹 Nitpick comments (2)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)

49-55: Example config now matches per-key sizing and multi-DB workload.

Using ServerUri in all ConnectConfig builders, tightening .maxIdlePerKey(1) / .maxTotalPerKey(5), and multiplying totals by dbNames.size() in the final summary correctly reflect the per-database loops.

As a small style tweak for a constant, you could make ServerUri private static final and rename it to SERVER_URI.

Also applies to: 295-301, 341-344

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)

172-199: Simplify search() client acquisition loop.

The initial pool.getClient(DemoKey) before the while (client == null) is redundant, since the loop immediately re-tries until a client is obtained. You can start with client = null and let the loop handle all attempts.

Possible simplification
-        MilvusClientV2 client = null;
-        try {
-            client = pool.getClient(DemoKey);
-            while (client == null) {
+        MilvusClientV2 client = null;
+        try {
+            while (client == null) {
                 try {
                     // getClient() might return null if it exceeds the borrowMaxWaitMillis when the pool is full.
                     // retry to call until it return a client.
                     client = pool.getClient(DemoKey);
                 } catch (Exception e) {
                     System.out.printf("Failed to get client, will retry, error: %s%n", e.getMessage());
                 }
             }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ad1e9f and 037dc0c.

📒 Files selected for processing (8)
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • examples/src/main/java/io/milvus/v2/ClientPoolExample.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.

Applied to files:

  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
🧬 Code graph analysis (2)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (7)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

194-236: getClient() selection logic matches the intended concurrency model.

Using a CopyOnWriteArrayList for activeClientList with only the scheduler mutating it, combined with per-wrapper ref counts and a lock only for the first client creation, gives a safe, low-contention min-load selection under concurrent access. No extra locking seems needed here. Based on learnings, this matches the single-writer/ref-counted design.

sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)

133-135: Per-key default limits are coherent with pool behavior.

minIdlePerKey = 1, maxIdlePerKey = 2, and maxTotalPerKey = 5 preserve min ≤ maxIdle ≤ maxTotal and match the small-per-key pool sizing used in the examples/tests. No changes needed.

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)

45-56: Static pool setup and warmup look good.

The shared MilvusClientV2Pool uses conservative per-key limits and preparePool(DemoKey) to pre-create the first client, which is a sensible baseline for the demo’s concurrent workload.

Also applies to: 61-64

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)

99-121: Lazy initialization with proper double-checked locking.

The per-key cache initialization uses correct double-checked locking to prevent multiple ClientCache instances from being created concurrently. The lock ensures only one cache is created per key.


224-230: Metrics query without side effects.

The method correctly uses clientsCache.get(key) to retrieve the cache without creating a new one, avoiding unintended side effects from metrics queries.


69-74: Per-key pool preparation reduces first-call latency.

The preparePool method correctly initializes the per-key cache and pre-warms the pool with minIdlePerKey clients, reducing latency for the first getClient() call.


156-186: Clear operations properly stop per-key timers.

The clear() and clear(String key) methods correctly stop the ClientCache timer threads before clearing. The comments appropriately document the edge case where unreturned clients will receive UNAVAILABLE errors after pool closure.

@yhmo yhmo force-pushed the mm branch 2 times, most recently from 4e3836e to eb41d5e on December 26, 2025 at 08:48

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (4)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)

58-60: Fix log level mismatch with isDebugEnabled() guard

Both in create() and destroyObject(), you guard with logger.isDebugEnabled() but log at INFO, so the guard is ineffective and the message always appears at INFO.

Proposed change
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} creates a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} creates a client", key);
+        }
...
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} closes a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} closes a client", key);
+        }

Also applies to: 80-82

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

3028-3057: Guard against returning a null client in Worker

If pool.getClient(dummyDb) ever returns null or throws before client is assigned, the finally block still calls pool.returnClient(dummyDb, client), passing null.

Proposed fix
                 } catch (Exception e) {
                     System.out.printf("request failed: %s%n", e);
                 } finally {
-                    pool.returnClient(dummyDb, client);
+                    if (client != null) {
+                        pool.returnClient(dummyDb, client);
+                    }
                 }
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

17-19: Align QPS threshold comment with constants

The comment still says “Treat qps >= 75 as high, qps <= 50 as low”, but THRESHOLD_INCREASE is 100 and THRESHOLD_DECREASE is 50, and checkQPS() compares directly to those constants.

Suggested fix

Either update the comment:

-    // Treat qps >= 75 as high, qps <= 50 as low
+    // Treat qps >= 100 as high, qps <= 50 as low

or, if 75 was the intended cutoff, change THRESHOLD_INCREASE accordingly so code and docs match.

Also applies to: 72-82

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

131-138: Handle missing cache keys in returnClient to avoid leaking clients

When returnClient(key, grpcClient) is invoked and clientsCache.get(key) returns null (e.g., after clear(key) or clear()), the method only logs a warning and never returns the client to the underlying GenericKeyedObjectPool. That client then stays outside the pool’s lifecycle and is never closed or reused.

More robust `returnClient` behavior
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
         } else {
-            logger.warn("No such key: {}", key);
+            logger.warn("returnClient() called for non-existent cache key: {}, returning directly to pool", key);
+            if (clientPool != null && !clientPool.isClosed() && grpcClient != null) {
+                try {
+                    clientPool.returnObject(key, grpcClient);
+                } catch (Exception e) {
+                    logger.error("Failed to return client directly to pool for key {}: ", key, e);
+                }
+            }
         }
     }

This preserves safety even if callers return a client after a key has been cleared.

Also applies to: 155-169, 176-188

🧹 Nitpick comments (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (2)

27-31: Consider making fetchClientPerSecond volatile for accurate cross-thread reads

fetchClientPerSecond is written by the scheduler thread in checkQPS() and read by other threads via fetchClientPerSecond(). Without volatile (or other synchronization), callers may observe stale values.

Low-impact improvement
-    private float fetchClientPerSecond = 0.0F;
+    private volatile float fetchClientPerSecond = 0.0F;

This keeps the metric best-effort but avoids visibility issues with the Java memory model.

Also applies to: 283-285


255-270: Downgrade log level in fetchFromPool for expected exhaustion cases

fetchFromPool() is an internal helper that may legitimately return null when the pool is exhausted or a borrow times out. Logging every such case at ERROR can be noisy under load.

Suggested adjustment
-        } catch (Exception e) {
-            // the pool might return timeout exception if it could not get a client in PoolConfig.maxBlockWaitDuration
-            // fetchFromPool() is internal use, return null here, let the caller handle.
-            logger.error("Failed to get client, exception: ", e);
-            return null;
-        }
+        } catch (Exception e) {
+            // The pool might time out if it cannot get a client within PoolConfig.maxBlockWaitDuration.
+            // fetchFromPool() is internal; return null and let the caller decide how to react.
+            logger.warn("Failed to get client from pool for key {}, returning null: ", key, e);
+            return null;
+        }

Or even debug if timeouts are considered normal in some scenarios.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 037dc0c and 4e3836e.

📒 Files selected for processing (8)
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • examples/src/main/java/io/milvus/v2/ClientPoolExample.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🚧 Files skipped from review as they are similar to previous changes (3)
  • examples/src/main/java/io/milvus/v2/ClientPoolExample.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.

Applied to files:

  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
🧬 Code graph analysis (2)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (13)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
  • ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/v2/client/MilvusClientV2.java (1)
  • MilvusClientV2 (74-1213)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
  • IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/AddFieldReq.java (1)
  • AddFieldReq (30-406)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (2)
  • CreateCollectionReq (36-1112)
  • CollectionSchema (404-529)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
  • DropCollectionReq (22-120)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/HasCollectionReq.java (1)
  • HasCollectionReq (22-80)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
  • InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
  • QueryReq (26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
  • SearchReq (32-482)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
  • InsertResp (25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/QueryResp.java (1)
  • QueryResp (27-122)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
  • SearchResp (27-185)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (6)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

39-41: configForKey implementation looks fine

Using a ConcurrentMap for per-key configs with a simple put is appropriate and thread-safe for this use case.


93-94: Good switch to throwable-aware logging in validateObject

Passing the exception as a separate parameter (logger.error("Failed to validate client, exception: ", e);) aligns with SLF4J best practices and preserves stack traces cleanly.

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)

172-199: Example logic and pool-usage pattern look solid

The search, concurrentSearch, and main methods exercise the new per-key pool API (preparePool, QPS metrics, idle/active counts) in a realistic way: clients are always returned in finally, metrics are sampled while the executor runs, and the pool is properly closed at the end.

Also applies to: 208-256, 258-272

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)

2975-3079: Per-key pool behavior and assertions look appropriate

The test’s use of minIdlePerKey, maxIdlePerKey, maxTotalPerKey, preparePool, and the per-key/total idle+active counters aligns with the new ClientCache/ClientPool behavior and validates the intended scaling and retirement semantics for the dummy DB.

Also applies to: 3090-3095

sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

194-236: Client selection and ref-counting design looks correct

The getClient() logic (min-load selection from activeClientList, with per-wrapper ref counting) combined with the single-writer pattern for list mutations and CopyOnWriteArrayList iteration gives a clean, lock-free read path for callers, with QPS-based scaling handled by the scheduler. The ClientWrapper equality semantics also correctly support matching a raw client instance in returnClient(...).

Based on learnings, this concurrency design for active/retire lists and ref counting is sound.

Also applies to: 287-333

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

65-75: Per-key cache wiring and metrics accessors look correct

The integration with ClientCache (preparePool, getClient, clear, per-key and total idle/active counts, and fetchClientPerSecond) is coherent and provides a clean per-key abstraction over GenericKeyedObjectPool, with timers properly stopped before clearing/closing the pool.

Also applies to: 158-169, 176-188, 196-232


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (6)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)

55-71: Align debug guard with debug-level logging in create and destroyObject

logger.isDebugEnabled() is used but the guarded calls are logger.info(...), so the guard is ineffective and the messages always log at INFO.

Proposed fix
     @Override
     public T create(String key) throws Exception {
         try {
-            if (logger.isDebugEnabled()) {
-                logger.info("PoolClientFactory key: {} creates a client", key);
-            }
+            if (logger.isDebugEnabled()) {
+                logger.debug("PoolClientFactory key: {} creates a client", key);
+            }
@@
     @Override
     public void destroyObject(String key, PooledObject<T> p) throws Exception {
-        if (logger.isDebugEnabled()) {
-            logger.info("PoolClientFactory key: {} closes a client", key);
-        }
+        if (logger.isDebugEnabled()) {
+            logger.debug("PoolClientFactory key: {} closes a client", key);
+        }

Also applies to: 78-85

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3187-3190: Bound the drain-wait loop to avoid hanging the test suite

The loop waiting for pool.getActiveClientNumber(key) > 1 has no timeout. If active clients never drop (due to a regression), this test will hang indefinitely.

Proposed bounded wait with assertion
-            while (pool.getActiveClientNumber(key) > 1) {
-                TimeUnit.SECONDS.sleep(1);
-                System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
-            }
+            int maxWaitSeconds = 30;
+            int waited = 0;
+            while (pool.getActiveClientNumber(key) > 1 && waited < maxWaitSeconds) {
+                TimeUnit.SECONDS.sleep(1);
+                waited++;
+                System.out.printf("waiting idle %d, active %d%n",
+                        pool.getIdleClientNumber(key), pool.getActiveClientNumber(key));
+            }
+            Assertions.assertTrue(pool.getActiveClientNumber(key) <= 1,
+                    String.format("Active clients did not drain within %d seconds", maxWaitSeconds));
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)

3036-3057: Add null check before returning client in finally block.

If pool.getClient(dummyDb) returns null (e.g., pool exhausted or cache creation failed), the finally block will call pool.returnClient(dummyDb, null). While the current returnClient implementation may handle this gracefully (line 136 in ClientPool.java logs a warning for missing cache), passing null is semantically incorrect and could mask pool issues.

🔎 Proposed fix
                     } finally {
-                        pool.returnClient(dummyDb, client);
+                        if (client != null) {
+                            pool.returnClient(dummyDb, client);
+                        }
                     }

3081-3088: Add timeout to prevent infinite hang in drain loop.

The loop waiting for pool.getActiveClientNumber(dummyDb) > 1 has no upper bound. If the pool's retirement logic has a bug and clients don't converge to the expected state, this test will hang indefinitely and block CI/CD pipelines.

🔎 Proposed fix
+            long waitStart = System.currentTimeMillis();
             while (pool.getActiveClientNumber(dummyDb) > 1) {
                 TimeUnit.SECONDS.sleep(1);
                 System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(dummyDb), pool.getActiveClientNumber(dummyDb));
+                if (System.currentTimeMillis() - waitStart > 60_000L) {
+                    Assertions.fail("Pool did not converge to a single active client within 60s. Current state: idle=" 
+                        + pool.getIdleClientNumber(dummyDb) + ", active=" + pool.getActiveClientNumber(dummyDb));
+                }
             }
sdk-core/src/main/java/io/milvus/pool/ClientPool.java (2)

21-21: Initialize cacheMapLock at field declaration to prevent NPE.

cacheMapLock is uninitialized in the protected no-arg constructor (lines 23-25). Any subclass or code path using this constructor will encounter a NullPointerException when cacheMapLock.lock() is invoked at line 108 in getCache(). Initialize the lock at declaration:

🔎 Proposed fix
-    private Lock cacheMapLock;
+    private final Lock cacheMapLock = new ReentrantLock(true);

Then remove the redundant initialization from line 30 in the 2-arg constructor.
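
For illustration, a minimal sketch of the field-at-declaration pattern this fix implies; the class shape, generic parameter, and field types below are assumptions for the example, not the actual ClientPool signature.

Illustrative sketch

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public abstract class PoolSkeleton<T> {
    // Created eagerly at declaration, so even the protected no-arg constructor
    // leaves the lock in a usable state.
    private final Lock cacheMapLock = new ReentrantLock(true);
    private final ConcurrentMap<String, T> clientsCache = new ConcurrentHashMap<>();

    protected PoolSkeleton() {
        // No lock assignment needed here; the declaration above covers every constructor path.
    }
}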


131-138: Return client to underlying pool when cache is missing.

If returnClient is called for a key whose cache was cleared (e.g., via clear(key)), the client is neither returned to the pool nor closed, causing a resource leak. While this scenario is unlikely in correct usage, defensive programming suggests falling back to the underlying pool:

🔎 Proposed fix
     public void returnClient(String key, T grpcClient) {
         ClientCache<T> cache = clientsCache.get(key);
         if (cache != null) {
             cache.returnClient(grpcClient);
         } else {
-            logger.warn("No such key: {}", key);
+            logger.warn("No cache for key: {}, returning client directly to pool", key);
+            try {
+                clientPool.returnObject(key, grpcClient);
+            } catch (Exception e) {
+                logger.error("Failed to return client to pool for key: {}", key, e);
+            }
         }
     }
🧹 Nitpick comments (5)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

27-31: Consider making fetchClientPerSecond volatile for clearer cross-thread visibility

fetchClientPerSecond is written from the scheduler thread and read from others without synchronization. Although float writes are atomic and eventual consistency is acceptable for metrics, marking it volatile would guarantee up-to-date reads with negligible cost.

Proposed tweak
-    private float fetchClientPerSecond = 0.0F;
+    private volatile float fetchClientPerSecond = 0.0F;

Also applies to: 95-97, 283-285

examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)

49-55: Treat ServerUri as an actual constant and follow Java naming conventions

public static String ServerUri is effectively a constant but mutable and non-idiomatic. Making it private static final and renaming to SERVER_URI improves clarity and prevents accidental modification.

Proposed change
-    public static String ServerUri = "http://localhost:19530";
+    private static final String SERVER_URI = "http://localhost:19530";
@@
-                    .uri(ServerUri)
+                    .uri(SERVER_URI)
@@
-                .uri(ServerUri)
+                .uri(SERVER_URI)

Also applies to: 97-101, 290-293

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (2)

172-185: Include the initial getClient call in the retry loop in search()

Right now the first pool.getClient(DemoKey) is outside the retry loop, so an exception there won’t be retried or logged like subsequent attempts. Folding that initial call into the while (client == null) loop keeps behavior consistent for all attempts.

Proposed refactor
-        try {
-            client = pool.getClient(DemoKey);
-            while (client == null) {
-                try {
-                    // getClient() might return null if it exceeds the borrowMaxWaitMillis when the pool is full.
-                    // retry to call until it return a client.
-                    client = pool.getClient(DemoKey);
-                } catch (Exception e) {
-                    System.out.printf("Failed to get client, will retry, error: %s%n", e.getMessage());
-                }
-            }
+        try {
+            while (client == null) {
+                try {
+                    // getClient() might return null if it exceeds the borrowMaxWaitMillis when the pool is full.
+                    // retry to call until it returns a client.
+                    client = pool.getClient(DemoKey);
+                } catch (Exception e) {
+                    System.out.printf("Failed to get client, will retry, error: %s%n", e.getMessage());
+                }
+            }

246-253: Optional: add a timeout when waiting for active clients to retire

The loop waiting for pool.getActiveClientNumber(DemoKey) > 1 has no upper bound, so a bug in retirement logic could make the demo hang. A simple max-wait counter (or time-based limit) with a message would keep behavior predictable.

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (1)

69-74: Consider documenting exception behavior in preparePool.

While the null check at line 71 guards against a missing cache, getCache(key) (lines 99-121) should never return null under normal circumstances—it creates a cache if absent. If ClientCache construction throws an exception, it will propagate to the caller. Consider documenting this behavior or adding exception handling if silent failure is preferred.
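
If documentation is the preferred route, a hedged sketch of what that note could look like; the Javadoc wording and the delegation to ClientCache.preparePool() are assumptions based on this review, not the exact implementation.

Illustrative sketch

/**
 * Pre-creates the per-key client cache and warms clients up to minIdlePerKey.
 * getCache(key) creates the cache on demand, so it does not return null in normal
 * operation; if ClientCache construction fails, the exception propagates to the caller.
 */
public void preparePool(String key) {
    ClientCache<T> cache = getCache(key);   // may throw if cache construction fails
    if (cache != null) {
        cache.preparePool();                // warm up clients for this key
    }
}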

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e3836e and eb41d5e.

📒 Files selected for processing (8)
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • examples/src/main/java/io/milvus/v2/ClientPoolExample.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.

Applied to files:

  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
🧬 Code graph analysis (5)
examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (1)
  • CreateCollectionReq (36-1112)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (9)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
  • IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
  • DropCollectionReq (22-120)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/HasCollectionReq.java (1)
  • HasCollectionReq (22-80)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
  • InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
  • QueryReq (26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
  • InsertResp (25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/QueryResp.java (1)
  • QueryResp (27-122)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
  • SearchResp (27-185)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Summary
🔇 Additional comments (11)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)

194-236: Min-load selection and ref-counting look correct with the single-writer + COW design

The getClient() implementation combined with ClientWrapper ref-counting is consistent with the single-writer pattern on activeClientList and retireClientList; iteration over CopyOnWriteArrayList without extra locking is safe here. The min-load selection logic is straightforward and appropriate for the expected small active-list size.

Based on learnings, this matches the intended concurrency model.
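
To make the described selection concrete, here is a hedged sketch of least-loaded selection over a CopyOnWriteArrayList of ref-counted wrappers; ClientWrapper's exact fields and methods are assumptions inferred from this review, not the SDK's API.

Illustrative sketch

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

class ClientWrapper<T> {
    private final T client;
    private final AtomicInteger refCount = new AtomicInteger(0);

    ClientWrapper(T client) { this.client = client; }

    T client() { return client; }
    int refCount() { return refCount.get(); }
    void acquire() { refCount.incrementAndGet(); }
    void release() { refCount.decrementAndGet(); }
}

class LeastLoadedSelector<T> {
    // Only the timer thread mutates this list; readers iterate a consistent COW snapshot.
    private final List<ClientWrapper<T>> activeClientList = new CopyOnWriteArrayList<>();

    T getClient() {
        ClientWrapper<T> best = null;
        for (ClientWrapper<T> w : activeClientList) {
            if (best == null || w.refCount() < best.refCount()) {
                best = w;               // pick the wrapper with the fewest in-flight requests
            }
        }
        if (best == null) {
            return null;                // caller may borrow a fresh client from the pool instead
        }
        best.acquire();                 // counted until the matching returnClient() releases it
        return best.client();
    }
}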

examples/src/main/java/io/milvus/v2/ClientPoolExample.java (1)

295-301: Updated pool sizing and final accounting look consistent

Using a tighter PoolConfig (maxIdlePerKey(1), maxTotalPerKey(5)) and computing totals with dbNames.size() correctly aligns resource usage and reported operation counts with the per-database workload.

Also applies to: 341-344

sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)

132-143: Conservative per-key defaults are reasonable and consistent with usage

The new Builder defaults (minIdlePerKey = 1, maxIdlePerKey = 2, maxTotalPerKey = 5) match the examples/tests and should reduce memory/connection footprint out of the box while still allowing tuning via setters.

sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (1)

39-53: configForKey as a void setter matches per-key config usage

Changing configForKey to a void method that just updates configForKeys makes it a straightforward per-key configuration setter and simplifies the API; this aligns with how it’s used from the pools.

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3137-3153: Per-key pool configuration and assertions in testClientPool look correct

The test’s use of minIdlePerKey, maxIdlePerKey, and maxTotalPerKey together with per-key/total active-idle assertions validates the new pool behavior nicely and exercises the adaptive scaling logic under load.

Also applies to: 3183-3185, 3191-3194

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)

3060-3072: LGTM: Proper ExecutorService lifecycle management.

The executor is correctly shut down and awaits termination with a reasonable timeout. The test fails explicitly if tasks don't complete within 100 seconds, preventing CI hangs.


3085-3092: Pool state assertions validate per-key behavior correctly.

The assertions confirm that the pool has converged to the expected state with maxIdlePerKey idle clients and 1 active client remaining. This validates the per-key pool lifecycle and retirement logic.

sdk-core/src/main/java/io/milvus/pool/ClientPool.java (4)

27-46: Constructor properly initializes per-key pool configuration.

The constructor correctly sets up the GenericKeyedObjectPoolConfig<T> with all necessary parameters from PoolConfig, including per-key limits, eviction settings, and test-on-borrow/return flags.
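
For readers unfamiliar with commons-pool2, a hedged sketch of how per-key limits, the borrow timeout, eviction, and test flags are commonly mapped onto GenericKeyedObjectPoolConfig; the concrete values mirror the sizes discussed in this review and are not the SDK's exact wiring.

Illustrative sketch

import java.time.Duration;
import org.apache.commons.pool2.impl.GenericKeyedObjectPoolConfig;

class PerKeyPoolConfigSketch {
    static <T> GenericKeyedObjectPoolConfig<T> build() {
        GenericKeyedObjectPoolConfig<T> cfg = new GenericKeyedObjectPoolConfig<>();
        cfg.setMinIdlePerKey(1);                                  // warm floor per key
        cfg.setMaxIdlePerKey(2);                                  // idle cap per key
        cfg.setMaxTotalPerKey(5);                                 // hard ceiling per key
        cfg.setBlockWhenExhausted(true);                          // borrowObject() blocks when the key is exhausted
        cfg.setMaxWait(Duration.ofSeconds(5));                    // upper bound on that blocking
        cfg.setTestOnBorrow(false);
        cfg.setTestOnReturn(false);
        cfg.setTimeBetweenEvictionRuns(Duration.ofSeconds(60));   // background sweep for idle eviction
        return cfg;
    }
}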


99-121: Double-checked locking pattern correctly prevents redundant cache creation.

The implementation ensures only one ClientCache instance is created per key, even under concurrent access. The lock is held only during initialization, minimizing contention. The pattern is sound once the cacheMapLock field initialization issue (line 21) is resolved.

Based on learnings: The ClientCache implementation uses CopyOnWriteArrayList with a single-writer pattern (timer thread only) for modifying client lists, ensuring thread-safe client selection in getClient() without additional locking.
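
A hedged sketch of the double-checked pattern as described, with a generic factory standing in for ClientCache construction; names and signatures are illustrative only.

Illustrative sketch

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class CacheRegistry<V> {
    private final ConcurrentMap<String, V> caches = new ConcurrentHashMap<>();
    private final Lock cacheMapLock = new ReentrantLock(true);

    V getCache(String key, Supplier<V> factory) {
        V cache = caches.get(key);          // fast path: no lock when the cache already exists
        if (cache == null) {
            cacheMapLock.lock();
            try {
                cache = caches.get(key);    // re-check under the lock
                if (cache == null) {
                    cache = factory.get();  // construct exactly one cache per key
                    caches.put(key, cache);
                }
            } finally {
                cacheMapLock.unlock();
            }
        }
        return cache;
    }
}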


159-169: Cache cleanup properly stops timers before clearing pool.

The clear() method correctly stops all per-key timers before clearing the cache map and underlying pool. This prevents timer threads from attempting to access cleared caches. The comment about unreturned clients receiving UNAVAILABLE errors provides useful context for understanding the cleanup semantics.


226-232: Metrics method correctly avoids side effects.

The fetchClientPerSecond method properly uses clientsCache.get(key) instead of getCache(key), ensuring no cache is created as a side effect of querying metrics. Returning 0.0F for a missing cache is semantically correct for a metrics query.
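
Roughly the shape described, shown only to contrast the plain get() with the cache-creating getCache(); the exact signature is an assumption.

Illustrative sketch

public float fetchClientPerSecond(String key) {
    ClientCache<T> cache = clientsCache.get(key);   // plain lookup: reading metrics must not create a cache
    return (cache == null) ? 0.0F : cache.fetchClientPerSecond();
}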

Signed-off-by: yhmo <yihua.mo@zilliz.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (5)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

58-60: Inconsistent log level: checking debug but logging info.

This issue was already flagged in previous review.


80-82: Same log level inconsistency as in create().

This issue was already flagged in previous review.

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)

3187-3190: Wait loop without timeout was previously flagged.

This concern was raised in a previous review suggesting a timeout to prevent indefinite hangs.

sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (2)

3036-3057: Null client can be passed to returnClient if getClient fails.

If pool.getClient(dummyDb) throws an exception or returns null, the finally block at line 3056 will call pool.returnClient(dummyDb, null). Add a null check before returning the client.

🔎 Proposed fix
                     } finally {
-                        pool.returnClient(dummyDb, client);
+                        if (client != null) {
+                            pool.returnClient(dummyDb, client);
+                        }
                     }

3081-3088: Add a timeout to the post-workload drain loop.

The loop waiting for pool.getActiveClientNumber(dummyDb) > 1 has no timeout. If the active count doesn't converge to 1 (e.g., due to a bug in retirement logic), this test can hang indefinitely and block CI/CD pipelines.

🔎 Proposed fix
+            long waitStart = System.currentTimeMillis();
             while (pool.getActiveClientNumber(dummyDb) > 1) {
                 TimeUnit.SECONDS.sleep(1);
                 System.out.printf("waiting idle %d, active %d%n", pool.getIdleClientNumber(dummyDb), pool.getActiveClientNumber(dummyDb));
+                if (System.currentTimeMillis() - waitStart > 60_000L) {
+                    Assertions.fail("Pool did not converge to a single active client within 60s");
+                }
             }
🧹 Nitpick comments (1)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (1)

246-252: Add timeout to prevent indefinite waiting.

The while loop at line 249 polls until active clients drain to 1, but has no timeout. If clients fail to become idle due to a bug, the demo will hang indefinitely. Consider adding a bounded wait for robustness.

🔎 Proposed fix with timeout
             // just demo the pool can automatically destroy idle clients, you can directly close the pool without waiting
             // it in practice.
+            int maxWaitSeconds = 60;
+            int waited = 0;
-            while (pool.getActiveClientNumber(DemoKey) > 1) {
+            while (pool.getActiveClientNumber(DemoKey) > 1 && waited < maxWaitSeconds) {
                 TimeUnit.SECONDS.sleep(1);
+                waited++;
                 printPoolState();
             }
+            if (waited >= maxWaitSeconds) {
+                System.out.println("Warning: Timed out waiting for clients to become idle");
+            }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb41d5e and 1ff9e91.

📒 Files selected for processing (8)
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • examples/src/main/java/io/milvus/v2/ClientPoolExample.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java
  • sdk-core/src/main/java/io/milvus/pool/PoolConfig.java
  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.
📚 Learning: 2025-12-26T03:01:52.831Z
Learnt from: yhmo
Repo: milvus-io/milvus-sdk-java PR: 1731
File: sdk-core/src/main/java/io/milvus/pool/ClientCache.java:194-233
Timestamp: 2025-12-26T03:01:52.831Z
Learning: In ClientCache<T> (sdk-core/src/main/java/io/milvus/pool/ClientCache.java), only the timer thread modifies activeClientList and retireClientList, establishing a single-writer pattern. Combined with CopyOnWriteArrayList and ref-counting via ClientWrapper, this ensures thread-safe client selection in getClient() without additional locking during iteration.

Applied to files:

  • sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java
  • examples/src/main/java/io/milvus/v2/ClientPoolDemo.java
  • sdk-core/src/main/java/io/milvus/pool/ClientPool.java
  • sdk-core/src/main/java/io/milvus/pool/ClientCache.java
🧬 Code graph analysis (4)
sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (11)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
  • ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
  • IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/AddFieldReq.java (1)
  • AddFieldReq (30-406)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (2)
  • CreateCollectionReq (36-1112)
  • CollectionSchema (404-529)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/HasCollectionReq.java (1)
  • HasCollectionReq (22-80)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
  • InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
  • QueryReq (26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
  • SearchReq (32-482)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
  • InsertResp (25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/QueryResp.java (1)
  • QueryResp (27-122)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
  • SearchResp (27-185)
sdk-core/src/test/java/io/milvus/v2/client/MilvusClientV2DockerTest.java (1)
sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)
  • PoolConfig (5-210)
sdk-core/src/main/java/io/milvus/pool/ClientCache.java (1)
sdk-core/src/main/java/io/milvus/v2/exception/MilvusClientException.java (1)
  • MilvusClientException (22-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (13)
sdk-core/src/main/java/io/milvus/pool/PoolClientFactory.java (2)

93-93: Good improvement: parameterized logging for exception.

The exception is now passed as a separate parameter rather than concatenated, which follows SLF4J best practices and ensures the full stack trace is logged.
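
For reference, the SLF4J convention the comment alludes to: placeholders for arguments, with the Throwable passed as the final parameter so the full stack trace is emitted; the message text and class here are illustrative.

Illustrative sketch

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggingStyle {
    private static final Logger logger = LoggerFactory.getLogger(LoggingStyle.class);

    void report(String key, Exception e) {
        // Anti-pattern: concatenating the exception into the message loses the stack trace.
        // logger.error("Failed to create client for key: " + key + ", error: " + e);

        // Preferred: parameterized message, Throwable last, so SLF4J logs the full trace.
        logger.error("Failed to create client for key: {}", key, e);
    }
}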


39-41: The method signature change appears safe. All usages of configForKey throughout the codebase (test files, examples, and the delegation in ClientPool.java) call the method without expecting or capturing a return value, making them compatible with the new void return type.

Likely an incorrect or invalid review comment.

sdk-core/src/main/java/io/milvus/pool/PoolConfig.java (1)

132-136: Significantly tighter default pool sizing.

The new defaults cap per-key pools far more tightly (while raising the idle floor slightly):

  • minIdlePerKey: 0 → 1
  • maxIdlePerKey: 10 → 2
  • maxTotalPerKey: 50 → 5

This is a breaking change for users relying on default values. Existing applications using PoolConfig.builder().build() without explicit sizing will now have much smaller pools, which could cause increased latency under load if clients are exhausted.

Consider documenting this change prominently in release notes.
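
Callers that depended on the previous sizing can opt back in explicitly; a hedged example, with values taken from the comparison above and builder setters as named elsewhere in this review.

Illustrative sketch

import io.milvus.pool.PoolConfig;

class LegacyPoolSizing {
    static PoolConfig legacyDefaults() {
        // Reinstates the pre-change sizing instead of relying on the new builder defaults.
        return PoolConfig.builder()
                .minIdlePerKey(0)
                .maxIdlePerKey(10)
                .maxTotalPerKey(50)
                .build();
    }
}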

examples/src/main/java/io/milvus/v2/ClientPoolExample.java (3)

50-50: Constant naming only partially improved.

Renaming serverUri to ServerUri signals that the field is intended as a constant, but the conventional form is a private static final field with an UPPER_SNAKE_CASE name (SERVER_URI), as the earlier nitpick on this file suggests.


295-301: Pool configuration aligns with new defaults.

The example now uses more restrictive pool settings (maxIdlePerKey=1, maxTotalPerKey=5) which align with the updated PoolConfig defaults and demonstrate realistic per-key pool usage.


342-343: Correct timing calculation for per-database operations.

The formula threadCount * repeatRequests * dbNames.size() correctly accounts for operations across all databases, fixing the previous calculation that only counted operations for a single database.

examples/src/main/java/io/milvus/v2/ClientPoolDemo.java (4)

43-67: LGTM: Static pool initialization with warmup.

The static block properly initializes the pool with appropriate configuration and calls preparePool(DemoKey) to pre-create clients according to minIdlePerKey, reducing first-call latency. The exception handling correctly wraps checked exceptions in RuntimeException.
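
A hedged sketch of that warmup pattern for reference; the pool class (MilvusClientV2Pool), the URI, and the sizing values are assumptions for illustration rather than the demo's exact code.

Illustrative sketch

import io.milvus.pool.MilvusClientV2Pool;
import io.milvus.pool.PoolConfig;
import io.milvus.v2.client.ConnectConfig;

class PooledSearchSkeleton {
    static final String DemoKey = "demo_key";
    static final MilvusClientV2Pool pool;

    static {
        try {
            ConnectConfig connectConfig = ConnectConfig.builder()
                    .uri("http://localhost:19530")
                    .build();
            PoolConfig poolConfig = PoolConfig.builder()
                    .maxTotalPerKey(5)
                    .build();
            pool = new MilvusClientV2Pool(poolConfig, connectConfig);
            pool.preparePool(DemoKey);   // pre-create clients up to minIdlePerKey before the first request
        } catch (Exception e) {
            throw new RuntimeException("Failed to initialize client pool", e);  // wrap checked exceptions
        }
    }
}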


69-127: Null client handling in finally block.

The finally block at lines 124-126 calls pool.returnClient(DemoKey, client) even when client could be null (from early return at line 77). Based on previous review discussion, returnClient safely handles null clients, so this is acceptable.


172-199: Retry loop for client acquisition is appropriate.

The retry loop at lines 176-184 handles the case where getClient() returns null or throws an exception when the pool is exhausted, ensuring the demo doesn't fail under heavy load. This pattern is consistent with ClientPoolExample.


258-272: Demo main method is well-structured.

The main method demonstrates a complete workflow: collection creation, data insertion, two rounds of concurrent searches with different thread counts, and proper pool cleanup. This provides a good reference for users.

sdk-core/src/test/java/io/milvus/client/MilvusClientDockerTest.java (3)

3141-3153: LGTM: Per-key pool configuration and preparation.

The test properly configures per-key pool limits and calls preparePool(key) before testing. The assertion at line 3153 correctly verifies that minIdlePerKey clients are pre-created as active clients in the pool.


3159-3172: Client handling in concurrent test loop.

The try-catch-finally pattern correctly handles client acquisition and return. Per previous discussion, returnClient safely handles null clients.


3183-3194: Good validation of pool metrics.

The assertions properly validate:

  • Active client count matches maxTotalPerKey after concurrent load
  • Total active/idle counts via new aggregate methods
  • QPS metric is logged for observability
  • Final state shows expected idle and active counts after drain

@yhmo yhmo added the lgtm label Dec 26, 2025
@sre-ci-robot sre-ci-robot merged commit 160c39c into milvus-io:master Dec 26, 2025
5 checks passed
@yhmo yhmo deleted the mm branch December 26, 2025 09:46
