Skip to content

Conversation

@kiryazovi-redis
Copy link
Collaborator

Summary

This PR adds comprehensive functional testing for Redis Enterprise maintenance event handling in Lettuce, including connection handoff, timeout relaxation, and notification protocol validation. The test suite validates proper client behavior during database migrations and failover scenarios.

Key Changes

New Test Infrastructure

  • MaintenanceNotificationConnectionTest

    • Comprehensive connection lifecycle testing during maintenance events
    • Tests for all 5 notification types: MOVING, MIGRATING, MIGRATED, FAILING_OVER, FAILED_OVER
    • Validates connection rebinding with different endpoint types (EXTERNAL_IP, EXTERNAL_FQDN, NONE)
    • Connection handoff verification with dual-connection scenarios
    • Traffic resumption testing with async GET/SET operations
    • BLPOP timeout unblocking during MOVING events with connection closure
    • EventBus monitoring for connection state transitions
  • ConnectionEventBusMonitoringUtil

    • Utility for tracking connection lifecycle events (connected, disconnected, activated, deactivated)
    • Supports connection handoff detection and verification
    • Provides waiting mechanisms for connection state transitions

Enhanced Tests

  • RelaxedTimeoutConfigurationTest

    • Comprehensive 4-phase MOVING timeout test: relaxed during MIGRATING → unrelaxed after MIGRATED → relaxed during MOVING → unrelaxed after MOVING completion
    • Failover timeout test: relaxed during FAILING_OVER → unrelaxed after FAILED_OVER
    • Optimized test execution time by 50%
    • Connection handoff testing integrated
  • ConnectionTestUtil

    • Added support for MaintenanceAwareExpiryWriter delegation
    • Improved channel extraction for maintenance-aware connections
  • RedisEnterpriseConfig

    • Removed hardcoded configurations
    • Support for 6 nodes and multiple databases
    • Dynamic cluster configuration refresh

Removed

  • MaintenanceNotificationTest - All functionality migrated to MaintenanceNotificationConnectionTest with improved test structure
  • FaultInjectionClientUnitTest - Consolidated into integration tests

Test Coverage

The test suite validates:

  1. Connection Rebinding - Proper connection handoff to new endpoints during maintenance
  2. Endpoint Type Support - Correct handling of EXTERNAL_IP, EXTERNAL_FQDN, and NONE endpoint configurations
  3. Traffic Continuity - Traffic resumes correctly after MOVING events with async operations
  4. Notification Protocol - Validation of push notification format for all 5 event types
  5. Timeout Relaxation - Dynamic timeout adjustment during maintenance windows
  6. Timeout De-relaxation - Proper timeout restoration after maintenance completion
  7. Connection Lifecycle - EventBus monitoring of connection state transitions
  8. BLPOP Handling - Blocking operations correctly unblocked during connection transitions

Performance Improvements

  • Reduced test execution time from 2 hrs to ~20-ish mins
    • Removal of unnecessary cleanup operations
    • Elimination of hardcoded sleeps in favor of functional await conditions
    • Optimized test setup and teardown

Testing

All tests pass successfully with proper logging enabled for debugging purposes.

…rise maintenance events

- Add ConnectionTesting class with 9 test scenarios for maintenance handoff behavior
- Test old connection graceful shutdown during MOVING operations
- Validate traffic resumption with autoconnect after handoff
- Verify maintenance notifications only work with RESP3 protocol
- Test new connection establishment during migration and bind phases
- Add memory leak validation for multiple concurrent connections
- Include TLS support testing for maintenance events
- Replace .supportMaintenanceEvents(true) with MaintenanceEventsOptions.enabled()
- Add comprehensive monitoring and validation of connection lifecycle

Tests cover CAE-1130 requirements for Redis Enterprise maintenance event handling
including connection draining, autoconnect behavior, and notification delivery.
…IONS

- connectionHandshakeIncludesEnablingNotificationsTest: Verifies all 5 notification types (MOVING, MIGRATING, MIGRATED, FAILING_OVER, FAILED_OVER) are received when maintenance events are enabled
- disabledDontReceiveNotificationsTest: Verifies no notifications received when maintenance events are disabled
- clientHandshakeWithEndpointTypeTest: Tests CLIENT MAINT_NOTIFICATIONS with 'none' endpoint type (nil IP scenario)
- clientMaintenanceNotificationInfoTest: Verifies CLIENT MAINT_NOTIFICATIONS configuration with moving-endpoint-type

Based on CLIENT MAINT_NOTIFICATIONS implementation from commit bd408cf
- Update push notification patterns to include sequence numbers (4-element format)
- Fix MOVING notification parsing to handle new address format with sequence and time
- Update MIGRATING, MIGRATED, FAILING_OVER, and FAILED_OVER patterns with sequence numbers
- Improve FaultInjectionClient status handling: change from 'pending' to 'running' checks
- Enhance JSON response parsing with better output field handling and debugging
- Remove deprecated maintenance sequence functionality and associated unit test
- Add test phase isolation to prevent cleanup notification interference
- Extend monitoring timeout from 2 to 5 minutes for longer maintenance operations
- Add @AfterEach cleanup to restore cluster state between tests
- Remove hardcoded optimal node selection logic in RedisEnterpriseConfig

This aligns with the updated Redis Enterprise maintenance events specification
and improves test reliability by handling the new notification protocol format.
…and-connection-testing-of-maint-events-2

Ci fix functional handoff and connection testing of maint events 2
…ow, also implement more fixes and improvements
…nctionally correct, enable logging for testing
@jit-ci
Copy link

jit-ci bot commented Oct 1, 2025

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

@kiryazovi-redis kiryazovi-redis requested review from Copilot and removed request for Copilot October 1, 2025 17:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive functional testing for Redis Enterprise maintenance event handling in Lettuce, with a focus on connection lifecycle management, notification validation, and timeout relaxation. The test suite validates proper client behavior during database migrations and failover scenarios.

Key changes:

  • New comprehensive test class MaintenanceNotificationConnectionTest for testing all 5 maintenance notification types with connection management
  • Enhanced timeout configuration tests with 4-phase validation and optimized execution
  • Complete removal of legacy test classes and their consolidation into improved integration tests

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/test/resources/log4j2-test.xml Enhanced logging configuration for maintenance-aware components
src/test/java/io/lettuce/test/ConnectionTestUtil.java Added MaintenanceAwareExpiryWriter support and connection verification utilities
src/test/java/io/lettuce/scenario/RelaxedTimeoutConfigurationTest.java Comprehensive 4-phase timeout testing with optimized execution and connection handoff
src/test/java/io/lettuce/scenario/RedisEnterpriseConfig.java Dynamic cluster configuration with 6-node support and endpoint-aware operations
src/test/java/io/lettuce/scenario/MaintenancePushNotificationMonitor.java Simplified monitoring without periodic pings and correct notification format parsing
src/test/java/io/lettuce/scenario/MaintenanceNotificationTest.java Removed - functionality migrated to MaintenanceNotificationConnectionTest
src/test/java/io/lettuce/scenario/MaintenanceNotificationConnectionTest.java New comprehensive test suite for all maintenance scenarios with connection lifecycle validation
src/test/java/io/lettuce/scenario/FaultInjectionClientUnitTest.java Removed - consolidated into integration tests
src/test/java/io/lettuce/scenario/FaultInjectionClient.java Enhanced with endpoint-aware operations and improved status checking
src/test/java/io/lettuce/scenario/ConnectionEventBusMonitoringUtil.java New utility for monitoring connection state transitions via EventBus
Comments suppressed due to low confidence (1)

src/test/java/io/lettuce/scenario/MaintenanceNotificationConnectionTest.java:1

  • Using fixed Thread.sleep() delays in tests can lead to flaky behavior and unnecessarily slow test execution. Consider using condition-based waiting or shorter, more appropriate delays for these specific scenarios.
package io.lettuce.scenario;

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

log.info("Testing unrelaxed timeouts after MIGRATED notification...");
log.info("Waiting for grace period ({}ms) to allow timeout un-relaxation to take effect...",
2 * RELAXED_TIMEOUT_ADDITION.toMillis());
Thread.sleep(2 * RELAXED_TIMEOUT_ADDITION.toMillis()); // Wait for grace period + 100ms buffer
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Thread.sleep() for timing in tests can lead to flaky behavior. Consider using Awaitility or similar polling mechanisms to wait for the actual condition (timeout un-relaxation) rather than relying on fixed sleep durations.

Copilot uses AI. Check for mistakes.
Comment on lines +661 to +664
log.info("Waiting for grace period ({}ms) to allow timeout un-relaxation to take effect...",
2 * RELAXED_TIMEOUT_ADDITION.toMillis());
Thread.sleep(2 * RELAXED_TIMEOUT_ADDITION.toMillis()); // Wait for grace period + 100ms buffer
log.info("Grace period expired, starting traffic to verify unrelaxed timeouts...");
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another instance of Thread.sleep() that could make tests flaky. This should be replaced with a condition-based wait mechanism to ensure the timeout un-relaxation has actually taken effect before proceeding with verification.

Suggested change
log.info("Waiting for grace period ({}ms) to allow timeout un-relaxation to take effect...",
2 * RELAXED_TIMEOUT_ADDITION.toMillis());
Thread.sleep(2 * RELAXED_TIMEOUT_ADDITION.toMillis()); // Wait for grace period + 100ms buffer
log.info("Grace period expired, starting traffic to verify unrelaxed timeouts...");
log.info("Waiting for timeout un-relaxation to take effect (up to {}ms)...",
2 * RELAXED_TIMEOUT_ADDITION.toMillis());
waitForTimeoutUnrelaxed(context, 2 * RELAXED_TIMEOUT_ADDITION.toMillis());
log.info("Timeout un-relaxation detected, starting traffic to verify unrelaxed timeouts...");

Copilot uses AI. Check for mistakes.
}

// Throttle traffic to ~1000 ops/sec to avoid memory pressure
Thread.sleep(1);
Copy link

Copilot AI Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Thread.sleep(1) in a tight loop for traffic throttling is inefficient and imprecise. Consider using a proper rate limiter or a more accurate timing mechanism for controlling traffic generation at ~1000 ops/sec.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@tishun tishun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, small notes only

import io.netty.util.internal.logging.InternalLogger;
import io.netty.util.internal.logging.InternalLoggerFactory;

public class ConnectionEventBusMonitoringUtil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid the "Util" pattern. Utility classes quickly become swiss-army-knives and violate the single responsibility principle.

Could we just name it ConnectionEventBusMonitor or something similar?

Comment on lines +92 to +100
} catch (Exception e) {
if (event instanceof ConnectedEvent) {
return "connected-" + ((ConnectedEvent) event).remoteAddress().toString();
} else if (event instanceof DisconnectedEvent) {
return "disconnected-" + ((DisconnectedEvent) event).remoteAddress().toString();
} else {
return event.getClass().getSimpleName() + "-" + System.currentTimeMillis();
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of if-else and instanceof can we use multiple catch statements?

expectedEndpoint, cleanCurrentEndpoint).isTrue();
}
} else {
log.warn("⚠ Could not verify endpoint - currentRemoteAddress: {}, expectedEndpoint: {}", currentRemoteAddress,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this fail the test?

log.info("Second connection created and monitoring setup completed");

} catch (Exception e) {
log.error("Failed to create second connection: {}", e.getMessage(), e);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here - should the test fail?

/**
* Configuration holder for dynamically discovered Redis Enterprise cluster information.
*/
public class RedisEnterpriseConfig {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RedisEnterpriseConfig - a "Config" object is typically a POJO that has some configuration details in it. In contrast this class seems to be configuring the RS
So I'd put some other end to the name like "Configurer" or idk

@kiryazovi-redis kiryazovi-redis merged commit e29bff2 into redis:main Oct 2, 2025
11 checks passed
@kiryazovi-redis
Copy link
Collaborator Author

I will take care of Tisho's comments on the topic in next PR.
LIkely will take more comments from Ivo also.

@ggivo ggivo added the type: task A general task label Oct 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type: task A general task

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants