
[BUG-EXTERNAL] Excessive $NextInner objects (EH and SB) #28235

@anuchandy

Description


Root cause analysis

Cx deployed an application using SB sessions to 10 pods. A heap dump taken from one pod after six days of execution shows excessive allocation of the "reactor.core.publisher.NextProcessor$NextInner" type.

[Figure: heap histogram from the dump]

Out of the 1,065,540 "NextInner" instances, 1,065,504 retain "MonoCacheTime" and "MonoCacheTime$CoordinatedSubscriber" instances, which also appear at the top of the above histogram.

[Figure: inbound references to the MonoCacheTime instances]

Roughly one million instances of these types is a clear symptom of a leak.

These 1,065,504 "NextInner" objects are retained by one "reactor.core.publisher.NextProcessor" object.

[Figures: the single NextProcessor instance retaining the NextInner objects]

that "NextProcessor" is retained by the private variable shutdownSignalSink (type=Sink.One) in ReactorConnection.

[Figure: shutdownSignalSink retaining the NextProcessor]

While the library subscribes to this shutdownSignalSink in multiple places, the only place that exposes it through the intermediate MonoCacheTime operator (seen in the second figure) by applying the cache() operator is the ReactorConnection::getShutdownSignals() method:

private final Sinks.One<AmqpShutdownSignal> shutdownSignalSink = Sinks.one();

@Override
public Flux<AmqpShutdownSignal> getShutdownSignals() {
    return shutdownSignalSink.asMono().cache().flux();
}

Scanning the source code for references to getShutdownSignals(), the only place it is referenced and subscribed to is the ReactorReceiver constructor:

amqpConnection.getShutdownSignals().flatMap(signal -> {
    logger.verbose("Shutdown signal received.");
    return closeAsync("Connection shutdown.", null);
}).subscribe();

But the subscription is DISPOSED in the ReactorReceiver close API.
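
For illustration, a minimal sketch of that subscribe-then-dispose pattern (the class, field, and method names below are hypothetical, not the SDK's actual implementation):

import reactor.core.Disposable;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Hypothetical sketch: subscribe to the shutdown signals in the constructor,
// keep the Disposable, and dispose it when the receiver is closed.
final class ReceiverSketch implements AutoCloseable {
    private final Disposable shutdownSubscription;

    ReceiverSketch(Flux<String> shutdownSignals) {
        this.shutdownSubscription = shutdownSignals
            .flatMap(this::handleShutdown)
            .subscribe();
    }

    private Mono<Void> handleShutdown(String signal) {
        // React to the connection shutdown here (close links, release resources, ...).
        return Mono.empty();
    }

    @Override
    public void close() {
        // Cancels the upstream subscription so the receiver no longer listens for signals.
        shutdownSubscription.dispose();
    }
}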

Had there been a bug where ReactorReceiver was not closed, there would be ~1M instances of ReactorReceiver; in reality, there are only 250 of them.

[Figure: ReactorReceiver instance count in the heap dump]

This proves that ReactorReceiver instances are not being leaked and are disposed correctly. That led us to examine the implementation of Sinks.One, where we identified a problem: it continues to retain the subscriber even after disposal. A ticket, "Memory leak in SinkOneMulticast" (reactor/reactor-core#3001), has been opened in the reactor-core repo, and the fix is coming in reactor-core 3.4.17.
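
To make the mechanism concrete, here is a minimal standalone sketch (not the SDK code) of the allocation pattern described above; on reactor-core versions affected by reactor/reactor-core#3001, the disposed inner subscribers can remain reachable from the single sink:

import reactor.core.Disposable;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

public final class SinkRetentionSketch {
    public static void main(String[] args) {
        // One long-lived sink, as in ReactorConnection.shutdownSignalSink.
        Sinks.One<String> shutdownSignalSink = Sinks.one();

        for (int i = 0; i < 1_000_000; i++) {
            // Mirrors getShutdownSignals(): a fresh cache()/MonoCacheTime per call.
            Flux<String> shutdownSignals = shutdownSignalSink.asMono().cache().flux();

            // Mirrors a ReactorReceiver lifetime: subscribe, then dispose on close.
            Disposable subscription = shutdownSignals.subscribe();
            subscription.dispose();
        }

        // On affected reactor-core versions, a heap dump taken here would show the
        // inner subscribers (NextInner / SinkOneMulticast inners) still retained by
        // the single sink instance; on 3.4.17+ they should be released on dispose.
    }
}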

Observation_1 (Cx action item)

You might have noticed that the reactor-core ticket mentions the type SinkOneMulticast, while the heap dump shows a different type, NextProcessor.

The reason is that the Cx application appears to have one or more dependencies pulling in a relatively old version of the reactor-core library (~7 months behind). The Azure Service Bus SDK is defined to use the more recent reactor-core 3.4.14.

Back in September 2021, the use of NextProcessor in Sinks.One was replaced with SinkOneMulticast in this commit.

Cx needs to analyze the dependencies and align the versions of shared libraries (e.g., reactor-core, reactor-netty, etc.) by upgrading them so that the application is ready to pick up reactor-core 3.4.17 once it is available.
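
As a quick sanity check after realigning the dependencies, a hypothetical diagnostic like the following (not part of the SDK) can confirm which reactor-core version actually ends up on the runtime classpath:

public final class ReactorVersionCheck {
    public static void main(String[] args) {
        // Reads the Implementation-Version from the reactor-core jar manifest, if present.
        Package reactorCore = reactor.core.publisher.Flux.class.getPackage();
        String version = reactorCore == null ? null : reactorCore.getImplementationVersion();
        // May print null if the jar's manifest does not carry an Implementation-Version entry.
        System.out.println("reactor-core on classpath: " + version);
    }
}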

Observation_2 (Cx action item)

The 1,065,504 instances of "NextProcessor$NextInner" mean that, over the period of 6 days, around ~1M ReactorReceiver objects were created and closed; that works out to roughly 125 ReactorReceiver instances created and disposed per minute (1,065,504 over ~8,640 minutes).

The only reason for such massive churn of these objects is that the consuming application is trying to acquire more sessions from the service than the producer application creates. Because of this, the SB service DETACHes those unnecessary receivers after a 1-minute timeout. This indicates that maxConcurrentSessions in the consumer application is too large; the Cx should tune this configuration according to the expected load.
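
For reference, a sketch of where that knob lives in the Service Bus session processor builder; the connection string, queue name, and the value 8 below are placeholders, and the right value depends on the expected number of concurrently active sessions:

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusProcessorClient;

public final class SessionTuningSketch {
    public static void main(String[] args) {
        ServiceBusProcessorClient processor = new ServiceBusClientBuilder()
            .connectionString("<connection-string>")    // placeholder
            .sessionProcessor()
            .queueName("<queue-name>")                  // placeholder
            // Keep this close to the number of sessions the producers actually create;
            // an oversized value leaves receivers idle until the service DETACHes them,
            // churning ReactorReceiver instances as described above.
            .maxConcurrentSessions(8)                   // placeholder value
            .processMessage(context -> context.complete())
            .processError(context -> { /* log / handle the error */ })
            .buildProcessorClient();

        processor.start();
    }
}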

Observation_3 (SDK action item)

The getShutdownSignals() API uses the cache() operator. We don't need cache() here; Sinks.One is capable of remembering the last signal and replaying it. While cache() doesn't directly contribute to the leak, removing it would save allocations.
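
A sketch of that simplification, assuming the replay behavior of Sinks.One described above is sufficient for all subscribers:

private final Sinks.One<AmqpShutdownSignal> shutdownSignalSink = Sinks.one();

@Override
public Flux<AmqpShutdownSignal> getShutdownSignals() {
    // Sinks.One already replays its result to late subscribers, so no cache() is needed.
    return shutdownSignalSink.asMono().flux();
}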

Also, azure-core should upgrade to reactor-core 3.4.17 once it is released.

Metadata

Labels

Event Hubs, Service Bus, pillar-reliability (The issue is related to reliability, one of our core engineering pillars; includes stress testing), tracking-external-issue (The issue is caused by an external problem (e.g. OS) - nothing we can do to fix it directly)
