
[Tiered Caching] Performance improvement for TieredCaching  #13989

@sgup432

Description


Is your feature request related to a problem? Please describe

The current implementation of TieredCaching.computeIfAbsent() performs poorly, as discovered while running performance benchmarks. Before we delve into the root cause, let me elaborate on the current implementation.

Current implementation:

As of now, the TieredCaching implementation has an on-heap and a disk tier, each of which is individually thread-safe. But we need to ensure that the TieredCache as a whole is thread-safe and doesn't allow any key to be cached onto both tiers.

So we implemented a single global read/write lock(link), which ensures that any write operation on the TieredCache is performed by only one thread at a time, while allowing multiple readers to access the cache via the read lock.

Performance issues with the current approach:

  1. Since we have a single read/write lock, the TieredCache effectively becomes a single-segment cache and loses the write-throughput benefits of a segmented cache (unlike the underlying on-heap cache, which has 256 segments). As a result, only one thread can write to the cache at a time, which hurts both write and read throughput.
  2. On a cache miss, we recompute the query under the write lock(link), which makes other threads wait and degrades performance significantly.
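To illustrate issue (2), here is a minimal sketch of the problematic pattern (class and field names are hypothetical; the actual OpenSearch code differs):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical illustration of a tiered cache guarded by one global
// read/write lock. On a miss, the expensive loader runs while the write
// lock is held, so every other reader and writer blocks for the full
// duration of the query recompute.
public class GloballyLockedCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public V computeIfAbsent(K key, Function<K, V> loader) {
        lock.readLock().lock();
        try {
            V v = cache.get(key);
            if (v != null) return v;
        } finally {
            lock.readLock().unlock();
        }
        lock.writeLock().lock();
        try {
            // The recompute happens here, under the write lock;
            // this is the source of issue (2) above.
            return cache.computeIfAbsent(key, loader);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```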

Describe the solution you'd like

To address above performance issues:

1. Firstly, we need to move the query recompute (on a cache miss) outside the write lock in computeIfAbsent.

This can probably be achieved as below (reusing the logic in Cache.java):

  • Create a concurrent hashmap of Map<Key, CompletableFuture>. This will temporarily hold in-flight queries for a short duration.
  • Multiple requests for the same key A land in computeIfAbsent while key A is not yet present in the cache.
  • Only one thread succeeds in putting a future object into the map for key A; the rest reuse the same future.
  • The thread that succeeded in putting the future into the map loads the query. The rest of the threads wait via future.get().
  • Once the query is loaded, that thread puts the item onto the on-heap cache via a put call, and the remaining threads return the value.
  • Finally, remove the key from the map.
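The steps above can be sketched as follows (a minimal, hypothetical version; the real Cache.java handles listeners, eviction, and exceptions more carefully):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the proposed scheme: a ConcurrentHashMap temporarily holds an
// in-flight CompletableFuture per key, so only one thread loads a missing
// value while the rest wait on the same future. No global lock is held
// during the recompute.
public class FutureBackedCache<K, V> {
    private final Map<K, V> onHeap = new ConcurrentHashMap<>();           // stand-in for the on-heap tier
    private final Map<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public V computeIfAbsent(K key, Function<K, V> loader) throws Exception {
        V cached = onHeap.get(key);
        if (cached != null) {
            return cached;
        }
        CompletableFuture<V> future = new CompletableFuture<>();
        // Only one thread wins the race to install the future for this key.
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, future);
        if (existing != null) {
            // Another thread is already loading; wait for its result.
            return existing.get();
        }
        try {
            V value = loader.apply(key);   // recompute happens outside any global lock
            onHeap.put(key, value);        // put the item onto the on-heap tier
            future.complete(value);
            return value;
        } catch (Exception e) {
            future.completeExceptionally(e);
            throw e;
        } finally {
            inFlight.remove(key);          // clean up the temporary entry
        }
    }
}
```

This gives per-key deduplication of loads: concurrent callers for key A pay for one recompute, while callers for other keys proceed independently.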

2. Avoid using a single read/write lock

Possible solutions:

  • Segmented cache: Instead of using a single read/write lock and a single segment, we can use a segmented tiered cache to improve read/write performance.
  • Lock per key: Since the lock in TieredCache exists to ensure that the same key never lands in both caches, we could create a lock per key and delete it once the request completes.
    • There might be some overhead in creating locks frequently. (Will run performance tests.)
    • One pro of this approach is that we don't really need to move the query recompute out of the write lock, since requests for other keys will not be blocked.
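The lock-per-key idea could be sketched roughly as below (all names hypothetical; this is not the OpenSearch implementation):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of per-key locking: a lock is created on demand for
// each key, serializing tier movements for that key only, and removed
// afterwards so the map does not grow unboundedly.
public class KeyedLockManager<K> {
    private final ConcurrentMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void withLock(K key, Runnable criticalSection) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            criticalSection.run();
        } finally {
            lock.unlock();
            // Best-effort cleanup; a production version would need reference
            // counting so a lock is not removed while another thread is about
            // to acquire the same instance.
            locks.remove(key, lock);
        }
    }
}
```

Operations on different keys take different locks and never contend, which is why the recompute no longer needs to move out of the critical section for correctness of other keys.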

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

Metadata


Labels

Search:Performance, enhancement (Enhancement or improvement to existing feature or request), v2.18.0 (Issues and PRs related to version 2.18.0)


Status

✅ Done
