Is your feature request related to a problem? Please describe
Today, OpenSearch exposes an HTTP-based API for indexing, in which users invoke an endpoint to push changes. It also has a “_bulk” API to group multiple operations into a single request.
There are some shortcomings with the push-based API, especially for complex applications at scale:
- Ingestion spikes beyond server capacity can result in request rejections and backpressure on the clients, which adds complexity to the producing applications, since they must handle that backpressure properly.
- In complex applications, not all index requests are equally important, but the current HTTP API offers no good way to differentiate index requests by priority. This is particularly challenging for heavy ingestion workloads.
- Some scenarios require replaying indexing requests. For example, when a cluster is restored from a previous snapshot, the ingestion history since that snapshot needs to be replayed. Another example is live cluster migration, where the same indexing has to be applied to two clusters.
- In segment-replication mode, the translog provides durability for staged index changes before they are committed, but it can grow excessively under heavy ingestion. In document-replication mode, each request has to wait for the document to be replicated synchronously. Neither the translog nor synchronous replication is necessary in streaming ingestion mode, because the streaming buffer itself provides the durability guarantee.
Describe the solution you'd like
In general, a streaming ingestion solution brings several benefits and addresses the challenges above:
- By introducing a streaming buffer, it’s possible to decouple the producers (i.e. the applications that generate index operations) from the consumers (i.e. the OpenSearch servers). With that, ingestion spikes can be absorbed and smoothed by the buffer without blocking the producers.
- Streaming ingestion can be extended with priority-aware ingestion by using separate Kafka topics to isolate events by priority and applying different rate-limit thresholds so that important events are processed first.
- A streaming platform like Kafka stores messages in a durable log and allows consumers to re-read past messages via offsets, which enables reprocessing and replay.
- A durable messaging system like Kafka provides a durability guarantee for messages while maintaining high write throughput, so the ingestion component in OpenSearch can be simplified by delegating message durability to the messaging system.
More details of the solution can be found in this document. The sketches below illustrate the producer-side decoupling, priority routing, and offset-based replay described above.
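As a rough illustration of the decoupling and priority-aware routing, here is a minimal Java sketch of a client-side producer that publishes index operations to priority-specific Kafka topics instead of pushing them through the HTTP _bulk API. The topic names, priority scheme, and JSON payload format are assumptions for illustration only, not part of this proposal’s design:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class PriorityAwareIngestProducer {
    // Hypothetical topic names; the real topic layout would come from the solution design.
    private static final String HIGH_PRIORITY_TOPIC = "opensearch-ingest-high";
    private static final String LOW_PRIORITY_TOPIC = "opensearch-ingest-low";

    private final KafkaProducer<String, String> producer;

    public PriorityAwareIngestProducer(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all so the stream itself provides the durability guarantee for accepted operations.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        this.producer = new KafkaProducer<>(props);
    }

    /**
     * Publishes one index operation (already serialized as JSON) to the topic that
     * matches its priority, instead of calling the HTTP _bulk API.
     */
    public void send(String docId, String indexOperationJson, boolean highPriority) {
        String topic = highPriority ? HIGH_PRIORITY_TOPIC : LOW_PRIORITY_TOPIC;
        // Keying by document id keeps updates to the same document ordered within a partition.
        producer.send(new ProducerRecord<>(topic, docId, indexOperationJson));
    }

    public void close() {
        producer.flush();
        producer.close();
    }
}
```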
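Similarly, the replay capability can be sketched with the standard Kafka consumer API by seeking back to an earlier offset and re-reading the retained log. The topic, partition, starting offset, and the re-apply hook below are placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class IngestReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ingest-replay");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Hypothetical topic, partition, and offset; in practice the offset would come from
            // snapshot metadata (e.g. the last offset applied before the snapshot was taken).
            TopicPartition partition = new TopicPartition("opensearch-ingest-high", 0);
            long replayFromOffset = 42_000L;

            consumer.assign(List.of(partition));
            consumer.seek(partition, replayFromOffset);

            // Re-read the retained log from that offset and re-apply each index operation.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("replaying offset=%d doc=%s%n", record.offset(), record.key());
                // applyIndexOperation(record.value());  // hypothetical re-apply hook
            }
        }
    }
}
```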
Related component
Indexing
Describe alternatives you've considered
As an alternative, a plugin-based approach runs a streaming-ingester process as a sidecar to the OpenSearch server on the same host, as described in this section.
Additional context
No response