PIP-137: Pulsar Client Shared State API #13490
Comments
This PIP number duplicates #13408.
@mattisonchao yes you could, but most of the code is already in the attached GitHub repo.
We discussed this PIP in the Community meeting. The main concern from @merlimat is that exposing a "put" operation gives the false illusion of something like a Map, while every "write" operation will actually be "slow" because we need to acquire the Producer lock. @merlimat suggested splitting this into two distinct interfaces, an API for readers and one for writers, possibly extending the work done on PIP-104 TableViews (#12356). I am fine with extending PIP-104 in that direction (adding a writer side to the TableView), but I believe the API proposed here achieves the goal of having a generic "Java object" as shared state, by letting the developer of the Shared State Object deal with the internal representation. @merlimat WDYT?
The issue had no activity for 30 days, mark with Stale label. |
Motivation
Sometimes, in a distributed application or library that already uses Pulsar, you need to share some "state" across several instances of the application, for example:
Such cases are also very frequent while developing Pulsar IO Connectors or Pulsar Broker Protocol Handlers.
Currently you end up adding an extra component to the application, such as a database, or using the internal ZooKeeper or BookKeeper/Distributed Log components that support Pulsar.
This is usually awkward both for the developers and for system administrators.
We can provide a built-in mechanism in the Pulsar client API to support building such shared data structures.
In fact, since Pulsar 2.8.0 we have the Exclusive Producer, which allows you to use Pulsar as a consistent write-ahead log for replicated state machines.
We can provide an API to handle a shared distributed Java Object: each client can access the Object and mutate the State, while consistency is preserved.
This is a sample implementation: https://github.com/eolivelli/pulsar-shared-state-manager
Goal
It is not a goal to implement a Pulsar-backed database system.
API Changes
PulsarMap recipe, interface:
Implementation
The proposal is to add this SharedStateManager API as part of the Java Pulsar Client API:
This way the API and the implementation will be available to every Pulsar Client user and also for Pulsar IO Connectors and Pulsar Broker Protocol Handlers.
An alternative is to put it in the pulsar-adapters repository, but that would make the API harder to discover and would require Pulsar IO Adapters and Broker Protocol Handlers to bundle copies of this new API into their .nar files.
The SharedStateManager holds in memory a reference to a Java object that represents the State. There is a non-partitioned Pulsar topic that stores all the changes to the Java object.
In order to update the State the local SharedStateManager performs these steps:
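As a rough illustration of such an update cycle, here is a minimal in-memory sketch. It does not use the Pulsar client at all: `InMemoryChangeLog` and `SketchStateManager` are hypothetical names, the list stands in for the non-partitioned topic, and a `ReentrantLock` stands in for the Exclusive Producer. The sequence it assumes (acquire the writer, catch up to the tail, compute the changes, append them, apply them locally, release the writer) is one plausible reading of the design, not the proposal's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical stand-in for the topic: an append-only list of changes.
final class InMemoryChangeLog<C> {
    private final List<C> entries = new ArrayList<>();
    // Stands in for the Exclusive Producer: at most one writer at a time.
    private final ReentrantLock writerLock = new ReentrantLock();

    void acquireWriter() { writerLock.lock(); }
    void releaseWriter() { writerLock.unlock(); }
    synchronized void append(C change) { entries.add(change); }
    synchronized int size() { return entries.size(); }
    synchronized C get(int index) { return entries.get(index); }
}

// Hypothetical manager: one instance per client, all sharing the same log.
final class SketchStateManager<S, C> {
    private final InMemoryChangeLog<C> log;
    private final BiFunction<S, C, S> applyChange;
    private S state;
    private int position = 0; // index of the next log entry to apply

    SketchStateManager(InMemoryChangeLog<C> log, S initialState,
                       BiFunction<S, C, S> applyChange) {
        this.log = log;
        this.state = initialState;
        this.applyChange = applyChange;
    }

    // Applies every log entry that this instance has not seen yet.
    private void catchUp() {
        while (position < log.size()) {
            state = applyChange.apply(state, log.get(position));
            position++;
        }
    }

    // Runs the operation against the latest state while holding the writer.
    S update(Function<S, List<C>> operation) {
        log.acquireWriter();
        try {
            catchUp();                                   // see all earlier writes
            for (C change : operation.apply(state)) {
                log.append(change);                      // persist the changes
            }
            catchUp();                                   // apply them locally
            return state;
        } finally {
            log.releaseWriter();
        }
    }

    // Returns the in-memory state without touching the log (may be stale).
    S readLocal() { return state; }
}
```

With two managers over the same log, an update on the second one first replays what the first one wrote, so the operation always sees the latest state. The cost is that every write serializes on the writer lock.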
There are two ways to read the State:
If you want to ensure strong consistency, you perform the "read" operation together with a dummy write operation, so that it runs inside the implicit Lock acquired by the Exclusive Producer.
At bootstrap we read the whole topic (from the beginning to the tail) in order to build the State.
We do not want to require the Client application to store the State locally.
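The read paths and the bootstrap replay can be sketched in the same in-memory style. This is only an illustration under stated assumptions: `SketchTopic` and `SketchStateReader` are hypothetical names, the State is simply the list of changes applied so far, and the "dummy write" is modelled as taking the writer lock without appending anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the topic and its Exclusive Producer lock.
final class SketchTopic {
    private final List<String> entries = new ArrayList<>();
    final ReentrantLock exclusiveProducer = new ReentrantLock();

    // Appends one change while holding the writer lock.
    void append(String change) {
        exclusiveProducer.lock();
        try {
            synchronized (entries) {
                entries.add(change);
            }
        } finally {
            exclusiveProducer.unlock();
        }
    }

    // Returns a copy of the entries from the given position to the tail.
    List<String> readFrom(int position) {
        synchronized (entries) {
            return new ArrayList<>(entries.subList(position, entries.size()));
        }
    }
}

// Hypothetical reader: keeps the State in memory only, rebuilt from the topic.
final class SketchStateReader {
    private final SketchTopic topic;
    private final List<String> state = new ArrayList<>();

    SketchStateReader(SketchTopic topic) {
        this.topic = topic;
        replayToTail(); // bootstrap: read from the beginning to the tail
    }

    // Applies every entry not yet seen; state.size() doubles as the position.
    private void replayToTail() {
        state.addAll(topic.readFrom(state.size()));
    }

    // Fast path: returns the in-memory State, which may lag behind the topic.
    List<String> readLocal() {
        return new ArrayList<>(state);
    }

    // Strongly consistent path: holds the writer lock, so no write can
    // interleave, and catches up to the tail before returning.
    List<String> readConsistent() {
        topic.exclusiveProducer.lock();
        try {
            replayToTail();
            return new ArrayList<>(state);
        } finally {
            topic.exclusiveProducer.unlock();
        }
    }
}
```

The trade-off is visible in the two methods: `readLocal` never blocks but may return stale data, while `readConsistent` pays the same locking cost as a write.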
This sample PulsarMap implementation describes how to use the SharedStateManager:
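For orientation, a map recipe of this kind could look roughly like the following self-contained sketch. It is not the PulsarMap from the linked repository: `MapChange` and `SketchPulsarMap` are hypothetical names, a plain `List` stands in for the topic, and its monitor stands in for the Exclusive Producer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One change in the log: a put (value != null) or a delete (value == null).
final class MapChange {
    final String key;
    final String value;

    MapChange(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical map replica; all replicas share one change log (the "topic").
final class SketchPulsarMap {
    private final List<MapChange> sharedLog;
    private final Map<String, String> local = new HashMap<>();
    private int position = 0; // index of the next log entry to apply

    SketchPulsarMap(List<MapChange> sharedLog) {
        this.sharedLog = sharedLog;
        sync(); // bootstrap by replaying the whole log
    }

    // Applies the log entries this replica has not seen yet.
    void sync() {
        synchronized (sharedLog) {
            while (position < sharedLog.size()) {
                MapChange change = sharedLog.get(position++);
                if (change.value == null) {
                    local.remove(change.key);
                } else {
                    local.put(change.key, change.value);
                }
            }
        }
    }

    void put(String key, String value) { write(new MapChange(key, value)); }

    void delete(String key) { write(new MapChange(key, null)); }

    // Every write takes the log monitor, catches up, appends, then applies.
    private void write(MapChange change) {
        synchronized (sharedLog) {
            sync();
            sharedLog.add(change);
            sync();
        }
    }

    // Local read: fast, but may be stale until sync() or the next write.
    String get(String key) { return local.get(key); }
}
```

Note that `put` here is not a cheap in-memory operation: it goes through the writer lock on every call, which is the concern raised in the comments about exposing a Map-like "put".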
Future works and other considerations
Depending on how the Shared State is implemented (this is up to the developer, that is, the user of the new API), you may need to set infinite retention on the support topic; otherwise you may lose some changes from the commit log.
Pulsar is very flexible, and initially we can leave it to the user to configure the system properly, because the goal is to provide a basic API for easily building a Shared State Manager using the Exclusive Producer API together with the Reader API.
In the future we can implement more advanced features, such as checkpoints or leveraging compacted topics, but this can be done as follow-up work.
Rejected Alternatives
None