-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/oap/cluster plugin #1440
Feature/oap/cluster plugin #1440
Conversation
Some test cases unfinished, I will supplement them later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions and questions have been added to the comments.
@@ -0,0 +1,39 @@ | |||
# Observability Analysis Platform Cluster Management | |||
OAP(Observability Analysis Platform) server is a distributed system, services need to find each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the document, we are using backend to represent the collector
or server
. Let's keep it same.
OAP(Observability Analysis Platform) server is a distributed system, services need to find each | ||
other. i.e. a web service needs to find query service, level 1 aggregate service need to find | ||
level 2 aggregate service. Cluster management is just a client-side implementation. It must work | ||
together with a distributed coordination server, (i.e. Zookeeper, Consul, Kubernetes.) unless |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be distributed coordination service
. Service is Zookeeper says for itself. Ref https://zookeeper.apache.org/
|
||
### Cluster Management Plugins | ||
By default, OAP server provides two implementations for cluster management, which are standalone | ||
and zookeeper. When the applications being monitored are small, you can choose the standalone mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the scale of services, which are under monitoring, is small,
### Cluster Management Plugins | ||
By default, OAP server provides two implementations for cluster management, which are standalone | ||
and zookeeper. When the applications being monitored are small, you can choose the standalone mode. | ||
If they are big, you must choose the cluster mode by zookeeper plugin, or you can implement another |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they are big
- >Otherwise
- It(Scale) is big.
- Remove
another
There are two interfaces defined beforehand in the OAP server core, which are Module register and | ||
module query, all the cluster management plugins must implement those two interfaces. | ||
|
||
* Module Register: When any modules which need to provide services for each other, those modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When any module needs to provide services for others, the module must do register through this interface.
The first half of adjustment is about the language only, the second half is that Cluster Management Interface can't guarantee the register will be into service discovery service. Like standalone
is no service existed outside, Kubernetes
is doing service instance management by itself, no real register happens.
* Module Register: When any modules which need to provide services for each other, those modules | ||
must invoke this interface to register itself into the service discovery server. | ||
* Module Query: When any modules which need to call each other, those modules must retrieve the | ||
module set by this interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When any modules which need to call each other
-> Same as above. No needwhich
.each other
means in the same group or set, but in this case, they are not.- The descriptions of
Register
andQuery
should be from the provider side. Such asWhen any module needs to find other services, it can use this interface to retrieve the service instance list in the certain order.
* Module Query: When any modules which need to call each other, those modules must retrieve the | ||
module set by this interface. | ||
|
||
### Process Flow Between Client and Cluster Management |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the following description:
The client has two ways to connect the backend, one, use the direct link by a set of backend instance endpoint list, or you can use naming service, which considers your given list is just the seed nodes of the whole cluster. The following graph is showing you how naming service works. You need to check the probe documents to know which way(s) is(are) supported.
private final Map<String, ServiceCache<InstanceDetails>> serviceCacheMap; | ||
|
||
public ServiceCacheManager() { | ||
this.serviceCacheMap = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about the OP on put
is synchronized already? No concurrency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just can know there should be multiple service names in this Map, but do they just be added during startup?
public List<InstanceDetails> query(String moduleName, String providerName) throws ServiceRegisterException { | ||
List<ServiceInstance<InstanceDetails>> serviceInstances = cacheManager.get(NodeNameBuilder.build(moduleName, providerName)).getInstances(); | ||
|
||
List<InstanceDetails> instanceDetails = new ArrayList<>(serviceInstances.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a high-frequency call? Will query
be called everytime you do secondary aggregation? If so, be careful of this ArrayList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ArrayList is a fixed size set.
cluster: | ||
zookeeper: | ||
hostPort: localhost:2181 | ||
sessionTimeout: 100000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any purpose to change this?
Please answer these questions before submitting the pull request
New feature or improvement
Cluster management implementation.