Scatter-gather support on Query APIs #36

Closed
jeqo opened this issue Aug 28, 2019 · 4 comments · Fixed by #38

Labels
enhancement New feature or request

Comments

@jeqo
Collaborator

jeqo commented Aug 28, 2019

Rationale

The storage layer is based on Kafka Streams local state stores, which are aligned with topic partitions. Currently we specify that our implementation only supports running a single standalone instance for storage, because if we scale out the Zipkin instances, the stored data gets partitioned between servers and each instance only holds part of it.
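To illustrate why a query answered by one instance misses data once storage is partitioned, here is a minimal sketch, assuming traces are keyed by trace ID and each instance owns exactly one partition. zlib.crc32 stands in for Kafka's default partitioner (which hashes keys with murmur2), and the helper names partition_for and local_stores are hypothetical.

```python
import zlib
from typing import Dict, Iterable, Set


def partition_for(trace_id: str, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, but any stable
    # hash shows the same effect of spreading keys across partitions.
    return zlib.crc32(trace_id.encode("utf-8")) % num_partitions


def local_stores(trace_ids: Iterable[str], num_instances: int) -> Dict[int, Set[str]]:
    # Hypothetical layout: instance i owns partition i, so its local store
    # only receives the traces whose key hashes to i.
    stores: Dict[int, Set[str]] = {i: set() for i in range(num_instances)}
    for trace_id in trace_ids:
        stores[partition_for(trace_id, num_instances)].add(trace_id)
    return stores


traces = [f"{n:016x}" for n in range(1, 31)]
stores = local_stores(traces, num_instances=3)
for instance, held in stores.items():
    # A query served from this local store alone sees only these traces.
    print(f"instance {instance}: {len(held)} of {len(traces)} traces")
```

Only the union of all local stores covers every trace, which is what scatter-gather has to reconstruct at query time.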

To cope with this scenario, I'd like to propose scatter-gather support that allows the storage layer to query other instances and build a response from their combined results.

Example Scenario

Given a partitioned back-end with 3 Zipkin servers (a, b, c) running as a cluster: when a query arrives from the client side, zipkin-a receives the request and forwards the same query to zipkin-b and zipkin-c with an additional query param (e.g. peer=true), so that b and c don't propagate the query further. zipkin-a then receives their responses and builds the final response.
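The flow in this scenario can be sketched in-process as follows. This is a minimal sketch, assuming each node holds a mapping of service name to trace IDs and calls its peers directly instead of over HTTP. ZipkinNode, get_trace_ids and the peer flag are hypothetical names mirroring the proposed peer=true query param; this is not the code merged in #38.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ZipkinNode:
    """Hypothetical in-process stand-in for one Zipkin server and its local store."""

    name: str
    # service name -> trace IDs held in this node's local partitions
    local_traces: Dict[str, List[str]]
    peers: List["ZipkinNode"] = field(default_factory=list)

    def get_trace_ids(self, service: str, peer: bool = False) -> List[str]:
        # Every node answers from its own partitions first.
        found = list(self.local_traces.get(service, []))
        if peer:
            # Peer call (peer=true): answer locally and never fan out again.
            return found
        # Client call: scatter the same query to every peer, flagged as a peer call...
        for other in self.peers:
            found.extend(other.get_trace_ids(service, peer=True))
        # ...then gather: de-duplicate and sort into a single response.
        return sorted(set(found))


a = ZipkinNode("zipkin-a", {"frontend": ["t1"]})
b = ZipkinNode("zipkin-b", {"frontend": ["t2", "t3"]})
c = ZipkinNode("zipkin-c", {"frontend": ["t4"], "backend": ["t5"]})
a.peers = [b, c]

print(a.get_trace_ids("frontend"))             # ['t1', 't2', 't3', 't4']
print(b.get_trace_ids("frontend", peer=True))  # ['t2', 't3']
```

Because peer calls return before the fan-out step, a query reaches each instance at most once, regardless of which node the client hits first.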

Feature Request

This feature will require:

  • Register the current instance's URL via the metadata API [1].
  • Have a client to call other instances.
  • Have a way to distinguish peer calls from client calls, so the same query is not forwarded repeatedly.
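The registration and peer-call requirements might look like this in a sketch, assuming each instance advertises a host:port endpoint (the form Kafka Streams' application.server setting takes) and peers are reached over plain HTTP. peer_query_urls and is_peer_call are hypothetical helpers, and /api/v2/traces is used only as an example Zipkin API path.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlencode


def peer_query_urls(registered: Iterable[str], self_endpoint: str,
                    path: str, params: Dict[str, str]) -> List[str]:
    """Build the forwarded query for every registered instance except this one."""
    # The extra peer=true param tells the receiver not to propagate the query.
    query = urlencode({**params, "peer": "true"})
    return [f"http://{endpoint}{path}?{query}"
            for endpoint in sorted(set(registered))
            if endpoint != self_endpoint]


def is_peer_call(query_params: Dict[str, str]) -> bool:
    """A request carrying peer=true came from another instance, not a client."""
    return query_params.get("peer", "false").lower() == "true"


urls = peer_query_urls(
    ["zipkin-a:9411", "zipkin-b:9411", "zipkin-c:9411"],
    self_endpoint="zipkin-a:9411",
    path="/api/v2/traces",
    params={"serviceName": "frontend"},
)
for url in urls:
    print(url)
```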

Kafka Streams already provides a metadata API for registering peer URLs [1].

[1] https://kafka.apache.org/documentation/streams/developer-guide/interactive-queries.html#adding-an-rpc-layer-to-your-application

@codefromthecrypt
Contributor

codefromthecrypt commented Aug 29, 2019 via email

@jeqo
Collaborator Author

jeqo commented Aug 29, 2019

@adriancole thanks for the feedback!

> ZK is part of Kafka, so yeah, it should be possible to find a means of registration. You may end up with the usual clustering concerns, like what happens if the partition node goes down, who takes over, etc., and maintaining health and partition metadata about the healthy ones, etc.

StreamsMetadata already supports this, so there's no need to use ZK IMO. Also, ZK may eventually be removed from Kafka if this goes through: https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum (not any time soon, I suspect).
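On the "what if a node goes down" concern, one option (a sketch, not what the thread settled on) is to fan out concurrently under a single deadline and return whatever arrived together with the set of peers that failed or timed out, so one unavailable instance does not block the whole response. scatter_gather and the query callback are hypothetical; in practice the live instance list would come from StreamsMetadata rather than a static list.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, Set, Tuple


def scatter_gather(peers: List[str], query: Callable[[str], List[str]],
                   timeout_s: float = 2.0) -> Tuple[List[str], Set[str]]:
    """Query all peers concurrently; return merged results and the peers that failed."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(peers)))
    futures = {pool.submit(query, peer): peer for peer in peers}
    # One deadline for the whole fan-out, not one per peer.
    done, not_done = wait(futures, timeout=timeout_s)
    # Don't block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)

    merged: Set[str] = set()
    failed: Set[str] = {futures[f] for f in not_done}
    for f in done:
        if f.exception() is not None:
            failed.add(futures[f])  # e.g. connection refused: node is down
        else:
            merged.update(f.result())
    return sorted(merged), failed


def fake_query(peer: str) -> List[str]:
    # Simulated peers: one is down, one is too slow, the rest answer.
    if peer == "zipkin-down":
        raise ConnectionError("connection refused")
    if peer == "zipkin-slow":
        time.sleep(0.6)
    return [f"{peer}-trace"]


results, failed = scatter_gather(
    ["zipkin-b", "zipkin-c", "zipkin-down", "zipkin-slow"], fake_query, timeout_s=0.2)
print(results)
print(sorted(failed))
```

Returning the failed set explicitly leaves the policy to the caller: it could serve a partial response, retry, or report an error.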

> One question is whether there is an existing partition-aware layer over Kafka... Probably someone made a project like this, and we could list lessons learned, if nothing else.

IIRC Lightbend built a library on top of the Kafka Streams metadata API, https://www.lightbend.com/blog/kafka-http-interactive-layer, which does scatter-gather as well. But we should be safe using the metadata API directly: https://kafka.apache.org/documentation/streams/developer-guide/interactive-queries.html#discovering-and-accessing-application-instances-and-their-local-state-stores

Do we have a zipkin-api client library to start playing with this, or would a plain HTTP client be enough?
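To gauge whether a plain HTTP client is enough, here is a sketch using only Python's standard library, assuming peers expose Zipkin's GET /api/v2/services endpoint (which returns a JSON array of service names) and accept the proposed peer=true param. fetch_json and gather_service_names are hypothetical; the fetch parameter exists so the merge logic can be exercised without a running cluster.

```python
import json
from typing import Callable, Iterable, List
from urllib.request import urlopen


def fetch_json(url: str, timeout_s: float = 2.0):
    """Plain HTTP GET, decoding the body as JSON."""
    with urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def gather_service_names(peer_base_urls: Iterable[str],
                         fetch: Callable[[str], List[str]] = fetch_json) -> List[str]:
    """Union of service names reported by every peer, sorted like a single response."""
    names = set()
    for base in peer_base_urls:
        names.update(fetch(f"{base}/api/v2/services?peer=true"))
    return sorted(names)


# Offline usage: canned responses stand in for two peers.
canned = {
    "http://zipkin-b:9411/api/v2/services?peer=true": ["backend", "frontend"],
    "http://zipkin-c:9411/api/v2/services?peer=true": ["db", "frontend"],
}
print(gather_service_names(["http://zipkin-b:9411", "http://zipkin-c:9411"],
                           fetch=canned.__getitem__))
# ['backend', 'db', 'frontend']
```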

@codefromthecrypt
Contributor

codefromthecrypt commented Aug 29, 2019 via email

@jeqo
Collaborator Author

jeqo commented Aug 29, 2019

Thanks! I'll give it a try with OkHttp.

This was referenced Sep 3, 2019
jeqo closed this as completed in #38 on Sep 17, 2019