-
Notifications
You must be signed in to change notification settings - Fork 2.6k
SOLR-14987: Reuse HttpSolrClient per node vs. one per Solr core when using CloudSolrStream #2067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ing CloudSolrStream
I don't think the current solution solves the problem of clusters many nodes, you'll still get a bunch of client objects, still investigating what can be done about that |
@@ -334,11 +334,6 @@ private StreamComparator parseComp(String sort, String fl) throws IOException { | |||
public static Slice[] getSlices(String collectionName, ZkStateReader zkStateReader, boolean checkAlias) throws IOException { | |||
ClusterState clusterState = zkStateReader.getClusterState(); | |||
|
|||
Map<String, DocCollection> collectionsMap = clusterState.getCollectionsMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related: can we update the javadoc on clusterState.getCollectionsMap to be more explicit that it will make a call to zk, instead of the current may
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
|
||
//TODO we should probably split collection by comma to query more than one | ||
// which is something already supported in other parts of Solr | ||
|
||
// check for alias or collection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we cache the value of zkStateReader.getAliases
below to avoid volatile reads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the call to getAliases out of the for loop
@@ -126,6 +135,17 @@ public void open() throws IOException { | |||
} | |||
} | |||
|
|||
private String getNodeUrl() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we precomute this in the constructor?
Hi @joel-bernstein wasn't able to assign you as a reviewer on this, but would love for you to take a look when convenient. |
} | ||
|
||
// Check collection case insensitive | ||
for(Entry<String, DocCollection> entry : collectionsMap.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this b/c I don't think we should try to accommodate improperly cased collection names. No tests broke, but let me know if we need this for some reason I don't understand
} else { | ||
// stream of replicas to reuse the same SolrHttpClient per baseUrl | ||
// avoids re-parsing data we already have in the replicas | ||
streamOfSolrStream = getReplicas(this.zkHost, this.collection, this.streamContext, mParams).stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we're keeping the Replica so we have direct access to its baseUrl and core name instead of parsing those out of the shardUrl
@@ -268,8 +275,7 @@ private Map mapFields(Map fields, Map<String,String> mappings) { | |||
return fields; | |||
} | |||
|
|||
// temporary... | |||
public TupleStreamParser constructParser(SolrClient server, SolrParams requestParams) throws IOException, SolrServerException { | |||
private TupleStreamParser constructParser(SolrParams requestParams) throws IOException, SolrServerException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't seem like this method needed to be public and we already get a SolrClient in the open method, so no need to pass it. However, this breaks a public method signature, so is only for Solr 9.x and shouldn't be back-ported to 8.x
Plan to merge this into master later today. |
…using CloudSolrStream (apache#2067)
…using CloudSolrStream (apache#2067)
…using CloudSolrStream (apache#2067)
Description
For collections with many shards (or aliases with many collections and some shards),
CloudSolrStream
will end up creating a newHttpSolrClient
for everySolrStream
it opens because the cache key is the full core URL, such as:http://127.0.0.1:63460/solr/collection4_shard4_replica_n6/
In addition,
CloudSolrStream#getSlices
was callingclusterState.getCollectionsMap()
which pre-emptively loads allLazyCollectionRef
from ZK unnecessarily. This could cause an issue with clusters with many collections and slow down the streaming expression execution.Solution
In this PR, I've introduced a new ctor in
SolrStream
to pass the Replica'sbaseUrl
andcore
as separate parameters. This leads to reusing the sameHttpSolrClient
for the same node because the cache key is nowhttp://127.0.0.1:63460/solr/
. I chose this new ctor approach becauseCloudSolrStream
is not the only consumer ofSolrStream
and it knows how the list of URLs where constructed from cluster state, so it can safely make the decision about passing the core and reusing clients.When the request is sent to the remote core, we need to add the core name to the path. This happens in
SolrStream#constructParser
. This method was public and takes a SolrClient (even though SolrStream already has an HttpSolrClient created in theopen
method); I've changed the signature to be private and use the client opened in theopen
method.Tests
Added a new test
testCloudStreamClientCache
inStreamingTest
to verify the SolrStreams created by the CloudStream have the correct baseUrl (without the core).Checklist
Please review the following and check all that apply:
master
branch../gradlew check
.