Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add main classes for Query and basic unit tests #172

Merged
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

package org.opensearch.neuralsearch.plugin;

import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
Expand All @@ -23,6 +24,7 @@
import org.opensearch.neuralsearch.ml.MLCommonsClientAccessor;
import org.opensearch.neuralsearch.processor.TextEmbeddingProcessor;
import org.opensearch.neuralsearch.processor.factory.TextEmbeddingProcessorFactory;
import org.opensearch.neuralsearch.query.HybridQueryBuilder;
import org.opensearch.neuralsearch.query.NeuralQueryBuilder;
import org.opensearch.plugins.ActionPlugin;
import org.opensearch.plugins.ExtensiblePlugin;
Expand Down Expand Up @@ -60,8 +62,9 @@ public Collection<Object> createComponents(
}

public List<QuerySpec<?>> getQueries() {
heemin32 marked this conversation as resolved.
Show resolved Hide resolved
return Collections.singletonList(
new QuerySpec<>(NeuralQueryBuilder.NAME, NeuralQueryBuilder::new, NeuralQueryBuilder::fromXContent)
return Arrays.asList(
new QuerySpec<>(NeuralQueryBuilder.NAME, NeuralQueryBuilder::new, NeuralQueryBuilder::fromXContent),
new QuerySpec<>(HybridQueryBuilder.NAME, HybridQueryBuilder::new, HybridQueryBuilder::fromXContent)
);
}

Expand Down
160 changes: 160 additions & 0 deletions src/main/java/org/opensearch/neuralsearch/query/HybridQuery.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.neuralsearch.query;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;

/**
* Implementation fo Query interface for type "hybrid". It allows execution of multiple sub-queries and collect individual
heemin32 marked this conversation as resolved.
Show resolved Hide resolved
* scores for each sub-query.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add @opensearch.internal tag on all these classes. These shouldn't be extended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opensearch.internal cannot be used, seems it's specific to core OpenSearch repo, and for plugins javadoc tasks throws

error: unknown tag: opensearch.internal
 * @opensearch.internal
   ^

we can use final for classes for now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is same here. opensearch.internal is for OpenSearch core to indicate that the class is not public to plugins.

public final class HybridQuery extends Query implements Iterable<Query> {

private final List<Query> subQueries;

public HybridQuery(Collection<Query> subQueries) {
Objects.requireNonNull(subQueries, "Collection of Queries must not be null");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check empty list also here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, let me add this check. Such empty collect doesn't harm but also does not make sense to process such search request.

this.subQueries = new ArrayList<>(subQueries);
}

/**
* Returns an iterator over sub-queries that are parts of this hybrid query
* @return iterator
*/
@Override
public Iterator<Query> iterator() {
return getSubQueries().iterator();
}

/**
* Prints a query to a string, with field assumed to be the default field and omitted.
* @param field default field
* @return string representation of hybrid query
*/
@Override
public String toString(String field) {
StringBuilder buffer = new StringBuilder();
buffer.append("(");
Iterator<Query> it = subQueries.iterator();
for (int i = 0; it.hasNext(); i++) {
Query subquery = it.next();
if (subquery instanceof BooleanQuery) { // wrap sub-boolean in parents
buffer.append("(");
buffer.append(subquery.toString(field));
buffer.append(")");
} else buffer.append(subquery.toString(field));
heemin32 marked this conversation as resolved.
Show resolved Hide resolved
if (i != subQueries.size() - 1) buffer.append(" | ");
heemin32 marked this conversation as resolved.
Show resolved Hide resolved
}
buffer.append(")");
return buffer.toString();
}

/**
* Re-writes queries into primitive queries. Callers are expected to call rewrite multiple times if necessary,
* until the rewritten query is the same as the original query.
* @param reader
* @return
* @throws IOException
*/
@Override
public Query rewrite(IndexReader reader) throws IOException {
if (subQueries.isEmpty()) {
return new MatchNoDocsQuery("empty HybridQuery");
}

if (subQueries.size() == 1) {
return subQueries.iterator().next();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont we want to call the rewrite if that query here? We are just returning the query iterator?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why use iterator here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it's an artifact from POC, we need to call rewrite and wrap it into HybridQuery even for 1 sub-query, I'll do the change

}

boolean actuallyRewritten = false;
heemin32 marked this conversation as resolved.
Show resolved Hide resolved
List<Query> rewrittenSubQueries = new ArrayList<>();
for (Query subQuery : subQueries) {
Query rewrittenSub = subQuery.rewrite(reader);
/* we keep rewrite sub-query unless it's not equal to itself, it may take multiple levels of recursive calls
heemin32 marked this conversation as resolved.
Show resolved Hide resolved
queries need to be rewritten from high-level clauses into lower-level clauses because low-level clauses
perform better. For hybrid query we need to track progress of re-write for all sub-queries */
actuallyRewritten |= rewrittenSub != subQuery;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment what line is doing and why we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

rewrittenSubQueries.add(rewrittenSub);
}

if (actuallyRewritten) {
return new HybridQuery(rewrittenSubQueries);
}

return super.rewrite(reader);
}

/**
* Recurse through the query tree, visiting any child queries
* @param queryVisitor a QueryVisitor to be called by each query in the tree
*/
@Override
public void visit(QueryVisitor queryVisitor) {
QueryVisitor v = queryVisitor.getSubVisitor(BooleanClause.Occur.SHOULD, this);
for (Query q : subQueries) {
q.visit(v);
}
}
Comment on lines +115 to +120
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this visitor can provide some details here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main purpose is to allow visitor pattern for Hybrid query clients in future. It's an abstract method in Query class, we need to provide some implementation. As for the visitor, I've found one example of visitor in core: In TopDocsCollectorContext we do have a visitor that is checking maxScore flag in each sub-query
We can throw UnsupportedOperation for now as I do not recall usage of visitor in other parts of normalization related code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked it, visitors are executed by IndexSearcher, so we need some sort of implementation. Leaving it as is for now.


/**
* Override and implement query instance equivalence properly in a subclass. This is required so that QueryCache works properly.
* @param other query object that when compare with this query object
* @return
*/
@Override
public boolean equals(Object other) {
heemin32 marked this conversation as resolved.
Show resolved Hide resolved
return sameClassAs(other) && equalsTo(getClass().cast(other));
}

private boolean equalsTo(HybridQuery other) {
return Objects.equals(subQueries, other.subQueries);
}

/**
* Override and implement query hash code properly in a subclass. This is required so that QueryCache works properly.
* @return hash code of this object
*/
@Override
public int hashCode() {
int h = classHash();
h = 31 * h + Objects.hashCode(subQueries);
return h;
}

public Collection<Query> getSubQueries() {
return Collections.unmodifiableCollection(subQueries);
}

/**
* Create the Weight used to score this query
*
* @param searcher
* @param scoreMode How the produced scorers will be consumed.
* @param boost The boost that is propagated by the parent queries.
* @return
* @throws IOException
*/
@Override
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
return new HybridQueryWeight(this, searcher, scoreMode, boost);
}
}
Loading