Skip to content

Make Fuzziness reject illegal values earlier #33511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 5, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/reference/migration/migrate_8_0.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ coming[8.0.0]
* <<breaking_80_mappings_changes>>
* <<breaking_80_snapshots_changes>>
* <<breaking_80_security_changes>>
* <<breaking_80_java_changes>>

//NOTE: The notable-breaking-changes tagged regions are re-used in the
//Installation and Upgrade Guide
Expand Down Expand Up @@ -43,3 +44,4 @@ include::migrate_8_0/discovery.asciidoc[]
include::migrate_8_0/mappings.asciidoc[]
include::migrate_8_0/snapshots.asciidoc[]
include::migrate_8_0/security.asciidoc[]
include::migrate_8_0/java.asciidoc[]
20 changes: 20 additions & 0 deletions docs/reference/migration/migrate_8_0/java.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
[float]
[[breaking_80_java_changes]]
=== Java API changes

[float]
==== Changes to Fuzziness

To create `Fuzziness` instances, use the `fromString` and `fromEdits` method
instead of the `build` method that used to accept both Strings and numeric
values. Several fuzziness setters on query builders (e.g.
MatchQueryBuilder#fuzziness) now accept only a `Fuzziness`instance instead of
an Object. You should preferably use the available constants (e.g.
Fuzziness.ONE, Fuzziness.AUTO) or build your own instance using the above
mentioned factory methods.

Fuzziness used to be lenient when it comes to parsing arbitrary numeric values
while silently truncating them to one of the three allowed edit distances 0, 1
or 2. This leniency is now removed and the class will throw errors when trying
to construct an instance with another value (e.g. floats like 1.3 used to get
accepted but truncated to 1). You should use one of the allowed values.
151 changes: 74 additions & 77 deletions server/src/main/java/org/elasticsearch/common/unit/Fuzziness.java
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@

import org.elasticsearch.ElasticsearchParseException;
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;
Expand All @@ -38,41 +39,73 @@
*/
public final class Fuzziness implements ToXContentFragment, Writeable {

public static final String X_FIELD_NAME = "fuzziness";
public static final Fuzziness ZERO = new Fuzziness(0);
public static final Fuzziness ONE = new Fuzziness(1);
public static final Fuzziness TWO = new Fuzziness(2);
public static final Fuzziness AUTO = new Fuzziness("AUTO");
static final String X_FIELD_NAME = "fuzziness";
public static final ParseField FIELD = new ParseField(X_FIELD_NAME);
private static final int DEFAULT_LOW_DISTANCE = 3;
private static final int DEFAULT_HIGH_DISTANCE = 6;

public static final Fuzziness ZERO = new Fuzziness("0");
public static final Fuzziness ONE = new Fuzziness("1");
public static final Fuzziness TWO = new Fuzziness("2");
public static final Fuzziness AUTO = new Fuzziness("AUTO");

static final int DEFAULT_LOW_DISTANCE = 3;
static final int DEFAULT_HIGH_DISTANCE = 6;

private final String fuzziness;
private int lowDistance = DEFAULT_LOW_DISTANCE;
private int highDistance = DEFAULT_HIGH_DISTANCE;

private Fuzziness(int fuzziness) {
if (fuzziness != 0 && fuzziness != 1 && fuzziness != 2) {
throw new IllegalArgumentException("Valid edit distances are [0, 1, 2] but was [" + fuzziness + "]");
}
this.fuzziness = Integer.toString(fuzziness);
private Fuzziness(String fuzziness) {
this.fuzziness = fuzziness;
}

private Fuzziness(String fuzziness) {
if (fuzziness == null || fuzziness.isEmpty()) {
throw new IllegalArgumentException("fuzziness can't be null!");
/**
* Creates a {@link Fuzziness} instance from an edit distance. The value must be one of {@code [0, 1, 2]}
* Note: Using this method only makes sense if the field you are applying Fuzziness to is some sort of string.
* @throws IllegalArgumentException if the edit distance is not in [0, 1, 2]
*/
public static Fuzziness fromEdits(int edits) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method used to be very permissive, e.g. parsing float values and silently casting them to ints, where values larger than 2 get ignored somewhere later in the code I think.

switch (edits) {
case 0: return Fuzziness.ZERO;
case 1: return Fuzziness.ONE;
case 2: return Fuzziness.TWO;
default:
throw new IllegalArgumentException("Valid edit distances are [0, 1, 2] but was [" + edits + "]");
}
this.fuzziness = fuzziness.toUpperCase(Locale.ROOT);
}

private Fuzziness(String fuzziness, int lowDistance, int highDistance) {
this(fuzziness);
if (lowDistance < 0 || highDistance < 0 || lowDistance > highDistance) {
throw new IllegalArgumentException("fuzziness wrongly configured, must be: lowDistance > 0, highDistance" +
" > 0 and lowDistance <= highDistance ");
/**
* Creates a {@link Fuzziness} instance from a String representation. This can
* either be an edit distance where the value must be one of
* {@code ["0", "1", "2"]} or "AUTO" for a fuzziness that depends on the term
* length. Using the "AUTO" fuzziness, you can optionally supply low and high
* distance arguments in the format {@code "AUTO:[low],[high]"}. See the query
* DSL documentation for more information about how these values affect the
* fuzziness value.
* Note: Using this method only makes sense if the field you
* are applying Fuzziness to is some sort of string.
*/
public static Fuzziness fromString(String fuzzinessString) {
if (Strings.isEmpty(fuzzinessString)) {
throw new IllegalArgumentException("fuzziness cannot be null or empty.");
}
String upperCase = fuzzinessString.toUpperCase(Locale.ROOT);
// check if it is one of the "AUTO" variants
if (upperCase.equals("AUTO")) {
return Fuzziness.AUTO;
} else if (upperCase.startsWith("AUTO:")) {
return parseCustomAuto(upperCase);
} else {
// should be a float or int representing a valid edit distance, otherwise throw error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we restrict to int ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I see this still leaves the door open for something like "fuzziness" : 1.5 to be silently cast to int 1. At the same time I think it would be good to continue to allow floats equivalent to the allowed edit distances 0,1,2, e.g. allow "1.0" or "1.0000" but reject "1.1". This is because I remember other issues where users going through rest sometimes had issues that their json-strigification produces float-like values even if they feed them ints (I think this was mostly in javascript clients). Anyway, would you be okay to allow "0.0", "1.0" and "2.0" so anything evenly divisible still?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

try {
float parsedFloat = Float.parseFloat(upperCase);
if (parsedFloat % 1 > 0) {
throw new IllegalArgumentException("fuzziness needs to be one of 0.0, 1.0 or 2.0 but was " + parsedFloat);
}
return fromEdits((int) parsedFloat);
} catch (NumberFormatException e) {
throw new IllegalArgumentException("fuzziness cannot be [" + fuzzinessString + "].", e);
}
}
this.lowDistance = lowDistance;
this.highDistance = highDistance;
}

/**
Expand Down Expand Up @@ -101,39 +134,23 @@ public void writeTo(StreamOutput out) throws IOException {
}
}

/**
* Creates a {@link Fuzziness} instance from an edit distance. The value must be one of {@code [0, 1, 2]}
*
* Note: Using this method only makes sense if the field you are applying Fuzziness to is some sort of string.
*/
public static Fuzziness fromEdits(int edits) {
return new Fuzziness(edits);
}

public static Fuzziness build(Object fuzziness) {
if (fuzziness instanceof Fuzziness) {
return (Fuzziness) fuzziness;
}
String string = fuzziness.toString();
if (AUTO.asString().equalsIgnoreCase(string)) {
return AUTO;
} else if (string.toUpperCase(Locale.ROOT).startsWith(AUTO.asString() + ":")) {
return parseCustomAuto(string);
}
return new Fuzziness(string);
}

private static Fuzziness parseCustomAuto( final String string) {
assert string.toUpperCase(Locale.ROOT).startsWith(AUTO.asString() + ":");
String[] fuzzinessLimit = string.substring(AUTO.asString().length() + 1).split(",");
private static Fuzziness parseCustomAuto(final String fuzzinessString) {
assert fuzzinessString.toUpperCase(Locale.ROOT).startsWith(AUTO.asString() + ":");
String[] fuzzinessLimit = fuzzinessString.substring(AUTO.asString().length() + 1).split(",");
if (fuzzinessLimit.length == 2) {
try {
int lowerLimit = Integer.parseInt(fuzzinessLimit[0]);
int highLimit = Integer.parseInt(fuzzinessLimit[1]);
return new Fuzziness("AUTO", lowerLimit, highLimit);
if (lowerLimit < 0 || highLimit < 0 || lowerLimit > highLimit) {
throw new ElasticsearchParseException("fuzziness wrongly configured [{}]. Must be 0 < lower value <= higher value.",
fuzzinessString);
}
Fuzziness fuzziness = new Fuzziness("AUTO");
fuzziness.lowDistance = lowerLimit;
fuzziness.highDistance = highLimit;
return fuzziness;
} catch (NumberFormatException e) {
throw new ElasticsearchParseException("failed to parse [{}] as a \"auto:int,int\"", e,
string);
throw new ElasticsearchParseException("failed to parse [{}] as a \"auto:int,int\"", e, fuzzinessString);
}
} else {
throw new ElasticsearchParseException("failed to find low and high distance values");
Expand All @@ -144,29 +161,9 @@ public static Fuzziness parse(XContentParser parser) throws IOException {
XContentParser.Token token = parser.currentToken();
switch (token) {
case VALUE_STRING:
return fromString(parser.text());
case VALUE_NUMBER:
final String fuzziness = parser.text();
if (AUTO.asString().equalsIgnoreCase(fuzziness)) {
return AUTO;
} else if (fuzziness.toUpperCase(Locale.ROOT).startsWith(AUTO.asString() + ":")) {
return parseCustomAuto(fuzziness);
}
try {
final int minimumSimilarity = Integer.parseInt(fuzziness);
switch (minimumSimilarity) {
case 0:
return ZERO;
case 1:
return ONE;
case 2:
return TWO;
default:
return build(fuzziness);
}
} catch (NumberFormatException ex) {
return build(fuzziness);
}

return fromEdits(parser.intValue());
default:
throw new IllegalArgumentException("Can't parse fuzziness on token: [" + token + "]");
}
Expand Down Expand Up @@ -200,7 +197,7 @@ public float asFloat() {
if (this.equals(AUTO) || isAutoWithCustomValues()) {
return 1f;
}
return Float.parseFloat(fuzziness.toString());
return Float.parseFloat(fuzziness);
}

private int termLen(String text) {
Expand All @@ -209,13 +206,13 @@ private int termLen(String text) {

public String asString() {
if (isAutoWithCustomValues()) {
return fuzziness.toString() + ":" + lowDistance + "," + highDistance;
return fuzziness + ":" + lowDistance + "," + highDistance;
}
return fuzziness.toString();
return fuzziness;
}

private boolean isAutoWithCustomValues() {
return fuzziness.startsWith("AUTO") && (lowDistance != DEFAULT_LOW_DISTANCE ||
return fuzziness.equals("AUTO") && (lowDistance != DEFAULT_LOW_DISTANCE ||
highDistance != DEFAULT_HIGH_DISTANCE);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -161,8 +161,8 @@ public String minimumShouldMatch() {
}

/** Sets the fuzziness used when evaluated to a fuzzy query type. Defaults to "AUTO". */
public MatchBoolPrefixQueryBuilder fuzziness(Object fuzziness) {
this.fuzziness = Fuzziness.build(fuzziness);
public MatchBoolPrefixQueryBuilder fuzziness(Fuzziness fuzziness) {
this.fuzziness = fuzziness;
return this;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -183,8 +183,8 @@ public String analyzer() {
}

/** Sets the fuzziness used when evaluated to a fuzzy query type. Defaults to "AUTO". */
public MatchQueryBuilder fuzziness(Object fuzziness) {
this.fuzziness = Fuzziness.build(fuzziness);
public MatchQueryBuilder fuzziness(Fuzziness fuzziness) {
this.fuzziness = Objects.requireNonNull(fuzziness);
return this;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -376,10 +376,8 @@ public int slop() {
/**
* Sets the fuzziness used when evaluated to a fuzzy query type. Defaults to "AUTO".
*/
public MultiMatchQueryBuilder fuzziness(Object fuzziness) {
if (fuzziness != null) {
this.fuzziness = Fuzziness.build(fuzziness);
}
public MultiMatchQueryBuilder fuzziness(Fuzziness fuzziness) {
this.fuzziness = Objects.requireNonNull(fuzziness);
return this;
}

Expand Down Expand Up @@ -707,7 +705,6 @@ public static MultiMatchQueryBuilder fromXContent(XContentParser parser) throws
.type(type)
.analyzer(analyzer)
.cutoffFrequency(cutoffFrequency)
.fuzziness(fuzziness)
.fuzzyRewrite(fuzzyRewrite)
.maxExpansions(maxExpansions)
.minimumShouldMatch(minimumShouldMatch)
Expand All @@ -723,6 +720,9 @@ public static MultiMatchQueryBuilder fromXContent(XContentParser parser) throws
if (lenient != null) {
builder.lenient(lenient);
}
if (fuzziness != null) {
builder.fuzziness(fuzziness);
}
return builder;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -423,7 +423,7 @@ protected Query handleBareFuzzy(String field, Token fuzzySlop, String termImage)
if (fuzzySlop.image.length() == 1) {
return getFuzzyQuery(field, termImage, fuzziness.asDistance(termImage));
}
return getFuzzyQuery(field, termImage, Fuzziness.build(fuzzySlop.image.substring(1)).asFloat());
return getFuzzyQuery(field, termImage, Fuzziness.fromString(fuzzySlop.image.substring(1)).asFloat());
}

@Override
Expand All @@ -434,7 +434,7 @@ protected Query getFuzzyQuery(String field, String termStr, float minSimilarity)
}
List<Query> queries = new ArrayList<>();
for (Map.Entry<String, Float> entry : fields.entrySet()) {
Query q = getFuzzyQuerySingle(entry.getKey(), termStr, minSimilarity);
Query q = getFuzzyQuerySingle(entry.getKey(), termStr, (int) minSimilarity);
assert q != null;
queries.add(applyBoost(q, entry.getValue()));
}
Expand All @@ -447,15 +447,15 @@ protected Query getFuzzyQuery(String field, String termStr, float minSimilarity)
}
}

private Query getFuzzyQuerySingle(String field, String termStr, float minSimilarity) throws ParseException {
private Query getFuzzyQuerySingle(String field, String termStr, int minSimilarity) throws ParseException {
currentFieldType = context.fieldMapper(field);
if (currentFieldType == null) {
return newUnmappedFieldQuery(field);
}
try {
Analyzer normalizer = forceAnalyzer == null ? queryBuilder.context.getSearchAnalyzer(currentFieldType) : forceAnalyzer;
BytesRef term = termStr == null ? null : normalizer.normalize(field, termStr);
return currentFieldType.fuzzyQuery(term, Fuzziness.fromEdits((int) minSimilarity),
return currentFieldType.fuzzyQuery(term, Fuzziness.fromEdits(minSimilarity),
getFuzzyPrefixLength(), fuzzyMaxExpansions, fuzzyTranspositions);
} catch (RuntimeException e) {
if (lenient) {
Expand All @@ -467,7 +467,7 @@ private Query getFuzzyQuerySingle(String field, String termStr, float minSimilar

@Override
protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
int numEdits = Fuzziness.build(minimumSimilarity).asDistance(term.text());
int numEdits = Fuzziness.fromEdits((int) minimumSimilarity).asDistance(term.text());
FuzzyQuery query = new FuzzyQuery(term, numEdits, prefixLength,
fuzzyMaxExpansions, fuzzyTranspositions);
QueryParsers.setRewriteMethod(query, fuzzyRewriteMethod);
Expand Down
Loading