-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Integrate "fields" API into QL #68467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate "fields" API into QL #68467
Conversation
Pinging @elastic/es-ql (Team:QL) |
@elasticmachine run elasticsearch-ci/default-distro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Glad to see this PR land. The -700 lines is just icing on the cake.
Curious about the impact it will have in benchmarks once it lands.
I would mark it as breaking change even if only to indicate the response of the translate endpoint changes.
@@ -29,9 +29,9 @@ public FieldHitExtractor(StreamInput in) throws IOException { | |||
super(in); | |||
} | |||
|
|||
public FieldHitExtractor(String name, String fullFieldName, DataType dataType, ZoneId zoneId, boolean useDocValue, String hitName, | |||
public FieldHitExtractor(String name, DataType dataType, ZoneId zoneId, String hitName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
sourceBuilder.fetchSource(sourceFields.toArray(Strings.EMPTY_ARRAY), null); | ||
} | ||
docFields.forEach(field -> sourceBuilder.docValueField(field.field, field.format)); | ||
fetchFields.forEach(field -> sourceBuilder.fetchField(new FieldAndFormat(field.field, field.format, null))); | ||
scriptFields.forEach(sourceBuilder::scriptField); | ||
} | ||
|
||
public boolean noSource() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely this needs to go in a future PR since the notion of source is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also believe there is room for improvement when it comes to removing code. There may be other places where things are not needed anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(seems to only be used once?)
@@ -15,7 +15,7 @@ selectAsWKT | |||
SELECT city, ST_AsWKT(location) location FROM "geo" WHERE city = 'Amsterdam'; | |||
|
|||
city:s | location:s | |||
Amsterdam |POINT (4.850311987102032 52.347556999884546) | |||
Amsterdam |POINT (4.850312 52.347557) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea where does the rounding come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@costin "fields" API is relying (atm) on extraction from _source, which is basically the input the user used when indexing. Before "fields" API we were relying mainly on extraction from _source but there were field types (geo_point and shape being two of them) that we were taking from docvalues (meaning the actual values that were indexed). And in docvalues, things are just a bit different when it comes to floating point numbers and geo stuff.
That specific document with Amsterdam city has been indexed as "city": "Amsterdam", "location": "52.347557,4.850312"
(location
is a geo_point
) and the returned value is pretty much what has been actually sent as _source to ES.
@@ -187,8 +187,5 @@ private static void optimize(QueryContainer query, SearchSourceBuilder builder) | |||
|
|||
private static void disableSource(SearchSourceBuilder builder) { | |||
builder.fetchSource(FetchSourceContext.DO_NOT_FETCH_SOURCE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any downside to disabling stored fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope not @costin . Tests passed. I don't see why we would want to return stored fields... can't think of a reason why we had it there in the first place. In theory, we shouldn't need any kind of way of returning values from Elasticsearch except "fields".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The why not disable explicitly stored fields:
builder.storedFields(NO_STORED_FIELD);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Will try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@costin I forgot about this part. "fields" API does not allow disabling stored fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left a comment for the unit test.
expected.put("columns", Arrays.asList(columnInfo("plain", "keyword_field", "keyword", JDBCType.VARCHAR, Integer.MAX_VALUE))); | ||
expected.put("rows", singletonList(singletonList(ignoreAbove ? null : keyword))); | ||
assertResponse(expected, runSql("SELECT keyword_field FROM test")); | ||
if (explicitSourceSetting == false || enableSource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can extract a method, where you pass the expected Map and the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will look into simplifying the code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, LGTM.
} | ||
return value; | ||
} | ||
|
||
protected Object unwrapMultiValue(Object values) { | ||
protected Object unwrapFieldsMultiValue(Object values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the "Fields" plural?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. If I remember well, it's a leftover from a previous attempt where both strategies were used at the same time in code: extraction from source and docvalues (unwrapSourceMultiValue
) and extraction from "fields" API output (unwrapFieldsMultiValue
). No need for the rename, indeed. I'll revert.
sourceBuilder.fetchSource(sourceFields.toArray(Strings.EMPTY_ARRAY), null); | ||
} | ||
docFields.forEach(field -> sourceBuilder.docValueField(field.field, field.format)); | ||
fetchFields.forEach(field -> sourceBuilder.fetchField(new FieldAndFormat(field.field, field.format, null))); | ||
scriptFields.forEach(sourceBuilder::scriptField); | ||
} | ||
|
||
public boolean noSource() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(seems to only be used once?)
@@ -74,10 +74,10 @@ public void testTextField() throws IOException { | |||
* } | |||
*/ | |||
public void testKeywordField() throws IOException { | |||
String query = "SELECT keyword_field FROM test"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nice, prepares the reader for what to expect.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Integrate "fields" API into QL (elastic#68467) * QL: retry SQL and EQL requests in a mixed-node (rolling upgrade) cluster (elastic#68602) * Adapt nested fields extraction from "fields" API output to the new un-flattened structure (elastic#68745) (cherry picked from commit ee5cc54)
This is a first step towards switching from using
docvalue_fields
and_source
extraction to using the fields API. It helps simplify the QL code, but does also come with a small drawback which will become visible in FieldHitExtractorTests where some tests are not needed anymore. Any query (especially SQL) on fields that have_source
disabled, but are indexed (accessible throughdocvalue_fields
), will not be possible anymore. The tests in the above mentioned class are changed to adapt to this new restriction by checking that an exception is being thrown in those cases.Future PR(s) against this branch will probably simplify the code even further.
Also, there will be another PR that will add BWC tests similar to this one.
Relates to #67727.