Skip to content

Integrate "fields" API into QL #68467

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

astefan
Copy link
Contributor

@astefan astefan commented Feb 3, 2021

This is a first step towards switching from using docvalue_fields and _source extraction to using the fields API. It helps simplify the QL code, but does also come with a small drawback which will become visible in FieldHitExtractorTests where some tests are not needed anymore. Any query (especially SQL) on fields that have _source disabled, but are indexed (accessible through docvalue_fields), will not be possible anymore. The tests in the above mentioned class are changed to adapt to this new restriction by checking that an exception is being thrown in those cases.

Future PR(s) against this branch will probably simplify the code even further.
Also, there will be another PR that will add BWC tests similar to this one.

Relates to #67727.

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Feb 3, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@astefan
Copy link
Contributor Author

astefan commented Feb 3, 2021

@elasticmachine run elasticsearch-ci/default-distro

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Glad to see this PR land. The -700 lines is just icing on the cake.
Curious about the impact it will have in benchmarks once it lands.

I would mark it as breaking change even if only to indicate the response of the translate endpoint changes.

@@ -29,9 +29,9 @@ public FieldHitExtractor(StreamInput in) throws IOException {
super(in);
}

public FieldHitExtractor(String name, String fullFieldName, DataType dataType, ZoneId zoneId, boolean useDocValue, String hitName,
public FieldHitExtractor(String name, DataType dataType, ZoneId zoneId, String hitName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

sourceBuilder.fetchSource(sourceFields.toArray(Strings.EMPTY_ARRAY), null);
}
docFields.forEach(field -> sourceBuilder.docValueField(field.field, field.format));
fetchFields.forEach(field -> sourceBuilder.fetchField(new FieldAndFormat(field.field, field.format, null)));
scriptFields.forEach(sourceBuilder::scriptField);
}

public boolean noSource() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely this needs to go in a future PR since the notion of source is removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also believe there is room for improvement when it comes to removing code. There may be other places where things are not needed anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(seems to only be used once?)

@@ -15,7 +15,7 @@ selectAsWKT
SELECT city, ST_AsWKT(location) location FROM "geo" WHERE city = 'Amsterdam';

city:s | location:s
Amsterdam |POINT (4.850311987102032 52.347556999884546)
Amsterdam |POINT (4.850312 52.347557)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea where does the rounding come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin "fields" API is relying (atm) on extraction from _source, which is basically the input the user used when indexing. Before "fields" API we were relying mainly on extraction from _source but there were field types (geo_point and shape being two of them) that we were taking from docvalues (meaning the actual values that were indexed). And in docvalues, things are just a bit different when it comes to floating point numbers and geo stuff.

That specific document with Amsterdam city has been indexed as "city": "Amsterdam", "location": "52.347557,4.850312" (location is a geo_point) and the returned value is pretty much what has been actually sent as _source to ES.

@@ -187,8 +187,5 @@ private static void optimize(QueryContainer query, SearchSourceBuilder builder)

private static void disableSource(SearchSourceBuilder builder) {
builder.fetchSource(FetchSourceContext.DO_NOT_FETCH_SOURCE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any downside to disabling stored fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope not @costin . Tests passed. I don't see why we would want to return stored fields... can't think of a reason why we had it there in the first place. In theory, we shouldn't need any kind of way of returning values from Elasticsearch except "fields".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The why not disable explicitly stored fields:

builder.storedFields(NO_STORED_FIELD);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Will try that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin I forgot about this part. "fields" API does not allow disabling stored fields.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a comment for the unit test.

expected.put("columns", Arrays.asList(columnInfo("plain", "keyword_field", "keyword", JDBCType.VARCHAR, Integer.MAX_VALUE)));
expected.put("rows", singletonList(singletonList(ignoreAbove ? null : keyword)));
assertResponse(expected, runSql("SELECT keyword_field FROM test"));
if (explicitSourceSetting == false || enableSource) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can extract a method, where you pass the expected Map and the query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Will look into simplifying the code here.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, LGTM.

}
return value;
}

protected Object unwrapMultiValue(Object values) {
protected Object unwrapFieldsMultiValue(Object values) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the "Fields" plural?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. If I remember well, it's a leftover from a previous attempt where both strategies were used at the same time in code: extraction from source and docvalues (unwrapSourceMultiValue) and extraction from "fields" API output (unwrapFieldsMultiValue). No need for the rename, indeed. I'll revert.

sourceBuilder.fetchSource(sourceFields.toArray(Strings.EMPTY_ARRAY), null);
}
docFields.forEach(field -> sourceBuilder.docValueField(field.field, field.format));
fetchFields.forEach(field -> sourceBuilder.fetchField(new FieldAndFormat(field.field, field.format, null)));
scriptFields.forEach(sourceBuilder::scriptField);
}

public boolean noSource() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(seems to only be used once?)

@@ -74,10 +74,10 @@ public void testTextField() throws IOException {
* }
*/
public void testKeywordField() throws IOException {
String query = "SELECT keyword_field FROM test";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nice, prepares the reader for what to expect.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bpintea regarding noSource(), agree, it's used once. As part of @costin's review I already removed noSource() but didn't commit the change just yet.

@astefan astefan merged commit 174fa45 into elastic:ql_fields_api_implementation Feb 4, 2021
@astefan astefan deleted the fields_api_part1 branch February 4, 2021 22:12
astefan added a commit to astefan/elasticsearch that referenced this pull request Feb 5, 2021
astefan added a commit that referenced this pull request Feb 10, 2021
* Integrate "fields" API into QL (#68467)
* QL: retry SQL and EQL requests in a mixed-node (rolling upgrade) cluster (#68602)
* Adapt nested fields extraction from "fields" API output to the new un-flattened structure (#68745)
astefan added a commit to astefan/elasticsearch that referenced this pull request Feb 10, 2021
* Integrate "fields" API into QL (elastic#68467)
* QL: retry SQL and EQL requests in a mixed-node (rolling upgrade) cluster (elastic#68602)
* Adapt nested fields extraction from "fields" API output to the new un-flattened structure (elastic#68745)

(cherry picked from commit ee5cc54)
astefan added a commit that referenced this pull request Feb 10, 2021
* Integrate "fields" API into QL (#68467)
* QL: retry SQL and EQL requests in a mixed-node (rolling upgrade) cluster (#68602)
* Adapt nested fields extraction from "fields" API output to the new un-flattened structure (#68745)

(cherry picked from commit ee5cc54)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/EQL EQL querying :Analytics/SQL SQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants