Handle unmapped fields in _field_caps API #34071

Merged
merged 14 commits into from
Apr 19, 2019

Conversation

jimczi
Contributor

@jimczi jimczi commented Sep 26, 2018

Today the _field_caps API returns the list of indices where a field
is present only if this field has different types within the requested indices.
However, if the request is an index pattern (or an alias, or both...), there
is no way to infer the indices if the response contains only fields that have
the same type in all indices. This commit changes the response to always return
the list of indices for a specific field. It makes the response more verbose,
but it allows users of this API to easily retrieve the indices where a field
is present.

@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.5.0 labels Sep 26, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@jimczi
Contributor Author

jimczi commented Sep 26, 2018

cc @costin

@jpountz
Contributor

jpountz commented Sep 27, 2018

@costin What is the use-case for this?

@costin
Member

costin commented Sep 27, 2018

The reason behind it comes from index patterns: as a consumer, it's unclear which indices match the query and thus what the returned fields refer to.

In SQL this will be used to implement the SYS TABLES meta call (show me all tables that match this pattern and their content) on top of field caps.
This is a common case with UI tools like Excel: when opening a datastore, they first ask what tables are there and what they look like. Only then can one point and click around.
With field caps one gets a merged mapping of the target indices; however, without knowing which indices a field belongs to, it's not possible to know:

  1. the indices the pattern resolves to; in its current form, field caps only tells you whether a field matched all indices or in which indices it differs, not the indices where it matches.
  2. the indices where a field is missing. Field caps tells us whether a mapped field is mapped differently, but if it's not mapped in certain indices, this information cannot be inferred.

@jimczi
Contributor Author

jimczi commented Sep 27, 2018

We discussed this internally with @jpountz and @costin. Returning the list of indices per field is too verbose and defeats the purpose of this API, which is to get a merged mapping of all fields that are present in an index pattern. However, some useful information is currently missing from the API:

  • It is not possible to know the list of indices that are present in the response.
  • It is not possible to know if a field is missing in specific indices if it has the same type in all other indices.

For the first bullet we discussed the possibility to add a top-level entry in the response that lists all concrete indices that match the pattern:

GET foo*/_field_caps?
{
    "indices": ["foo1", "foo2"],
    "fields": {
       ....
     }
}

For the second one we could add a new type unmapped that could be added to the list of types for a field if it's missing in some indices:

GET foo*/_field_caps?fields=bar,baz
{
    "indices": ["foo1", "foo2"],
    "fields": {
        "bar": {
            "keyword": {
                "searchable": true,
                "aggregatable": true,
                "indices": ["foo1"]
            },
            "unmapped": {
                "searchable": false,
                "aggregatable": false,
                "indices": ["foo2"]
            }
        },
        "baz": {
            "text": {
                "searchable": true,
                "aggregatable": true
            }
        }
    }
}

In the example above, bar is present in foo1 but not in foo2, whereas baz is present in all indices.
Even though this change wouldn't break the API, it changes the assumptions that a user can make about a field caps response, so it should probably be considered breaking. We'd also like to get some feedback from Kibana about this potential change since they rely heavily on this API. @elastic/kibana-discovery could you please find someone who can check the proposal to see if it would be of any help, or if there are concerns that it will break the current usage in Kibana?
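
As a rough illustration of how a consumer might read the proposed format, the sketch below looks up the indices where a field is reported as unmapped. The response literal copies the example above; `missing_indices` is a hypothetical helper (not part of any Elasticsearch client), and treating a field that is absent from the response as unmapped everywhere is an assumption.

```python
# Sketch only: the response mirrors the proposed example above.
response = {
    "indices": ["foo1", "foo2"],
    "fields": {
        "bar": {
            "keyword": {"searchable": True, "aggregatable": True, "indices": ["foo1"]},
            "unmapped": {"searchable": False, "aggregatable": False, "indices": ["foo2"]},
        },
        "baz": {
            "text": {"searchable": True, "aggregatable": True},
        },
    },
}


def missing_indices(response, field):
    """Return the indices in which `field` is reported as unmapped.

    Hypothetical helper; assumes the proposed top-level "indices" list
    and the per-field "unmapped" entry are both present in the response.
    """
    caps_by_type = response["fields"].get(field)
    if caps_by_type is None:
        # Assumption: a field absent from the response is mapped nowhere.
        return list(response["indices"])
    unmapped = caps_by_type.get("unmapped")
    if unmapped is None:
        # No "unmapped" entry: the field is mapped in every matched index.
        return []
    # An entry without "indices" is read as "all matched indices".
    return list(unmapped.get("indices") or response["indices"])


print(missing_indices(response, "bar"))
print(missing_indices(response, "baz"))
```

With the example response, `bar` is reported missing only in `foo2`, and `baz` is missing nowhere.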

@Bargs

Bargs commented Sep 28, 2018

Thanks for the ping @jimczi. Someone on @elastic/kibana-management is probably best equipped to answer this question at the moment; as far as I know, we only use field_caps when creating a new index pattern in management.

@jen-huang

> For the second one we could add a new type unmapped that could be added to the list of types for a field if it's missing in some indices

From the Kibana Management perspective, when creating index patterns we will need to add unmapped to the list of known ES types for mapping purposes. Without this change it will resolve to the unknown Kibana type, which will cause Kibana to treat the field as having conflicting types (when another type is also mapped, of course). I am not sure we want this behavior simply because the field is missing in some indices, so we may need to do additional work to ignore unmapped when determining whether a field is in conflict.
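
To make the concern concrete, here is a minimal sketch of conflict detection that ignores `unmapped`. It is not Kibana's actual logic (which is JavaScript and not shown in this thread); `is_conflict` and its `ignored_types` parameter are hypothetical names, and the input is assumed to be the per-field type map from a field caps response.

```python
def is_conflict(caps_by_type, ignored_types=("unmapped",)):
    """Return True when a field has more than one real type.

    Hypothetical helper: `caps_by_type` is the mapping of type name to
    capabilities for a single field, as found under "fields" in a
    field caps response. Types listed in `ignored_types` do not count.
    """
    real_types = [t for t in caps_by_type if t not in ignored_types]
    return len(real_types) > 1


# Mapped as keyword in some indices and missing in others: not a conflict.
bar = {"keyword": {"indices": ["foo1"]}, "unmapped": {"indices": ["foo2"]}}
print(is_conflict(bar))

# Same field, but counting "unmapped" as a type: reported as a conflict.
print(is_conflict(bar, ignored_types=()))
```

The second call shows the behavior described above when `unmapped` is not special-cased: the field looks like it has two types.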

There are a few other areas in Kibana that use _field_caps and the parsing logic is not unified, so they may need similar changes to their handling (watcher, canvas, ML).

> It is not possible to know if a field is missing in a specific indices if it has the same type in all other indices.

Going back to the original problem, I suppose this information could be inferred the hard way by finding the difference between the top-level indices and the union of all the field-level indices 🙂

@costin
Member

costin commented Sep 29, 2018

> Going back to the original problem, I suppose this information could be inferred the hard way by finding the difference between the top-level indices and the union of all the field-level indices 🙂

Unfortunately that is not the case. If a field has the same mapping across all the indices where it is mapped, it returns indices: null. However, there's no information regarding the indices where the field is not mapped at all (consider field A mapped as integer in X and Y but not mapped in Z, and doing a field caps query against X,Y,Z).

This could be alleviated by specifying the indices for each field type, but that's much more verbose than declaring the indices once per request (at the top level) and being explicit about unmapped. This is also consistent with #33803: simply assuming a type is a primitive is bound to create issues when dealing with complex types (object, nested, join).
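
The limitation can be sketched as follows. `infer_missing` is a hypothetical helper that attempts the set-difference approach and gives up (returns `None`) as soon as a type entry carries no indices list, which is the situation for field A in the X, Y, Z example; the dictionaries are hand-written stand-ins for a response, not real output.

```python
def infer_missing(matched_indices, caps_by_type):
    """Try to infer where a field is missing via set difference.

    Hypothetical helper. Returns a sorted list of indices, or None when
    some type entry has no "indices" list, in which case the indices
    where the field is mapped are unknown and nothing can be inferred.
    """
    covered = set()
    for caps in caps_by_type.values():
        indices = caps.get("indices")
        if indices is None:
            return None
        covered.update(indices)
    return sorted(set(matched_indices) - covered)


# Field A: integer in X and Y, unmapped in Z. Same type wherever mapped,
# so no per-type indices are returned and the inference is impossible.
field_a = {"integer": {"searchable": True, "aggregatable": True, "indices": None}}
print(infer_missing(["X", "Y", "Z"], field_a))

# Only when types differ are indices listed, and the difference works.
field_b = {"integer": {"indices": ["X"]}, "keyword": {"indices": ["Y"]}}
print(infer_missing(["X", "Y", "Z"], field_b))
```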

@jimczi
Contributor Author

jimczi commented Nov 23, 2018

I changed the PR to reflect the latest discussions. Indices that match the request are now always returned in a section called indices in the main response. I also added a parameter named include_unmapped (false by default) which, when set, adds the unmapped fields and their corresponding indices as an additional type.
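
For illustration, a request using the new parameter could be assembled as below. This only builds the request path with the standard library; `field_caps_path` is a hypothetical helper, and the `fields` and `include_unmapped` query parameters are the ones named in this thread.

```python
from urllib.parse import urlencode


def field_caps_path(index_pattern, fields, include_unmapped=False):
    """Build a _field_caps request path (hypothetical helper).

    The parameter is only added when set, since it defaults to false.
    The index pattern is inserted as-is, without URL encoding.
    """
    params = {"fields": ",".join(fields)}
    if include_unmapped:
        params["include_unmapped"] = "true"
    # Keep commas readable instead of percent-encoding them.
    return f"/{index_pattern}/_field_caps?" + urlencode(params, safe=",")


print(field_caps_path("foo*", ["bar", "baz"], include_unmapped=True))
```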

@jimczi jimczi changed the title Always return indices in _field_caps API Handle unmapped fields in _field_caps API Nov 23, 2018
@jimczi
Contributor Author

jimczi commented Nov 23, 2018

test this please

Today the `_field_caps` API returns the list of indices where a field
is present only if this field has different types within the requested indices.
However, if the request is an index pattern (or an alias, or both...), there
is no way to infer the indices if the response contains only fields that have
the same type in all indices. This commit changes the response to always return
the list of indices in the response. It also adds a way to retrieve unmapped fields
in a specific section per field called `unmapped`. This section is created for each field
that is present in some indices but not all if the parameter `include_unmapped` is set to
true in the request (defaults to false).
@jimczi jimczi force-pushed the field_cap_return_indices branch from 9206f48 to b740e99 Compare November 23, 2018 18:03
@costin
Member

costin commented Apr 18, 2019

Fwiw, LGTM

@jimczi jimczi merged commit 5375465 into elastic:master Apr 19, 2019
@jimczi jimczi deleted the field_cap_return_indices branch April 19, 2019 07:17
costin added a commit to costin/elasticsearch that referenced this pull request Apr 19, 2019
Thanks to elastic#34071, there is enough information in field caps to infer
the table structure and thus use the same API consistently across the
IndexResolver.
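
As a loose sketch of what "inferring the table structure" could look like (this is not the IndexResolver code, which is not shown here), the hypothetical `tables_from_field_caps` below inverts a field caps response into one field-to-type map per index. It assumes the response was requested with `include_unmapped` set to true, so that a type entry without an indices list applies to every matched index; the response literal reuses the example discussed earlier in this thread.

```python
def tables_from_field_caps(response):
    """Invert a field caps response into {index: {field: type}}.

    Hypothetical helper. Assumes include_unmapped=true was used, so a
    type entry with no "indices" list covers all matched indices.
    """
    all_indices = response["indices"]
    tables = {index: {} for index in all_indices}
    for field, caps_by_type in response["fields"].items():
        for type_name, caps in caps_by_type.items():
            if type_name == "unmapped":
                # The field does not exist in these indices; skip it.
                continue
            for index in caps.get("indices") or all_indices:
                tables[index][field] = type_name
    return tables


example = {
    "indices": ["foo1", "foo2"],
    "fields": {
        "bar": {
            "keyword": {"searchable": True, "aggregatable": True, "indices": ["foo1"]},
            "unmapped": {"searchable": False, "aggregatable": False, "indices": ["foo2"]},
        },
        "baz": {"text": {"searchable": True, "aggregatable": True}},
    },
}
print(tables_from_field_caps(example))
```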
costin added a commit that referenced this pull request Apr 22, 2019
Thanks to #34071, there is enough information in field caps to infer
the table structure and thus use the same API consistently across the
IndexResolver.
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Apr 23, 2019
davidkyle added a commit that referenced this pull request Apr 24, 2019
After #34071 the FieldCapabilitiesResponse response map is unmodifiable
jimczi added a commit that referenced this pull request Apr 25, 2019
costin added a commit that referenced this pull request Apr 25, 2019

(cherry picked from commit f999469)
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v7.2.0 v8.0.0-alpha1
10 participants