
Cache vectors usage stats #74974


DaveCTurner
Contributor

Today VectorsUsageTransportAction is pretty heavyweight since it must
decompress and read the mappings for every index in the cluster. In
particular Metricbeat hits this action every 10s by default, and it runs
on the elected master, which causes nontrivial load in an otherwise
quiet cluster.

This commit introduces a cache for the usage stats, keyed by index,
avoiding recomputing these stats in the common case that the mapping
hasn't changed.
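
As a rough illustration of the caching idea described above (not the code from this PR), here is a minimal Python sketch. It assumes each index exposes a counter that changes whenever its mapping changes; `mapping_version`, `load_mapping`, `VectorStats` and `VectorStatsCache` are hypothetical names chosen for this example, and the mapping is modelled as a plain dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class VectorStats:
    dense_fields: int
    sparse_fields: int
    dense_dims_total: int


def compute_stats(mapping: dict) -> VectorStats:
    """Walk one index's (already decompressed) mapping and count vector fields."""
    dense = sparse = dims = 0
    stack = [mapping.get("properties", {})]
    while stack:
        for field in stack.pop().values():
            if field.get("type") == "dense_vector":
                dense += 1
                dims += field.get("dims", 0)
            elif field.get("type") == "sparse_vector":
                sparse += 1
            # Object fields nest their children under "properties".
            if "properties" in field:
                stack.append(field["properties"])
    return VectorStats(dense, sparse, dims)


class VectorStatsCache:
    """Per-index cache; an entry is reused while the mapping version is unchanged."""

    def __init__(self, compute: Callable[[dict], VectorStats] = compute_stats):
        self._compute = compute
        self._entries: Dict[str, Tuple[int, VectorStats]] = {}

    def get(self, index: str, mapping_version: int,
            load_mapping: Callable[[], dict]) -> VectorStats:
        cached = self._entries.get(index)
        if cached is not None and cached[0] == mapping_version:
            # Common case: mapping unchanged, so nothing is decompressed or parsed.
            return cached[1]
        # Expensive path: load the mapping and recompute the stats.
        stats = self._compute(load_mapping())
        self._entries[index] = (mapping_version, stats)
        return stats

    def retain(self, live_indices: Iterable[str]) -> None:
        """Drop entries for indices that no longer exist."""
        live = set(live_indices)
        for name in list(self._entries):
            if name not in live:
                del self._entries[name]
```

With this shape, a poller that calls `get` every 10s only pays the mapping-parsing cost for indices whose version has moved since the previous call; `retain` keeps the cache from growing as indices are deleted.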

@DaveCTurner DaveCTurner added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0 v7.15.0 labels Jul 6, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jul 6, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@mayya-sharipova
Contributor

mayya-sharipova commented Jul 6, 2021

@DaveCTurner Thanks very much for your PR, and sorry that this part was a bottleneck for metrics. But I think a better approach would be to remove the xpack vectors usage stats completely, as we already report mapping stats for all fields in the indices section of /_cluster/stats.

Confirming with @giladgal that it is ok to move from the current:

GET _xpack/usage
...
  "vectors" : {
    "available" : true,
    "enabled" : true,
    "dense_vector_fields_count" : 3,
    "sparse_vector_fields_count" : 0,
    "dense_vector_dims_avg_count" : 3
  }

to cluster stats API:

GET _cluster/stats
...
  {
    "name" : "dense_vector",
    "count" : 3,
    "index_count" : 2
  }

which should be consistent with how we report mapping stats for other fields.

@DaveCTurner Sorry for the trouble!
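
To make the proposed migration concrete, here is a small Python sketch of how a consumer might read the vector field counts from a cluster stats response instead of from `_xpack/usage`. It assumes the per-type entries live in a list at `indices.mappings.field_types`; that path, the trimmed sample response, and the helper name `field_type_stats` are assumptions for illustration, so check them against the response of your own cluster.

```python
import json

# Trimmed, hypothetical cluster stats response; only the relevant part is shown.
CLUSTER_STATS = json.loads("""
{
  "indices": {
    "mappings": {
      "field_types": [
        {"name": "dense_vector", "count": 3, "index_count": 2},
        {"name": "keyword", "count": 10, "index_count": 4}
      ]
    }
  }
}
""")


def field_type_stats(cluster_stats: dict, type_name: str) -> dict:
    """Return the mapping stats entry for one field type, or zero counts if unused."""
    field_types = (
        cluster_stats.get("indices", {}).get("mappings", {}).get("field_types", [])
    )
    for entry in field_types:
        if entry.get("name") == type_name:
            return entry
    # A type that no index uses is assumed to be absent from the list.
    return {"name": type_name, "count": 0, "index_count": 0}


dense = field_type_stats(CLUSTER_STATS, "dense_vector")
sparse = field_type_stats(CLUSTER_STATS, "sparse_vector")
```

Here `dense["count"]` plays the role of the old `dense_vector_fields_count`. Nothing in this response corresponds to `dense_vector_dims_avg_count`.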

@DaveCTurner
Contributor Author

Not at all, doing no work is always better than doing less work 😁

Can we remove these stats in 7.x tho? It looks like a breaking change to me. It also seems to lose the dense_vector_dims_avg_count field, not sure how important that really is.

@mayya-sharipova
Contributor

mayya-sharipova commented Jul 6, 2021

We have clarified in the past that usage stats updates are not considered breaking, so it is safe to remove those in 7.x.

About the missing dense_vector_dims_avg_count in the cluster stats: that's what I wanted to confirm with @giladgal. To my mind it is ok to lose this. Maybe later we can consider enhancing the cluster mapping stats with additional field-type-specific details.

@giladgal
Contributor

giladgal commented Jul 6, 2021

> About the missing dense_vector_dims_avg_count in the cluster stats: that's what I wanted to confirm with @giladgal. To my mind it is ok to lose this. Maybe later we can consider enhancing the cluster mapping stats with additional field-type-specific details.

That's fine by me. Thanks.

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Jul 7, 2021
We have already decided not to have xpack usage for field mappers
(see elastic#53076), as mapping stats for all fields are already tracked
in cluster stats.
Moreover, xpack usage for vector fields is quite an expensive operation
(see elastic#74974).

This removes the xpack actions for vector fields.
@mayya-sharipova
Contributor

I've created a PR to remove xpack vector usage.

@DaveCTurner
Contributor Author

Great, thanks Mayya. Closing this in favour of #75017

@DaveCTurner DaveCTurner closed this Jul 7, 2021
@DaveCTurner DaveCTurner deleted the 2021-07-06-cache-vectors-usage-stats branch July 7, 2021 06:37
mayya-sharipova added a commit that referenced this pull request Jul 13, 2021
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Jul 13, 2021

Backport for elastic#75017
mayya-sharipova added a commit that referenced this pull request Jul 13, 2021

Backport for #75017
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.15.0 v8.0.0-alpha1