Chef search results are limited to 10,000 records when using external OpenSearch 1.3.x regardless of max_result_window #3786

Open
oberones opened this issue Feb 27, 2024 · 2 comments
Labels
Status: Untriaged An issue that has yet to be triaged.

@oberones

Chef Server Version

Chef Infra Server 15.7.0

Platform Details

Ubuntu 20.04 running on c5.2xlarge instances in AWS EC2

Configuration

Chef HA cluster with external OpenSearch 1.3.14 and PSQL 13 backends. Recently migrated from 12.22.4 to 15.7.0.

Scenario

Since migrating to Infra Server 15.7.0, all Chef search results are capped at 10,000. The max_result_window on the index is currently set to 1,000,000. We have changed it several times as a test, with no effect: no matter what value we configure on the index, only 10,000 results are returned.

Steps to Reproduce

Provision a Chef HA cluster with an OpenSearch 1.3.x backend, register more than 10,000 nodes, and then run knife status or knife search *:*.

Expected Result

All registered node objects should be returned.

Actual Result

Only 10,000 objects are returned by the query, regardless of how many objects are in the index. In this particular case, we have over 20,000 registered nodes but only 10,000 are returned in the search results.

$ knife node list | wc -l
   20343
   
$ knife status | wc -l
   10000

$ curl https://opensearch.host/chef/_settings -u chef_user | jq .chef.settings.index.max_result_window
"1000000"

Breadcrumbs

This appears to be related to differences in the search API between Elasticsearch 6 and OpenSearch 1.3.x. The following code assumes that the value of "total" equals the total number of matching records, which was the case in Elasticsearch 6.
https://github.com/chef/chef-server/blob/main/src/oc_erchef/apps/chef_index/src/chef_opensearch.erl#L84-L94

However, OpenSearch caps the value in the total field at 10000 unless ?track_total_hits=true is added to the query. For example, Chef Server 15.7.0 receives "total":{"value":10000,"relation":"gte"} inside "hits" from the default OpenSearch query, compared to "total":{"value":61964,"relation":"eq"} when track_total_hits is added. This behavior is further documented in the following bug report:
opensearch-project/OpenSearch#9720

As well as the following forum thread:
https://forum.opensearch.org/t/opensearch-dashboards-shows-10000-as-hits-total/8397
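To make the difference between the two response shapes concrete, here is a minimal Python sketch (not the Erlang code in chef_opensearch.erl). It assumes the response has already been decoded into a dict; the function names parse_total and with_exact_total are hypothetical. It treats a bare integer total as the Elasticsearch 6 shape and an object with "value" and "relation" as the OpenSearch shape, and it shows the request body form of track_total_hits.

```python
from typing import Tuple


def parse_total(response: dict) -> Tuple[int, bool]:
    """Return (count, is_exact) from a decoded search response.

    Hypothetical helper: a bare integer is the Elasticsearch 6 shape,
    an object with "value"/"relation" is the OpenSearch 1.x shape.
    """
    total = response["hits"]["total"]
    if isinstance(total, int):
        return total, True
    # "gte" means the real count is at least this value, not equal to it.
    return total["value"], total.get("relation") == "eq"


def with_exact_total(query_body: dict) -> dict:
    """Return a copy of the search body that asks for an exact total."""
    body = dict(query_body)
    body["track_total_hits"] = True
    return body


capped = {"hits": {"total": {"value": 10000, "relation": "gte"}}}
exact = {"hits": {"total": {"value": 61964, "relation": "eq"}}}
print(parse_total(capped))  # count is a lower bound only
print(parse_total(exact))   # count is exact
```

A caller that reads only "value" and ignores "relation" cannot tell a capped total from an exact one, which matches the symptom described above.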

@oberones oberones added the Status: Untriaged An issue that has yet to be triaged. label Feb 27, 2024
@matemikulic

Yes, I can confirm this causes issues when you have more than 10k nodes.
This should definitely be mitigated.
From my point of view, this is a big mistake on the OpenSearch side. You don't make searches faster by returning partial results without even giving users an option to change this behavior in the cluster settings. But they are running with it and I don't see it changing. So every app that needs full, exact results has to send track_total_hits=true in the body of each search request.

@matemikulic

matemikulic commented Oct 11, 2024

I came up with a workaround until this is fixed in the Chef code. Tested on Chef 15.9.38.
If you deploy an external OpenSearch cluster, rewrite the path on the load balancer so that the /chef/_search requests Chef sends become /chef/_search?track_total_hits=true.

Example for haproxy:
acl is_search path_reg ^/chef/_search$
http-request set-path /chef/_search?track_total_hits=true if is_search
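For anyone who wants to check what the rewritten request should look like before touching the balancer, the intended transformation can be sketched in Python. This is only an illustration of the rule's effect, not HAProxy behavior; the function name add_track_total_hits is hypothetical, and preserving an existing query string is an assumption that goes beyond the two-line rule above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_track_total_hits(url: str) -> str:
    """Append track_total_hits=true to /chef/_search requests only."""
    parts = urlsplit(url)
    if parts.path != "/chef/_search":
        return url  # other paths pass through untouched
    # Keep existing parameters, dropping any earlier track_total_hits value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "track_total_hits"
    ]
    query.append(("track_total_hits", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_track_total_hits("/chef/_search"))
print(add_track_total_hits("/chef/_search?from=0&size=1000"))
print(add_track_total_hits("/chef/_count"))
```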

Inside the OpenSearch cluster you will also need to raise max_result_window to a value higher than your node count:
curl -XPUT -u user:pass -H "Content-Type: application/json" https://your-cluster.example.com:9200/chef/_settings -d '{ "index" : { "max_result_window" : 100000 } }'
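A quick way to sanity-check the setting against the node count is sketched below in Python. It assumes the _settings response has the shape shown in the original report (index name at the top level, the window stored as a string) and falls back to the OpenSearch default of 10000 when the setting is absent; the function name window_covers is hypothetical.

```python
DEFAULT_MAX_RESULT_WINDOW = 10000  # OpenSearch default when the setting is unset


def window_covers(settings: dict, index: str, node_count: int) -> bool:
    """Return True if the index's max_result_window is at least node_count."""
    index_settings = settings.get(index, {}).get("settings", {}).get("index", {})
    # The settings API returns the value as a string, e.g. "1000000".
    window = int(index_settings.get("max_result_window", DEFAULT_MAX_RESULT_WINDOW))
    return window >= node_count


settings = {"chef": {"settings": {"index": {"max_result_window": "100000"}}}}
print(window_covers(settings, "chef", 20343))
print(window_covers({"chef": {"settings": {"index": {}}}}, "chef", 20343))
```

Note that a large enough window is necessary but, per the report above, not sufficient on its own: the total is still capped unless track_total_hits is also set.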
