Sorting not working for search API #8

Open
alexpop opened this issue Oct 28, 2014 · 21 comments
Labels
Aspect: UX How users feel interacting with the project, focusing on function, ease-of-use and accessibility. Component: opscode-erchef Status: move to jira Status: To be prioritized Indicates that product needs to prioritize this issue. Triage: Feature Request Indicates an issue requesting new functionality.

Comments

@alexpop
Contributor

alexpop commented Oct 28, 2014

The "sort" parameter in the search API is not working. Also reported in CHEF-2121.

This is causing Chef Manage (chef/chef#2279) and 'knife search' commands to display unsorted lists. Here are some "knife search" results that show the issue:

[apop@mymac chef-repo]$ knife search role "*:*" --id-only --sort asc -VV
...
DEBUG: Initiating GET to https://api.opscode.com/organizations/ap-org1/search/role?q=*%253A*&sort=asc&start=0&rows=1000
...
3 items found

windows_web
linux_base
windows_base

Same unsorted list returned with these commands:

knife search role "*:*" --id-only --sort name
knife search role "*:*" --id-only --sort name+desc
knife search role "*:*" --id-only --sort "name desc"
knife search role "*:*" --id-only --sort ascending
knife search role "*:*" --id-only --sort asc
knife search role "*:*" --sort asc
knife search role "*:*" --sort description
@sean-horn
Contributor

An earlier version of this: https://tickets.corp.opscode.com/browse/OC-11238
It still appears as late as Manage 1.7.1.

@smith
Contributor

smith commented Dec 12, 2014

@sean-horn the version of Manage used should not be an issue. Manage will not implement sorting on the client side until it's working on the server. Ideally we would implement these together at the same time.

@sean-horn sean-horn added the bug label Jan 30, 2015
@sean-horn
Contributor

In combination with https://github.com/chef/chef-manage-issues/issues/12 this is the number one customer Feature/Fix request.

@markan markan added this to the help-wanted milestone Apr 29, 2015
@sdelano
Contributor

sdelano commented Apr 30, 2015

At this point, we're going to have to classify this as a feature and not a bug. The server-side sorting feature has been non-functional since sometime in 2010.

When this feature was working, the Chef Server stored the flattened-and-expanded node object data as individual fields within Solr. The number of fields in Solr quickly grew to over a million, driven by all the key combinations of all the nodes on the server, and search times suffered drastically. Solr behaves best when the number of fields is kept low (at the time, benchmark comparisons only went up to 32 fields).

To improve (really, unbreak) search performance on large Chef Server installs, we combined all of the flattened-and-expanded node keys into a single content field in Solr. We apply a special query transformation that converts a key:val Lucene query into something along the lines of content:__key__%SEP%__val__ before sending it to Solr.

Since all of the node-related keys are merged into the content field in Solr, we can't actually tell Solr what fields to sort on when returning query results.
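To make the constraint concrete, here is a minimal Python sketch of the idea: nested node attributes are flattened into key/value tokens packed into one `content` string, and a `key:val` query is rewritten to match a token inside that string. The underscore join character and the `SEP` placeholder are assumptions for illustration only; the actual flattening rules and separator used by the Chef Server are not spelled out here. Because every attribute ends up inside one field, there is no per-attribute field left for Solr to sort on.

```python
from typing import Any, Dict, Iterator, Tuple

# Placeholder separator (assumption): the real Chef Server separator is not given here.
SEP = "<SEP>"


def flatten(obj: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (key_path, value) for every leaf, joining nested keys with '_' (assumption)."""
    for key, value in obj.items():
        path = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def content_field(node: Dict[str, Any]) -> str:
    """Pack all flattened key/value pairs into one space-separated 'content' string."""
    return " ".join(f"{key}{SEP}{value}" for key, value in flatten(node))


def transform_query(query: str) -> str:
    """Rewrite a 'key:val' query into a match against the single content field."""
    key, value = query.split(":", 1)
    return f"content:{key}{SEP}{value}"


if __name__ == "__main__":
    node = {"name": "web1", "kernel": {"machine": "x86_64"}}
    print(content_field(node))  # name<SEP>web1 kernel_machine<SEP>x86_64
    print(transform_query("kernel_machine:x86_64"))  # content:kernel_machine<SEP>x86_64
```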

Without significant changes and / or new APIs in the Chef Server, we have two options for obtaining sorted search results, server-side and client-side, and neither of them is particularly great:

server-side:
With the architecture described above, to obtain sorted results from the server, we'd have to first query Solr for all of the results matching a query, obtain the gzipped node JSON from PostgreSQL for all the results, parse the JSON, and sort based on the arbitrary key passed in by the user.

This would be fine for small result sets, but for large installs this quickly balloons the memory usage of the Chef Server.
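The server-side option can be sketched in a few lines of Python. `matching_ids` stands in for the IDs Solr returns and `blobs` for the gzipped JSON fetched from PostgreSQL; both names, and the dotted-path sort key syntax, are hypothetical. The sketch shows where the memory goes: every matching object is decompressed and parsed before a single page can be returned.

```python
import gzip
import json
from typing import Any, Dict, List, Optional


def get_path(obj: Any, path: str) -> Optional[Any]:
    """Follow a dotted path such as 'normal.foo'; return None if any part is missing."""
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def naive_sorted_search(matching_ids: List[str], blobs: Dict[str, bytes],
                        sort_key: str, start: int = 0, rows: int = 1000) -> List[dict]:
    """Decompress and parse EVERY match, sort in memory, then slice out one page.

    Memory grows with the whole result set, not with `rows`.
    """
    objects = [json.loads(gzip.decompress(blobs[i])) for i in matching_ids]
    # Objects missing the key sort last; present values must be mutually comparable.
    objects.sort(key=lambda o: (get_path(o, sort_key) is None, get_path(o, sort_key)))
    return objects[start:start + rows]


# Hypothetical data shaped like the roles in the report above.
blobs = {rid: gzip.compress(json.dumps({"name": rid}).encode())
         for rid in ["windows_web", "linux_base", "windows_base"]}
print([o["name"] for o in naive_sorted_search(list(blobs), blobs, "name")])
# ['linux_base', 'windows_base', 'windows_web']
```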

client-side:
All of the same memory usage issues apply here, but you also are hindered by the fact that you'd have to stream all of the data to the client in order to sort, meaning it would be infeasible to combine sorting with limits.
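A small Python illustration of why client-side sorting does not compose with limits: sorting each page as it arrives is not the same as sorting the whole result set and then paging. The role names come from the `knife search` output in the report; the page size of 2 is arbitrary.

```python
from typing import List


def pages(items: List[str], rows: int) -> List[List[str]]:
    """Split a result list into pages of at most `rows` items."""
    return [items[i:i + rows] for i in range(0, len(items), rows)]


# Unsorted order as returned by the server in the knife output in the report.
unsorted = ["windows_web", "linux_base", "windows_base"]

# What a client can do when it only has one page at a time: sort within the page.
sorted_per_page = [sorted(page) for page in pages(unsorted, 2)]

# What the user actually expects: sort the full result set, then paginate.
sorted_globally = pages(sorted(unsorted), 2)

print(sorted_per_page)  # [['linux_base', 'windows_web'], ['windows_base']]
print(sorted_globally)  # [['linux_base', 'windows_base'], ['windows_web']]
```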

@stevendanna
Contributor

Perhaps we should open an issue with the client to remove the sort parameter in the DSL methods and the options in the CLI tool, so people don't get confused.

@charlesjohnson
Contributor

👍 to that, @stevendanna. Also paging @smith to this thread.

@smith
Contributor

smith commented May 1, 2015

> Without significant changes and / or new APIs in the Chef Server

I'd be interested to hear what kind of things we're talking about here. Since we're talking about the way the data is stored and queried, "significant" sounds pretty significant.

> server-side:
> With the architecture described above, to obtain sorted results from the server, we'd have to first query Solr for all of the results matching a query, obtain the gzipped node JSON from PostgreSQL for all the results, parse the JSON, and sort based on the arbitrary key passed in by the user.
>
> This would be fine for small result sets, but for large installs this quickly balloons the memory usage of the Chef Server.

Would we be able to mitigate or make feasible any of this by exploring some kind of SAX-like JSON streaming parser (to keep the memory usage low) or some of the postgres functions for manipulating JSON on the way out of the server?
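One way to bound memory when only a page is needed, sketched in Python: stream objects one at a time and keep only the `start + rows` best candidates, using `heapq.nsmallest`. This is a swapped-in technique (a bounded heap over a stream), not a SAX parser or a PostgreSQL JSON function, and the generator standing in for the parsed result stream is hypothetical. Memory is proportional to `start + rows` rather than to the result set, although every match still has to be read and parsed once, and deep pages get progressively more expensive.

```python
import heapq
from typing import Any, Callable, Iterable, List


def sorted_page(stream: Iterable[Any], sort_key: Callable[[Any], Any],
                start: int, rows: int) -> List[Any]:
    """Return items [start, start+rows) of the sorted stream without sorting it all.

    For an iterator input, heapq.nsmallest holds at most start+rows items at a time.
    """
    return heapq.nsmallest(start + rows, stream, key=sort_key)[start:]


def parsed_results():
    # Hypothetical stand-in for objects parsed one at a time from the database.
    for name in ["windows_web", "linux_base", "windows_base"]:
        yield {"name": name}


print(sorted_page(parsed_results(), lambda o: o["name"], start=1, rows=1))
# [{'name': 'windows_base'}]
```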

> client-side:
> All of the same memory usage issues apply here, but you also are hindered by the fact that you'd have to stream all of the data to the client in order to sort, meaning it would be infeasible to combine sorting with limits.

We could figure out ways to allow sorting only after all rows had been loaded in memory, but that wouldn't survive a page refresh and would present tricky UI design challenges.

> Perhaps we should open an issue with the client to remove the sort parameter in the DSL methods and the options in the CLI tool, so people don't get confused.

I'm ok with that, but as long as there are search results displayed in a table with headings, users will click the headings and want it to sort when the headings are clicked, because that's what people expect. They will not stop asking for this feature so long as things are in a table with headings.

Would it be possible to have some kind of preconfigured white-list of key/paths where we make sorting possible and only allow those (configured in chef-server.rb)? Just throwing out ideas at this point.
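The whitelist idea could look roughly like this Python sketch. `SORTABLE_KEYS`, `parse_sort`, and the `key asc|desc` syntax are all hypothetical; nothing like them is described as existing in the Chef Server. A side benefit is that a request such as `--sort asc` from the original report would be rejected with an error rather than silently returning unsorted results.

```python
from typing import FrozenSet, Tuple

# Hypothetical whitelist; under the proposal it would be configured in chef-server.rb.
SORTABLE_KEYS: FrozenSet[str] = frozenset({"name", "chef_environment"})


def parse_sort(param: str, allowed: FrozenSet[str] = SORTABLE_KEYS) -> Tuple[str, bool]:
    """Parse a 'key' or 'key asc|desc' sort parameter; return (key, descending).

    Raises ValueError instead of silently ignoring an unsupported sort.
    """
    parts = param.split()
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"malformed sort parameter: {param!r}")
    key = parts[0]
    direction = parts[1].lower() if len(parts) == 2 else "asc"
    if direction not in ("asc", "desc"):
        raise ValueError(f"unknown sort direction: {direction!r}")
    if key not in allowed:
        raise ValueError(f"sorting on {key!r} is not enabled")
    return key, direction == "desc"


print(parse_sort("name desc"))  # ('name', True)
```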

I'm open to not supporting sorting, but at this point I think the people feeling the most pain from this might be Support, since they have to keep fielding requests about it.

@alexpop
Contributor Author

alexpop commented May 1, 2015

In 2015, unsorted lists are not acceptable in my opinion. Look at the products we use on a daily basis: how many give us unsorted lists?
It's one of those hard right things we need to prioritize and implement.

@stevendanna
Contributor

If we limited what you could sort on, could we potentially use PostgreSQL cursors (http://www.postgresql.org/docs/9.2/static/plpgsql-cursors.html) to do sorted, paginated results without blowing up memory?
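The general shape of cursor-based paging, sketched with Python's built-in `sqlite3` standing in for PostgreSQL: run one ordered query and pull rows in fixed-size batches, so the consumer never holds more than one batch at a time. The `roles` table and the batch size are hypothetical. From Python against PostgreSQL, a psycopg2 named cursor (`conn.cursor(name=...)`) is the usual way to get a server-side cursor with the same batching behavior.

```python
import sqlite3
from typing import Iterator


def iter_sorted_names(conn: sqlite3.Connection, batch_size: int = 2) -> Iterator[str]:
    """Yield role names in sorted order, fetching at most `batch_size` rows per fetch."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM roles ORDER BY name")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for (name,) in batch:
            yield name


conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives an index on name, so ORDER BY name can be served from it.
conn.execute("CREATE TABLE roles (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO roles VALUES (?)",
                 [("windows_web",), ("linux_base",), ("windows_base",)])
print(list(iter_sorted_names(conn)))  # ['linux_base', 'windows_base', 'windows_web']
```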

@jcreedcmu
Contributor

I don't really know how much relevance it has to a realistic solution to this problem, but I'd like to note that https://github.com/jcreedcmu/psql-gunzip-test is a proof of concept that does allow ORDER BY and PostgreSQL indexing on the JSON fields stored gzipped in serialized_object columns. For example, I can get

SELECT name FROM nodes ORDER BY (gunzip(serialized_object)::json->'normal'->>'foo');

and

CREATE INDEX ON nodes ((gunzip(serialized_object)::json->'normal'->>'foo'));

to work on an Ubuntu 12.04 VM after doing a Chef Server omnibus build with PostgreSQL upgraded from 9.2 to 9.3, so that the JSON access functions exist.
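For readers without a PostgreSQL 9.3 build at hand, the same idea (ordering rows by a field inside a gzipped JSON column) can be mimicked in Python with `sqlite3` and a user-defined SQL function. `gunzip_json_path` is a hypothetical helper that does in one call roughly what `gunzip(serialized_object)::json->'normal'->>'foo'` does in the SQL above; the table mirrors the `nodes`/`serialized_object` names from the comment, and the data is made up.

```python
import gzip
import json
import sqlite3


def gunzip_json_path(blob: bytes, path: str):
    """Decompress a gzipped JSON blob and return the value at a dotted path (or NULL)."""
    obj = json.loads(gzip.decompress(blob))
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    # SQLite functions must return scalars, so nested values come back as JSON text.
    return obj if obj is None or isinstance(obj, (str, int, float)) else json.dumps(obj)


conn = sqlite3.connect(":memory:")
conn.create_function("gunzip_json_path", 2, gunzip_json_path)
conn.execute("CREATE TABLE nodes (name TEXT, serialized_object BLOB)")
for name, foo in [("node1", "b"), ("node2", "a"), ("node3", "c")]:
    blob = gzip.compress(json.dumps({"normal": {"foo": foo}}).encode())
    conn.execute("INSERT INTO nodes VALUES (?, ?)", (name, blob))

rows = conn.execute(
    "SELECT name FROM nodes ORDER BY gunzip_json_path(serialized_object, 'normal.foo')"
).fetchall()
print([r[0] for r in rows])  # ['node2', 'node1', 'node3']
```

Without an expression index, the function is evaluated for every row on every query, which is the cost the CREATE INDEX statement above is meant to avoid.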

@petere

petere commented May 5, 2015

Storing the json data gzipped is kind of silly anyway, because PostgreSQL will already compress column values automatically.

@jeremiahsnapp
Copy link
Contributor

@petere I think I was just reading about this yesterday. Does this page describe the compression you are talking about?

http://www.postgresql.org/docs/9.2/static/storage-toast.html

@markan
Contributor

markan commented May 5, 2015

@petere Very much agree.

The compression is an artifact from when we supported mysql as well. We wanted to maintain a sql model containing the least common denominator between the two, and it was easier at the time to own the compression ourselves than to model the differences between how postgres and mysql managed things. It doesn't make much sense now.

Going forward we should stop compressing things on the erlang side; it doesn't add any value and closes off cool things like JSONB.

@marcparadise
Member

I opened up #225 to separately address erlang-side compression, since that can be done independently of whatever solution we arrive at here.

@charlesjohnson
Contributor

Related question: Putting search results aside, what's the difficulty of sorting results on the /nodes, /cookbooks, /roles, /environments, etc. endpoints?

@smith
Contributor

smith commented May 11, 2015

@charlesjohnson since they only return ids (aside from cookbooks, which have the most recent version I think), I'm guessing you would just need to add an ORDER BY and possibly an index along with it. That's just my guess though.

@wps-carl

+1 for this bug.
As a Chef user, this unsorted-lists issue is my one gripe with the product. Would really love a solution for this, guys!

@jcreedcmu
Contributor

What does it even mean to sort the results server-side on, e.g. /nodes, if what's being returned is unconditionally a JSON map of all the nodes? I thought the issue of sorting was meaningful for the search endpoint because it's paginated.

@Tyrael

Tyrael commented Sep 23, 2015

Is there any way I can help move this forward?

@jquick

jquick commented Mar 3, 2017

Any workaround for this? It's very painful to search for every role you want to add.

chrisgit pushed a commit to chrisgit/knife-search_wrapper that referenced this issue Mar 14, 2017
@spotlesscoder

Please make the chef server web UI lists sortable

@tas50 tas50 added Help: Good First Issue Type: Bug Does not work as expected. and removed help wanted labels Jan 4, 2019
@tas50 tas50 removed the bug label Jan 4, 2019
@tas50 tas50 added Status: Good First Issue An issue ready for a new contributor. and removed Help: Good First Issue labels Jan 28, 2019
@PrajaktaPurohit PrajaktaPurohit added Triage: Feature Request Indicates an issue requesting new functionality. Aspect: UX How users feel interacting with the project, focusing on function, ease-of-use and accessibility. Component: opscode-erchef Status: To be prioritized Indicates that product needs to prioritize this issue. and removed Status: Good First Issue An issue ready for a new contributor. Type: Bug Does not work as expected. labels Jul 31, 2020
@stevendanna stevendanna removed this from the help-wanted milestone Sep 29, 2020