to_entries always outputs keys sorted - can it be avoided? #561

Closed
joelpurra opened this issue Aug 26, 2014 · 7 comments

@joelpurra
Contributor

I care about output ordering, as I intend to use automated jq output in a report. After using to_entries (or with_entries) my original ordering is lost.

In these examples I want to keep the order c, b, a. The numeric values are an example; I can't use them for key ordering.

Order changed by to_entries.

```
echo '{ "c": 1, "b": 2, "a": 3 }' | jq 'to_entries'
[
  {
    "key": "a",
    "value": 3
  },
  {
    "key": "b",
    "value": 2
  },
  {
    "key": "c",
    "value": 1
  }
]
```

Order changed by with_entries, because it uses to_entries.

```
echo '{ "c": 1, "b": 2, "a": 3 }' | jq 'with_entries(.)'
{
  "a": 3,
  "b": 2,
  "c": 1
}
```

Using from_entries is no problem (reverse is used here only as an example).

```
echo '{ "c": 1, "b": 2, "a": 3 }' | jq 'to_entries | reverse | from_entries'
{
  "c": 1,
  "b": 2,
  "a": 3
}
```

My problem is similar to the ordering issues in #48 "tsv and/or csv output support". I'm writing generic code though, and want to avoid manually specifying column ordering for each dataset.

Should to_entries produce key-sorted output?

@wtlangford
Contributor

Do keep in mind that there's technically not an innate order for object keys. They happen to be (generally) stored in the order they are added, but that's an underlying implementation detail that really shouldn't be relied upon, as it could change without notice.

That being said, to_entries does require some ordering to exist, even if it is arbitrary. The current implementation ends up calling jv_keys(), which calls qsort() on the key list it returns, so you get keys sorted by Unicode code point order, which is very deterministic, if not strictly lexicographical.

As for whether or not it should sort them... I feel like they should be sorted. The implementation of object is doing some interesting hashtable things, though. It seems to be creating an indirection array to be used with hashing, to maintain insertion order and still allow O(1) average-case lookups.

@nicowilliams, thoughts?

@joelpurra
Contributor Author

@wtlangford:

> Do keep in mind that there's technically not an innate order for object keys. They happen to be (generally) stored in the order they are added, but that's an underlying implementation detail that really shouldn't be relied upon, as it could change without notice.

Yes, and generally I don't care much. My human-readable output is usually generated in the preferred order, or is sortable by value, but to_entries reorders it in a way other function calls don't.

> That being said, to_entries does require some ordering to exist, even if it is arbitrary. The current implementation ends up calling jv_keys(), which calls qsort() on the key list it returns, so you get keys sorted by Unicode code point order, which is very deterministic, if not strictly lexicographical.

Isn't sorting keys completely unnecessary when the order is arbitrary anyway? Why does to_entries require sorting? It's not as if from_entries relies on it.

> As for whether or not it should sort them... I feel like they should be sorted. The implementation of object is doing some interesting hashtable things, though. It seems to be creating an indirection array to be used with hashing, to maintain insertion order and still allow O(1) average-case lookups.

If that fixes my opinionated "problem", it would be very nice. Judging by other issues, others seem to have stumbled upon it too.

@nicowilliams
Contributor

jq now actually keeps insertion order for objects internally, but it sorts keys in the keys builtin (which to_entries uses), via the jv_keys() C function. Dropping the qsort() call in jv_keys() is only half the story, since the formatter depends on it (and since we could optimize jv_keys() if we don't need to sort there).

Would this be a backwards-incompatible change? No, I don't think so.

@nicowilliams
Contributor

Note, though, that the -S option would not sort to_entries output, only objects' keys.

@nicowilliams
Contributor

Since keys is documented as returning keys in sorted order... I've added a keys_unsorted and made to_entries use it. jq -S ... still works, of course.

@joelpurra
Contributor Author

@nicowilliams: excellent, thank you very much =)

@Gerst20051

👍
