-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-34690: [Go]: Add sort indices function #34719
base: main
Are you sure you want to change the base?
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
|
// Sort the indices slice based on the values in the input array | ||
sort.Slice(indices, func(i, j int) bool { | ||
// TODO: not sure what to do here? | ||
// compare using scalar comparison? | ||
a := inArr.Buffers[1].Buf[indices[i]*sz : (indices[i]+1)*sz] | ||
b := inArr.Buffers[1].Buf[indices[j]*sz : (indices[j]+1)*sz] | ||
return bytes.Compare(a, b) < 0 | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeroshade So I've gotten this far, but the bytes.Compare
comparison here was just to get something compiling. 😃 In reality we need to compare the scalar values represented by those bytes, I believe. (I've also only tried to make it work for int32 so far.)
I'm not sure how to compare scalar values inside the array with one another? Are there existing comparators that can be used her? And is there a way to access individual items in an array, other than how I'm doing it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to have a look at https://github.com/apache/arrow/blob/main/go/arrow/compute/internal/kernels/scalar_comparisons.go (which implements the less than/equals/greater than/etc.) operations at a scalar level. I'll try to take another look at this early next week but you might be able to get an idea from over there on how to tackle this.
A tricky piece to think about though is that we'd want to be able to manage passing an option to allow sorting a record batch / table by multiple columns. In that scenario I think we can learn from https://arrow.apache.org/blog/2022/11/07/multi-column-sorts-in-arrow-rust-part-1/ and the corresponding part 2 and potentially leverage similar techniques to implement the sorting here.
expect []int64 | ||
valueType arrow.DataType | ||
}{ | ||
{"simple int32", []string{`[1, 1, 0, -5, -5, -5, 255, 255]`}, []int64{3, 4, 5, 2, 0, 1, 6, 7}, arrow.PrimitiveTypes.Int32}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just trying to get a single simple test case to pass for now
@hermanschaaf Is this issue still actively being worked on? We'd love to be able to use a sorting function but if it's not being pursued currently I would be interested in picking it up. |
@thorfour Yes, I'm not actively working on this right now. If you'd like to pick it up, please do! |
|
Just opening an early draft for now, still trying to figure things out.