-
Couldn't load subscription status.
- Fork 3.9k
ARROW-47: [C++] Preliminary arrow::Scalar object model #3604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sizeof(T)
cpp/src/arrow/scalar.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not char[16]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anything this should probably be arrow::Decimal128
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's uint128 in abseil.io and in the future, int128.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is worth stressing over
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
cpp/src/arrow/scalar.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swap the order of is_valid and type, you'll get better struct packing & alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, though if this ever mattered these classes are being used wrong ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we write small document in the Kernel header about this subject?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what to write
cpp/src/arrow/scalar.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need a virtual destructor if we're going to be using shared_ptr<Scalar>s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I just added some checks to see if shared_ptr<Buffer> in a subclass is properly destructed and it is. I'm happy to add a virtual dtor for good measure, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I forgot: shared_ptr's block manager is instantiated with the constructed type, so base_ref is pointing to a manager<BinaryScalar> which calls ~BinaryScalar. The memory won't leak with shared_ptr<Scalar>, but it will leak for unique_ptr<Scalar>. Probably best to add the virtual dtor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
4fab706 to
fd90264
Compare
|
I added a note that the API is experimental in 0.13 so we can change things. @xhochy could you take a glance at this and let me know if you're OK with it going in? I think this will be the subject of some iteration as we write more kernels that use these objects. |
cpp/src/arrow/compute/kernel.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given scalars are supposed to be "small" it might be worth comment on why you use a shared_ptr type here (i.e. instead of the previous variant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Scalars of nested types may be semantically small but physically occupy a fairly large amount of memory
- The object model uses subclasses. So you need to use a pointer type of some kind here
cpp/src/arrow/type_traits.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like maybe one shouldn't exist? Otherwise just an int and a dictionary pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're also lacking BuilderType here for fundamentally the same reason. pair<int64_t, std::shared_ptr<Array>> is attractive but is it acceptable for when the index type is uint64_t?
An option is to use/wrap Scalar or std::shared_ptr<Scalar> here. Then BuilderType could be TypeErasedDictionaryBuilder or so:
std:: unique_ptr<ArrayBuilder> builder;
RETURN_NOT_OK(MakeBuilder(pool, dictionary(int32(), string_array), &builder));
for (std::shared_ptr<Scalar> s : user_inputs) {
RETURN_NOT_OK(builder->Append(s));
// Append null for strings not in string_array
// and non-StringScalars
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a decision needs to be made right now. We need to write some kernels that involve dictionaries first
|
@wesm Looks good, the things that most annoys me is the GTEST_VENDORED option ;) |
|
@xhochy I can remove that, I was having some trouble installing gtest 1.8.0 in my toolchain |
Change-Id: Ie7b17cef495b5a6b047e9205cd0408c939774e88
Change-Id: I87f690898c7af7b028f2e50247e96703ede4f457
Change-Id: Ieaa467c3058a44c23f1e3f36176da1592814ffb5
…stamp types Change-Id: I00ca69a9c1ec1ddaa5b1fd9db35e9db09a12eb8b
Change-Id: I98ae62598dbc35c609dfb92f212465bc0d67f58a
|
+1 |
This is the start of a Scalar object model suitable for static and dynamic dispatch to correspond with the existing array and array builder types.
I modified the first aggregation kernel (sum) to use these types for outputs.