Skip to content

Conversation

@wesm
Copy link
Member

@wesm wesm commented Feb 11, 2019

This is the start of a Scalar object model suitable for static and dynamic dispatch to correspond with the existing array and array builder types.

I modified the first aggregation kernel (sum) to use these types for outputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(T)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not char[16]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anything this should probably be arrow::Decimal128

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's uint128 in abseil.io and in the future, int128.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is worth stressing over

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap the order of is_valid and type, you'll get better struct packing & alignment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, though if this ever mattered these classes are being used wrong ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we write small document in the Kernel header about this subject?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to write

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need a virtual destructor if we're going to be using shared_ptr<Scalar>s?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I just added some checks to see if shared_ptr<Buffer> in a subclass is properly destructed and it is. I'm happy to add a virtual dtor for good measure, though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot: shared_ptr's block manager is instantiated with the constructed type, so base_ref is pointing to a manager<BinaryScalar> which calls ~BinaryScalar. The memory won't leak with shared_ptr<Scalar>, but it will leak for unique_ptr<Scalar>. Probably best to add the virtual dtor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@wesm wesm force-pushed the ARROW-47 branch 2 times, most recently from 4fab706 to fd90264 Compare February 11, 2019 22:59
@wesm
Copy link
Member Author

wesm commented Feb 11, 2019

I added a note that the API is experimental in 0.13 so we can change things.

@xhochy could you take a glance at this and let me know if you're OK with it going in? I think this will be the subject of some iteration as we write more kernels that use these objects.

@wesm wesm changed the title ARROW-47: [C++] RFC arrow::Scalar object model ARROW-47: [C++] Preliminary arrow::Scalar object model Feb 12, 2019
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given scalars are supposed to be "small" it might be worth comment on why you use a shared_ptr type here (i.e. instead of the previous variant)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Scalars of nested types may be semantically small but physically occupy a fairly large amount of memory
  • The object model uses subclasses. So you need to use a pointer type of some kind here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like maybe one shouldn't exist? Otherwise just an int and a dictionary pointer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're also lacking BuilderType here for fundamentally the same reason. pair<int64_t, std::shared_ptr<Array>> is attractive but is it acceptable for when the index type is uint64_t?

An option is to use/wrap Scalar or std::shared_ptr<Scalar> here. Then BuilderType could be TypeErasedDictionaryBuilder or so:

std:: unique_ptr<ArrayBuilder> builder;
RETURN_NOT_OK(MakeBuilder(pool, dictionary(int32(), string_array), &builder));
for (std::shared_ptr<Scalar> s : user_inputs) {
  RETURN_NOT_OK(builder->Append(s));
  // Append null for strings not in string_array
  // and non-StringScalars
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a decision needs to be made right now. We need to write some kernels that involve dictionaries first

@xhochy
Copy link
Member

xhochy commented Feb 12, 2019

@wesm Looks good, the things that most annoys me is the GTEST_VENDORED option ;)

@wesm
Copy link
Member Author

wesm commented Feb 12, 2019

@xhochy I can remove that, I was having some trouble installing gtest 1.8.0 in my toolchain

wesm added 10 commits February 12, 2019 11:13
Change-Id: Ie7b17cef495b5a6b047e9205cd0408c939774e88
Change-Id: I87f690898c7af7b028f2e50247e96703ede4f457
Change-Id: Ieaa467c3058a44c23f1e3f36176da1592814ffb5
…stamp types

Change-Id: I00ca69a9c1ec1ddaa5b1fd9db35e9db09a12eb8b
Change-Id: I98ae62598dbc35c609dfb92f212465bc0d67f58a
@wesm
Copy link
Member Author

wesm commented Feb 13, 2019

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants