Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Arrow Interop Benchmarks #17194
base: branch-24.12
Are you sure you want to change the base?
Added Arrow Interop Benchmarks #17194
Changes from 18 commits
974bfd6
62a1794
0568c38
6b3fe24
44825ea
dd437eb
81a2046
314d548
56102d5
0740ae6
0d5062c
7f75562
eda0ff8
a015526
1e36256
1807940
1b136dc
e476ff6
4e405ff
212d914
2a6edcd
dd64271
07a0916
f59f349
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: similar suggestion at other places too.
is this better than using
fill_n
with back_inserter?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't trust std::vector's constructors, they easily become ambiguous and do the wrong thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be used by other benchmarks too.
Please consider moving these to benchmark common header file or common utilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. which header would you suggest that be moved to? same header as type_id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a good idea. Many times, I had to implement this function for my debugging. Let's make a follow up PR, moving this into
types.hpp
(same header astype_id
), orcudf_test/debug_utilities.hpp
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a similar file here: https://github.com/rapidsai/cudf/blob/branch-24.12/cpp/benchmarks/io/nvbench_helpers.hpp
Worth considering a new file in https://github.com/rapidsai/cudf/tree/branch-24.12/cpp/benchmarks/common.
Declaring an NVBench-specific utility in a non-cuDF namespace within
types.hpp
is inappropriate, as it also forcestypes.hpp
to include an NVBench header, which is something we want to avoid.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with a new header in the benchmarks/common.
We could refactor some of the elements from
io/nvbench_helpers.hpp
with the new header as well.Though the refactor may be too much for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on this further? How does this feature specifically benefit unit tests?
Benchmarks and unit tests are treated as "external" users of libcudf, which is why they are not included within the
cudf::
namespace. Thecudf/utilities
directory is reserved for use cases within libcudf or those under thecudf::
namespace.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean it will benefit unit tests, but meant it will benefit developers in debugging/testing.
I usually debug cudf APIs by printing out the type id of the input column. For example:
However, most of the time, for fast iteration with small issues, I just printed the type id integer value, and look up the enum, trying to identify the enum type from value. For example:
(Looking at the enum class
type_id
, and count from the first to the 23rd line, I can know type 23 isSTRING
)As such, having such
type_to_string
utility is very helpful. This debug print statement can be inserted anywhere. Most of the time I insert it into the middle of a cudf API. Sometimes I use it in unit tests too. It would be most beneficial to add it into libcudf's utilities module. We can either use it for debugging libcudf APIs or for external usage in benchmark/unit tests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. It's a different feature request.
You are referring to the fact that
stringify_type
can be a common utility for both benchmarks and tests. A host-device print utility likeprint_type_id
together withstringify_type
can be added to https://github.com/rapidsai/cudf/blob/branch-24.12/cpp/include/cudf_test/print_utilities.cuh. Once it's done, this file can simply include that print utility header and add a single line for nvbench readability enhancement:NVBENCH_DECLARE_ENUM_TYPE_STRINGS(cudf::type_id, cudf::test::print::stringify_type, stringify_type)
I’m inclined to leave this refactoring for a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #17376 to track this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for filing the issue 👍