-
Notifications
You must be signed in to change notification settings - Fork 3.9k
ARROW-1896: [C++] Do not allocate memory inside CastKernel. Clean up template instantiation to not generate dead identity cast code #3642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting this stuff felt sooooo good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yesssss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was being maintained with templates before... that was overly elaborate. If this gets too complicated, we can easily code-generate it
|
@emkornfield @xhochy @fsaintjacques for your comments... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a first pass through. I need to look at where the aggregate stuff landed to see if output_type makes sense there.
On code generation I think its worth experimenting with it, but we should try to have some ground rules for when it makes sense (and a CMAKE rule to regenerate automatically so generated code doesn't get out of sync with checked in code. I think I prefer seeing the generated code so I do think it should be checked in, but I think I might prefer it living in its own directory if possible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do DCHECKS support a message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copy-pasta. I'll add a message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the memset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was in the original version, so I left it
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L1140
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably isn't necessary, but it removing it breaks unit tests, could we file a follow-up JIRA to figure out why and remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: I know this is the convention in the code base if we pass in parameters that are going to be copied anyways, we might consider passing by value and using std::move inside the method instead. This allows for passing one time use parameters with std::move as well, which can avoid a copy. The performance benefits (especially for shared_ptrs) aren't clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK with me. done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to indicate what script generated the file? (I know grep/directory structure would make it not too hard to find but still might be useful).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULLPTR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's only necessary in header files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice to add a test for this (one of the reasons I added gmock...
I didn't on the first pass because I figured I would add it as part of the TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have pushed harder to get ValueCounts done :(
cpp/src/arrow/array.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not your change but can we use the unknown null count constant here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
cpp/src/arrow/compute/kernel.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious why the preference for this form vs the original?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way the memory pool doesn't have to leak. I don't have a strong preference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither do I just curious about the change.
|
I'll take care of the small comments tomorrow. I'm sort of against writing a lot more new code in this patch since the goal here is to refactor and remove dead code while leaving existing behaviors unchanged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i left a bug in here that ASAN caught, we should also check that buffer_size > 0 (I can file a JIRA and submit a PR separately if you want me to handle it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Can you add a unit test as a follow up?
|
I can understand the need for the code generation but I still hope that someday a C++ ninja comes along and removes it again. With C++ template system and C preprocessor, I would not have expected that we need code generation. |
|
So TensorFlow has a similar situation for casts, and they have avoided code generation: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/cast_op_impl.h#L36 It's easier in TensorFlow, though:
I view code generation as a last resort, so I don't exact us to get into the habit of it |
…nside CastKernel. Use code generation to avoid instantiating CastFunctors for identity casts that are never used Change-Id: Ied8293612f46b953bab6ff1fdd737a43a65cba8f
Change-Id: I694a562066ff49863edf25eb81feba063390d0f6
Change-Id: I5180e072b4451ec575c6b9157679fa31dfba5b74
|
Would like to merge this once the build is passing, if there are any parting thoughts? |
Change-Id: Ia083b00f529b9fc172694417e0e42cbe9b00b480
Change-Id: I5ccbff285faf8e364d939d46269ff8ae1186509c
|
+1. Appveyor build https://ci.appveyor.com/project/wesm/arrow/builds/22384650 |
Also resolves ARROW-4110, which has been on my list for some time.
This ended up being a huge pain.
detail::PrimitiveAllocatingUnaryKernelcan now allocate memory for any kind of fixed width type.detail::PropagateNullsArrayData, since there are cases where it may be set tokUnknownNullCount(e.g. after a slice) and you need to know what it is. This isn't tested but I suggest addressing this in a follow up patchI also moved hand-maintained macro spaghetti for instantiating CastFunctors into a Python code-generation script. This might be the most controversial change in this patch, but the problem here is that we needed to exclude 1 macro case for each numeric type -- currently they were relying on
NUMERIC_CASES. This means the list of generated types is slightly different for each type, lending to poor code reuse. Rather than maintaining this code by hand, it is so much simpler to generate it with a script.Speaking of code generation, I think we should continue to invest in code generation scripts to make generating mundane C++ code for pre-compiled kernels simpler. I checked the file in but I'm not opposed to auto-generating the files as part of the CMake build -- we could do that in a follow up PR.