This repository was archived by the owner on Apr 28, 2023. It is now read-only.

Types support and min_distance.py function example #512

Merged
merged 2 commits on Jun 13, 2018

Conversation

nicolasvasilache
Contributor

@nicolasvasilache nicolasvasilache commented Jun 12, 2018

Addresses #511; with naive mapping options, the performance of the 5-kernel variant seems reasonable for a start.
A lot more work is needed to bring the single-kernel and 1-stage implementations to good performance, so the 2-stage version will probably be a nice low-hanging fruit.

return Float(16, 4);
case lang::TK_VECTOR4_FLOAT32:
return Float(32, 4);
case lang::TK_VECTOR4_FLOAT64:
Contributor

Are these used in this PR? Explicit vector types in the front-end seems like a bad idea to me. I thought that Halide would reject them anyway at Func definition time - it views vectorization as a property of a loop, and thus target-dependent and implicit, not as a property of the data type.

ftynse
ftynse previously requested changes Jun 12, 2018
Contributor

@ftynse ftynse left a comment


How about moving your vector types into a separate WIP PR. They are not supported anywhere downstream (promotion, reductions) and will also fail on any operation other than addition given your library code.

@nicolasvasilache
Contributor Author

cc @Ravenwater

@nicolasvasilache
Contributor Author

Sounds good, I'll put them in a WIP PR.

@abadams they are currently used in test/cuda/test_compile_and_run.cc; I didn't have time to implement checks before leaving last night, so it may well be incorrect code.

Re. types: one of the goals of TC is to make it easy to implement and experiment with various types (quantized, posits, vector, sparse, etc.). Would it be feasible for the pieces of Halide we currently use to treat ADTs as first-class citizens? If not, at least let us propagate more exotic types to the TC mapper.
Then we can easily implement low-level operations as libraries in the generated code (like the strawman I did for the vector types in library.h).

@nicolasvasilache nicolasvasilache force-pushed the pr/types branch 2 times, most recently from e9686e2 to 7fae544 Compare June 13, 2018 14:49
@abadams
Contributor

abadams commented Jun 13, 2018

In Halide, mathematical vectors, complex numbers, etc. are represented as Tuples. The vector types are for SIMD, which is just a way of executing a loop, not a data type. For example, SIMD on the next version of ARM doesn't necessarily have a statically known width, so that's not a useful data type. If we want vector front-end types for the purpose of representing vectors, e.g. in R^3, we should lower them to Halide Tuples.

typedef unsigned short uint16;
typedef unsigned int uint32;
typedef unsigned long uint64;
// typedef half float16;
Contributor

This essentially means that compilation will fail, with a message from NVRTC that is obscure to the user, when attempting to use float16. I'd rather ditch it, or add a TC compilation error saying "float16 is not supported because NVRTC does not accept it".

_(TK_INT16, "int16", "int16") \
_(TK_INT32, "int32", "int32") \
_(TK_INT64, "int64", "int64") \
_(TK_FLOAT16, "float16", "float16") \
Contributor

Do we want to have both float32 and float? Do we want to differentiate?

Contributor Author

Given the semantics of the other types, it is consistent.
Removing float will break existing code.
I don't see a big issue in the duplication here.

tc/lang/sema.h Outdated
case tok: \
code_ = c; \
bits_ = b; \
#define TYPE_INFO_OPTION(tok, c, b, l) \
Contributor

What is l used for?

This commit adds missing type support to the lexer, semantic analyzer,
and cuda implementation.
This also adds language and end-to-end functional tests for all types we support.

An annoying type issue comes from the inability to include system dependencies
with NVRTC, so half support is explicitly disabled.
Maybe it is time to think about moving to NVCC in a separate process, which
we have been discussing for some time.

The following commit adds examples, submitted as an issue by @mdouze, where the
argmin over a reduced sum is required; they now run properly thanks to this commit.
Extra work is still needed to make some of the variants perform reasonably:
1. for the fused kernel to parallelize properly across blocks we need
grid synchronization; this may be a nice concrete use case, @math-fehr
2. for the 1-stage fissioned implementation we need device-wide synchronization,
otherwise we will always be limited by running on a single SM
3. the 2-stage fissioned implementations can give us good performance today,
after tuning.

Without tuning, the results on the larger size (1e7, 32, 16)
are shown [here](https://gist.github.com/nicolasvasilache/8a0addfb6831a831b2dca45c612f9c2d).
`mindis_16_32_10000000` is the totally fused kernel and performs very poorly.
The following 5 kernels correspond to the final use case of interest.
@nicolasvasilache nicolasvasilache merged commit 7e2df7c into facebookresearch:master Jun 13, 2018
@nicolasvasilache nicolasvasilache deleted the pr/types branch June 13, 2018 19:37