proposal: spec: float16 #67127
Comments
What would storage-only mean in terms of the language?
It would not mean much in the math package: no arithmetic operations, only conversions back and forth so values can be stored on disk or transferred over the network. We should probably do a more complete implementation eventually if possible, though that would not be the top priority.
@TailsFanLOL you want to reopen #32022 and create two new builtin primitive types?

```go
// float16 is the set of all IEEE 754 half-precision numbers.
type float16 float16

// complex32 is made of two [float16] for the real and imaginary parts.
type complex32 complex32
```

But only have conversions?
Well, that pretty much sums it up. Again, arithmetic would be nice, but I would like a basic conversion-only implementation first to get things done.

EDIT: there is also bfloat16, which has a different format and is intended for neural-network computation; it is also supported on some Xeons, but I guess we won't really need that yet.
Which? #32022 (comment) gives the specification of `float16`.

Edit: just saw the edit in #67127 (comment).
So here is a more complete picture AFAICT. Add two new builtin types:

```go
// float16 is the set of all IEEE 754 half-precision numbers.
type float16 float16

// complex32 is made of two [float16] for the real and imaginary parts respectively.
type complex32 complex32
```

These types would not support any operation except converting them back and forth to `float32`/`float64`. Add new functions to the `math` package:

```go
func Float16bits(f float16) uint16
func Float16frombits(b uint16) float16
```

This allows using them for compact data interchange as intended by IEEE 754.
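For a sense of what the decode direction involves, here is a rough userland sketch using `uint16` as a stand-in for the proposed builtin; the type and method names are hypothetical, not part of the proposal or the standard library. Widening binary16 to binary32 is exact, since every half-precision value is representable as a float32:

```go
package main

import (
	"fmt"
	"math"
)

// float16 stands in for the proposed builtin: just the raw
// IEEE 754 binary16 bit pattern. (Hypothetical illustration.)
type float16 uint16

// toFloat32 widens a binary16 value to float32. The conversion is
// exact: the sign copies over, the exponent is rebiased from 15 to
// 127, and the 10-bit mantissa is left-aligned into the 23-bit field.
func (h float16) toFloat32() float32 {
	sign := uint32(h>>15) << 31
	exp := uint32(h>>10) & 0x1F
	mant := uint32(h) & 0x3FF
	var bits uint32
	switch {
	case exp == 0 && mant == 0: // signed zero
		bits = sign
	case exp == 0: // subnormal: renormalize into binary32's range
		e := uint32(0)
		for mant&0x400 == 0 {
			mant <<= 1
			e++
		}
		bits = sign | (113-e)<<23 | (mant&0x3FF)<<13
	case exp == 31: // Inf / NaN
		bits = sign | 0x7F800000 | mant<<13
	default: // normal number
		bits = sign | (exp+112)<<23 | mant<<13
	}
	return math.Float32frombits(bits)
}

func main() {
	for _, h := range []float16{0x3C00, 0xC000, 0x7BFF, 0x0001} {
		fmt.Printf("%#06x -> %g\n", uint16(h), h.toFloat32())
	}
	// 0x3c00 -> 1, 0xc000 -> -2, 0x7bff -> 65504 (max finite half),
	// 0x0001 -> 5.9604645e-08 (smallest subnormal)
}
```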
Why not just bits/frombits functions which go directly from `uint16` to `float32` and back?
A liberal interpretation of the text quoted above would allow us to implement that. I don't yet see value in adding `complex32`.
EDIT0: nvm, just ignore everything I said; I was sleepy and missed the entire point of the message above.
About the "why should we allow doing math" argument: the thing is, most programmers may assume that if float32 is faster than float64, then float16 must be even faster, which does not hold on all hardware. Though by the time everyone gets the update, fp16 arithmetic might be widespread anyway, so we could just make it a compiler warning that can be disabled.
As this proposal is for a storage-only format, this can be implemented as a separate package. That package can provide conversions between the new 16-bit floating-point type and `float32`/`float64`.

In the language, I don't think we could get away with having a storage-only type. If we have a `float16` type, people will expect to be able to do arithmetic with it.
Our use case is handling float16 tensor outputs from an NPU on the RK3588 processor. We simply convert the output buffer from cgo to uint16, then use the https://github.com/x448/float16 package to convert to float32 for handling within Go. We did attempt to perform the conversion via cgo using the ARM Compute Library, which has NEON SIMD instructions to accelerate the conversion, but this was slower than sticking with the pure Go library above. However, we do achieve a 35% performance increase (on the RK3588) by precalculating a uint16 -> float32 lookup table to convert the buffer. On a Threadripper workstation this method gives us a 69% increase. Further details on what we are doing are in x448/float16#47 (comment).
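As a concrete illustration of that lookup-table trick (a minimal sketch, not our production code; it uses the x448/float16 package mentioned above to fill the table), every one of the 65536 possible bit patterns is decoded once up front, so converting the buffer becomes one indexed load per element:

```go
package main

import (
	"fmt"

	"github.com/x448/float16"
)

// buildTable precomputes the float32 value for every possible 16-bit
// pattern: 1<<16 entries, i.e. 256 KiB, small enough to stay cache-warm
// while streaming a large tensor buffer.
func buildTable() *[1 << 16]float32 {
	var t [1 << 16]float32
	for i := range t {
		t[i] = float16.Frombits(uint16(i)).Float32()
	}
	return &t
}

// convert decodes a float16 tensor buffer via the table.
func convert(table *[1 << 16]float32, src []uint16, dst []float32) {
	for i, b := range src {
		dst[i] = table[b]
	}
}

func main() {
	table := buildTable()
	src := []uint16{0x3C00, 0xC000, 0x7BFF} // 1.0, -2.0, 65504
	dst := make([]float32, len(src))
	convert(table, src, dst)
	fmt.Println(dst) // [1 -2 65504]
}
```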
Based on the above discussion, this is a likely decline. Leaving open for four weeks for final comments.
I tried a similar method to the one above on Haswell and RK3399, and the cgo implementation is faster than the lookup table. Will post the code soon; I am not home right now.
Sorry for the wait, I had forgotten about this. I lost the original numbers and the program, so I made a quick-and-dirty replacement.

I don't know how I got cgo to be quicker; it's probably the data conversion between the two. I will try other platforms tomorrow.
If one divides the time it took for the table, it comes to 1.2871563720703125 µs per the 8 values, and that is faster; perhaps the measurement is inaccurate. This needs a better benchmark.
Thanks for providing your code. I have taken it and applied it to our use case of converting the tensor outputs from float16 to float32, and benchmarked it against the x448/float16 Go code and the lookup-table version. That code is here. Benchmark data as follows:
@TailsFanLOL An update, as I realised I made a bad comparison in the above code: each chunk of our f16 buffer involved a cgo call in a loop, instead of converting the entire buffer in C with a single cgo call. Updated benchmark code here. Benchmark data as follows:
Results running on the ARM RK3588 processor show a 3x performance improvement over the lookup-table version.
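For reference, the single-call pattern looks roughly like this (a hypothetical sketch, not the actual benchmark code; it assumes an ARM toolchain such as GCC or Clang on the RK3588, where the `__fp16` storage-only type is available and the compiler emits the hardware widening conversion):

```go
// Package f16 sketches converting a whole half-precision buffer with
// a single cgo call, instead of one call per chunk.
package f16

/*
// On AArch64, __fp16 is a storage-only half type; casting it to float
// compiles down to the fcvt widening instruction.
static void half_to_float(const unsigned short *src, float *dst, int n) {
    const __fp16 *h = (const __fp16 *)src;
    for (int i = 0; i < n; i++) dst[i] = (float)h[i];
}
*/
import "C"

import "unsafe"

// ConvertAll crosses the cgo boundary once per buffer, not once per
// chunk, so the fixed cgo call overhead is amortized over all elements.
func ConvertAll(src []uint16, dst []float32) {
	if len(src) == 0 || len(dst) < len(src) {
		return
	}
	C.half_to_float(
		(*C.ushort)(unsafe.Pointer(&src[0])),
		(*C.float)(unsafe.Pointer(&dst[0])),
		C.int(len(src)),
	)
}
```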
No change in consensus.
Originally posted by @ianlancetaylor in #32022 (comment)
Relevant x86 instructions for using float16 as a storage-only format (the F16C conversion extension) were added to AMD and Intel CPUs around 2013. As for arithmetic and complex32 support, it was added in the Sapphire Rapids Xeon series (AVX512-FP16); it was also briefly present on Alder Lake by accident, where it could be enabled via certain BIOSes but was removed in later revisions (which probably means it is going to be in upcoming Core CPUs).

As for other architectures, it has been added in certain ARM, RISC-V, and Power ISA CPUs.

Can we please get at least a limited storage-only implementation of float16? It is pretty useful for games and AI. Certain third-party packages already provide it, and it is used in popular formats like CBOR; builtin support could speed those libraries up.