It would be interesting to see what we need to do to be able to do mixed precision training with Flux (https://forums.fast.ai/t/mixed-precision-training/20720, http://on-demand.gputechconf.com/gtc/2018/video/S81012/). With the way new hardware is moving, it seems likely that this will be important to support.
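To make the goal concrete, here is a rough sketch of what a mixed-precision loop could look like: Float32 "master" weights, a Float16 copy for the forward/backward pass, and loss scaling to keep small gradients from underflowing. This assumes Zygote-style implicit gradients and uses `Flux.fmap` for the cast; the `step!` helper, the loss-scale value, and the manual SGD update are just illustrative, not an existing Flux API.

```julia
using Flux

# Float32 "master" copy of the weights (hypothetical example model).
master = Chain(Dense(10, 10, relu), Dense(10, 1))

loss_scale = 1024f0   # static loss scale, keeps Float16 gradients from underflowing
η = 0.01f0            # learning rate

function step!(master, x, y)
    # Cast weights and data down to Float16 for the forward/backward pass.
    model16 = Flux.fmap(a -> a isa AbstractArray{Float32} ? Float16.(a) : a, master)
    x16, y16 = Float16.(x), Float16.(y)

    ps16 = Flux.params(model16)
    gs = gradient(ps16) do
        loss_scale * sum(abs2, model16(x16) .- y16)
    end

    # Unscale the gradients and apply them to the Float32 master weights.
    for (p32, p16) in zip(Flux.params(master), ps16)
        g = gs[p16]
        g === nothing && continue
        p32 .-= η .* Float32.(g) ./ loss_scale
    end
    return master
end

step!(master, rand(Float32, 10, 32), rand(Float32, 1, 32))
```

Whether the cast, the loss scaling, and the master-weight update should live in user code or be handled by Flux itself is exactly the design question here.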
Right now, I think the major obstacle to overcome is the implementation of Float16 in Julia. AFAIU, Float16 values get passed to LLVM as Int16, and arithmetic is implemented by widening to Float32 and then truncating back.
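For illustration (simplified, not the exact Base definitions), the scalar behaviour looks something like this:

```julia
a, b = Float16(1.5), Float16(2.25)

# Storage/ABI side: a Float16 is just 16 bits, passed around like an Int16.
reinterpret(UInt16, a)              # 0x3e00

# Arithmetic side (conceptually): widen to Float32, operate, truncate back.
Float16(Float32(a) + Float32(b))    # what a + b effectively computes
```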
I think Valentin worked a bit on this (JuliaLang/julia#26381), so perhaps he (you) could give some input. cc @vchuravy, also cc @Keno.