Closed
Labels
预测 (formerly named "Inference"; covers C-API inference issues, etc.)
Description
The usual scenario for float16 inference is as follows:
- We first train a model in float32 mode
- We convert the float32 weights into float16 and save them to disk
- During inference, we load the float16 weights and model, and run the inference engine in float16 mode
To support this, we need to make the save op capable of saving weights in float16.
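The conversion step above can be sketched outside the framework with plain NumPy. This is a minimal illustration, not the save op itself: the weight names and shapes are hypothetical, and the file format here is NumPy's `.npz` rather than the framework's own checkpoint format.

```python
import io
import numpy as np

# Hypothetical float32 weights as produced by training (names illustrative).
weights_fp32 = {
    "fc_w": np.random.rand(4, 8).astype(np.float32),
    "fc_b": np.random.rand(8).astype(np.float32),
}

# Convert each tensor to float16 before saving, halving storage size.
weights_fp16 = {name: w.astype(np.float16) for name, w in weights_fp32.items()}

# Save and reload (an in-memory buffer stands in for a file on disk).
buf = io.BytesIO()
np.savez(buf, **weights_fp16)
buf.seek(0)
loaded = np.load(buf)

# At inference time, the weights come back as float16 directly.
assert all(loaded[name].dtype == np.float16 for name in weights_fp16)
```

Note that float16 has a much narrower range (max ~65504) than float32, so a production conversion would also need to check for overflow in the trained weights.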