Open
Opened on Sep 27, 2024
Describe the issue
FP16 model inference is slower than FP32. Does FP16 inference require additional configuration, or is converting the model to FP16 enough?
To reproduce
1. Convert the ONNX model from FP32 to FP16 using onnxmltools.
2. Run inference with the ONNX Runtime C++ library, converting the input and output data from FP32 to FP16 and back (see the sketch below).
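For reference, a minimal sketch of step 2, assuming ONNX Runtime 1.16 or newer (where `Ort::Float16_t` is convertible to and from `float`). The model path, the tensor names `input`/`output`, and the 1x3x224x224 shape are placeholders, not the actual model's values:

```cpp
#include <onnxruntime_cxx_api.h>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "fp16-bench");
  Ort::SessionOptions opts;
  Ort::Session session(env, "model_fp16.onnx", opts);  // placeholder model path

  // Placeholder input shape; replace with the real model's input shape.
  std::vector<int64_t> shape{1, 3, 224, 224};
  size_t count = 1 * 3 * 224 * 224;

  // FP32 source data converted element-wise to FP16 before feeding the model.
  std::vector<float> src(count, 0.5f);
  std::vector<Ort::Float16_t> fp16_in(count);
  for (size_t i = 0; i < count; ++i) fp16_in[i] = Ort::Float16_t(src[i]);

  auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<Ort::Float16_t>(
      mem, fp16_in.data(), fp16_in.size(), shape.data(), shape.size());

  const char* in_names[]  = {"input"};   // placeholder tensor name
  const char* out_names[] = {"output"};  // placeholder tensor name
  auto outputs = session.Run(Ort::RunOptions{nullptr},
                             in_names, &input, 1, out_names, 1);

  // Convert the FP16 output back to FP32 for downstream use.
  const auto* fp16_out = outputs[0].GetTensorData<Ort::Float16_t>();
  size_t n = outputs[0].GetTensorTypeAndShapeInfo().GetElementCount();
  std::vector<float> result(n);
  for (size_t i = 0; i < n; ++i) result[i] = fp16_out[i].ToFloat();
  return 0;
}
```

The two conversion loops correspond to the FP32↔FP16 casts of the inputs and outputs mentioned in step 2.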
Urgency
No response
Platform
Android
OS Version
34
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
C++
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes