
[Performance] fp16 support and performance #22242

Open

Description

Describe the issue

FP16 model inference is slower than FP32. Does FP16 inference require additional configuration, or is converting the model to FP16 sufficient?

To reproduce

1. Convert the ONNX model from FP32 to FP16 using onnxmltools.
2. Run inference through the ONNX Runtime C++ library, converting the input and output data between FP32 and FP16 (see the sketch after these steps).
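
For reference, a minimal sketch of step 2 against the ONNX Runtime 1.18 C++ API. The model path (model_fp16.onnx), the tensor names (input, output), and the 1x3x224x224 shape are placeholders for the real model, and the float conversions on Ort::Float16_t are assumed to be available as in recent headers; the point is only that FP32 host data is converted to FP16 at the input boundary and back to FP32 at the output boundary.

```cpp
// Sketch: run an FP16 model on the default CPU EP, converting FP32 host data
// to FP16 for the inputs and FP16 outputs back to FP32.
#include <onnxruntime_cxx_api.h>

#include <array>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "fp16_repro");

  Ort::SessionOptions opts;
  opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

  // Placeholder model path; replace with the converted FP16 model.
  Ort::Session session(env, "model_fp16.onnx", opts);

  // Placeholder input shape.
  const std::array<int64_t, 4> shape{1, 3, 224, 224};
  const size_t count = 1 * 3 * 224 * 224;

  // FP32 data produced by the application.
  std::vector<float> input_fp32(count, 0.5f);

  // Convert to FP16 for the model's float16 input.
  // Ort::Float16_t offers float conversions in recent headers; on older
  // versions the half-precision bits have to be packed manually.
  std::vector<Ort::Float16_t> input_fp16(count);
  for (size_t i = 0; i < count; ++i) {
    input_fp16[i] = Ort::Float16_t(input_fp32[i]);
  }

  auto mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<Ort::Float16_t>(
      mem_info, input_fp16.data(), input_fp16.size(), shape.data(), shape.size());

  const char* input_names[] = {"input"};    // placeholder input name
  const char* output_names[] = {"output"};  // placeholder output name

  auto outputs = session.Run(Ort::RunOptions{nullptr},
                             input_names, &input, 1,
                             output_names, 1);

  // Convert the FP16 output back to FP32 for downstream use.
  const Ort::Float16_t* out_fp16 = outputs[0].GetTensorData<Ort::Float16_t>();
  size_t out_count = outputs[0].GetTensorTypeAndShapeInfo().GetElementCount();
  std::vector<float> out_fp32(out_count);
  for (size_t i = 0; i < out_count; ++i) {
    out_fp32[i] = out_fp16[i].ToFloat();
  }

  std::cout << "first output value: " << out_fp32[0] << std::endl;
  return 0;
}
```

Depending on how the conversion tool was configured, the graph inputs and outputs may instead remain FP32 (with Cast nodes inserted inside the graph), in which case the manual conversion loops above are not needed.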

Urgency

No response

Platform

Android

OS Version

34

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C++

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

Labels

performance (issues related to performance regressions)
platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template)
