Add calibration based INT8 quantization to TensorRT EP #5842
Conversation
static const std::string kDumpSubgraphs = "ORT_TENSORRT_DUMP_SUBGRAPHS";
static const std::string kEngineCacheEnable = "ORT_TENSORRT_ENGINE_CACHE_ENABLE";
static const std::string kEngineCachePath = "ORT_TENSORRT_ENGINE_CACHE_PATH";
static const std::string kCachePath = "ORT_TENSORRT_CACHE_PATH";
This is a compatibility-breaking change.
Users who are already using the previous name may not know it has changed.
I think it would be safer to add a check that throws an error if the previous name is found set in the environment variables,
or we don't change the name at all.
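A minimal sketch of the guard suggested above, assuming the rename is from ORT_TENSORRT_ENGINE_CACHE_PATH to ORT_TENSORRT_CACHE_PATH as the diff implies; the helper name, error text, and call site are illustrative, not part of this PR:

```cpp
// Sketch of the suggested guard: fail loudly if the deprecated variable is
// still set, instead of silently ignoring it after the rename.
#include <cstdlib>
#include <stdexcept>
#include <string>

static const std::string kEngineCachePathOld = "ORT_TENSORRT_ENGINE_CACHE_PATH";
static const std::string kCachePath = "ORT_TENSORRT_CACHE_PATH";

// Hypothetical helper; where the TRT EP would actually call this is assumed.
void ThrowIfDeprecatedCachePathIsSet() {
  if (std::getenv(kEngineCachePathOld.c_str()) != nullptr) {
    throw std::runtime_error(kEngineCachePathOld + " has been renamed to " +
                             kCachePath + "; please set the new variable.");
  }
}
```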
Can you handle this in a subsequent PR?
The native TRT calibration table only works for models that can run entirely on native TensorRT.
The ORT-generated calibration table also works for models of which only some subgraphs can run on TRT; those subgraphs will run in INT8 precision in the TRT EP where possible.
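A minimal sketch of how the choice between the two table types might be expressed, assuming the TRT EP's documented ORT_TENSORRT_INT8_ENABLE and ORT_TENSORRT_INT8_USE_NATIVE_CALIBRATION_TABLE environment variables; the parsing helper and control flow are simplified stand-ins, not this PR's actual implementation:

```cpp
// Illustrative selection between the two calibration-table modes described
// above. Only the environment-variable names follow the TRT EP's naming
// convention; everything else is a simplified stand-in.
#include <cstdlib>
#include <cstring>
#include <iostream>

static bool EnvVarIsSetToTrue(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr && std::strcmp(value, "1") == 0;
}

int main() {
  const bool int8_enabled = EnvVarIsSetToTrue("ORT_TENSORRT_INT8_ENABLE");
  const bool use_native_table =
      EnvVarIsSetToTrue("ORT_TENSORRT_INT8_USE_NATIVE_CALIBRATION_TABLE");

  if (!int8_enabled) {
    std::cout << "INT8 disabled; no calibration table is consulted.\n";
  } else if (use_native_table) {
    // Only valid when the entire model runs on native TensorRT.
    std::cout << "Using the native TRT calibration table.\n";
  } else {
    // Also covers models where only subgraphs run on TRT; those subgraphs
    // run in INT8 precision where possible.
    std::cout << "Using the ORT-generated calibration table.\n";
  }
  return 0;
}
```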