Sync ORTModule branch with master and fix tests #6526

Merged
Changes from 1 commit
202 commits
64709b1
Deprecate Python global configuration functions [Part 1] (#5923)
edgchen1 Dec 15, 2020
297c824
remove dnnl_dll_path from post build copy (#6142)
jywu-msft Dec 15, 2020
980a93c
Model Fusion For Bart (#6105)
liuziyue Dec 15, 2020
ac62cf8
Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6…
RyanUnderhill Dec 16, 2020
939cc9b
Enable running the mnist_training sample without cuda (#6085)
georgen117 Dec 16, 2020
b648bf6
nnapi add min max support (#6117)
guoyu-wang Dec 16, 2020
0978d2b
Fix CUDA test hang: (#6138)
toothache Dec 16, 2020
aa49e47
Fix TensorRT kernel conflict issue for subgraphs of control flow oper…
stevenlix Dec 16, 2020
8fd0858
Add gradient registration for Abs. (#6139)
Dec 16, 2020
8269048
Partition initial optimizer state for Zero-1 (#6093)
ashbhandare Dec 16, 2020
7250562
Fix edge case in BFCArena where allocation failures could lead to an …
skottmckay Dec 16, 2020
344a2a8
Revert "work around of the build break in mac (#6069)" (#6150)
snnn Dec 16, 2020
0fa04bd
Fix clean_docker_image_cache.py detection of image pushes. (#6151)
edgchen1 Dec 17, 2020
503b61d
MLAS: add NEON version of int8 depthwise convolution (#6152)
tracysh Dec 17, 2020
36c03b3
Using a map of of ops to stages as input of partition function. (#5940)
Dec 17, 2020
efa1b0d
Minor fix to satisfy c++14 (#6162)
pranavsharma Dec 17, 2020
32c67c2
Deprecating Horovod and refactored Adasum computations (#5468)
Dec 18, 2020
dec703b
Update TensorRT-ExecutionProvider.md (#6161)
jayrodge Dec 18, 2020
34725ae
Bugfix for topk cuda kernel (#6164)
duli2012 Dec 18, 2020
98d8a3e
Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)…
yufenglee Dec 18, 2020
c339bb2
Remove ignored build warnings for pybind on Mac (#6165)
guoyu-wang Dec 18, 2020
adc2071
save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136)
baijumeswani Dec 18, 2020
824ef9a
Don't try to bind unused inputs in the Training frontend (#6166)
Dec 18, 2020
86493e6
Update documentation for contributing a PR and add deprecation notice…
pranavsharma Dec 18, 2020
39aedbc
aggregate model states only for the case when mixed precision was tru…
baijumeswani Dec 18, 2020
bbb52e9
[NNAPI EP] Enable per-channel quantization for QlinearConv (#6155)
guoyu-wang Dec 19, 2020
11b0a54
Fix typo in BERT pretraining script (#6175)
Dec 19, 2020
cd3a5ac
Update get_docker_image.py to enable use without image cache containe…
edgchen1 Dec 19, 2020
2da8060
Helper for compiling EP to generate deterministic unique ids for use …
skottmckay Dec 21, 2020
f874260
Backend APIs for checkpointing (#5803)
jingyanwangms Dec 21, 2020
201d0db
Android coverage dashboard (#6163)
satyajandhyala Dec 21, 2020
ea9cfa5
Add usage details of unified MCR container image (#6182)
smkarlap Dec 21, 2020
53307a5
improve perf for softmax (#6128)
weixingzhang Dec 21, 2020
67ac6ae
Tune fast Gelu to use exp(x) instead of tanh(x) on Rocm platform (#6174)
Dec 22, 2020
234e94b
Add Status.csv to EP Perf Tool (#6167)
oliviajain Dec 22, 2020
945fae8
Lochi/quantization tool for trt (#6103)
chilo-ms Dec 22, 2020
fc27074
Implement ScatterND for CUDA EP (#6184)
hariharans29 Dec 22, 2020
04b3e0e
Condition fix in Resize operator (#6193)
hariharans29 Dec 22, 2020
a8b4826
Clean up checkpoint tests to use the new checkpoint functions (#6188)
baijumeswani Dec 22, 2020
21395f8
Implement comparing outputs that are sequence of maps of strings to f…
Dec 22, 2020
c562952
Dockerfile to build onnxruntime with ROCm 4.0
jessebenson Dec 21, 2020
0494a0f
Add ability to skip GPU tests based on GPU adapter name (#6198)
Dec 22, 2020
7347996
Openvino ep 2021.2 (#6196)
sfatimar Dec 23, 2020
1fc7f92
Fix a memory leak in test_inference.cc (#6201)
snnn Dec 25, 2020
52228a7
Use TArray in AMD element-wise kernels, rather than manually copying …
jessebenson Dec 22, 2020
7ccdfed
Remove most ROCm-specific element-wise code and reuse CUDA element-wi…
jessebenson Dec 22, 2020
8a0f5c5
Minor change to improve performance for operator Pad. (#5537)
xadupre Dec 28, 2020
2d09db6
Support double for operators Log, Reciprocal, Sum (CPU) (#6032)
xadupre Dec 28, 2020
111ac29
Support double for operators Where, LpNormalisation (#6034)
xadupre Dec 28, 2020
df7e2f3
Support double for operators Relu, Tanh, Sigmoid (#6221)
xadupre Dec 29, 2020
bbb6b41
Fix ImportError in build.py (#6231)
mgoin Dec 30, 2020
5c584b2
Removed executor todo that looks dead. (#6234)
michaelgiba Dec 31, 2020
1b23b28
Remove MKLML/openblas/jemalloc build config (#6212)
snnn Dec 31, 2020
3911105
Remove python 3.5
snnn Dec 31, 2020
c15a858
Update the readme file
snnn Dec 31, 2020
39a988c
Upgrade build.py to assert for python 3.6+
WilliamTambellini Dec 1, 2020
4cc2ffe
Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233)
hariharans29 Dec 31, 2020
ecb2e11
MLAS: handle MlasGemm(M/N/K==0) cases (#6238)
tracysh Dec 31, 2020
70e2f96
Support double for operator TopK + fix one bug in TopK implementation…
xadupre Dec 31, 2020
5968a91
Support double for operator Gemm + fix bug in gemm implementation for…
xadupre Dec 31, 2020
84addcd
Support double for operator ReduceMean, ReduceLogSumExp (#6217)
xadupre Dec 31, 2020
cd14c1a
Support double for operator ArgMin (#6222)
xadupre Dec 31, 2020
d5cb17c
Update BUILD.md
snnn Dec 31, 2020
1685167
Update manylinux docker image to the latest (#6242)
snnn Jan 1, 2021
ffb4b62
Fix allocator issue for TensorRT IOBinding (#6240)
HectorSVC Jan 1, 2021
46e0e4e
Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) o…
Jan 2, 2021
c8de3f3
Refactor EP Perf Tool (#6202)
oliviajain Jan 4, 2021
93bf7c4
Documentation for distributed CI tests pipeline (#6140)
baijumeswani Jan 4, 2021
6fd9d34
Remove a debug log in provider_test_utils.cc (#6200)
snnn Jan 4, 2021
493bf93
Add the Concat Slice Elimination transform, fix constant_folding tran…
ashbhandare Jan 5, 2021
ce6161c
Add MakeStringLite which uses current locale, update some MakeString …
edgchen1 Jan 5, 2021
addb4b8
Liqun/speech model loop to scan (#6070)
liqunfu Jan 5, 2021
eea3806
model parallel refinement (#6244)
pengwa Jan 6, 2021
d42399e
Allow querying a GraphProto's doc_string as part of ModelMetadata (#6…
hariharans29 Jan 6, 2021
2347de4
Fix Linux/Mac error message on input type mismatch (#6256)
hariharans29 Jan 6, 2021
431604e
add bfloat16 to gathergrad type constrains (#6267)
souptc Jan 6, 2021
bbc9ed9
Fix VS 2017 build break (#6276)
hariharans29 Jan 7, 2021
d761571
Deprecate Python global configuration functions [Part 2] (#6171)
edgchen1 Jan 7, 2021
481a2cd
Add script to preprocess python documentation before publishing (#6129)
xadupre Jan 7, 2021
b80e8ce
rename past to past_key_values for GPT-2 (#6269)
tianleiwu Jan 7, 2021
c109486
Rename MakeString and ParseString functions. (#6272)
edgchen1 Jan 7, 2021
04287ec
Increase timeout for Linux GPU CUDA11 build. (#6280)
edgchen1 Jan 7, 2021
a72fcbd
Add helper to compare model with different precision (#6270)
wangyems Jan 8, 2021
7fc827a
Fix Min/Max CPU kernels for float16 type (#6205)
hariharans29 Jan 8, 2021
ac5ca2b
fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284)
tianleiwu Jan 8, 2021
da952a9
A list of changes in transformers tool (#6224)
wangyems Jan 8, 2021
1059bfa
Workaround for static_cast<double>(half)
jessebenson Jan 8, 2021
fa851bf
Add workaround to remove ROCm-specific binary-elementwise files.
jessebenson Jan 8, 2021
5084ce0
Update nuget build (#6297)
snnn Jan 11, 2021
84024bd
Enable ONNX backend test of SequenceProto input/output (#6043)
jcwchen Jan 11, 2021
938e65d
add --sequence_lengths option (#6285)
tianleiwu Jan 11, 2021
ac5b5e5
more dtype for Equal CUDA kernel (#6288)
centwang Jan 12, 2021
c43ca45
Force reinstall onnx python package on Windows (#6309)
snnn Jan 12, 2021
a038924
update transformers required package versions (#6315)
tianleiwu Jan 12, 2021
3b3e698
Remove abs in LpPool (#6303)
luyaor Jan 12, 2021
a825766
Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295)
zhanghuanrong Jan 12, 2021
ec81e29
Add longformer to python package (#6314)
tianleiwu Jan 12, 2021
b491d7c
Avoid false sharing on thread pool data structures (#6298)
tlh20 Jan 12, 2021
0ed56d4
fix opset imports for function body (#6287)
askhade Jan 12, 2021
aacc8db
Remove false positive prefast warning from threadpool (#6324)
tlh20 Jan 12, 2021
6b73bae
Java: add Semmle to Java publishing pipelines (#6326)
yuslepukhin Jan 12, 2021
f77ff1b
Quantization support for split operator with its NHWC support (#6107)
zhanghuanrong Jan 13, 2021
aeca96c
Liqun/enable pipeline parallel test (#6331)
liqunfu Jan 13, 2021
5623cc6
Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider…
alberto-magni Jan 13, 2021
87ec1f6
MLAS: add fallback implementation for quantized GEMM (#6335)
tracysh Jan 13, 2021
56ab216
Delete float16.py (#6336)
oliviajain Jan 13, 2021
62e4045
Enable add + softmax fusion for Rocm platform (#6259)
Jan 13, 2021
f7034b9
add external data support to tensor proto utils (#6257)
askhade Jan 13, 2021
d367941
changed wording. (#6337)
Jan 13, 2021
cfd6f10
Remove OpSchema dummy definition. Only needed for Function now, and w…
skottmckay Jan 13, 2021
fcd9fc9
remove gemmlowp submodule (#6341)
tracysh Jan 13, 2021
b220fee
[NNAPI] Add pow support (#6310)
guoyu-wang Jan 14, 2021
042053c
Add support for running Android emulator from build.py on Windows. (#…
edgchen1 Jan 14, 2021
e35db19
fix the pipeline failure (#6346)
guoyu-wang Jan 14, 2021
4df356d
Train BERT Using BFloat16 on A100 (#6090)
centwang Jan 14, 2021
5b9d993
Fix DerefNullPtr issues raised by SDLNativeRules. (#6348)
pranavsharma Jan 14, 2021
c24f295
update quantize to support basic optimization and e2e example for ima…
yufenglee Jan 14, 2021
fd21c84
Enable graph save for orttrainer (#6333)
ashbhandare Jan 14, 2021
ea6789b
Add PREfast to python packaging pipeline (#6343)
snnn Jan 14, 2021
5d9552c
fix longformer benchmark io_binding output_buffers (#6345)
wangyems Jan 14, 2021
e54e2f9
Use readelf for minimal build binary size checks. (#6338)
skottmckay Jan 14, 2021
6d0fb3e
Java: Set C language warnings to W4 and adjust JNI code (#6347)
yuslepukhin Jan 14, 2021
8ce252c
Pipeline Parallel Experimental Python API (#5815)
wschin Jan 15, 2021
961bb62
Add create session to WinML telemetry to track WinML Usage (#6356)
Jan 15, 2021
c8e37e3
Fix one more SDL warning (#6359)
pranavsharma Jan 15, 2021
f5a4f7f
fix -Wdangling-gsl (#6357)
askhade Jan 15, 2021
eab164e
Add python example of TensorRT INT8 inference on ResNet model (#6255)
stevenlix Jan 15, 2021
4db4982
This added telemetry isn't needed (#6363)
Jan 16, 2021
5b6753c
Wezuo/memory analysis (#5658)
wezuo Jan 19, 2021
baac7c9
Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355)
tianleiwu Jan 19, 2021
ac36596
fix convert_common version retrival (#6382)
wangyems Jan 19, 2021
d7bdd96
Refine auto_pad based pad computation in ConvTranspose (#6305)
hariharans29 Jan 20, 2021
a1b5bfc
Fix SDL warning (#6390)
hariharans29 Jan 20, 2021
453431f
Add max_norm for gradient clipping. (#6289)
pengwa Jan 20, 2021
69af044
Add the custom op project information (#6334)
wenbingl Jan 20, 2021
33f60a0
Dont use default string marshalling in C# (#6219)
hariharans29 Jan 21, 2021
d9e4795
Fix Windows x86 compiler warnings in the optimizers project (#6377)
hariharans29 Jan 21, 2021
8574854
[Perf] Optimize Tile CPU and CUDA kernels for a corner case (#6376)
hariharans29 Jan 21, 2021
eb946c4
Unblock Android CI code coverage failure (#6393)
guoyu-wang Jan 21, 2021
99a38f4
fix build on cuda11 (#6394)
centwang Jan 21, 2021
98cc7b5
Load the model path correctly (#6369)
MartinMoon Jan 21, 2021
bba185a
Fix some compile warnings (#6316)
snnn Jan 22, 2021
4442d94
OpenVino docker file changes to bypass privileged mode
smkarlap Jan 22, 2021
60c772e
Megatron checkpointing (#6293)
ashbhandare Jan 22, 2021
61ecf52
Fix generate_submodule_cgmanifest.py Windows issues. (#6404)
edgchen1 Jan 22, 2021
3c3d363
Continue memory planning when unknown shape tensor is encountered. (#…
codemzs Jan 22, 2021
6507b4f
Reintroduce experimental api changes and fix remote build break (#6385)
Jan 22, 2021
e1dc268
Add support for custom ops to minimal build. (#6228)
skottmckay Jan 25, 2021
c20965f
enable pipeline to run quantization tests (#6416)
yufenglee Jan 25, 2021
24f1bd6
Minor cmake change (#6431)
hariharans29 Jan 25, 2021
6ed1240
Liqun/liqun/enable pipeline parallel test2 (#6399)
liqunfu Jan 25, 2021
f3a0344
Farewell TrainableDropout (#5793)
codemzs Jan 26, 2021
7e42840
fix null dereference warning (#6437)
yufenglee Jan 26, 2021
76dbd88
Expose graph ModelPath to TensorRT shared library (#6353)
stevenlix Jan 26, 2021
afd7b8b
add tool for generating test data for longformer (#6415)
tianleiwu Jan 27, 2021
0d20104
only build experimental api in redist (#6465)
smk2007 Jan 27, 2021
9835b46
Add an option to save the training graph after optimization (#6410)
ryotatomioka Jan 27, 2021
b5d1a49
Share allocator between CUDA EP & TRT EP. (#6332)
HectorSVC Jan 27, 2021
fd43806
fix max norm clipping test in python packaging pipeline test (#6468)
pengwa Jan 27, 2021
c05adb1
Initial version of CoreML EP (#6392)
guoyu-wang Jan 27, 2021
d5f51c4
Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.M…
smk2007 Jan 27, 2021
f68eb35
dequantize 1st input of lstm back if it is quantized (#6444)
yufenglee Jan 27, 2021
0100f33
[java] Adds support for OrtEnvironment thread pools (#6406)
Craigacp Jan 27, 2021
1ce1a51
fix SDL native rule warning #6246 (#6461)
fs-eire Jan 27, 2021
ed1ebd2
fix SDL rule (#6464)
fs-eire Jan 27, 2021
b6ac35f
use tickcount64 (#6447)
Jan 27, 2021
7a0ab9c
Update pypi package metadata (#6354)
faxu Jan 28, 2021
91b19b8
Delete nuget extra configs (#6477)
snnn Jan 28, 2021
d850fa6
Op kernel type reduction infrastructure. (#6466)
edgchen1 Jan 28, 2021
77d0eb3
Fixing a leak in OnnxSequences with String keys or values. (#6473)
Craigacp Jan 28, 2021
2e228d7
Increase the distributes tests pipeline timeout to 120 minutes (#6479)
baijumeswani Jan 28, 2021
752627c
[CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP …
guoyu-wang Jan 28, 2021
c84bb9d
Add ability to track per operator types in reduced build config. (#6428)
skottmckay Jan 28, 2021
00afd00
merge e2e with distributed pipeline (#6443)
liqunfu Jan 28, 2021
ea2b560
Fix test breaks in Windows ingestion pipeline (#6476)
smk2007 Jan 28, 2021
3f60b27
Speed up the Mac CI runs (#6483)
guoyu-wang Jan 28, 2021
ce46f37
expose learningmodelpixelrange property (#5877)
zhangxiang1993 Jan 28, 2021
d4e1f5a
Fix of support api version bug for [de]quantize (#6492)
guoyu-wang Jan 29, 2021
21b4842
SDL fixes: add proper casts/format specifiers (#6446)
Jan 29, 2021
3b1227c
SDL annotation fixes (#6448)
Jan 29, 2021
1a5b75a
[OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)
suryasidd Jan 29, 2021
7abb5b6
Support pad operator in quantization and quantized nhwc transformer. …
zhanghuanrong Jan 29, 2021
066520f
Improve work distribution for Expand operator, and sharded LoopCounte…
tlh20 Jan 29, 2021
d3203ad
Update document of transformer optimization (#6487)
tianleiwu Jan 29, 2021
71389ff
nuphar test to avoid test data download to improve passing rate (#6467)
liqunfu Jan 29, 2021
a19c48f
Fuse cuda conv with activation (#6351)
RandySheriffH Jan 29, 2021
06a6c63
[CoreML EP] Add support for some activations/Transpose, move some sha…
guoyu-wang Jan 29, 2021
8306150
Refine transformers profiler output (#6502)
tianleiwu Jan 29, 2021
8c6d76a
Update to match new test setup. (#6496)
skottmckay Jan 29, 2021
76bc0e4
Enable dense sequence optimized version of Pytorch exported BERT-L on…
Jan 29, 2021
7f57317
Optimize GatherGrad for AMD GPU (#6381)
weixingzhang Jan 29, 2021
76f5d9e
add explicit barriers for buffer overread and overrwrite (#6484)
Jan 29, 2021
531eb06
fix sdl bugs for uninitialized variables and returns (#6450)
Jan 29, 2021
3a30ad7
handle hr error conditions (#6449)
Jan 29, 2021
a36f627
Dnnl training (#6045)
georgen117 Jan 30, 2021
7c5bfba
Lochi/refactor yolov3 quantization (#6290)
chilo-ms Jan 30, 2021
f2872ff
Print a warning message for using newer c_api header on old binary (#…
guoyu-wang Jan 30, 2021
e5cbcec
Fix issues with ArmNN build setup (#6495)
skottmckay Jan 30, 2021
5b69cbe
Fix Windows CI builds by updating test scripts to work with numpy 1.2…
skottmckay Feb 1, 2021
891181d
Fix ORTModule branch for orttraining-* pipelines
Jan 29, 2021
6b890c2
Merge remote-tracking branch 'origin/master' into thiagofc/fix-orttra…
Feb 1, 2021
0432fa7
Update pytorch nightly version dependency
Feb 1, 2021
[CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481)

* Add macos coreml CI and coreml_flags

* Move save debugging model to use environment var

* Move pipeline off from macos CI template

* Fix an issue building using unix make, add parallel to build script

* Fixed build break for shared_lib and compile warning

* Fix a compile warning

* test

* Revert the accidental push from another branch

This reverts commit 472029b.
guoyu-wang authored Jan 28, 2021
commit 752627c5bb72eabbab73425199d83c4004ba23ab
3 changes: 2 additions & 1 deletion cmake/onnxruntime_providers.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -656,6 +656,7 @@ if (onnxruntime_USE_COREML)
target_include_directories(onnxruntime_coreml_proto PUBLIC $<TARGET_PROPERTY:protobuf::libprotobuf,INTERFACE_INCLUDE_DIRECTORIES> "${CMAKE_CURRENT_BINARY_DIR}")
target_compile_definitions(onnxruntime_coreml_proto PUBLIC $<TARGET_PROPERTY:protobuf::libprotobuf,INTERFACE_COMPILE_DEFINITIONS>)
set_target_properties(onnxruntime_coreml_proto PROPERTIES COMPILE_FLAGS "-fvisibility=hidden")
set_target_properties(onnxruntime_coreml_proto PROPERTIES COMPILE_FLAGS "-fvisibility-inlines-hidden")
set(_src_sub_dir "coreml/")
onnxruntime_protobuf_generate(
APPEND_PATH
Expand Down Expand Up @@ -694,7 +695,7 @@ if (onnxruntime_USE_COREML)
source_group(TREE ${ONNXRUNTIME_ROOT}/core FILES ${onnxruntime_providers_coreml_cc_srcs})
add_library(onnxruntime_providers_coreml ${onnxruntime_providers_coreml_cc_srcs} ${onnxruntime_providers_coreml_objcc_srcs})
onnxruntime_add_include_to_target(onnxruntime_providers_coreml onnxruntime_common onnxruntime_framework onnx onnx_proto protobuf::libprotobuf-lite flatbuffers onnxruntime_coreml_proto)
target_link_libraries(onnxruntime_providers_coreml "-framework Foundation" "-framework CoreML")
target_link_libraries(onnxruntime_providers_coreml PRIVATE onnxruntime_coreml_proto "-framework Foundation" "-framework CoreML")
add_dependencies(onnxruntime_providers_coreml onnx onnxruntime_coreml_proto ${onnxruntime_EXTERNAL_DEPENDENCIES})
set_target_properties(onnxruntime_providers_coreml PROPERTIES CXX_STANDARD_REQUIRED ON)
set_target_properties(onnxruntime_providers_coreml PROPERTIES FOLDER "ONNXRuntime")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,32 @@

#include "onnxruntime_c_api.h"

// COREMLFlags are boolean options for the CoreML EP.
// This enum is defined as bit flags and cannot have negative values.
// To generate a uint32_t coreml_flags for use with OrtSessionOptionsAppendExecutionProvider_CoreML below:
// uint32_t coreml_flags = 0;
// coreml_flags |= COREML_FLAG_USE_CPU_ONLY;
enum COREMLFlags {
COREML_FLAG_USE_NONE = 0x000,

// Use CPU only in the CoreML EP. This may decrease performance but provides
// reference output values without precision loss, which is useful for validation
COREML_FLAG_USE_CPU_ONLY = 0x001,

// Enable CoreML EP on subgraph
COREML_FLAG_ENABLE_ON_SUBGRAPH = 0x002,

// Keep COREML_FLAG_LAST at the end of the enum definition
// and assign the last COREMLFlags value to it
COREML_FLAG_LAST = COREML_FLAG_ENABLE_ON_SUBGRAPH,
};

#ifdef __cplusplus
extern "C" {
#endif

ORT_API_STATUS(OrtSessionOptionsAppendExecutionProvider_CoreML, _In_ OrtSessionOptions* options);
ORT_API_STATUS(OrtSessionOptionsAppendExecutionProvider_CoreML,
_In_ OrtSessionOptions* options, uint32_t coreml_flags);

#ifdef __cplusplus
}
Expand Down
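For reference, a minimal sketch of how a caller might compose and sanity-check the `coreml_flags` bitmask introduced in the header above. The enum values are copied from this diff. `MakeCoreMLFlags` and `IsValidCoreMLFlags` are hypothetical helper names, not part of ONNX Runtime, and using `COREML_FLAG_LAST` to derive a mask of valid bits is an assumption about what that sentinel is intended for.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the COREMLFlags enum in this diff.
enum COREMLFlags {
  COREML_FLAG_USE_NONE = 0x000,
  COREML_FLAG_USE_CPU_ONLY = 0x001,
  COREML_FLAG_ENABLE_ON_SUBGRAPH = 0x002,
  COREML_FLAG_LAST = COREML_FLAG_ENABLE_ON_SUBGRAPH,
};

// Hypothetical helper: true if `flags` contains only bits defined in the enum.
// Assumes the flags are consecutive powers of two ending at COREML_FLAG_LAST.
inline bool IsValidCoreMLFlags(uint32_t flags) {
  const uint32_t valid_mask = (static_cast<uint32_t>(COREML_FLAG_LAST) << 1) - 1;
  return (flags & ~valid_mask) == 0;
}

// Hypothetical helper: builds the bitmask from two booleans.
inline uint32_t MakeCoreMLFlags(bool cpu_only, bool enable_on_subgraph) {
  uint32_t flags = COREML_FLAG_USE_NONE;
  if (cpu_only) flags |= COREML_FLAG_USE_CPU_ONLY;
  if (enable_on_subgraph) flags |= COREML_FLAG_ENABLE_ON_SUBGRAPH;
  assert(IsValidCoreMLFlags(flags));
  return flags;
}
```

The resulting value would be passed as the second argument of `OrtSessionOptionsAppendExecutionProvider_CoreML(options, coreml_flags)`, following the new signature in this header.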
1 change: 1 addition & 0 deletions onnxruntime/core/providers/coreml/builders/helper.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#pragma once

#include <stdint.h>
#include <functional>

namespace onnxruntime {

Expand Down
15 changes: 9 additions & 6 deletions onnxruntime/core/providers/coreml/builders/model_builder.cc
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,10 @@
namespace onnxruntime {
namespace coreml {

ModelBuilder::ModelBuilder(const GraphViewer& graph_viewer, const logging::Logger& logger)
ModelBuilder::ModelBuilder(const GraphViewer& graph_viewer, const logging::Logger& logger, uint32_t coreml_flags)
: graph_viewer_(graph_viewer),
logger_(logger) {
logger_(logger),
coreml_flags_(coreml_flags) {
}

Status ModelBuilder::Initialize() {
Expand Down Expand Up @@ -191,7 +192,7 @@ Status ModelBuilder::RegisterModelOutputs() {

Status ModelBuilder::Compile(std::unique_ptr<Model>& model, const std::string& path) {
ORT_RETURN_IF_ERROR(SaveCoreMLModel(path));
model.reset(new Model(path, logger_));
model.reset(new Model(path, logger_, coreml_flags_));
model->SetScalarOutputs(std::move(scalar_outputs_));
model->SetInputOutputInfo(std::move(input_output_info_));
return model->LoadModel();
Expand All @@ -202,9 +203,11 @@ Status ModelBuilder::SaveCoreMLModel(const std::string& path) {
std::ofstream stream(path, std::ofstream::out | std::ofstream::binary);
ORT_RETURN_IF_NOT(coreml_model_->SerializeToOstream(&stream), "Save the CoreML model failed");

// Delete, debug only
std::ofstream temp_stream("/Users/gwang/temp/aaa.mlmodel", std::ofstream::out | std::ofstream::binary);
ORT_RETURN_IF_NOT(coreml_model_->SerializeToOstream(&temp_stream), "Save the CoreML model failed");
// TODO, Delete, debug only
if (const char* path = std::getenv("ORT_COREML_EP_CONVERTED_MODEL_PATH")) {
std::ofstream temp_stream(path, std::ofstream::out | std::ofstream::binary);
ORT_RETURN_IF_NOT(coreml_model_->SerializeToOstream(&temp_stream), "Save the CoreML model failed");
}

return Status::OK();
}
Expand Down
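The change to `SaveCoreMLModel` above replaces a hard-coded debug path with an opt-in environment variable. A standalone sketch of that pattern follows, under stated assumptions: `DumpDebugCopyIfRequested` is a hypothetical name, the payload is a plain string rather than a serialized CoreML model, and the extra empty-string check is an addition that the diff does not have. The local is named `debug_path` so it does not shadow an enclosing `path` parameter.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Hypothetical helper, not ONNX Runtime code: writes `payload` to the file named by
// the environment variable `env_var_name`, if that variable is set and non-empty.
// Returns true only when a copy was written successfully.
inline bool DumpDebugCopyIfRequested(const char* env_var_name, const std::string& payload) {
  const char* debug_path = std::getenv(env_var_name);
  if (debug_path == nullptr || *debug_path == '\0') {
    return false;  // Variable not set: skip the debug dump entirely.
  }
  std::ofstream stream(debug_path, std::ofstream::out | std::ofstream::binary);
  stream.write(payload.data(), static_cast<std::streamsize>(payload.size()));
  stream.flush();
  return stream.good();
}
```

With the variable unset the function is a no-op, which matches the intent of the diff: normal runs no longer write to a developer-specific location.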
3 changes: 2 additions & 1 deletion onnxruntime/core/providers/coreml/builders/model_builder.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ struct OnnxTensorInfo;

class ModelBuilder {
public:
ModelBuilder(const GraphViewer& graph_viewer, const logging::Logger& logger);
ModelBuilder(const GraphViewer& graph_viewer, const logging::Logger& logger, uint32_t coreml_flags);
~ModelBuilder() = default;

Status Compile(std::unique_ptr<Model>& model, const std::string& path) ORT_MUST_USE_RESULT;
Expand All @@ -37,6 +37,7 @@ class ModelBuilder {
private:
const GraphViewer& graph_viewer_;
const logging::Logger& logger_;
uint32_t coreml_flags_;

std::unique_ptr<CoreML::Specification::Model> coreml_model_;
std::unordered_set<std::string> scalar_outputs_;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,9 @@ namespace onnxruntime {

constexpr const char* COREML = "CoreML";

CoreMLExecutionProvider::CoreMLExecutionProvider()
: IExecutionProvider{onnxruntime::kCoreMLExecutionProvider, true} {
CoreMLExecutionProvider::CoreMLExecutionProvider(uint32_t coreml_flags)
: IExecutionProvider{onnxruntime::kCoreMLExecutionProvider, true},
coreml_flags_(coreml_flags) {
AllocatorCreationInfo device_info(
[](int) {
return onnxruntime::make_unique<CPUAllocator>(OrtMemoryInfo(COREML, OrtAllocatorType::OrtDeviceAllocator));
Expand All @@ -44,7 +45,7 @@ CoreMLExecutionProvider::GetCapability(const onnxruntime::GraphViewer& graph_vie

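// By default we do not run the CoreML EP on subgraphs; instead we cover this in the control flow nodes.
// Setting COREML_FLAG_ENABLE_ON_SUBGRAPH opts in to running the CoreML EP on subgraphs.
// TODO investigate whether we want to support subgraph using CoreML EP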
if (graph_viewer.IsSubgraph()) {
if (graph_viewer.IsSubgraph() && !(coreml_flags_ & COREML_FLAG_ENABLE_ON_SUBGRAPH)) {
return result;
}

Expand Down Expand Up @@ -169,7 +170,7 @@ common::Status CoreMLExecutionProvider::Compile(const std::vector<FusedNodeAndGr
Node& fused_node = fused_node_and_graph.fused_node;
const onnxruntime::GraphViewer& graph_viewer(fused_node_and_graph.filtered_graph);

coreml::ModelBuilder builder(graph_viewer, *GetLogger());
coreml::ModelBuilder builder(graph_viewer, *GetLogger(), coreml_flags_);
std::unique_ptr<coreml::Model> coreml_model;
const std::string coreml_model_file_path = coreml::util::GetTemporaryFilePath();
ORT_RETURN_IF_ERROR(builder.Compile(coreml_model, coreml_model_file_path));
Expand Down
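The updated subgraph check in `GetCapability` above can be read as a small truth table. The sketch below restates it as a free function so each case can be checked at compile time. `ShouldSkipGraph` and `kEnableOnSubgraph` are hypothetical names used only for illustration; the constant mirrors `COREML_FLAG_ENABLE_ON_SUBGRAPH` from the header in this diff.

```cpp
#include <cstdint>

// Mirrors COREML_FLAG_ENABLE_ON_SUBGRAPH (0x002) from the header in this diff.
constexpr uint32_t kEnableOnSubgraph = 0x002;

// Hypothetical free function with the same condition as the GetCapability early return:
// a graph is skipped only if it is a subgraph and the opt-in bit is not set.
constexpr bool ShouldSkipGraph(bool is_subgraph, uint32_t coreml_flags) {
  return is_subgraph && !(coreml_flags & kEnableOnSubgraph);
}

static_assert(!ShouldSkipGraph(false, 0x000), "main graph is never skipped by this check");
static_assert(ShouldSkipGraph(true, 0x000), "subgraph is skipped when no flags are set");
static_assert(ShouldSkipGraph(true, 0x001), "CPU-only bit alone does not enable subgraphs");
static_assert(!ShouldSkipGraph(true, 0x002), "opt-in bit keeps the subgraph");
static_assert(!ShouldSkipGraph(true, 0x003), "other bits do not interfere with the opt-in");
```

The default (`coreml_flags == 0`) preserves the previous behavior, so existing callers that pass no flags see no change in which graphs are considered.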
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ class Model;

class CoreMLExecutionProvider : public IExecutionProvider {
public:
CoreMLExecutionProvider();
CoreMLExecutionProvider(uint32_t coreml_flags);
virtual ~CoreMLExecutionProvider();

std::vector<std::unique_ptr<ComputeCapability>>
Expand All @@ -28,6 +28,10 @@ class CoreMLExecutionProvider : public IExecutionProvider {
std::vector<NodeComputeInfo>& node_compute_funcs) override;
#endif

// The bit flags which define bool options for COREML EP, bits are defined as
// COREMLFlags in include/onnxruntime/core/providers/coreml/coreml_provider_factory.h
const uint32_t coreml_flags_;

private:
// <fused_node_name, <coreml_model_file_path, compiled_coreml_model>>
std::unordered_map<std::string, std::unique_ptr<onnxruntime::coreml::Model>> coreml_models_;
Expand Down
15 changes: 9 additions & 6 deletions onnxruntime/core/providers/coreml/coreml_provider_factory.cc
Original file line number Diff line number Diff line change
Expand Up @@ -9,22 +9,25 @@ using namespace onnxruntime;

namespace onnxruntime {
struct CoreMLProviderFactory : IExecutionProviderFactory {
CoreMLProviderFactory() {}
CoreMLProviderFactory(uint32_t coreml_flags)
: coreml_flags_(coreml_flags) {}
~CoreMLProviderFactory() override {}

std::unique_ptr<IExecutionProvider> CreateProvider() override;
uint32_t coreml_flags_;
};

std::unique_ptr<IExecutionProvider> CoreMLProviderFactory::CreateProvider() {
return onnxruntime::make_unique<CoreMLExecutionProvider>();
return onnxruntime::make_unique<CoreMLExecutionProvider>(coreml_flags_);
}

std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_CoreML() {
return std::make_shared<onnxruntime::CoreMLProviderFactory>();
std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_CoreML(uint32_t coreml_flags) {
return std::make_shared<onnxruntime::CoreMLProviderFactory>(coreml_flags);
}
} // namespace onnxruntime

ORT_API_STATUS_IMPL(OrtSessionOptionsAppendExecutionProvider_CoreML, _In_ OrtSessionOptions* options) {
options->provider_factories.push_back(onnxruntime::CreateExecutionProviderFactory_CoreML());
ORT_API_STATUS_IMPL(OrtSessionOptionsAppendExecutionProvider_CoreML,
_In_ OrtSessionOptions* options, uint32_t coreml_flags) {
options->provider_factories.push_back(onnxruntime::CreateExecutionProviderFactory_CoreML(coreml_flags));
return nullptr;
}
2 changes: 1 addition & 1 deletion onnxruntime/core/providers/coreml/model/model.h
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ class Model {

OrtMutex mutex_;

Model(const std::string& path, const logging::Logger& logger);
Model(const std::string& path, const logging::Logger& logger, uint32_t coreml_flags);
onnxruntime::common::Status LoadModel();

void SetInputOutputInfo(std::unordered_map<std::string, OnnxTensorInfo>&& input_output_info) {
Expand Down
24 changes: 15 additions & 9 deletions onnxruntime/core/providers/coreml/model/model.mm
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#include "core/common/logging/logging.h"
#include "core/graph/onnx_protobuf.h"
#include "core/providers/coreml/builders/helper.h"
#include "core/providers/coreml/coreml_provider_factory.h"
#include "host_utils.h"
#include "model.h"

Expand Down Expand Up @@ -38,10 +39,12 @@ @interface CoreMLExecution : NSObject {
NSString* coreml_model_path_;
NSString* compiled_model_path_;
const onnxruntime::logging::Logger* logger_;
uint32_t coreml_flags_;
}

- (instancetype)initWithPath:(const std::string&)path
logger:(const onnxruntime::logging::Logger&)logger;
logger:(const onnxruntime::logging::Logger&)logger
coreml_flags:(uint32_t)coreml_flags;
- (void)cleanup;
- (void)dealloc;
- (onnxruntime::common::Status)loadModel API_AVAILABLE_OS_VERSIONS;
Expand Down Expand Up @@ -129,10 +132,12 @@ - (nullable MLFeatureValue*)featureValueForName:(nonnull NSString*)featureName {
@implementation CoreMLExecution

- (instancetype)initWithPath:(const std::string&)path
logger:(const onnxruntime::logging::Logger&)logger {
logger:(const onnxruntime::logging::Logger&)logger
coreml_flags:(uint32_t)coreml_flags {
if (self = [super init]) {
coreml_model_path_ = [NSString stringWithUTF8String:path.c_str()];
logger_ = &logger;
coreml_flags_ = coreml_flags;
}
return self;
}
Expand Down Expand Up @@ -202,8 +207,7 @@ - (void)dealloc {
}

MLPredictionOptions* options = [[MLPredictionOptions alloc] init];
// TODO add options
// options.usesCPUOnly = YES;
options.usesCPUOnly = coreml_flags_ & COREML_FLAG_USE_CPU_ONLY;
NSError* error = nil;
id<MLFeatureProvider> output_feature = [_model predictionFromFeatures:input_feature
options:options
Expand Down Expand Up @@ -262,7 +266,7 @@ - (void)dealloc {
// This class will bridge Model (c++) with CoreMLExecution (objective c++)
class Execution {
public:
Execution(const std::string& path, const logging::Logger& logger);
Execution(const std::string& path, const logging::Logger& logger, uint32_t coreml_flags);
~Execution(){};

Status LoadModel();
Expand All @@ -274,8 +278,10 @@ Status Predict(const std::unordered_map<std::string, OnnxTensorData>& inputs,
CoreMLExecution* execution_;
};

Execution::Execution(const std::string& path, const logging::Logger& logger) {
execution_ = [[CoreMLExecution alloc] initWithPath:path logger:logger];
Execution::Execution(const std::string& path, const logging::Logger& logger, uint32_t coreml_flags) {
execution_ = [[CoreMLExecution alloc] initWithPath:path
logger:logger
coreml_flags:coreml_flags];
}

Status Execution::LoadModel() {
Expand Down Expand Up @@ -303,8 +309,8 @@ Status Predict(const std::unordered_map<std::string, OnnxTensorData>& inputs,
return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Execution::LoadModel requires macos 10.15+ or ios 13+ ");
}

Model::Model(const std::string& path, const logging::Logger& logger)
: execution_(onnxruntime::make_unique<Execution>(path, logger)) {
Model::Model(const std::string& path, const logging::Logger& logger, uint32_t coreml_flags)
: execution_(onnxruntime::make_unique<Execution>(path, logger, coreml_flags)) {
}

Model::~Model() {}
Expand Down
4 changes: 2 additions & 2 deletions onnxruntime/test/framework/test_utils.cc
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,8 @@ IExecutionProvider* TestRknpuExecutionProvider() {
#endif

#ifdef USE_COREML
IExecutionProvider* TestCoreMLExecutionProvider() {
static CoreMLExecutionProvider coreml_provider;
IExecutionProvider* TestCoreMLExecutionProvider(uint32_t coreml_flags) {
static CoreMLExecutionProvider coreml_provider(coreml_flags);
return &coreml_provider;
}
#endif
Expand Down
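One property of the updated `TestCoreMLExecutionProvider` worth noting: a function-local `static` object is constructed on the first call only, so the `coreml_flags` argument of any later call is ignored. The sketch below shows this with a stand-in type. `FakeProvider` and `GetCachedProvider` are hypothetical names and not ONNX Runtime code.

```cpp
#include <cstdint>

// Stand-in for CoreMLExecutionProvider; it only records the flags it was built with.
struct FakeProvider {
  explicit FakeProvider(uint32_t flags) : coreml_flags(flags) {}
  const uint32_t coreml_flags;
};

// Same shape as TestCoreMLExecutionProvider in the diff: the function-local static
// is constructed during the first call, and later arguments have no effect.
inline FakeProvider* GetCachedProvider(uint32_t coreml_flags) {
  static FakeProvider provider(coreml_flags);
  return &provider;
}
```

Calling `GetCachedProvider(0x001)` and then `GetCachedProvider(0x002)` returns the same object both times, still holding `0x001`. The unit test in this commit passes the same `coreml_flags` on every call, so it is unaffected; a test that needs different flags would presumably need its own provider instance.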
2 changes: 1 addition & 1 deletion onnxruntime/test/framework/test_utils.h
Original file line number Diff line number Diff line change
@@ -55,7 +55,7 @@ IExecutionProvider* TestRknpuExecutionProvider();
#endif

#ifdef USE_COREML
-IExecutionProvider* TestCoreMLExecutionProvider();
+IExecutionProvider* TestCoreMLExecutionProvider(uint32_t coreml_flags);
#endif

template <typename T>
14 changes: 10 additions & 4 deletions onnxruntime/test/providers/coreml/coreml_basic_test.cc
@@ -3,6 +3,7 @@

#include "core/common/logging/logging.h"
#include "core/providers/coreml/coreml_execution_provider.h"
+#include "core/providers/coreml/coreml_provider_factory.h"
#include "core/session/inference_session.h"
#include "test/common/tensor_op_test_utils.h"
#include "test/framework/test_utils.h"
@@ -65,21 +66,26 @@ TEST(CoreMLExecutionProviderTest, FunctionTest) {
std::vector<int64_t> dims_mul_x = {1, 1, 3, 2};
std::vector<float> values_mul_x = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f};
OrtValue ml_value_x;
-CreateMLValue<float>(TestCoreMLExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x,
+
+// We want to run UT on CPU only to get output value without losing precision
+uint32_t coreml_flags = 0;
+coreml_flags |= COREML_FLAG_USE_CPU_ONLY;
+
+CreateMLValue<float>(TestCoreMLExecutionProvider(coreml_flags)->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x,
&ml_value_x);
OrtValue ml_value_y;
-CreateMLValue<float>(TestCoreMLExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x,
+CreateMLValue<float>(TestCoreMLExecutionProvider(coreml_flags)->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x,
&ml_value_y);
OrtValue ml_value_z;
-CreateMLValue<float>(TestCoreMLExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x,
+CreateMLValue<float>(TestCoreMLExecutionProvider(coreml_flags)->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x,
&ml_value_z);
NameMLValMap feeds;
feeds.insert(std::make_pair("X", ml_value_x));
feeds.insert(std::make_pair("Y", ml_value_y));
feeds.insert(std::make_pair("Z", ml_value_z));

RunAndVerifyOutputsWithEP(model_file_name, "CoreMLExecutionProviderTest.FunctionTest",
-onnxruntime::make_unique<CoreMLExecutionProvider>(),
+onnxruntime::make_unique<CoreMLExecutionProvider>(coreml_flags),
feeds);
}
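
Reviewer note: the test builds coreml_flags by OR-ing COREML_FLAG_USE_CPU_ONLY into a zero-initialized uint32_t, which is the usual bitmask-options pattern. A minimal sketch of that pattern follows, assuming hypothetical names and bit values (kExampleFlagNone, kExampleUseCpuOnly, kExampleOtherOption, HasFlag, MakeCpuOnlyFlags); the real constants are declared in coreml_provider_factory.h and may differ.

```cpp
#include <cstdint>

// Hypothetical flag bits for illustration only; not the real CoreML EP values.
constexpr uint32_t kExampleFlagNone = 0x0;
constexpr uint32_t kExampleUseCpuOnly = 0x1;
constexpr uint32_t kExampleOtherOption = 0x2;

// Returns true when every bit of `flag` is set in `flags`.
inline bool HasFlag(uint32_t flags, uint32_t flag) {
  return (flags & flag) == flag;
}

// Builds the flag word the same way the test does: start at 0, then OR bits in.
inline uint32_t MakeCpuOnlyFlags() {
  uint32_t flags = kExampleFlagNone;
  flags |= kExampleUseCpuOnly;
  return flags;
}
```

Passing a single uint32_t through Model, Execution, and the provider constructor keeps the signatures stable if more options are added later, since new options only need new bits rather than new parameters.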

10 changes: 8 additions & 2 deletions onnxruntime/test/util/default_providers.cc
@@ -10,6 +10,9 @@
#ifdef USE_ROCM
#include "core/providers/rocm/rocm_provider_factory_creator.h"
#endif
+#ifdef USE_COREML
+#include "core/providers/coreml/coreml_provider_factory.h"
+#endif
#include "core/session/onnxruntime_cxx_api.h"

namespace onnxruntime {
@@ -26,7 +29,7 @@ std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_Tensor
std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_MIGraphX(int device_id);
std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_ACL(int use_arena);
std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_ArmNN(int use_arena);
-std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_CoreML();
+std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_CoreML(uint32_t);

// EP for internal testing
std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory_InternalTesting(
@@ -136,7 +139,10 @@ std::unique_ptr<IExecutionProvider> DefaultRocmExecutionProvider() {

std::unique_ptr<IExecutionProvider> DefaultCoreMLExecutionProvider() {
#if defined(USE_COREML)
-return CreateExecutionProviderFactory_CoreML()->CreateProvider();
+// We want to run UT on CPU only to get output value without losing precision
+uint32_t coreml_flags = 0;
+coreml_flags |= COREML_FLAG_USE_CPU_ONLY;
+return CreateExecutionProviderFactory_CoreML(coreml_flags)->CreateProvider();
#else
return nullptr;
#endif
19 changes: 19 additions & 0 deletions tools/ci_build/github/azure-pipelines/mac-coreml-ci-pipeline.yml
@@ -0,0 +1,19 @@
+jobs:
+- job: CoreML_CI
+  pool:
+    vmImage: 'macOS-10.15'
+  timeoutInMinutes: 120
+  steps:
+  - script: brew install coreutils ninja
+    displayName: Install coreutils and ninja
+
+  - script: |
+      python3 tools/ci_build/build.py \
+        --build_dir build \
+        --skip_submodule_sync \
+        --cmake_generator=Ninja \
+        --parallel \
+        --build_shared_lib \
+        --config Debug \
+        --use_coreml
+    displayName: CoreML EP, Build and Test on macOS