
[Tracker] All the issues related to the e2e SHARK test suite #812

pdhirajkumarprasad opened this issue Aug 27, 2024 · 4 comments
pdhirajkumarprasad commented Aug 27, 2024

Full ONNX FE tracker is at: #564

Running a model

In the alt_e2e test suite, set CACHE_DIR to the directory where models will be downloaded:

    export CACHE_DIR="/path/where/models/will/be/downloaded"

If building torch-mlir and IREE from source:

    source /path/to/iree-build/.env && export PYTHONPATH
    export PYTHONPATH=/path/to/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir:/path/to/torch-mlir/test/python/fx_importer:$PYTHONPATH
    export PATH=/path/to/iree-build/tools/:/path/to/torch-mlir/build/bin/:$PATH

Then run:

    python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t ModelName
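
To sweep several models with the same invocation, here is a minimal sketch that loops over the run.py command documented above and reports pass/fail per model; the model names in the list are placeholders, not real test names.

```python
# Minimal sketch: batch-run the documented run.py invocation for a few models
# and report pass/fail per model. The names in MODELS are placeholders.
import subprocess

MODELS = ["model_a", "model_b"]  # hypothetical test names

for name in MODELS:
    result = subprocess.run(
        ["python", "./run.py", "--mode=cl-onnx-iree", "-v",
         "--torchtolinalg", "-t", name],
        capture_output=True,
        text=True,
    )
    status = "PASS" if result.returncode == 0 else "FAIL"
    print(f"{name}: {status}")
```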

For onnx/models/

critical issues

import and setup failures

# | device | issue type | issue no | # models impacted | list of models | assignee | status
--- | --- | --- | --- | --- | --- | --- | ---
1 | N/A | missing weights (remove these) | #862 | 30 | model list | |
2 | N/A | cannot load model in ORT (remove?) | #862 | 1 | model list | |
3 | N/A | OOM during ORT | #862 | 3 | model list | |
4 | N/A | OOM import, missing dim_params, ORT PASS (see sketch below) | #860 #861 | 21 | model list | |
5 | N/A | unable to update opset version due to BatchNormalization, ORT PASS | #859 | 5 | model list | |
6 | N/A | unable to update opset version due to BatchNormalization, OOM import, ORT PASS | #859 #861 | 1 | model list | |
7 | N/A | duplicate metadata_prop keys, ORT PASS | #863 | 1 | model list | |
8 | N/A | OOM import, ORT PASS | #861 | 25 | model list | |
9 | N/A | No Azure Blob Found | #864 | 20 | model list | |
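
For row 4 above, a minimal sketch of how unnamed dynamic dimensions (missing dim_params) can be detected before import; the model path is a placeholder, and this is not the test suite's actual import code.

```python
# Minimal sketch: flag graph inputs whose dynamic dimensions carry neither a
# concrete dim_value nor a symbolic dim_param ("missing dim_params" above).
# The model path is a placeholder.
import onnx

model = onnx.load("model.onnx")
for inp in model.graph.input:
    dims = inp.type.tensor_type.shape.dim
    for i, dim in enumerate(dims):
        if not dim.HasField("dim_value") and not dim.HasField("dim_param"):
            print(f"{inp.name}: dimension {i} is dynamic but has no dim_param")
```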

onnx to torch

# | device | issue type | issue no | # models impacted | list of models | assignee | status
--- | --- | --- | --- | --- | --- | --- | ---
1 | CPU | 'util.initializer' op failed to inline into combined initializer | 18386 | 56 | modelList | @vivekkhandelwal1 |
2 | CPU | failed to legalize operation 'hal.interface.constant.load' | | 45 | modelList | @vinayakdsci |
3 | CPU | crash: mlir::PatternApplicator::matchAndRewrite | 867 | 41 | modelList | @zjgarvey |
4 | CPU | Crash | 866 | 22 | modelList | @vinayakdsci |
5 | CPU | 'memref.alloca' op expected no unbounded stack allocations | 18810 | 5 | modelList | @jinchen62 |
6 | CPU | 'torch.prim.If' op along control flow edge from Region #0 to parent results: source type #0 | 696 | 6 | modelList | @renxida |
7 | CPU | 'vector.transfer_write' op inferred mask type ('vector<1x1x4xi1>') and mask operand type ('vector<1x4x1xi1>') don't match | | 3 | modelList | |
8 | CPU | 'stream.async.dispatch' op has invalid Read access range [0 to 7375872 for 7375872] of resource %15 with size 150528; length > resource size | | 3 | modelList | |
9 | CPU | 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect | | 1 | modelList | |
10 | CPU | operand #1 does not dominate this use | iree#18815 | 1 | modelList | @IanWood1 |
11 | CPU | failed to legalize operation onnx.NonZero | 820 | 1 | modelList | @renxida |
12 | CPU | type of return operand 0 ('!torch.vtensor<[?,384],f32>') doesn't match function result type ('!torch.vtensor<[1,384],f32>') | | 1 | modelList | @Shukla-Gaurav |
13 | CPU | torch.aten.convolution | | 1 | modelList | @PhaneeshB |
14 | CPU | boolean indexing ops: AtenNonzeroOp, AtenIndexTensorOp, AtenMaskedSelectOp | 3293 | | | @renxida |
15 | CPU | Add TorchToLinalg lowering for MaxUnpool operation | 718 | | | @jinchen62 |
16 | CPU | Fix Onnx.DFT Torch->Linalg lowering | 800 | | | @PhaneeshB |

torch to linalg

# | device | issue type | issue no | # models impacted | list of models | assignee | status
--- | --- | --- | --- | --- | --- | --- | ---
1 | CPU | 'linalg.generic' op inferred input/output operand | 825 | 11 | modelList | @zjgarvey |

iree-compile

IREE project tracker: https://github.com/orgs/iree-org/projects/8/views/3

# | device | issue type | issue no | # models impacted | list of models | assignee | status
--- | --- | --- | --- | --- | --- | --- | ---
1 | CPU | error: One or more operations with large vector sizes (8192 bytes) were found | 18677 | 22 | modelList | |
2 | GPU | error: 'vector.transfer_read' op Anchoring on transfer_read with unsupported number of elements | 18601 | 100+ | | |
3 | GPU | 'func.func' op uses 401920 bytes of shared memory; exceeded the limit of 65536 bytes | 18603 | 100+ | | |

iree runtime

# | device | issue type | issue no | # models impacted | list of models | assignee | status
--- | --- | --- | --- | --- | --- | --- | ---
1 | CPU | Abort | 18741 | 515+ | modelList | |

numerics

# | device | issue type | issue no | # models impacted | list of models | assignee
--- | --- | --- | --- | --- | --- | ---
1 | CPU | numeric, need_to_analyze | | 101 | modelList |
2 | | [numerics]: element at index 0 (0.332534) does not match the expected (0.308342); for LSTM ops | 18441 | 2 | |

IREE EP only issues

iree-compile fails with "ElementsAttr does not provide iteration facilities for type 'mlir::Attribute'" on int8 models at the QuantizeLinear op
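
A minimal sketch of the pattern the error points at: a tiny int8 model containing a single QuantizeLinear op, built with onnx.helper. Shapes, names, and the opset version are illustrative only, not taken from any failing model.

```python
# Minimal sketch: build a small int8 QuantizeLinear model of the kind that hits
# the ElementsAttr error above. All names, shapes, and the opset are illustrative.
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

scale = numpy_helper.from_array(np.array(0.02, dtype=np.float32), name="scale")
zero_point = numpy_helper.from_array(np.array(0, dtype=np.int8), name="zero_point")

node = helper.make_node("QuantizeLinear", ["x", "scale", "zero_point"], ["y"])
graph = helper.make_graph(
    [node],
    "quantize_linear_repro",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.INT8, [1, 4])],
    initializer=[scale, zero_point],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 19)])
onnx.checker.check_model(model)
onnx.save(model, "quantize_linear_int8.onnx")
```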

low priority

Issue no 828: Turbine Camp
Issue no 797: Ops not in model

zjgarvey (Collaborator) commented

Can you update the model list links?

jinchen62 (Contributor) commented

Could you also attach the links to the issues you referred to, so we can check that all model paths are covered? Also, it seems #801 is not included, right?

pdhirajkumarprasad (Author) commented

@zjgarvey the model lists contain only the updated links.

@jinchen62 Yes, so far the report is based on the ONNX models in the e2e SHARK test suite.

jinchen62 (Contributor) commented Aug 29, 2024

@pdhirajkumarprasad I think it would be helpful to attach more details of the error messages.

I believe the onnx.Transpose failure under "onnx to torch" is the shape-inference issue I was dealing with. I fixed it by setting the opset version to 21 with a locally built torch-mlir in the SHARK test suite (llvm/torch-mlir#3593). @zjgarvey I realized this does not seem to work for the CI job, right? Any ideas?
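
For reference, a minimal sketch of the opset bump described above, using the ONNX version converter; the file paths are placeholders, and this is not the exact change from llvm/torch-mlir#3593.

```python
# Minimal sketch: upgrade a model's default-domain opset to 21 before import,
# as described in the comment above. File paths are placeholders.
import onnx
from onnx import version_converter

model = onnx.load("model.onnx")
current = max(
    (op.version for op in model.opset_import if op.domain in ("", "ai.onnx")),
    default=0,
)
if current < 21:
    model = version_converter.convert_version(model, 21)
    onnx.save(model, "model_opset21.onnx")
```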
