Push for Bert, MaskRCNN and Resnet-18 E2E support using Lazy Tensor Core

This issue tracks the tasks needed for E2E support of `Bert`, `MaskRCNN`, and `Resnet-18` using the Lazy Tensor Core (LTC) and lowering through `torch-mlir`.

At the moment, the main missing piece is op support in LTC. Below is a table of the LTC ops needed. The list of ops was determined by running two scripts: this [MaskRCNN script](https://github.com/ramiro050/lazy-tensor-samples/blob/main/lazytensor_maskrcnn_example.py), which generates this [output](https://github.com/ramiro050/lazy-tensor-samples/blob/main/lazytensor_maskrcnn_example_output.txt), and this [Resnet script](https://github.com/ramiro050/lazy-tensor-samples/blob/main/lazytensor_resnet18_example.py), which generates this [output](https://github.com/ramiro050/lazy-tensor-samples/blob/main/lazytensor_resnet18_example_output.txt). The missing ops are the ones with the `aten::` prefix in the output of the scripts. For more information on how to set up and run the examples, see [here](https://github.com/ramiro050/lazy-tensor-samples/blob/main/README.md).
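As a rough illustration of how the `aten::`-prefixed ops could be pulled out of a script's output, here is a minimal sketch. It assumes the op names appear as plain tokens such as `aten::exp.out` somewhere in the text; the real output layout may differ, and `collect_aten_ops` and the sample text are hypothetical, not part of the linked repository.

```python
from __future__ import annotations

import re

# Assumption: an op name looks like `aten::name` with an optional
# `.overload` suffix, e.g. `aten::exp.out` or `aten::_index_put_impl_`.
ATEN_OP = re.compile(r"aten::[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)?")


def collect_aten_ops(text: str) -> list[str]:
    """Return the unique `aten::` op names found in `text`, sorted."""
    return sorted(set(ATEN_OP.findall(text)))


# Made-up sample text, not real script output.
sample = """
fallback: aten::exp.out
fallback: aten::floor.out
fallback: aten::exp.out
other: prim::Constant
"""

for op in collect_aten_ops(sample):
    print(op)
```

To try it on a saved output file, read the file into a string first and pass that string to `collect_aten_ops`.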

### LTC Ops Needed

Status column symbols:
`-` unimplemented
`+` started
`x` finished

| Ops                             | Status | Owner     | Model            | Notes |
|---------------------------------|--------|-----------|------------------|-------|
| `aten::_index_put_impl_`        | -      |           | MaskRCNN         |       |
| `aten::arange.start_out`        | +      |     silvasean      | MaskRCNN         |       |
| `aten::exp.out`                 | x      | ramiro050          | MaskRCNN         | https://github.com/pytorch/pytorch/pull/67213      |
| `aten::floor.out`               | x      | ramiro050          | MaskRCNN         | https://github.com/pytorch/pytorch/pull/66770      |
| `aten::index.Tensor`            | +      | ramiro050          | MaskRCNN         |       |
| `aten::log2.out`                | x      | ramiro050          | MaskRCNN         | https://github.com/pytorch/pytorch/pull/66771      |
| `aten::max_pool2d_with_indices` | +      |   vivekkhandelwal1        | MaskRCNN, Resnet |       |
| `aten::upsample_nearest2d.out`  | -      |           | MaskRCNN         |       |
| `aten::mean.out`                | x      | alanwaketan | Resnet           | https://github.com/pytorch/pytorch/pull/67174      |
| `aten::sort`                    | x      |  silvasean  | Resnet           |   https://github.com/pytorch/pytorch/pull/67053    |


### torch-mlir Ops Needed

Below is a list of ops needed on the `torch-mlir` side. The list was compiled from the ops that LTC detected when running three models: this [MaskRCNN script](https://github.com/ramiro050/lazy-tensor-samples/blob/main/lazytensor_maskrcnn_example.py) ([output with the list of detected ops](https://github.com/ramiro050/lazy-tensor-samples/blob/main/lazytensor_maskrcnn_example_output.txt)), this [Bert script](https://github.com/ramiro050/lazy-tensor-samples/blob/main/lazytensor_bert_example.py) ([output](https://github.com/ramiro050/lazy-tensor-samples/blob/main/lazytensor_bert_example_output.txt)), and the Resnet-18 model in the PyTorch benchmarks ([instructions](https://github.com/ramiro050/lazy-tensor-samples#resnet-18-inference-and-training) for setting it up with LTC). Each detected op was then checked for an existing lowering in `torch-mlir`.
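The comparison described above amounts to a set difference, grouped by model. The sketch below shows one way it could be done. The function name, the input dictionaries, and the set of "supported" ops are all hypothetical placeholders, not data taken from `torch-mlir`.

```python
from collections import defaultdict


def ops_missing_lowerings(detected_by_model, supported_ops):
    """Map each op without a lowering to the sorted list of models needing it."""
    missing = defaultdict(set)
    for model, ops in detected_by_model.items():
        for op in ops:
            if op not in supported_ops:
                missing[op].add(model)
    return {op: sorted(models) for op, models in sorted(missing.items())}


# Illustrative inputs only; the real op sets come from the script outputs.
detected = {
    "MaskRCNN Inference": {"aten::topk", "aten::stack", "aten::relu"},
    "Resnet-18 Training": {"aten::native_batch_norm", "aten::relu"},
}
supported = {"aten::relu"}  # hypothetical set of ops that already have lowerings

# Print rows in the same column order as the table in this issue.
for op, models in ops_missing_lowerings(detected, supported).items():
    print(f"| `{op}` | - | | {', '.join(models)} | |")
```

Grouping by op rather than by model keeps one row per op, which matches how the `Model` column in the table lists several models for a shared op.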

**Note:** `Bert` and `Resnet-18` are currently the only models whose training ops are listed. The ops needed for `MaskRCNN` training will be added soon.

Status column symbols:
`-` unimplemented
`+` started
`x` finished

The full op list, including finished ops, has been moved to https://github.com/llvm/torch-mlir/issues/365#issuecomment-1022664476. The table below contains only the ops that are still to be done, so that we can stay focused.

| Op                                        | Status | Owner    | Model                                                 | Notes |
|-------------------------------------------|--------|----------|-------------------------------------------------------|-------|
| `aten::bernoulli_`                        | +      |    pashu123      | Bert Training                                         |   rng op |
| `aten::embedding_dense_backward`          | +      | vivekkhandelwal1         | Bert Training                                         |   histogram    |
| `aten::native_layer_norm_backward`        | +    |  gprateek93        | Bert Training                                         |   [PR546](https://github.com/llvm/torch-mlir/pull/546), [PR570](https://github.com/llvm/torch-mlir/pull/570)   |
| `aten::nll_loss_backward`                 | +      |   pashu123       | Bert Training, Resnet-18 Training                     | [PR463](https://github.com/llvm/torch-mlir/pull/463)      |
| `aten::_copy_from`                        | +      | pashu123   | Resnet-18 Training, MaskRCNN Inference | TorchScript baseline won't run (to be investigated) |
| `aten::convolution_backward_overrideable` | +      | gpetters94         | Resnet-18 Training                                    |       |
| `aten::max_pool2d_with_indices_backward`  | +      |    vivekkhandelwal1      | Resnet-18 Training                                    |       |
| `aten::native_batch_norm_backward`        | +      |   Shukla-Gaurav       | Resnet-18 Training                                    |       |
| `aten::native_batch_norm`                 | +      |  Shukla-Gaurav        | Resnet-18 Training                                    | [PR563](https://github.com/llvm/torch-mlir/pull/563)      |
| `aten::random_.to`                        | +      |     gprateek93     | Resnet-18 Training                                    |  rng op     |
| `aten::_copy_from_and_resize`             | +      |  gpetters94       | Resnet-18 Training, MaskRCNN Inference                |       |
| `aten::convolution_overrideable`          | +      |  gpetters94       | Resnet-18 Training, MaskRCNN Inference                |       |
| `aten::convolution`                       | +      | gpetters94       | Resnet-18 Training (through AOTAutograd)              |       |
| `aten::convolution_backward`              | +      | gpetters94       | Resnet-18 Training (through AOTAutograd)              |       |
| `aten::max_pool2d_with_indices`           | +      | vivekkhandelwal1 | Resnet-18 Training, MaskRCNN Inference                | [PR518](https://github.com/llvm/torch-mlir/pull/518) |
| `aten::_index_put_impl_`                  | -      |          | MaskRCNN Inference                                    |   histogram    |
| `aten::stack`                             | -      |  pashu123        | MaskRCNN Inference                                    |       |
| `aten::topk`                              | -      |    gprateek93      | MaskRCNN Inference                                    |       |
| `aten::upsample_nearest2d`                | -      |     gprateek93     | MaskRCNN Inference                                    |       |
| `torchvision::nms`                        | -      |          | MaskRCNN Inference                                    |       |
| `torchvision::roi_align`                  | -      |          | MaskRCNN Inference                                    |       |

