Slower inference performance when switching from `torchxla_trace_once` to `openxla` compile backend

## 🐛 Bug

`torchxla_trace_once` appears to be deprecated in favor of `openxla`, but when I made that switch in some benchmark testing I saw a new warning message and a performance regression. I found this while running an inference benchmark from openxla-benchmark (ResNet on GPU).

## To Reproduce

[Colab repro](https://colab.research.google.com/drive/1ImoH1agHVoRyZytx0caPve_SIp3mOmgY?resourcekey=0-ihHDMCuhIq3YuOWrB55WUw&usp=sharing).

Steps to reproduce the behavior:

1. Run the Colab with `torchxla_trace_once`. It should dump files.
2. Run the Colab with `openxla`. It should dump files (restart the runtime if it does not).
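For readers without access to the Colab, the two runs above can be approximated with the rough sketch below. It is not the notebook's actual code: `build_compiled_resnet`, `time_callable`, `summarize`, and `compare_backends` are hypothetical helper names, and the choice of `resnet50`, the input shape, and the iteration counts are assumptions. Copying the output to the CPU is used here only as a simple way to force the XLA computation to finish before the timer stops.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_callable(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> List[float]:
    """Return per-iteration wall-clock latencies in seconds, measured after warmup."""
    for _ in range(warmup):
        fn()  # warmup absorbs one-time tracing and compilation cost
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Reduce a list of latencies to min / median / max."""
    return {
        "min_s": min(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


def build_compiled_resnet(backend: str) -> Callable[[], object]:
    """Hypothetical helper: compile a ResNet with the given dynamo backend.

    Assumes torch, torchvision, and torch_xla are installed. The model
    variant (resnet50) and input shape are assumptions, not from the issue.
    """
    import torch
    import torchvision
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()
    model = torchvision.models.resnet50(weights=None).to(device).eval()
    compiled = torch.compile(model, backend=backend)
    example = torch.randn(1, 3, 224, 224, device=device)

    def step():
        with torch.no_grad():
            # .cpu() forces the device computation to complete before returning
            return compiled(example).cpu()

    return step


def compare_backends(backends=("torchxla_trace_once", "openxla")) -> Dict[str, Dict[str, float]]:
    """Time the same model under each backend and return summary statistics."""
    return {b: summarize(time_callable(build_compiled_resnet(b))) for b in backends}
```

Calling `compare_backends()` in an environment with a working torch_xla install returns one summary per backend, which makes the size of any regression easier to report alongside the dumped files.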

Hopefully that provides enough information to be useful; if not, I am happy to help further.

## Expected behavior

On-par performance and HLO graph generation between the two backends (`openxla` and `torchxla_trace_once`).
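One way to check whether graph generation really matches is to diff the text dumps produced by the two runs. The sketch below uses only the Python standard library; `diff_dumps` is a hypothetical helper and the file paths passed to it are placeholders, since the exact names of the dumped files depend on how the dump was configured.

```python
import difflib
from pathlib import Path
from typing import List


def diff_dumps(path_a: str, path_b: str, context: int = 2) -> List[str]:
    """Return a unified diff between two text dumps (empty list if identical)."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            lines_a,
            lines_b,
            fromfile=path_a,
            tofile=path_b,
            n=context,      # number of unchanged context lines around each change
            lineterm="",    # inputs were split without newlines
        )
    )
```

An empty result means the two dumps are textually identical. A non-empty result is not necessarily a problem, because instruction names and IDs can differ between runs, but it points at where the graphs diverge.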

## Environment

 - Reproducible on XLA backend [CPU/TPU]: GPU
 - torch_xla version: nightly build (8/9/23)


## Additional context

Output traces: [save_ir.zip](https://github.com/pytorch/xla/files/12307201/save_ir.zip)
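For anyone wanting to produce comparable traces locally, torch_xla documents the `XLA_SAVE_TENSORS_FILE` and `XLA_SAVE_TENSORS_FMT` environment variables for saving the lazy-tensor graphs. Whether the attached archive was generated this way is an assumption; `configure_ir_dump` is a hypothetical helper and the example path is a placeholder. The variables need to be set before `torch_xla` is imported.

```python
import os

# Formats documented for XLA_SAVE_TENSORS_FMT in torch_xla's troubleshooting guide.
_ALLOWED_FORMATS = ("text", "dot", "hlo")


def configure_ir_dump(path: str, fmt: str = "hlo") -> None:
    """Set the environment variables that ask torch_xla to save graphs to `path`.

    Call this before importing torch_xla so the settings are seen at startup.
    """
    if fmt not in _ALLOWED_FORMATS:
        raise ValueError(f"fmt must be one of {_ALLOWED_FORMATS}, got {fmt!r}")
    os.environ["XLA_SAVE_TENSORS_FILE"] = path
    os.environ["XLA_SAVE_TENSORS_FMT"] = fmt
```

Using a different output path per backend, for example `configure_ir_dump("/tmp/openxla.ir")` in one process and a second path in another, keeps the two sets of traces separate for later comparison.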



Slower inference performance when switching from `torchxla_trace_once` to `openxla` compile backend #5430
