In the LLM-backed components, such as https://github.com/meta-pytorch/BackendBench/blob/main/BackendBench/llm_client.py, https://github.com/meta-pytorch/BackendBench/blob/main/BackendBench/backends/kernel_agent.py, https://github.com/meta-pytorch/BackendBench/blob/main/BackendBench/backends/llm_relay.py, and https://github.com/meta-pytorch/BackendBench/blob/main/BackendBench/backends/llm.py, errors can occur for reasons like rate limits, badly formatted responses, or running out of money (API credits). These are different from the failures we normally catch, such as generated code that doesn't compile or is full of errors. As a developer, it's useful to know when the LLM or agent makes mistakes so basic that we can't even read the code it returns. We should add a separate class of errors to capture these failures and surface them in the test scenarios where they happen!
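A minimal sketch of what this could look like. Every name here (`LLMGenerationError` and its subclasses, `error_from_status`, `extract_code`, `ScenarioResult`, `run_scenario`) is hypothetical and not part of BackendBench's current API. The HTTP status mapping is also an assumption, since providers report rate limits and exhausted quotas differently. The idea is to raise a dedicated error type whenever the LLM fails before producing readable code, and to report those scenarios as `llm_error` rather than lumping them in with kernels that were generated but are wrong.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical error hierarchy: none of these names exist in BackendBench today.
class LLMGenerationError(Exception):
    """The LLM or agent failed before producing code we can even evaluate."""


class LLMRateLimitError(LLMGenerationError):
    """The provider throttled the request."""


class LLMQuotaExceededError(LLMGenerationError):
    """The account ran out of credits or budget."""


class LLMResponseFormatError(LLMGenerationError):
    """The response could not be parsed into code (e.g. no fenced block)."""


def error_from_status(status_code: int, message: str) -> LLMGenerationError:
    """Map an HTTP status to an error class.

    This mapping is an assumption; real providers differ (some report an
    exhausted quota as a 429 rather than a 402, for example).
    """
    if status_code == 429:
        return LLMRateLimitError(message)
    if status_code == 402:
        return LLMQuotaExceededError(message)
    return LLMGenerationError(f"HTTP {status_code}: {message}")


def extract_code(response: str) -> str:
    """Return the body of the first fenced code block in an LLM response."""
    # `{3} matches a triple-backtick fence; the language tag is optional.
    match = re.search(r"`{3}[\w+-]*\n(.*?)`{3}", response, re.DOTALL)
    if match is None:
        raise LLMResponseFormatError("no fenced code block in LLM response")
    return match.group(1)


@dataclass
class ScenarioResult:
    op_name: str
    status: str  # "passed", "failed", or "llm_error"
    detail: Optional[str] = None


def run_scenario(
    op_name: str,
    generate: Callable[[str], str],
    check: Callable[[str], bool],
) -> ScenarioResult:
    """Run one test scenario, keeping LLM failures separate from bad kernels."""
    try:
        code = extract_code(generate(op_name))
    except LLMGenerationError as exc:
        # The model never gave us readable code: report it as its own category.
        return ScenarioResult(op_name, "llm_error", f"{type(exc).__name__}: {exc}")
    try:
        ok = check(code)
    except Exception as exc:  # compile/runtime errors in the generated kernel
        return ScenarioResult(op_name, "failed", f"{type(exc).__name__}: {exc}")
    return ScenarioResult(op_name, "passed" if ok else "failed")
```

The key design choice is catching `LLMGenerationError` before the correctness check runs: a rate limit or an unparseable response lands in its own bucket, so it never inflates the count of incorrect kernels, and the `detail` field records which kind of LLM failure hit which scenario.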