[Core] Metric `unintentional_worker_failures_total` is not accurate #1918

### What happened + What you expected to happen

We run Ray on Kubernetes using the `kuberay` project. We have a sanity test that runs a simple job via the job submission API. The workload succeeds; however, the metric `unintentional_worker_failures_total` is also incremented.

That metric should not be incremented for a successful run. Its definition reads _Number of worker failures that are not intentional_.

I asked about it on [the slack channel](https://ray-distributed.slack.com/archives/C01DLHZHRBJ/p1692915384186229) and was told to file an issue.

### Versions / Dependencies

Ray 2.6.1

### Reproduction script

```python
# workload.py

import ray


@ray.remote
def workload(val: str) -> str:
    return f"got {val}"


if __name__ == "__main__":
    ray.init()
    assert ray.get(workload.remote("foo")) == "got foo"
```


```python
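# submit.py

from ray.job_submission import JobSubmissionClient

# Placeholder address of the Ray head node's job server; adjust for your cluster.
ray_head_address = "http://127.0.0.1:8265"

client = JobSubmissionClient(ray_head_address)
# Assumes workload.py is available in the job's working directory on the cluster.
job_id = client.submit_job(entrypoint="python workload.py")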
```
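
To confirm the increment, the counter can be read before and after submitting the job. The helper below is a minimal sketch and not part of our actual test: it assumes the metric is exposed in Prometheus text format, matches the metric by name suffix (since the exported name may carry a prefix such as `ray_`), and takes the metrics endpoint URL as a command-line argument because that URL depends on the cluster setup. The file name `check_metric.py` and both function names are hypothetical.

```python
# check_metric.py (illustrative helper; names and endpoint are assumptions)

import sys
import urllib.request

METRIC = "unintentional_worker_failures_total"


def sum_metric(exposition_text: str, metric: str = METRIC) -> float:
    """Sum all samples whose metric name ends with `metric`."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # A sample looks like: name{label="v"} value [timestamp]
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if not name.endswith(metric):
            continue
        # The value is the first token after the label set (or after the name).
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        total += float(rest.split()[0])
    return total


def fetch_metric(url: str) -> float:
    """Fetch a Prometheus-format metrics page and sum the counter."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return sum_metric(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Usage: python check_metric.py <metrics endpoint URL>
    print(fetch_metric(sys.argv[1]))
```

Running it once before `submit.py` and once after the job finishes should print the same value if no worker failed unintentionally; in our case the second value is higher.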

### Issue Severity

Low: It annoys or frustrates me.
