Internal Server Error due to attempt record with start_time/end_time both unset #14768

### What happened?

In scenarios where we have recently killed a job manually (via `kill -TERM` on the worker VM or similar), clicking the job ID on the batch UI page, which should open that job's page, instead results in a `500 Internal Server Error`.

The server logs (see below) indicate that the problem is a record in the `attempts` database table with both `start_time` and `end_time` set to NULL. Looking in the database shows that the record in question is the one for the next, yet-to-be-started attempt, not the attempt that has just been killed (which in our observations has had at least `end_time` filled in).
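For illustration, here is a minimal standalone reproduction of the failure mode. The row contents (`attempt_id` values and timestamps) are made up, and `sort_like_front_end` is a hypothetical helper that only reuses the sort key shown in the traceback; it is not the actual `_get_attempts` code.

```python
# Hypothetical attempt rows; in the real service these come from the
# `attempts` table. Timestamps are invented integer values.
attempts = [
    {'attempt_id': 'killed', 'start_time': 1733352600000, 'end_time': 1733352660000},
    # The next, not-yet-started attempt: both fields are NULL (None).
    {'attempt_id': 'pending', 'start_time': None, 'end_time': None},
]


def sort_like_front_end(rows):
    # Same key expression as the line in the traceback: it evaluates to
    # None when both fields are None, and None cannot be ordered against int.
    return sorted(rows, key=lambda x: x['start_time'] or x['end_time'])


try:
    sort_like_front_end(attempts)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

The sort only fails when a row with both fields unset is compared against a row that has a timestamp, which would explain why the error shows up only after a new attempt record has been created.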

This could be addressed by ensuring that at least one of these fields is always non-NULL or, more likely, by making the `attempts.sort(…)` invocation more robust, e.g., via

```python
attempts.sort(key=lambda x: x['start_time'] or x['end_time'] or MAXINT)
```

(where `MAXINT` is a suitable value to make these entries sort last)
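As a sketch of one way to get the same effect without choosing a sentinel value, the key can be a tuple whose first element flags missing timestamps. `attempt_sort_key` is a hypothetical helper name and the rows are invented sample data; this is not a tested patch against `front_end.py`.

```python
def attempt_sort_key(attempt):
    """Order attempts by start_time, falling back to end_time.

    Attempts with neither timestamp sort last. This is an illustrative
    helper, not part of the existing code base.
    """
    start = attempt.get('start_time')
    end = attempt.get('end_time')
    timestamp = start if start is not None else end
    # Tuples compare element by element: (False, t) always sorts before
    # (True, 0), so None is never compared against an int.
    return (timestamp is None, timestamp if timestamp is not None else 0)


# Invented sample rows covering the three cases seen in this issue.
attempts = [
    {'attempt_id': 'pending', 'start_time': None, 'end_time': None},
    {'attempt_id': 'killed', 'start_time': None, 'end_time': 200},
    {'attempt_id': 'first', 'start_time': 100, 'end_time': 150},
]

attempts.sort(key=attempt_sort_key)
ordered_ids = [a['attempt_id'] for a in attempts]
print(ordered_ids)
```

Compared with `or MAXINT`, the tuple key avoids picking a magic number and also treats a timestamp of `0` as a real value rather than as missing.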

### Version

0.2.133

### Relevant log output

```shell
{"severity":"ERROR","levelname":"ERROR","asctime":"2024-12-04 22:51:26,136","filename":"web_protocol.py","funcNameAndLine":"log_exception:421","message":"Error handling request","exc_info":"Traceback (most recent call last):
  File \"/usr/local/lib/python3.9/dist-packages/aiohttp/web_protocol.py\", line 452, in _handle_request
    resp = await request_handler(request)
  File \"/usr/local/lib/python3.9/dist-packages/aiohttp/web_app.py\", line 543, in _handle
    resp = await handler(request)
  File \"/usr/local/lib/python3.9/dist-packages/aiohttp/web_middlewares.py\", line 114, in impl
    return await handler(request)
  File \"/usr/local/lib/python3.9/dist-packages/gear/csrf.py\", line 27, in check_csrf_token
    return await handler(request)
  File \"/usr/local/lib/python3.9/dist-packages/batch/utils.py\", line 19, in unavailable_if_frozen
    return await handler(request)
  File \"/usr/local/lib/python3.9/dist-packages/gear/metrics.py\", line 28, in monitor_endpoints_middleware
    response = await prom_async_time(REQUEST_TIME.labels(endpoint=endpoint, verb=verb), handler(request))  # type: ignore
  File \"/usr/local/lib/python3.9/dist-packages/prometheus_async/aio/_decorators.py\", line 55, in measure
    rv = await future
  File \"/usr/local/lib/python3.9/dist-packages/aiohttp_session/__init__.py\", line 199, in factory
    response = await handler(request)
  File \"/usr/local/lib/python3.9/dist-packages/gear/auth.py\", line 68, in wrapped
    return await fun(request, userdata)
  File \"/usr/local/lib/python3.9/dist-packages/batch/front_end/front_end.py\", line 202, in wrapped
    return await fun(request, userdata, batch_id)
  File \"/usr/local/lib/python3.9/dist-packages/batch/front_end/front_end.py\", line 163, in wrapped
    return await fun(request, userdata, *args, **kwargs)
  File \"/usr/local/lib/python3.9/dist-packages/batch/front_end/front_end.py\", line 2940, in ui_get_job
    job, attempts, job_log_bytes, resource_usage = await asyncio.gather(
  File \"/usr/local/lib/python3.9/dist-packages/batch/front_end/front_end.py\", line 2640, in _get_attempts
    attempts.sort(key=lambda x: x['start_time'] or x['end_time'])
TypeError: '<' not supported between instances of 'NoneType' and 'int'","hail_log":1}
```

