[target-allocator] Targets remain assigned to terminating pod until restart is complete #1048
Description
While load testing a StatefulSet of 2 collector pods, I noticed some unexpected behavior. The goal of the test was to see what happens when one pod in the pool of 2 dies. I used an avalanche pod to generate a large number of metrics on a single target (this ensures that only one of the collector pods experiences the load spike).
e.g.
```
% k get po
NAME                                                          READY   STATUS             RESTARTS        AGE
curl-moh                                                      1/1     Running            0               144m
lightstep-collector-collector-0                               0/1     CrashLoopBackOff   6 (4m17s ago)   52m
lightstep-collector-collector-1                               1/1     Running            2 (22m ago)     36m
lightstep-collector-targetallocator-b6865b5bb-p4rnt           1/1     Running            0               20s
opentelemetry-operator-controller-manager-7f7bcf896d-wjgmd    2/2     Running            4 (149m ago)    2d11h
```
and
```
$ k describe po lightstep-collector-collector-0
...
    State:          Running
      Started:      Sun, 31 Jul 2022 23:32:33 -0700
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sun, 31 Jul 2022 23:27:31 -0700
      Finished:     Sun, 31 Jul 2022 23:32:32 -0700
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     50m
      memory:  512Mi
    Requests:
      cpu:     50m
      memory:  512Mi
```
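For context on why the spike lands on a single pod: each scrape target is handed to exactly one collector, so a single very heavy target can only ever load one pod. Below is a minimal sketch of that idea in Go; the types, the FNV-hash-modulo assignment, and the names are my own illustrative assumptions, not the target allocator's actual allocation strategy.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Illustrative types only; the real target allocator has its own
// collector and target representations.
type target struct {
	JobName string
	URL     string
}

// assign maps every target to exactly one collector by hashing the
// target and taking the hash modulo the number of collectors. The
// real allocator uses its own strategy, but the key property is the
// same: one target -> one collector.
func assign(targets []target, collectors []string) map[string][]target {
	out := make(map[string][]target, len(collectors))
	for _, t := range targets {
		h := fnv.New32a()
		h.Write([]byte(t.JobName + t.URL))
		owner := collectors[h.Sum32()%uint32(len(collectors))]
		out[owner] = append(out[owner], t)
	}
	return out
}

func main() {
	collectors := []string{
		"lightstep-collector-collector-0",
		"lightstep-collector-collector-1",
	}
	targets := []target{
		{JobName: "avalanche", URL: "http://avalanche:9001/metrics"}, // the one heavy target
		{JobName: "kubelet", URL: "http://node-a:10250/metrics"},
	}
	for c, ts := range assign(targets, collectors) {
		fmt.Println(c, "->", ts)
	}
}
```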
What I expected:
I expected the load-test targets to be reassigned to the healthy pod while the other pod was stuck in the CrashLoopBackOff state after being OOMKilled.
What actually happened:
The target allocator does not reassign targets until the killed pod has restarted. This isn't really a problem when the restart happens quickly, but occasionally the killed pod stays down for several minutes (as in the `k get po` output above), and the targets remain assigned to it until it comes back up. I would have expected the targets to be reassigned to the healthy pod immediately, to reduce the chance of dropping metrics for several minutes.
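Roughly, the behavior I expected is that the allocator reacts as soon as a collector pod stops being Ready (not only when it disappears or comes back), drops it from the allocation set, and redistributes its targets to the remaining healthy collectors. Here is a minimal sketch of that idea; the allocator type, the markUnhealthy function, and the round-robin redistribution are hypothetical and are not the target allocator's actual code.

```go
package main

import "fmt"

type target struct{ URL string }

// allocator keeps the current set of collectors and their target
// assignments; all names here are hypothetical.
type allocator struct {
	assignments map[string][]target // collector pod name -> targets
}

// markUnhealthy is what I expected to happen as soon as a pod stops
// being Ready (e.g. CrashLoopBackOff after an OOMKill): its targets
// are moved to the remaining healthy collectors immediately instead
// of waiting for the pod to restart.
func (a *allocator) markUnhealthy(name string) {
	orphaned := a.assignments[name]
	delete(a.assignments, name)

	healthy := make([]string, 0, len(a.assignments))
	for c := range a.assignments {
		healthy = append(healthy, c)
	}
	if len(healthy) == 0 {
		return // nobody left to reassign to
	}
	// Spread the orphaned targets round-robin over the healthy pods.
	for i, t := range orphaned {
		owner := healthy[i%len(healthy)]
		a.assignments[owner] = append(a.assignments[owner], t)
	}
}

func main() {
	a := &allocator{assignments: map[string][]target{
		"lightstep-collector-collector-0": {{URL: "http://avalanche:9001/metrics"}},
		"lightstep-collector-collector-1": {{URL: "http://node-a:10250/metrics"}},
	}}
	// collector-0 goes into CrashLoopBackOff: reassign its targets right away.
	a.markUnhealthy("lightstep-collector-collector-0")
	fmt.Println(a.assignments)
}
```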