Reconciling TargetGroup Removes ECS targets causing downtime #2372

**Describe the bug**
I'm using Flux for GitOps, with the ACK ELBv2 controller to create the load balancer, target group, and listeners, and the ACK ECS controller for the task definition and service.

When the target group is reconciled, the controller removes the targets that ECS registered via `targetGroupRef`. This leaves the target group with 0 targets, so the service goes down. It eventually comes back online.


**Steps to reproduce**
The target group is created, for example:
```yaml
apiVersion: elbv2.services.k8s.aws/v1alpha1
kind: TargetGroup
metadata:
  name: foo-bar-tg-staging
spec:
  name: foo-bar-tg-staging
  healthCheckEnabled: true
  healthCheckIntervalSeconds: 30
  healthCheckPath: /
  healthCheckPort: traffic-port
  healthCheckProtocol: HTTP
  healthCheckTimeoutSeconds: 5
  healthyThresholdCount: 5
  ipAddressType: ipv4
  matcher:
    httpCode: "200"
  port: 8080
  protocol: HTTP
  protocolVersion: HTTP1
  targetType: ip
  unhealthyThresholdCount: 2
  vpcID: xxxxx
```
It is created and works fine. It is referenced in the ECS service using `targetGroupRef`:
```yaml
apiVersion: ecs.services.k8s.aws/v1alpha1
kind: Service
metadata:
  name: foo-bar
spec:
  name: foo-bar
  capacityProviderStrategy:
  - base: 0
    capacityProvider: FARGATE
    weight: 1
  cluster: staging
  deploymentConfiguration:
    alarms:
      alarmNames:
      - none
      enable: false
      rollback: false
    deploymentCircuitBreaker:
      enable: true
      rollback: true
    maximumPercent: 200
    minimumHealthyPercent: 100
  deploymentController:
    type: ECS
  desiredCount: 1
  enableECSManagedTags: true
  enableExecuteCommand: false
  healthCheckGracePeriodSeconds: 0
  loadBalancers:
  - containerName: foo-bar
    containerPort: 8080
    targetGroupRef:
      from:
        name: foo-bar-tg-staging
  networkConfiguration:
    awsVPCConfiguration:
      assignPublicIP: DISABLED
      securityGroups:
      - sg-xxxxx
      subnets:
      - sg-xxxxxx
      - sg-xxxxxx
  platformVersion: 1.4.0
  propagateTags: NONE
  schedulingStrategy: REPLICA
  taskDefinitionRef:
    from:
      name: foo-bar-staging
```
When the ELBv2 controller reconciles the target group, it removes the targets completely, causing downtime. About 15 minutes later ECS re-adds them and the service is back online. This happens daily, because the reconciler is set to run every 10 hours.

I do see an ECS event noting `task remained in deregistered state for too long`.

Log from the ELBv2 controller:
```json
2025-03-11T10:45:53.047941104Z stderr F 
{"level":"info",
"ts":"2025-03-11T10:45:53.047Z",
"logger":"ackrt",
"msg":"desired resource state has changed",
"kind":"TargetGroup",
"namespace":"foo-bar",
"name":"foo-bar-tg-staging",
"account":"xxxx",
"role":"",
"region":"us-west-2",
"is_adopted":false,"generation":1,"diff":[{"Path":{"Parts":["Spec",
"Targets"]},"A":null,"B":[{"availabilityZone":"us-west-2a",
"id":"xx.x.xx.xx","port":8080}]}]
}
```
The metrics show the targets going unhealthy at the same time as that log entry, when the controller reconciles.

![Image](https://github.com/user-attachments/assets/a3781d55-7cf5-4c89-8710-4e5b01070783)
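
To make the diff in the controller log easier to read, here is a small sketch that parses the log entry and prints each differing path. The JSON is copied from the log above with the timestamp prefix dropped; `summarize_diff` is a hypothetical helper written for this issue, not part of ACK. It also assumes that `A` is the desired value from the manifest and `B` is the value observed in AWS, which I have not confirmed against the controller source.

```python
import json

# Log entry copied from the controller output above (timestamp prefix dropped).
LOG_ENTRY = """
{"level":"info","ts":"2025-03-11T10:45:53.047Z","logger":"ackrt",
 "msg":"desired resource state has changed","kind":"TargetGroup",
 "namespace":"foo-bar","name":"foo-bar-tg-staging","account":"xxxx",
 "role":"","region":"us-west-2","is_adopted":false,"generation":1,
 "diff":[{"Path":{"Parts":["Spec","Targets"]},"A":null,
          "B":[{"availabilityZone":"us-west-2a","id":"xx.x.xx.xx","port":8080}]}]}
"""


def summarize_diff(raw):
    """Return one (path, a, b) tuple per entry in the log's "diff" list."""
    entry = json.loads(raw)
    return [(".".join(d["Path"]["Parts"]), d["A"], d["B"]) for d in entry["diff"]]


for path, a, b in summarize_diff(LOG_ENTRY):
    # Assumption: A is the desired (manifest) value, B is what exists in AWS.
    print(f"{path}: A={a!r} B={b!r}")
```

If that reading of `A` and `B` is right, the only difference is `Spec.Targets`: the manifest sets no targets, while the target group in AWS holds the one ECS registered. That would explain why reconciling deregisters it.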

**Expected outcome**
The target group should keep its targets when they are registered by ECS via `targetGroupRef`.

**Environment**

* Using EKS: Yes v1.29.13-eks-8cce635
* AWS service targeted: ECS, ELBV2

