Job Stuck in Queue stage #1957

Open
@mcarajatchawla

Bug description 🐞

We are doing a POC for Terrakube and trying to evaluate whether it is a good option for our use case. We installed version 2.25.0 using the Helm chart, and we are using AWS dynamic credentials. Everything works fine and we are able to run some sample jobs. However, some jobs randomly get stuck in the queue stage without any error in the UI, and there is no option to terminate the job from the UI either. I see the error below in the API pod for that job run, so it looks like an issue with an API call. We have to go to the database and manually mark the job as failed, and only then can we proceed with the next jobs. This happens roughly once every 10-15 job runs, and I am not able to pinpoint the reason.

[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - Generating AWS dynamic credentials files inside the workspace execution
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - Writing AWS credentials to /home/cnb/.terraform-spring-boot/executor/d9b58bd3-f3fc-4056-a026-1163297e80a8/03a581cb-9f80-46c0-ba77-48c27abaa0bb/terrakube_config_dynamic_credentials_aws.txt
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.status.UpdateJobStatusImpl - Step list is not empty...
[threadPoolTaskExecutor-1] ERROR org.springframework.aop.interceptor.SimpleAsyncUncaughtExceptionHandler - Unexpected exception occurred invoking async method: public void org.terrakube.executor.service.executor.ExecutorJobImpl.createJob(org.terrakube.executor.service.mode.TerraformJob)
feign.FeignException$FeignClientException: [423 ] during [PATCH] to [http://terrakube-api-service:8080/api/v1/organization/d9b58bd3-f3fc-4056-a026-1163297e80a8/job/60] [TerrakubeClient#updateJob(JobRequest,String,String)]: [{"errors":[{"detail":"ERROR: null value in column "job_id" of relation "step" violates not-null constraint\n  Detail: Failing row contains (436c9b25-5317-4cc7-af78-0b54a2dc5480, 150, null, pending, null, Approve Plan from Terraform CLI, null)."}]}]
	at feign.FeignException.clientErrorStatus(FeignException.java:244)
	at feign.FeignException.errorStatus(FeignException.java:203)
	at feign.FeignException.errorStatus(FeignException.java:194)
	at feign.codec.ErrorDecoder$Default.decode(ErrorDecoder.java:103)
	at feign.InvocationContext.decodeError(InvocationContext.java:126)
	at feign.InvocationContext.proceed(InvocationContext.java:72)
	at feign.ResponseHandler.handleResponse(ResponseHandler.java:63)
	at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:114)
	at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:70)
	at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:99)
	at jdk.proxy2/jdk.proxy2.$Proxy93.updateJob(Unknown Source)
	at org.terrakube.executor.service.status.UpdateJobStatusImpl.setRunningStatus(UpdateJobStatusImpl.java:54)
	at org.terrakube.executor.service.executor.ExecutorJobImpl.createJob(ExecutorJobImpl.java:47)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:355)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:196)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:113)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Steps to reproduce

Run the Terraform configuration below with the CLI multiple times.

terraform {
  backend "remote" {
    organization = "simple"
    hostname = "xxxxx"
    workspaces {
      name = "xxxx"
    }
  }
}
provider "aws" {
  assume_role {
    role_arn = "arn:aws:iam::xxxxxx:role/terrakube-cross-account"
  }
}
# provider "aws" {
#   alias               = "dev"
#   assume_role {
#     role_arn = "arn:aws:iam::xxxxxx:role/terrakube-cross-account"
#   }
# }
resource "aws_s3_bucket" "example" {
  bucket = "my-tf-xxxx-awerqerqwcc-xxxxxx"
  #provider = aws.dev
  tags = {
    Name        = "My bucket"
    Environment = "Dev"
  }
}
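
Because the failure only shows up about once in every 10-15 runs, a small loop can make the reproduction less tedious. The following is a minimal sketch, not part of the original report: the helper name `run_repeatedly`, the run count, and the timeout are illustrative assumptions, and it presumes `terraform` is on the `PATH` and the directory has already been initialized with `terraform init`. A run that hangs (for example, because the job is stuck in the queue) is recorded as `None` once the timeout expires.

```python
import subprocess
from typing import List, Optional, Sequence


def run_repeatedly(
    cmd: Sequence[str],
    runs: int,
    timeout_s: Optional[float] = None,
) -> List[Optional[int]]:
    """Run `cmd` `runs` times in sequence.

    Returns one entry per run: the process exit code, or None if the
    run did not finish within `timeout_s` seconds (the child process
    is killed in that case).
    """
    results: List[Optional[int]] = []
    for _ in range(runs):
        try:
            proc = subprocess.run(list(cmd), timeout=timeout_s)
            results.append(proc.returncode)
        except subprocess.TimeoutExpired:
            # Treat a hung CLI run as a candidate "stuck in queue" case.
            results.append(None)
    return results


# Hypothetical usage (command, count, and timeout are assumptions):
#   codes = run_repeatedly(["terraform", "plan", "-input=false"],
#                          runs=20, timeout_s=900)
#   stuck = [i for i, c in enumerate(codes) if c is None]
#   print("runs that timed out:", stuck)
```

A timed-out entry only indicates that the CLI did not return in time; whether the job is actually stuck in the queue stage still has to be confirmed in the UI and the pod logs.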

Expected behavior

The job should either run or report an error, but instead it gets stuck in the queue stage.

Example repository

No response

Anything else?

No response

Labels: bug (Something isn't working)
