Initial TaskRun Reconcile Refactor #1866
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Use standard controller-runtime fake client for unit tests. Signed-off-by: Adam Kaplan <adam.kaplan@redhat.com>
Force-pushed from 6ce0e33 to 8e1f2c5
Initial refactor to use a factory pattern when generating the Tekton TaskRun for a Shipwright BuildRun. This starts us down the path of using other k8s objects besides TaskRuns to execute the build process. Other possibilities include PipelineRuns, Argo Workflows, and even standard k8s objects (Pod, Job).

Key logic moved out of the main BuildRun reconciler and `resources` package:

- Cancellation of underlying runner (TaskRun)
- Syncing of status to the BuildRun object
- Sync of status from TaskRun results.
- Sync of status from TaskRun conditions.
- Sync of status from failure messages recorded in termination logs.
- Report timestamps of started, completed, and other lifecycle events:
  - Pod creation time
  - Pod init containers finished
  - Execution start time
- Migrate unit tests to use controller-runtime fake client
- Move status condition update calls back to the main reconciler.

Further abstractions are likely needed to ensure we properly generate the execution objects for other APIs.

Signed-off-by: Adam Kaplan <adam.kaplan@redhat.com>
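For orientation, a minimal sketch of what such a runner abstraction could look like; the interface name and method set below are illustrative assumptions, not the exact API introduced in this PR:

```go
// Sketch only: names and method set are assumptions, not this PR's actual API.
package runner

import (
	"context"

	buildv1beta1 "github.com/shipwright-io/build/pkg/apis/build/v1beta1"
)

// BuildRunner abstracts the object that executes a BuildRun - today a Tekton
// TaskRun, potentially a PipelineRun, Job, or other workload in the future.
type BuildRunner interface {
	// Create generates and creates the underlying execution object.
	Create(ctx context.Context, buildRun *buildv1beta1.BuildRun) error
	// Cancel cancels the underlying execution object.
	Cancel(ctx context.Context, buildRun *buildv1beta1.BuildRun) error
	// SyncStatus copies results, conditions, failure messages, and lifecycle
	// timestamps from the execution object onto the BuildRun status.
	SyncStatus(ctx context.Context, buildRun *buildv1beta1.BuildRun) error
}
```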
Force-pushed from 8e1f2c5 to eca85d0
// taskrun pod ramp-up (time between pod creation and last init container completion)
buildmetrics.TaskRunPodRampUpDurationObserve(
Is this something we can advocate for in the Tekton community? This exposes metrics that are specific to an underlying implementation, making it difficult to use a different API for reconciling BuildRuns.
You are changing the behavior here. So far we measured the duration from the time we created the object that does the work (the TaskRun) up to the point where we knew it would start performing the strategy steps (end of the TaskRun's init containers).
Now you use pod creation as the start, which completely removes the Tekton controller and the time it takes to create a Pod from a TaskRun from the measurement.
If we decide to use different objects to run the BuildRun and still want to provide such a metric, then we would certainly have to rename it. But if we want to keep something like this, then we should not change its meaning. You could add a CalculateRampUpDuration function on the BuildRunner.
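A rough sketch of that suggestion, assuming a TaskRun-backed runner; the receiver type and the plumbing around it are placeholders, not code from this PR:

```go
// Sketch only: receiver type and plumbing are placeholders.
package runner

import (
	"time"

	pipelineapi "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// taskRunBuildRunner stands in for the runner struct introduced in this PR.
type taskRunBuildRunner struct{}

// CalculateRampUpDuration keeps the existing semantics: the duration from
// TaskRun creation until the last init container finished, i.e. the point at
// which the strategy steps could start. It returns nil while that data is not
// yet available.
func (t *taskRunBuildRunner) CalculateRampUpDuration(taskRun *pipelineapi.TaskRun, pod *corev1.Pod) *time.Duration {
	if taskRun == nil || pod == nil {
		return nil
	}
	var lastInitFinished metav1.Time
	for _, status := range pod.Status.InitContainerStatuses {
		terminated := status.State.Terminated
		if terminated != nil && lastInitFinished.Before(&terminated.FinishedAt) {
			lastInitFinished = terminated.FinishedAt
		}
	}
	if lastInitFinished.IsZero() {
		return nil
	}
	d := lastInitFinished.Sub(taskRun.CreationTimestamp.Time)
	return &d
}
```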
// taskrun ramp-up duration (time between taskrun creation and taskrun pod creation)
buildmetrics.TaskRunRampUpDurationObserve(
Similar issue - this metric is tied to an underlying implementation. In theory a runner's API does not need to create Kubernetes pods; it just needs to represent the execution of a workload. For example, a KubeVirt VirtualMachineInstance: https://kubevirt.io/user-guide/user_workloads/lifecycle/
	generatedTaskRun *pipelineapi.TaskRun
)

generatedTaskRun, err := resources.GenerateTaskRun(t.config, build, buildRun, serviceAccount.Name, strategy)
For now, I'm leaving the existing logic for creating TaskRuns in place. I plan on breaking this logic up and moving it to the TaskRunBuildRunner struct (or something similar). This PR is big enough as it is.
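For reference, a bare-bones illustration of where that could land; the struct and its fields are placeholders, not what this PR defines:

```go
// Sketch only: a placeholder for the proposed follow-up refactor.
package runner

import (
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// TaskRunBuildRunner would own the TaskRun generation logic once it is broken
// out of the resources package. The fields are guesses at what it would need
// (the diff above also suggests a controller configuration, t.config).
type TaskRunBuildRunner struct {
	client client.Client
	scheme *runtime.Scheme
}
```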
// Set OwnerReference for BuildRun and TaskRun
if err := controllerutil.SetOwnerReference(buildRun, generatedTaskRun, t.scheme); err != nil {
	return nil, err
}
Leaving this here for now, but I may move the setting of owner references to the main Reconcile function in a subsequent PR.
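If that move happens, a sketch of what the call site in the reconciler could look like (the helper and its parameters are assumptions, not code from this PR):

```go
// Sketch only: helper and parameter names are assumptions about the reconciler.
package buildrun

import (
	"context"

	pipelineapi "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	buildv1beta1 "github.com/shipwright-io/build/pkg/apis/build/v1beta1"
)

// createTaskRunWithOwner sets the BuildRun as owner in the reconciler before
// creating the TaskRun, so garbage collection and watches keep working.
func createTaskRunWithOwner(ctx context.Context, c client.Client, scheme *runtime.Scheme,
	buildRun *buildv1beta1.BuildRun, taskRun *pipelineapi.TaskRun) error {
	if err := controllerutil.SetOwnerReference(buildRun, taskRun, scheme); err != nil {
		return err
	}
	return c.Create(ctx, taskRun)
}
```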
/assign @SaschaSchwarze0
	return e.Message
}

func (e *BuildRunnerValidationError) IsTerminal() bool {
godoc please - I'm currently doing a first pass and cannot imagine what this is for, which is a good indicator that it requires docs.
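For example, something along these lines; the struct layout below is a guess, the godoc wording is the actual suggestion:

```go
// Sketch only: field layout is a guess; the doc comments are the point.
package runner

// BuildRunnerValidationError describes a validation problem found while
// preparing a BuildRun for execution.
type BuildRunnerValidationError struct {
	Message  string
	Terminal bool
}

// Error returns the human-readable validation message.
func (e *BuildRunnerValidationError) Error() string {
	return e.Message
}

// IsTerminal reports whether the validation failure is unrecoverable, meaning
// the BuildRun should be marked failed rather than requeued for another
// reconcile attempt.
func (e *BuildRunnerValidationError) IsTerminal() bool {
	return e.Terminal
}
```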
@@ -0,0 +1,491 @@
package runner
I wonder how the `validate` check is green given this lacks a copyright header.
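For comparison, the header the other Go files in this repo carry (assuming the usual Shipwright SPDX form):

```go
// Copyright The Shipwright Contributors
//
// SPDX-License-Identifier: Apache-2.0

package runner
```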
buildSpec := buildrun.Status.BuildSpec

if buildSpec.Source == nil {
	return
}
This is not accurate. status.buildSpec is a copy of the Build spec. But the real source may be the BuildRun source. For example, in the case of a local source, this causes an overwrite.
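As an illustration of the concern; the field names below are assumptions about the v1beta1 API and should be verified, the point is only that a BuildRun-level source override must be considered before status.buildSpec is consulted:

```go
// Sketch only: field names are assumptions about the v1beta1 API.
package runner

import (
	buildv1beta1 "github.com/shipwright-io/build/pkg/apis/build/v1beta1"
)

// sourceOverriddenByBuildRun reports whether the BuildRun carries its own
// source (e.g. a local source), in which case the Build spec copied into
// status.buildSpec must not be used to derive source details.
func sourceOverriddenByBuildRun(buildRun *buildv1beta1.BuildRun) bool {
	return buildRun.Spec.Source != nil
}
```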
buildRun.Status.Output.Digest = result.Value.StringVal

case generateOutputResultName(imageSizeResult):
	// TODO: Throw error that should not be retried?
I suggest logging a warning if the conversion here fails. It usually means that the build strategy is incorrect in the case of a strategy-managed push.
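Something like the following could cover it; the logger plumbing and helper name are assumptions, only the warn-and-skip behavior is the suggestion:

```go
// Sketch only: logger plumbing and helper name are assumptions.
package runner

import (
	"strconv"

	"github.com/go-logr/logr"
)

// parseImageSizeResult parses the image-size result from a strategy-managed
// push. When the value cannot be parsed, it logs a warning and reports false
// instead of failing or silently dropping the result.
func parseImageSizeResult(log logr.Logger, raw string) (int64, bool) {
	size, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		log.Info("ignoring invalid image size result; the build strategy likely emits a malformed value",
			"value", raw, "error", err.Error())
		return 0, false
	}
	return size, true
}
```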
}

case pipelineapi.TaskRunReasonTimedOut:
	reason = "BuildRunTimeout"
We have no constant for this?
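A named constant would also make the reason reusable in tests; the name and its home package below are only suggestions:

```go
// Sketch only: constant name and location are suggestions.
package runner

// BuildRunStateTimeout is the condition reason set on the BuildRun when the
// underlying TaskRun timed out.
const BuildRunStateTimeout = "BuildRunTimeout"
```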
Changes
This is a refactor of how we reconcile TaskRun objects into BuildRun objects. The intent is to create useful abstractions of the TaskRun's lifecycle so that we can extend these principles to other Kubernetes object types.
/kind cleanup
Submitter Checklist
See the contributor guide for details on coding conventions, GitHub and Prow interactions, and the code review process.
Release Notes