Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(controller)!: Updates the resource duration calculation. Fixes #2934 #2937

Merged
merged 13 commits into from
May 12, 2020
9 changes: 0 additions & 9 deletions config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,9 +56,6 @@ type Config struct {
// MetricsConfig specifies configuration for metrics emission
MetricsConfig PrometheusConfig `json:"metricsConfig,omitempty"`

// FeatureFlags for general/experimental features
FeatureFlags FeatureFlags `json:"featureFlags,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we getting rid of feature flags entirely? Wouldn't we want to keep this for future features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is bad design. You can use normal config item or a envvar? Both are easier and more flexible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we enabling resource duration by default?


// TelemetryConfig specifies configuration for telemetry emission
TelemetryConfig PrometheusConfig `json:"telemetryConfig,omitempty"`

Expand Down Expand Up @@ -87,12 +84,6 @@ type PodSpecLogStrategy struct {
AllPods bool `json:"allPods,omitempty"`
}

// More general feature flags.
type FeatureFlags struct {
// ResourcesDuration.
ResourcesDuration bool `json:"resourcesDuration,omitempty"`
}

// KubeConfig is used for wait & init sidecar containers to communicate with a k8s apiserver by a outofcluster method,
// it is used when the workflow controller is in a different cluster with the workflow workloads
type KubeConfig struct {
Expand Down
18 changes: 7 additions & 11 deletions docs/resource-duration.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,6 @@
Argo Workflows provides an indication of how much resource your workflow has used and saves this
information. This is intended to be an **indicative but not accurate** value.

## Configuration

This is only set if [the feature flags is enabled in config](workflow-controller-configmap.yaml).

## Calculation

The calculation is always an estimate, and is calculated by [../util/resource/duration.go](../util/resource/duration.go)
Expand All @@ -19,12 +15,12 @@ defaults.

Each indicator is divided by a common denominator depending on resource type.

### Base amounts
### Base Amounts

Each resource type has a "base amount" used to normalize usage across containers.
Each resource type has a denominator used to make large values smaller.

* CPU: `1000m`
* Memory: `1Gi`
* CPU: `1`
* Memory: `100Mi`
* Storage: `10Gi`
* Ephemeral Storage: `10Gi`
* All others: `1`
Expand All @@ -35,7 +31,7 @@ the container's Resource Duration.
For example, if you've requested `100Mi` of memory (one tenth of the base amount), and the container
runs 120sec, then the reported Resource Duration will be `12sec * (1Gi memory)`.

### Request defaults
### Request Defaults

If `requests` are not set for a container, Kubernetes defaults to `limits`. If `limits` are not set,
Argo falls back to `100m` for CPU and `100Mi` for memory.
Expand All @@ -49,8 +45,8 @@ A pod that runs for 3min, with a CPU limit of `2000m`, no memory request and an
resource limit of `1`:

```
CPU: 3min * 2000m / 1000m = 6min * (1000m cpu)
Memory: 3min * 100Mi / 1Gi = 18sec * (1Gi memory)
CPU: 3min * 2000m / 1000m = 6min * (1 cpu)
Memory: 3min * 100Mi / 1Gi = 18sec * (100Mi memory)
GPU: 3min * 1 / 1 = 2min * (1 nvidia.com/gpu)
```

Expand Down
4 changes: 0 additions & 4 deletions docs/workflow-controller-configmap.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -39,10 +39,6 @@ data:
scope: pod
url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}

featureFlags:
# Enable resource duration
resourcesDuration: false

# artifactRepository defines the default location to be used as the artifact repository for
# container artifacts.
artifactRepository:
Expand Down
2 changes: 0 additions & 2 deletions manifests/quick-start-mysql.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -353,8 +353,6 @@ data:
secretKeySecret:
name: my-minio-cred
key: secretkey
featureFlags: |
resourcesDuration: true
links: |
- name: Example Workflow Link
scope: workflow
Expand Down
2 changes: 0 additions & 2 deletions manifests/quick-start-no-db.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -353,8 +353,6 @@ data:
secretKeySecret:
name: my-minio-cred
key: secretkey
featureFlags: |
resourcesDuration: true
links: |
- name: Example Workflow Link
scope: workflow
Expand Down
2 changes: 0 additions & 2 deletions manifests/quick-start-postgres.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -353,8 +353,6 @@ data:
secretKeySecret:
name: my-minio-cred
key: secretkey
featureFlags: |
resourcesDuration: true
links: |
- name: Example Workflow Link
scope: workflow
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,6 @@ data:
secretKeySecret:
name: my-minio-cred
key: secretkey
featureFlags: |
resourcesDuration: true
metricsConfig: |
disableLegacy: true
enabled: true
Expand Down
15 changes: 14 additions & 1 deletion pkg/apis/workflow/v1alpha1/workflow_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ import (

apiv1 "k8s.io/api/core/v1"
policyv1beta "k8s.io/api/policy/v1beta1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

Expand Down Expand Up @@ -989,7 +990,7 @@ func (in ResourcesDuration) Add(o ResourcesDuration) ResourcesDuration {
func (in ResourcesDuration) String() string {
var parts []string
for n, d := range in {
parts = append(parts, fmt.Sprintf("%v*%s", d, n))
parts = append(parts, fmt.Sprintf("%v*(%s %s)", d, ResourceQuantityDenominator(n).String(), n))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks way better in my opinion

}
return strings.Join(parts, ",")
}
Expand All @@ -998,6 +999,18 @@ func (in ResourcesDuration) IsZero() bool {
return len(in) == 0
}

func ResourceQuantityDenominator(r apiv1.ResourceName) *resource.Quantity {
q, ok := map[apiv1.ResourceName]resource.Quantity{
apiv1.ResourceMemory: resource.MustParse("100Mi"),
apiv1.ResourceStorage: resource.MustParse("10Gi"),
apiv1.ResourceEphemeralStorage: resource.MustParse("10Gi"),
}[r]
if !ok {
q = resource.MustParse("1")
}
return &q
}

type WorkflowConditions []WorkflowCondition

func (wc *WorkflowConditions) UpsertCondition(condition WorkflowCondition) {
Expand Down
14 changes: 12 additions & 2 deletions pkg/apis/workflow/v1alpha1/workflow_types_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,11 +37,21 @@ func TestNodes_Any(t *testing.T) {
func TestResourcesDuration(t *testing.T) {
t.Run("String", func(t *testing.T) {
assert.Equal(t, ResourcesDuration{}.String(), "")
assert.Equal(t, ResourcesDuration{corev1.ResourceMemory: NewResourceDuration(1 * time.Second)}.String(), "1s*memory")
assert.Equal(t, ResourcesDuration{corev1.ResourceMemory: NewResourceDuration(1 * time.Second)}.String(), "1s*(100Mi memory)")
})
t.Run("Add", func(t *testing.T) {
assert.Equal(t, ResourcesDuration{}.Add(ResourcesDuration{}).String(), "")
assert.Equal(t, ResourcesDuration{corev1.ResourceMemory: NewResourceDuration(1 * time.Second)}.Add(ResourcesDuration{corev1.ResourceMemory: NewResourceDuration(1 * time.Second)}).String(), "2s*memory")
assert.Equal(t, ResourcesDuration{corev1.ResourceMemory: NewResourceDuration(1 * time.Second)}.
Add(ResourcesDuration{corev1.ResourceMemory: NewResourceDuration(1 * time.Second)}).
String(), "2s*(100Mi memory)")
})
t.Run("CPUAndMemory", func(t *testing.T) {
assert.Equal(t, ResourcesDuration{}.Add(ResourcesDuration{}).String(), "")
s := ResourcesDuration{corev1.ResourceCPU: NewResourceDuration(2 * time.Second)}.
Add(ResourcesDuration{corev1.ResourceMemory: NewResourceDuration(1 * time.Second)}).
String()
assert.Contains(t, s, "1s*(100Mi memory)")
assert.Contains(t, s, "2s*(1 cpu)")
})
}

Expand Down
13 changes: 13 additions & 0 deletions ui/src/app/shared/duration.ts
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,19 @@ export function formatDuration(seconds: number) {
return formattedDuration;
}

export function denominator(resource: string) {
switch (resource) {
case 'memory':
return '100Mi';
case 'storage':
return '10Gi';
case 'ephemeral-storage':
return '10Gi';
default:
return '1';
}
}

export function wfDuration(status: models.WorkflowStatus) {
return ((status.finishedAt ? new Date(status.finishedAt) : new Date()).getTime() - new Date(status.startedAt).getTime()) / 1000;
}
4 changes: 2 additions & 2 deletions ui/src/app/shared/resources-duration.tsx
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
import * as React from 'react';
import {formatDuration} from './duration';
import {denominator, formatDuration} from './duration';

interface Props {
resourcesDuration: {[resource: string]: number};
Expand All @@ -11,7 +11,7 @@ export class ResourcesDuration extends React.Component<Props> {
<>
{this.props.resourcesDuration &&
Object.entries(this.props.resourcesDuration)
.map(([resource, duration]) => formatDuration(duration) + '*' + resource)
.map(([resource, duration]) => formatDuration(duration) + '*(' + denominator(resource) + ' ' + resource + ')')
.join(',')}{' '}
<a href='https://github.com/argoproj/argo/blob/master/docs/resource-duration.md'>
<i className='fa fa-info-circle' />
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,8 @@ interface Props {

const AttributeRow = (attr: {title: string; value: any}) => (
<div className='row white-box__details-row' key={attr.title}>
<div className='columns small-3'>{attr.title}</div>
<div className='columns small-9'>{attr.value}</div>
<div className='columns small-4'>{attr.title}</div>
<div className='columns small-8'>{attr.value}</div>
</div>
);
const AttributeRows = (props: {attributes: {title: string; value: any}[]}) => (
Expand Down
4 changes: 2 additions & 2 deletions ui/src/models/workflows.ts
Original file line number Diff line number Diff line change
Expand Up @@ -640,7 +640,7 @@ export interface NodeStatus {
finishedAt: kubernetes.Time;

/**
* How much resource was used.
* How much resource was requested.
*/
resourcesDuration?: {[resource: string]: number};

Expand Down Expand Up @@ -758,7 +758,7 @@ export interface WorkflowStatus {
storedTemplates: {[name: string]: Template};

/**
* ResourcesDuration tracks how much resources were used.
* ResourcesDuration tracks how much resources were requested.
*/
resourcesDuration?: {[resource: string]: number};

Expand Down
48 changes: 5 additions & 43 deletions util/resource/duration.go
Original file line number Diff line number Diff line change
@@ -1,47 +1,17 @@
package resource

import (
"time"

corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"

wfv1 "github.com/argoproj/argo/pkg/apis/workflow/v1alpha1"
)

type containerSummary struct {
corev1.ResourceList
corev1.ContainerState
}

func (s containerSummary) duration(now time.Time) time.Duration {
if s.Terminated != nil {
return s.Terminated.FinishedAt.Time.Sub(s.Terminated.StartedAt.Time)
} else if s.Running != nil {
return now.Sub(s.Running.StartedAt.Time)
} else {
return 0
}
}

func resourceDenominator(r corev1.ResourceName) *resource.Quantity {
q, ok := map[corev1.ResourceName]resource.Quantity{
corev1.ResourceCPU: resource.MustParse("1000m"),
corev1.ResourceMemory: resource.MustParse("1Gi"),
corev1.ResourceStorage: resource.MustParse("10Gi"),
corev1.ResourceEphemeralStorage: resource.MustParse("10Gi"),
}[r]
if !ok {
q = resource.MustParse("1")
}
return &q
}

func DurationForPod(pod *corev1.Pod, now time.Time) wfv1.ResourcesDuration {
summaries := map[string]containerSummary{}
func DurationForPod(pod *corev1.Pod) wfv1.ResourcesDuration {
summaries := Summaries{}
for _, c := range append(pod.Spec.InitContainers, pod.Spec.Containers...) {
// Initialize summaries with default limits for CPU and memory.
summaries[c.Name] = containerSummary{ResourceList: map[corev1.ResourceName]resource.Quantity{
summaries[c.Name] = Summary{ResourceList: map[corev1.ResourceName]resource.Quantity{
// https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
corev1.ResourceCPU: resource.MustParse("100m"),
// https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-memory-6b41e9a955f9
Expand All @@ -55,16 +25,8 @@ func DurationForPod(pod *corev1.Pod, now time.Time) wfv1.ResourcesDuration {
summaries[c.Name].ResourceList[name] = quantity
}
}
// Add container states.
for _, c := range append(pod.Status.InitContainerStatuses, pod.Status.ContainerStatuses...) {
summaries[c.Name] = containerSummary{ResourceList: summaries[c.Name].ResourceList, ContainerState: c.State}
}
i := wfv1.ResourcesDuration{}
for _, s := range summaries {
duration := s.duration(now)
for r, q := range s.ResourceList {
i = i.Add(wfv1.ResourcesDuration{r: wfv1.NewResourceDuration(time.Duration(q.Value() * duration.Nanoseconds() / resourceDenominator(r).Value()))})
}
summaries[c.Name] = Summary{ResourceList: summaries[c.Name].ResourceList, ContainerState: c.State}
}
return i
return summaries.Duration()
}
24 changes: 11 additions & 13 deletions util/resource/duration_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ func TestDurationForPod(t *testing.T) {
want wfv1.ResourcesDuration
}{
{"Empty", &corev1.Pod{}, wfv1.ResourcesDuration{}},
{"RunningContainerWithCPURequest", &corev1.Pod{
{"ContainerWithCPURequest", &corev1.Pod{
Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "main", Resources: corev1.ResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse("2000m"),
Expand All @@ -32,20 +32,19 @@ func TestDurationForPod(t *testing.T) {
{
Name: "main",
State: corev1.ContainerState{
Running: &corev1.ContainerStateRunning{
StartedAt: metav1.Time{
Time: now.Add(-1 * time.Minute),
},
Terminated: &corev1.ContainerStateTerminated{
StartedAt: metav1.Time{Time: now.Add(-1 * time.Minute)},
FinishedAt: metav1.Time{Time: now},
},
},
},
},
},
}, wfv1.ResourcesDuration{
corev1.ResourceCPU: wfv1.NewResourceDuration(2 * time.Minute),
corev1.ResourceMemory: wfv1.NewResourceDuration(5 * time.Second),
corev1.ResourceMemory: wfv1.NewResourceDuration(1 * time.Minute),
}},
{"TerminatedContainerWithCPURequest", &corev1.Pod{
{"ContainerWithCPURequest", &corev1.Pod{
Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "main", Resources: corev1.ResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceCPU: resource.MustParse("2000m"),
Expand All @@ -59,24 +58,23 @@ func TestDurationForPod(t *testing.T) {
{
Name: "main",
State: corev1.ContainerState{
Running: &corev1.ContainerStateRunning{
StartedAt: metav1.Time{
Time: now.Add(-3 * time.Minute),
},
Terminated: &corev1.ContainerStateTerminated{
StartedAt: metav1.Time{Time: now.Add(-3 * time.Minute)},
FinishedAt: metav1.Time{Time: now},
},
},
},
},
},
}, wfv1.ResourcesDuration{
corev1.ResourceCPU: wfv1.NewResourceDuration(6 * time.Minute),
corev1.ResourceMemory: wfv1.NewResourceDuration(0 * time.Second),
corev1.ResourceMemory: wfv1.NewResourceDuration(3 * time.Minute),
corev1.ResourceName("nvidia.com/gpu"): wfv1.NewResourceDuration(3 * time.Minute),
}},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := DurationForPod(tt.pod, now)
got := DurationForPod(tt.pod)
assert.Equal(t, tt.want, got)
})
}
Expand Down
Loading