Draft
4 changes: 4 additions & 0 deletions openapi/openapiv2.json
@@ -8125,6 +8125,10 @@
"type": "object",
"$ref": "#/definitions/v1WorkerHeartbeat"
}
},
"leaseDuration": {
"type": "string",
"description": "Duration for which the worker lease should be valid. During this time, the server considers the worker to be active.\nThe worker is expected to send periodic heartbeats to renew its lease before it expires.\n\nThe server will calculate the actual expiration time based on when it receives this request.\nIf not specified or zero, the server will use a default lease duration of 1 minute.\n\nThere are 3 states for a worker: Active, Inactive, and CleanedUp.\nLifecycle transitions:\n- Active->Active: Each time the server receives a heartbeat from the worker, it will renew the lease and keep the worker in the active state.\n\n- Active->Inactive: When the lease expires, the server will consider the worker to be inactive and reschedule activities that were known to be running as of that time.\n\n- Inactive->Active: If the server receives a subsequent heartbeat from this worker, then it will transition the worker back to the active state.\n\n- Inactive->CleanedUp: If the worker remains inactive for a prolonged period, the server will clean up the worker state. This is a terminal state.\n If the server receives a subsequent heartbeat from this worker, then it will always return a non-retryable FailedPrecondition error.\n The worker will need to shut down and re-register using a different WorkerInstanceKey to become active again."
}
}
},
21 changes: 21 additions & 0 deletions openapi/openapiv3.yaml
@@ -10355,6 +10355,27 @@ components:
type: array
items:
$ref: '#/components/schemas/WorkerHeartbeat'
leaseDuration:
pattern: ^-?(?:0|[1-9][0-9]{0,11})(?:\.[0-9]{1,9})?s$
type: string
description: |-
Duration for which the worker lease should be valid. During this time, the server considers the worker to be active.
The worker is expected to send periodic heartbeats to renew its lease before it expires.

The server will calculate the actual expiration time based on when it receives this request.
If not specified or zero, the server will use a default lease duration of 1 minute.

There are 3 states for a worker: Active, Inactive, and CleanedUp.
Lifecycle transitions:
- Active->Active: Each time the server receives a heartbeat from the worker, it will renew the lease and keep the worker in the active state.

- Active->Inactive: When the lease expires, the server will consider the worker to be inactive and reschedule activities that were known to be running as of that time.

- Inactive->Active: If the server receives a subsequent heartbeat from this worker, then it will transition the worker back to the active state.

- Inactive->CleanedUp: If the worker remains inactive for a prolonged period, the server will clean up the worker state. This is a terminal state.
If the server receives a subsequent heartbeat from this worker, then it will always return a non-retryable FailedPrecondition error.
The worker will need to shut down and re-register using a different WorkerInstanceKey to become active again.
RecordWorkerHeartbeatResponse:
type: object
properties: {}
19 changes: 19 additions & 0 deletions temporal/api/workflowservice/v1/request_response.proto
@@ -2467,6 +2467,25 @@ message RecordWorkerHeartbeatRequest {
string identity = 2;

repeated temporal.api.worker.v1.WorkerHeartbeat worker_heartbeat = 3;

// Duration for which the worker lease should be valid. During this time, the server considers the worker to be active.
// The worker is expected to send periodic heartbeats to renew its lease before it expires.
//
// The server will calculate the actual expiration time based on when it receives this request.
// If not specified or zero, the server will use a default lease duration of 1 minute.
//
// There are 3 states for a worker: Active, Inactive, and CleanedUp.
// Lifecycle transitions:
// - Active->Active: Each time the server receives a heartbeat from the worker, it will renew the lease and keep the worker in the active state.
//
// - Active->Inactive: When the lease expires, the server will consider the worker to be inactive and reschedule activities that were known to be running as of that time.
Member

Is this already true? If not, we may want to go back and update the comment after it is, or, alternatively, let's hold off merging this until the implementation behind it is also ready.

Love the detail in the comment now though!

Author

@rkannan82 Nov 12, 2025

No, none of this is implemented yet.

  • Do we typically merge only after the server changes are merged/close to being merged? If so, what is the best practice for new feature development when we have cross repo dependencies like this?
  • Can we just add a comment saying it is under development and merge?
  • For context, later I will be evolving this proto to pass activities. Server side requires these protos for implementation.

Member

For all but the most obvious things, we usually want an accompanying server implementation to ensure that the API matches what we want. We learn a lot during implementation, and providing "under development" fields can cause problems. We cut API releases so often that people build these models into their proxies and use them in other ways, which makes backwards-incompatible changes problematic.

I would strongly suggest at least having the server implementation built before merging this. For larger efforts, even with the server implementation built, we may want it in a separate branch so SDKs can design against it and we can apply any API learnings from that, but I'm not sure this qualifies for such a need.

Author

Sounds good. I have converted this into a draft PR, so you can still review it. I will not be merging this.

//
// - Inactive->Active: If the server receives a subsequent heartbeat from this worker, then it will transition the worker back to the active state.
//
// - Inactive->CleanedUp: If the worker remains inactive for a prolonged period, the server will clean up the worker state. This is a terminal state.
// If the server receives a subsequent heartbeat from this worker, then it will always return a non-retryable FailedPrecondition error.
// The worker will need to shut down and re-register using a different WorkerInstanceKey to become active again.
google.protobuf.Duration lease_duration = 4;
}

message RecordWorkerHeartbeatResponse {