Improve granularity of flow state reporting. (Velocidex#3628)
Currently flows can be in the running, completed or errored states. This
is not granular enough: users are left wondering what is happening to
their flows when the client crashes, reboots, or becomes unresponsive.
In such cases flows sometimes remain in the running state, so it is not
easy to retry them.

This PR adds a server side mechanism to actively check up on the
progress of the flows. This ensures that we can keep track of running
flows better and terminate them when the client reboots. Flows now move
through more fine grained states:

1. RUNNING (scheduled) - the flow is scheduled for the client, which
may not be online.
2. WAITING - The flow is waiting on the client to run - usually clients
have a concurrency limit which blocks too many queries from running at
the same time. Additional queries will be blocked in the WAITING state
until they can begin running.
3. IN_PROGRESS - this state indicates that the query is running on the
client. Periodic status updates will be sent to the server to make sure
the query is still alive.
4. UNRESPONSIVE - if no periodic updates arrive for the query for a
while, the flow goes into the UNRESPONSIVE state. This could happen if
the client crashed or disappeared.
5. FINISHED - This indicates the flow completed successfully.
6. ERROR - The flow is in error.
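
The states above can be read as a small state machine. The following Go sketch is illustrative only: the `FlowState` type, the `allowed` table and `CanTransition` are hypothetical names, and the transition table is an assumption inferred from the descriptions above, not the project's actual implementation.

```go
package main

import "fmt"

// FlowState is a hypothetical type used only for this sketch.
type FlowState string

const (
	Running      FlowState = "RUNNING"
	Waiting      FlowState = "WAITING"
	InProgress   FlowState = "IN_PROGRESS"
	Unresponsive FlowState = "UNRESPONSIVE"
	Finished     FlowState = "FINISHED"
	Error        FlowState = "ERROR"
)

// allowed is an assumed transition table inferred from the commit
// message. FINISHED and ERROR have no entries, so they are terminal.
var allowed = map[FlowState][]FlowState{
	Running:      {Waiting, InProgress, Error},
	Waiting:      {InProgress, Unresponsive, Error},
	InProgress:   {Unresponsive, Finished, Error},
	Unresponsive: {InProgress, Finished, Error},
}

// CanTransition reports whether the sketch permits moving from one
// state to another.
func CanTransition(from, to FlowState) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Unresponsive, InProgress)) // true
	fmt.Println(CanTransition(Finished, InProgress))     // false
}
```

Treating FINISHED and ERROR as terminal matches the stated goal that flows are always finalized one way or the other.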

If a flow has not been seen for a while, and the client connects again,
the server will now send a special request to the client asking it to
update the status of those in-flight flows. The flows will go from the
UNRESPONSIVE state to the IN_PROGRESS state if the flow is still
running. But in the more common case where the client restarted and lost
progress on the flow, the client will send an ERROR message for the flow,
causing the in-flight flow to be properly closed off with an error.
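
The outcome of this reconnect check can be sketched as follows. This is a simplified model, not the real protocol: `Reconcile` and the flow IDs are hypothetical, and the sketch assumes the client's reply can be reduced to the set of flows it still considers running.

```go
package main

import "fmt"

// Reconcile models the result of the status request sent when a client
// reconnects. inFlight lists the flow IDs the server believes are in
// flight; stillRunning is the (assumed) set of flows the client reports
// as still running. Flows the client no longer knows about end in ERROR.
func Reconcile(inFlight []string, stillRunning map[string]bool) map[string]string {
	result := make(map[string]string, len(inFlight))
	for _, id := range inFlight {
		if stillRunning[id] {
			result[id] = "IN_PROGRESS"
		} else {
			result[id] = "ERROR"
		}
	}
	return result
}

func main() {
	// Hypothetical flow IDs: the client restarted and only F.1 survived.
	states := Reconcile([]string{"F.1", "F.2"}, map[string]bool{"F.1": true})
	fmt.Println(states["F.1"], states["F.2"]) // IN_PROGRESS ERROR
}
```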

This is a more robust mechanism which should ensure that flows are not
left in the running state without being finalized one way or the other.
scudette authored Jul 18, 2024
1 parent b4e45a3 commit d3d94f8
Showing 47 changed files with 1,921 additions and 743 deletions.
43 changes: 32 additions & 11 deletions actions/proto/vql.pb.go

4 changes: 4 additions & 0 deletions actions/proto/vql.proto
@@ -208,4 +208,8 @@ message ClientInfo {
uint64 last_hunt_timestamp = 17;
uint64 last_event_table_version = 18;
uint64 labels_timestamp = 23;

// A List of flows that are currently in flight and their last
// update epoch time.
map<string, int64> in_flight_flows = 28;
}
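
A map shaped like `in_flight_flows` (flow ID to last update time) lends itself to a simple staleness check. The Go sketch below is an assumption about how such a map might be used: `StaleFlows`, the timeout value and the flow IDs are hypothetical, and the timestamps are assumed to be in seconds, which the diff does not state.

```go
package main

import (
	"fmt"
	"sort"
)

// StaleFlows returns the IDs of flows whose last update is older than
// timeoutSec relative to now. All times are assumed to be epoch seconds.
// Such flows would be candidates for the UNRESPONSIVE state.
func StaleFlows(inFlight map[string]int64, now, timeoutSec int64) []string {
	var stale []string
	for id, last := range inFlight {
		if now-last > timeoutSec {
			stale = append(stale, id)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(stale)
	return stale
}

func main() {
	inFlight := map[string]int64{"F.A": 1000, "F.B": 1590}
	fmt.Println(StaleFlows(inFlight, 1600, 60)) // [F.A]
}
```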
28 changes: 14 additions & 14 deletions api/proto/api.pb.gw.go
