Skip to content

Upstreaming Runtime Monitoring and Tracing #1406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

steve-the-edwards
Copy link
Contributor

This is the evolution of a class we've long used in production internally. This takes a lot of the bookkeeping that we had there and makes it available to other pluggable WorkflowRuntimeTracers.

There is also an update to the RuntimeUpdate type that is clearer about when the runtime loop is finished (regardless of if the rendering is skipped or not).

First commit removes the old TracingWorkflowInterceptor and the corresponding trace-encoder module.

The second commit adds the WorkflowRuntimeMonitor as well as a bunch of reflection-less logging utilities that we've used for some time.

* to customize their logging names for [WorkflowRuntimeMonitor] output.
*/
public interface Loggable {
public fun toLogString(): String = toString().wfStripSquarePackage()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When thinking about upstreaming this I always felt bad about how com.squareup.com is hard coded into wfStripSquarePackage(). Wonder if we should expose a static property or something to allow customization. I don't think that should derail this initial commit, just food for thought.

But in the meantime, it does seem sketchy to have a default implementation that relies on a non-public function.

Is this call redundant here anyway? Do we also make downstream calls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove the default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wise, it's a terrible default anyway. Shouldn't be making it so easy to use toString() by accident, defeats the purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we should expose a static property or something to allow customization. I don't think that should derail this initial commit, just food for thought.

I thought about this too, but its a bit confusing to add and is a lot of power to munge the logs that i landed on just keeping it simple.

/**
* Used as a key to detect actions originating from the Workflow that implements a worker.
*/
// Keep this up-to-date with the action in com.squareup.workflow1.WorkerWorkflow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a unit test that will break if we screw that up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. but we should. I will add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did one better and just made it a constant in Worker.kt shared in both places

/**
* Alternative to [toString] used by Workflow logging.
*/
internal fun getWfLogString(log: Any?): String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this function non-public defeats a major use case. We want feature code implementing Loggable to be able to call this on members so that the opt-in scheme works recursively. I'd be inclined to most or all of this file public -- making Loggable public without this invites serious footguns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make it public, but my general idea was that you would implement your own getLogString() to handle more cases than what we can include here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to err on the side of privacy probably. Easier to open up later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the getLogString implementation is less broad than ours in register, because there are a lot of internal types that we don't want to lift here that we handle.

My idea was that we could keep using our Loggable as SquareLoggable for those, if desired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. If it feels good once you've tried it out could suggest it in the kdoc here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how many of our internal types we could get out of the big if / else if they implemented Unwrappable?

import com.squareup.workflow1.tracing.ActionAppliedLogLine.WorkflowActionLogType.WORKER_OUTPUT

/**
* PLEASE NOTE: these logs lines are turned into strings in production, all the time, and there
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"log lines"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

public var currentRenderCause: RenderCause?

/**
* Add an update into the [RuntimeUpdates] tracked by [WorkflowRuntimeMonitor].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The kdoc on RuntimeUpdateLogLine is more informative, should probably say a lot of that here instead / too.
  • We can suggest a useful spot to call this from (see below).
  • Be good to update the kdoc of reportNavigation to draw attention to this, or if dependencies allow it add this to the default onNavigate param value.
* Consider calling this from the `onNavigate` function you provide to
* `reportNavigation`, e.g.
*
*         val renderings: Flow<Screen> by lazy {
*           renderWorkflowIn(
*             workflow = RootNavigationWorkflow,
*             scope = viewModelScope,
*             savedStateHandle = savedState,
*             runtimeConfig = RuntimeConfigOptions.ALL
*           ).reportNavigation {
*             runtimeMonitor.addRuntimeUpdate(
*               UiUpdateLogLine(getWfLogString(it))
*             )
*           }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, TIL about NavigationMonitor!

Copy link
Contributor

@rjrjr rjrjr Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I snuck that out when updating the tutorials. I'm proud to report that we use it in prod, it's solid.

/** SECTION: [WorkflowInterceptor] overrides. */

/**
* [WorkflowRuntimeTracer]s should not override this method, using [onWorkflowSessionStarted]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't b/c this is final, so "should not" is an odd thing to say.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess i was just diary-ing my thoughts rather than making clear kdoc. Will change.

api(libs.kotlin.jdk8)
api(libs.kotlinx.coroutines.core)
api(libs.squareup.papa)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this direct tie to papa going to cause dependency hell down the road? Can we introduce a separate module that ties to papa, similar to what we did for radiography?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, we could

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. separate out into workflow-tracing-papa

@steve-the-edwards steve-the-edwards force-pushed the sedwards/runtime-tracing branch from 7b71c90 to 40ad34f Compare August 1, 2025 19:55
Upstream from our internal project; including with papa tracing.

Adds a bunch of tests for verification of behaviour.
@steve-the-edwards steve-the-edwards force-pushed the sedwards/runtime-tracing branch from 40ad34f to 629bbe5 Compare August 1, 2025 20:18
@steve-the-edwards steve-the-edwards marked this pull request as ready for review August 1, 2025 20:21
@steve-the-edwards steve-the-edwards requested a review from rjrjr August 1, 2025 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants