Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trace2: add child start and exit events #1131

Merged

Conversation

ldennington
Copy link
Contributor

@ldennington ldennington commented Feb 27, 2023

This change adds the child_start and child_exit events to GCM's TRACE2 tracing system.

The first two patches are small bug fixes. The first fix ensures all events are written in snake case. Up until this point, we've only written single word events, but the TRACE2 convention for multi-word events is to write in snake case, so for simplicity we update all events to use this convention. The second fix ensures we do not redirect standard error for Git-related processes, as this causes deadlocks with TRACE2 tracing enabled.

The third patch contains a refactor of the Trace2 class necessary to support child_start and child_exit.
The main change is addition of an Initialize() method to Trace2.cs to detect whether TRACE2 tracing is enabled and to set long-lived variables. This also sets a new _initialized property that allows us to determine whether TRACE2 tracing has been initialized. GCM TRACE2 tracing is disabled until _initialized is set to true by the Initialize() method (although it is worth noting that Git's TRACE2 tracing is not disabled and will be reflected in logs).

The third patch pulls Sid-related logic into a separate SidManager class to improve separation of concerns and make Sid logic less error-prone/easier to reason about.

The fourth patch adds a ChildProcess class that wraps Process and gives us the required control to write process-related TRACE2 events. It also decouples process-related logic from environment logic to maintain a functional, non-circular order of dependencies with the new wrapper. All Process invocations have been updated to use ChildProcess.

The fifth patch adds the appropriate Trace2 methods to write child_start and child_exit events.

@ldennington ldennington force-pushed the trace2-child-start-and-exit branch 4 times, most recently from 19e3610 to b748de3 Compare February 28, 2023 19:47
@ldennington ldennington self-assigned this Feb 28, 2023
@ldennington ldennington marked this pull request as ready for review February 28, 2023 19:51
@ldennington ldennington marked this pull request as draft February 28, 2023 20:13
@ldennington ldennington force-pushed the trace2-child-start-and-exit branch 2 times, most recently from c01560c to 312c9d8 Compare February 28, 2023 21:22
@ldennington ldennington marked this pull request as ready for review February 28, 2023 22:12
@jeffhostetler
Copy link

I'm not sure I understand what you're doing with the SidManager class. A first skim makes me think
you're making it more complicated that it needs to be.

Basically, when your process starts it needs to read the value of GIT_TRACE2_PARENT_SID from the
environment. That is GCM.exe's parent's SID. In GCM.exe (or in any of the helper exe's), read that
value, append a GUID (prepended with a slash if the parent's is not empty), and save that in a process
global AND update the value of GIT_TRACE2_PARENT_SID in the environment of your process
(so that any child processes that you create will inherit the new value).

Then if GCM.exe launches SomeUI.exe, SomeUI.exe will inherit that extended value. (If you pass
an explicit environment vector to CreateProcess() rather than letting the child inherit, use the
updated SID value. Then when the SomeUI.exe runs, the value of GIT_TRACE2_PARENT_SID that
IT sees will be that of GCM.exe and it can add it's own "/" when the UI process creates
its own SID (and sets a global and updates its environment).

The net-net is that you don't need to build custom SIDs for each child nor do any other SID
manipulation. Just set up yours and let the children set up theirs. (That's assuming that I
understood the code, which I'm beginning to doubt...)

I'm not sure I understood the thread references in the commit message.

I'm also concerned about how the different uses of the SidManager.Append() and .Remove()
calls. They are updating global state. Which makes me think there needs to be locking, but
again, I'm not sure how these calls are used. In the way I describe above, the ctor() and Append()
would only be called once from main() in GCM.exe and never updated again, so we wouldn't need
locking.

@ldennington ldennington force-pushed the trace2-child-start-and-exit branch from 312c9d8 to 9bf4dc2 Compare March 2, 2023 19:25
@ldennington
Copy link
Contributor Author

I'm not sure I understand what you're doing with the SidManager class. A first skim makes me think you're making it more complicated that it needs to be.

You were right - I went ahead and dropped that commit. The sid for a child process is now appended using the SetSid method, called from the ChildProcess class to ensure that it can be passed to each child process-related Trace2 method.

We could also set the sid in Trace2.cs, but then we'd have to modify Trace2's _sid variable both to append to the child process GUID and remove the appended value which felt a bit messy/dangerous.

Copy link
Collaborator

@mjcheetham mjcheetham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far! I have some comments initially after a first pass.

src/shared/Core.Tests/StringExtensionsTests.cs Outdated Show resolved Hide resolved
src/shared/Core.Tests/StringExtensionsTests.cs Outdated Show resolved Hide resolved
src/shared/Core/StringExtensions.cs Outdated Show resolved Hide resolved
src/shared/Core/Trace2.cs Show resolved Hide resolved
src/shared/Core.UI.Avalonia/Controls/AboutWindow.axaml.cs Outdated Show resolved Hide resolved
src/shared/Core.Tests/WslUtilsTests.cs Outdated Show resolved Hide resolved
src/shared/Core/Diagnostics/EnvironmentDiagnostic.cs Outdated Show resolved Hide resolved
src/shared/Core/Interop/Posix/PosixProcessManager.cs Outdated Show resolved Hide resolved
src/shared/Core/Trace2.cs Outdated Show resolved Hide resolved
@jeffhostetler
Copy link

I'm not sure I understand what you're doing with the SidManager class. A first skim makes me think you're making it more complicated that it needs to be.

You were right - I went ahead and dropped that commit. The sid for a child process is now appended using the SetSid method, called from the ChildProcess class to ensure that it can be passed to each child process-related Trace2 method.

We could also set the sid in Trace2.cs, but then we'd have to modify Trace2's _sid variable both to append to the child process GUID and remove the appended value which felt a bit messy/dangerous.

I think the solution is actually simpler than this. All events generated by the GCM.exe process
should have the same SID on them. The child_start and child_exit events are written from
the point of view of the invoking process (in this case GCM.exe) -- in a sense, GCM is saying that
IT is creating a child process (and may or may not wait for it to finish) and here are the args
that it is using. This is the "outer span" on the lifetime of the child process. That is, the
time from CreateProcess() thru WaitForSingleObject(). I digress, the point is that the SID needs
to be that of GCM.exe so that we can attach it to GCM's trace graph.

The child process itself should receive the inherited parent-SID and compute it's own and use that
for the child's events. So when the child process does it's version, start, and exit events
with the child's new SID, we get the "internal span" on the lifetime of the child process. That is,
the internal time between main() and exit()/return. The child's data with the child's SID will be
used to create a new and independent trace graph for the child's lifetime.

The two trace graphs will be stitched together in the datastore after both have been converted
to OTLP (using OTel TraceIDs/SpanIDs synthesized from the two SIDs). (And the outer and
inner lifetime spans will be adjacent, but distinct, in the combined graph.)

If the child process is not Trace2-aware, then there will not be a child trace graph for it,
so the parent trace graph needs to be self-sufficient. That is, we'll only see the outer span.
Which is the case for Git hooks which are usually shell scripts. But there's nothing we can
do about that.

Anyway, hope this helps. And sorry for the confusion.

@jeffhostetler
Copy link

I don't understand the need for any of the snake casing changes, but then again, my C# is rusty.

It seems to me that we could (a) continue with the existing model and just add new events to the enum:

public enum Trace2Event {
    [EnumMember(Value = "version")]
    Version,
    ...
    [EnumMember(Value = "child_start")]
    ChildStart,
    ...
}

Or (b) since the set of enums is very fixed and not changing and we already have to have a
per-event Trace2Message derived class (like StartMessage), we could just have a
static const string event name field in each of those derived classes set to the string literal
of the event name. Then have the ToJson() or ToNormal() formatters just reference
the constant or pass it to the formatters in the abstract class.

Or (c) do (b) and update the Event property getter in the abstract class to call a function
to get the string constant from the derived class.

@mjcheetham
Copy link
Collaborator

mjcheetham commented Mar 3, 2023

The child process itself should receive the inherited parent-SID and compute it's own and use that
for the child's events. So when the child process does it's version, start, and exit events
with the child's new SID, we get the "internal span" on the lifetime of the child process. That is,
the internal time between main() and exit()/return. The child's data with the child's SID will be
used to create a new and independent trace graph for the child's lifetime.

Just to check my understanding here, this would look like this in pseudo GCM code?

// Program.cs
static void Main(string[] args)
{
    using var context = new CommandContext();
    using var app = new Application(context);

    context.Trace2.Initialize(context.Environment, context.Settings.GetTrace2Settings(), DateTimeOffset.UtcNow);
    context.Trace2.Start(context.ApplicationPath, args);

    var exitCode = app.RunAsync(args)....;

    context.Trace2.Stop(exitCode);
}
// CommandContext.cs
public CommandContext()
{
    ...
    Trace2 = new Trace2();
    ...
}
// Trace2.cs
private string _sid;

public void Initialize(IEnvironment env, ...)
{
    ...
    _sid = CreateSid(env);
    env.SetEnvironmentVariable("GIT_TRACE2_PARENT_SID", _sid);
}

private static string CreateSid(IEnvironment env)
{
    if (env.Variables.TryGet("GIT_TRACE2_PARENT_SID", out string parentSid))
    {
        // We are a child process; append to the parent to create our new SID
        return $"{parentSid}/{Guid.NewGuid():D}";
    }
    else
    {
        // We are the root process; create our own 'root' SID
        return Guid.NewGuid().ToString("D");
    }
}

public void Start(...)
{
    ...
    WriteVersion(...);
    ...
}

private void WriteVersion(...)
{
    WriteMessage(
        new VersionMessage(...)
        {
            Sid = _sid
        }
    );
}

public void WriteChildStart(...)
{
    WriteMessage(
        new ChildStartMessage(...)
        {
            Sid = _sid // this is OUR sid, the child will compute its own
        }
    );
}

@mjcheetham
Copy link
Collaborator

Or (b) since the set of enums is very fixed and not changing and we already have to have a
per-event Trace2Message derived class (like StartMessage), we could just have a
static const string event name field in each of those derived classes set to the string literal
of the event name. Then have the ToJson() or ToNormal() formatters just reference
the constant or pass it to the formatters in the abstract class.

Something like this shouldn't be too hard to maintain?

private static string GetEventName(Trace2Event event)
{
    return event switch
    {
        Trace2Event.Version => "version",
        Trace2Event.ChildStart => "child_start",
        ...
        _ => throw new ArgumentOutOfRangeException()
// OR if we don't want to die..
        _ => event.ToString()
    };
}

@jeffhostetler
Copy link

WRT the SID computation, yes.

I saw your note over the weekend and realized that perhaps the original confusion lies in my use of English when naming the event -- a noun-verb vs verb-noun issue, if you will. Think of the "child-start" event as "i'm-starting-a-child" and likewise for "child-exit" is actually a "my-child-exited-with-code". The parent emits those events and does so with it's SID. If the child is Trace2-aware, it will create it's own SID using the append trick; if the child is not Trace2 aware, we won't get any telemetry from it.

Sorry for the confusion. That naming scheme keep the event pairs symmetric "foo-start" and "foo-exit", "region-enter" and "region-leave" and etc.

@mjcheetham
Copy link
Collaborator

That naming scheme keep the event pairs symmetric "foo-start" and "foo-exit", "region-enter" and "region-leave" and etc.

Interesting question.. what should we do, if anything, about tracing child process starts that we don't subsequently wait for? A child-start without a child-end?

The example we have is clicking a link and having it open a web browser, for example.
I'd imagine we don't care about this, and every -start event should only be emitted when we can pair it with an -end event.

@jeffhostetler
Copy link

I have a "child-ready" event for children that we start and release into the wild. I use it for example when something like "git fsmonitor--daemon start" disassociates with the TTY and spawns a background process "git fsmonitor--daemon run".

The idea was that I wanted to be able to see that the child successfully got started (the exe was found and we had 0755 access and etc) -- even if it immediately died because of a bad CL arg or something.

In the fsmonitor example, the test suite can use "... start" and know that that process will do whatever is necessary (and not return before) the actual "run" process gets going. the test can then talk to the background "run" process over the socket/pipe/whatever -- and not have to worry about spinning until it is ready. (and not have to rely on polling stdin/out or anything).

@jeffhostetler
Copy link

So we should see either a "child-start" ... "child-exit" -or- "child-start" ... "child-ready" pair. If the child cannot get started (bad exe or accesss or whatever), emit a "child-exit" with the usual exit code.

All of this is a bit hacky, but given the unix and windows process models, my hands were kinda tied...

@jeffhostetler
Copy link

private static string GetEventName(Trace2Event event)
{
return event switch
{
Trace2Event.Version => "version",
Trace2Event.ChildStart => "child_start",
...

Yeah, that's one way of doing it that is relatively easy to maintain and efficient.
I do question the need for the Trace2Event type at all, but that may just be style
and/or might help in the testing, so that's OK.

I'm just wondering whether we actually use the enum anywhere else? I mean, in the following explicitly pass the event type to the initializer following the ctor.

   public void WriteChildStart(DateTimeOffset startTime,
        Trace2ProcessClass processClass,
        bool useShell,
        string appName,
        string argv,
        string sid,
        string filePath = "",
        int lineNumber = 0)
    {
        ...  
        WriteMessage(new ChildStartMessage()
        {
            Event = Trace2Event.ChildStart,
            Sid = sid,
            Time = startTime,
            File = Path.GetFileName(filePath).ToLower(),
            Line = lineNumber,
            Id = ++_childProcCounter,
            Classification = processClass,
            UseShell = useShell,
            Argv = procArgs
        });
        ...

The call to the ChildStartMessage() ctor already knows it is a child start event (and the set of required fields for that message is already encoded in the class definition). The spelling of the event name could also be hidden inside that message class. Passing in an event type to the ctor (or whatever we call the property bucket attached to the trivial ctor call), seems like an opportunity for error.

Is it possible to do something like (my C# is rusty):

public class ChildStartMessage : Trace2Message
{
    static const MyEventName = "child_start"
    public ChildStartMessage()
        : Trace2Message(MyEventName)
    {
    }
    ...

and let the rest of the initializer (following the new()) take care of the rest?

Just a thought and I could go either way here.

@ldennington
Copy link
Contributor Author

public void WriteChildStart(...)
{
    WriteMessage(
        new ChildStartMessage(...)
        {
            Sid = _sid // this is OUR sid, the child will compute its own
        }
    );
}

@mjcheetham - since the child is computing its own Sid and CreateSid is private, do we need to start passing an IEnvironment to ChildProcess? Will it have its own CreateSid method? Just want to clarify before going down that path.

@ldennington ldennington force-pushed the trace2-child-start-and-exit branch 4 times, most recently from ff231e1 to c829f59 Compare March 9, 2023 19:13
@ldennington
Copy link
Contributor Author

@mjcheetham, @jeffhostetler - I believe this latest series addresses all your feedback. A couple things to call out:

  1. I decided to keep the snake casing patch. Although we don't necessarily need the Trace2Event enum (as Jeff pointed out), we do (in my opinion) need the new Trace2ProcessClass to structure the information child process callers are passing in. Leaving the Trace2Event enum just follows this pattern for consistency.
  2. I also kept the SidManager around, although it's now a static class that is invoked directly from Program.cs so that we immediately set the environment variable. This means that invoked Git processes will have the correct Sid, even if they're called before TRACE2 is initialized. I also adjusted so that the start time is recorded just as the application starts.

Copy link

@jeffhostetler jeffhostetler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! Thanks for putting up with me and all my comments. 😄

Create static class to manage session ids (sids). This allows us to
immediately set up GCM's sid in Program.cs to ensure Git processes that
are executed before Trace2 tracing is set up have the correct parent sid.
We also use it to give helper exes unique child sids.

Although this will also execute when TRACE2 tracing is not enabled, it
should not be an issue - it will simply set the (unused)
GIT_TRACE2_PARENT_SID varible for the process to a GUID.
Our current strategy of using a Stream to write to files for Trace2
tracing is not working - Git processes are overwriting GCM's data and vice
versa.

To resolve this, switch from using the Trace2StreamWriter to write to
files to a dedicated Trace2FileWriter class. This also includes a refactor
to add a parent base class for Trace2Writers for shared functaionality
between the Trace2StreamWriter and Trace2FileWriter.
The TRACE2 convention is for multi-word event names to be written in snake
case (all lower case with words separated by underscores). While up until
this point we've had single-word event names, this series will add the
multi-word child start and child exit events. In anticipation of that,
this change adds a string extension that will be used in later patches to
write multi-word enum values in snake case.
When a new process is created, standard error can optionally be
redirected. This is useful in situations where the output is verbose and
should be suppressed rather than output to the console, for example.

However, this can cause an issue with TRACE2 deadlocks, as Git writes to
standard error and GCM does nothing. This change corrects this by not
redirecting standard error for Git-related processes (which also extends
to places such as Environment.cs, given Git's reliance on envars).

Finally, there are some places where we should still redirect standard
error for the reason mentioned above. For these scenarios, comments have
been added to help readers understand the purposeful nature of these
redirects.
Diagnostics classes are dependent on multiple pieces of the
CommandContext. Adding child start and child exit events will mean yet
another dependency (Trace2). To simplify in anticipation of this, update
Diagnostic classes to take in the CommandContext and access applicable
dependencies from it.
The TRACE2 child start and child exit events will require some refactoring
around the way GCM handles processes to provide finer-grained control and
the ability to capture process-related information.

In preparation for this, we need to do some re-organization of the Trace2
code. With this change we:

1. Move TRACE2 tracing startup into Program.cs for GCM and the helper
exes. This allows us to start tracing earlier and capture more events.
2. Add an Initialize() method to Trace2.cs to detect whether Trace2
tracing is enabled and set long-lived variables. This also sets a new
_initialized property that allows us to determine whether Trace2 tracing
has been initialized. Trace2 tracing is disabled until _initialized is set
to true by the Initialize() method.
3. Change the name of TryParseSettings() to InitializeWriters() for
improved clarity.
4. Pass the CommandContext to Trace2 to use its Streams and Environment.
@ldennington ldennington force-pushed the trace2-child-start-and-exit branch from c829f59 to 493154f Compare March 9, 2023 23:27
Copy link
Collaborator

@mjcheetham mjcheetham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Happy to merge as is.

src/shared/Core.UI.Avalonia/Controls/AboutWindow.axaml.cs Outdated Show resolved Hide resolved
src/shared/Core/ChildProcess.cs Outdated Show resolved Hide resolved
Lessley Dennington added 2 commits March 10, 2023 15:12
Wrap the Process class to allow finer-grained control for writing future
process-related TRACE2 events.

Pull process-relevant information out of IEnvironment/Environment and into
its own process-oriented ProcessManager class, interface, and child
classes. This allows us to  maintain a functional, non-circular order
of dependencies with our process wrapper.

Add a ChildProcess class to wrap certain functionality of Process.cs.
This gives us the level of control required to write child process
information to TRACE2 in the final patch in this series. Update all
current invocations of Process to instead use this wrapper.
Write the TRACE2 child_start and child_exit events for each child
process invoked by GCM.
@ldennington ldennington force-pushed the trace2-child-start-and-exit branch from 493154f to 336278c Compare March 10, 2023 22:19
@ldennington ldennington merged commit f7c9fd0 into git-ecosystem:main Mar 10, 2023
mjcheetham added a commit that referenced this pull request May 2, 2023
**Changes:**

- Support ports in URL-scoped config (#825)
- Support URL-scoped enterprise default settings (#1149)
- Add support for client TLS certificates (#1152)
- Add TRACE2 support(#1131, #1151, #1156, #1162)
- Better browser detection inside of WSL (#1148)
- Handle expired OAuth refresh token for generic auth (#1196)
- Target *-latest runner images in CI workflow (#1178)
- Various bug fixes:
  - Ensure we create a WindowsProcessManager on Windows (#1146)
  - Ensure we start child processes created with ProcessManager (#1177)
  - Fix app path name of Windows dropping file extension (#1181)
  - Ensure we init IEnvironment before SessionManager (#1167)
  - git: consistently read from stdout before exit wait (#1136)
  - trace2: guard against null pipe client in dispose (#1135)
- Make Avalonia UI the default Windows and move to in-process (#1207)
- Add Git configuration options for trace & debug (#1228)
- Transition from Nerdbank.GitVersioning to a version file (#1231)
- Add support for using the current Windows user for WAM on DevBox
(#1197)
- Various documentation updates:
  - org-rename: update references to GitCredentialManager (#1141)
  - issue templates: remove core suffix (#1180)
  - readme: add link to project roadmap (#1204)
  - docs: add bitbucket app password requirements (#1213)
  - .net tool: clarify install instructions (#1126)
  - docs: call out different GCM install paths in WSL docs (#1168)
  - docs: add trace2 to config/env documentation (#1230)
@ldennington ldennington deleted the trace2-child-start-and-exit branch July 12, 2023 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants