Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wpilib] Tracer Overhaul #7099

Open
wants to merge 66 commits into
base: main
Choose a base branch
from
Open

Conversation

oh-yes-0-fps
Copy link
Contributor

@oh-yes-0-fps oh-yes-0-fps commented Sep 19, 2024

Example usage kinda

resolves #6583
resolves #609

@oh-yes-0-fps oh-yes-0-fps requested a review from a team as a code owner September 19, 2024 05:11
@oh-yes-0-fps oh-yes-0-fps requested a review from a team as a code owner September 19, 2024 05:12
@rzblue
Copy link
Member

rzblue commented Sep 19, 2024

I don't think Tracer should be a static global, as it is used in multiple places that shouldn't have mixed outputs.

@oh-yes-0-fps
Copy link
Contributor Author

oh-yes-0-fps commented Sep 19, 2024

I don't think Tracer should be a static global, as it is used in multiple places that shouldn't have mixed outputs.

It's being used differently here, it's similar to a flame graph with a cascading list of durations.
There are no mixed outputs, the tracer covers the whole robot and everything that happens during its runtime.
Also the tracer is more of ThreadLocal static and supports tracing multiple threads at once or restricting it to 1 thread.
It also has some tooling to attempt to isolate gc time from execution time but that can be turned off on a per thread basis.
image

@oh-yes-0-fps
Copy link
Contributor Author

If performance is a concern we can make it so after a certain number of nests it is sent to only datalog instead.

@rzblue
Copy link
Member

rzblue commented Sep 19, 2024

Ok, that's pretty cool.

A couple things:

  • Tracer is part of the public API, so we will need to do a deprecation cycle on the old interface

  • I think we should talk about how we want to handle telemetry like this; much like alerts we don't want it to be printed to the console, but it needs to be obvious how to access it. I also don't usually like tying things to NT directly, but maybe that's fine for now.

  • have you done any benchmarks on the rio? I doubt performance will be a significant regression from the current implementation but it'd be good to double check.

@oh-yes-0-fps
Copy link
Contributor Author

Performance wise it's quite good, we ran a version close to this all of the 2024 season based on some cursory testing it seems to use less than a ms per loop. The current implementation is more refined with 1/3 as many string allocs so it should be retested.

I don't like doing 1 topic per duration but there isn't really a way around it.

In terms of displaying to user im unsure of better ways to convey information. The watchdog will still print when overruns happen but not the epoch times. With glass, ascope and maybe elastic being packaged with wpilib I think teams should be expected to be able to use a dashboard to take full advantage of wpilib.

@oh-yes-0-fps
Copy link
Contributor Author

oh-yes-0-fps commented Sep 19, 2024

In terms of deprecation of the old tracer api, we could just name this something else while still removing tracer from watchdog. Profiler isn't a terrible name but also not great.

Whats the etiquette on just leaving the old api with deprecation warnings and just having all the functions no-op or smthn until we can remove it.

@rzblue
Copy link
Member

rzblue commented Sep 19, 2024

Whats the etiquette on just leaving the old api with deprecation warnings and just having all the functions no-op or smthn until we can remove it.

That's worse than removing it without deprecation.

@KangarooKoala
Copy link
Contributor

Generally the route for changing to a new API is that you make the new API with a different name and deprecate the old one (without doing any other changes to the old version)- The point of deprecation is that you can still use it without breakage. When it comes time to remove the old API (usually a year after deprecation), you completely remove it.
I can't think of any case where you'd want to change methods to no-op, since that's a silent breakage.

@oh-yes-0-fps
Copy link
Contributor Author

So I guess this is a broader question but what is the worry of changing the public API for something when adding it in a new major version? Especially something that is currently not used very much by users.

@calcmogul
Copy link
Member

calcmogul commented Sep 19, 2024

So I guess this is a broader question but what is the worry of changing the public API for something when adding it in a new major version?

Ostensibly, it lets teams migrate their code gradually over the course of a year rather than breaking it all at once. In my experience though, teams either fix the deprecation warnings immediately because they think they're errors, or they don't migrate until we force them to by removing the old API.

At least deprecation warnings provide guidance on how to upgrade though; immediate removal or API change gives no guidance whatsoever and can be frustrating, especially if we broke working code from the previous year.

I'm not sure what we'd name the new class instead though, because Tracer is the most obvious name.

@oh-yes-0-fps
Copy link
Contributor Author

This class's functionality is now static, it wouldn't be hard to make an instance of it to keep the functionality of the old tracer. It would just be confusing but using proper docs and deprecation warnings could help with that.

@oh-yes-0-fps oh-yes-0-fps changed the title [wpilibj] Tracer Overhaul [wpilib] Tracer Overhaul Sep 23, 2024
wpilibc/src/main/native/cpp/Tracer.cpp Outdated Show resolved Hide resolved
wpilibc/src/test/native/cpp/TracerTest.cpp Outdated Show resolved Hide resolved
wpilibc/src/test/native/cpp/TracerTest.cpp Outdated Show resolved Hide resolved
wpilibc/src/test/native/cpp/TracerTest.cpp Outdated Show resolved Hide resolved
wpilibc/src/test/native/cpp/TracerTest.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@Gold856 Gold856 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any documentation for the time units Tracer publishes in?

wpilibc/src/main/native/cpp/Tracer.cpp Outdated Show resolved Hide resolved
@oh-yes-0-fps
Copy link
Contributor Author

Is there any documentation for the time units Tracer publishes in?

Where would the best place to document those be? maybe it should even be placed in NT?

@Gold856
Copy link
Contributor

Gold856 commented Oct 3, 2024

Documentation in NT and the class doc would be good.

wpilibc/src/main/native/cpp/Tracer.cpp Outdated Show resolved Hide resolved
wpilibc/src/main/native/cpp/Tracer.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@Gold856 Gold856 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java has a code example, but C++ doesn't, for parity, C++ should have one too.

Comment on lines +62 to +64
// If the cycle is poisoned, it will warn the user
// and not publish any data
bool m_cyclePoisoned = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If the cycle is poisoned, it will warn the user
// and not publish any data
bool m_cyclePoisoned = false;
// If the cycle is poisoned, it will warn the user and not publish any data
bool m_cyclePoisoned = false;

Comment on lines +86 to +90
/*
* If the cycle is poisoned, it will warn the user
* and not publish any data.
*/
boolean m_cyclePoisoned = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*
* If the cycle is poisoned, it will warn the user
* and not publish any data.
*/
boolean m_cyclePoisoned = false;
/** If the cycle is poisoned, it will warn the user and not publish any data. */
boolean m_cyclePoisoned = false;

private final ArrayList<String> m_traceStack = new ArrayList<>();

/**
* ideally we only need `traceStack` but in the interest of memory optimization and string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* ideally we only need `traceStack` but in the interest of memory optimization and string
* Ideally we only need `traceStack` but in the interest of memory optimization and string

* <p>Example inside a {@code Drive Subsystem}
*
* <pre><code>
* // Subsystem periodics are automaticall traced
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* // Subsystem periodics are automaticall traced
* // Subsystem periodics are automatically traced

Comment on lines +173 to +176
// reporting no data is better than bad data
for (var publisher : m_publishers.entrySet()) {
publisher.getValue().set(0.0);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// reporting no data is better than bad data
for (var publisher : m_publishers.entrySet()) {
publisher.getValue().set(0.0);
}
// reporting no data is better than bad data
for (var publisher : m_publishers.values()) {
publisher.set(0.0);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add loop timing telemetry
6 participants