Add start_trace and stop_trace API in profiler #8743

Merged
lsy323 merged 4 commits into master from lsiyuan/profiler on Feb 27, 2025

Conversation

lsy323
Collaborator

@lsy323 lsy323 commented Feb 26, 2025

Add start_trace and stop_trace APIs to programmatically start and stop a profiling session. Before this PR, we could only start profiling for a fixed time duration or within a context manager. The new APIs allow better control over the profiling session.

The implementation is based on the profiler implementation in JAX.

Example usage:

import torch_xla.debug.profiler as xp

server = xp.start_server(8001)
xp.start_trace(profiling_dir)
# Run some computation
...
xp.stop_trace()
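
Because the session is now opened and closed by two separate calls, an exception between them would leave the trace running. One way to guard against that is a small context manager around the pair. This is only a sketch: `trace_session` is a hypothetical helper, not part of torch_xla, and it takes the start and stop functions as arguments so it can be tried without an XLA device.

```python
import contextlib


@contextlib.contextmanager
def trace_session(start_trace, stop_trace, log_dir):
    """Hypothetical helper: run the body between start_trace and stop_trace.

    start_trace and stop_trace are passed in as callables; with torch_xla
    they would be xp.start_trace and xp.stop_trace from this PR.
    """
    start_trace(log_dir)
    try:
        yield log_dir
    finally:
        # Always close the session, even if the traced code raised.
        stop_trace()
```

With torch_xla installed, this would presumably be used as `with trace_session(xp.start_trace, xp.stop_trace, profiling_dir): ...` after the `xp.start_server(8001)` call shown above.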

@lsy323 lsy323 changed the title from "Support programmatically start and stop profiling session" to "Add start_trace and stop_trace API in profiler" Feb 26, 2025
@lsy323 lsy323 marked this pull request as ready for review February 26, 2025 07:39
@tengyifei
Collaborator

That's amazing

Collaborator

@yaochengji yaochengji left a comment

Thanks, Siyuan!

@miladm
Collaborator

miladm commented Feb 26, 2025

@lsy323

  • plz add this form of profiling to our documentation?
  • I wonder how large of a profile file we can create. Do we know the user experience impact if the profiling duration is super long?

cc @mikegre-google

@miladm miladm self-requested a review February 26, 2025 22:25
y.cpu()


class TestProfilerSession(absltest.TestCase):
Collaborator

do we need a long-duration profile test?

Collaborator Author

No, I don't think we need a long-duration profile test in torchxla. The goal of this PR is to provide better usability to users; the capability of the profiler is out of scope for this PR (that should be covered in the underlying tsl library).

@lsy323
Collaborator Author

lsy323 commented Feb 26, 2025

> plz add this form of profiling to our documentation?

Let me add it in a follow-up PR. I just realized we don't have a user guide on how to use the profiler.

> I wonder how large of a profile file we can create. Do we know the user experience impact if the profiling duration is super long?

If the profiling time is super long, the traced content will be omitted in TensorBoard.
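
Given that very long sessions can lose content in TensorBoard, one option is to trace only a short window of steps rather than the whole run. The following is a minimal sketch under that assumption, for a step-based loop. `StepWindowTracer` and all of its parameters are hypothetical and not part of torch_xla; the start and stop functions are injected so the logic can be checked without a device.

```python
class StepWindowTracer:
    """Hypothetical helper: trace only steps in [first_step, last_step]."""

    def __init__(self, start_trace, stop_trace, log_dir, first_step, last_step):
        if first_step > last_step:
            raise ValueError("first_step must not exceed last_step")
        self._start_trace = start_trace
        self._stop_trace = stop_trace
        self._log_dir = log_dir
        self._first_step = first_step
        self._last_step = last_step
        self.active = False

    def before_step(self, step):
        # Open the session right before the first step of the window.
        if step == self._first_step and not self.active:
            self._start_trace(self._log_dir)
            self.active = True

    def after_step(self, step):
        # Close the session right after the last step of the window.
        if step == self._last_step and self.active:
            self._stop_trace()
            self.active = False
```

In a training loop, `before_step(step)` and `after_step(step)` would be called around each iteration, with `xp.start_trace` and `xp.stop_trace` passed as the two callables. How short the window needs to be for TensorBoard to show everything is not stated in this thread.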

@lsy323 lsy323 merged commit b4ba17b into master Feb 27, 2025
23 checks passed
@lsy323 lsy323 deleted the lsiyuan/profiler branch February 27, 2025 00:04
4 participants