add workqueue latency observation tool #4897

Merged
merged 2 commits into from
Feb 12, 2024

Conversation

jackygam2001
Contributor

Add a tool to observe how long work items wait on the kernel's workqueues.

Collaborator

@yonghong-song yonghong-song left a comment

I think this is a useful tool. In production we sometimes wonder whether workqueue latency is too long, which increases the chance of race conditions, etc.

Two things:
First, could you add the tool, with a brief description, to README.md (CPU and Scheduler Tools)?
Second, sometimes we want to identify which original task enqueued the work that saw the long latency. So it might be useful to add another distribution factor: tid (together with its comm).

What do you think?

@jackygam2001
Contributor Author

Thanks for your suggestions. I don't fully understand the second one: do you mean adding a '-T TID' option to this tool to filter by the ID of the thread that queued the work?

@yonghong-song
Collaborator

Currently we have the '-W' option to print a separate histogram for each workqueue. So we have

   workqueue1:
      <histogram>
   workqueue2:
      <histogram>
   ...

What I mean is to add a '-P' option to print a separate histogram for each PID. (I think PID (process id) granularity is good enough.)

   pid1:
      <histogram>
   pid2:
      <histogram>

If both -W and -P are specified, we can have

   workqueue1:
      pid1:
         <histogram>
      pid2:
         <histogram>
   workqueue2:
      pid3:
         <histogram>
      pid4:
         <histogram>
   ...

Similar to the workqueue options, '-p <pid>' would also be supported, for when we only care about a particular pid. So we would support the following combinations:

   . -W && -P
   . -W && -p <pid>
   . -w <workqueue> && -P
   . -w <workqueue> && -p <pid>

What do you think?
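
The option combinations proposed above can be sketched in plain Python. This is only an illustration of the grouping idea, not the tool's code: '-P' and '-p' are the proposal from this thread (they were not adopted), and the sample tuples, `build_parser`, `log2_slot`, and `group` are hypothetical names introduced here.

```python
import argparse
from collections import defaultdict


def build_parser():
    # -W / -w follow the options discussed in this thread.
    # -P / -p are only the reviewer's proposal and are hypothetical here.
    p = argparse.ArgumentParser(description="group workqueue latency samples (sketch)")
    p.add_argument("-W", "--workqueues", action="store_true",
                   help="one histogram per workqueue")
    p.add_argument("-w", "--workqueue", help="only this workqueue")
    p.add_argument("-P", "--pids", action="store_true",
                   help="one histogram per PID (proposed)")
    p.add_argument("-p", "--pid", type=int, help="only this PID (proposed)")
    return p


def log2_slot(usecs):
    # Slot 0 holds a latency of 0; slot n >= 1 holds [2**(n-1), 2**n).
    return int(usecs).bit_length()


def group(samples, args):
    """Group (workqueue, pid, latency_usecs) samples into log2 histograms.

    The key is (workqueue or None, pid or None), depending on -W and -P.
    -w and -p only filter; they do not change the key.
    """
    hists = defaultdict(lambda: defaultdict(int))
    for wq, pid, usecs in samples:
        if args.workqueue and wq != args.workqueue:
            continue
        if args.pid is not None and pid != args.pid:
            continue
        key = (wq if args.workqueues else None,
               pid if args.pids else None)
        hists[key][log2_slot(usecs)] += 1
    return hists
```

For example, parsing `["-W", "-P"]` yields one histogram per (workqueue, pid) pair, while `["-w", "events", "-P"]` keeps only samples from the `events` workqueue and splits them by PID.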

@jackygam2001
Contributor Author

The -P option you mentioned does not make sense in some cases, since the kernel may queue work to a workqueue from interrupt context. In that case the PID in the histogram is not the real process that queued the work, right?

@yonghong-song
Collaborator

Good point. Let us not do the PID thing for now. I will investigate what percentage of work items are queued from process context, softirq context, or other contexts. I agree that if the majority do not come from process context, a PID histogram is probably not useful. Even if a good share of work items do come from process context, we might still want to filter out those that do not.
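
The investigation described above amounts to tallying samples by the context they were queued from and, if a PID view is kept, dropping the samples that did not come from process context. A minimal sketch follows, assuming each sample has already been labelled with its context; the labels, the sample layout, and both function names are hypothetical, and how a tracer would obtain the context is not covered here.

```python
from collections import Counter


def context_shares(samples):
    """Return the percentage of samples per queueing context.

    `samples` is an iterable of (context, latency_usecs) pairs, where
    `context` is a hypothetical label such as "process" or "softirq".
    """
    counts = Counter(ctx for ctx, _ in samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ctx: 100.0 * n / total for ctx, n in counts.items()}


def process_only(samples):
    # Keep only samples queued from process context, where the current
    # PID is meaningful for a per-PID histogram.
    return [s for s in samples if s[0] == "process"]
```

If the share for process context turns out to be small, a per-PID histogram would describe only a small part of the queued work.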

@yonghong-song yonghong-song merged commit f798668 into iovisor:master Feb 12, 2024
dkruces pushed a commit to dkruces/bcc that referenced this pull request Nov 28, 2024
Add workqueue latency observation tool.
dkruces pushed a commit to dkruces/bcc that referenced this pull request Mar 18, 2025
Add workqueue latency observation tool.