Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance and code style, refactor, partial rewrite #17

Merged
merged 31 commits into from
Jan 19, 2024
Merged

Improve performance and code style, refactor, partial rewrite #17

merged 31 commits into from
Jan 19, 2024

Conversation

mforthewin
Copy link
Contributor

The program in its current state takes a lot of time to execute compared to programs like ps.

The reasons for this behavior are for example:

  • The usage of nftw(), which results in LOTS of unnecessary newfstatat() syscalls
  • The reading of files one byte at a time
  • Possibly also regex, as this is not needed for the simple format of "/proc/<pid>/stat"

This becomes clear when profiling the syscalls:

$ strace -c zps -s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 69.37    1.386260           4    333730           newfstatat
 10.63    0.212523           8     25761         1 getdents64
 10.41    0.208056           3     68370           read
  3.89    0.077740           5     14517      1171 openat
  3.39    0.067740           2     25748           fcntl
  2.12    0.042305           3     13346           close
  0.10    0.002090           2       705           write
  0.05    0.000917           8       107           brk
  0.01    0.000256          85         3           mprotect
  0.01    0.000139         139         1           munmap
  0.00    0.000098          12         8           mmap
  0.00    0.000039          39         1           clock_gettime
  0.00    0.000038          38         1           getrandom
  0.00    0.000035          35         1           prlimit64
  0.00    0.000034          34         1           rseq
  0.00    0.000021          10         2         1 arch_prctl
  0.00    0.000020          20         1           dup2
  0.00    0.000018          18         1           set_tid_address
  0.00    0.000018          18         1           set_robust_list
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    1.998347           4    482309      1174 total

Those are almost 500k syscalls! If we compare this to the new version of the code, the difference is huge:

$ strace -c build/zps -s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 39.79    0.003438           9       358           read
 37.96    0.003280           9       362           openat
 22.07    0.001907           5       362           close
  0.13    0.000011           3         3           write
  0.05    0.000004           2         2           getdents64
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 ioctl
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         2           dup2
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         4           newfstatat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           prlimit64
  0.00    0.000000           0         1           getrandom
  0.00    0.000000           0         1           rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.008640           7      1120         3 total

Also looking at real execution times, we can see the following speedup:

$ time zps -s

real	0m1.106s
user	0m0.320s
sys	0m0.743s
$ time build/zps -s

real	0m0.011s
user	0m0.003s
sys	0m0.008s

Another aspect is that having a -f flag along with nftw() is not sensible, as this could happen on a box with 2 GB of RAM:

[arch@archlinux ~]$ ./zps -f 111111111111111
PID     PPID    STATE               NAME COMMAND
Killed
[arch@archlinux ~]$ sudo dmesg | tail -n 5
[ 5856.621333] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-1.scope,task=zps,pid=24883,uid=1000
res=success'
[arch@archlinux ~]$ ./zps -f 11111111111111111
PID     PPID    STATE               NAME COMMAND
=================================================================
==24937==ERROR: AddressSanitizer: out of memory: allocator is trying to allocate 0x136b39e38 bytes
    #0 0x7fedfcee1359 in __interceptor_malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x7fedfcd1fa6e  (/usr/lib/libc.so.6+0x101a6e) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658)

==24937==HINT: if you don't care about these errors you may set allocator_may_return_null=1
SUMMARY: AddressSanitizer: out-of-memory /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:69 in __interceptor_malloc
==24937==ABORTING

As we can see, it is trying to allocate more than 5.21 GB and the program is getting OOM-killed! This defeats the actual purpose of the program (freeing resources). Thus, I propose these changes to fix any issues of performance or resource usage. Additionally, I addressed TODO#2, and the program now sends a SIGCHLD signal by default (without the reap option). Also, as the static global array in the program was not optimal, a simple vector implementation is used now, eliminating the overflow issues from before. Please see commit b20d954 for more specific details.

Matthäus Wininger added 14 commits December 7, 2023 23:19
This commit addresses TODO#2 by universalizing the functions handling
the zombie processes and sending `SIGCHLD` in the case where the
user did not explicitly tell the program to reap zombies (reflected
in the `terminate` member of our settings. Whenever a signal
is sent, the used signal is displayed to the user.

Some parts of old functions were put into new functions to also
improve readability.

The most substantial improvement lies in the performance aspect
of the program. The use of `nftw()` in the old code was the
biggest bottleneck, as an insane, unnecessary number of syscall
are issued for each process. Thus, the directory and file reading
part was adapted, using manual `readdir()` calls. As the `-f` flag
did not make sense to have in the program anyway (and could lead
to OOM kill), it was also dropped.

Regex was also dropped, as its use was unnecessary and complicated
things, and the parsing of `"/proc/<pid>/stat"` was rewritten.

As kernel processes were not excluded in the program before,
this commit checks for processes which should be left out
of our considerations (PID `2`, as well as its children;
`init` is also excluded).

Display widths for the output are now defined in the accompanying
header, and should be in line with the `proc(5)` documentation.

The code for the prompt was improved with respect to the handling
of invalid user input and coloring.

Finally, to completely remove the unnecessary globals, a vector
structure was defined in the header, which can dynamically grow.
As the old version of the code did not check for overflows in this
context, this is also a great improvement.
@codecov-commenter
Copy link

codecov-commenter commented Dec 18, 2023

Codecov Report

Attention: 31 lines in your changes are missing coverage. Please review.

Comparison is base (f15cd84) 96.18% compared to head (30985e8) 86.80%.

Files Patch % Lines
src/zps.c 88.29% 24 Missing ⚠️
src/zps.h 73.07% 7 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #17      +/-   ##
==========================================
- Coverage   96.18%   86.80%   -9.38%     
==========================================
  Files           1        2       +1     
  Lines         131      235     +104     
==========================================
+ Hits          126      204      +78     
- Misses          5       31      +26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mforthewin mforthewin marked this pull request as draft December 18, 2023 05:40
@mforthewin mforthewin marked this pull request as ready for review December 18, 2023 06:32
@orhun
Copy link
Owner

orhun commented Dec 21, 2023

Oh wow, this is huge! zps was my first C project when I started college so it is good to see someone taking it apart years later 🐻

I believe there are a lot of things that I will learn from this code so I will review it carefully. Obviously I'm not a C guru so expect some beginner questions :)

Anyhoo, thanks a lot for this. I just tested it and the speed difference is freaking unbelievable ⚡

src/zps.h Outdated Show resolved Hide resolved
mforthewin and others added 3 commits December 21, 2023 14:26
Co-authored-by: Orhun Parmaksız <orhunparmaksiz@gmail.com>
This will disable the pre-condition assertions in all functions,
which should never be false for releases anyway.
Matthäus Wininger added 2 commits January 8, 2024 23:12
This commit ...

* allows users to directly specify signals to
send to zombie parents (using common signal
numbers, abbreviations and abbreviations with
"sig" in front of them (case-insensitive)

* changes existing command-line options to simpler
ones

* places constraints on the possible combinations
of command-line options used

* adds functionality to detect if the program's
output is, for example, piped to another program
that will automatically disable ANSI color codes

* adds a `--no-color` option to manually disable
the default coloring

The manual page and READMEs would still need
to be updated.
@mforthewin mforthewin changed the title Improve performance and code style, change display, send a SIGCHLD signal by default and remove the -f flag Improve performance and code style, refactor, partial rewrite Jan 8, 2024
@mforthewin mforthewin marked this pull request as draft January 8, 2024 22:28
@mforthewin mforthewin requested a review from orhun January 8, 2024 22:28
@mforthewin
Copy link
Contributor Author

I've also made some changes to the CLI options, adding some new functionalities, such as a signal option. Let me know what you think – the man page and the READMEs would need an update before we should merge this, so I marked it as a draft again.

@orhun
Copy link
Owner

orhun commented Jan 9, 2024

Looks good! Let's go ahead and finalize this, I can provide a review soon.

@mforthewin mforthewin marked this pull request as ready for review January 12, 2024 16:16
@mforthewin
Copy link
Contributor Author

Looks good! Let's go ahead and finalize this, I can provide a review soon.

Alright! I've updated the remaining files and also added updated demo GIFs.

README.md Outdated Show resolved Hide resolved
src/zps.c Show resolved Hide resolved
src/zps.c Show resolved Hide resolved
src/zps.c Show resolved Hide resolved
@mforthewin
Copy link
Contributor Author

Is there anything left to do? Personally, I think this is in a good state now for merging.

@orhun
Copy link
Owner

orhun commented Jan 19, 2024

Yes, this is awesome, thanks a lot once again!

I will create a release now and share it on my socials tomorrow.

@orhun orhun merged commit 76ffe80 into orhun:master Jan 19, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants