
[Linux] Getting a process' swap usage is much slower than possible #2173

joostmeulenbeld opened this issue Nov 17, 2022

Summary

Getting a process' swap usage is slow because psutil reads the /proc/<pid>/smaps_rollup file. It can be made much faster (5x-4000x) by reading the /proc/<pid>/status file instead.

  • OS: Linux
  • Type: performance

Description

I'm interested in the swap usage of my process. To get this information from psutil, I can call psutil.Process().memory_full_info().swap. However, this internally reads the /proc/<pid>/smaps_rollup file (as found in the source), which takes quite a while to read if the process has a lot of allocated memory: for a process with 34 GB of RAM allocated it takes 180 ms on my system. For my use case, where I want to check memory usage regularly (i.e. once per second), this is too slow.
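For reference, a minimal sketch of timing the current path using the documented psutil API (the absolute numbers will of course vary with machine and allocation size):

import time

import psutil

p = psutil.Process()  # current process
t = time.perf_counter()
swap = p.memory_full_info().swap  # internally reads /proc/<pid>/smaps_rollup
print(f"swap: {swap} bytes, read in {time.perf_counter() - t:e} s")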

Possible solution

The swap usage of a process can also be retrieved from the /proc/<pid>/status file by extracting the VmSwap entry. Reading the status file is much faster than reading the smaps_rollup file: depending on how much memory the process has allocated, it is 5x-4000x as fast in my benchmarks (see below).
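A minimal sketch of that approach (get_swap is a hypothetical helper, not existing psutil API; the VmSwap field is reported in kB):

import os


def get_swap(pid):
    """Return the swap usage of pid in bytes, parsed from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status", "rb") as f:
        for line in f:
            if line.startswith(b"VmSwap:"):
                # the line looks like: b"VmSwap:     1234 kB\n"
                return int(line.split()[1]) * 1024
    return 0  # defensive default in case the VmSwap field is absent


print(get_swap(os.getpid()))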

Implementation

I'm not too sure about the implementation. Integrating it into memory_info() would make that function slower, even if all the information it currently provides were retrieved from status instead of statm. A separate function on the Process class is another possibility.

I have yet to check if the status file also contains all info required for the memory_full_info() method.

Benchmark

The Python script below shows the difference in read times of the three files at three points in time:

  • At the start of the script
  • After allocating a large object (34 GB in this case)
  • After allocating an additional small object (5 kB)

To test a file, set the path variable to it and run the script. Running the script once per file makes sure no caching happens between the different files.

I ran this on a laptop with 32GB physical RAM and 16GB swap.

import os
import time

fpath_statm = f"/proc/{os.getpid()}/statm"
fpath_status = f"/proc/{os.getpid()}/status"
fpath_smaps_rollup = f"/proc/{os.getpid()}/smaps_rollup"

path = fpath_statm


def benchmark():
    for _ in range(2):  # do it twice to show effect of caching
        t = time.perf_counter()
        with open(path, "rb") as f:
            f.read()
        print(f"    {path} read time: {time.perf_counter() - t:e}")


print("Read times with small memory usage")
benchmark()

large_string = "a" * 34_000_000_000  # 34GB string
print("Read times with large memory usage")
benchmark()

small_string = "a" * 5000  # 5kB - small but larger than page size
print("Read times after allocating an extra small string")
benchmark()

Output for the smaps_rollup file:

Read times with small memory usage
    /proc/143543/smaps_rollup read time: 5.002200e-05
    /proc/143543/smaps_rollup read time: 2.946800e-05
Read times with large memory usage
    /proc/143543/smaps_rollup read time: 1.303925e-02
    /proc/143543/smaps_rollup read time: 4.718400e-05
Read times after allocating an extra small string
    /proc/143543/smaps_rollup read time: 4.437401e-05
    /proc/143543/smaps_rollup read time: 4.429000e-05

Output for the status file:

Read times with small memory usage
    /proc/143278/status read time: 3.837700e-05
    /proc/143278/status read time: 2.317900e-05
Read times with large memory usage
    /proc/143278/status read time: 1.318608e-02
    /proc/143278/status read time: 4.085100e-05
Read times after allocating an extra small string
    /proc/143278/status read time: 4.049400e-05
    /proc/143278/status read time: 3.431900e-05

Output for the statm file:

Read times with small memory usage
    /proc/144956/statm read time: 4.296300e-05
    /proc/144956/statm read time: 2.520000e-05
Read times with large memory usage
    /proc/144956/statm read time: 1.455946e-02
    /proc/144956/statm read time: 3.651700e-05
Read times after allocating an extra small string
    /proc/144956/statm read time: 1.933900e-05
    /proc/144956/statm read time: 1.408800e-05

Observations:

  • The status file is always much faster to read than the smaps_rollup file (see the comparison sketch after this list).
  • After allocating a large object, both files take longer to read.
  • There is some caching happening: reading the same file a second time takes less time. The smaps_rollup file keeps taking longer to read after a large memory allocation, while the status file becomes fast again on the second read and stays fast when an extra small amount of memory is allocated.
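To tie the observations back to the original problem, here is a quick end-to-end comparison of the two paths (vmswap_bytes is a hypothetical helper along the lines of the sketch above; absolute timings will vary):

import time

import psutil


def vmswap_bytes(pid):
    # hypothetical helper: extract the "VmSwap: <n> kB" line from /proc/<pid>/status
    with open(f"/proc/{pid}/status") as f:
        line = next((l for l in f if l.startswith("VmSwap:")), "VmSwap: 0 kB")
    return int(line.split()[1]) * 1024


p = psutil.Process()

t = time.perf_counter()
swap = p.memory_full_info().swap  # current path: reads smaps_rollup
print(f"smaps_rollup: {swap} bytes in {time.perf_counter() - t:e} s")

t = time.perf_counter()
swap = vmswap_bytes(p.pid)  # proposed path: reads status
print(f"status:       {swap} bytes in {time.perf_counter() - t:e} s")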