memcpy performance varies wildly depending on file size and also from run to run #29

@travisdowns

When benchmarking files of different sizes, I saw a huge variation in memcpy performance. On my machine a "large" memcpy (i.e., much larger than L3, like 100 MB) runs at about 10-11 GB/s, and lzbench often reports that, but for even larger files the performance often drops by an order of magnitude (e.g., to 1 GB/s). The effect isn't consistent: for very large files (say 1 GB) it usually happens, and for smaller files it usually doesn't, but there are exceptions on both sides (e.g., if you run it a few times with smaller files, some runs will show bad performance).

Back-to-back runs often show improvement, e.g., run 1 might get you 1 GB/s, then 2 GB/s, then 5 GB/s, and then it stays there.

Similarly, the slowdown sometimes affected only the "compression" side of memcpy, sometimes only the "decompression" side, and often both (i.e., you'd get something like 1 GB/s compression and 10 GB/s decompression, or vice versa).

I traced this down to the way the buffers are allocated: the file and comp buffers use malloc and the decomp buffer uses calloc. The issue is that for large mallocs (and sometimes for large callocs) the memory isn't committed by the OS up front; it is committed on first access. So the first algorithm to run pays a large penalty to page in the entire buffer.
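To see the first-access penalty in isolation, here is a minimal sketch (not lzbench code). The names `now_seconds`, `time_two_copies` and `copy_fn` are made up for illustration, and it assumes a C11 libc that provides `timespec_get`. It copies into a freshly malloc'ed destination twice and records the time of each pass; the source is filled first, so only the destination pages are untouched when timing starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Calling memcpy through a volatile function pointer stops the compiler
   from treating the first copy as a dead store and removing it. */
static void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

/* Wall-clock seconds via C11 timespec_get (not a monotonic clock, which
   is acceptable for a rough illustration). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: copy n bytes into a fresh destination twice and
   report the duration of each pass. Returns 0 on success, -1 on failure. */
static int time_two_copies(size_t n, double *first, double *second) {
    assert(first != NULL && second != NULL);

    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }
    memset(src, 1, n); /* writing src touches its pages; dst stays untouched */

    double t0 = now_seconds();
    copy_fn(dst, src, n); /* first pass: may have to fault in all of dst */
    double t1 = now_seconds();
    copy_fn(dst, src, n); /* second pass: dst has already been written once */
    double t2 = now_seconds();

    int ok = memcmp(src, dst, n) == 0;
    free(src);
    free(dst);

    *first = t1 - t0;
    *second = t2 - t1;
    return ok ? 0 : -1;
}
```

Calling `time_two_copies` with a size well above L3 (say 1 GB) and comparing `first` with `second` should show whether the first pass pays for paging in the destination on a given machine. The size of the gap depends on the allocator and OS, so treat any numbers as machine-specific.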

So why doesn't this always bite? Why does the performance differ from run to run? It comes down to the DEFAULT_LOOP_TIME (100 ms): if an algorithm executes in less than that, it gets a second run, which will run at full speed, and since FASTEST is the default mode for picking a time, you get a full-speed result. So somewhere between 100 MB and 1,000 MB on my box, the first memcpy run starts taking more than 100 ms, so it doesn't get a second run and the slow time is reported.
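The interaction between the loop time and the "fastest" policy can be modelled with a small simulation. This is a simplified sketch and not lzbench's actual loop: `fastest_within_budget` is a hypothetical function, the per-run durations are passed in rather than measured, and the rule "keep running while the elapsed time is under the budget" is an assumption based on the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate a benchmark loop: execute runs (durations given in seconds)
   until the time budget is used up, and report the fastest run seen.
   At least one run always executes. */
static double fastest_within_budget(const double *run_times, size_t n_runs,
                                    double budget, size_t *runs_done) {
    assert(run_times != NULL && runs_done != NULL && n_runs > 0);

    double elapsed = 0.0;
    double fastest = 0.0;
    size_t i = 0;
    do {
        double t = run_times[i];
        if (i == 0 || t < fastest) {
            fastest = t; /* keep the minimum, as a "fastest" mode would */
        }
        elapsed += t;
        i++;
    } while (i < n_runs && elapsed < budget);

    *runs_done = i;
    return fastest;
}
```

With a 0.1 s budget, a simulated cold run of 0.05 s followed by warm runs of 0.005 s leaves room for the warm runs, so the cold run is hidden and 0.005 s is reported. A cold run of 1.0 s uses up the budget on its own, so it is the only run and the slow time is what gets reported.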
