When benchmarking files of different sizes, I saw a huge variation in `memcpy` performance. On my machine, "large" `memcpy` (i.e., much larger than L3, like 100 MB) runs at about 10-11 GB/s, and lzbench often reports that, but for even larger files the performance frequently drops by an order of magnitude (e.g., 1 GB/s). The effect isn't consistent: for very large files (say 1 GB) it usually happens, and for smaller files it usually doesn't, but there are exceptions on both sides (e.g., if you run it a few times with smaller files, you'll get some runs with bad performance).
Back-to-back runs often show gradual improvement, e.g., run 1 might give you 1 GB/s, then 2 GB/s, then 5 GB/s, and then it stays there.
Similarly, the slowdown sometimes affected only the "compression" side of `memcpy`, sometimes only the "decompression" side (i.e., you'd get something like 1 GB/s compression and 10 GB/s decompression, or vice versa), and often both.
I traced this down to the way the buffers are allocated: the file and compression buffers use `malloc` and the decompression buffer uses `calloc`. The issue is that for large `malloc`s (and sometimes for large `calloc`s) the memory isn't committed by the OS up front; it is only committed on first access. So the first algorithm to run pays a large penalty to page in the entire buffer.
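
To illustrate the commit-on-first-touch behavior, here's a minimal standalone C program (my own demo, not lzbench code; the 1 GB size is arbitrary) that times a first pass over a freshly `malloc`'d buffer against a second pass over the same, now-resident, memory:

```c
/* Demo: first touch of a large malloc'd buffer vs. a second pass.
 * On Linux/glibc, large allocations come from mmap'd zero pages that
 * are only committed on first write, so pass 1 pays the page-fault cost. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (1024UL * 1024 * 1024)  /* 1 GB, large enough to see the effect */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    char *buf = malloc(BUF_SIZE);
    if (!buf) return 1;

    double t0 = now_sec();
    memset(buf, 1, BUF_SIZE);   /* first touch: faults in every page */
    double t1 = now_sec();
    memset(buf, 2, BUF_SIZE);   /* second touch: pages already resident */
    double t2 = now_sec();

    printf("first touch:  %.2f GB/s\n", BUF_SIZE / (t1 - t0) / 1e9);
    printf("second touch: %.2f GB/s\n", BUF_SIZE / (t2 - t1) / 1e9);

    free(buf);
    return 0;
}
```

On a typical Linux box the first pass is dramatically slower, because every page has to be faulted in and zeroed by the kernel before the write can proceed.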
So why doesn't this always bite, and why does the performance differ from run to run? It comes down to `DEFAULT_LOOP_TIME` (100 ms): if an algorithm executes in less than that, it gets a second run, which runs at full speed, and since `FASTEST` is the default mode for picking a time, you get a full-speed result. Somewhere between 100 MB and 1,000 MB on my box, the first `memcpy` run starts taking more than 100 ms, so it doesn't get a second run and the slow time is reported.
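
Here's a rough sketch of that policy as I understand it; `DEFAULT_LOOP_TIME` and the `FASTEST` mode are real lzbench names, but everything else below (`bench_fastest`, `nanotime`, the loop structure) is my own simplified reconstruction, not the actual source:

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define DEFAULT_LOOP_TIME_NS (100ULL * 1000 * 1000)  /* the 100 ms budget */

static uint64_t nanotime(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Re-run `run` until the time budget is spent and report the fastest
 * iteration. A first run that alone exceeds 100 ms exits the loop
 * immediately, so the cold, page-faulting time is the one reported. */
static uint64_t bench_fastest(void (*run)(void))
{
    uint64_t best = UINT64_MAX, total = 0;
    do {
        uint64_t t0 = nanotime();
        run();
        uint64_t elapsed = nanotime() - t0;
        if (elapsed < best)
            best = elapsed;
        total += elapsed;
    } while (total < DEFAULT_LOOP_TIME_NS);
    return best;
}

static void dummy_work(void) { /* stand-in for one compress/decompress pass */ }

int main(void)
{
    printf("fastest iteration: %llu ns\n",
           (unsigned long long)bench_fastest(dummy_work));
    return 0;
}
```

With this structure, a fast first run leaves budget for warm re-runs that mask the page-in cost, while a slow first run consumes the whole budget and its cold time is what `FASTEST` ends up picking.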