Add Memory benchmarks #107

Closed
zbraniecki opened this issue May 28, 2020 · 14 comments · Fixed by #446
Assignees
Labels
A-performance Area: Performance (CPU, Memory) C-meta Component: Relating to ICU4X as a whole T-enhancement Type: Nice-to-have but not required
Milestone

Comments

@zbraniecki
Member

I'm going to try to add memory/size benchmarks to our cargo bench.

For size, I'm going to add a simple example binary that will pull ICU4X and we'll measure the impact of that.

@zbraniecki zbraniecki added the A-performance Area: Performance (CPU, Memory) label May 28, 2020
@zbraniecki zbraniecki self-assigned this May 28, 2020
@zbraniecki zbraniecki added the T-enhancement Type: Nice-to-have but not required label May 28, 2020
@sffc sffc added the C-meta Component: Relating to ICU4X as a whole label May 28, 2020
@zbraniecki
Member Author

Seems I was a bit optimistic. Criterion doesn't have an easy way to capture custom values from benchmarks yet - bheisler/criterion.rs#97

@sffc
Member

sffc commented Jun 2, 2020

By compiling to WASM, you can get some good data on code size. For memory usage, is there a Rust equivalent of valgrind (or does valgrind work with Rust code)?

@Manishearth
Member

valgrind works great. I recall there being some issues with zero-sized types in the past but I think those are fixed.

@sffc sffc added this to the 2020 Q3 milestone Jun 17, 2020
@sffc sffc modified the milestones: 2020 Q3, ICU4X 0.1 Sep 11, 2020
@zbraniecki zbraniecki modified the milestones: ICU4X 0.1, ICU4X 0.2 Oct 9, 2020
@zbraniecki zbraniecki added help wanted Issue needs an assignee good first issue Good for newcomers labels Oct 15, 2020
@sffc sffc assigned sffc and unassigned zbraniecki Oct 24, 2020
@sffc sffc modified the milestones: ICU4X 0.2, 2020 Q4 Oct 24, 2020
@sffc sffc removed good first issue Good for newcomers help wanted Issue needs an assignee labels Oct 24, 2020
@sffc sffc assigned zbraniecki and unassigned sffc Nov 19, 2020
@zbraniecki
Member Author

We should use njn's recommendations - https://nnethercote.github.io/perf-book/introduction.html

@gregtatum
Member

I was talking to @zbraniecki about this, and I'm interested in taking on this work.

@gregtatum
Member

I've been researching using dhat to collect these metrics. It looks like this can get the information we need, but it will require a bit of work to get it integrated into CI.

➤ cargo run --example work_log --release
    Finished release [optimized] target(s) in 0.09s
     Running `target/release/examples/work_log`
0) Sep 8, 2001, 6:46 PM
1) Jul 13, 2017, 7:40 PM
2) Sep 13, 2020, 5:26 AM
3) Jan 6, 2021, 10:13 PM
4) May 2, 2021, 5:00 PM
5) Aug 26, 2021, 10:46 AM
6) Nov 20, 2021, 3:33 AM
7) Apr 14, 2022, 10:20 PM
8) Aug 8, 2022, 4:06 PM
9) May 17, 2033, 8:33 PM
dhat: Total:     21,642 bytes in 133 blocks
dhat: At t-gmax: 9,584 bytes in 91 blocks
dhat: At t-end:  1,112 bytes in 3 blocks
dhat: The data in dhat-heap.json is viewable with dhat/dh_view.html

Total is the cumulative number of bytes allocated over the entire run of the program, here 21,642 bytes. t-gmax is the point at which the most memory was allocated at a single time, here 9,584 bytes. t-end is the number of bytes still allocated when the dhat reference is dropped, here 1,112 bytes.

At this point, I'll need to figure out how to programmatically gather this information. I took some time to understand the output JSON format, and all of this information is easily accessible.
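As a rough illustration of gathering these numbers programmatically, the three summary lines printed above can also be scraped from the captured output instead of reading the JSON file. This is only a sketch: the `HeapStat` type and the function names are made up for this example, and it assumes the summary keeps the exact `dhat: <label>: <bytes> bytes in <blocks> blocks` shape shown in the run above.

```rust
/// One summary line from the dhat output, for example
/// `dhat: At t-gmax: 9,584 bytes in 91 blocks`.
/// (Hypothetical type, not part of dhat itself.)
#[derive(Debug, PartialEq)]
struct HeapStat {
    label: String,
    bytes: u64,
    blocks: u64,
}

/// Parses a number that uses `,` as a thousands separator.
fn parse_grouped(s: &str) -> Option<u64> {
    s.replace(',', "").parse().ok()
}

/// Parses a single `dhat: <label>: <bytes> bytes in <blocks> blocks` line.
/// Returns `None` for any line that does not have that shape.
fn parse_dhat_line(line: &str) -> Option<HeapStat> {
    let rest = line.trim().strip_prefix("dhat: ")?;
    let (label, counts) = rest.split_once(':')?;
    let mut words = counts.split_whitespace();
    let bytes = parse_grouped(words.next()?)?;
    if words.next()? != "bytes" || words.next()? != "in" {
        return None;
    }
    let blocks = parse_grouped(words.next()?)?;
    if words.next()? != "blocks" {
        return None;
    }
    Some(HeapStat {
        label: label.trim().to_string(),
        bytes,
        blocks,
    })
}

/// Collects every summary line found in the captured program output,
/// skipping the example's own output and the trailing viewer hint.
fn parse_dhat_summary(output: &str) -> Vec<HeapStat> {
    output.lines().filter_map(parse_dhat_line).collect()
}
```

Scraping text is more brittle than reading the JSON, but it needs no knowledge of the JSON schema and no extra dependencies.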

Some negatives here: each file we want to test will need to be manually instrumented, since dhat-rs can't be attached from the outside. It's a pretty simple setup, but it will pollute the examples a bit. In addition, it prints additional information to stdout. The JSON is written to a fixed location, so a script runner will need to clean up after each run.
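The cleanup step for the fixed output location could look roughly like the sketch below. It assumes the file is named `dhat-heap.json` (as in the output above) and lands in the working directory of the run; `collect_dhat_output` and the `<example>.json` naming scheme are hypothetical choices for this example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File name taken from the dhat output shown above.
const DHAT_OUTPUT: &str = "dhat-heap.json";

/// Moves `dhat-heap.json` out of `work_dir` into `out_dir/<example>.json`
/// so that the next run starts from a clean directory.
/// Returns `Ok(None)` if the run did not write a profile.
/// Note: `fs::rename` can fail if the two directories are on different
/// file systems; a copy followed by a delete would be needed in that case.
fn collect_dhat_output(
    work_dir: &Path,
    example: &str,
    out_dir: &Path,
) -> io::Result<Option<PathBuf>> {
    let src = work_dir.join(DHAT_OUTPUT);
    if !src.exists() {
        return Ok(None);
    }
    fs::create_dir_all(out_dir)?;
    let dest = out_dir.join(format!("{}.json", example));
    fs::rename(&src, &dest)?;
    Ok(Some(dest))
}
```

A runner would call this once after each instrumented example finishes, which also keeps one profile per example rather than letting each run overwrite the last.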

As for a more detailed analysis, I found the valgrind viewer to be a bit difficult to work with, so doing anything other than a direct measurement may be a bit difficult.

Here is a commit showing the work so far:

https://github.com/unicode-org/icu4x/compare/master...gregtatum:dhat?expand=1

@gregtatum
Member

Here is a hosted version of the viewer: https://gregtatum.github.io/dhat-viewer/dh_view.html

Here is an example run, which will need to be unzipped and then loaded in the viewer:
dhat-heap.json.zip

I also hacked it into the Firefox Profiler format to explore a bit more what the data looks like and what's available in the analysis. The data only works in the call tree view.

https://share.firefox.dev/3hPBRKc

@zbraniecki
Member Author

zbraniecki commented Jan 4, 2021

Some negatives here: each file we want to test will need to be manually instrumented, since dhat-rs can't be attached from the outside. It's a pretty simple setup, but it will pollute the examples a bit. In addition, it prints additional information to stdout. The JSON is written to a fixed location, so a script runner will need to clean up after each run.

Could we have a script that injects dhat into examples? It seems that all it would take is:

  1. Make dhat an optional dependency and list it under required-features for that example
  2. In CI, take any example, add two blocks:
#[global_allocator]
static ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;
let _dhat = dhat::Dhat::start_heap_profiling();

and done.
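A very naive version of that injection step, done as plain string manipulation, might look like the sketch below. The two inserted snippets are the ones quoted above (newer releases of the dhat crate may use different names); the `inject_dhat` function is hypothetical, and it assumes the example contains a literal `fn main() {` line with no attributes or doc comments directly above it.

```rust
/// Global allocator item, inserted just before `fn main`.
/// It is not prepended to the file, so any `#![...]` inner attributes
/// or `//!` doc comments at the top of the example stay first.
const ALLOCATOR_BLOCK: &str =
    "#[global_allocator]\nstatic ALLOCATOR: dhat::DhatAlloc = dhat::DhatAlloc;\n\n";

/// Statement inserted as the first line of `main`.
const START_LINE: &str = "    let _dhat = dhat::Dhat::start_heap_profiling();\n";

/// The exact text this sketch looks for (an assumption about the examples).
const MAIN_HEADER: &str = "fn main() {\n";

/// Returns the instrumented source, or `None` if no `fn main() {` line
/// was found and the file should be left untouched.
fn inject_dhat(source: &str) -> Option<String> {
    let main_start = source.find(MAIN_HEADER)?;
    let body_start = main_start + MAIN_HEADER.len();

    let mut out =
        String::with_capacity(source.len() + ALLOCATOR_BLOCK.len() + START_LINE.len());
    out.push_str(&source[..main_start]);
    out.push_str(ALLOCATOR_BLOCK);
    out.push_str(&source[main_start..body_start]);
    out.push_str(START_LINE);
    out.push_str(&source[body_start..]);
    Some(out)
}
```

A syntax-aware tool would be more robust than matching on text, but this shows how small the change to each example is.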

@gregtatum
Member

https://github.com/google/rerast

Rerast looks like it could handle the code modding.

@sffc
Member

sffc commented Jan 5, 2021

I think Valgrind can be used directly on Rust binaries, too. It takes machine code binaries and runs them in a virtualized environment to track heap allocations.

$ cargo build --examples
$ valgrind ./target/debug/examples/work_log
==1828164== Memcheck, a memory error detector
==1828164== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1828164== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==1828164== Command: ./target/debug/examples/work_log
==1828164== 

====== Work Log (en) example ============
0) Sep 8, 2001, 6:46 PM
1) Jul 13, 2017, 7:40 PM
2) Sep 13, 2020, 5:26 AM
3) Jan 6, 2021, 10:13 PM
4) May 2, 2021, 5:00 PM
5) Aug 26, 2021, 10:46 AM
6) Nov 20, 2021, 3:33 AM
7) Apr 14, 2022, 10:20 PM
8) Aug 8, 2022, 4:06 PM
9) May 17, 2033, 8:33 PM
==1828164== 
==1828164== HEAP SUMMARY:
==1828164==     in use at exit: 1,232 bytes in 6 blocks
==1828164==   total heap usage: 137 allocs, 131 frees, 16,121 bytes allocated
==1828164== 
==1828164== LEAK SUMMARY:
==1828164==    definitely lost: 0 bytes in 0 blocks
==1828164==    indirectly lost: 0 bytes in 0 blocks
==1828164==      possibly lost: 0 bytes in 0 blocks
==1828164==    still reachable: 1,232 bytes in 6 blocks
==1828164==         suppressed: 0 bytes in 0 blocks
==1828164== Rerun with --leak-check=full to see details of leaked memory
==1828164== 
==1828164== For lists of detected and suppressed errors, rerun with: -s
==1828164== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

There are ways to make it output additional metrics or print them in a more machine-readable format.
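Even without any extra flags, the HEAP SUMMARY printed above can be scraped. The following is a minimal sketch that assumes the `total heap usage:` line keeps the format shown in this run; `ValgrindHeapUsage` and both function names are invented for this example.

```rust
/// Figures from Valgrind's `total heap usage:` line.
/// (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
struct ValgrindHeapUsage {
    allocs: u64,
    frees: u64,
    bytes_allocated: u64,
}

/// Parses a field such as `16,121 bytes allocated` and checks that the
/// word after the number is the expected unit.
fn leading_number(field: &str, unit: &str) -> Option<u64> {
    let mut words = field.split_whitespace();
    let n = words.next()?.replace(',', "").parse().ok()?;
    if words.next()? == unit {
        Some(n)
    } else {
        None
    }
}

/// Finds the `total heap usage:` line in captured Valgrind output and
/// extracts the three counters. Returns `None` if the line is missing
/// or does not have the expected shape.
fn parse_total_heap_usage(output: &str) -> Option<ValgrindHeapUsage> {
    const MARKER: &str = "total heap usage:";
    let line = output.lines().find(|l| l.contains(MARKER))?;
    let rest = &line[line.find(MARKER)? + MARKER.len()..];

    // Fields are separated by ", " (comma plus space); the thousands
    // separator inside a number has no space after it.
    let mut fields = rest.split(", ");
    let allocs = leading_number(fields.next()?, "allocs")?;
    let frees = leading_number(fields.next()?, "frees")?;
    let bytes_allocated = leading_number(fields.next()?, "bytes")?;
    Some(ValgrindHeapUsage {
        allocs,
        frees,
        bytes_allocated,
    })
}
```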

@gregtatum
Member

I think Valgrind can be used directly on Rust binaries

One drawback is that Valgrind is Linux-only. That would be fine for running it in CI, but the results could only be reproduced on Linux machines.

@sffc sffc changed the title Add Memory/Size benchmarks Add Memory benchmarks Jan 7, 2021
@sffc sffc modified the milestones: 2020 Q4, 2021-Q1-m1 Jan 7, 2021
@gregtatum
Member

gregtatum commented Jan 7, 2021

I got the charts generating today:

[image: the generated charts]

Here is an example run, the artifacts can be downloaded: https://github.com/gregtatum/icu4x/actions/runs/470094202

I ended up forking the benchmarking library in order to get generic reporting working, by creating an ndjson tool target. https://github.com/gregtatum/github-action-benchmark

I'll try and upstream the changes, but the repo doesn't look like it's actively maintained anymore.

Here are the TODOs remaining:

  • Create a memory build profile, that's optimized but has debug symbols
  • Add a build step to the CI pipeline (I couldn't get this working, as caching wasn't working as I expected)
  • Look into code modding to inject dhat into examples
  • Make sure the github pages are being generated correctly
  • Create a PR for the benchmarking library
  • Look into generating the charts for macOS and Windows (is this needed?)
  • Move all benchmarking files into a common directory

4 participants