Skip to content

Added per tier pool class rolling average latency #96

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

guptask
Copy link
Collaborator

@guptask guptask commented Jul 23, 2022

This change is Reviewable

@guptask guptask marked this pull request as ready for review July 23, 2022 08:35
@guptask guptask requested a review from igchor July 23, 2022 08:35
Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @guptask and @igchor)


cachelib/common/RollingStats.h line 1 at r1 (raw file):

/*

"Also, have you measured how much performance impact this latency tracking have? Could you please add an option to config to enable/disable gathering this latency info?

Ans: I'm working on a compile type flag option.
"

Will you add this option to this PR?


cachelib/common/RollingStats.h line 35 at r1 (raw file):

    avg_ *= ratio;
    ++cnt_;
    avg_ += value / cnt_;

"
What happens if cnt_ overflows (division by 0?)
Ans: I checked. Even when cnt_ equals 0, the algorithm won't cause divide by 0 error.
"

How's that? I see there's value / cnt_.


cachelib/common/RollingStats.h line 40 at r1 (raw file):

  // Return the rolling average.
  uint64_t estimate() {
    return static_cast<uint64_t>(avg_);

I think it would be better to return double here. I know you need uint64_t later but this class might also be used for different purposes where double might be more appropriate


cachelib/common/RollingStats.h line 71 at r1 (raw file):

  }

  RollingLatencyTracker& operator=(RollingLatencyTracker&& rhs) noexcept {

didn't you say those are unncessary? (i.e. can't we delete move ctor and assignment operator?)

@guptask
Copy link
Collaborator Author

guptask commented Jul 27, 2022

cachelib/common/RollingStats.h line 1 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

"Also, have you measured how much performance impact this latency tracking have? Could you please add an option to config to enable/disable gathering this latency info?

Ans: I'm working on a compile type flag option.
"

Will you add this option to this PR?

I'm going to measure the performance impact first. If the impact is negligible, I'm in favor of not making it a compile time option. I'm waiting for my machine to be free; it is under use for background eviction data collection.

@guptask
Copy link
Collaborator Author

guptask commented Jul 27, 2022

cachelib/common/RollingStats.h line 40 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

I think it would be better to return double here. I know you need uint64_t later but this class might also be used for different purposes where double might be more appropriate

ok. I'll update it to double

@guptask
Copy link
Collaborator Author

guptask commented Jul 27, 2022

cachelib/common/RollingStats.h line 71 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

didn't you say those are unncessary? (i.e. can't we delete move ctor and assignment operator?)

I mentioned that about RollingStats. I have taken those out. I kept the assignment operator for RollingLatencyTracker based on comparison to the LatencyTracker (used by PercentileStats). I don't see it being used currently.

Copy link
Collaborator Author

@guptask guptask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @igchor)


cachelib/common/RollingStats.h line 35 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

"
What happens if cnt_ overflows (division by 0?)
Ans: I checked. Even when cnt_ equals 0, the algorithm won't cause divide by 0 error.
"

How's that? I see there's value / cnt_.

cnt_ is getting incremented before this line. cnt_ is currently uint64_t. Only scenario is can be 0 is if the value rolls around from 2^64-1.

@guptask guptask force-pushed the rolling_stats branch 2 times, most recently from 6fa3770 to 12e187d Compare July 27, 2022 16:06
@guptask
Copy link
Collaborator Author

guptask commented Jul 27, 2022

Command used :

./bin/cachebench --json_test_config test_configs/hit_ratio/graph_cache_leader_fbobj/config-8GB-DRAM.json
     -report_memory_usage_stats -progress 1 -progress_stats_file <per_sec_stats_txt> &> <overall_stats_txt>

The data on the right is with RollingLatencyTracker commented out.
The rolling latency print statement has an incorrect % sign at end. I'll remove it.

== Test Results ==                                              == Test Results ==
== Allocator Stats ==                                           == Allocator Stats ==
Items in RAM  : 17,016,761                                    | Items in RAM  : 17,036,539
Items in NVM  : 0                                               Items in NVM  : 0
Alloc Attempts: 68,161,566 Success: 100.00%                   | Alloc Attempts: 68,117,977 Success: 100.00%
RAM Evictions : 51,144,741                                    | RAM Evictions : 51,081,395
tid 0 free slabs : 0.00%                                        tid 0 free slabs : 0.00%
tid 0 pid 0 cid   0    64.00B memorySize:     0.00B             tid 0 pid 0 cid   0    64.00B memorySize:     0.00B
tid 0 pid 0 cid   1    96.00B memorySize:    76.00MB            tid 0 pid 0 cid   1    96.00B memorySize:    76.00MB
tid 0 pid 0 cid   2   144.00B memorySize:   248.00MB            tid 0 pid 0 cid   2   144.00B memorySize:   248.00MB
tid 0 pid 0 cid   3   216.00B memorySize:   544.00MB            tid 0 pid 0 cid   3   216.00B memorySize:   544.00MB
tid 0 pid 0 cid   4   328.00B memorySize:     1.47GB            tid 0 pid 0 cid   4   328.00B memorySize:     1.47GB
tid 0 pid 0 cid   5   496.00B memorySize:     1.93GB            tid 0 pid 0 cid   5   496.00B memorySize:     1.93GB
tid 0 pid 0 cid   6   744.00B memorySize:  1016.00MB            tid 0 pid 0 cid   6   744.00B memorySize:  1016.00MB
tid 0 pid 0 cid   7     1.09KB memorySize:   836.00MB         | tid 0 pid 0 cid   7     1.09KB memorySize:   840.00MB
tid 0 pid 0 cid   8     1.64KB memorySize:   416.00MB         | tid 0 pid 0 cid   8     1.64KB memorySize:   420.00MB
tid 0 pid 0 cid   9     2.46KB memorySize:   268.00MB         | tid 0 pid 0 cid   9     2.46KB memorySize:   272.00MB
tid 0 pid 0 cid  10     3.70KB memorySize:   304.00MB           tid 0 pid 0 cid  10     3.70KB memorySize:   304.00MB
tid 0 pid 0 cid  11     5.55KB memorySize:   268.00MB         | tid 0 pid 0 cid  11     5.55KB memorySize:   264.00MB
tid 0 pid 0 cid  12     8.32KB memorySize:    68.00MB           tid 0 pid 0 cid  12     8.32KB memorySize:    68.00MB
tid 0 pid 0 cid  13    12.48KB memorySize:    36.00MB           tid 0 pid 0 cid  13    12.48KB memorySize:    36.00MB
tid 0 pid 0 cid  14    18.78KB memorySize:    44.00MB           tid 0 pid 0 cid  14    18.78KB memorySize:    44.00MB
tid 0 pid 0 cid  15    28.24KB memorySize:    60.00MB         | tid 0 pid 0 cid  15    28.24KB memorySize:    64.00MB
tid 0 pid 0 cid  16    42.66KB memorySize:   412.00MB         | tid 0 pid 0 cid  16    42.66KB memorySize:   404.00MB
tid 0 pid 0 cid  17    64.00KB memorySize:    72.00MB         | tid 0 pid 0 cid  17    64.00KB memorySize:    64.00MB
tid 0 pid 0 cid  18    97.52KB memorySize:    28.00MB           tid 0 pid 0 cid  18    97.52KB memorySize:    28.00MB
tid 0 pid 0 cid  19   151.70KB memorySize:     8.00MB           tid 0 pid 0 cid  19   151.70KB memorySize:     8.00MB
tid 0 pid 0 cid  20   227.55KB memorySize:     0.00B            tid 0 pid 0 cid  20   227.55KB memorySize:     0.00B
tid 0 pid 0 cid  21   372.36KB memorySize:     0.00B            tid 0 pid 0 cid  21   372.36KB memorySize:     0.00B
tid 0 pid 0 cid  22   585.14KB memorySize:     0.00B            tid 0 pid 0 cid  22   585.14KB memorySize:     0.00B
tid 0 pid 0 cid  23     1.00MB memorySize:     0.00B            tid 0 pid 0 cid  23     1.00MB memorySize:     0.00B
tid 0 pid 0 cid   0    64.00B free: 100.00%                     tid 0 pid 0 cid   0    64.00B free: 100.00%
tid 0 pid 0 cid   1    96.00B free: 0.00%                       tid 0 pid 0 cid   1    96.00B free: 0.00%
tid 0 pid 0 cid   2   144.00B free: 0.00%                       tid 0 pid 0 cid   2   144.00B free: 0.00%
tid 0 pid 0 cid   3   216.00B free: 0.00%                       tid 0 pid 0 cid   3   216.00B free: 0.00%
tid 0 pid 0 cid   4   328.00B free: 0.00%                       tid 0 pid 0 cid   4   328.00B free: 0.00%
tid 0 pid 0 cid   5   496.00B free: 0.00%                       tid 0 pid 0 cid   5   496.00B free: 0.00%
tid 0 pid 0 cid   6   744.00B free: 0.01%                       tid 0 pid 0 cid   6   744.00B free: 0.01%
tid 0 pid 0 cid   7     1.09KB free: 0.02%                      tid 0 pid 0 cid   7     1.09KB free: 0.02%
tid 0 pid 0 cid   8     1.64KB free: 0.02%                      tid 0 pid 0 cid   8     1.64KB free: 0.02%
tid 0 pid 0 cid   9     2.46KB free: 0.02%                      tid 0 pid 0 cid   9     2.46KB free: 0.02%
tid 0 pid 0 cid  10     3.70KB free: 0.04%                      tid 0 pid 0 cid  10     3.70KB free: 0.04%
tid 0 pid 0 cid  11     5.55KB free: 0.06%                      tid 0 pid 0 cid  11     5.55KB free: 0.06%
tid 0 pid 0 cid  12     8.32KB free: 0.06%                      tid 0 pid 0 cid  12     8.32KB free: 0.06%
tid 0 pid 0 cid  13    12.48KB free: 0.03%                      tid 0 pid 0 cid  13    12.48KB free: 0.03%
tid 0 pid 0 cid  14    18.78KB free: 0.04%                      tid 0 pid 0 cid  14    18.78KB free: 0.04%
tid 0 pid 0 cid  15    28.24KB free: 0.02%                      tid 0 pid 0 cid  15    28.24KB free: 0.02%
tid 0 pid 0 cid  16    42.66KB free: 0.01%                      tid 0 pid 0 cid  16    42.66KB free: 0.01%
tid 0 pid 0 cid  17    64.00KB free: 0.00%                      tid 0 pid 0 cid  17    64.00KB free: 0.00%
tid 0 pid 0 cid  18    97.52KB free: 0.00%                      tid 0 pid 0 cid  18    97.52KB free: 0.00%
tid 0 pid 0 cid  19   151.70KB free: 0.00%                      tid 0 pid 0 cid  19   151.70KB free: 0.00%
tid 0 pid 0 cid  20   227.55KB free: 100.00%                    tid 0 pid 0 cid  20   227.55KB free: 100.00%
tid 0 pid 0 cid  21   372.36KB free: 100.00%                    tid 0 pid 0 cid  21   372.36KB free: 100.00%
tid 0 pid 0 cid  22   585.14KB free: 100.00%                    tid 0 pid 0 cid  22   585.14KB free: 100.00%
tid 0 pid 0 cid  23     1.00MB free: 100.00%                    tid 0 pid 0 cid  23     1.00MB free: 100.00%
tid 0 pid 0 cid   0    64.00B latency:     0.00%                tid 0 pid 0 cid   0    64.00B latency:     0.00%
tid 0 pid 0 cid   1    96.00B latency: 11586.37%              | tid 0 pid 0 cid   1    96.00B latency:     0.00%
tid 0 pid 0 cid   2   144.00B latency: 12463.21%              | tid 0 pid 0 cid   2   144.00B latency:     0.00%
tid 0 pid 0 cid   3   216.00B latency: 13428.45%              | tid 0 pid 0 cid   3   216.00B latency:     0.00%
tid 0 pid 0 cid   4   328.00B latency: 19003.28%              | tid 0 pid 0 cid   4   328.00B latency:     0.00%
tid 0 pid 0 cid   5   496.00B latency: 16499.70%              | tid 0 pid 0 cid   5   496.00B latency:     0.00%
tid 0 pid 0 cid   6   744.00B latency: 12304.09%              | tid 0 pid 0 cid   6   744.00B latency:     0.00%
tid 0 pid 0 cid   7     1.09KB latency: 11744.12%             | tid 0 pid 0 cid   7     1.09KB latency:     0.00%
tid 0 pid 0 cid   8     1.64KB latency: 11291.19%             | tid 0 pid 0 cid   8     1.64KB latency:     0.00%
tid 0 pid 0 cid   9     2.46KB latency: 11218.06%             | tid 0 pid 0 cid   9     2.46KB latency:     0.00%
tid 0 pid 0 cid  10     3.70KB latency: 11231.27%             | tid 0 pid 0 cid  10     3.70KB latency:     0.00%
tid 0 pid 0 cid  11     5.55KB latency: 11290.93%             | tid 0 pid 0 cid  11     5.55KB latency:     0.00%
tid 0 pid 0 cid  12     8.32KB latency: 11351.12%             | tid 0 pid 0 cid  12     8.32KB latency:     0.00%
tid 0 pid 0 cid  13    12.48KB latency: 11304.93%             | tid 0 pid 0 cid  13    12.48KB latency:     0.00%
tid 0 pid 0 cid  14    18.78KB latency: 11432.36%             | tid 0 pid 0 cid  14    18.78KB latency:     0.00%
tid 0 pid 0 cid  15    28.24KB latency: 11395.97%             | tid 0 pid 0 cid  15    28.24KB latency:     0.00%
tid 0 pid 0 cid  16    42.66KB latency: 11821.42%             | tid 0 pid 0 cid  16    42.66KB latency:     0.00%
tid 0 pid 0 cid  17    64.00KB latency: 11578.09%             | tid 0 pid 0 cid  17    64.00KB latency:     0.00%
tid 0 pid 0 cid  18    97.52KB latency: 11896.95%             | tid 0 pid 0 cid  18    97.52KB latency:     0.00%
tid 0 pid 0 cid  19   151.70KB latency:  8816.12%             | tid 0 pid 0 cid  19   151.70KB latency:     0.00%
tid 0 pid 0 cid  20   227.55KB latency: 23360.75%             | tid 0 pid 0 cid  20   227.55KB latency:     0.00%
tid 0 pid 0 cid  21   372.36KB latency:     0.00%               tid 0 pid 0 cid  21   372.36KB latency:     0.00%
tid 0 pid 0 cid  22   585.14KB latency:     0.00%               tid 0 pid 0 cid  22   585.14KB latency:     0.00%
tid 0 pid 0 cid  23     1.00MB latency:     0.00%               tid 0 pid 0 cid  23     1.00MB latency:     0.00%
Cache Gets    : 120,000,000                                     Cache Gets    : 120,000,000
Hit Ratio     :  43.20%                                       | Hit Ratio     :  43.24%
NVM Gets      :               0, Coalesced : 100.00%            NVM Gets      :               0, Coalesced : 100.00%
NVM Puts      :               0, Success   :   0.00%, Clean     NVM Puts      :               0, Success   :   0.00%, Clean  
NVM Evicts    :               0, Clean     : 100.00%, Unclean   NVM Evicts    :               0, Clean     : 100.00%, Unclean
NVM Deletes   :               0 Skipped Deletes: 100.00%        NVM Deletes   :               0 Skipped Deletes: 100.00%

== Throughput for  ==                                           == Throughput for  ==
Total Ops : 120.00 million                                      Total Ops : 120.00 million
Total sets: 68,161,566                                        | Total sets: 68,117,977
get       :   926,544/s, success   :  43.20%                  | get       :   934,599/s, success   :  43.24%
set       :   526,289/s, success   : 100.00%                  | set       :   530,525/s, success   : 100.00%
del       :         0/s, found     :   0.00%                    del       :         0/s, found     :   0.00%

@guptask
Copy link
Collaborator Author

guptask commented Jul 27, 2022

cachelib/common/RollingStats.h line 1 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
I'm going to measure the performance impact first. If the impact is negligible, I'm in favor of not making it a compile time option. I'm waiting for my machine to be free; it is under use for background eviction data collection.

Please refer to the results shared in a comment below. Based on these results, I think the performance footprint of rolling stats is quite lot.

@guptask guptask requested a review from igchor July 27, 2022 22:28
Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @guptask and @igchor)


cachelib/cachebench/cache/CacheStats.h line 162 at r3 (raw file):

        out << folly::sformat(
                   "tid{:2} pid{:2} cid{:4} {:8.2f}{} memorySize:{:8.2f}{} "
                   "free:{:4.2f}% latency:{:8.2f}ns",

Please extend the description - what latency is being reported?


cachelib/common/RollingStats.h line 1 at r1 (raw file):

Previously, guptask (Sounak Gupta) wrote…

Please refer to the results shared in a comment below. Based on these results, I think the performance footprint of rolling stats is quite lot.

Can you run it a few times and average the results? The difference is small enough to be just a run-to-run variance


cachelib/common/RollingStats.h line 35 at r1 (raw file):

Previously, guptask (Sounak Gupta) wrote…

cnt_ is getting in

Can you elaborate? I don;t understand "cnt_ is getting in".
Even if you increase cnt_ you can cause it overflow (when it's equal to std::numeric_limits<uint64_t>::max()) to 0.

@guptask guptask force-pushed the rolling_stats branch 4 times, most recently from a834f16 to b3ec830 Compare August 3, 2022 10:21
@guptask
Copy link
Collaborator Author

guptask commented Aug 3, 2022

cachelib/cachebench/cache/CacheStats.h line 162 at r3 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Please extend the description - what latency is being reported?

changed it to rollingAvgLatency

@guptask
Copy link
Collaborator Author

guptask commented Aug 3, 2022

cachelib/common/RollingStats.h line 1 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Can you run it a few times and average the results? The difference is small enough to be just a run-to-run variance

I ran it 3 times. The difference is small enough to be considered a runtime variance.

@guptask
Copy link
Collaborator Author

guptask commented Aug 3, 2022

cachelib/common/RollingStats.h line 35 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Can you elaborate? I don;t understand "cnt_ is getting in".
Even if you increase cnt_ you can cause it overflow (when it's equal to std::numeric_limits<uint64_t>::max()) to 0.

Sorry I accidentally pressed send before finishing the sentence. I added the full sentence in another comment which is not showing up in the Reviewable mode. I wrote:
"cnt_ is getting incremented before this line. cnt_ is currently uint64_t. Only scenario is can be 0 is if the value rolls around from 2^64-1."

You're talking about the same thing. So I added a clause that stops calculating rolling latency once cnt_ reaches numerical limit.

@guptask guptask requested a review from igchor August 3, 2022 10:43
Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 1 unresolved discussion (waiting on @guptask and @igchor)


cachelib/cachebench/cache/CacheStats.h line 162 at r3 (raw file):

Previously, guptask (Sounak Gupta) wrote…

changed it to rollingAvgLatency

I meant "latency of what"? You should specify that it's find latency (unless it is already displayed somewhere?)


cachelib/common/RollingStats.h line 35 at r1 (raw file):

Previously, guptask (Sounak Gupta) wrote…

Sorry I accidentally pressed send before finishing the sentence. I added the full sentence in another comment which is not showing up in the Reviewable mode. I wrote:
"cnt_ is getting incremented before this line. cnt_ is currently uint64_t. Only scenario is can be 0 is if the value rolls around from 2^64-1."

You're talking about the same thing. So I added a clause that stops calculating rolling latency once cnt_ reaches numerical limit.

I guess you could just start from 0 in that case. Instead of stopping, just set cnt_ to 0 and move on. You don't need this idel variable in that case as well.

@guptask
Copy link
Collaborator Author

guptask commented Aug 3, 2022

cachelib/cachebench/cache/CacheStats.h line 162 at r3 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

I meant "latency of what"? You should specify that it's find latency (unless it is already displayed somewhere?)

Done.

@guptask
Copy link
Collaborator Author

guptask commented Aug 3, 2022

cachelib/common/RollingStats.h line 35 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

I guess you could just start from 0 in that case. Instead of stopping, just set cnt_ to 0 and move on. You don't need this idel variable in that case as well.

done

Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 7 files reviewed, all discussions resolved (waiting on @igchor)

@guptask guptask merged commit 11fadaa into pmem:develop Aug 4, 2022
@guptask guptask deleted the rolling_stats branch August 4, 2022 17:36
danielobiri pushed a commit to danielobiri/CacheLib that referenced this pull request Oct 19, 2022
Summary:
fix comment typo

Pull Request resolved: facebook#96

Reviewed By: jiayuebao

Differential Revision: D33021002

Pulled By: haowu14

fbshipit-source-id: d01887b92e167c9f59af49d027d2377f4740ea26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants