Optimization for eliminating redundant memory operations #165

yyzdtccjdtc · 2022-05-24T02:39:00Z

Optimization for eliminating redundant memory operations related to this issue.
#163 (comment)

sharkdp · 2022-05-24T17:02:02Z

Nice. Can you tell us a bit more about how you benchmarked this?

yyzdtccjdtc · 2022-05-24T18:19:01Z

> Nice. Can you tell us a bit more about how you benchmarked this?

I tested it with the command `pastel distinct 2000 >/dev/null` and averaged the execution time. In my tests, the running time dropped from 9.54 seconds to 8.41 seconds, a 13.4% speedup.

I also compiled it with opt-level = 3.
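For reference, the arithmetic behind that percentage can be sketched as follows. The function names are hypothetical and only illustrate the two common ways of reporting the same pair of timings; the 13.4% figure corresponds to the ratio of the old time to the new time.

```rust
/// Percentage speedup when the run time drops from `before` to `after`
/// (both in seconds): (before / after - 1) * 100.
pub fn speedup_percent(before: f64, after: f64) -> f64 {
    (before / after - 1.0) * 100.0
}

/// Percentage of run time saved: (1 - after / before) * 100.
/// This is always a smaller number than the speedup for the same timings.
pub fn time_saved_percent(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}
```

Calling `speedup_percent(9.54, 8.41)` gives roughly 13.4, while `time_saved_percent(9.54, 8.41)` gives roughly 11.8, so it helps to say which of the two is being quoted.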

sharkdp · 2022-05-26T11:35:28Z

Unfortunately, I can not reproduce these benchmark results:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./pastel-master distinct 200` | 1.166 ± 0.023 | 1.141 | 1.208 | 1.00 |
| `./pastel-165 distinct 200` | 1.252 ± 0.023 | 1.215 | 1.298 | 1.07 ± 0.03 |

These results were created using a normal `cargo build --release` build. Benchmarking was done with hyperfine:

    hyperfine -L version master,165 --warmup 3 './pastel-{version} distinct 200' --export-markdown results.md

yyzdtccjdtc · 2022-05-26T17:24:56Z

> Unfortunately, I can not reproduce these benchmark results:
>
> | Command | Mean [s] | Min [s] | Max [s] | Relative |
> |:---|---:|---:|---:|---:|
> | `./pastel-master distinct 200` | 1.166 ± 0.023 | 1.141 | 1.208 | 1.00 |
> | `./pastel-165 distinct 200` | 1.252 ± 0.023 | 1.215 | 1.298 | 1.07 ± 0.03 |
>
> These results were created using a normal `cargo build --release` build. Benchmarking was done with hyperfine:
>
>     hyperfine -L version master,165 --warmup 3 './pastel-{version} distinct 200' --export-markdown results.md

I believe that if you change it to `distinct 2000`, you will see the difference.

sharkdp · 2022-05-26T17:53:17Z

pastel distinct 2000 is not really a realistic use-case, to be honest. Everything up to 100... maybe. But who wants to generate 2000 "visually distinct" colors?

yyzdtccjdtc · 2022-05-29T00:40:10Z

After looking carefully into the code, I found the problem. The Lab structures in the Vec<(Color, Lab)> are generated here:

pastel/src/distinct.rs

Lines 82 to 85 in 47f9ddd

    let colors = initial_colors
        .iter()
        .map(|c| (c.clone(), c.to_lab()))
        .collect();

So my get_labs function, which extracts the Lab structures from that vector again, is a redundant operation:

    pub fn get_labs(&self) -> Vec<Lab> {
        self.colors.iter().map(|(_, l)| l.clone()).collect()
    }

My current approach is to generate two separate vectors, labs and colors, directly when the SimulatedAnnealing structure is created.

According to my tests with `./pastel distinct 200 >results.md`, the average execution time decreased from 0.7158s to 0.6389s, which is a 12% speedup. Hopefully, your test will show the same results as mine :)
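As a rough illustration of the idea (not the actual patch), the two parallel vectors can be built in a single pass with `Iterator::unzip`. The `Color` and `Lab` types and the `to_lab` body below are simplified, hypothetical stand-ins for pastel's real types, and `split_colors` is a made-up helper name.

```rust
// Hypothetical stand-ins for pastel's `Color` and `Lab` types.
#[derive(Debug, Clone, PartialEq)]
pub struct Color {
    pub r: f64,
    pub g: f64,
    pub b: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Lab {
    pub l: f64,
    pub a: f64,
    pub b: f64,
    pub alpha: f64,
}

impl Color {
    // Placeholder conversion, NOT the real RGB-to-Lab transform.
    pub fn to_lab(&self) -> Lab {
        Lab { l: self.r, a: self.g, b: self.b, alpha: 1.0 }
    }
}

/// Build the colors and their Lab values once, as two parallel vectors,
/// so the hot loop can index the Lab vector directly without re-collecting it.
pub fn split_colors(initial_colors: &[Color]) -> (Vec<Color>, Vec<Lab>) {
    initial_colors
        .iter()
        .map(|c| (c.clone(), c.to_lab()))
        .unzip()
}
```

With this layout, index `i` in the first vector and index `i` in the second vector always refer to the same color, which is the invariant any code that swaps or replaces entries has to preserve.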

sharkdp · 2022-05-29T09:48:34Z

src/distinct.rs

@@ -65,7 +65,8 @@ pub struct SimulationParameters {
 }

 pub struct SimulatedAnnealing<R: Rng> {
-    colors: Vec<(Color, Lab)>,
+    colors: Vec<Color>,
+    labs: Vec<Lab>,


Can we maybe call it lab_values everywhere, instead of labs?

sharkdp · 2022-05-29T09:50:31Z

src/lib.rs

@@ -938,7 +938,7 @@ impl fmt::Display for LMS {
    }
 }

-#[derive(Debug, Clone, PartialEq)]
+#[derive(Debug, Clone, PartialEq, Copy)]


Do we really want to derive `Copy`? I'd prefer that we explicitly call .clone() when we really want a copy. The struct contains four 64-bit floats, so it's probably cheaper to usually pass it by reference, not by value, right?
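To make the trade-off concrete, here is a minimal sketch of a struct that derives `Clone` but not `Copy`. The `Lab` definition is a simplified stand-in with the four `f64` fields mentioned above, and `lighter_of` is a hypothetical helper used only for illustration.

```rust
// Simplified stand-in for the struct under review: four 64-bit floats.
// Deriving `Clone` without `Copy` means duplicates must be requested explicitly.
#[derive(Debug, Clone, PartialEq)]
pub struct Lab {
    pub l: f64,
    pub a: f64,
    pub b: f64,
    pub alpha: f64,
}

/// Takes both arguments by reference (an 8-byte pointer each on 64-bit
/// targets) and clones only the one value that has to be returned as owned.
pub fn lighter_of(x: &Lab, y: &Lab) -> Lab {
    if x.l >= y.l {
        x.clone() // the copy is visible at the call site
    } else {
        y.clone()
    }
}
```

On this definition `std::mem::size_of::<Lab>()` is 32 bytes, so every implicit by-value copy would move four floats. Whether that is actually slower than passing a reference depends on inlining and the target, so it is best checked by measurement.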

sharkdp · 2022-05-29T09:50:50Z

src/distinct.rs

        match self.distance_metric {
-            DistanceMetric::CIE76 => delta_e::cie76(&a.1, &b.1),
-            DistanceMetric::CIEDE2000 => delta_e::ciede2000(&a.1, &b.1),
+            DistanceMetric::CIE76 => delta_e::cie76(&a, &b),


Could you please fix the clippy warning here: "this expression creates a reference which is immediately dereferenced by the compiler"
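For context, this warning (clippy's `needless_borrow` lint) appears when a value that is already a reference is borrowed again. A minimal sketch, assuming `a` and `b` are already `&Lab` at that point; the `Lab` struct, the `cie76` body (plain Euclidean distance in Lab space), and the `distance` wrapper are simplified stand-ins, not the code under review.

```rust
// Simplified stand-in for the Lab struct.
pub struct Lab {
    pub l: f64,
    pub a: f64,
    pub b: f64,
}

/// CIE76 colour difference: Euclidean distance between two Lab values.
pub fn cie76(x: &Lab, y: &Lab) -> f64 {
    ((x.l - y.l).powi(2) + (x.a - y.a).powi(2) + (x.b - y.b).powi(2)).sqrt()
}

/// `a` and `b` are already references here.
pub fn distance(a: &Lab, b: &Lab) -> f64 {
    // Writing `cie76(&a, &b)` would still compile, because the resulting
    // `&&Lab` is automatically dereferenced to `&Lab`, but that extra `&`
    // is exactly what the lint reports. Passing the references directly
    // avoids the warning.
    cie76(a, b)
}
```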

sharkdp · 2022-05-29T09:50:58Z

src/distinct.rs

-            DistanceMetric::CIE76 => delta_e::cie76(&a.1, &b.1),
-            DistanceMetric::CIEDE2000 => delta_e::ciede2000(&a.1, &b.1),
+            DistanceMetric::CIE76 => delta_e::cie76(&a, &b),
+            DistanceMetric::CIEDE2000 => delta_e::ciede2000(&a, &b),


… and here.

sharkdp · 2022-05-29T09:51:20Z

Now I can reproduce those results - thank you!

yyzdtccjdtc · 2022-05-30T06:22:16Z

I renamed all occurrences of labs to lab_values and fixed the clippy warning. I also removed the derived `Copy` and call clone() explicitly instead.

> The struct contains 4 64bit floats, so it's probably cheaper to usually pass it by reference, not by value, right?

I also looked into this a bit. I want to state in advance that this is compiler-sensitive. According to the objdump output, if we use a reference like `let at_lab = &lab_values[color]`, the code first loads the address of lab_values[color], then pushes this address onto the stack, and finally loads the address back from the stack and loads the three values from it (the cie76 and ciede2000 functions only need the first three elements of the Lab struct for the calculation). That is a total of 6 memory operations.

But if we use copy or clone(), as in `let at_lab = lab_values[color].clone();`, the values are loaded directly with two movupd instructions, each of which can load 128 bits at a time. The code then performs the same three load operations as the reference version to do the calculation. That is a total of 5 memory operations, one fewer than with the reference.

The execution times bear this out: the average time with the reference is 0.68s (1.06x speedup), while the average time with clone is 0.64s (1.12x speedup).
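The two variants being compared can be sketched side by side as below. This is only an illustration under simplified assumptions: the `Lab` struct, the `cie76` body, and the two `min_distance_*` helpers are hypothetical and do not reproduce the loop in distinct.rs. Both functions compute the same result; any difference lies in the machine code the compiler happens to emit.

```rust
// Simplified stand-in for the Lab struct (four 64-bit floats).
#[derive(Debug, Clone, PartialEq)]
pub struct Lab {
    pub l: f64,
    pub a: f64,
    pub b: f64,
    pub alpha: f64,
}

/// Euclidean distance in Lab space; only the first three fields are read.
fn cie76(x: &Lab, y: &Lab) -> f64 {
    ((x.l - y.l).powi(2) + (x.a - y.a).powi(2) + (x.b - y.b).powi(2)).sqrt()
}

/// Variant 1: keep a reference into the vector.
pub fn min_distance_by_ref(lab_values: &[Lab], color: usize) -> f64 {
    let at_lab = &lab_values[color];
    lab_values
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != color)
        .map(|(_, other)| cie76(at_lab, other))
        .fold(f64::INFINITY, f64::min)
}

/// Variant 2: clone the element into a local value first.
pub fn min_distance_by_clone(lab_values: &[Lab], color: usize) -> f64 {
    let at_lab = lab_values[color].clone();
    lab_values
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != color)
        .map(|(_, other)| cie76(&at_lab, other))
        .fold(f64::INFINITY, f64::min)
}
```

Since the generated instructions depend on the compiler version, optimization level, and target CPU, a comparison like this is worth re-running with a benchmark tool on the actual build rather than relying on one disassembly.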

If you think this modification is too compiler-specific, I can remove it and keep only the code that separates the colors and lab_values vectors.

yyzdtccjdtc · 2022-06-03T17:09:47Z

@sharkdp Please tell me if there are any other changes I need to make.

sharkdp · 2022-06-04T12:16:42Z

No, looks good. Thank you very much!

yyzdtccjdtc added 4 commits May 22, 2022 13:48

optimization with redundant memory operations

5281d93

optimization for eliminating redundant memory operations

7c87d53

optimization for eliminating redundant memory operations

67528c1

optimization for eliminating redundant memory operations

a37303f

yyzdtccjdtc added 3 commits May 28, 2022 16:46

seperate Lab and Color as two vectors to improve performance

b1bdcf0

seperate Lab and Color as two vectors to improve performance

6d7ebe5

seperate Lab and Color as two vectors to improve performance

ab0bb16

sharkdp reviewed May 29, 2022

change labs to lab_values and use clone() instead of copy

b2ab910

sharkdp merged commit ed35893 into sharkdp:master Jun 4, 2022
