-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization for eliminating redundant memory operations #165
Conversation
Nice. Can you tell us a bit more about how you benchmarked this? |
I tested it with the command line 'pastel distinct 2000 >/dev/null' and then took the average execution time. According to my test results, the running time was reduced from 9.54 seconds to 8.41 seconds, which has a 13.4% speedup. Also I compiled it with the opt-level = 3. |
Unfortunately, I can not reproduce these benchmark results:
These results were created using a normal
|
I believe if you try to change it to distinct 2000, you can see the difference. |
|
By carefully looking into the code I found the problem. The Lab structures in the Lines 82 to 85 in 47f9ddd
So my get_labs function to get the Lab structure out is a redundant operation.
My current approach is to generate two separate labs and colors vectors directly at the time of this SimulatedAnnealing structure creation. According to my tests with |
src/distinct.rs
Outdated
@@ -65,7 +65,8 @@ pub struct SimulationParameters { | |||
} | |||
|
|||
pub struct SimulatedAnnealing<R: Rng> { | |||
colors: Vec<(Color, Lab)>, | |||
colors: Vec<Color>, | |||
labs: Vec<Lab>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe call it lab_values
everywhere, instead of labs
?
src/lib.rs
Outdated
@@ -938,7 +938,7 @@ impl fmt::Display for LMS { | |||
} | |||
} | |||
|
|||
#[derive(Debug, Clone, PartialEq)] | |||
#[derive(Debug, Clone, PartialEq, Copy)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to derive copy? I'd prefer if we explicitly call .clone()
when we really want it. The struct contains 4 64bit floats, so it's probably cheaper to usually pass it by reference, not by value, right?
src/distinct.rs
Outdated
match self.distance_metric { | ||
DistanceMetric::CIE76 => delta_e::cie76(&a.1, &b.1), | ||
DistanceMetric::CIEDE2000 => delta_e::ciede2000(&a.1, &b.1), | ||
DistanceMetric::CIE76 => delta_e::cie76(&a, &b), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please fix the clippy warning here: "this expression creates a reference which is immediately dereferenced by the compiler"
src/distinct.rs
Outdated
DistanceMetric::CIE76 => delta_e::cie76(&a.1, &b.1), | ||
DistanceMetric::CIEDE2000 => delta_e::ciede2000(&a.1, &b.1), | ||
DistanceMetric::CIE76 => delta_e::cie76(&a, &b), | ||
DistanceMetric::CIEDE2000 => delta_e::ciede2000(&a, &b), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… and here.
Now I can reproduce those results - thank you! |
I changed all labs to lab_values and fixed the clippy warning. I also replaced the derive copy with clone() function at the same time.
I also looked into this issue a bit. I want to state in advance that this is a compiler sensitive issue.According to the output file of objdump, I found that if we use the reference like this But if we use copy() or clone() And the execution time also proves this, the average time of reference is 0.68s (1.06x speedup), while the average time of clone is 0.64s (1.12x speedup). If you think this modification is not so generic, I can also remove it and keep only the code related to the optimization of the separation of colors and lab_values vectors. |
@sharkdp Please tell me if there are any other changes I need to make? |
No, looks good. Thank you very much! |
Optimization for eliminating redundant memory operations related to this issue.
#163 (comment)