-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Proof of Concept] Initial support for parallelization using Rayon #74
[Proof of Concept] Initial support for parallelization using Rayon #74
Conversation
- Will be used for parallelization
- Parallelized impl FPVector for Vec<f64> using Rayon. - Added Sync+Send traits are needed
- Used rayon to parallelize vector operations like innerprod, crossprod
- Used rayon to parallelize impl Matrix
- Added rayon support for Matrix for traits like LinearOps, Normed etc
- Used rayon to parallelize Functional Programming for Matrix
- Added parallel version of required traits for vec and matrix - Adjusted commented sections of the code
- Using Rayon to parallelize further traits for matrix and vector
src/structure/vector.rs
Outdated
{ | ||
self.iter().fold(init.into(), |x, &y| f(x, y)) | ||
|
||
// Parallel version unimplemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am struggling to make a parallel version of reduce, to be honest. The commented out part of the code is my attempt, but the tests fail.
src/structure/matrix.rs
Outdated
// .all(|(x, y)| nearly_eq(x, y)) | ||
// && self.row == other.row | ||
|
||
// Note: the sequential version in this case can be replaced by the parallel version? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commented part above is the sequential part that we currently have. The below part is the parallelized part. How can we have impl PartialEq for Matrix
and have both seq and par versions?
If it is not possible, then can we have the parallel version by default??
src/structure/matrix.rs
Outdated
// Note: Parallel version - how to add the implementation below? | ||
// let mut result = par_matrix(self.data.clone(), self.row, self.col, self.shape); | ||
// result | ||
// .data | ||
// .par_iter_mut() | ||
// .enumerate() | ||
// .for_each(|(idx, value)| { | ||
// let i = idx / self.col; | ||
// let j = idx % self.col; | ||
|
||
// *value += other[(i, j)]; | ||
// }); | ||
// result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commented part above is the parallel version. The above commented part is the parallelized part. How can we have impl Add<Matrix> for Matrix
and have both seq and par versions?
If it is not possible, then can we have the parallel version by default??
This applies to Neg
and Sub
too.
Hi @soumyasen1809. Sorry to intrude, but I had a look at this PR (I find your parallelization proposal very interesting) and I would advise you to start benchmarking your results before doing further work. Some (not a lot, to be fair) experience with trying to parallelize Rust operations using rayon tells me that parallelizing such low-level stuff as element-wise assignments, additions and comparisons will in fact worsen performance by a very large factor, instead of improving it. The operations you should be parallelizing are higher-level ones. At the very minimum, you should make sure that each operation being done in parallel is made up of many (and I stress many) lower-level operations carried out in series, so that the time spent by a single core in executing each job is much larger than the time spent overall by the CPU to set up the parallel computation. This set up effort has a very non-negligible cost. Of course I may be wrong, in which case I'm more than happy to be, since I'd like to see some form of parallelization in Peroxide too :-) |
Dear @soumyasen1809, Thank you for your continued efforts on parallelizing operations in Peroxide. Your dedication to improving the library is truly appreciated. I've been out all day and haven't had the chance to review everything in detail, but I've looked at several files and would like to share some thoughts. Firstly, regarding your comments about implementing both sequential and parallel versions:
Given these considerations, I believe the best approach would be to merge only those changes that demonstrate actual performance improvements through benchmarking. While it may require extra effort, using tools like hyperfine, criterion, or others to perform benchmarks before merging changes would be beneficial for Peroxide in the long run. If you find implementing benchmarks challenging, I'd be happy to assist with this when I have some free time. I want to express my sincere gratitude for your continued contributions. Your commitment to improving Peroxide is truly valuable. However, I noticed that the current commit history includes many unrelated modifications, making it challenging to identify the core changes. While I take responsibility for not having set up Cargo fmt earlier, it appears that your text editor might be automatically formatting files upon saving, leading to changes even in files like Cargo.toml. To facilitate easier review and collaboration, it would be incredibly helpful if you could refine the pull request to include only the necessary changes related to the parallelization work. This would allow us to focus on the core improvements you've made. Once again, thank you for your hard work and dedication to Peroxide. I look forward to seeing the refined version of your pull request and potentially incorporating these performance enhancements into the library. |
- Benchmarking Rayon usage for matrix
- Adding benchmark for rayon usage in matrix
- Added benchmark for Rayon usage in matrix - Saved results in benches/data
Added benchmarks for rayon
Hi @GComitini @Axect Thank you for your suggestions. You are right. I should have benchmarked first to check if Rayon is really helping. Sorry for that. I have pushed a You were right. The matrix operations do not benefit from Rayon. Only I am really sorry for pushing the changes which are of no use. @Axect regarding the |
@Axect 1 small administrative request. Can you assign this PR or issue to me. It will be easy for me to track. Thanks :) |
Hi @Axect Also, regarding the |
Thank you for your contribution. I've reviewed most of the code changes you've made, and I have some feedback to share. There are both major and minor issues I'd like to address. Major Issues:
Minor Issues:
Could you please address these issues? Once these changes are made, I believe we'll be in a good position to merge this PR and significantly enhance Peroxide's capabilities. Let me know if you have any questions or need clarification on any points. |
- Removed methods in ParallelAlgorithm trait since they don't seem to offer significant benefits over their serial counterparts in the current implementation. - Removed some ParallelFPVector trait methods for same reason as above
- Implemented par_reduce operation using reduce_with () from rayon crate
- Added rayon and criterion as dependencies
- par_matrix removed from matrix.rs and benchmark
Hi @Axect
Done
Thank you for this. Yes, I was able to solve it using
I agree too with this approach.
Addressed in 863ad6a
I reverted it back in the commit: 3737099 |
Thank you for your incredibly swift feedback! I'm impressed by how quickly you addressed these issues. Your consistent and rapid contributions are truly appreciated. I've reviewed all the changes you've made, and I must say, everything looks well-organized and clean. You've done an excellent job of implementing the suggested modifications. Given that all the major points have been addressed satisfactorily, I believe we're now ready to move forward. I'll proceed with merging these changes into the main branch. As for the next step, I'll be taking on the task of introducing the 'parallel' feature flag myself. This will allow us to cleanly separate the parallel functionality and make it optional for users who don't require it. Once again, thank you for your diligence and responsiveness. Your work is significantly improving Peroxide's capabilities and usability. If you have any final thoughts or concerns before we merge, please don't hesitate to share them. Otherwise, I look forward to integrating these enhancements into our project. |
@Axect I guess the PR is in a good shape to be merged. Next, you can add the |
Initial commit for issue #72