Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proof of Concept] Initial support for parallelization using Rayon #74

Merged

Conversation

soumyasen1809
Copy link
Contributor

Initial commit for issue #72

  • Used rayon for parallelization.
  • Used Parallel traits.
  • Added parallel version of some functions in Vector and Matrix.
  • Added doctests accordingly.

- Will be used for parallelization
- Parallelized impl FPVector for Vec<f64>
using Rayon.
- Added Sync+Send traits are needed
- Used rayon to parallelize vector operations
like innerprod, crossprod
- Used rayon to parallelize impl Matrix
- Added rayon support for Matrix
for traits like LinearOps, Normed etc
- Used rayon to parallelize Functional
Programming for Matrix
- Added parallel version of required traits
for vec and matrix
- Adjusted commented sections of the code
- Using Rayon to parallelize further traits
for matrix and vector
{
self.iter().fold(init.into(), |x, &y| f(x, y))

// Parallel version unimplemented
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am struggling to make a parallel version of reduce, to be honest. The commented out part of the code is my attempt, but the tests fail.

// .all(|(x, y)| nearly_eq(x, y))
// && self.row == other.row

// Note: the sequential version in this case can be replaced by the parallel version?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented part above is the sequential part that we currently have. The below part is the parallelized part. How can we have impl PartialEq for Matrix and have both seq and par versions?
If it is not possible, then can we have the parallel version by default??

Comment on lines 2382 to 2394
// Note: Parallel version - how to add the implementation below?
// let mut result = par_matrix(self.data.clone(), self.row, self.col, self.shape);
// result
// .data
// .par_iter_mut()
// .enumerate()
// .for_each(|(idx, value)| {
// let i = idx / self.col;
// let j = idx % self.col;

// *value += other[(i, j)];
// });
// result
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented part above is the parallel version. The above commented part is the parallelized part. How can we have impl Add<Matrix> for Matrix and have both seq and par versions?
If it is not possible, then can we have the parallel version by default??

This applies to Neg and Sub too.

@GComitini
Copy link
Contributor

GComitini commented Oct 4, 2024

Hi @soumyasen1809. Sorry to intrude, but I had a look at this PR (I find your parallelization proposal very interesting) and I would advise you to start benchmarking your results before doing further work. Some (not a lot, to be fair) experience with trying to parallelize Rust operations using rayon tells me that parallelizing such low-level stuff as element-wise assignments, additions and comparisons will in fact worsen performance by a very large factor, instead of improving it. The operations you should be parallelizing are higher-level ones. At the very minimum, you should make sure that each operation being done in parallel is made up of many (and I stress many) lower-level operations carried out in series, so that the time spent by a single core in executing each job is much larger than the time spent overall by the CPU to set up the parallel computation. This set up effort has a very non-negligible cost.

Of course I may be wrong, in which case I'm more than happy to be, since I'd like to see some form of parallelization in Peroxide too :-)

@Axect
Copy link
Owner

Axect commented Oct 5, 2024

Dear @soumyasen1809,

Thank you for your continued efforts on parallelizing operations in Peroxide. Your dedication to improving the library is truly appreciated. I've been out all day and haven't had the chance to review everything in detail, but I've looked at several files and would like to share some thoughts.

Firstly, regarding your comments about implementing both sequential and parallel versions:

  1. As we discussed previously, setting parallel operations as the default might lead to inefficient computation flows. Users could unknowingly use parallel processing for tasks where it's unnecessary, potentially depleting resources needed for more demanding operations.

  2. I agree with @GComitini's observation that parallelizing low-level operations like element-wise assignments, additions, and comparisons might actually degrade performance significantly. It's crucial to focus on parallelizing higher-level operations where the benefits can outweigh the overhead of setting up parallel computation.

Given these considerations, I believe the best approach would be to merge only those changes that demonstrate actual performance improvements through benchmarking. While it may require extra effort, using tools like hyperfine, criterion, or others to perform benchmarks before merging changes would be beneficial for Peroxide in the long run. If you find implementing benchmarks challenging, I'd be happy to assist with this when I have some free time.

I want to express my sincere gratitude for your continued contributions. Your commitment to improving Peroxide is truly valuable. However, I noticed that the current commit history includes many unrelated modifications, making it challenging to identify the core changes. While I take responsibility for not having set up Cargo fmt earlier, it appears that your text editor might be automatically formatting files upon saving, leading to changes even in files like Cargo.toml.

To facilitate easier review and collaboration, it would be incredibly helpful if you could refine the pull request to include only the necessary changes related to the parallelization work. This would allow us to focus on the core improvements you've made.

Once again, thank you for your hard work and dedication to Peroxide. I look forward to seeing the refined version of your pull request and potentially incorporating these performance enhancements into the library.

soumyasen1809 and others added 4 commits October 5, 2024 17:20
- Benchmarking Rayon usage for matrix
- Adding benchmark for rayon usage in matrix
- Added benchmark for Rayon usage in matrix
- Saved results in benches/data
@soumyasen1809
Copy link
Contributor Author

Hi @GComitini @Axect

Thank you for your suggestions. You are right. I should have benchmarked first to check if Rayon is really helping. Sorry for that.

I have pushed a matrix_benchmark.rs in the benches directory (see here in the current PR). The results are saved in benches/data directory (see here).

You were right. The matrix operations do not benefit from Rayon. Only norm_lpq, norm_l1 and inner_prod benefit. For all my benchmarking I have used a 1000x1000 matrix and the criterion library.

I am really sorry for pushing the changes which are of no use.

@Axect regarding the rust-fmt, I am not sure how to go back. Is it possible for you to add the rust-fmt to the current repo and then everyone rebases on it. In my opinion, we should be using rust-fmt sooner or later. I guess there should be a single command or something that formats the entire repo.

@soumyasen1809
Copy link
Contributor Author

@Axect 1 small administrative request. Can you assign this PR or issue to me. It will be easy for me to track. Thanks :)

@Axect Axect added the enhancement New feature or request label Oct 5, 2024
@soumyasen1809
Copy link
Contributor Author

Hi @Axect
So, I removed the "Rayon-parallelized" code that showed poorer benchmarking performance. Let me know if that looks alright.

Also, regarding the rust-fmt, I am not sure how to go back. Is it possible for you to add the rust-fmt to the current repo and then everyone rebases on it. In my opinion, we should be using rust-fmt sooner or later. I guess there should be a single command or something that formats the entire repo.

src/structure/matrix.rs Outdated Show resolved Hide resolved
@Axect
Copy link
Owner

Axect commented Oct 14, 2024

Thank you for your contribution. I've reviewed most of the code changes you've made, and I have some feedback to share. There are both major and minor issues I'd like to address.

Major Issues:

  1. Vector Implementation:
    I noticed you've implemented several parallel functions for Vector. While some, like par_fmap, are fundamentally efficient when parallelized, others such as par_max, par_min, par_rank, and par_argmin don't seem to offer significant benefits over their serial counterparts in the current implementation. (It's worth noting that the existing Algorithm trait was initially created as an experimental feature and isn't optimized for efficiency. This could be addressed in a separate issue later.) Furthermore, par_take and par_skip appear to have no meaningful performance advantages.

    My suggestion would be to keep only the parallel functions that offer substantial benefits, such as par_fmap, par_zip_with, and par_filter for Vector.

  2. par_reduce Implementation:
    I saw from the comments that the implementation of par_reduce was unsuccessful. I believe this could be resolved by wrapping the reduce_with function from the rayon crate.

  3. Direct Dependency on rayon:
    The current implementation introduces a direct dependency on rayon for Peroxide. This could negatively impact compile times for users who don't require parallel functionality. I propose that after merging this PR, we introduce a 'parallel' feature flag to make this dependency optional.

Minor Issues:

  1. Matrix Implementation:
    The par_matrix function still remains in the Matrix implementation. This should be addressed.

  2. Code Formatting:
    While using cargo fmt or rust-fmt is generally good practice, it can lead to unrelated changes being committed alongside actual modifications. In the future, it would be better to make formatting changes in a separate commit. For now, we'll accept the current changes, but please be mindful of this in future contributions. In particular, the changes to the toml file seem to be due to editor settings rather than rust-fmt, so please check your editor configuration.

Could you please address these issues? Once these changes are made, I believe we'll be in a good position to merge this PR and significantly enhance Peroxide's capabilities. Let me know if you have any questions or need clarification on any points.

- Removed methods in ParallelAlgorithm trait
since they don't seem to offer significant
benefits over their serial counterparts in
the current implementation.
- Removed some ParallelFPVector trait methods
for same reason as above
@soumyasen1809 soumyasen1809 marked this pull request as draft October 14, 2024 14:46
- Implemented par_reduce operation using
reduce_with () from rayon crate
- Added rayon and criterion as dependencies
- par_matrix removed from matrix.rs
and benchmark
@soumyasen1809 soumyasen1809 marked this pull request as ready for review October 14, 2024 15:23
@soumyasen1809
Copy link
Contributor Author

Hi @Axect
thank you for your feedback. Since I am still learning, these reviews help me a lot to improve.

Vector Implementation:
My suggestion would be to keep only the parallel functions that offer substantial benefits, such as par_fmap, par_zip_with, and par_filter for Vector.

Done

par_reduce Implementation:
I saw from the comments that the implementation of par_reduce was unsuccessful. I believe this could be resolved by wrapping the reduce_with function from the rayon crate.

Thank you for this. Yes, I was able to solve it using reduce_with.

Direct Dependency on rayon:
I propose that after merging this PR, we introduce a 'parallel' feature flag to make this dependency optional.

I agree too with this approach.

Matrix Implementation:
The par_matrix function still remains in the Matrix implementation. This should be addressed.

Addressed in 863ad6a

Code Formatting:
In particular, the changes to the toml file seem to be due to editor settings rather than rust-fmt, so please check your editor configuration.

I reverted it back in the commit: 3737099

@Axect
Copy link
Owner

Axect commented Oct 15, 2024

Thank you for your incredibly swift feedback! I'm impressed by how quickly you addressed these issues. Your consistent and rapid contributions are truly appreciated.

I've reviewed all the changes you've made, and I must say, everything looks well-organized and clean. You've done an excellent job of implementing the suggested modifications.

Given that all the major points have been addressed satisfactorily, I believe we're now ready to move forward. I'll proceed with merging these changes into the main branch.

As for the next step, I'll be taking on the task of introducing the 'parallel' feature flag myself. This will allow us to cleanly separate the parallel functionality and make it optional for users who don't require it.

Once again, thank you for your diligence and responsiveness. Your work is significantly improving Peroxide's capabilities and usability. If you have any final thoughts or concerns before we merge, please don't hesitate to share them. Otherwise, I look forward to integrating these enhancements into our project.

@soumyasen1809
Copy link
Contributor Author

@Axect I guess the PR is in a good shape to be merged. Next, you can add the parallel flag.

@Axect Axect merged commit fc41e3c into Axect:72-add-support-for-parallelization Oct 16, 2024
@soumyasen1809 soumyasen1809 deleted the rayon_test branch October 18, 2024 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants