This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Generate reduce #960

Merged 17 commits into sbromberger:master on Sep 9, 2018

Conversation

@SohamTamba (Contributor) commented Jul 31, 2018

mapreduce expects the other processes to have access to the data; otherwise, it executes sequentially.

I used @threads instead to run it in parallel.
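
For illustration, a minimal sketch of the threaded approach (the function name and details are illustrative, not the exact code in this PR; it assumes reps >= 1 and that gen_func(g) returns a vector):

using Base.Threads: @threads

function generate_min_set_threaded(g, gen_func, reps::Integer)
    first_result = gen_func(g)                      # run once to learn the result type
    results = Vector{typeof(first_result)}(undef, reps)
    results[1] = first_result
    @threads for i in 2:reps
        results[i] = gen_func(g)                    # iterations are independent, so threading is safe
    end
    return reduce((x, y) -> length(x) < length(y) ? x : y, results)   # keep the shortest result
end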

cc @somil55

@codecov bot commented Jul 31, 2018

Codecov Report

Merging #960 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #960      +/-   ##
==========================================
+ Coverage   99.85%   99.85%   +<.01%     
==========================================
  Files          78       79       +1     
  Lines        2669     2677       +8     
==========================================
+ Hits         2665     2673       +8     
  Misses          4        4

@dsrivastavv (Contributor) commented Aug 1, 2018

@SohamTamba This should work for now, but how about doing it in a distributed way? You might get better performance using pmap and then reducing the results on the main worker.
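
For illustration, a rough sketch of that pmap idea (not the PR's code; it assumes worker processes have been started with addprocs and that gen_func is defined on the workers, e.g. via @everywhere):

using Distributed

function generate_min_set_pmap(g, gen_func, reps::Integer)
    results = pmap(_ -> gen_func(g), 1:reps)                          # one repetition per task, spread across workers
    return reduce((x, y) -> length(x) < length(y) ? x : y, results)   # reduce on the main process
end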

@SohamTamba (Contributor, Author):

The issue with pmap is that it will use Reps*nv(g) memory.

I tried using @distributed (with 4 workers), but it does not seem to improve performance.

# requires `using Distributed` and worker processes started via `addprocs`
function generate_min_set(g::AbstractGraph{T}, gen_func::Function, Reps::Integer) where T<:Integer

    # the reducer keeps whichever of the two candidate vectors has fewer elements
    min_set::Vector{T} = @distributed ((x::Vector{T}, y::Vector{T})->length(x)<length(y) ? x : y) for i in 1:Reps
        gen_func(g)
    end

    return min_set
end

@dsrivastavv (Contributor):

ok!

@dsrivastavv (Contributor):

@sbromberger Can you please review it?

@SohamTamba (Contributor, Author):

I'll make the change in generate_max_set, along with the next heuristic, after this PR and VertexCover are accepted.
I'd prefer to merge as many PRs as possible before GSoC ends.

@SohamTamba (Contributor, Author) commented Aug 1, 2018

@somil55
I coded a distributed implementation using remotecall_fetch.

For the Ego-Twitter graph, which has 80,000 vertices, the threaded implementation has the same run time as the distributed implementation when Reps = 1000.

The run time of the distributed implementation increases because the graph has to be copied to every process.

Benefit of the distributed version: the code will be more memory efficient, since the processes don't share memory.

Disadvantage of the distributed version: each process must copy the graph, increasing the run time.

I think the threaded implementation would be better because it seems unlikely that Reps will exceed 100.

Let me know if you want me to PR the distributed implementation.
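
For illustration, a rough sketch of the remote-call idea (the function name and details are hypothetical, not the implementation described above; it assumes addprocs has been run and gen_func is defined on the workers):

using Distributed

function generate_min_set_remote(g, gen_func, reps::Integer)
    # send each repetition to a worker; the graph is serialized to the worker on each call
    futures = [remotecall(gen_func, workers()[mod1(i, nworkers())], g) for i in 1:reps]
    results = fetch.(futures)                                          # collect all results on the main process
    return reduce((x, y) -> length(x) < length(y) ? x : y, results)
end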

@dsrivastavv (Contributor):

@SohamTamba In that case, the threaded version seems more practical. What do you think, @sbromberger?

@sbromberger (Owner):

Why not both, with a kwarg in a wrapper function? parallel=:threads or something?
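
For illustration, a wrapper along those lines might look like the following, dispatching to the sketches shown earlier in this thread (the PR's actual helpers differ):

function generate_min_set(g, gen_func, reps::Integer; parallel=:threads)
    if parallel == :threads
        return generate_min_set_threaded(g, gen_func, reps)    # threaded sketch above
    elseif parallel == :distributed
        return generate_min_set_pmap(g, gen_func, reps)         # distributed (pmap) sketch above
    else
        throw(ArgumentError("unsupported value for parallel: $parallel"))
    end
end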

@SohamTamba changed the title from "Genrate reduce" to "Generate reduce" on Aug 1, 2018
@SohamTamba mentioned this pull request on Aug 1, 2018
@SohamTamba (Contributor, Author):

That makes sense.

@threads is more suitable for this algorithm because it performs fast operations and iterates over the graph only once.

remotecall_fetch might be faster for other algorithms; that is the case for the centrality measures, where the existing distributed implementation is faster than the multi-threaded version I implemented.

@SohamTamba (Contributor, Author):

@sbromberger
I added multithreaded and distributed implementations and a wrapper function.

By the way, it turns out the @distributed implementation is faster than the one with remote calls.
It just so happened that for Reps = 20, the run time was 1.2 s, while a single sequential call had a run time of 0.06 s.
However, mapreduce did not run in parallel.

src/utils.jl Outdated
@@ -61,8 +61,29 @@ end
generate_min_set(g, gen_func, Reps)
Generate a vector `Reps` times using `gen_func(g)` and return the vector with the least elements.
"""
generate_min_set(g::AbstractGraph{T}, gen_func, Reps::Integer) where T<: Integer =
mapreduce(gen_func, (x, y)->length(x)<length(y) ? x : y, Iterators.repeated(g, Reps))
function generate_min_set(g::AbstractGraph{T}, gen_func::Function, Reps::Integer) where T<: Integer
@sbromberger (Owner): Why is Reps capitalized? AFAIK we don't do that anywhere else.

@SohamTamba (Contributor, Author): That is the convention I had noticed online.
I can change it to reps instead.

src/utils.jl Outdated
"""
generate_min_set(g, gen_func, Reps)
generate_min_set(g, gen_func, Reps; parallel=:threads)
@sbromberger (Owner): Why is Reps capitalized? AFAIK we don't do that anywhere else for variables longer than 1 or 2 characters.

@sbromberger (Owner): Also, with the new Parallel module, this should probably go into Parallel/utils.jl.

@SohamTamba (Contributor, Author): I've included it in the Parallel module.
Should I also replace the mapreduce implementation in the sequential module?

@SohamTamba (Contributor, Author):

I was just thinking: instead of having two functions for min_set and max_set, how about using Base.Order.Ordering and creating only one function?
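
For illustration, a sketch of that single-function idea using Base.Order (the names are illustrative, not merged code; it assumes reps >= 1):

using Base.Order: Ordering, Forward, Reverse, lt

function generate_reduce(g, gen_func, reps::Integer, ord::Ordering=Forward)
    best = gen_func(g)
    for _ in 2:reps
        candidate = gen_func(g)
        if lt(ord, length(candidate), length(best))   # keep the extreme length under the given ordering
            best = candidate
        end
    end
    return best
end

# generate_reduce(g, f, 10, Forward)  # shortest result (min_set behaviour)
# generate_reduce(g, f, 10, Reverse)  # longest result (max_set behaviour)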

@SohamTamba (Contributor, Author):

On another note, I noticed I tested max_set twice instead of testing min_set and max_set once each.

Should I create a separate PR to fix it?

@sbromberger (Owner) left a review comment:

Looks good but would prefer a variable rename so that all non-struct vars are lower case.

"""
generate_min_set(g, gen_func, Reps; parallel=:threads)

Generate a vector `Reps` times using `gen_func(g)` and return the vector with the least elements.
@sbromberger (Owner):

This is so minor, but it bothers me: can we make sure standard variables are lower case?

@SohamTamba (Contributor, Author):

Yeah.
I thought I had done that.

Must be an issue from rebasing.

@sbromberger (Owner) left a review comment:

Looks great. Thanks!

@sbromberger merged commit df8d83a into sbromberger:master on Sep 9, 2018
ChrisRackauckas and others added 11 commits September 9, 2018 23:24
* attempt 32-bit compatibility

* don't allow downsampling to Int32: introduces accuracy bugs

* add 0.7 and nightly tests

* allow integers in betweenness centrality

* attempt to fix parallel 32-bit

* work around splitrange issue

* Revert "work around splitrange issue"

This reverts commit 58cbcf8.

* splitrange overload
* Add simple_cycles_limited_length.

* Revisions after review comments.

Merging because it looks like codecov is hung up.
Fix zenodo error
* OneTo

* fixes reverse (sbromberger#994)

We will probably want to move this into SimpleGraphs at some point, but until then, I think this is good.

* Lots more doctests (sbromberger#995)

* SimpleGraph(SimpleGraph) plus tests. (sbromberger#998)

* added documentation for SimpleGraph and SimpleDiGraph constructors (sbromberger#1001)

* fixes edgeiter equality (sbromberger#1002)

* 32-bit compatibility (sbromberger#999)

* attempt 32-bit compatibility

* don't allow downsampling to Int32: introduces accuracy bugs

* add 0.7 and nightly tests

* allow integers in betweenness centrality

* attempt to fix parallel 32-bit

* work around splitrange issue

* Revert "work around splitrange issue"

This reverts commit 58cbcf8.

* splitrange overload

* MultiThreaded Centrality Measures Implementations (sbromberger#987)

* misc doc fixes (sbromberger#1003)

* Fix sbromberger#999 (sbromberger#1004)

* Parallel BFS

Generate Reduce