This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Generate reduce #960

Merged 17 commits into sbromberger:master on Sep 9, 2018

Conversation

@SohamTamba (Contributor) commented Jul 31, 2018

mapreduce expects the other processes to have access to the data; otherwise, it executes sequentially.

I used @threads instead to run it in parallel.
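
For illustration, a minimal sketch of the threaded approach (the function name and details are illustrative, not the exact code in this PR; it assumes reps >= 1 and that gen_func(g) returns a vector):

using Base.Threads: @threads

function generate_min_set_threaded(g, gen_func, reps::Integer)
    first_result = gen_func(g)                      # run once to learn the result type
    results = Vector{typeof(first_result)}(undef, reps)
    results[1] = first_result
    @threads for i in 2:reps
        results[i] = gen_func(g)                    # iterations are independent, so threading is safe
    end
    return reduce((x, y) -> length(x) < length(y) ? x : y, results)   # keep the shortest result
end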

cc @somil55

@codecov bot commented Jul 31, 2018

Codecov Report

Merging #960 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #960      +/-   ##
==========================================
+ Coverage   99.85%   99.85%   +<.01%     
==========================================
  Files          78       79       +1     
  Lines        2669     2677       +8     
==========================================
+ Hits         2665     2673       +8     
  Misses          4        4

@dsrivastavv (Contributor) commented Aug 1, 2018

@SohamTamba This should work for now, but how about doing it in a distributed way? You might get better performance using pmap and then reducing the results on the main worker.
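
For illustration, a rough sketch of that pmap idea (not the PR's code; it assumes worker processes have been started with addprocs and that gen_func is defined on the workers, e.g. via @everywhere):

using Distributed

function generate_min_set_pmap(g, gen_func, reps::Integer)
    results = pmap(_ -> gen_func(g), 1:reps)                          # one repetition per task, spread across workers
    return reduce((x, y) -> length(x) < length(y) ? x : y, results)   # reduce on the main process
end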

@SohamTamba (Contributor, Author):

The issue with pmap is that it will use Reps*nv(g) memory.

I tried using @distributed (with 4 workers), but it does not seem to improve performance.

# requires `using Distributed` and worker processes started via `addprocs`
function generate_min_set(g::AbstractGraph{T}, gen_func::Function, Reps::Integer) where T<:Integer

    # the reducer keeps whichever of the two candidate vectors has fewer elements
    min_set::Vector{T} = @distributed ((x::Vector{T}, y::Vector{T})->length(x)<length(y) ? x : y) for i in 1:Reps
        gen_func(g)
    end

    return min_set
end

@dsrivastavv (Contributor):

ok!

@dsrivastavv (Contributor):

@sbromberger Can you please review it?

@SohamTamba (Contributor, Author):

I'll make the change in generate_max_set, along with the next heuristic, after this PR and VertexCover are accepted.
I'd prefer to merge as many PRs as possible before GSoC ends.

@SohamTamba (Contributor, Author) commented Aug 1, 2018

@somil55
I coded a distributed implementation using remotecall_fetch.

For the Ego-Twitter graph, which has 80,000 vertices, the threaded implementation has the same run time as the distributed implementation when Reps = 1000.

The run time of the distributed implementation increases because the graph has to be copied to every process.

Benefit of the distributed version: the code will be more memory efficient, since the processes don't share memory.

Disadvantage of the distributed version: each process must copy the graph, increasing the run time.

I think the threaded implementation would be better because it seems unlikely that Reps will exceed 100.

Let me know if you want me to PR the distributed implementation.
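
For illustration, a rough sketch of the remote-call idea (the function name and details are hypothetical, not the implementation described above; it assumes addprocs has been run and gen_func is defined on the workers):

using Distributed

function generate_min_set_remote(g, gen_func, reps::Integer)
    # send each repetition to a worker; the graph is serialized to the worker on each call
    futures = [remotecall(gen_func, workers()[mod1(i, nworkers())], g) for i in 1:reps]
    results = fetch.(futures)                                          # collect all results on the main process
    return reduce((x, y) -> length(x) < length(y) ? x : y, results)
end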

@dsrivastavv (Contributor):

@SohamTamba In that case, the threaded version seems more practical. What do you think, @sbromberger?

@sbromberger (Owner):

Why not both, with a kwarg in a wrapper function? parallel=:threads or something?
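
For illustration, a wrapper along those lines might look like the following, dispatching to the sketches shown earlier in this thread (the PR's actual helpers differ):

function generate_min_set(g, gen_func, reps::Integer; parallel=:threads)
    if parallel == :threads
        return generate_min_set_threaded(g, gen_func, reps)    # threaded sketch above
    elseif parallel == :distributed
        return generate_min_set_pmap(g, gen_func, reps)         # distributed (pmap) sketch above
    else
        throw(ArgumentError("unsupported value for parallel: $parallel"))
    end
end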

@SohamTamba changed the title from "Genrate reduce" to "Generate reduce" on Aug 1, 2018
@SohamTamba mentioned this pull request on Aug 1, 2018
@SohamTamba (Contributor, Author):

That makes sense.

@threads is more suitable for this algorithm because it performs fast operations and iterates over the graph only once.

remotecall_fetch might be faster for other algorithms; that is the case for the centrality measures, where the existing distributed implementation is faster than the multi-threaded version I implemented.

@SohamTamba (Contributor, Author):

@sbromberger
I added multithreaded and distributed implementations and a wrapper function.

By the way, it turns out the @distributed implementation is faster than the one with remote calls.
It just so happened that for Reps = 20, the run time was 1.2 s, while a single sequential call had a run time of 0.06 s.
However, mapreduce did not run in parallel.

src/utils.jl Outdated
@@ -61,8 +61,29 @@ end
generate_min_set(g, gen_func, Reps)
Generate a vector `Reps` times using `gen_func(g)` and return the vector with the least elements.
"""
generate_min_set(g::AbstractGraph{T}, gen_func, Reps::Integer) where T<: Integer =
mapreduce(gen_func, (x, y)->length(x)<length(y) ? x : y, Iterators.repeated(g, Reps))
function generate_min_set(g::AbstractGraph{T}, gen_func::Function, Reps::Integer) where T<: Integer
@sbromberger (Owner): Why is Reps capitalized? AFAIK we don't do that anywhere else.

@SohamTamba (Contributor, Author): That is the convention I had noticed online.
I can change it to reps instead.

src/utils.jl Outdated
"""
generate_min_set(g, gen_func, Reps)
generate_min_set(g, gen_func, Reps; parallel=:threads)
@sbromberger (Owner): Why is Reps capitalized? AFAIK we don't do that anywhere else for variables longer than 1 or 2 characters.

@sbromberger (Owner): Also, with the new Parallel module, this should probably go into Parallel/utils.jl.

@SohamTamba (Contributor, Author): I've included it in the Parallel module.
Should I also replace the mapreduce implementation in the sequential module?

@SohamTamba (Contributor, Author):

I was just thinking: instead of having two functions for min_set and max_set, how about using Base.Order.Ordering and creating only one function?
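
For illustration, a sketch of that single-function idea using Base.Order (the names are illustrative, not merged code; it assumes reps >= 1):

using Base.Order: Ordering, Forward, Reverse, lt

function generate_reduce(g, gen_func, reps::Integer, ord::Ordering=Forward)
    best = gen_func(g)
    for _ in 2:reps
        candidate = gen_func(g)
        if lt(ord, length(candidate), length(best))   # keep the extreme length under the given ordering
            best = candidate
        end
    end
    return best
end

# generate_reduce(g, f, 10, Forward)  # shortest result (min_set behaviour)
# generate_reduce(g, f, 10, Reverse)  # longest result (max_set behaviour)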

@SohamTamba (Contributor, Author):

On another note, I noticed I tested max_set twice instead of testing min_set and max_set once each.

Should I create a separate PR to fix it?

@sbromberger (Owner) left a review comment:

Looks good but would prefer a variable rename so that all non-struct vars are lower case.

"""
generate_min_set(g, gen_func, Reps; parallel=:threads)

Generate a vector `Reps` times using `gen_func(g)` and return the vector with the least elements.
@sbromberger (Owner):

This is so minor, but it bothers me: can we make sure standard variables are lower case?

@SohamTamba (Contributor, Author):

Yeah.
I thought I had done that.

Must be an issue from rebasing.

@sbromberger (Owner) left a review comment:

Looks great. Thanks!

@sbromberger merged commit df8d83a into sbromberger:master on Sep 9, 2018
ChrisRackauckas and others added 11 commits September 9, 2018 23:24
* attempt 32-bit compatibility

* don't allow downsampling to Int32: introduces accuracy bugs

* add 0.7 and nightly tests

* allow integers in betweenness centrality

* attempt to fix parallel 32-bit

* work around splitrange issue

* Revert "work around splitrange issue"

This reverts commit 58cbcf8.

* splitrange overload
* Add simple_cycles_limited_length.

* Revisions after review comments.

Merging because it looks like codecov is hung up.
Fix zenodo error
* OneTo

* fixes reverse (sbromberger#994)

We will probably want to move this into SimpleGraphs at some point, but until then, I think this is good.

* Lots more doctests (sbromberger#995)

* SimpleGraph(SimpleGraph) plus tests. (sbromberger#998)

* added documentation for SimpleGraph and SimpleDiGraph constructors (sbromberger#1001)

* fixes edgeiter equality (sbromberger#1002)

* 32-bit compatibility (sbromberger#999)

* attempt 32-bit compatibility

* don't allow downsampling to Int32: introduces accuracy bugs

* add 0.7 and nightly tests

* allow integers in betweenness centrality

* attempt to fix parallel 32-bit

* work around splitrange issue

* Revert "work around splitrange issue"

This reverts commit 58cbcf8.

* splitrange overload

* MultiThreaded Centrality Measures Implementations (sbromberger#987)

* misc doc fixes (sbromberger#1003)

* Fix sbromberger#999 (sbromberger#1004)

* Parallel BFS

Generate Reduce