Skip to content

Optimizing DFS in Data.Graph #882

Open
@meooow25

Description

@meooow25

dfs is the core function in Data.Graph that all other algorithms (topSort, scc, bcc, etc) are based on. It takes a Graph and generates a [Tree Vertex].

dfs makes use of generate, prune and chop to generate a [Tree Vertex] and chop it into the returned [Tree Vertex].

Other functions further transform this [Tree Vertex], for example topSort flattens the [Tree Vertex] into a [Vertex].

These intermediate trees are unnecessary.

  • The trees generated by generate are immediately chopped into other trees.
  • For topSort, the trees returned by dfs are immediately turned into a list.

There is no reason we need to create these trees in memory.

I propose combining the existing generate, prune, and chop function into a new core function that eliminates the intermediate trees from generate and also allows us to choose what to build from the DFS.

dfsGraph :: (Vertex -> a -> b) -> (b -> a -> a) -> a -> Graph -> [Vertex] -> a

Then we can define, for example

dfs = dfsGraph Node (:) []
topSort = dfsGraph <build the list directly>

Benchmarks on a random graph with 10,000 vertices and 100,000 edges:

  dfs:            OK (0.90s)
    3.51 ms ±  79 μs, 57% less than baseline
  topSort:        OK (0.47s)
    3.60 ms ± 107 μs, 59% less than baseline

Other function such as scc and bcc should also benefit, I've not implemented them yet.

Thoughts? I can clean up my code into a PR if this sounds good.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions