Skip to content

Fixing pgRouting functions so that same set of rows are returned for any ordering of input rows #62

@krashish8

Description

@krashish8

Expected behavior and actual behavior

PostgreSQL rows are a SET, so the output of any algorithm should not depend on the ordering of the input rows.
(Its a function - ONE to ONE and not a relation - ONE to MANY).

However, on executing sample queries on the already-implemented function, it has been found that around 15-20% (maybe more) of the pgRouting functions fail to satisfy this requirement. e.g.: pgr_primDFS, pgr_bdDijkstra, pgr_drivingDistance, pgr_breadthFirstSearch, pgr_edwardMoore, etc.

The expected behavior is that the output of every algorithm should be the same, irrespective of the ordering of the input rows - edges_sql.

Possible reason for this behavior

The problem arises probably because the pgRouting graph is created from the edges_sql rows "sequentially", therefore the set-like nature of the rows is lost when the graph gets created. So, for two different ordering of rows, the different graph gets created internally. (And boost algorithm gives a different result for the different graph)

Possible fix for this problem

Order the rows in a particular order before creating the graph (preferably in ascending order of the id column). e.g:

This piece of code can be written in the driver file (.cpp) before graph.insert_edges(data_edges, total_edges) is called to create the graph:

Fixing the code:

// sorting the edges in an ascending order of their id, before creating the graph
std::sort(data_edges, data_edges + total_edges,
    [](const pgr_edge_t &edge1, const pgr_edge_t &edge2) -> bool {
        return edge1.id < edge2.id;
    });

Writing pgTAP test:

Add a pgTAP test for each function that tests the output when given different ordering of the input rows.

  • One file per function in the function's folder (e.g. create prim-rows-check.sql in pgtap/spanningTree/prim/prim/ folder.
  • Add the test to check the ordering. E.g.: For pgr_prim:
    PREPARE expectedOutput AS
    SELECT edge, cost FROM pgr_prim(
        'SELECT id, source, target, cost, reverse_cost
         FROM edge_table WHERE id < 14 ORDER BY id'
    );

    PREPARE descendingOrder AS ...
    PREPARE randomOrder1 AS ...
    PREPARE randomOrder2 AS ...
    ...

    SELECT set_eq('expectedOutput', 'descendingOrder');
    SELECT set_eq('expectedOutput', 'randomOrder1');
    SELECT set_eq('expectedOutput', 'randomOrder2');
    ...
  • Add the test for all possible signatures of the function, different edge costs, etc. (random costs, or update the costs as mentioned here).

Steps to be followed

I. Adding pgTAP tests with TODO for failing functions and merging to pgRouting's master

git remote -v

gsoc         https://github.com/pgRouting/GSoC-pgRouting/ (fetch)
gsoc         https://github.com/pgRouting/GSoC-pgRouting/ (push)
krashish8    https://github.com/krashish8/pgrouting/ (fetch)
krashish8    https://github.com/krashish8/pgrouting/ (push)
origin       https://github.com/krashish8/GSoC-pgRouting/ (fetch)
origin       https://github.com/krashish8/GSoC-pgRouting/ (push)
upstream    https://github.com/pgRouting/pgrouting/ (fetch)
upstream    https://github.com/pgRouting/pgrouting/ (push)
  1. Create a branch that tests all functions aka only pgTAP tests, but failing tests wrapped with a TODO().

Branch from master of pgRouting/pgrouting

git fetch upstream
git checkout upstream/master
git checkout -b pgtap-check-rows-ordering-ashish

Do all the commits required for the pgTAP test. Then do:

git fetch upstream                              (This points to main pgRouting repo)
git rebase upstream/master

[Maybe Incomplete]

  1. Create a tag for that branch on GSoC-pgRouting for GSoC administration purposes:
git tag -a <tag-name> -m "<changes made>"
git push gsoc <tag-name>                        (gsoc points to GSoC-pgRouting repo)
  1. Squash all the commits into one, in the clone of the fork of pgRouting, not the GSoC one (If required, cherry-pick the commits using git cherry-pick)

  2. After the squash, create an issue in pgRouting mentioning the problem.

  3. Check if the main repo hasn't changed, push the branch to forked pgRouting/pgrouting repo, and make a PR to master branch of pgrouting main repo, using GitHub, fixing that issue.

[In the clone of forked pgRouting, not GSoC-pgRouting]

git fetch gsoc                                                   (Fetch from GSoC-pgRouting repo)
git checkout gsoc/pgtap-check-rows-ordering-ashish
git checkout -b pgtap-check-rows-ordering-ashish
git fetch upstream                                               (Check if main repo hasn't changed)
git rebase -i upstream/master                                    (Squash all commits into one)
git push --set-upstream origin pgtap-check-rows-ordering-ashish  (origin points to forked pgRouting repo)

II. Fixing those functions in which pgTAP tests fail

Follow similar steps as above.

  1. Create a branch per set of functions (for which the pgTAP tests fail), starting from the one which I am familiar with (e.g. prim's), then others. Branch from master of pgRouting main repo. (e.g. fix-rows-ordering-prim-ashish)

  2. Push that branch to GSoC-pgRouting, fix the issue, and tag it for administrative purposes.

  3. In the clone of forked pgRouting, squash the commits into one and push the branch to forked pgRouting repo.

  4. Create an issue in main pgRouting repo, and make a PR fixing that issue.

Do the same for every set of functions which fail the pgTAP tests.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions