Graph algorithms in Cython
Welcome to our Python library for graph algorithms. So far, the library only includes Dijkstra's algorithm but we should add a range of common path algorithms later. It is also open-source and easy to integrate with other Python libraries. To get started, simply install the library using pip, and import it into your Python project.
To install Edsger, simply use pip:
pip install edsger
If you want to install from source, you can clone the repository and install it using pip:
git clone git@github.com:aetperf/Edsger.git
cd Edsger
pip install .
To use Dijkstra's algorithm, you can import the Dijkstra
class from the path
module. The function takes a graph and a source node as input, and returns the shortest path from the source node to all other nodes in the graph.
import pandas as pd
from edsger.path import Dijkstra
# Create a DataFrame with the edges of the graph
edges = pd.DataFrame({
'tail': [0, 0, 1, 2, 2, 3],
'head': [1, 2, 2, 3, 4, 4],
'weight': [1, 4, 2, 1, 3, 1]
})
edges
tail | head | weight | |
---|---|---|---|
0 | 0 | 1 | 1 |
1 | 0 | 2 | 4 |
2 | 1 | 2 | 2 |
3 | 2 | 3 | 1 |
4 | 2 | 4 | 3 |
5 | 3 | 4 | 1 |
# Initialize the Dijkstra object
dijkstra = Dijkstra(edges)
# Run the algorithm from a source vertex
shortest_paths = dijkstra.run(vertex_idx=0)
print("Shortest paths:", shortest_paths)
Shortest paths: [0. 1. 3. 4. 5.]
We get the shortest paths from the source node 0 to all other nodes in the graph. The output is an array with the shortest path length to each node. A path length is the sum of the weights of the edges in the path.
It is also possible to use a graph with different column names for the tail, head and weight values. The column names can be specified using the tail
, head
and weight
arguments:
other_edges = pd.DataFrame({
'from': [0, 0, 1, 2, 2, 3],
'to': [1, 2, 2, 3, 4, 4],
'travel_time': [1, 4, 2, 1, 3, 1]
})
other_dijkstra = Dijkstra(edges, tail='from', head='to', weight='travel_time')
The orientation
argument (a string with a default value of 'out'
) specifies the orientation of the algorithm. It can be either 'out'
for single source shortest paths or 'in'
for single target shortest path.
dijkstra = Dijkstra(edges, orientation='in')
# Run the algorithm to a target vertex
shortest_paths = dijkstra.run(vertex_idx=0)
print("Shortest paths:", shortest_paths)
Shortest paths: [ 0. inf inf inf inf]
Once the Dijkstra is instanciated with a given graph and orientation, the run
method can be called multiple times with different source vertices.
# Run the algorithm to another target vertex
shortest_paths = dijkstra.run(vertex_idx=4)
print("Shortest paths:", shortest_paths)
Shortest paths: [5. 4. 2. 1. 0.]
The check_edges
argument (a boolean with a default value of False
) validates the given graph. When set to True
, it ensures the DataFrame is well-formed by:
- Checking for the presence of required columns for edge tail, head and weigh values
- Verifying that the data types are correct (integer for tail and head, integer or float for weight)
- Ensuring there are no missing or invalid values (e.g. negative weights)
If any of these checks fail, an appropriate error is raised.
invalid_edges = pd.DataFrame({
'tail': [0, 0, 1, 2, 2, 3],
'head': [1, 2, 2, 3, 4, 4],
'weight': [1, 4, 2, -1, 3, 1]
})
dijkstra = Dijkstra(invalid_edges, check_edges=True)
ValueError: edges['weight'] should be nonnegative
Finally, the permute
argument (boolean with a default value of False
) allows to permute the IDs of the nodes. If set to True
, the node IDs will be reindexed to start from 0 and be contiguous for the inner computations, and the output will be reindexed to the original IDs, loading the same result as if the IDs were not permuted. The permutation may save memory and computation time for large graphs, for example if a significant ratio of the nodes are not actually used in the graph.
SHIFT = 1000
shifted_edges = pd.DataFrame({
'tail': [0, 0, 1, 2, 2, 3],
'head': [1, 2, 2, 3, 4, 4],
'weight': [1, 4, 2, 1, 3, 1]
})
shifted_edges["tail"] += SHIFT
shifted_edges["head"] += SHIFT
shifted_edges.head(3)
tail | head | weight | |
---|---|---|---|
0 | 1000 | 1001 | 1 |
1 | 1000 | 1002 | 4 |
2 | 1001 | 1002 | 2 |
dijkstra = Dijkstra(shifted_edges, permute=True)
shortest_paths = dijkstra.run(vertex_idx=0 + SHIFT)
print("Shortest paths:", shortest_paths)
Shortest paths: [inf inf inf ... 3. 4. 5.]
shortest_paths[-5:]
array([0., 1., 3., 4., 5.])
The run
method can take the following arguments besides the source/target vertex index:
path_tracking
: bool, optional (default=False)
Whether to track the shortest path(s) from/to the source/target vertex to all other vertices in the graph.
dijkstra = Dijkstra(edges)
shortest_paths = dijkstra.run(vertex_idx=0, path_tracking=True)
dijkstra.get_path(vertex_idx=4)
array([4, 3, 2, 1, 0], dtype=uint32)
dijkstra.get_path(vertex_idx=0)
array([0], dtype=uint32)
The path is returned as an array of vertex indices. This is an ordered list of vertices from the source to the target vertex if orientation
is 'in'
, and from the target to the source vertex if orientation
is 'out'
. Both the source and target vertices are included in the path.
return_inf
: bool, optional (default=True)
Whether to return path lengths as infinity (np.inf) when no path exists.
dijkstra = Dijkstra(edges, orientation='in')
shortest_paths = dijkstra.run(vertex_idx=0, return_inf=False)
shortest_paths
array([0.00000000e+000, 1.79769313e+308, 1.79769313e+308, 1.79769313e+308,
1.79769313e+308])
The value 1.79769313e+308 actually used in the code is the largest number that can be represented in the floating point format (np.float64
).
return_series
: bool, optional (default=False)
Instead of returning a NumPy array, the run
method may return a Pandas Series object with the path lengths as values and the vertex indices as the index.
shortest_paths = dijkstra.run(vertex_idx=4, return_series=True)
shortest_paths
vertex_idx | path_length |
---|---|
0 | 5 |
1 | 4 |
2 | 2 |
3 | 1 |
4 | 0 |
heap_length_ratio
: float, optional (default=1.0)
This is an experimental parameter that controls the size of the heap used in the algorithm. The heap is a static array that is used to store the vertices that may be visited next. A value of 1.0 means that the heap is the same size as the number of vertices, so there is no risk of overflow. Be aware that there is no guarantee that the algorithm will work with a heap length ratio smaller that 1. The lowest ratio that works for a given graph depends on the graph structure and the source vertex. For a rather sparse graph, a small ratio may work, but for a dense graph, a ratio of 1.0 is required.
We welcome contributions to the Edsger library. If you have any suggestions, bug reports, or feature requests, please open an issue on our GitHub repository.
Edsger is licensed under the MIT License. See the LICENSE file for more details.
For any questions or inquiries, please contact François Pacull at francois.pacull@architecture-performance.fr.