perf: Faster single bracket querying of a graph #1658
Current Aviator status
This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.
Thanks. I see how duplicate i and j indexes add complexity to the new get_adjacency_submatrix() routine. How about the following logic:
- we don't compute `unique()`
- instead, we compute `adj_out <- adjacent_vertices(x, i, mode = "out")` if `i` is given, and `adj_in <- adjacent_vertices(x, j, mode = "in")` if `j` is given
- if none are given, we forward to a different existing routine
- if only one of `i` or `j` is given, we're done
- if both are given, we compute `vctrs::vec_set_intersect(adj_in, adj_out)`
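To make the idea above concrete, here is a minimal sketch of building a submatrix from neighbour lists instead of from the full adjacency matrix. It is a simplified variant, not the PR's code: it filters the out-neighbours by `j` rather than intersecting with `adj_in`, it assumes a simple graph (no multi-edges) and unique `j`, and the variable names are made up for illustration.

```r
library(igraph)

# Small directed example graph: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 1.
g <- make_ring(6, directed = TRUE)
i <- c(1, 2, 3)  # row vertices
j <- c(2, 3, 4)  # column vertices (assumed unique in this sketch)

# Out-neighbours of every row vertex, as plain integer ids.
adj_out <- lapply(adjacent_vertices(g, i, mode = "out"), as.integer)

# Flatten to (row position, neighbour id) pairs and keep neighbours in j.
rows <- rep(seq_along(i), lengths(adj_out))
nbrs <- unlist(adj_out, use.names = FALSE)
keep <- nbrs %in% j

# Fill the dense submatrix directly, without building the full matrix.
sub <- matrix(0, nrow = length(i), ncol = length(j))
sub[cbind(rows[keep], match(nbrs[keep], j))] <- 1

# Reference result: subset of the full adjacency matrix.
full <- as_adjacency_matrix(g, sparse = FALSE)
```

Duplicates in `i` work here because each row position is tracked separately, but `match()` only finds the first occurrence in `j`, so duplicated column indices would need extra handling.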
How is the test coverage for this code?
I'd appreciate it if all changes that do not rely on get_adjacency_submatrix() came in one or several separate PRs. I'd like to do a few more iterations here.
d64b7b4 to
44561c5
Compare
|
Manipulating a graph via this logic was moved to #1661.
I have tried this logic but always ran into issues in the case of non-unique indices.
f4795c7 to
56ba6e9
Compare
krlmlr
left a comment
get.adjacency.sparse() with a directed graph should involve only one or two copies. We could use that result with regular matrix subsetting and then tweak for the directed case. While this is not ideal, it may well be faster than anything we can come up with in R land. Further optimizations are then possible by adding a from or to argument to as_edgelist() (which is called by get.adjacency.sparse()).
|
I am surprised myself, but the difference between the submatrix routine and
pkgload::load_all("~/git/R_packages/rigraph/")
#> ℹ Loading igraph
g <- sample_gnp(5000,0.05, directed = FALSE)
bench::mark(check = FALSE,
sub_sparse = get_adjacency_submatrix(g,1:100,sparse = TRUE),
sub_dense = get_adjacency_submatrix(g,1:100,sparse = FALSE),
full_sparse = as_adjacency_matrix(g,sparse = TRUE)[1:100,]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sub_sparse 2.01ms 2.13ms 401. 5.05MB 53.9
#> 2 sub_dense 2.22ms 2.44ms 299. 8.43MB 53.8
#> 3 full_sparse 31.91ms 33.76ms 22.9 105.47MB 66.8
g <- sample_gnp(5000,0.05, directed = TRUE)
bench::mark(check = FALSE,
sub_sparse = get_adjacency_submatrix(g,1:100,sparse = TRUE),
sub_dense = get_adjacency_submatrix(g,1:100,sparse = FALSE),
full_sparse = as_adjacency_matrix(g, sparse = TRUE)[1:100,]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sub_sparse 2.03ms 2.15ms 415. 3.97MB 24.0
#> 2 sub_dense 2.21ms 2.39ms 340. 7.79MB 45.8
#> 3 full_sparse 42.55ms 49.4ms 15.1 129.2MB 35.3
Created on 2025-01-23 with reprex v2.1.1
|
I am now convinced that this is as good as it gets. Any other approach seems to lose too much performance.
krlmlr
left a comment
Great progress, we're getting there!
|
Non-unique i/j are now handled outside of
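As an illustration of how duplicated indices can be handled outside a core routine, here is a base R sketch. The plain matrix `adj` and the helpers `submatrix_unique()` and `submatrix_any()` are hypothetical stand-ins, not functions from igraph or from this PR: the wrapper de-duplicates the indices, calls a core routine that only accepts unique indices, and then expands the result with `match()`.

```r
# Toy adjacency matrix standing in for a graph (hypothetical data):
# edges 1 -> 2, 2 -> 3, 3 -> 4, 4 -> 5.
adj <- matrix(0, nrow = 5, ncol = 5)
adj[cbind(c(1, 2, 3, 4), c(2, 3, 4, 5))] <- 1

# Hypothetical core routine that only accepts unique indices.
submatrix_unique <- function(adj, i, j) {
  stopifnot(anyDuplicated(i) == 0, anyDuplicated(j) == 0)
  adj[i, j, drop = FALSE]
}

# Wrapper: de-duplicate, call the core routine, then expand rows and
# columns back to the requested (possibly repeated) indices.
submatrix_any <- function(adj, i, j) {
  ui <- unique(i)
  uj <- unique(j)
  sub <- submatrix_unique(adj, ui, uj)
  sub[match(i, ui), match(j, uj), drop = FALSE]
}

i <- c(1, 1, 3)
j <- c(2, 4, 4)
res <- submatrix_any(adj, i, j)
```

The core routine then only ever sees each vertex once, and the cost of repeated indices is limited to the final, cheap matrix subsetting step.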
krlmlr
left a comment
Do another final benchmarking round and then merge?
|
Before merging: double-check the tests and add more if needed.
|
Thanks! |
igraph 2.2.0

Update C core to version 0.10.17. See <https://github.com/igraph/rigraph/blob/20552ef94aed6ae4b23465ae8c7e4d3b0e558c71/src/vendor/cigraph/CHANGELOG.md> for a complete changelog, in particular the section "Breaking changes".

- Generate almost all R implementations (#2047).
- Expose `align_layout()` and add to `layout_nicely()` to align layout with axis automatically (#1907, #1957, #1958).
- Expose `simple_cycles()` which lists all simple cycles (#1573, #1580).
- Expose `is_complete()`, `is_clique()` and `is_ivs()` (#1316, #1388, #1581).
- Expose `find_cycle()` (#1471, #1571).
- Expose `feedback_vertex_set()` to find a minimum feedback vertex set in a graph (#1446, #1447, #1560).
- Add `weights` parameter to `local_scan()` (#1082, #1448, #1982).
- Add more layouts to `tkplot()` (#160, #1967).
- Add `plot(mark.lwd = )` to change line width of mark.groups (#306, #1898).
- Add `plot(vertex.label.angle = , vertex.label.adj = )` arguments to rotate vertex labels (#106, #1899).
- Add relative size scaling to vertices in `plot()` (@gvegayon, #172).
- Split `sample_bipartite()` into two functions for the G(n, m) and G(n, p) case (#630, #1692).
- Implement multi attribute assignment (#55, #1916) and adding attributes via data frames (#1373, #1669, #1716). Support factors in `graph_from_data_frame()` (#34, #1829).
- All `_hrg()` functions check their argument (#1074, #1699).
- HRG printing with `type = "auto"` uses `"plain"` for large trees (#1879).
- `get_edge_ids()` accepts data frames and matrices (#1663).
- `igraph_version()` returns version of C core in an attribute (#1208, #1781).
- Breaking change: change arguments default and order for `graph_from_lcf()` (#1858, #1872).
- Breaking change: Subset assignment of a graph avoids addition of double edges and ignores loops unless the new `loops` argument is set to `TRUE` (#1662, #1661).
- Breaking change: remove deprecated `neimode` parameter from `bfs()` and `dfs()` (#1105, #1526).
- Breaking change: stricter deprecation of non-functional parameters of `layout_with_kk()` and `layout_with_fr()` (#1108, #1628).
- `NA` attribute values are replaced with default values in `plot()` (#293, #1707).
- `NA` checking only in from/to columns of edge data frame (#1906).
- Keep vertex attribute type for `disjoint_union()` (#1640, #1909).
- Error in bipartite projection if `type` is not a vertex attribute (#898, #1889).
- Do not try to destroy non-initialized SIR objects upon error (#1888).
- Added proper `NA` handling for matrix inputs (#917, #918, #1828).
- Remove string matrix support from functions operating on biadjacency matrices (#1540, #1542, #1803).
- Integer vectors are validated before transferring them to the C library (#1434, #1582).
- Changed base location for `graph_from_graphdb()` and added tests (#1712, #1732).
- Recycling of logical vectors when indexing into edge/vertex selectors now throws an error (#848, #1731).
- Use `function()` instead of `(x)` in `arrow.mode` (#1722).
- Temporarily disable generating an interface for `igraph_simple_cycles_callback()` as the framework for handling callback functions is not yet present.
- Adjust loop position to vertex size in `plot()` (#1980).
- Don't rescale plot coordinates to `[-1,1] x [-1,1]` by default (#1492, #1956, #1962).
- Fail if `"layout"` attribute doesn't match the number of vertices (#1880).
- Automatically arrange loops in `plot()` (#407, #556, #1881).
- Vectorized drawing of arrows in `plot()` (#257, #1904).
- Allow more than one edge label font family in `plot()` (#37, #1896).
- Pie shapes now work as intended (#1882, #1883).
- Loops not plotted on canvas (#1799, #1800).
- Replace `NA` values in `label` attributes in `plot()` with default values (#1796, #1797).
- Removed duplicated plotting of arrow heads (#640, #1709).
- Correct mapping of edge label properties in plots when loops are present (#157, #1706).
- Welcome Maëlle Salmon and David Schoch as authors (#1733), add author links (#1821).
- Remove demos (#2008).
- Add 2023 preprint (#1240, #1984).
- Update allcontributors info (#1975).
- Link to replacements of deprecated functions (#1823).
- Add documentation of all file formats to `read_graph()` and `write_graph()` (#777, #1969). Recommend `saveRDS()` and `readRDS()` for saving and loading graphs (#1242, #1700).
- Document return value of `make_clusters()` (#1794).
- Clarify that `girth()` returns `Inf` for acyclic graphs (@eqmooring, #1831).
- Clarify the use of weights in `layout_with_kk()`.
- Refer to current latest version of R in troubleshooting page.
- Fix typos in `laplacian_matrix()` documentation.
- Document ellipsis in `cohesion()` (#971, #1985).
- Correct the description of the `weights` parameter of `hits_scores()`.
- Better describe output of `all_shortest_paths()` (#1029, #1778).
- `make_graph()` now supports `"Groetzsch"` as an alias of `"Grotzsch"`. This change was implemented in the C core.
- Update description of `order` parameter of `ego()` and related functions (#1746).
- Added lifecycle table (#1525).
- Add more about igraph.r2cdocs in the contributing guide (#1686, #1697).
- Accelerate check if an index sequence corresponds to the entire list of vertices (#1427, #1818).
- Faster single bracket querying of a graph (#1465, #1658).
This PR refactors `[.igraph`, the single bracket querying of a graph (`g[1:3,4:6]`) (closes #1465). In the old version, the complete adjacency matrix was computed and then a subset created. The refactored function now builds the submatrix directly. This leads to a small speedup and a lower memory footprint.
Created on 2025-01-18 with reprex v2.1.1
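For readers unfamiliar with the bracket syntax, the following small sketch (not part of the original PR description) shows the behaviour that the refactoring has to preserve: a single bracket query should return the same values as subsetting the full adjacency matrix. The example graph is an arbitrary choice, and `sparse = FALSE` is used only to keep the comparison in base matrices.

```r
library(igraph)

# Undirected ring on 6 vertices: 1-2, 2-3, 3-4, 4-5, 5-6, 6-1.
g <- make_ring(6)

# Single bracket query: rows are vertices 1:3, columns are vertices 4:6.
sub <- g[1:3, 4:6, sparse = FALSE]

# Same block taken from the full adjacency matrix.
full_sub <- as_adjacency_matrix(g, sparse = FALSE)[1:3, 4:6]

# The two results should agree entry by entry.
all(sub == full_sub)
```

In this graph, the only edges between the two vertex sets are 1-6 and 3-4, so the block has exactly two non-zero entries.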