perf: Faster single bracket querying of a graph #1658

schochastics · 2025-01-17T12:37:31Z

This PR refactors single bracket querying of a graph, e.g. `g[1:3, 4:6]` (closes #1465).

`[.igraph`

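In the old version, the complete adjacency matrix was computed and then subsetted. The refactored function now builds the submatrix directly. In the benchmarks below (a 100 × 100 block of a 5000-vertex graph), this cuts the median time by a factor of roughly 30 to 60 and the allocated memory by a factor of roughly 50 to 65.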

set.seed(411)
g <- sample_gnp(5000,0.1)
bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,1:100,1:100),
  old = as_adjacency_matrix(g)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.84ms   1.94ms    452.      4.26MB     24.0
#> 2 old         61.36ms 118.39ms      9.11   210.4MB     38.3

bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,i = 1:100,j = 1:100,sparse = FALSE),
  old = as_adjacency_matrix(g,sparse = FALSE)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.69ms   1.81ms     462.     3.31MB     5.99
#> 2 old         59.95ms   61.1ms      16.0  190.81MB    16.0

E(g)$weight <- runif(ecount(g))
bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,1:100,1:100),
  old = as_adjacency_matrix(g)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.88ms   1.99ms     451.      3.2MB     7.98
#> 2 old         55.34ms  62.09ms      14.5   209.7MB    23.6

bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,i = 1:100,j = 1:100,sparse = FALSE),
  old = as_adjacency_matrix(g,sparse = FALSE)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new           1.7ms    1.8ms     432.     3.31MB     8.00
#> 2 old          56.7ms   57.8ms      13.8  190.81MB    13.8

Created on 2025-01-18 with reprex v2.1.1
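To illustrate the idea from the description (build only the requested block instead of subsetting the full adjacency matrix), here is a minimal sketch. It is not the PR's `get_adjacency_submatrix()`: the helper name `submatrix_sketch()` is hypothetical, and it assumes a simple, unweighted graph and duplicate-free integer index vectors `i` and `j`.

```r
library(igraph)
library(Matrix)

# Hypothetical helper for illustration only; NOT the PR's implementation.
# Assumes a simple, unweighted graph and duplicate-free indices i and j.
submatrix_sketch <- function(g, i, j) {
  # Out-neighbours of each row vertex (all neighbours if g is undirected).
  nb <- lapply(adjacent_vertices(g, i, mode = "out"), as.integer)
  # Row position in the result for every (row vertex, neighbour) pair.
  rows <- rep(seq_along(i), lengths(nb))
  # Column position of each neighbour within j; NA if it was not requested.
  cols <- match(unlist(nb), j)
  keep <- !is.na(cols)
  sparseMatrix(
    i = rows[keep], j = cols[keep], x = rep(1, sum(keep)),
    dims = c(length(i), length(j))
  )
}

set.seed(1)
g <- sample_gnp(50, 0.2, directed = TRUE)
i <- c(3, 10, 7)
j <- c(1, 7, 20, 35)

direct <- submatrix_sketch(g, i, j)    # touches only the neighbours of i
full <- as_adjacency_matrix(g)[i, j]   # builds the full 50 x 50 matrix first
```

Both calls should describe the same 3 × 4 block; the difference is that the first never materialises the full matrix, which is where the time and memory savings in the benchmarks would come from.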

aviator-app · 2025-01-17T12:37:33Z

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.

krlmlr

Thanks. I see how duplicate i and j indexes add complexity to the new get_adjacency_submatrix() routine. How about the following logic:

* we don't compute `unique()`
* instead, we compute `adj_out <- adjacent_vertices(x, i, mode = "out")` if `i` is given, and `adj_in <- adjacent_vertices(x, j, mode = "in")` if `j` is given
* if none are given, we forward to a different existing routine
* if only one of `i` or `j` is given, we're done
* if both are given, we compute `vctrs::vec_set_intersect(adj_in, adj_out)`

How is the test coverage for this code?

I'd appreciate it if all changes that do not rely on get_adjacency_submatrix() came in one or several separate PRs. I'd like to do a few more iterations here.

R/indexing.R

schochastics · 2025-01-19T06:16:19Z

Manipulating a graph via this logic was moved to #1661.

schochastics · 2025-01-22T13:13:15Z

> Thanks. I see how duplicate i and j indexes add complexity to the new get_adjacency_submatrix() routine. How about the following logic:
>
> * we don't compute `unique()`
> * instead, we compute `adj_out <- adjacent_vertices(x, i, mode = "out")` if `i` is given, and `adj_in <- adjacent_vertices(x, j, mode = "in")` if `j` is given
> * if none are given, we forward to a different existing routine
> * if only one of `i` or `j` is given, we're done
> * if both are given, we compute `vctrs::vec_set_intersect(adj_in, adj_out)`
>
> How is the test coverage for this code?
>
> I'd appreciate it if all changes that do not rely on get_adjacency_submatrix() came in one or several separate PRs. I'd like to do a few more iterations here.

I have tried this logic but always ran into issues in the case of non-unique indices.
I tried to simplify a few things in the current solution. If you still think it is too complex, I am happy to give this logic another try.

maelle · 2025-01-23T10:15:28Z

devtools::test_coverage_active_file()

krlmlr

get.adjacency.sparse() with a directed graph should involve only one or two copies. We could use that result with regular matrix subsetting and then tweak for the directed case. While this is not ideal, it may well be faster than anything we can come up with in R land. Further optimizations are then possible by adding a from or to argument to as_edgelist() (which is called by get.adjacency.sparse()).

schochastics · 2025-01-23T21:10:56Z

I am surprised myself, but the difference between the submatrix routine and get.adjacency.sparse() is quite big. I will try a bit more though to simplify/split the submatrix routine to make it more readable (without too much loss of performance).

pkgload::load_all("~/git/R_packages/rigraph/")
#> ℹ Loading igraph
g <- sample_gnp(5000,0.05, directed = FALSE)

bench::mark(check = FALSE,
  sub_sparse = get_adjacency_submatrix(g,1:100,sparse = TRUE),
  sub_dense = get_adjacency_submatrix(g,1:100,sparse = FALSE),
  full_sparse = as_adjacency_matrix(g,sparse = TRUE)[1:100,]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sub_sparse    2.01ms   2.13ms     401.     5.05MB     53.9
#> 2 sub_dense     2.22ms   2.44ms     299.     8.43MB     53.8
#> 3 full_sparse  31.91ms  33.76ms      22.9  105.47MB     66.8

g <- sample_gnp(5000,0.05, directed = TRUE)

bench::mark(check = FALSE,
  sub_sparse = get_adjacency_submatrix(g,1:100,sparse = TRUE),
  sub_dense = get_adjacency_submatrix(g,1:100,sparse = FALSE),
  full_sparse = as_adjacency_matrix(g, sparse = TRUE)[1:100,]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sub_sparse    2.03ms   2.15ms     415.     3.97MB     24.0
#> 2 sub_dense     2.21ms   2.39ms     340.     7.79MB     45.8
#> 3 full_sparse  42.55ms   49.4ms      15.1   129.2MB     35.3

Created on 2025-01-23 with reprex v2.1.1

schochastics · 2025-01-27T20:02:37Z

I am now convinced that this is as good as it gets. Any other approach seems to lose too much performance.

krlmlr

Great progress, we're getting there!

tests/testthat/test-indexing.R

R/indexing.R

schochastics · 2025-01-30T20:56:22Z

Non-unique `i`/`j` are now handled outside of get_adjacency_submatrix(). I think this is indeed much cleaner.
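One way to keep duplicate handling outside the submatrix routine is to query with de-duplicated indices and then expand the result by position. The sketch below only illustrates that pattern and is not taken from the PR: `expand_duplicates()` and `unique_only()` are hypothetical names, and a plain base R matrix stands in for the graph.

```r
# Hypothetical wrapper for illustration: `submatrix_fun` stands in for any
# routine that only supports duplicate-free indices.
expand_duplicates <- function(submatrix_fun, i, j) {
  ui <- unique(i)
  uj <- unique(j)
  sub <- submatrix_fun(ui, uj)
  # Map each requested index back to its row/column in the de-duplicated result.
  sub[match(i, ui), match(j, uj), drop = FALSE]
}

# Small adjacency matrix standing in for a graph.
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 1,
                1, 0, 0, 0,
                0, 1, 0, 0), nrow = 4, byrow = TRUE)

# Stand-in for a submatrix routine that expects unique indices.
unique_only <- function(i, j) adj[i, j, drop = FALSE]

res <- expand_duplicates(unique_only, i = c(2, 2, 4), j = c(1, 4, 4))
```

With this split, the inner routine never sees repeated indices, and the expansion step is ordinary matrix subsetting.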

krlmlr

Do another final benchmarking round and then merge?

R/indexing.R

schochastics · 2025-02-06T13:36:29Z

Before merging: double-check the tests and add more if needed.

krlmlr · 2025-02-06T14:19:34Z

Thanks!

@gvegayon

igraph 2.2.0

- Update C core to version 0.10.17. See <https://github.com/igraph/rigraph/blob/20552ef94aed6ae4b23465ae8c7e4d3b0e558c71/src/vendor/cigraph/CHANGELOG.md> for a complete changelog, in particular the section "Breaking changes".
- Generate almost all R implementations (#2047).
- Expose `align_layout()` and add to `layout_nicely()` to align layout with axis automatically (#1907, #1957, #1958).
- Expose `simple_cycles()` which lists all simple cycles (#1573, #1580).
- Expose `is_complete()`, `is_clique()` and `is_ivs()` (#1316, #1388, #1581).
- Expose `find_cycle()` (#1471, #1571).
- Expose `feedback_vertex_set()` to find a minimum feedback vertex set in a graph (#1446, #1447, #1560).
- Add `weights` parameter to `local_scan()` (#1082, #1448, #1982).
- Add more layouts to `tkplot()` (#160, #1967).
- Add `plot(mark.lwd = )` to change line width of mark.groups (#306, #1898).
- Add `plot(vertex.label.angle = , vertex.label.adj = )` arguments to rotate vertex labels (#106, #1899).
- Add relative size scaling to vertices in `plot()` (@gvegayon, #172).
- Split `sample_bipartite()` into two functions for the G(n, m) and G(n, p) case (#630, #1692).
- Implement multi attribute assignment (#55, #1916) and adding attributes via data frames (#1373, #1669, #1716). Support factors in `graph_from_data_frame()` (#34, #1829).
- All `_hrg()` functions check their argument (#1074, #1699).
- HRG printing with `type = "auto"` uses `"plain"` for large trees (#1879).
- `get_edge_ids()` accepts data frames and matrices (#1663).
- `igraph_version()` returns version of C core in an attribute (#1208, #1781).
- Breaking change: change arguments default and order for `graph_from_lcf()` (#1858, #1872).
- Breaking change: Subset assignment of a graph avoids addition of double edges and ignores loops unless the new `loops` argument is set to `TRUE` (#1662, #1661).
- Breaking change: remove deprecated `neimode` parameter from `bfs()` and `dfs()` (#1105, #1526).
- Breaking change: stricter deprecation of non-functional parameters of `layout_with_kk()` and `layout_with_fr()` (#1108, #1628).
- `NA` attribute values are replaced with default values in `plot()` (#293, #1707).
- `NA` checking only in from/to columns of edge data frame (#1906).
- Keep vertex attribute type for `disjoint_union()` (#1640, #1909).
- Error in bipartite projection if `type` is not a vertex attribute (#898, #1889).
- Do not try to destroy non-initialized SIR objects upon error (#1888).
- Added proper `NA` handling for matrix inputs (#917, #918, #1828).
- Remove string matrix support from functions operating on biadjacency matrices (#1540, #1542, #1803).
- Integer vectors are validated before transferring them to the C library (#1434, #1582).
- Changed base location for `graph_from_graphdb()` and added tests (#1712, #1732).
- Recycling of logical vectors when indexing into edge/vertex selectors now throws an error (#848, #1731).
- Use `function()` instead of `(x)` in `arrow.mode` (#1722).
- Temporarily disable generating an interface for `igraph_simple_cycles_callback()` as the framework for handling callback functions is not yet present.
- Adjust loop position to vertex size in `plot()` (#1980).
- Don't rescale plot coordinates to `[-1,1] x [-1,1]` by default (#1492, #1956, #1962).
- Fail if `"layout"` attribute doesn't match the number of vertices (#1880).
- Automatically arrange loops in `plot()` (#407, #556, #1881).
- Vectorized drawing of arrows in `plot()` (#257, #1904).
- Allow more than one edge label font family in `plot()` (#37, #1896).
- Pie shapes now work as intended (#1882, #1883).
- Loops not plotted on canvas (#1799, #1800).
- Replace `NA` values in `label` attributes in `plot()` with default values (#1796, #1797).
- Removed duplicated plotting of arrow heads (#640, #1709).
- Correct mapping of edge label properties in plots when loops are present (#157, #1706).
- Welcome Maëlle Salmon and David Schoch as authors (#1733), add author links (#1821).
- Remove demos (#2008).
- Add 2023 preprint (#1240, #1984).
- Update allcontributors info (#1975).
- Link to replacements of deprecated functions (#1823).
- Add documentation of all file formats to `read_graph()` and `write_graph()` (#777, #1969). Recommend `saveRDS()` and `readRDS()` for saving and loading graphs (#1242, #1700).
- Document return value of `make_clusters()` (#1794).
- Clarify that `girth()` returns `Inf` for acyclic graphs (@eqmooring, #1831).
- Clarify the use of weights in `layout_with_kk()`.
- Refer to current latest version of R in troubleshooting page.
- Fix typos in `laplacian_matrix()` documentation.
- Document ellipsis in `cohesion()` (#971, #1985).
- Correct the description of the `weights` parameter of `hits_scores()`.
- Better describe output of `all_shortest_paths()` (#1029, #1778).
- `make_graph()` now supports `"Groetzsch"` as an alias of `"Grotzsch"`. This change was implemented in the C core.
- Update description of `order` parameter of `ego()` and related functions (#1746).
- Added lifecycle table (#1525).
- Add more about igraph.r2cdocs in the contributing guide (#1686, #1697).
- Accelerate check if an index sequence corresponds to the entire list of vertices (#1427, #1818).
- Faster single bracket querying of a graph (#1465, #1658).

schochastics marked this pull request as draft January 17, 2025 12:37

schochastics changed the title ~~refactor: speedup single bracket querying of a graph (#1465)~~ refactor: single bracket querying/manipulating of a graph (#1465) Jan 17, 2025

schochastics marked this pull request as ready for review January 18, 2025 18:26

schochastics requested review from krlmlr and maelle January 18, 2025 18:26

krlmlr reviewed Jan 18, 2025

schochastics added 3 commits January 19, 2025 07:08

faster adjacency matrix quering

e48d50f

replaced clean_indices with as_igraph_vs

e5e828b

unify helper function for querying

44561c5

schochastics force-pushed the indexing branch from d64b7b4 to 44561c5 Compare January 19, 2025 06:10

schochastics changed the title ~~refactor: single bracket querying/manipulating of a graph (#1465)~~ refactor: single bracket querying of a graph (#1465) Jan 19, 2025

schochastics added 3 commits January 22, 2025 11:39

refactor handling of unique and edge_list creation

ff45667

added tests

f6912f8

remove purrr

56ba6e9

schochastics force-pushed the indexing branch from f4795c7 to 56ba6e9 Compare January 22, 2025 14:18

added tests for duplicated i/j

ece42d0

krlmlr reviewed Jan 23, 2025

schochastics marked this pull request as draft January 23, 2025 19:52

schochastics mentioned this pull request Jan 23, 2025

feat: get_edge_ids() accepts data frames and matrices #1663

Merged

schochastics and others added 2 commits January 23, 2025 21:45

make dense case as.matrix(sparse)

7227626

Merge branch 'main' into indexing

ff0508d

Merge branch 'main' into indexing

13dd05b

schochastics marked this pull request as ready for review January 27, 2025 19:53

krlmlr reviewed Jan 30, 2025

schochastics and others added 3 commits January 30, 2025 21:39

pulled duplication handling out of sumatrix function

e5baaa5

Merge branch 'main' into indexing

a94ea07

adjusted for new get_edge_ids()

39348a1

schochastics added 2 commits February 2, 2025 20:43

added skip for old Matrix versions in tests

a1f1409

added Matrix version requirement

2c6b070

krlmlr reviewed Feb 6, 2025

schochastics added 2 commits February 6, 2025 14:14

removed sparse parameter from subroutine and droped its drop argument

4113e8b

more efficient handling of missing i and j

11ed118

krlmlr changed the title ~~refactor: single bracket querying of a graph (#1465)~~ perf: Faster single bracket querying of a graph Feb 6, 2025

krlmlr merged commit 723e7d1 into igraph:main Feb 6, 2025
22 checks passed

schochastics added a commit to schochastics/rigraph that referenced this pull request Feb 8, 2025

perf: Faster single bracket querying of a graph (igraph#1658)

c859fc5
