Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use as.numeric(seq_len(n)) in favor of seq(1, n) #1427

Open
krlmlr opened this issue Jul 5, 2024 · 10 comments · May be fixed by #1433
Open

Use as.numeric(seq_len(n)) in favor of seq(1, n) #1427

krlmlr opened this issue Jul 5, 2024 · 10 comments · May be fixed by #1433
Assignees

Comments

@krlmlr
Copy link
Contributor

krlmlr commented Jul 5, 2024

What happens, and what did you expect instead?

#1330 (comment)

We also need the by argument to ensure that the output is a numeric.

Perhaps add a function seq_numeric() that has a special case for zero-length output?

To reproduce

seq(by = 1) seems to be the fastest, look at the units. Maybe it's worth arguing with R core to support seq(start, start - by, by) without error and to return an empty vector in this case?

x <- seq(1, 1, 1)
class(x)
#> [1] "numeric"
x
#> [1] 1
seq(1, 0, 1)
#> Error in seq.default(1, 0, 1): wrong sign in 'by' argument
bench::mark(
  sum(seq(1, 100000, 1)),
  sum(as.numeric(rlang::seq2(1, 100000))),
  sum(as.numeric(seq_len(100000)))
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 sum(seq(1, 1e+05, 1))                628µs  791µs     1233.    1.91MB     46.8
#> 2 sum(as.numeric(rlang::seq2(1, 1e+05… 861ns  984ns   898993.        0B      0  
#> 3 sum(as.numeric(seq_len(1e+05)))      164ns  246ns  3139792.        0B    314.

Created on 2024-07-05 with reprex v2.1.0

System information

No response

@maelle
Copy link
Contributor

maelle commented Jul 5, 2024

I wasn't able to get to this today 😢

@clpippel
Copy link
Contributor

clpippel commented Jul 8, 2024

seq(1, vcount(graph)) does not work for empty vectors.
Alternatives are: seq_along, seq_len.

library(bench)
n <- 1
graph <- make_ring(n)
bench:: mark(
  "seq"   = seq(1, vcount(graph)),
  "along" = seq_along(V(graph)),
  "len"   = seq_len(vcount(graph))
)

The above fails if n = 0.
vcount() is faster then V(g).

@krlmlr
Copy link
Contributor Author

krlmlr commented Jul 8, 2024

Yeah, my seq_numeric() would special-case for the n == 0 edge case.

@clpippel
Copy link
Contributor

clpippel commented Jul 8, 2024

What is the point of a special function seq_numeric()?
Isn't it simpler to replace seq(1, x) with seq_len(x)?

@clpippel
Copy link
Contributor

clpippel commented Jul 8, 2024

I also wonder: how bad is it if we omit the test index_is_natural_sequence()? How likely is it that the test does indeed pass?

@krlmlr
Copy link
Contributor Author

krlmlr commented Jul 8, 2024

seq(1, x) is much faster, as the benchmark shows.

Because the natural sequence is the default argument for many input functions, I suspect that this case is pretty frequent and worth optimizing for, but I haven't done any measurements.

@szhorvat
Copy link
Member

szhorvat commented Jul 8, 2024

Your benchmark seems to show that seq_len (246 ns) is much faster than seq (791 µs), exactly the opposite of what you say above @krlmlr. What am I missing?

@krlmlr
Copy link
Contributor Author

krlmlr commented Jul 8, 2024

🤦 🤦

You're right, of course. But the test isn't representative either:

bench::mark(
  sum(seq(1, 10000000, 1)),
  sum(as.numeric(rlang::seq2(1, 10000000))),
  sum(as.numeric(seq_len(10000000)))
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression                           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:t> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sum(seq(1, 1e+07, 1))             77.6ms  93.34ms      10.6     191MB     16.0
#> 2 sum(as.numeric(rlang::seq2(1, 1…   902ns   1.02µs  800320.         0B      0  
#> 3 sum(as.numeric(seq_len(1e+07)))    164ns 246.04ns 2861548.         0B      0

Created on 2024-07-08 with reprex v2.1.0

The second and third example probably uses Gauss's formula. Let's try var() :

bench::mark(
  var(seq(1, 100000, 1)),
  var(as.numeric(rlang::seq2(1, 100000))),
  var(as.numeric(seq_len(100000)))
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 var(seq(1, 1e+05, 1))                818µs  960µs      883.    1.92MB     32.0
#> 2 var(as.numeric(rlang::seq2(1, 1e+05… 434µs  496µs     2002.   781.3KB     26.9
#> 3 var(as.numeric(seq_len(1e+05)))      432µs  495µs     2016.   781.3KB     29.4

Created on 2024-07-08 with reprex v2.1.0

This is more realistic but also indicates that as.numeric(seq_len()) is faster. I stand corrected.

@clpippel
Copy link
Contributor

clpippel commented Jul 8, 2024

A thought: avoid seq() in any case.

@clpippel
Copy link
Contributor

clpippel commented Jul 8, 2024

I tried make_ring(1E5). Optimization saves ca. 50%, 3.72ms v.s. 6.42ms.

@krlmlr krlmlr changed the title Avoid seq(1, 0) edge case Use as.numeric(seq_len(n)) in favor of seq(1, n) Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants