Skip to content

Conversation

@mgirlich
Copy link
Contributor

Closes #579.

I think this early exit is less problematic than the one in #1591.

x <- 1:10e3 + 0L

bench::mark(
  one_x = vctrs::vec_in(1L, x),
  empty_x = vctrs::vec_in(integer(), x),
  x_one = vctrs::vec_in(x, 1L),
  x_empty = vctrs::vec_in(x, integer()),
  check = FALSE
)

# Before
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 one_x          82µs  104.4µs     8941.    2.05MB     15.1
#> 2 empty_x      81.8µs   91.6µs     9809.  103.17KB     16.3
#> 3 x_one        38.1µs   43.8µs    20656.   78.23KB     32.5
#> 4 x_empty      31.6µs   36.4µs    24931.   78.23KB     34.5

# After
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 one_x       79.59µs 103.57µs     7803.    2.05MB     14.2
#> 2 empty_x      1.98µs   2.23µs   342233.        0B      0  
#> 3 x_one       38.63µs  45.63µs    18206.   78.23KB     24.9
#> 4 x_empty      4.76µs   7.63µs    90908.   39.11KB     63.7

bench::mark(
  one_x = vctrs::vec_match(1L, x),
  empty_x = vctrs::vec_match(integer(), x),
  x_one = vctrs::vec_match(x, 1L),
  x_empty = vctrs::vec_match(x, integer()),
  check = FALSE
)

# Before
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 one_x          81µs   89.1µs    10280.   106.8KB     21.9
#> 2 empty_x      81.2µs   88.8µs    10510.   103.2KB     18.4
#> 3 x_one        34.8µs   39.8µs    23007.    78.2KB     36.2
#> 4 x_empty      29.4µs   34.4µs    26560.    78.2KB     38.6

# After
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 one_x       81.26µs  90.84µs     9594.   106.8KB     21.1
#> 2 empty_x      2.02µs   2.27µs   310860.        0B      0  
#> 3 x_one       34.77µs  40.86µs    19220.    78.2KB     26.5
#> 4 x_empty      4.88µs   7.65µs    87792.    39.1KB     70.3

Created on 2022-08-31 with reprex v2.0.2

&_);
PROTECT_N(type, &nprot);
// can exit early here. Following operations cannot fail:
// * `vec_cast()` must work
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a chance that vec_cast() might fail, e.g. in case of lossy coercions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought vec_ptype2() returns the richer type so that we can safely cast. In particular I would not expect to get a lossy coercion there. Is this not the case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right in general, at least for lossy coercions. But in principle there might be types where casting to the common type might fail. E.g. a strict factor type that where ptype(strict_factor, character) == strict_factor, and then vec_cast() might fail.

That said, it might be a good reason to decide that strict types shouldn't be coercible this way, and either fail (making it less practical for users) or lose their strictness (but that's counter to the idea of a safe strict type).

@lionel-
Copy link
Member

lionel- commented Sep 1, 2022

In principle I'm not sure early exits are worth it when it isn't immediately clear that they are safe/correct, since they only speed up the empty case.

@lionel-
Copy link
Member

lionel- commented Sep 20, 2022

Hi @mgirlich. Thanks for looking into this and the other 2 early exit PRs. However we are a bit uncomfortable with implementing these early exits as they make the code harder to reason about and require more unit tests to make sure the different code paths do not diverge.

It's partly my fault for having opened the issue #579. Sorry for closing this and the other PRs and thanks again for the effort put into these PRs.

@mgirlich
Copy link
Contributor Author

Don't worry. I learned something while looking at the vctrs code and I also learned about the issues with early exits. It might be nice to make other developers aware that there are no early exits and patterns to use instead.

@DavisVaughan
Copy link
Member

Just want to second Lionel that we really appreciate your efforts in trying to improve vctrs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fast paths in vec_in() and vec_match() when one input is empty

3 participants