Description
I've been working on an ALTREP "view" class, and have discovered a number of small places where I think base R itself could be a little more ALTREP friendly. I'd be interested in working on a few of these next week if there is support for it. I think it could be a really tractable win for r-dev-day, with small and well-scoped issues to tackle!
In particular, there are a few places I've found so far where R uses, say, LOGICAL() where it could instead use LOGICAL_RO(). These are cases where it seems like R only needs a read-only pointer into the data. This would be more friendly to ALTREP classes that can provide a read-only pointer easily, but would have to materialize to provide a writable pointer (like an ALTREP "view").
- duplicate1(), used by Rf_duplicate()
- copyVector()
- R_compute_identical(), used by the R-level identical()
All 3 of these currently force a materialization of my ALTREP class due to requesting a writable pointer, when I think they only need a readonly one.
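To make the cost difference concrete, here is a minimal sketch in plain C. It is a toy model rather than R's internals: `toy_view`, `toy_dataptr_ro()`, `toy_dataptr_rw()` and `toy_identical()` are all hypothetical names, standing in for an ALTREP view, `LOGICAL_RO()`, `LOGICAL()` and a read-only consumer such as `R_compute_identical()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an ALTREP "view": a window onto a parent buffer.
   All names are hypothetical; this is not R's internal API. */
typedef struct {
    const int *parent;   /* data owned by another object */
    size_t offset;       /* start of the window within parent */
    size_t length;       /* number of elements in the window */
    int *materialized;   /* private copy, NULL until a writable pointer is requested */
} toy_view;

/* Read-only access is cheap: pointer arithmetic only (the LOGICAL_RO() analogue). */
static const int *toy_dataptr_ro(const toy_view *v) {
    return v->materialized ? v->materialized : v->parent + v->offset;
}

/* Writable access must copy first so the parent is never modified
   (the LOGICAL() analogue). Returns NULL if allocation fails. */
static int *toy_dataptr_rw(toy_view *v) {
    if (v->materialized == NULL) {
        size_t n = v->length ? v->length : 1;  /* avoid a zero-size malloc */
        v->materialized = (int *)malloc(n * sizeof(int));
        if (v->materialized == NULL) return NULL;
        memcpy(v->materialized, v->parent + v->offset, v->length * sizeof(int));
    }
    return v->materialized;
}

/* An identical()-style comparison only reads, so the read-only pointer is enough
   and neither view has to materialize. */
static int toy_identical(const toy_view *a, const toy_view *b) {
    assert(a != NULL && b != NULL);
    if (a->length != b->length) return 0;
    return memcmp(toy_dataptr_ro(a), toy_dataptr_ro(b),
                  a->length * sizeof(int)) == 0;
}
```

In this model, comparing two views through `toy_identical()` leaves `materialized` as NULL for both, whereas a single call to `toy_dataptr_rw()` allocates and copies the whole window. That is the kind of cost the three functions above would avoid by requesting a read-only pointer.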
In addition to these, I think that ExtractSubset() (used by the R-level [ operator, and many other places) could be modified to have really nice efficiency gains for ALTREP types. It currently calls the corresponding *_ELT() method on each index. I was thinking that it could instead first check whether a call to Dataptr_or_null() returns a cheap read-only dataptr, and extract from that when it does, falling back to *_ELT() otherwise. I think that would make subset extraction way more efficient for ALTREP classes that can provide a read-only pointer easily (again, like a view).
I do realize there is an ALTREP Extract_subset method that I could implement for this last case, but subsetting is tricky to get exactly right for all cases, and I'd much rather just hand R a read-only dataptr and let it handle the finer details of the subsetting!
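The ExtractSubset() idea could look roughly like the sketch below. This is again a toy model in plain C, not a patch against subset.c: `toy_vec`, `toy_dataptr_or_null()`, `toy_elt()`, `toy_seq_elt()` and `toy_extract_subset()` are hypothetical names, indices are 0-based for simplicity, and `TOY_NA` is an assumed sentinel for out-of-bounds indices.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_NA INT_MIN  /* assumed sentinel for out-of-bounds indices */

/* Hypothetical toy vector: either exposes a contiguous read-only buffer,
   or supports only element-at-a-time access. Not R's API. */
typedef struct {
    const int *data;       /* contiguous storage, or NULL if not cheaply available */
    size_t length;
    int (*elt)(size_t i);  /* per-element getter, used when data is NULL */
} toy_vec;

/* Analogue of Dataptr_or_null(): a pointer if one is cheap, otherwise NULL. */
static const int *toy_dataptr_or_null(const toy_vec *x) {
    return x->data;
}

/* Analogue of the *_ELT() accessors: one dispatch per element. */
static int toy_elt(const toy_vec *x, size_t i) {
    return x->data ? x->data[i] : x->elt(i);
}

/* Example lazy getter: element i of the sequence 1, 2, 3, ... */
static int toy_seq_elt(size_t i) {
    return (int)(i + 1);
}

/* Extract x[idx[k]] into out[k], preferring the read-only pointer when available. */
static void toy_extract_subset(const toy_vec *x, const size_t *idx,
                               size_t n, int *out) {
    assert(x != NULL && idx != NULL && out != NULL);
    const int *p = toy_dataptr_or_null(x);
    if (p != NULL) {
        /* Fast path: index directly into the buffer. */
        for (size_t k = 0; k < n; k++)
            out[k] = idx[k] < x->length ? p[idx[k]] : TOY_NA;
    } else {
        /* Fallback: the existing per-element behaviour. */
        for (size_t k = 0; k < n; k++)
            out[k] = idx[k] < x->length ? toy_elt(x, idx[k]) : TOY_NA;
    }
}
```

The pointer check happens once per call rather than once per element, so classes that cannot provide a pointer cheaply pay almost nothing extra, and the bounds and NA handling stay in one place on R's side.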
Link: https://github.com/wch/r-source/blob/b9c83de5f7c9bb0b6dbe96b6d61a04d195b0cf2e/src/main/subset.c#L137