Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rcopy handles formular symbols #524

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

schlichtanders
Copy link

fixes #507

Copy link

codecov bot commented May 2, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 77.87%. Comparing base (1517803) to head (36b0174).

Files Patch % Lines
src/convert/formula.jl 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #524      +/-   ##
==========================================
- Coverage   80.40%   77.87%   -2.53%     
==========================================
  Files          26       26              
  Lines        1684     1686       +2     
==========================================
- Hits         1354     1313      -41     
- Misses        330      373      +43     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@palday
Copy link
Collaborator

palday commented May 6, 2024

@ararslan thoughts on the behavior here? Essentially it transforms all non-standard evaluation in R (here: implicit conversion of a bare symbol into an expression whose value is that symbol) into a Julia expression. I have the distinct feeling that there are some edge cases where this behavior could be quite wonky.

@schlichtanders if @ararslan is onboard, we would still need tests of this behaviour. ❤️

@@ -38,6 +38,11 @@ end

function rcopy(::Type{FormulaTerm}, l::Ptr{LangSxp})
expr = rcopy(Expr, l)
# special case of a simple variable, like in aes(x, y)
if Meta.isexpr(expr, :call) && length(expr.args) == 2 && expr.args[1] == :~
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means expr is Expr(:call, :~, x), i.e. :(~x), for some x. So this is keeping one-sided formulas from being passed to @formula, which can't handle one-sided formulas anyway. What I don't understand though is the accompanying comment here: "special case of a simple variable, like in aes(x, y)." I don't see how that example applies to this situation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the concrete example which fails for me in current RCall. Having an expression like aes(x, y) runs into this formula problem that simple symbols like x and y in this case are not yet supported

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this example, aes(x, y) appears inside of an R formula? Is that right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more like aes(x, y) is "lowered" to aes_(~x, ~y) via non standard evaluation because R uses one sided formulas to represent symbols/expressions. (The lowering rule is part of the function definition in ggplot, so it's maybe better described as an implicit macro.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to be honest, I don't actually know how RCall works internally. Is the following more or less accurate for this situation? The R surface syntax is aes(x, y) which gets evaluated in R as aes_(~x, ~y), so by the time this expression gets to Julia, Julia sees Expr(:call, :aes_, Expr(:call, :~, :x), Expr(:call, :~, :y)) and tries to turn it into aes_(::FormulaTerm, ::FormulaTerm), but that fails because FormulaTerm doesn't support one-sided formulas. Assuming that's accurate, this change seems safe to me and is arguably a bugfix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, the behavior is now

julia> using RCall

R> install.packages("^C

R> library("ggplot2")

julia> R"aes(x, y"^C

R> df <- data.frame(x=1,y=1)

julia> R"~x"
RObject{LangSxp}
~x


julia> rcopy(R"~x")
:(~x)

julia> rcopy(R"~y")
:(~y)

julia> rcopy(R"aes(df, x, y)")
OrderedCollections.OrderedDict{Symbol, Any} with 3 entries:
  :x         => :(~df)
  :y         => :(~x)
  Symbol("") => :(~y)

julia> R"aes(df, x, y)"
RObject{VecSxp}
Aesthetic mapping: 
* `x` -> `df`
* `y` -> `x`
* ``  -> `y`

julia> rcopy(R"y~x")
FormulaTerm
Response:
  y(unknown)
Predictors:
  x(unknown)

julia> rcopy(R"~x+y")
:(~(x + y))

So this PR converts a one-sided formula into a symbol representation of that formula. @ararslan do you have an opinion on whether it would be better to strip the tilde? I'm a bit conflicted myself and could argue for either direction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any examples in mind of when stripping the tilde would not be desirable? If not then stripping it seems sensible to me, but otherwise would there still be a way to recover the original behavior in those cases after this PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there are cases where you would want to preserve a raw one-sided formula as an explicit one-sided formula. I can't think of any:

  • you usually don't want to evaluate in R something already copied into Julia without thinking about how to convert it back. In other words, preserving the original literal expression isn't really valuable
  • if you want to create the closest Julia analog to a one-sided formula (either 0 or the left-hand side or a MatrixTerm), then the tilde needs to be stripped to programmatically manipulate the symbols anyway.

Going the other way: the tilde in one-sided formulae in R is generally analogous to Expr in Julia, so it seems having it be Expr in Julia eliminates the need for the tilde.

I guess we can solve the dilemma by adding a note to the documentation that one-sided formulae in R are converted to Expr in Julia without a tilde as a representation of unevaluated code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the explanation, that all sounds good to me. I think the documentation aspect will be particularly important in case anybody ends up surprised by this behavior for whatever reason, however unlikely. Consider me on board with this change!

@palday
Copy link
Collaborator

palday commented May 16, 2024

@schlichtanders tests for this can go in https://github.com/JuliaInterop/RCall.jl/blob/master/test/convert/formula.jl.

You can just use my examples from the discussion thread as tests. 😄 If you want to include the ggplot2 bit, then you should do something like

R"""if(!require(ggplot2)){
install.packages("ggplot2")
library(ggplot2)
}""" 

before the any tests depending on ggplot2. This will only install ggplot2 if it's not already present on the system/

@@ -38,6 +38,11 @@ end

function rcopy(::Type{FormulaTerm}, l::Ptr{LangSxp})
expr = rcopy(Expr, l)
# special case of a simple variable, like in aes(x, y)
if Meta.isexpr(expr, :call) && length(expr.args) == 2 && expr.args[1] == :~
return expr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return expr
return expr.args[2]

(so that we drop the tilde)

if Meta.isexpr(expr, :call) && length(expr.args) == 2 && expr.args[1] == :~
return expr
end
# complex formular
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# complex formular
# complex formula

@@ -38,6 +38,11 @@ end

function rcopy(::Type{FormulaTerm}, l::Ptr{LangSxp})
expr = rcopy(Expr, l)
# special case of a simple variable, like in aes(x, y)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# special case of a simple variable, like in aes(x, y)
# special case of a non-standard evaluation / a "bare" expression / (implicit) one-sided formula in R, like in `aes(x, y)`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] ArgumentError: malformed expression in formula
3 participants