-
Notifications
You must be signed in to change notification settings - Fork 1k
Update Documentation for env
Parameter Usage*
#6360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
LGTM |
I removed the DT definition since there are already other DT definitions that we can use for this example. |
Thanks for the PR! Actually, for addressing #6323 I had in mind extending the programming vignette, rather than Although as Toby points out, it looks like this is the first example using |
sure, i m updating the programming vignette now. |
Direct usage of I can imagine it could even potentially fail on trying to build big expression in j. The only reason to use function name as a symbol rather than string is when we want to inject function body into expression. I see no reasonable use case for that. Definitly worth to add some examples to manual as well. Now it was only redirecting to vignette and substitute2 manual. |
by the way Jan it is really super convenient how you can substitute names via substitute2 / env, thanks for implementing that. |
@Nj221102 please also add discussion/example to programming vignette, maybe under |
Yes that was the "new" thing vs. substitute or tidy way of meta programming. Afair that was the only part that required to be coded in C. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6360 +/- ##
=========================================
Coverage ? 98.34%
=========================================
Files ? 81
Lines ? 14927
Branches ? 0
=========================================
Hits ? 14680
Misses ? 247
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Elaborated in the original issue (#6323) on why I continue to advocate for direct usage of For now, I've opted to just put a minimal mention of that to resolve this issue. Beyond that it's something we can continue to take into consideration as we observe user reports/our own usage. |
I think there was some misconception at the start of this PR.
#6026 says
For a good reason, because 99.5% of the time we will not want to substitute data.table expression using body of the function.
Therefore I don't think it is reasonable to document that as a way to do it. I will submit an edit to this PR in a moment which should make everything more clear. |
Mention of f = function() sys.call()
do.call("f",list())
#f()
do.call(f,list())
#(function() sys.call())() and imagine your |
As written now, I hope it will prevent any misconception on that matter in future. |
|
||
If function name to be substituted needs to be namespace-qualified then namespace and function name can be substituted as any other symbol in the expression: | ||
```{r substitute_fun3, result='hide'} | ||
DT[, ns::fun(Sepal.Length), env = list(ns="base", fun="sqrt"), verbose=TRUE] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one in particular feels quite unnatural to me -- almost never do we want to separate the namespace from its export. I can't think of any use case for that. and it gets us even further from one of my main points about do.call()
-- using symbols makes the dependency structure of the code much clearer.
let's drop this example. I think the above section is a nice addition -- we can revisit the point about namespace-qualified inputs as we gather more organic use cases in the wild.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can as well be base::fun(col), env=list(fun="sqrt")
And also nsfun(col), env=call("::", as.name("base"), as.name("sqrt"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even parse(text="base::sqrt") which I would prefer not to mention in our docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using symbols makes the dependency structure of the code much clearer.
The problem is that it changes the code.
Symbols should be used, and should be preferred but then why to use them in env? They don't need substation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nsfun(col), env=call("::", as.name("base"), as.name("sqrt"))
Or even parse(text="base::sqrt") which I would prefer not to mention in our docs
these days str2lang("base::sqrt")
would be better and is sort of "accepted"; anyway, now I think you're hitting on something more interesting: fun(col); env = list(fun = quote(base::sqrt))
is pretty natural and I think solves all of my concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a situation when an i
/j
/by
-expression would need to be evaluated in a completely different context from where it had been written? Would a user normally want to access a function or a piece of data from an environment that's neither the data.table
itself nor the frame of the caller of [
? Similar to what groupingsets
does, hiding its own frame from [
but making the caller's frame available, except with the query itself, not the overall call to [
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DT[, fun(col), env = list(fun = quote(base::sqrt))]
It is then more correct to write
DT[, base::sqrt(col)]
As there is nothing to substitute. No advantage of using env.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aitap don't really know now without a nice example showing the issue, env is force evaluated, i/j/by after substation should in theory behave as written by hand, so scoping should not make any difference than [ written by hand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is then more correct to write
Right, in the simple case... I am still fairly certain this is needed for more generality. I will let this percolate for some time. In the meantime, let's just drop the ugly ns::fun
example which I don't think is useful & we can merge this?
closes #6323
Description:
This PR updates the documentation for the
env
parameter in thedata.table
package to include new functionality regarding function injection. Initially, the documentation update was focused solely on using function names as strings (e.g.,"sum"
), as requested in issue #6323. However, following the merge of PR #6026 which allows the direct use of function objects (e.g.,sum
). This PR updates the documentation to reflect both approaches.Changes:
Documentation Update:
env
parameter:"sum"
).sum
).Example Block Addition: