ENH: Add sample function for DataFrames #997
src/other/sample.jl
Outdated
"""
sample(df[, N])

Returns a (random) sample of a DataFrame
I prefer something like "Returns a (random) sample of rows from a DataFrame" to be explicit about the selection of rows.
+1, but say "of N rows".
Thanks for doing this @scls19fr ! :) Tests are missing; it would be great to have some tests for this function.
src/other/sample.jl
Outdated
```
julia> using RDatasets
julia> iris = dataset("datasets", "iris")
julia> sample(iris, 5)
Please add a call to srand(1) to ensure the results are always the same.
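A minimal sketch of why seeding makes a docstring example reproducible. It assumes a current Julia version, where `srand(1)` has been renamed to `Random.seed!(1)`; the range `1:150` is only an illustrative stand-in for the row indices of a 150-row table such as iris.

```julia
using Random

# Seeding the global RNG before each draw makes the draw repeatable.
Random.seed!(1)
first_draw = rand(1:150, 5)

Random.seed!(1)
second_draw = rand(1:150, 5)

# Same seed, same sequence of sampled indices.
@assert first_draw == second_draw
```

The exact indices drawn can differ between Julia versions, so a doctest that prints sampled rows may still need updating when the default RNG changes.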
Thanks. Please add tests for the new feature, and add the function to the list of exports.
I haven't added sample to
src/other/sample.jl
Outdated
@@ -0,0 +1,29 @@
import StatsBase: sample

function sample(df::AbstractDataFrame; replace::Bool=true, ordered::Bool=false)
Just do N::Integer=1 in the definition below, and you'll get this one for free.
86b4c36 to dc436ee
src/other/sample.jl
Outdated
"""
sample(df[, n])

Draw a random sample of `n` rows from a data frame `df` and return the result as a data frame
Missing ending dot.
Any opinions on whether this function is worth adding?
│ 5 │ 5.8 │ 2.7 │ 5.1 │ 1.9 │ "virginica" │
```
"""
function sample(df::AbstractDataFrame, n::Integer=1; replace::Bool=true, ordered::Bool=false)
Should extend the function from StatsBase, i.e. function StatsBase.sample(...)
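The two review suggestions (a default `n::Integer=1` and a qualified `StatsBase.sample` method) can be sketched roughly as follows. This is an illustration, not the code that was in the PR: it assumes the user has both DataFrames.jl and StatsBase.jl installed, and the small table `df` is made up for the example. Defining this method outside of either package is type piracy, so it is shown only to make the pattern concrete.

```julia
using DataFrames, Random
import StatsBase

# Sketch: add a data frame method to StatsBase.sample instead of
# importing `sample` and defining an unqualified function.
# The default n=1 also covers the one-argument call sample(df).
function StatsBase.sample(df::AbstractDataFrame, n::Integer=1;
                          replace::Bool=true, ordered::Bool=false)
    # Sample row indices, then index the data frame with them.
    rows = StatsBase.sample(1:nrow(df), n; replace=replace, ordered=ordered)
    return df[rows, :]
end

Random.seed!(1)  # current spelling of srand(1)
df = DataFrame(a = 1:10, b = 11:20)  # made-up data for illustration
s = StatsBase.sample(df, 5; replace=false)
one = StatsBase.sample(df)  # uses the default n=1
```

With `replace=false` the five returned rows are distinct, and the result is a new data frame with the same columns as `df`.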
I think this could be useful, but needs a rebase.
I am closing this since we removed the StatsBase.jl dependency from DataFrames.jl. Given the relation of cost of materializing a Please reopen if you disagree.
It would still be kind of nice to be able to do
Exactly,
What about for groupby objects?
There are two issues here:
In short, to sample subgroups now you should write
IIRC we concluded that subtyping
Thank you. I just wanted to make sure with
Might fix JuliaStats/StatsBase.jl#170