-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ERROR: AssertionError: length(res) > 0 #3217
Comments
The current code is a bug, but let us discuss what we prefer:
Then for CC @nalimilan |
I think this might be a clearer case (better matching my actual use) where there's maybe a way it could reasonable work: julia> df = DataFrame(:a => Int[]);
julia> f(x) = [(; x=sum(x))]
f (generic function with 1 method)
julia> f(Int[])
1-element Vector{NamedTuple{(:x,), Tuple{Int64}}}:
(x = 0,)
julia> combine(groupby(df, :a), :a => f => AsTable)
ERROR: AssertionError: length(res) > 0
Stacktrace:
[1] _combine(gd::GroupedDataFrame{DataFrame}, cs_norm::Vector{Any}, optional_transform::Vector{Bool}, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
@ DataFrames ~/.julia/packages/DataFrames/bza1S/src/groupeddataframe/splitapplycombine.jl:739 Based on #2297 I thought this would evaluate julia> combine(groupby(df, :a), :a => sum)
0×2 DataFrame
Row │ a a_sum
│ Int64 Int64
─────┴──────────────
Anyway, my vote would be for a solution that does not rely on type inference, and throws a clear errors when the output columns cannot be determined. However it would be nice if there was a way to write code like this that can propagate columns in the empty case. Maybe something like calling https://tables.juliadata.org/dev/#Tables.schema to see if the schema can be determined. |
It does. However, normally it is used to derive type of the column, while for Also the issue is that with So this means that we have to hardcode cases when this is doable + decide what to do when this is not doable (throw an error or create zero columns - as in my proposed code above). What do you think? |
Alternatively we could throw an error when we cannot determine target columns:
In general problematic cases are code like this:
In summary: to get it fully right there are many corner cases to consider. For this reason initially we assumed that something should be returned and made an assertion. @nalimilan - this lead me to thinking that the decision in #2297 sometimes can be questionable, e.g.:
|
Relying on inference for this sounds problematic, as a change in the compiler could make code suddenly error (and we don't even provide a workaround to specify expected output types manually). Yet the last example you give shows that using the type of the result when called on an empty group is also problematic. We could add an argument to use that value instead of throwing an error if the user so wishes -- that's pretty obscure, but the error could mention it. I don't really know what else we can do otherwise. I wish there would be a way to detect when literal column names are used to build a named tuple, as it's a common case. DataFramesMeta can special case this kind of pattern as it can access the syntax.
This case throws an error. Am I missing something? |
Now it errors. It would work if we relied on inference. The reason is that then we would identify that column names are What I think we can do currently:
So in summary I propose two actions:
The reason for such a proposal is that I expect that aggregation of empty |
Yeah, I do not require this, I was just confused by the error and thought something was wrong. I refactored my code already to manually handle the empty case. Perhaps this can be added as a caveat for |
If you mean this:
then it will still error. The reason is that we need to capture not only column names but also column types. So to make it clear: empty The question is: is it OK that it errors for multi column output always or should we try to handle this case? |
Ah, I see, OK. I think it is OK and a clear error is better than an obscure one, but it would be a nice to have some clean way to handle the empty case. What if we could do combine(groupby(df, :a), :a => (x -> [(a=x, b=x)]) => AsTable(schema)) where |
After thinking about it more the issue can be cleanly resolved I think using the following code:
The logic is:
This means that we do not rely on type inference, but follow the current documentation: create columns that have keys following keys of the returned values (so in particular if we have no groups and also "dummy" I will need to think about it a bit more and test. |
Fixed in #3222 |
OK, so the idea is that even when we pass zero rows, we pass columns with the right type, so we expect that either the user-provided function returns values of the same type whether there are zero or more rows, or the user doesn't care that the result may have a different type when the input |
Yes - this was the assumption previously (when doing a single column output PR) and we keep it. In other words: we assume that the function the user provided behaves in the same way if it gets data from a group (passed vectors will have positive length) and when it there are no groups (passed vectors will have zero length). |
I see the following on both DataFrames 1.3.6 and 1.4.2:
If the input is wrong, I don't think this should be an AssertionError. Maybe an
ArgumentError
, orErrorException
; assertions should be only for logical errors in the program, not input based errors. xref https://docs.julialang.org/en/v1/base/base/#Base.@assertAlthough in this case I am not sure if it is an actual logic error being caught or an incorrect use of
@assert
and the error should be something else.The text was updated successfully, but these errors were encountered: