Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StackOverflow with stackdf #1251

Closed
cjprybol opened this issue Oct 11, 2017 · 10 comments
Closed

StackOverflow with stackdf #1251

cjprybol opened this issue Oct 11, 2017 · 10 comments

Comments

@cjprybol
Copy link
Contributor

julia> using DataFrames, CSV

julia> iris = CSV.read(joinpath(Pkg.dir("DataFrames"), "test/data/iris.csv"));

julia> d = stackdf(iris);
ERROR: StackOverflowError:
Stacktrace:
 [1] zero(::Type{Any}) at /Users/Cameron/.julia/v0.6/Nulls/src/Nulls.jl:70 (repeats 80000 times)
@nalimilan
Copy link
Member

Works here. Are you on master of both DataFrames and Nulls?

@nalimilan nalimilan added this to the 0.11 milestone Oct 11, 2017
@cjprybol
Copy link
Contributor Author

Yea, still the same issue

julia> Pkg.update()
... skipping for brevity...

julia> Pkg.status("Nulls")
 - Nulls                         0.1.1              master

julia> Pkg.status("DataFrames")
 - DataFrames                    0.10.1+            master

julia> using DataFrames, CSV
INFO: Recompiling stale cache file /Users/Cameron/.julia/lib/v0.6/DataFrames.ji for module DataFrames.
INFO: Recompiling stale cache file /Users/Cameron/.julia/lib/v0.6/CSV.ji for module CSV.

julia> iris = CSV.read(joinpath(Pkg.dir("DataFrames"), "test/data/iris.csv"));

julia> d = stackdf(iris);
ERROR: StackOverflowError:
Stacktrace:
 [1] zero(::Type{Any}) at /Users/Cameron/.julia/v0.6/Nulls/src/Nulls.jl:70 (repeats 80000 times)

@nalimilan
Copy link
Member

nalimilan commented Oct 11, 2017

OK, that only happens when using CSV from master, because it happens when there is no numeric column in the data frame. MWE:

show(DataFrames.StackedVector(Any[]))
# Or:
show(DataFrames.StackedVector(Int[]))

EDIT: even better

size(DataFrames.StackedVector(Int[]))
sum(map(length, DataFrames.StackedVector(Int[]).components))

@nalimilan
Copy link
Member

OK, that's not new. With 0.10.1:

julia> stackdf(df)
ERROR: StackOverflowError:
Stacktrace:
 [1] zero(::Type{Any}) at /home/milan/.julia/DataArrays/src/natype.jl:71 (repeats 80000 times)

We just need to ensure we always return 0 even when the array is empty. Something like this could do the trick:

Base.length(v::StackedVector) = mapreduce(length, +, 0, v.components)

@nalimilan nalimilan removed this from the 0.11 milestone Oct 11, 2017
@cjprybol
Copy link
Contributor Author

cjprybol commented Oct 12, 2017

I'm not sure this is an issue with StackedVector actually?

julia> sum(Any[])
ERROR: StackOverflowError:
Stacktrace:
 [1] zero(::Type{Any}) at /Users/Cameron/.julia/v0.6/Nulls/src/Nulls.jl:70 (repeats 80000 times)

Do DataArrays and Nulls.jl have the same type piracy semantics e.g. JuliaData/Missings.jl#37 Any ideas why this is getting called even though the Array is not a Union{T, Null}? Ok, this tipped us off to another issue, see JuliaData/Missings.jl#44. I think you're original proposal that we just need to make sure we return 0 should still work

cjprybol added a commit to JuliaData/Missings.jl that referenced this issue Oct 12, 2017
Base only provides `zero(::Type{T}) where T<:Number`, not `zero(::Type{T}) where T`, and our method results in StackOverflows when called with `Any` when it should return an error. See JuliaData/DataFrames.jl#1251
@cjprybol
Copy link
Contributor Author

cjprybol commented Oct 12, 2017

Implementing

Base.length(v::StackedVector) = mapreduce(length, +, 0, v.components)

fixes all but

sum(map(length, DataFrames.StackedVector(Int[])))

and I'm not sure that we can do anything about this one because at that point we'll be using Base types (because the component Arrays won't be StackedVector)

sum(map(length, DataFrames.StackedVector(Int[]).components))

Any ideas as to why map(length, ::StackedVector) is missing a value when length(::StackedVector) gives the correct answer?

julia> length(DataFrames.StackedVector(Int[]))
0

julia> map(length, DataFrames.StackedVector(Int[]))
0-element Array{Any,1}

@nalimilan
Copy link
Member

We don't care about sum(map(length, DataFrames.StackedVector(Int[]).components)), there's no reason why it should work AFAICT. Anyway that's an internal type.

I don't see what you mean by "missing a value". The result of map looks OK.

@cjprybol
Copy link
Contributor Author

cjprybol commented Dec 1, 2017

Closing because the original issue was resolved by JuliaData/Missings.jl#44

@cjprybol cjprybol closed this as completed Dec 1, 2017
@nalimilan
Copy link
Member

Maybe we should add a test case?

@nalimilan nalimilan reopened this Dec 2, 2017
@cjprybol cjprybol self-assigned this Dec 3, 2017
@cjprybol cjprybol removed their assignment Jan 2, 2018
@bkamins
Copy link
Member

bkamins commented Dec 14, 2018

Is this still an issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants