Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReservoirSample with generic type #237

Merged
merged 1 commit into from
Sep 29, 2021

Conversation

brucala
Copy link
Contributor

@brucala brucala commented Sep 28, 2021

Currently ReservoirSample only accept Number types and implements a constructor based on zeros.

This PR changes ReservoirSample so it can accept any generic type T, and implements a constructor for AbstractString types using empty strings. Other types that implement the method zero (e.g. TimeTypes) will work.

@joshday
Copy link
Owner

joshday commented Sep 29, 2021

Cool, thanks!

@joshday joshday merged commit e3fd4f0 into joshday:master Sep 29, 2021
@joshday
Copy link
Owner

joshday commented Sep 29, 2021

FYI: This PR got me thinking and I just made the following change so we don't need to add constructors to manage different types:

ReservoirSample(k::Int, T::Type = Float64) = ReservoirSample(Vector{T}(undef, k), 0)

value(o::ReservoirSample) = nobs(o) < length(o.value) ? o.value[1:nobs(o)] : o.value

@brucala
Copy link
Contributor Author

brucala commented Sep 29, 2021

FYI: This PR got me thinking and I just made the following change so we don't need to add constructors to manage different types:

ReservoirSample(k::Int, T::Type = Float64) = ReservoirSample(Vector{T}(undef, k), 0)

value(o::ReservoirSample) = nobs(o) < length(o.value) ? o.value[1:nobs(o)] : o.value

Good point. Actually I initially tried a generic constructor with Vector{T}(undef, k), however I got some weird errors that I didn't understand, and went to the PR proposal.
A minimal working example to reproduce the error:

julia> Series(ReservoirSample(5, String))

I now see that the error (message copied below) has to do with the show function and that there isn't anything wrong with ReservoirSample. For instance this works:

julia> fit!(Series(ReservoirSample(5, String)), string.('a':'e'))  # filling all k elements of the reservoir sample

However, I have no idea how to solve the show function for this corner case.


Error showing value of type Series{String, Tuple{ReservoirSample{String}}}:
ERROR: UndefRefError: access to undefined reference
Stacktrace:
  [1] getindex
    @ ./array.jl:801 [inlined]
  [2] iterate (repeats 2 times)
    @ ./array.jl:777 [inlined]
  [3] _zip_iterate_some
    @ ./iterators.jl:368 [inlined]
  [4] _zip_iterate_all
    @ ./iterators.jl:360 [inlined]
  [5] iterate
    @ ./iterators.jl:350 [inlined]
  [6] ==(A::Vector{String}, B::Vector{String})
    @ Base ./abstractarray.jl:1996
  [7] _broadcast_getindex_evalf
    @ ./broadcast.jl:648 [inlined]
  [8] _broadcast_getindex
    @ ./broadcast.jl:621 [inlined]
  [9] #19
    @ ./broadcast.jl:1098 [inlined]
 [10] ntuple
    @ ./ntuple.jl:49 [inlined]
 [11] copy(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(==), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getfield), Tuple{Base.RefValue{ReservoirSample{String}}, Tuple{Symbol, Symbol}}}, Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getfield), Tuple{Base.RefValue{ReservoirSample{String}}, Tuple{Symbol, Symbol}}}}})
    @ Base.Broadcast ./broadcast.jl:1098
 [12] materialize
    @ ./broadcast.jl:883 [inlined]
 [13] ==(o1::ReservoirSample{String}, o2::ReservoirSample{String})
    @ OnlineStatsBase ~/.julia/packages/OnlineStatsBase/flenY/src/OnlineStatsBase.jl:67
 [14] _eq(t1::Tuple{ReservoirSample{String}}, t2::Tuple{ReservoirSample{String}})
    @ Base ./tuple.jl:366
 [15] ==
    @ ./tuple.jl:362 [inlined]
 [16] _broadcast_getindex_evalf
    @ ./broadcast.jl:648 [inlined]
 [17] _broadcast_getindex
    @ ./broadcast.jl:621 [inlined]
 [18] #19
    @ ./broadcast.jl:1098 [inlined]
 [19] ntuple
    @ ./ntuple.jl:48 [inlined]
 [20] copy(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(==), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getfield), Tuple{Base.RefValue{Series{String, Tuple{ReservoirSample{String}}}}, Tuple{Symbol}}}, Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getfield), Tuple{Base.RefValue{Series{String, Tuple{ReservoirSample{String}}}}, Tuple{Symbol}}}}})
    @ Base.Broadcast ./broadcast.jl:1098
 [21] materialize
    @ ./broadcast.jl:883 [inlined]
 [22] ==(o1::Series{String, Tuple{ReservoirSample{String}}}, o2::Series{String, Tuple{ReservoirSample{String}}})
    @ OnlineStatsBase ~/.julia/packages/OnlineStatsBase/flenY/src/OnlineStatsBase.jl:67
 [23] !=(x::Series{String, Tuple{ReservoirSample{String}}}, y::Series{String, Tuple{ReservoirSample{String}}})
    @ Base ./operators.jl:204
 [24] _print_tree(printnode::typeof(AbstractTrees.printnode), io::IOContext{Base.TTY}, tree::Series{String, Tuple{ReservoirSample{String}}}; maxdepth::Int64, indicate_truncation::Bool, depth::Int64, active_levels::Vector{Int64}, charset::AbstractTrees.TreeCharSet, withinds::Bool, inds::Vector{Any}, from::Nothing, to::Nothing, roottree::Series{String, Tuple{ReservoirSample{String}}})
    @ AbstractTrees ~/.julia/packages/AbstractTrees/oHb1F/src/printing.jl:132
 [25] _print_tree
    @ ~/.julia/packages/AbstractTrees/oHb1F/src/printing.jl:127 [inlined]
 [26] #print_tree#5
    @ ~/.julia/packages/AbstractTrees/oHb1F/src/printing.jl:173 [inlined]
 [27] print_tree
    @ ~/.julia/packages/AbstractTrees/oHb1F/src/printing.jl:173 [inlined]
 [28] #print_tree#7
    @ ~/.julia/packages/AbstractTrees/oHb1F/src/printing.jl:180 [inlined]
 [29] print_tree
    @ ~/.julia/packages/AbstractTrees/oHb1F/src/printing.jl:180 [inlined]
 [30] show
    @ ~/.julia/packages/OnlineStatsBase/flenY/src/OnlineStatsBase.jl:40 [inlined]
 [31] show(io::IOContext{Base.TTY}, #unused#::MIME{Symbol("text/plain")}, x::Series{String, Tuple{ReservoirSample{String}}})
    @ Base.Multimedia ./multimedia.jl:47
 [32] display(d::REPL.REPLDisplay{REPL.LineEditREPL}, mime::MIME{Symbol("text/plain")}, x::Series{String, Tuple{ReservoirSample{String}}})
    @ OhMyREPL ~/.julia/packages/OhMyREPL/07uNa/src/output_prompt_overwrite.jl:8
 [33] display(d::REPL.REPLDisplay, x::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:225
 [34] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:328
 [35] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [36] invokelatest
    @ ./essentials.jl:706 [inlined]
 [37] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:247
 [38] (::REPL.var"#40#41"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:231
 [39] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:462
 [40] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:229
 [41] (::REPL.var"#do_respond#61"{Bool, Bool, REPL.var"#72#82"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:798
 [42] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [43] invokelatest
    @ ./essentials.jl:706 [inlined]
 [44] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/LineEdit.jl:2441
 [45] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:1126
 [46] (::REPL.var"#44#49"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ./task.jl:411

@brucala
Copy link
Contributor Author

brucala commented Sep 29, 2021

And another issue created by the undef constructor:

julia> ReservoirSample(5, String) == ReservoirSample(5, String)
ERROR: UndefRefError: access to undefined reference
Stacktrace:
  [1] getindex
    @ ./array.jl:801 [inlined]
  [2] iterate (repeats 2 times)
    @ ./array.jl:777 [inlined]
  [3] _zip_iterate_some
    @ ./iterators.jl:368 [inlined]
  [4] _zip_iterate_all
    @ ./iterators.jl:360 [inlined]
  [5] iterate
    @ ./iterators.jl:350 [inlined]
  [6] ==(A::Vector{String}, B::Vector{String})
    @ Base ./abstractarray.jl:1996
  [7] _broadcast_getindex_evalf
    @ ./broadcast.jl:648 [inlined]
  [8] _broadcast_getindex
    @ ./broadcast.jl:621 [inlined]
  [9] #19
    @ ./broadcast.jl:1098 [inlined]
 [10] ntuple
    @ ./ntuple.jl:49 [inlined]
 [11] copy(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(==), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getfield), Tuple{Base.RefValue{ReservoirSample{String}}, Tuple{Symbol, Symbol}}}, Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getfield), Tuple{Base.RefValue{ReservoirSample{String}}, Tuple{Symbol, Symbol}}}}})
    @ Base.Broadcast ./broadcast.jl:1098
 [12] materialize
    @ ./broadcast.jl:883 [inlined]
 [13] ==(o1::ReservoirSample{String}, o2::ReservoirSample{String})
    @ OnlineStatsBase ~/.julia/packages/OnlineStatsBase/flenY/src/OnlineStatsBase.jl:67
 [14] top-level scope
    @ REPL[3]:1

@joshday
Copy link
Owner

joshday commented Sep 29, 2021

Ah, thanks for the info!

Looks like the issue is with AbstractTrees, which is used to print Series. I just pushed some new changes to remove undef altogether.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants