This repository was archived by the owner on May 21, 2022. It is now read-only.

Fixes compatibility with StatsBase 0.4 #49

Merged: 6 commits, Oct 30, 2020

4 changes: 2 additions & 2 deletions Project.toml
@@ -1,7 +1,7 @@
name = "MLDataPattern"
uuid = "9920b226-0b2a-5f5f-9153-9aa70a013f8b"
authors = ["Christof Stocker <stocker.christof@gmail.com>"]
-version = "0.5.3"
+version = "0.5.4"

[deps]
LearnBase = "7f8f8fb0-2700-5f03-b4bd-41f8cfc144b6"
@@ -12,7 +12,7 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
DataFrames = "0.20"
-LearnBase = "0.2, 0.3"
+LearnBase = "0.2, 0.3, 0.4"
MLLabelUtils = "0.4, 0.5"
StatsBase = "0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32, 0.33"
julia = "0.7, 1"
4 changes: 2 additions & 2 deletions docs/documentation/datasubset.rst
@@ -535,7 +535,7 @@ not implement the required interface. We can change that however.

julia> LearnBase.getobs(df::DataFrame, idx) = df[idx,:]

-julia> LearnBase.nobs(df::DataFrame) = nrow(df)
+julia> StatsBase.nobs(df::DataFrame) = nrow(df)

With those two methods defined, every ``DataFrame`` is a fully
qualified data container. This means that it can now be
@@ -596,7 +596,7 @@ however, we will also implement a custom method for

julia> using DataTables, LearnBase

-julia> LearnBase.nobs(dt::AbstractDataTable) = nrow(dt)
+julia> StatsBase.nobs(dt::AbstractDataTable) = nrow(dt)

julia> LearnBase.getobs(dt::AbstractDataTable, idx) = dt[idx,:]

4 changes: 2 additions & 2 deletions docs/documentation/targets.rst
@@ -507,7 +507,7 @@ targets are available in some member variable.

LearnBase.getobs(::DummyDirImageSource, i) = error("expensive computation triggered")

-LearnBase.nobs(data::DummyDirImageSource) = length(data.targets)
+StatsBase.nobs(data::DummyDirImageSource) = length(data.targets)

Naturally, we would like to avoid calling :func:`getobs` if at
all possible. While we can't avoid calling :func:`getobs` when we
@@ -571,7 +571,7 @@ Example 1). Furthermore, each observation is itself also a

LearnBase.getobs(df::DataFrame, idx) = df[idx,:]

-LearnBase.nobs(df::DataFrame) = nrow(df)
+StatsBase.nobs(df::DataFrame) = nrow(df)

Here we are fine with :func:`getobs` being called, since we need
to access the actual ``DataFrame`` anyway. However, we still need
4 changes: 3 additions & 1 deletion src/MLDataPattern.jl
@@ -4,7 +4,9 @@ using LearnBase
using MLLabelUtils

using LearnBase: ObsDimension
-import LearnBase: nobs, getobs, getobs!, gettarget, gettargets, targets, datasubset, default_obsdim
+import StatsBase: nobs
+import LearnBase: getobs, getobs!, gettarget, gettargets, targets, datasubset, default_obsdim, DataView,
Member commented:

Are these unexported by LearnBase 0.4?

DataView, AbstractObsView, AbstractBatchView, DataIterator, AbstractDataIterator, ObsIterator, BatchIterator

Contributor Author replied:

AFAIK yes

julia> using LearnBase
julia> DataView
ERROR: UndefVarError: DataView not defined

julia> AbstractObsView
ERROR: UndefVarError: AbstractObsView not defined

julia> AbstractBatchView
ERROR: UndefVarError: AbstractBatchView not defined

julia> DataIterator
ERROR: UndefVarError: DataIterator not defined

julia> AbstractDataIterator
ERROR: UndefVarError: AbstractDataIterator not defined

julia> ObsIterator
ERROR: UndefVarError: ObsIterator not defined

julia> BatchIterator
ERROR: UndefVarError: BatchIterator not defined

julia> LearnBase.BatchIterator
LearnBase.BatchIterator

Contributor Author added:

There was quite a lot of unexporting in JuliaML/LearnBase.jl@v0.3.0...v0.4.0

Member replied:

I can export these in JuliaML/LearnBase.jl#44 if that is the preferred end result. Seems like a lot got lost in translation between 0.3 and 0.4.
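
For readers following the thread above, the difference between an exported and an unexported name can be reproduced without LearnBase. The sketch below uses a hypothetical `ToyBase` module as a stand-in (its contents are assumptions made for illustration, not LearnBase's actual definitions). It shows that an unexported type stays reachable through a qualified name or an explicit `import`, which is the approach the diff in this PR takes.

```julia
# Hypothetical stand-in for a package that defines a type without exporting it.
module ToyBase
export getobs                  # exported: visible after `using`
abstract type DataView end     # defined, but NOT exported
getobs(x, i) = x[i]
end

# A consumer can still bring the unexported name into scope explicitly,
# similar in spirit to `import LearnBase: ..., DataView` in this PR.
module Consumer
import ..ToyBase: DataView
struct MyView <: DataView end
end

using .ToyBase

# Exported names are available unqualified after `using`.
first_obs = getobs([10, 20, 30], 1)

# Unexported names need the module prefix (or an explicit import).
is_subtype = Consumer.MyView <: ToyBase.DataView
```

Whether re-exporting the names upstream or importing them explicitly downstream is preferable is the open question in the thread; the sketch only illustrates the mechanics.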

+    AbstractObsView, AbstractBatchView, DataIterator, AbstractDataIterator, ObsIterator, BatchIterator

using Base.Cartesian
using Random
2 changes: 1 addition & 1 deletion src/datasubset.jl
@@ -68,7 +68,7 @@ For `DataSubset` to work on some data structure, the desired type
In what form is up to the user.
Note that `idx` can be of type `Int` or `AbstractVector`.

-- `LearnBase.nobs(data::MyType, [obsdim::ObsDimension])` :
+- `StatsBase.nobs(data::MyType, [obsdim::ObsDimension])` :
Should return the total number of observations in `data`

The following methods can also be provided and are optional:
5 changes: 2 additions & 3 deletions src/folds.jl
@@ -54,7 +54,7 @@ For `FoldsView` to work on some data structure, the desired type
In what form is up to the user.
Note that `idx` can be of type `Int` or `AbstractVector`.

-- `LearnBase.nobs(data::MyType, [obsdim::ObsDimension])` :
+- `StatsBase.nobs(data::MyType, [obsdim::ObsDimension])` :
Should return the total number of observations in `data`

Author(s)
@@ -167,8 +167,7 @@ end
default_obsdim(iter::FoldsView) = iter.obsdim

function Base.summary(A::FoldsView)
-    string(length(A), "-fold ", typeof(A).name,
-           " of ", nobs(A), " observations")
+    string(length(A), "-fold FoldsView of ", nobs(A), " observations")
end
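
A side note on the `Base.summary` change above: hard-coding the name means the output no longer depends on how `typeof(A).name` happens to be printed. If a generic label were wanted instead, `nameof` would be one option. The sketch below uses a hypothetical `ToyView` type and a hypothetical `toy_summary` helper (neither is part of MLDataPattern) and is only meant to illustrate that alternative, not to describe what the PR does.

```julia
# Hypothetical parametric type standing in for a view such as FoldsView.
struct ToyView{T}
    data::T
end

# `nameof` returns the bare type name as a Symbol, without type parameters.
toy_summary(A::ToyView) =
    string(length(A.data), "-element ", nameof(typeof(A)))

summary_text = toy_summary(ToyView([1, 2, 3]))
```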

_datastr(data) = summary(data)
4 changes: 2 additions & 2 deletions src/resample.jl
@@ -47,7 +47,7 @@ code allows `oversample` to work on a `DataTable`.
```julia
# Make DataTables.jl work
LearnBase.getobs(data::DataTable, i) = data[i,:]
-LearnBase.nobs(data::DataTable) = nrow(data)
+StatsBase.nobs(data::DataTable) = nrow(data)
```

You can use the parameter `f` to specify how to extract or
@@ -173,7 +173,7 @@ code allows `undersample` to work on a `DataTable`.
```julia
# Make DataTables.jl work
LearnBase.getobs(data::DataTable, i) = data[i,:]
-LearnBase.nobs(data::DataTable) = nrow(data)
+StatsBase.nobs(data::DataTable) = nrow(data)
```

You can use the parameter `f` to specify how to extract or
2 changes: 1 addition & 1 deletion src/stratifiedobs.jl
@@ -162,7 +162,7 @@ code allows `stratifiedobs` to work on a `DataTable`.
```julia
# Make DataTables.jl work
LearnBase.getobs(data::DataTable, i) = data[i,:]
-LearnBase.nobs(data::DataTable) = nrow(data)
+StatsBase.nobs(data::DataTable) = nrow(data)
```

You can use the parameter `f` to specify how to extract or
2 changes: 1 addition & 1 deletion src/targets.jl
@@ -117,7 +117,7 @@ require [`nobs`](@ref) and [`getobs`](@ref) to be defined.
```julia
julia> LearnBase.getobs(data::DataFrame, i) = data[i,:]

-julia> LearnBase.nobs(data::DataFrame) = nrow(data)
+julia> StatsBase.nobs(data::DataFrame) = nrow(data)

julia> data = DataFrame(X1=rand(3), X2=rand(3), Y=[:a,:b,:a])
3×3 DataFrames.DataFrame
31 changes: 28 additions & 3 deletions test/runtests.jl
@@ -34,29 +34,54 @@ Y1 = collect(1:150)
struct EmptyType end

struct CustomType end
-LearnBase.nobs(::CustomType) = 100
+StatsBase.nobs(::CustomType) = 100
LearnBase.getobs(::CustomType, i::Int) = i
LearnBase.getobs(::CustomType, i::AbstractVector) = collect(i)
LearnBase.gettargets(::CustomType, i::Int) = "obs $i"
LearnBase.gettargets(::CustomType, i::AbstractVector) = "batch $i"

struct CustomStorage end
struct CustomObs{T}; data::T end
-LearnBase.nobs(::CustomStorage) = 2
+StatsBase.nobs(::CustomStorage) = 2
LearnBase.getobs(::CustomStorage, i) = CustomObs(i)
LearnBase.gettarget(str::String, obs::CustomObs) = "$str - obs $(obs.data)"
LearnBase.gettarget(obs::CustomObs) = "obs $(obs.data)"

struct ObsDimTriggeredException <: Exception end
struct MetaDataStorage end
-LearnBase.nobs(::MetaDataStorage) = 3
+StatsBase.nobs(::MetaDataStorage) = 3
LearnBase.getobs(::MetaDataStorage, i) = throw(ObsDimTriggeredException())
LearnBase.gettargets(::MetaDataStorage) = "full"
LearnBase.gettargets(::MetaDataStorage, i::Int) = "obs $i"
LearnBase.gettargets(::MetaDataStorage, i::AbstractVector) = "batch $i"

# --------------------------------------------------------------------

+function matrix_compat_isequal(ref, actual)
+    # a over-verbose collection of patterns that we want to ignore during test
+    patterns = [
+        # Julia v1.6
+        "Normed{UInt8,8}" => "N0f8",
+        r"Array{(\w+),2}" => s"Matrix{\1}",
+        r"Array{(\w+),1}" => s"Vector{\1}",
+
+        # https://github.com/JuliaGraphics/ColorTypes.jl/pull/206
+        # r"Gray{\w+}\(([\w\.]+)\)" => s"\1",
+        # r"RGB{\w+}\(([\w\.,]+)\)" => s"RGB(\1)",
+    ]
+
+    for p in patterns
+        actual = replace(actual, p)
+        ref = replace(ref, p)
+    end
+
+    # Julia v1.4
+    ref = join(map(strip, split(ref, "\n")), "\n")
+    actual = join(map(strip, split(actual, "\n")), "\n")
+
+    isequal(ref, actual)
+end

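To make the intent of `matrix_compat_isequal` concrete, here is a minimal standalone sketch of the same normalization idea. The function body mirrors the one added in this diff (minus the commented-out patterns); the two sample strings are made up for illustration and are not the contents of any reference file in the repository.

```julia
# Compare two printed representations while ignoring differences that
# come only from how different Julia versions print array types and
# from leading/trailing whitespace on each line.
function matrix_compat_isequal(ref, actual)
    patterns = [
        "Normed{UInt8,8}" => "N0f8",
        r"Array{(\w+),2}" => s"Matrix{\1}",
        r"Array{(\w+),1}" => s"Vector{\1}",
    ]

    # Rewrite both strings into the newer spelling before comparing.
    for p in patterns
        actual = replace(actual, p)
        ref = replace(ref, p)
    end

    # Strip per-line whitespace so indentation changes do not matter.
    ref = join(map(strip, split(ref, "\n")), "\n")
    actual = join(map(strip, split(actual, "\n")), "\n")

    isequal(ref, actual)
end

# Made-up example strings: same content, older vs. newer type spelling.
old_style = "SomeView(::Array{Float64,2})\n  10 observations"
new_style = "SomeView(::Matrix{Float64})\n10 observations"

same = matrix_compat_isequal(old_style, new_style)
different = matrix_compat_isequal(old_style, "SomeView(::Matrix{Int64})\n10 observations")
```

In the test files of this PR the function is passed to `@test_reference` through the `by=` keyword, so stored reference output and freshly generated output are compared after this normalization.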
tests = [
"tst_container.jl"
"tst_datasubset.jl"
2 changes: 1 addition & 1 deletion test/tst_dataframes.jl
@@ -1,6 +1,6 @@
@testset "DataFrame integration" begin
LearnBase.getobs(df::DataFrame, idx) = df[idx,:]
-LearnBase.nobs(df::DataFrame) = nrow(df)
+StatsBase.nobs(df::DataFrame) = nrow(df)

@testset "targets" begin
y = [:a,:a,:b,:a,:b]
14 changes: 7 additions & 7 deletions test/tst_dataiterator.jl
@@ -5,10 +5,10 @@
@test TIter <: LearnBase.AbstractDataIterator
@test TIter <: LearnBase.AbstractObsIterator
end
-@test_reference "references/RandomObs1.txt" RandomObs(X)
-@test_reference "references/RandomObs2.txt" RandomObs(X, 10)
-@test_reference "references/BalancedObs1.txt" BalancedObs(X)
-@test_reference "references/BalancedObs2.txt" BalancedObs(X, 10)
+@test_reference "references/RandomObs1.txt" RandomObs(X) by=matrix_compat_isequal
+@test_reference "references/RandomObs2.txt" RandomObs(X, 10) by=matrix_compat_isequal
+@test_reference "references/BalancedObs1.txt" BalancedObs(X) by=matrix_compat_isequal
+@test_reference "references/BalancedObs2.txt" BalancedObs(X, 10) by=matrix_compat_isequal

for TIter in (RandomObs, BalancedObs)
@testset "constructor for $TIter" begin
@@ -130,8 +130,8 @@ end
@test RandomBatches <: LearnBase.BatchIterator
@test RandomBatches <: LearnBase.AbstractDataIterator
@test RandomBatches <: LearnBase.AbstractBatchIterator
-@test_reference "references/RandomBatches1.txt" RandomBatches(X)
-@test_reference "references/RandomBatches2.txt" RandomBatches(X,10,10)
+@test_reference "references/RandomBatches1.txt" RandomBatches(X) by=matrix_compat_isequal
+@test_reference "references/RandomBatches2.txt" RandomBatches(X,10,10) by=matrix_compat_isequal

@testset "constructor" begin
A = @inferred RandomBatches(rand(2,5), 10)
@@ -259,7 +259,7 @@ end
@testset "BufferGetObs" begin
@testset "ObsView" begin
A = BufferGetObs(ObsView(X))
-@test_reference "references/BufferGetObs1.txt" A
+@test_reference "references/BufferGetObs1.txt" A by=matrix_compat_isequal

@test size(A.buffer) == (4,)
@test typeof(A.buffer) <: Array{Float64,1}
4 changes: 2 additions & 2 deletions test/tst_datasubset.jl
@@ -46,8 +46,8 @@
@testset "Array, SubArray, SparseArray" begin
@test nobs(DataSubset(X, obsdim = 1)) == 4
@test nobs(DataSubset(X, 1:3, obsdim = 1)) == 3
-@test_reference "references/DataSubset1.txt" DataSubset(X, Int64(1):Int64(nobs(X)))
-@test_reference "references/DataSubset2.txt" @io2str showcompact(::IO, DataSubset(X))
+@test_reference "references/DataSubset1.txt" DataSubset(X, Int64(1):Int64(nobs(X))) by=matrix_compat_isequal
+@test_reference "references/DataSubset2.txt" @io2str(showcompact(::IO, DataSubset(X))) by=matrix_compat_isequal
for var in (Xs, ys, vars...)
subset = @inferred(DataSubset(var))
@test subset.data === var
16 changes: 8 additions & 8 deletions test/tst_dataview.jl
@@ -1,9 +1,9 @@
@testset "ObsView" begin
@test ObsView <: AbstractVector
-@test ObsView <: DataView
-@test ObsView <: AbstractObsView
-@test ObsView <: AbstractObsIterator
-@test ObsView <: AbstractDataIterator
+@test ObsView <: LearnBase.DataView
+@test ObsView <: LearnBase.AbstractObsView
+@test ObsView <: LearnBase.AbstractObsIterator
+@test ObsView <: LearnBase.AbstractDataIterator
@test obsview === ObsView

@testset "constructor" begin
@@ -169,10 +169,10 @@ end

@testset "BatchView" begin
@test BatchView <: AbstractVector
-@test BatchView <: DataView
-@test BatchView <: AbstractBatchView
-@test BatchView <: AbstractBatchIterator
-@test BatchView <: AbstractDataIterator
+@test BatchView <: LearnBase.DataView
+@test BatchView <: LearnBase.AbstractBatchView
+@test BatchView <: LearnBase.AbstractBatchIterator
+@test BatchView <: LearnBase.AbstractDataIterator
@test batchview == BatchView
@test_throws MethodError oversample(BatchView(X))
@test_throws MethodError undersample(BatchView(X))
6 changes: 3 additions & 3 deletions test/tst_folds.jl
@@ -52,10 +52,10 @@ println("<HEARTBEAT>")

@testset "FoldsView constructor" begin
@test FoldsView <: AbstractVector
-@test FoldsView <: DataView
-@test FoldsView{Tuple} <: DataView{Tuple}
+@test FoldsView <: LearnBase.DataView
+@test FoldsView{Tuple} <: LearnBase.DataView{Tuple}

-@test_reference "references/FoldsView.txt" @io2str show(::IO, MIME"text/plain"(), kfolds(rand(10),k=5))
+@test_reference "references/FoldsView.txt" @io2str(show(::IO, MIME"text/plain"(), kfolds(rand(10),k=5))) by=matrix_compat_isequal

@testset "Illegal arguments" begin
# fold indices out of bounds for the given data
12 changes: 6 additions & 6 deletions test/tst_slidingwindow.jl
@@ -1,9 +1,9 @@
@testset "UnlabeledSlidingWindow" begin
@test_throws UndefVarError UnlabeledSlidingWindow
@test MLDataPattern.UnlabeledSlidingWindow <: AbstractVector
-@test MLDataPattern.UnlabeledSlidingWindow <: DataView
-@test !(MLDataPattern.UnlabeledSlidingWindow <: AbstractObsIterator)
-@test !(MLDataPattern.UnlabeledSlidingWindow <: AbstractBatchIterator)
+@test MLDataPattern.UnlabeledSlidingWindow <: LearnBase.DataView
+@test !(MLDataPattern.UnlabeledSlidingWindow <: LearnBase.AbstractObsIterator)
+@test !(MLDataPattern.UnlabeledSlidingWindow <: LearnBase.AbstractBatchIterator)

@testset "constructor" begin
@test_throws DimensionMismatch slidingwindow((rand(2,10),rand(9)), 1)
@@ -180,9 +180,9 @@ end
@testset "LabeledSlidingWindow" begin
@test_throws UndefVarError LabeledSlidingWindow
@test MLDataPattern.LabeledSlidingWindow <: AbstractVector
-@test MLDataPattern.LabeledSlidingWindow <: DataView
-@test !(MLDataPattern.LabeledSlidingWindow <: AbstractObsIterator)
-@test !(MLDataPattern.LabeledSlidingWindow <: AbstractBatchIterator)
+@test MLDataPattern.LabeledSlidingWindow <: LearnBase.DataView
+@test !(MLDataPattern.LabeledSlidingWindow <: LearnBase.AbstractObsIterator)
+@test !(MLDataPattern.LabeledSlidingWindow <: LearnBase.AbstractBatchIterator)

@testset "constructor" begin
@test_throws DimensionMismatch slidingwindow(i->i, (rand(2,10),rand(9)), 1)