From c9d1d0e0d87914c088ecf897d1b6d4e41e742112 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Thu, 25 Apr 2024 07:44:46 +0000
Subject: [PATCH] build based on 4e25f43

---
 dev/counts/index.html | 8 ++++----
 dev/cov/index.html | 4 ++--
 dev/deviation/index.html | 2 +-
 dev/empirical/index.html | 4 ++--
 dev/index.html | 2 +-
 dev/misc/index.html | 8 ++++----
 dev/multivariate/index.html | 2 +-
 dev/ranking/index.html | 2 +-
 dev/robust/index.html | 4 ++--
 dev/sampling/index.html | 4 ++--
 dev/scalarstats/index.html | 18 +++++++++---------
 dev/search/index.html | 2 +-
 dev/search_index.js | 2 +-
 dev/signalcorr/index.html | 2 +-
 dev/statmodels/index.html | 2 +-
 dev/transformations/index.html | 6 +++---
 dev/weights/index.html | 6 +++---
 17 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/dev/counts/index.html b/dev/counts/index.html
index 5415835c..aee252b1 100644
--- a/dev/counts/index.html
+++ b/dev/counts/index.html
@@ -1,7 +1,7 @@
 Counting Functions · StatsBase.jl

Counting Functions

The package provides functions to count the occurrences of distinct values.

Counting over an Integer Range

StatsBase.countsFunction
counts(x, [wv::AbstractWeights])
 counts(x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])
-counts(x, k::Integer, [wv::AbstractWeights])

Count the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.

If a vector of weights wv is provided, the sum of the weights is used rather than the raw counts.

The output is a vector of length length(levels).

source
StatsBase.proportionsFunction
proportions(x, levels=span(x), [wv::AbstractWeights])

Return the proportion of values in the range levels that occur in x. Equivalent to counts(x, levels) / length(x).

If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.

source
proportions(x, k::Integer, [wv::AbstractWeights])

Return the proportion of integers in 1 to k that occur in x.

If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.

source
StatsBase.addcounts!Method
addcounts!(r, x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])

Add the number of occurrences in x of each value in levels to an existing array r. For each xi ∈ x, if xi == levels[j], then we increment r[j].

If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.

source

Counting over arbitrary distinct values

StatsBase.countmapFunction
countmap(x; alg = :auto)
-countmap(x::AbstractVector, wv::AbstractVector{<:Real})

Return a dictionary mapping each unique value in x to its number of occurrences.

If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.

alg is only allowed for unweighted counting and can be one of:

  • :auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.

  • :radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector which will generally lead to shorter running time for large x with many duplicates. However the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.

  • :dict: use Dict-based method which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.

source
StatsBase.proportionmapFunction
proportionmap(x)
-proportionmap(x::AbstractVector, w::AbstractVector{<:Real})

Return a dictionary mapping each unique value in x to its proportion in x.

If a vector of weights w is provided, the proportion of weights is computed rather than the proportion of raw counts.

source
StatsBase.addcounts!Method
addcounts!(dict, x; alg = :auto)
-addcounts!(dict, x, wv)

Add counts based on x to a count map. New entries will be added if new values come up.

If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.

alg is only allowed for unweighted counting and can be one of:

  • :auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.

  • :radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector which will generally lead to shorter running time for large x with many duplicates. However the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.

  • :dict: use Dict-based method which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.

source
+counts(x, k::Integer, [wv::AbstractWeights])

Count the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.

If a vector of weights wv is provided, the sum of the weights is used rather than the raw counts.

The output is a vector of length length(levels).

source
StatsBase.proportionsFunction
proportions(x, levels=span(x), [wv::AbstractWeights])

Return the proportion of values in the range levels that occur in x. Equivalent to counts(x, levels) / length(x).

If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.

source
proportions(x, k::Integer, [wv::AbstractWeights])

Return the proportion of integers in 1 to k that occur in x.

If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.

source
StatsBase.addcounts!Method
addcounts!(r, x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])

Add the number of occurrences in x of each value in levels to an existing array r. For each xi ∈ x, if xi == levels[j], then we increment r[j].

If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.

source
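
As a rough illustration of the range-based counting functions documented above, the following sketch counts a small integer vector over the levels 1:5. The data in `x` and the weight vector `w` are made up for this example; only `counts`, `proportions`, `addcounts!` and `weights` come from StatsBase.

```julia
using StatsBase

# Hypothetical sample data (not from the documentation).
x = [1, 2, 2, 3, 3, 3, 5]

# Occurrences of each value in 1:5; the result has length(1:5) entries.
c = counts(x, 1:5)

# Values outside `levels` are ignored without an error or a warning.
c_sub = counts(x, 2:3)

# Proportions are the counts divided by length(x).
p = proportions(x, 1:5)

# With a weight vector, the weights of matching observations are summed.
w = fill(0.5, length(x))
cw = counts(x, 1:5, weights(w))

# addcounts! accumulates into an existing array, so calling it twice
# doubles the stored counts.
r = zeros(Int, 5)
addcounts!(r, x, 1:5)
addcounts!(r, x, 1:5)
```

Because `addcounts!` mutates `r` in place, it is a natural fit for counting data that arrives in several batches.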

Counting over arbitrary distinct values

StatsBase.countmapFunction
countmap(x; alg = :auto)
+countmap(x::AbstractVector, wv::AbstractVector{<:Real})

Return a dictionary mapping each unique value in x to its number of occurrences.

If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.

alg is only allowed for unweighted counting and can be one of:

  • :auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.

  • :radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector which will generally lead to shorter running time for large x with many duplicates. However the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.

  • :dict: use Dict-based method which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.

source
StatsBase.proportionmapFunction
proportionmap(x)
+proportionmap(x::AbstractVector, w::AbstractVector{<:Real})

Return a dictionary mapping each unique value in x to its proportion in x.

If a vector of weights w is provided, the proportion of weights is computed rather than the proportion of raw counts.

source
StatsBase.addcounts!Method
addcounts!(dict, x; alg = :auto)
+addcounts!(dict, x, wv)

Add counts based on x to a count map. New entries will be added if new values come up.

If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.

alg is only allowed for unweighted counting and can be one of:

  • :auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.

  • :radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector which will generally lead to shorter running time for large x with many duplicates. However the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.

  • :dict: use Dict-based method which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.

source
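
The dictionary-based functions can be sketched in the same spirit. The string vector `x` and the weights below are invented for illustration; the `alg` values are the ones listed above.

```julia
using StatsBase

# Hypothetical sample data (not from the documentation).
x = ["a", "b", "a", "c", "a"]

# Map each distinct value to its number of occurrences.
cm = countmap(x)

# Map each distinct value to its share of the observations.
pm = proportionmap(x)

# With weights, each entry holds the sum of the weights of its observations.
wcm = countmap(x, [1.0, 2.0, 1.0, 0.5, 1.0])

# addcounts! updates an existing count map and adds keys for new values.
addcounts!(cm, ["c", "d"])

# The algorithm can be chosen explicitly for unweighted counting;
# both choices are expected to give the same dictionary.
ints = [3, 1, 3, 2, 3]
by_dict = countmap(ints; alg = :dict)
by_radix = countmap(ints; alg = :radixsort)
```

Note that `cm` is modified by the `addcounts!` call, so after it the map also reflects the two extra observations.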
diff --git a/dev/cov/index.html b/dev/cov/index.html
index d1f29057..16113e6d 100644
--- a/dev/cov/index.html
+++ b/dev/cov/index.html
@@ -1,5 +1,5 @@
-Scatter Matrix and Covariance · StatsBase.jl

Scatter Matrix and Covariance

This package implements functions for computing the scatter matrix as well as the weighted covariance matrix.

StatsBase.scattermatFunction
scattermat(X, [wv::AbstractWeights]; mean=nothing, dims=1)

Compute the scatter matrix, which is an unnormalized covariance matrix. A weighting vector wv can be specified to weight the estimate.

Arguments

  • mean=nothing: a known mean value. nothing indicates that the mean is unknown, and the function will compute the mean. Specifying mean=0 indicates that the data are centered and hence there's no need to subtract the mean.
  • dims=1: the dimension along which the variables are organized. When dims = 1, the variables are considered columns with observations in rows; when dims = 2, variables are in rows with observations in columns.
source
Statistics.covFunction
cov(X, w::AbstractWeights, vardim=1; mean=nothing, corrected=false)

Compute the weighted covariance matrix. Similar to var and std, the biased covariance matrix (corrected=false) is computed by multiplying scattermat(X, w) by $\frac{1}{\sum{w}}$ to normalize. However, the unbiased covariance matrix (corrected=true) depends on the type of weights used:

  • AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
  • FrequencyWeights: $\frac{1}{\sum{w} - 1}$
  • ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
  • Weights: ArgumentError (bias correction not supported)
source
Statistics.covMethod
cov(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute a variance estimate from the observation vector x using the estimator ce.

source
Statistics.covMethod
cov(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)

Compute the covariance of the vectors x and y using estimator ce.

source
Statistics.covMethod
cov(ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights]; mean=nothing, dims::Int=1)

Compute the covariance matrix of the matrix X along dimension dims using estimator ce. A weighting vector w can be specified. The keyword argument mean can be:

  • nothing (default) in which case the mean is estimated and subtracted from the data X,
  • a precalculated mean in which case it is subtracted from the data X. Assuming size(X) is (N,M), mean can either be:
    • when dims=1, an AbstractMatrix of size (1,M),
    • when dims=2, an AbstractVector of length N or an AbstractMatrix of size (N,1).
source
Statistics.varMethod
var(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute the variance of the vector x using the estimator ce.

source
Statistics.stdMethod
std(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute the standard deviation of the vector x using the estimator ce.

source
Statistics.corFunction
cor(X, w::AbstractWeights, dims=1)

Compute the Pearson correlation matrix of X along the dimension dims with a weighting w.

source
cor(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)

Compute the correlation of the vectors x and y using estimator ce.

source
cor(
+Scatter Matrix and Covariance · StatsBase.jl

Scatter Matrix and Covariance

This package implements functions for computing the scatter matrix as well as the weighted covariance matrix.

StatsBase.scattermatFunction
scattermat(X, [wv::AbstractWeights]; mean=nothing, dims=1)

Compute the scatter matrix, which is an unnormalized covariance matrix. A weighting vector wv can be specified to weight the estimate.

Arguments

  • mean=nothing: a known mean value. nothing indicates that the mean is unknown, and the function will compute the mean. Specifying mean=0 indicates that the data are centered and hence there's no need to subtract the mean.
  • dims=1: the dimension along which the variables are organized. When dims = 1, the variables are considered columns with observations in rows; when dims = 2, variables are in rows with observations in columns.
source
Statistics.covFunction
cov(X, w::AbstractWeights, vardim=1; mean=nothing, corrected=false)

Compute the weighted covariance matrix. Similar to var and std, the biased covariance matrix (corrected=false) is computed by multiplying scattermat(X, w) by $\frac{1}{\sum{w}}$ to normalize. However, the unbiased covariance matrix (corrected=true) depends on the type of weights used:

  • AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
  • FrequencyWeights: $\frac{1}{\sum{w} - 1}$
  • ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
  • Weights: ArgumentError (bias correction not supported)
source
Statistics.covMethod
cov(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute a variance estimate from the observation vector x using the estimator ce.

source
Statistics.covMethod
cov(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)

Compute the covariance of the vectors x and y using estimator ce.

source
Statistics.covMethod
cov(ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights]; mean=nothing, dims::Int=1)

Compute the covariance matrix of the matrix X along dimension dims using estimator ce. A weighting vector w can be specified. The keyword argument mean can be:

  • nothing (default) in which case the mean is estimated and subtracted from the data X,
  • a precalculated mean in which case it is subtracted from the data X. Assuming size(X) is (N,M), mean can either be:
    • when dims=1, an AbstractMatrix of size (1,M),
    • when dims=2, an AbstractVector of length N or an AbstractMatrix of size (N,1).
source
Statistics.varMethod
var(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute the variance of the vector x using the estimator ce.

source
Statistics.stdMethod
std(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute the standard deviation of the vector x using the estimator ce.

source
Statistics.corFunction
cor(X, w::AbstractWeights, dims=1)

Compute the Pearson correlation matrix of X along the dimension dims with a weighting w.

source
cor(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)

Compute the correlation of the vectors x and y using estimator ce.

source
cor(
     ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights];
     mean=nothing, dims::Int=1
-)

Compute the correlation matrix of the matrix X along dimension dims using estimator ce. A weighting vector w can be specified. The keyword argument mean can be:

  • nothing (default) in which case the mean is estimated and subtracted from the data X,
  • a precalculated mean in which case it is subtracted from the data X. Assuming size(X) is (N,M), mean can either be:
    • when dims=1, an AbstractMatrix of size (1,M),
    • when dims=2, an AbstractVector of length N or an AbstractMatrix of size (N,1).
source
StatsBase.mean_and_covFunction
mean_and_cov(x, [wv::AbstractWeights,] vardim=1; corrected=false) -> (mean, cov)

Return the mean and covariance matrix as a tuple. A weighting vector wv can be specified. vardim designates whether the variables are columns in the matrix (1) or rows (2). Finally, bias correction is applied to the covariance calculation if corrected=true. See cov documentation for more details.

source
StatsBase.cov2corFunction
cov2cor(C::AbstractMatrix, [s::AbstractArray])

Compute the correlation matrix from the covariance matrix C and, optionally, a vector of standard deviations s. Use StatsBase.cov2cor! for an in-place version.

source
StatsBase.cor2covFunction
cor2cov(C, s)

Compute the covariance matrix from the correlation matrix C and a vector of standard deviations s. Use StatsBase.cor2cov! for an in-place version.

source
StatsBase.SimpleCovarianceType
SimpleCovariance(;corrected::Bool=false)

Simple covariance estimator. Estimation calls cov(x; corrected=corrected), cov(x, y; corrected=corrected) or cov(X, w, dims; corrected=corrected) where x, y are vectors, X is a matrix and w is a weighting vector.

source
+)

Compute the correlation matrix of the matrix X along dimension dims using estimator ce. A weighting vector w can be specified. The keyword argument mean can be:

  • nothing (default) in which case the mean is estimated and subtracted from the data X,
  • a precalculated mean in which case it is subtracted from the data X. Assuming size(X) is (N,M), mean can either be:
    • when dims=1, an AbstractMatrix of size (1,M),
    • when dims=2, an AbstractVector of length N or an AbstractMatrix of size (N,1).
source
StatsBase.mean_and_covFunction
mean_and_cov(x, [wv::AbstractWeights,] vardim=1; corrected=false) -> (mean, cov)

Return the mean and covariance matrix as a tuple. A weighting vector wv can be specified. vardim designates whether the variables are columns in the matrix (1) or rows (2). Finally, bias correction is applied to the covariance calculation if corrected=true. See cov documentation for more details.

source
StatsBase.cov2corFunction
cov2cor(C::AbstractMatrix, [s::AbstractArray])

Compute the correlation matrix from the covariance matrix C and, optionally, a vector of standard deviations s. Use StatsBase.cov2cor! for an in-place version.

source
StatsBase.cor2covFunction
cor2cov(C, s)

Compute the covariance matrix from the correlation matrix C and a vector of standard deviations s. Use StatsBase.cor2cov! for an in-place version.

source
StatsBase.SimpleCovarianceType
SimpleCovariance(;corrected::Bool=false)

Simple covariance estimator. Estimation calls cov(x; corrected=corrected), cov(x, y; corrected=corrected) or cov(X, w, dims; corrected=corrected) where x, y are vectors, X is a matrix and w is a weighting vector.

source
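
The relationship between the scatter matrix and the weighted covariance described above can be checked numerically. This is a minimal sketch with an invented data matrix `X` and frequency weights `w`; it assumes the normalization factors stated in the cov documentation and passes `corrected` explicitly rather than relying on a default.

```julia
using StatsBase, Statistics, LinearAlgebra

# Hypothetical data: 4 observations (rows) of 2 variables (columns).
X = [1.0 2.0; 2.0 4.5; 3.0 5.5; 4.0 8.0]
w = fweights([1, 2, 1, 1])

# Unnormalized weighted scatter matrix.
S = scattermat(X, w)

# Biased estimate: scatter matrix scaled by 1 / sum(w).
C = cov(X, w; corrected = false)

# Unbiased estimate for FrequencyWeights: scaled by 1 / (sum(w) - 1).
Cu = cov(X, w; corrected = true)

# Convert between covariance and correlation using the standard deviations.
s = sqrt.(diag(C))
R = cov2cor(C, s)
C_back = cor2cov(R, s)
```

Here `C` should equal `S ./ sum(w)`, `Cu` should equal `S ./ (sum(w) - 1)`, and `C_back` should reproduce `C` up to rounding.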
diff --git a/dev/deviation/index.html b/dev/deviation/index.html
index 655deda9..7855cbd3 100644
--- a/dev/deviation/index.html
+++ b/dev/deviation/index.html
@@ -1,2 +1,2 @@
-Computing Deviations · StatsBase.jl

Computing Deviations

This package provides functions to compute various deviations between arrays:

StatsBase.counteqFunction
counteq(a, b)

Count the number of indices at which the elements of the arrays a and b are equal.

source
StatsBase.countneFunction
countne(a, b)

Count the number of indices at which the elements of the arrays a and b are not equal.

source
StatsBase.sqL2distFunction
sqL2dist(a, b)

Compute the squared L2 distance between two arrays: $\sum_{i=1}^n |a_i - b_i|^2$. Efficient equivalent of sum(abs2, a - b).

source
StatsBase.L2distFunction
L2dist(a, b)

Compute the L2 distance between two arrays: $\sqrt{\sum_{i=1}^n |a_i - b_i|^2}$. Efficient equivalent of sqrt(sum(abs2, a - b)).

source
StatsBase.L1distFunction
L1dist(a, b)

Compute the L1 distance between two arrays: $\sum_{i=1}^n |a_i - b_i|$. Efficient equivalent of sum(abs, a - b).

source
StatsBase.LinfdistFunction
Linfdist(a, b)

Compute the L∞ distance, also called the Chebyshev distance, between two arrays: $\max_{i\in1:n} |a_i - b_i|$. Efficient equivalent of maximum(abs, a - b).

source
StatsBase.gkldivFunction
gkldiv(a, b)

Compute the generalized Kullback-Leibler divergence between two arrays: $\sum_{i=1}^n (a_i \log(a_i/b_i) - a_i + b_i)$. Efficient equivalent of sum(a .* log.(a ./ b) .- a .+ b).

source
StatsBase.meanadFunction
meanad(a, b)

Return the mean absolute deviation between two arrays: mean(abs, a - b).

source
StatsBase.maxadFunction
maxad(a, b)

Return the maximum absolute deviation between two arrays: maximum(abs, a - b).

source
StatsBase.msdFunction
msd(a, b)

Return the mean squared deviation between two arrays: mean(abs2, a - b).

source
StatsBase.rmsdFunction
rmsd(a, b; normalize=false)

Return the root mean squared deviation between two optionally normalized arrays. The root mean squared deviation is computed as sqrt(msd(a, b)).

source
StatsBase.psnrFunction
psnr(a, b, maxv)

Compute the peak signal-to-noise ratio between two arrays a and b. maxv is the maximum possible value either array can take. The PSNR is computed as 10 * log10(maxv^2 / msd(a, b)).

source
Note

All these functions are implemented in a reasonably efficient way without creating any temporary arrays in the middle.

+Computing Deviations · StatsBase.jl

Computing Deviations

This package provides functions to compute various deviations between arrays:

StatsBase.counteqFunction
counteq(a, b)

Count the number of indices at which the elements of the arrays a and b are equal.

source
StatsBase.countneFunction
countne(a, b)

Count the number of indices at which the elements of the arrays a and b are not equal.

source
StatsBase.sqL2distFunction
sqL2dist(a, b)

Compute the squared L2 distance between two arrays: $\sum_{i=1}^n |a_i - b_i|^2$. Efficient equivalent of sum(abs2, a - b).

source
StatsBase.L2distFunction
L2dist(a, b)

Compute the L2 distance between two arrays: $\sqrt{\sum_{i=1}^n |a_i - b_i|^2}$. Efficient equivalent of sqrt(sum(abs2, a - b)).

source
StatsBase.L1distFunction
L1dist(a, b)

Compute the L1 distance between two arrays: $\sum_{i=1}^n |a_i - b_i|$. Efficient equivalent of sum(abs, a - b).

source
StatsBase.LinfdistFunction
Linfdist(a, b)

Compute the L∞ distance, also called the Chebyshev distance, between two arrays: $\max_{i\in1:n} |a_i - b_i|$. Efficient equivalent of maximum(abs, a - b).

source
StatsBase.gkldivFunction
gkldiv(a, b)

Compute the generalized Kullback-Leibler divergence between two arrays: $\sum_{i=1}^n (a_i \log(a_i/b_i) - a_i + b_i)$. Efficient equivalent of sum(a .* log.(a ./ b) .- a .+ b).

source
StatsBase.meanadFunction
meanad(a, b)

Return the mean absolute deviation between two arrays: mean(abs, a - b).

source
StatsBase.maxadFunction
maxad(a, b)

Return the maximum absolute deviation between two arrays: maximum(abs, a - b).

source
StatsBase.msdFunction
msd(a, b)

Return the mean squared deviation between two arrays: mean(abs2, a - b).

source
StatsBase.rmsdFunction
rmsd(a, b; normalize=false)

Return the root mean squared deviation between two optionally normalized arrays. The root mean squared deviation is computed as sqrt(msd(a, b)).

source
StatsBase.psnrFunction
psnr(a, b, maxv)

Compute the peak signal-to-noise ratio between two arrays a and b. maxv is the maximum possible value either array can take. The PSNR is computed as 10 * log10(maxv^2 / msd(a, b)).

source
Note

All these functions are implemented in a reasonably efficient way without creating any temporary arrays in the middle.
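
To make the formulas above concrete, the sketch below evaluates several of the deviation functions on two short, invented vectors and compares each with its plain-Julia equivalent. The `*_ref` names are introduced here only for the comparison.

```julia
using StatsBase, Statistics

# Hypothetical inputs; the differences a - b are [0, -0.5, 1, 0].
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.5, 2.0, 4.0]

n_eq = counteq(a, b)          # indices where a and b agree
n_ne = countne(a, b)          # indices where they differ

d_sq   = sqL2dist(a, b)       # sum of squared differences
d_l2   = L2dist(a, b)         # square root of the above
d_l1   = L1dist(a, b)         # sum of absolute differences
d_linf = Linfdist(a, b)       # largest absolute difference

m_ad = meanad(a, b)           # mean absolute deviation
m_sq = msd(a, b)              # mean squared deviation
r_ms = rmsd(a, b)             # root mean squared deviation
p_sn = psnr(a, b, 4.0)        # peak signal-to-noise ratio with maxv = 4.0

# Reference values computed with temporaries, for comparison only.
d_sq_ref   = sum(abs2, a - b)
d_linf_ref = maximum(abs, a - b)
p_sn_ref   = 10 * log10(4.0^2 / mean(abs2, a - b))
```

The reference expressions allocate the temporary array `a - b`, which the StatsBase functions avoid.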

diff --git a/dev/empirical/index.html b/dev/empirical/index.html
index ef0ef0ef..3ff90a1c 100644
--- a/dev/empirical/index.html
+++ b/dev/empirical/index.html
@@ -39,7 +39,7 @@
 closed: left
 isdensity: true
-julia> # observe isdensity = true and weights tells us the number of observation per binsize in each binsource

Histograms can be fitted to data using the fit method.

StatsAPI.fitMethod
fit(Histogram, data[, weight][, edges]; closed=:left[, nbins])

Fit a histogram to data.

Arguments

  • data: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).

  • weight: an optional AbstractWeights (of the same length as the data vectors), denoting the weight each observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.

  • edges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, they are chosen so that approximately nbins bins of equal width are constructed along each dimension.

Note

In most cases, the number of bins will be nbins. However, to ensure that the bins have equal width, more or fewer than nbins bins may be used.

Keyword arguments

  • closed: if :left (the default), the bin intervals are left-closed [a,b); if :right, intervals are right-closed (a,b].

  • nbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers). If omitted, it is computed using Sturges's formula, i.e. ceil(log2(n)) + 1 with n the number of data points.

Examples

# Univariate
+julia> # observe isdensity = true and weights tells us the number of observation per binsize in each bin
source

Histograms can be fitted to data using the fit method.

StatsAPI.fitMethod
fit(Histogram, data[, weight][, edges]; closed=:left[, nbins])

Fit a histogram to data.

Arguments

  • data: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).

  • weight: an optional AbstractWeights (of the same length as the data vectors), denoting the weight each observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.

  • edges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, they are chosen so that approximately nbins bins of equal width are constructed along each dimension.

Note

In most cases, the number of bins will be nbins. However, to ensure that the bins have equal width, more or fewer than nbins bins may be used.

Keyword arguments

  • closed: if :left (the default), the bin intervals are left-closed [a,b); if :right, intervals are right-closed (a,b].

  • nbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers). If omitted, it is computed using Sturges's formula, i.e. ceil(log2(n)) + 1 with n the number of data points.

Examples

# Univariate
 h = fit(Histogram, rand(100))
 h = fit(Histogram, rand(100), 0:0.1:1.0)
 h = fit(Histogram, rand(100), nbins=10)
@@ -49,4 +49,4 @@
 
 # Multivariate
 h = fit(Histogram, (rand(100),rand(100)))
-h = fit(Histogram, (rand(100),rand(100)),nbins=10)
source

Additional methods

Base.merge!Function
merge!(target::Histogram, others::Histogram...)

Update histogram target by merging it with the histograms others. See merge(histogram::Histogram, others::Histogram...) for details.

source
Base.mergeFunction
merge(h::Histogram, others::Histogram...)

Construct a new histogram by merging h with others. All histograms must have the same binning, shape of weights and properties (closed and isdensity). The weights of all histograms are summed up for each bin; the weights of the resulting histogram will have the same type as those of h.

source
LinearAlgebra.normFunction
norm(h::Histogram)

Calculate the norm of histogram h as the absolute value of its integral.

source
LinearAlgebra.normalizeFunction
normalize(h::Histogram{T,N}; mode::Symbol=:pdf) where {T,N}

Normalize the histogram h.

Valid values for mode are:

  • :pdf: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.
  • :density: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1. Will not modify the histogram if it already represents a density (h.isdensity == 1).
  • :probability: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.
  • :none: Leaves histogram unchanged. Useful to simplify code that has to conditionally apply different modes of normalization.

Successive application of both :probability and :density normalization (in any order) is equivalent to :pdf normalization.

source
normalize(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T,N}

Normalize the histogram h and rescale one or more auxiliary weight arrays at the same time (aux_weights may, e.g., contain estimated statistical uncertainties). The values of the auxiliary arrays are scaled by the same factor as the corresponding histogram weight values. Returns a tuple of the normalized histogram and scaled auxiliary weights.

source
LinearAlgebra.normalize!Function
normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}

Normalize the histogram h and optionally scale one or more auxiliary weight arrays appropriately. See description of normalize for details. Returns h.

source
Base.zeroFunction
zero(h::Histogram)

Create a new histogram with the same binning, type and shape of weights and the same properties (closed and isdensity) as h, with all weights set to zero.

source

Empirical Cumulative Distribution Function

StatsBase.ecdfFunction
ecdf(X; weights::AbstractWeights)

Return an empirical cumulative distribution function (ECDF) based on a vector of samples given in X. Optionally providing weights returns a weighted ECDF.

Note: this function returns a callable composite type, which can then be applied to evaluate CDF values on other samples.

extrema, minimum, and maximum are supported for obtaining the range over which the function is inside the interval $(0,1)$; the function is defined for the whole real line.

source
+h = fit(Histogram, (rand(100),rand(100)),nbins=10)source

Additional methods

Base.merge!Function
merge!(target::Histogram, others::Histogram...)

Update histogram target by merging it with the histograms others. See merge(histogram::Histogram, others::Histogram...) for details.

source
Base.mergeFunction
merge(h::Histogram, others::Histogram...)

Construct a new histogram by merging h with others. All histograms must have the same binning, shape of weights and properties (closed and isdensity). The weights of all histograms are summed up for each bin; the weights of the resulting histogram will have the same type as those of h.

source
LinearAlgebra.normFunction
norm(h::Histogram)

Calculate the norm of histogram h as the absolute value of its integral.

source
LinearAlgebra.normalizeFunction
normalize(h::Histogram{T,N}; mode::Symbol=:pdf) where {T,N}

Normalize the histogram h.

Valid values for mode are:

  • :pdf: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.
  • :density: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1. Will not modify the histogram if it already represents a density (h.isdensity == 1).
  • :probability: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.
  • :none: Leaves histogram unchanged. Useful to simplify code that has to conditionally apply different modes of normalization.

Successive application of both :probability and :density normalization (in any order) is equivalent to :pdf normalization.

source
normalize(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T,N}

Normalize the histogram h and rescale one or more auxiliary weight arrays at the same time (aux_weights may, e.g., contain estimated statistical uncertainties). The values of the auxiliary arrays are scaled by the same factor as the corresponding histogram weight values. Returns a tuple of the normalized histogram and scaled auxiliary weights.

source
LinearAlgebra.normalize!Function
normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}

Normalize the histogram h and optionally scale one or more auxiliary weight arrays appropriately. See description of normalize for details. Returns h.

source
Base.zeroFunction
zero(h::Histogram)

Create a new histogram with the same binning, type and shape of weights and the same properties (closed and isdensity) as h, with all weights set to zero.

source

Empirical Cumulative Distribution Function

StatsBase.ecdfFunction
ecdf(X; weights::AbstractWeights)

Return an empirical cumulative distribution function (ECDF) based on a vector of samples given in X. Optionally providing weights returns a weighted ECDF.

Note: this function returns a callable composite type, which can then be applied to evaluate CDF values on other samples.

extrema, minimum, and maximum are supported for obtaining the range over which the function is inside the interval $(0,1)$; the function is defined for the whole real line.

source
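The idea of an ECDF as a callable object can be sketched without the package. The function name ecdf_sketch below is hypothetical, it handles only the unweighted case, and it returns a closure rather than the composite type the package uses.

```julia
# Sketch of an unweighted ECDF: returns a closure that can be evaluated
# at any real number. `ecdf_sketch` is a hypothetical name.
function ecdf_sketch(samples::AbstractVector{<:Real})
    sorted = sort(samples)
    n = length(sorted)
    # searchsortedlast returns how many sorted samples are <= t (0 if none).
    return t -> searchsortedlast(sorted, t) / n
end

F = ecdf_sketch([3.0, 1.0, 2.0, 2.0])
below = F(0.5)     # 0.0: no sample is <= 0.5
middle = F(2.0)    # 0.75: three of the four samples are <= 2.0
above = F(10.0)    # 1.0: every sample is <= 10.0
```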
diff --git a/dev/index.html b/dev/index.html index 4ed6fa46..bc37ebde 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,3 +1,3 @@ Getting Started · StatsBase.jl
+Pkg.add("StatsBase")

To load the package, use the command:

using StatsBase

Available Features

diff --git a/dev/misc/index.html b/dev/misc/index.html index 4a8c45dd..720ab649 100644 --- a/dev/misc/index.html +++ b/dev/misc/index.html @@ -2,12 +2,12 @@ Miscellaneous Functions · StatsBase.jl

Miscellaneous Functions

StatsBase.rleFunction
rle(v) -> (vals, lens)

Return the run-length encoding of a vector as a tuple. The first element of the tuple is a vector of values of the input and the second is the number of consecutive occurrences of each element.

Examples

julia> using StatsBase
 
 julia> rle([1,1,1,2,2,3,3,3,3,2,2,2])
-([1, 2, 3, 2], [3, 2, 4, 3])
source
StatsBase.inverse_rleFunction
inverse_rle(vals, lens)

Reconstruct a vector from its run-length encoding (see rle). vals is a vector of the values and lens is a vector of the corresponding run lengths.

source
StatsBase.levelsmapFunction
levelsmap(a)

Construct a dictionary that maps each of the n unique values in a to a number between 1 and n.

source
StatsBase.indexmapFunction
indexmap(a)

Construct a dictionary that maps each unique value in a to the index of its first occurrence in a.

source
StatsBase.indicatormatFunction
indicatormat(x, k::Integer; sparse=false)

Construct a boolean matrix I of size (k, length(x)) such that I[x[i], i] = true and all other elements are set to false. If sparse is true, the output will be a sparse matrix, otherwise it will be dense (default).

Examples

julia> using StatsBase
+([1, 2, 3, 2], [3, 2, 4, 3])
source
StatsBase.inverse_rleFunction
inverse_rle(vals, lens)

Reconstruct a vector from its run-length encoding (see rle). vals is a vector of the values and lens is a vector of the corresponding run lengths.

source
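To make the encoding concrete, here is a minimal sketch of run-length encoding and its inverse in plain Julia. The names rle_sketch and inverse_rle_sketch are hypothetical and are not the package's implementation.

```julia
# Run-length encode a vector: collapse consecutive equal values into
# (value, run length) pairs stored in two parallel vectors.
function rle_sketch(v::AbstractVector{T}) where {T}
    vals = T[]
    lens = Int[]
    for x in v
        if !isempty(vals) && isequal(vals[end], x)
            lens[end] += 1        # extend the current run
        else
            push!(vals, x)        # start a new run
            push!(lens, 1)
        end
    end
    return vals, lens
end

# Rebuild the original vector by repeating each value `n` times.
function inverse_rle_sketch(vals::AbstractVector{T}, lens) where {T}
    out = T[]
    for (v, n) in zip(vals, lens)
        append!(out, fill(v, n))
    end
    return out
end

v = [1, 1, 1, 2, 2, 3, 3, 3, 3, 2, 2, 2]
vals, lens = rle_sketch(v)
roundtrip = inverse_rle_sketch(vals, lens)
```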
StatsBase.levelsmapFunction
levelsmap(a)

Construct a dictionary that maps each of the n unique values in a to a number between 1 and n.

source
StatsBase.indexmapFunction
indexmap(a)

Construct a dictionary that maps each unique value in a to the index of its first occurrence in a.

source
StatsBase.indicatormatFunction
indicatormat(x, k::Integer; sparse=false)

Construct a boolean matrix I of size (k, length(x)) such that I[x[i], i] = true and all other elements are set to false. If sparse is true, the output will be a sparse matrix, otherwise it will be dense (default).

Examples

julia> using StatsBase
 
 julia> indicatormat([1 2 2], 2)
 2×3 Matrix{Bool}:
  1  0  0
- 0  1  1
source
indicatormat(x, c=sort(unique(x)); sparse=false)

Construct a boolean matrix I of size (length(c), length(x)). Let ci be the index of x[i] in c. Then I[ci, i] = true and all other elements are false.

source
StatsAPI.pairwiseFunction
pairwise(f, x[, y];
+ 0  1  1
source
indicatormat(x, c=sort(unique(x)); sparse=false)

Construct a boolean matrix I of size (length(c), length(x)). Let ci be the index of x[i] in c. Then I[ci, i] = true and all other elements are false.

source
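The construction described above can be sketched in a few lines. The name indicatormat_sketch is hypothetical, the sketch always returns a dense BitMatrix, and it ignores the sparse keyword.

```julia
# Build a boolean matrix with one row per category in `c` and one column
# per element of `x`; entry (ci, i) is true when x[i] is the ci-th category.
function indicatormat_sketch(x, c = sort(unique(x)))
    index_of = Dict(v => i for (i, v) in enumerate(c))
    I = falses(length(c), length(x))
    for (i, xi) in enumerate(x)
        I[index_of[xi], i] = true
    end
    return I
end

M = indicatormat_sketch([1, 2, 2])
S = indicatormat_sketch(["b", "a", "b"])   # categories sorted as ["a", "b"]
```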
StatsAPI.pairwiseFunction
pairwise(f, x[, y];
          symmetric::Bool=false, skipmissing::Symbol=:none)

Return a matrix holding the result of applying f to all possible pairs of entries in iterators x and y. Rows correspond to entries in x and columns to entries in y. If y is omitted then a square matrix crossing x with itself is returned.

As a special case, if f is cor, diagonal cells for which entries from x and y are identical (according to ===) are set to one even in the presence of missing, NaN or Inf entries.

Keyword arguments

  • symmetric::Bool=false: If true, f is only called to compute for the lower triangle of the matrix, and these values are copied to fill the upper triangle. Only allowed when y is omitted. Defaults to true when f is cor or cov.
  • skipmissing::Symbol=:none: If :none (the default), missing values in inputs are passed to f without any modification. Use :pairwise to skip entries with a missing value in either of the two vectors passed to f for a given pair of vectors in x and y. Use :listwise to skip entries with a missing value in any of the vectors in x or y; note that this might drop a large part of entries. Only allowed when entries in x and y are vectors.

Examples

julia> using StatsBase, Statistics
 
 julia> x = [1 3 7
@@ -30,7 +30,7 @@
 3×3 Matrix{Float64}:
   1.0        0.928571  -0.866025
   0.928571   1.0       -1.0
- -0.866025  -1.0        1.0
source
StatsAPI.pairwise!Function
pairwise!(f, dest::AbstractMatrix, x[, y];
           symmetric::Bool=false, skipmissing::Symbol=:none)

Store in matrix dest the result of applying f to all possible pairs of entries in iterators x and y, and return it. Rows correspond to entries in x and columns to entries in y, and dest must therefore be of size length(x) × length(y). If y is omitted then x is crossed with itself.

As a special case, if f is cor, diagonal cells for which entries from x and y are identical (according to ===) are set to one even in the presence of missing, NaN or Inf entries.

Keyword arguments

  • symmetric::Bool=false: If true, f is only called to compute for the lower triangle of the matrix, and these values are copied to fill the upper triangle. Only allowed when y is omitted. Defaults to true when f is cor or cov.
  • skipmissing::Symbol=:none: If :none (the default), missing values in inputs are passed to f without any modification. Use :pairwise to skip entries with a missing value in either of the two vectors passed to f for a given pair of vectors in x and y. Use :listwise to skip entries with a missing value in any of the vectors in x or y; note that this might drop a large part of entries. Only allowed when entries in x and y are vectors.

Examples

julia> using StatsBase, Statistics
 
 julia> dest = zeros(3, 3);
@@ -59,4 +59,4 @@
 3×3 Matrix{Float64}:
   1.0        0.928571  -0.866025
   0.928571   1.0       -1.0
- -0.866025  -1.0        1.0
source
+ -0.866025  -1.0        1.0
source
diff --git a/dev/multivariate/index.html b/dev/multivariate/index.html index bba577c9..9f8085b3 100644 --- a/dev/multivariate/index.html +++ b/dev/multivariate/index.html @@ -1,2 +1,2 @@ -Multivariate Summary Statistics · StatsBase.jl

Multivariate Summary Statistics

This package provides a few methods for summarizing multivariate data.

Partial Correlation

StatsBase.partialcorFunction
partialcor(x, y, Z)

Compute the partial correlation of the vectors x and y given Z, which can be a vector or matrix.

source

Generalizations of Variance

StatsBase.genvarFunction
genvar(X)

Compute the generalized sample variance of X. If X is a vector, one-column matrix, or other iterable, this is equivalent to the sample variance. Otherwise if X is a matrix, this is equivalent to the determinant of the covariance matrix of X.

Note

The generalized sample variance will be 0 if the columns of the matrix of deviations are linearly dependent.

source
StatsBase.totalvarFunction
totalvar(X)

Compute the total sample variance of X. If X is a vector, one-column matrix, or other iterable, this is equivalent to the sample variance. Otherwise if X is a matrix, this is equivalent to the sum of the diagonal elements of the covariance matrix of X.

source
+Multivariate Summary Statistics · StatsBase.jl

Multivariate Summary Statistics

This package provides a few methods for summarizing multivariate data.

Partial Correlation

StatsBase.partialcorFunction
partialcor(x, y, Z)

Compute the partial correlation of the vectors x and y given Z, which can be a vector or matrix.

source

Generalizations of Variance

StatsBase.genvarFunction
genvar(X)

Compute the generalized sample variance of X. If X is a vector, one-column matrix, or other iterable, this is equivalent to the sample variance. Otherwise if X is a matrix, this is equivalent to the determinant of the covariance matrix of X.

Note

The generalized sample variance will be 0 if the columns of the matrix of deviations are linearly dependent.

source
StatsBase.totalvarFunction
totalvar(X)

Compute the total sample variance of X. If X is a vector, one-column matrix, or other iterable, this is equivalent to the sample variance. Otherwise if X is a matrix, this is equivalent to the sum of the diagonal elements of the covariance matrix of X.

source
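For a matrix input, both quantities reduce to standard operations on the sample covariance matrix. The sketch below uses only the Statistics and LinearAlgebra standard libraries, assumes observations are stored in rows, and uses made-up data.

```julia
using Statistics, LinearAlgebra

# Hypothetical data: 4 observations (rows) of 2 variables (columns).
X = [1.0 2.0;
     2.0 1.0;
     4.0 5.0;
     5.0 8.0]

C = cov(X)                 # sample covariance matrix of the columns

genvar_value = det(C)      # generalized variance: determinant of C
totalvar_value = tr(C)     # total variance: sum of the diagonal of C

# With linearly dependent columns the generalized variance collapses to 0.
Y = hcat(X[:, 1], 2 .* X[:, 1])
degenerate = det(cov(Y))
```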
diff --git a/dev/ranking/index.html b/dev/ranking/index.html index a5e3dfc5..24e9cae0 100644 --- a/dev/ranking/index.html +++ b/dev/ranking/index.html @@ -1,2 +1,2 @@ -Rankings and Rank Correlations · StatsBase.jl

Rankings and Rank Correlations

This package implements various strategies for computing ranks and rank correlations.

StatsBase.ordinalrankFunction
ordinalrank(x; lt=isless, by=identity, rev::Bool=false, ...)

Return the ordinal ranking ("1234" ranking) of an array. Supports the same keyword arguments as the sort function. All items in x are given distinct, successive ranks based on their position in the sorted vector. Missing values are assigned rank missing.

source
StatsBase.competerankFunction
competerank(x; lt=isless, by=identity, rev::Bool=false, ...)

Return the standard competition ranking ("1224" ranking) of an array. Supports the same keyword arguments as the sort function. Equal ("tied") items are given the same rank, and the next rank comes after a gap that is equal to the number of tied items - 1. Missing values are assigned rank missing.

source
StatsBase.denserankFunction
denserank(x; lt=isless, by=identity, rev::Bool=false, ...)

Return the dense ranking ("1223" ranking) of an array. Supports the same keyword arguments as the sort function. Equal items receive the same rank, and the next subsequent rank is assigned with no gap. Missing values are assigned rank missing.

source
StatsBase.tiedrankFunction
tiedrank(x; lt=isless, by=identity, rev::Bool=false, ...)

Return the tied ranking, also called fractional or "1 2.5 2.5 4" ranking, of an array. Supports the same keyword arguments as the sort function. Equal ("tied") items receive the mean of the ranks they would have been assigned under the ordinal ranking (see ordinalrank). Missing values are assigned rank missing.

source
StatsBase.corspearmanFunction
corspearman(x, y=x)

Compute Spearman's rank correlation coefficient. If x and y are vectors, the output is a float, otherwise it's a matrix corresponding to the pairwise correlations of the columns of x and y.

source
StatsBase.corkendallFunction
corkendall(x, y=x)

Compute Kendall's rank correlation coefficient, τ. x and y must both be either matrices or vectors.

source
+Rankings and Rank Correlations · StatsBase.jl

Rankings and Rank Correlations

This package implements various strategies for computing ranks and rank correlations.

StatsBase.ordinalrankFunction
ordinalrank(x; lt=isless, by=identity, rev::Bool=false, ...)

Return the ordinal ranking ("1234" ranking) of an array. Supports the same keyword arguments as the sort function. All items in x are given distinct, successive ranks based on their position in the sorted vector. Missing values are assigned rank missing.

source
StatsBase.competerankFunction
competerank(x; lt=isless, by=identity, rev::Bool=false, ...)

Return the standard competition ranking ("1224" ranking) of an array. Supports the same keyword arguments as the sort function. Equal ("tied") items are given the same rank, and the next rank comes after a gap that is equal to the number of tied items - 1. Missing values are assigned rank missing.

source
StatsBase.denserankFunction
denserank(x; lt=isless, by=identity, rev::Bool=false, ...)

Return the dense ranking ("1223" ranking) of an array. Supports the same keyword arguments as the sort function. Equal items receive the same rank, and the next subsequent rank is assigned with no gap. Missing values are assigned rank missing.

source
StatsBase.tiedrankFunction
tiedrank(x; lt=isless, by=identity, rev::Bool=false, ...)

Return the tied ranking, also called fractional or "1 2.5 2.5 4" ranking, of an array. Supports the same keyword arguments as the sort function. Equal ("tied") items receive the mean of the ranks they would have been assigned under the ordinal ranking (see ordinalrank). Missing values are assigned rank missing.

source
StatsBase.corspearmanFunction
corspearman(x, y=x)

Compute Spearman's rank correlation coefficient. If x and y are vectors, the output is a float, otherwise it's a matrix corresponding to the pairwise correlations of the columns of x and y.

source
StatsBase.corkendallFunction
corkendall(x, y=x)

Compute Kendall's rank correlation coefficient, τ. x and y must both be either matrices or vectors.

source
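The four ranking schemes differ only in how ties are handled. The following sketch computes each one directly from its definition in plain Julia; it is quadratic in the input length and is meant for illustration, not as the package's implementation. The input vector is a made-up example without missing values.

```julia
x = [30, 10, 20, 20]

# Ordinal ("1234"): position in the sorted order; sortperm is stable,
# so tied items keep their original relative order.
ordinal = invperm(sortperm(x))

# Competition ("1224"): one plus the number of strictly smaller items.
compete = [1 + count(<(xi), x) for xi in x]

# Dense ("1223"): one plus the number of strictly smaller distinct values.
dense = [1 + count(<(xi), unique(x)) for xi in x]

# Tied ("1 2.5 2.5 4"): mean of the ordinal ranks shared by equal items.
tied = [count(<(xi), x) + (count(==(xi), x) + 1) / 2 for xi in x]
```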
diff --git a/dev/robust/index.html b/dev/robust/index.html index cb182283..bad6f682 100644 --- a/dev/robust/index.html +++ b/dev/robust/index.html @@ -3,10 +3,10 @@ 3-element Array{Int64,1}: 2 4 - 3source
StatsBase.trim!Function
trim!(x::AbstractVector; prop=0.0, count=0)

A variant of trim that modifies x in place.

source
StatsBase.winsorFunction
winsor(x::AbstractVector; prop=0.0, count=0)

Return an iterator of all elements of x that replaces either count or proportion prop of the highest elements with the previous-highest element and an equal number of the lowest elements with the next-lowest element.

The number of replaced elements could be smaller than specified if several elements equal the lower or upper bound.

To compute the Winsorized mean of x use mean(winsor(x)).

Example

julia> collect(winsor([5,2,3,4,1], prop=0.2))
+ 3
source
StatsBase.trim!Function
trim!(x::AbstractVector; prop=0.0, count=0)

A variant of trim that modifies x in place.

source
StatsBase.winsorFunction
winsor(x::AbstractVector; prop=0.0, count=0)

Return an iterator of all elements of x that replaces either count or proportion prop of the highest elements with the previous-highest element and an equal number of the lowest elements with the next-lowest element.

The number of replaced elements could be smaller than specified if several elements equal the lower or upper bound.

To compute the Winsorized mean of x use mean(winsor(x)).

Example

julia> collect(winsor([5,2,3,4,1], prop=0.2))
 5-element Array{Int64,1}:
  4
  2
  3
  4
- 2
source
StatsBase.winsor!Function
winsor!(x::AbstractVector; prop=0.0, count=0)

A variant of winsor that modifies vector x in place.

source
StatsBase.trimvarFunction
trimvar(x; prop=0.0, count=0)

Compute the variance of the trimmed mean of x. This function uses the Winsorized variance, as described in Wilcox (2010).

source
+ 2
source
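Winsorizing amounts to clamping the data between two order statistics. The sketch below assumes a count-based cutoff only and returns a new vector rather than an iterator; winsor_sketch is a hypothetical name.

```julia
# Replace the `count` lowest values with the next-lowest value and the
# `count` highest values with the previous-highest value.
function winsor_sketch(x::AbstractVector; count::Integer = 0)
    n = length(x)
    0 <= 2count < n || throw(ArgumentError("need 0 <= 2count < length(x)"))
    sorted = sort(x)
    lo = sorted[count + 1]      # smallest value that is kept
    hi = sorted[n - count]      # largest value that is kept
    return clamp.(x, lo, hi)
end

w = winsor_sketch([5, 2, 3, 4, 1]; count = 1)
```

With count = 1 this reproduces the values shown in the example above: 5 is pulled down to 4 and 1 is pulled up to 2.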
StatsBase.winsor!Function
winsor!(x::AbstractVector; prop=0.0, count=0)

A variant of winsor that modifies vector x in place.

source
StatsBase.trimvarFunction
trimvar(x; prop=0.0, count=0)

Compute the variance of the trimmed mean of x. This function uses the Winsorized variance, as described in Wilcox (2010).

source
diff --git a/dev/sampling/index.html b/dev/sampling/index.html index 57533182..e64b52b3 100644 --- a/dev/sampling/index.html +++ b/dev/sampling/index.html @@ -1,5 +1,5 @@ -Sampling from Population · StatsBase.jl

Sampling from Population

Sampling API

The package provides functions for sampling from a given population (with or without replacement).

StatsBase.sampleFunction
sample([rng], a, [wv::AbstractWeights])

Select a single random element of a. Sampling probabilities are proportional to the weights given in wv, if provided.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
sample([rng], a, [wv::AbstractWeights], n::Integer; replace=true, ordered=false)

Select a random, optionally weighted sample of size n from an array a using a polyalgorithm. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
sample([rng], a, [wv::AbstractWeights], dims::Dims; replace=true, ordered=false)

Select a random, optionally weighted sample from an array a specifying the dimensions dims of the output array. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
sample([rng], wv::AbstractWeights)

Select a single random integer in 1:length(wv) with probabilities proportional to the weights given in wv.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
StatsBase.sample!Function
sample!([rng], a, [wv::AbstractWeights], x; replace=true, ordered=false)

Draw a random sample of length(x) elements from an array a and store the result in x. A polyalgorithm is used for sampling. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

Output array x must not be the same object as a or wv nor share memory with them, or the result may be incorrect.

source
StatsBase.wsampleFunction
wsample([rng], [a], w)

Select a weighted random sample of size 1 from a with probabilities proportional to the weights given in w. If a is not present, select a random weight from w.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
wsample([rng], [a], w, n::Integer; replace=true, ordered=false)

Select a weighted random sample of size n from a with probabilities proportional to the weights given in w if a is present, otherwise select a random sample of size n of the weights given in w. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
wsample([rng], [a], w, dims::Dims; replace=true, ordered=false)

Select a weighted random sample from a with probabilities proportional to the weights given in w if a is present, otherwise select a random sample of size n of the weights given in w. The dimensions of the output are given by dims.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
StatsBase.wsample!Function
wsample!([rng], a, w, x; replace=true, ordered=false)

Select a weighted sample from an array a and store the result in x. Sampling probabilities are proportional to the weights given in w. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source

Algorithms

Internally, this package implements multiple algorithms, and the sample (and sample!) methods integrate them into a poly-algorithm, which chooses a specific algorithm based on inputs.

Note that the choices made in sample are decided based on extensive benchmarking (see perf/sampling.jl and perf/wsampling.jl). It performs reasonably fast for most cases. That being said, if you know that a certain algorithm is particularly suitable for your context, directly calling an internal algorithm function might be slightly more efficient.

Here is a list of algorithms implemented in the package. The functions below are not exported (one can still import them from StatsBase via using though).

Notations

  • a: source array representing the population
  • x: the destination array
  • wv: the weight vector (of type AbstractWeights), for weighted sampling
  • n: the length of a
  • k: the length of x. For sampling without replacement, k must not exceed n.
  • rng: optional random number generator (defaults to Random.default_rng() on Julia >= 1.3 and Random.GLOBAL_RNG on Julia < 1.3)

All following functions write results to x (pre-allocated) and return x.

Sampling Algorithms (Non-Weighted)

StatsBase.direct_sample!Method
direct_sample!([rng], a::AbstractArray, x::AbstractArray)

Direct sampling: for each j in 1:k, randomly pick i from 1:n, and set x[j] = a[i], with n=length(a) and k=length(x).

This algorithm consumes k random numbers.

source
StatsBase.samplepairFunction
samplepair([rng], n)

Draw a pair of distinct integers between 1 and n without replacement.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
samplepair([rng], a)

Draw a pair of distinct elements from the array a without replacement.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
StatsBase.knuths_sample!Function
knuths_sample!([rng], a, x)

Knuth's Algorithm S for random sampling without replacement.

Reference: D. Knuth. The Art of Computer Programming. Vol 2, 3.4.2, p.142.

This algorithm consumes length(a) random numbers. It requires no additional memory space. Suitable for the case where memory is tight.

source
StatsBase.fisher_yates_sample!Function
fisher_yates_sample!([rng], a::AbstractArray, x::AbstractArray)

Fisher-Yates shuffling (with early termination).

Pseudo-code:

n = length(a)
+Sampling from Population · StatsBase.jl

Sampling from Population

Sampling API

The package provides functions for sampling from a given population (with or without replacement).

StatsBase.sampleFunction
sample([rng], a, [wv::AbstractWeights])

Select a single random element of a. Sampling probabilities are proportional to the weights given in wv, if provided.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
sample([rng], a, [wv::AbstractWeights], n::Integer; replace=true, ordered=false)

Select a random, optionally weighted sample of size n from an array a using a polyalgorithm. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
sample([rng], a, [wv::AbstractWeights], dims::Dims; replace=true, ordered=false)

Select a random, optionally weighted sample from an array a specifying the dimensions dims of the output array. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
sample([rng], wv::AbstractWeights)

Select a single random integer in 1:length(wv) with probabilities proportional to the weights given in wv.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
StatsBase.sample!Function
sample!([rng], a, [wv::AbstractWeights], x; replace=true, ordered=false)

Draw a random sample of length(x) elements from an array a and store the result in x. A polyalgorithm is used for sampling. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

Output array x must not be the same object as a or wv nor share memory with them, or the result may be incorrect.

source
StatsBase.wsampleFunction
wsample([rng], [a], w)

Select a weighted random sample of size 1 from a with probabilities proportional to the weights given in w. If a is not present, select a random weight from w.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
wsample([rng], [a], w, n::Integer; replace=true, ordered=false)

Select a weighted random sample of size n from a with probabilities proportional to the weights given in w if a is present, otherwise select a random sample of size n of the weights given in w. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
wsample([rng], [a], w, dims::Dims; replace=true, ordered=false)

Select a weighted random sample from a with probabilities proportional to the weights given in w if a is present, otherwise select a random sample of size n of the weights given in w. The dimensions of the output are given by dims.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
StatsBase.wsample!Function
wsample!([rng], a, w, x; replace=true, ordered=false)

Select a weighted sample from an array a and store the result in x. Sampling probabilities are proportional to the weights given in w. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source

Algorithms

Internally, this package implements multiple algorithms, and the sample (and sample!) methods integrate them into a poly-algorithm, which chooses a specific algorithm based on inputs.

Note that the choices made in sample are decided based on extensive benchmarking (see perf/sampling.jl and perf/wsampling.jl). It performs reasonably fast for most cases. That being said, if you know that a certain algorithm is particularly suitable for your context, directly calling an internal algorithm function might be slightly more efficient.

Here is a list of algorithms implemented in the package. The functions below are not exported (one can still import them from StatsBase via using though).

Notations

  • a: source array representing the population
  • x: the destination array
  • wv: the weight vector (of type AbstractWeights), for weighted sampling
  • n: the length of a
  • k: the length of x. For sampling without replacement, k must not exceed n.
  • rng: optional random number generator (defaults to Random.default_rng() on Julia >= 1.3 and Random.GLOBAL_RNG on Julia < 1.3)

All following functions write results to x (pre-allocated) and return x.

Sampling Algorithms (Non-Weighted)

StatsBase.direct_sample!Method
direct_sample!([rng], a::AbstractArray, x::AbstractArray)

Direct sampling: for each j in 1:k, randomly pick i from 1:n, and set x[j] = a[i], with n=length(a) and k=length(x).

This algorithm consumes k random numbers.

source
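Direct sampling with replacement is short enough to write out in full. This is a sketch in plain Julia under the notation above; direct_sample_sketch! is a hypothetical name and the rng argument is required rather than optional.

```julia
using Random

# For each slot of `x`, pick a uniformly random element of `a`.
# One random number is consumed per output element.
function direct_sample_sketch!(rng::AbstractRNG, a::AbstractVector, x::AbstractVector)
    for j in eachindex(x)
        x[j] = a[rand(rng, eachindex(a))]
    end
    return x
end

a = [10, 20, 30]
x = direct_sample_sketch!(MersenneTwister(1), a, zeros(Int, 5))
```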
StatsBase.samplepairFunction
samplepair([rng], n)

Draw a pair of distinct integers between 1 and n without replacement.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
samplepair([rng], a)

Draw a pair of distinct elements from the array a without replacement.

Optionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).

source
StatsBase.knuths_sample!Function
knuths_sample!([rng], a, x)

Knuth's Algorithm S for random sampling without replacement.

Reference: D. Knuth. The Art of Computer Programming. Vol 2, 3.4.2, p.142.

This algorithm consumes length(a) random numbers. It requires no additional memory space. Suitable for the case where memory is tight.

source
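As an illustration of selection sampling in the spirit of Algorithm S, the sketch below scans a once and keeps each element with probability (items still needed) / (items not yet examined). The name algorithm_s_sketch! is hypothetical. Unlike the description above, this sketch stops as soon as x is full, so it may consume fewer than length(a) random numbers, and its output keeps the order of a.

```julia
using Random

function algorithm_s_sketch!(rng::AbstractRNG, a::AbstractVector, x::AbstractVector)
    n, k = length(a), length(x)
    k <= n || throw(ArgumentError("cannot draw more than length(a) items without replacement"))
    selected = 0
    for i in 1:n
        selected == k && break
        # Keep a[i] with probability (k - selected) / (n - i + 1).
        if (n - i + 1) * rand(rng) < k - selected
            selected += 1
            x[selected] = a[i]
        end
    end
    return x
end

a = collect(1:10)
x = algorithm_s_sketch!(MersenneTwister(7), a, zeros(Int, 4))
full = algorithm_s_sketch!(MersenneTwister(7), a, zeros(Int, 10))
```

When the number of items still needed equals the number of items left, the acceptance probability is 1, so x is always filled completely.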
StatsBase.fisher_yates_sample!Function
fisher_yates_sample!([rng], a::AbstractArray, x::AbstractArray)

Fisher-Yates shuffling (with early termination).

Pseudo-code:

n = length(a)
 k = length(x)
 
 # Create an array of the indices
@@ -8,4 +8,4 @@
 for i = 1:k
     # swap element `i` with another random element in inds[i:n]
     # set element `i` in `x`
-end

This algorithm consumes k=length(x) random numbers. It uses an integer array of length n=length(a) internally to maintain the shuffled indices. It is considerably faster than Knuth's algorithm, especially when n is greater than k. It is $O(n)$ for initialization, plus $O(k)$ for random shuffling.

source
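The pseudo-code above can be turned into a runnable sketch as follows. The name fisher_yates_sketch! is hypothetical, and the sketch assumes a uses 1-based indexing.

```julia
using Random

function fisher_yates_sketch!(rng::AbstractRNG, a::AbstractVector, x::AbstractVector)
    n, k = length(a), length(x)
    k <= n || throw(ArgumentError("cannot draw more than length(a) items without replacement"))
    inds = collect(1:n)                 # O(n) initialization of the index array
    for i in 1:k                        # stop after k swaps (early termination)
        j = rand(rng, i:n)              # random position in inds[i:n]
        inds[i], inds[j] = inds[j], inds[i]
        x[i] = a[inds[i]]
    end
    return x
end

a = collect(101:110)
x = fisher_yates_sketch!(MersenneTwister(3), a, zeros(Int, 4))
perm = fisher_yates_sketch!(MersenneTwister(3), a, zeros(Int, 10))
```

With k equal to n the loop performs a complete shuffle, so the result is a permutation of a.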
StatsBase.self_avoid_sample!Function
self_avoid_sample!([rng], a::AbstractArray, x::AbstractArray)

Self-avoid sampling: use a set to maintain the indices that have been sampled. Each time a new index is drawn, if it has already been sampled, redraw until an unsampled one is obtained.

This algorithm consumes about (or slightly more than) k=length(x) random numbers, and requires $O(k)$ memory to store the set of sampled indices. Very fast when $n \gg k$, with n=length(a).

However, if k is large and approaches $n$, the rejection rate would increase drastically, resulting in poorer performance.

source
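The rejection loop behind self-avoid sampling can be sketched as below. The name self_avoid_sketch! is hypothetical and the sketch assumes a uses 1-based indexing.

```julia
using Random

function self_avoid_sketch!(rng::AbstractRNG, a::AbstractVector, x::AbstractVector)
    n, k = length(a), length(x)
    k <= n || throw(ArgumentError("cannot draw more than length(a) items without replacement"))
    seen = Set{Int}()                   # O(k) memory: indices already used
    for j in 1:k
        i = rand(rng, 1:n)
        while i in seen                 # reject and redraw on a collision
            i = rand(rng, 1:n)
        end
        push!(seen, i)
        x[j] = a[i]
    end
    return x
end

a = collect(1:1000)
x = self_avoid_sketch!(MersenneTwister(5), a, zeros(Int, 10))
```

The inner while loop is where the cost grows as k approaches n: the fraction of already-used indices rises, and so does the expected number of redraws.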
StatsBase.seqsample_a!Function
seqsample_a!([rng], a::AbstractArray, x::AbstractArray)

Random subsequence sampling using algorithm A described in the following paper (page 714): Jeffrey Scott Vitter. "Faster Methods for Random Sampling". Communications of the ACM, 27 (7), July 1984.

This algorithm consumes $O(n)$ random numbers, with n=length(a). The outputs are ordered.

source
StatsBase.seqsample_c!Function
seqsample_c!([rng], a::AbstractArray, x::AbstractArray)

Random subsequence sampling using algorithm C described in the following paper (page 715): Jeffrey Scott Vitter. "Faster Methods for Random Sampling". Communications of the ACM, 27 (7), July 1984.

This algorithm consumes $O(k^2)$ random numbers, with k=length(x). The outputs are ordered.

source
StatsBase.seqsample_d!Function
seqsample_d!([rng], a::AbstractArray, x::AbstractArray)

Random subsequence sampling using algorithm D described in the following paper (page 716-17): Jeffrey Scott Vitter. "Faster Methods for Random Sampling". Communications of the ACM, 27 (7), July 1984.

This algorithm consumes $O(k)$ random numbers, with k=length(x). The outputs are ordered.

source

Weighted Sampling Algorithms

StatsBase.direct_sample!Method
direct_sample!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Direct sampling.

Draw each sample by scanning the weight vector.

Noting k=length(x) and n=length(a), this algorithm:

  • consumes k random numbers
  • has time complexity $O(n k)$, as scanning the weight vector each time takes $O(n)$
  • requires no additional memory space.
source
StatsBase.alias_sample!Function
alias_sample!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Alias method.

Build an alias table, and sample therefrom.

Reference: Walker, A. J. "An Efficient Method for Generating Discrete Random Variables with General Distributions." ACM Transactions on Mathematical Software 3 (3): 253, 1977.

Noting k=length(x) and n=length(a), this algorithm takes $O(n)$ time for building the alias table, and then $O(1)$ to draw each sample. It consumes $k$ random numbers.

source
StatsBase.naive_wsample_norep!Function
naive_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Naive implementation of weighted sampling without replacement.

It makes a copy of the weight vector at initialization, and sets the weight to zero when the corresponding sample is picked.

Noting k=length(x) and n=length(a), this algorithm consumes $O(k)$ random numbers, and has overall time complexity $O(n k)$.

source
StatsBase.efraimidis_a_wsample_norep!Function
efraimidis_a_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Weighted sampling without replacement using Efraimidis-Spirakis A algorithm.

Reference: Efraimidis, P. S., Spirakis, P. G. "Weighted random sampling with a reservoir." Information Processing Letters, 97 (5), 181-185, 2006. doi:10.1016/j.ipl.2005.11.003.

Noting k=length(x) and n=length(a), this algorithm takes $O(n + k \log k)$ processing time to draw $k$ elements. It consumes $n$ random numbers.

source
StatsBase.efraimidis_ares_wsample_norep!Function
efraimidis_ares_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Implementation of weighted sampling without replacement using Efraimidis-Spirakis A-Res algorithm.

Reference: Efraimidis, P. S., Spirakis, P. G. "Weighted random sampling with a reservoir." Information Processing Letters, 97 (5), 181-185, 2006. doi:10.1016/j.ipl.2005.11.003.

Noting k=length(x) and n=length(a), this algorithm takes $O(k \log(k) \log(n / k))$ processing time to draw $k$ elements. It consumes $n$ random numbers.

source
+end

This algorithm consumes k=length(x) random numbers. It uses an integer array of length n=length(a) internally to maintain the shuffled indices. It is considerably faster than Knuth's algorithm, especially when n is greater than k. It takes $O(n)$ time for initialization, plus $O(k)$ for random shuffling.

source
StatsBase.self_avoid_sample!Function
self_avoid_sample!([rng], a::AbstractArray, x::AbstractArray)

Self-avoid sampling: use a set to record the indices that have already been sampled. Each time a new index is drawn, if it has already been sampled, redraw until an unsampled index is obtained.

This algorithm consumes about (or slightly more than) k=length(x) random numbers, and requires $O(k)$ memory to store the set of sampled indices. Very fast when $n >> k$, with n=length(a).

However, if k is large and approaches $n$, the rejection rate would increase drastically, resulting in poorer performance.
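
The redraw loop described above can be sketched in plain Julia. This is a minimal illustration, not the StatsBase implementation: the name `self_avoid_sketch` is hypothetical, and `a` is assumed to be a 1-based vector.

```julia
using Random

# Hypothetical helper illustrating self-avoiding sampling without replacement.
function self_avoid_sketch(rng::AbstractRNG, a::AbstractVector, k::Integer)
    n = length(a)
    k <= n || throw(ArgumentError("cannot draw more than length(a) samples"))
    seen = Set{Int}()                  # indices already sampled, O(k) memory
    x = Vector{eltype(a)}(undef, k)
    for i in 1:k
        idx = rand(rng, 1:n)
        while idx in seen              # rejection: redraw until the index is new
            idx = rand(rng, 1:n)
        end
        push!(seen, idx)
        x[i] = a[idx]
    end
    return x
end

x = self_avoid_sketch(MersenneTwister(1), collect(1:1000), 10)
```

With k much smaller than n, the inner `while` loop rarely runs; as k approaches n, most draws are rejected, which is the slowdown noted above.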

source
StatsBase.seqsample_a!Function
seqsample_a!([rng], a::AbstractArray, x::AbstractArray)

Random subsequence sampling using algorithm A described in the following paper (page 714): Jeffrey Scott Vitter. "Faster Methods for Random Sampling". Communications of the ACM, 27 (7), July 1984.

This algorithm consumes $O(n)$ random numbers, with n=length(a). The outputs are ordered.

source
StatsBase.seqsample_c!Function
seqsample_c!([rng], a::AbstractArray, x::AbstractArray)

Random subsequence sampling using algorithm C described in the following paper (page 715): Jeffrey Scott Vitter. "Faster Methods for Random Sampling". Communications of the ACM, 27 (7), July 1984.

This algorithm consumes $O(k^2)$ random numbers, with k=length(x). The outputs are ordered.

source
StatsBase.seqsample_d!Function
seqsample_d!([rng], a::AbstractArray, x::AbstractArray)

Random subsequence sampling using algorithm D described in the following paper (pages 716-717): Jeffrey Scott Vitter. "Faster Methods for Random Sampling". Communications of the ACM, 27 (7), July 1984.

This algorithm consumes $O(k)$ random numbers, with k=length(x). The outputs are ordered.

source

Weighted Sampling Algorithms

StatsBase.direct_sample!Method
direct_sample!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Direct sampling.

Draw each sample by scanning the weight vector.

Noting k=length(x) and n=length(a), this algorithm:

  • consumes k random numbers
  • has time complexity $O(n k)$, as scanning the weight vector each time takes $O(n)$
  • requires no additional memory space.
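
The scan over the weight vector can be sketched as follows. This is a minimal illustration under stated assumptions, not the StatsBase implementation: `direct_sample_sketch` is a hypothetical name, and `w` is assumed to be a plain vector of non-negative weights with a positive sum.

```julia
using Random

# Hypothetical helper: weighted sampling with replacement by linear scan.
function direct_sample_sketch(rng::AbstractRNG, a::AbstractVector,
                              w::AbstractVector{<:Real}, k::Integer)
    total = sum(w)
    x = Vector{eltype(a)}(undef, k)
    for j in 1:k
        u = rand(rng) * total          # one random number per sample
        i = 1
        c = w[1]                       # running cumulative weight
        while c <= u && i < length(w)  # O(n) scan per sample
            i += 1
            c += w[i]
        end
        x[j] = a[i]
    end
    return x
end

x = direct_sample_sketch(MersenneTwister(1), ["a", "b", "c"], [0.0, 1.0, 0.0], 5)
```

Because only `"b"` has non-zero weight here, every draw selects it.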
source
StatsBase.alias_sample!Function
alias_sample!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Alias method.

Build an alias table, and sample therefrom.

Reference: Walker, A. J. "An Efficient Method for Generating Discrete Random Variables with General Distributions." ACM Transactions on Mathematical Software 3 (3): 253, 1977.

Noting k=length(x) and n=length(a), this algorithm takes $O(n)$ time for building the alias table, and then $O(1)$ to draw each sample. It consumes $k$ random numbers.
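
A minimal sketch of an alias table may help here. It uses a common small/large worklist construction (often attributed to Vose) and is not the StatsBase implementation. The names `build_alias_sketch` and `alias_draw_sketch` are hypothetical, and this sketch draws two random numbers per sample for simplicity, unlike the single draw described above.

```julia
using Random

# Hypothetical helper: build acceptance probabilities and aliases in O(n).
function build_alias_sketch(w::AbstractVector{<:Real})
    n = length(w)
    p = w .* (n / sum(w))              # scaled so that the average entry is 1
    alias = collect(1:n)
    small = Int[]
    large = Int[]
    for i in 1:n
        push!(p[i] < 1 ? small : large, i)
    end
    while !isempty(small) && !isempty(large)
        s = pop!(small)
        l = pop!(large)
        alias[s] = l                   # the unused part of column s goes to l
        p[l] -= 1 - p[s]
        push!(p[l] < 1 ? small : large, l)
    end
    for i in small                     # leftovers are 1 up to rounding error
        p[i] = 1.0
    end
    for i in large
        p[i] = 1.0
    end
    return p, alias
end

# Hypothetical helper: draw one index in O(1).
function alias_draw_sketch(rng::AbstractRNG, p, alias)
    i = rand(rng, 1:length(p))
    return rand(rng) < p[i] ? i : alias[i]
end

p, alias = build_alias_sketch([0.0, 1.0, 0.0])
rng = MersenneTwister(1)
draws = [alias_draw_sketch(rng, p, alias) for _ in 1:20]
```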

source
StatsBase.naive_wsample_norep!Function
naive_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Naive implementation of weighted sampling without replacement.

It makes a copy of the weight vector at initialization, and sets the weight to zero when the corresponding sample is picked.

Noting k=length(x) and n=length(a), this algorithm consumes $O(k)$ random numbers, and has overall time complexity $O(n k)$.

source
StatsBase.efraimidis_a_wsample_norep!Function
efraimidis_a_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Weighted sampling without replacement using Efraimidis-Spirakis A algorithm.

Reference: Efraimidis, P. S., Spirakis, P. G. "Weighted random sampling with a reservoir." Information Processing Letters, 97 (5), 181-185, 2006. doi:10.1016/j.ipl.2005.11.003.

Noting k=length(x) and n=length(a), this algorithm takes $O(n + k \log k)$ processing time to draw $k$ elements. It consumes $n$ random numbers.
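
The key idea of the Efraimidis-Spirakis scheme is to assign each item the key $u_i^{1/w_i}$ with $u_i$ uniform on $[0, 1)$, and to keep the $k$ items with the largest keys. A minimal sketch, assuming plain vectors and a hypothetical name `es_a_sketch` (this is not the StatsBase implementation):

```julia
using Random

# Hypothetical helper: weighted sampling without replacement via random keys.
function es_a_sketch(rng::AbstractRNG, a::AbstractVector,
                     w::AbstractVector{<:Real}, k::Integer)
    n = length(a)
    keys = [rand(rng)^(1 / w[i]) for i in 1:n]   # n random numbers in total
    top = partialsortperm(keys, 1:k; rev=true)   # indices of the k largest keys
    return a[top]
end

x = es_a_sketch(MersenneTwister(42), ["a", "b", "c", "d"], [0.0, 1.0, 1.0, 0.0], 2)
```

Items with zero weight receive the key 0 and are therefore selected only after every item with positive weight.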

source
StatsBase.efraimidis_ares_wsample_norep!Function
efraimidis_ares_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)

Implementation of weighted sampling without replacement using Efraimidis-Spirakis A-Res algorithm.

Reference: Efraimidis, P. S., Spirakis, P. G. "Weighted random sampling with a reservoir." Information Processing Letters, 97 (5), 181-185, 2006. doi:10.1016/j.ipl.2005.11.003.

Noting k=length(x) and n=length(a), this algorithm takes $O(k \log(k) \log(n / k))$ processing time to draw $k$ elements. It consumes $n$ random numbers.

source
diff --git a/dev/scalarstats/index.html b/dev/scalarstats/index.html index d3e7d748..cb757777 100644 --- a/dev/scalarstats/index.html +++ b/dev/scalarstats/index.html @@ -1,13 +1,13 @@ -Scalar Statistics · StatsBase.jl

Scalar Statistics

The package implements functions for computing various statistics over an array of scalar real numbers.

Weighted sum and mean

Base.sumFunction
sum(v::AbstractArray, w::AbstractWeights{<:Real}; [dims])

Compute the weighted sum of an array v with weights w, optionally over the dimension dims.

source
Base.sum!Function
sum!(R::AbstractArray, A::AbstractArray,
+Scalar Statistics · StatsBase.jl        
         
 

Scalar Statistics

The package implements functions for computing various statistics over an array of scalar real numbers.

Weighted sum and mean

Base.sumFunction
sum(v::AbstractArray, w::AbstractWeights{<:Real}; [dims])

Compute the weighted sum of an array v with weights w, optionally over the dimension dims.

source
Base.sum!Function
sum!(R::AbstractArray, A::AbstractArray,
      w::AbstractWeights{<:Real}, dim::Int;
-     init::Bool=true)

Compute the weighted sum of A with weights w over the dimension dim and store the result in R. If init=false, the sum is added to R rather than starting from zero.

source
StatsBase.wsumFunction
wsum(v, w::AbstractVector, [dim])

Compute the weighted sum of an array v with weights w, optionally over the dimension dim.

source
StatsBase.wsum!Function
wsum!(R::AbstractArray, A::AbstractArray,
+     init::Bool=true)

Compute the weighted sum of A with weights w over the dimension dim and store the result in R. If init=false, the sum is added to R rather than starting from zero.

source
StatsBase.wsumFunction
wsum(v, w::AbstractVector, [dim])

Compute the weighted sum of an array v with weights w, optionally over the dimension dim.

source
StatsBase.wsum!Function
wsum!(R::AbstractArray, A::AbstractArray,
       w::AbstractVector, dim::Int;
-      init::Bool=true)

Compute the weighted sum of A with weights w over the dimension dim and store the result in R. If init=false, the sum is added to R rather than starting from zero.

source
Statistics.meanFunction
mean(A::AbstractArray, w::AbstractWeights[, dims::Int])

Compute the weighted mean of array A with weight vector w (of type AbstractWeights). If dims is provided, compute the weighted mean along dimension dims.

Examples

n = 20
+      init::Bool=true)

Compute the weighted sum of A with weights w over the dimension dim and store the result in R. If init=false, the sum is added to R rather than starting from zero.

source
Statistics.meanFunction
mean(A::AbstractArray, w::AbstractWeights[, dims::Int])

Compute the weighted mean of array A with weight vector w (of type AbstractWeights). If dims is provided, compute the weighted mean along dimension dims.

Examples

n = 20
 x = rand(n)
 w = rand(n)
-mean(x, weights(w))
source
Statistics.mean!Function
mean!(R::AbstractArray, A::AbstractArray, w::AbstractWeights[; dims=nothing])

Compute the weighted mean of array A with weight vector w (of type AbstractWeights) along dimension dims, and write results to R.

source

Means

The package provides functions to compute means of different kinds.

StatsBase.genmeanFunction
genmean(a, p)

Return the generalized/power mean with exponent p of a real-valued array, i.e. $\left( \frac{1}{n} \sum_{i=1}^n a_i^p \right)^{\frac{1}{p}}$, where n = length(a). It is taken to be the geometric mean when p == 0.

source

Moments and cumulants

Statistics.varFunction
var(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)

Compute the variance of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample variance is defined as:

\[\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }\]

where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population variance is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:

  • AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
  • FrequencyWeights: $\frac{1}{\sum{w} - 1}$
  • ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
  • Weights: ArgumentError (bias correction not supported)
source
var(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute the variance of the vector x using the estimator ce.

source
Statistics.stdFunction
std(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)

Compute the standard deviation of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample standard deviation is defined as:

\[\sqrt{\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }}\]

where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population standard deviation is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:

  • AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
  • FrequencyWeights: $\frac{1}{\sum{w} - 1}$
  • ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
  • Weights: ArgumentError (bias correction not supported)
source
std(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute the standard deviation of the vector x using the estimator ce.

source
StatsBase.mean_and_varFunction
mean_and_var(x, [w::AbstractWeights], [dim]; corrected=true) -> (mean, var)

Return the mean and variance of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the variance calculation if corrected=true. See var documentation for more details.

source
StatsBase.mean_and_stdFunction
mean_and_std(x, [w::AbstractWeights], [dim]; corrected=true) -> (mean, std)

Return the mean and standard deviation of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the standard deviation calculation if corrected=true. See std documentation for more details.

source
StatsBase.skewnessFunction
skewness(v, [wv::AbstractWeights], m=mean(v))

Compute the standardized skewness of a real-valued array v, optionally specifying a weighting vector wv and a center m.

source
StatsBase.kurtosisFunction
kurtosis(v, [wv::AbstractWeights], m=mean(v))

Compute the excess kurtosis of a real-valued array v, optionally specifying a weighting vector wv and a center m.

source
StatsBase.momentFunction
moment(v, k, [wv::AbstractWeights], m=mean(v))

Return the kth order central moment of a real-valued array v, optionally specifying a weighting vector wv and a center m.

source
StatsBase.cumulantFunction
cumulant(v, k, [wv::AbstractWeights], m=mean(v))

Return the kth order cumulant of a real-valued array v, optionally specifying a weighting vector wv and a pre-computed mean m.

If k is a range of Integers, then return all the cumulants of orders in this range as a vector.

This quantity is calculated using a recursive definition on lower-order cumulants and central moments.

Reference: Smith, P. J. 1995. A Recursive Formulation of the Old Problem of Obtaining Moments from Cumulants and Vice Versa. The American Statistician, 49(2), 217–218. https://doi.org/10.2307/2684642

source

Measurements of Variation

StatsBase.spanFunction
span(x)

Return the span of a collection, i.e. the range minimum(x):maximum(x). The minimum and maximum of x are computed in one pass using extrema.

source
StatsBase.variationFunction
variation(x, m=mean(x); corrected=true)

Return the coefficient of variation of collection x, optionally specifying a precomputed mean m, and the optional correction parameter corrected. The coefficient of variation is the ratio of the standard deviation to the mean. If corrected is false, then std is calculated with denominator n. Else, the std is calculated with denominator n-1.

source
StatsBase.semFunction
sem(x; mean=nothing)
-sem(x::AbstractArray[, weights::AbstractWeights]; mean=nothing)

Return the standard error of the mean for a collection x. A pre-computed mean may be provided.

When not using weights, this is the (sample) standard deviation divided by the sample size. If weights are used, the variance of the sample mean is calculated as follows:

  • AnalyticWeights: Not implemented.
  • FrequencyWeights: $\frac{\sum_{i=1}^n w_i (x_i - \bar{x_i})^2}{(\sum w_i) (\sum w_i - 1)}$
  • ProbabilityWeights: $\frac{n}{n-1} \frac{\sum_{i=1}^n w_i^2 (x_i - \bar{x_i})^2}{\left( \sum w_i \right)^2}$

The standard error is then the square root of the above quantities.

References

Carl-Erik Särndal, Bengt Swensson, Jan Wretman (1992). Model Assisted Survey Sampling. New York: Springer. pp. 51-53.

source
StatsBase.madFunction
mad(x; center=median(x), normalize=true)

Compute the median absolute deviation (MAD) of collection x around center (by default, around the median).

If normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.

source
StatsBase.mad!Function
StatsBase.mad!(x; center=median!(x), normalize=true)

Compute the median absolute deviation (MAD) of array x around center (by default, around the median), overwriting x in the process.

If normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.

source

Z-scores

StatsBase.zscoreFunction
zscore(X, [μ, σ])

Compute the z-scores of X, optionally specifying a precomputed mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.

μ and σ should be both scalars or both arrays. The computation is broadcasting. In particular, when μ and σ are arrays, they should have the same size, and size(μ, i) == 1 || size(μ, i) == size(X, i) for each dimension.

source
StatsBase.zscore!Function
zscore!([Z], X, μ, σ)

Compute the z-scores of an array X with mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.

If a destination array Z is provided, the scores are stored in Z and it must have the same shape as X. Otherwise X is overwritten.

source
StatsBase.entropyFunction
entropy(p, [b])

Compute the entropy of a collection of probabilities p, optionally specifying a real number b such that the entropy is scaled by 1/log(b). Elements with probability 0 or 1 add 0 to the entropy.

source
StatsBase.crossentropyFunction
crossentropy(p, q, [b])

Compute the cross entropy between p and q, optionally specifying a real number b such that the result is scaled by 1/log(b).

source
StatsBase.kldivergenceFunction
kldivergence(p, q, [b])

Compute the Kullback-Leibler divergence from q to p, also called the relative entropy of p with respect to q, that is the sum pᵢ * log(pᵢ / qᵢ). Optionally a real number b can be specified such that the divergence is scaled by 1/log(b).

source
StatsBase.iqrFunction
iqr(x)

Compute the interquartile range (IQR) of collection x, i.e. the 75th percentile minus the 25th percentile.

source
StatsBase.nquantileFunction
nquantile(x, n::Integer)

Return the n-quantiles of collection x, i.e. the values which partition x into n subsets of nearly equal size.

Equivalent to quantile(x, (0:n)/n). For example, nquantile(x, 5) returns a vector of quantiles, respectively at [0.0, 0.2, 0.4, 0.6, 0.8, 1.0].

source
Statistics.quantileFunction
quantile(v, w::AbstractWeights, p)

Compute the weighted quantiles of a vector v at a specified set of probability values p, using weights given by a weight vector w (of type AbstractWeights). Weights must not be negative. The weights and data vectors must have the same length. NaN is returned if v contains any NaN values. An error is raised if w contains any NaN values.

With FrequencyWeights, the function returns the same result as quantile for a vector with repeated values. Weights must be integers.

With weights other than FrequencyWeights, let $N$ denote the length of the vector, $w$ the vector of weights, $h = p (\sum_{i<= N} w_i - w_1) + w_1$ the cumulative weight corresponding to the probability $p$, and $S_k = \sum_{i<=k} w_i$ the cumulative weight for each observation. Define $v_{k+1}$ as the smallest element of v such that $S_{k+1}$ is strictly greater than $h$. The weighted $p$ quantile is given by $v_k + \gamma (v_{k+1} - v_k)$ with $\gamma = (h - S_k)/(S_{k+1} - S_k)$. In particular, when all weights are equal, the function returns the same result as the unweighted quantile.

source
Statistics.medianMethod
median(v::AbstractVector{<:Real}, w::AbstractWeights)

Compute the weighted median of v with weights w (of type AbstractWeights). See the documentation for quantile for more details.

source
StatsBase.quantilerankFunction
quantilerank(itr, value; method=:inc)

Compute the quantile position in the [0, 1] interval of value relative to collection itr.

Different definitions can be chosen via the method keyword argument. Let count_less be the number of elements of itr that are less than value, count_equal the number of elements of itr that are equal to value, n the length of itr, greatest_smaller the highest value below value and smallest_greater the lowest value above value. Then method supports the following definitions:

  • :inc (default): Return a value in the range 0 to 1 inclusive.

Return count_less / (n - 1) if value ∈ itr, otherwise apply interpolation based on definition 7 of quantile in Hyndman and Fan (1996) (equivalent to Excel PERCENTRANK and PERCENTRANK.INC). This definition corresponds to the lower semi-continuous inverse of quantile with its default parameters.

  • :exc: Return a value in the range 0 to 1 exclusive.

Return (count_less + 1) / (n + 1) if value ∈ itr otherwise apply interpolation based on definition 6 of quantile in Hyndman and Fan (1996) (equivalent to Excel PERCENTRANK.EXC).

  • :compete: Return count_less / (n - 1) if value ∈ itr, otherwise

return (count_less - 1) / (n - 1), without interpolation (equivalent to MariaDB PERCENT_RANK, dplyr percent_rank).

  • :tied: Return (count_less + count_equal/2) / n, without interpolation.

Based on the definition in Roscoe, J. T. (1975) (equivalent to "mean" kind of SciPy percentileofscore).

  • :strict: Return count_less / n, without interpolation

(equivalent to "strict" kind of SciPy percentileofscore).

  • :weak: Return (count_less + count_equal) / n, without interpolation

(equivalent to "weak" kind of SciPy percentileofscore).

Note

An ArgumentError is thrown if itr contains NaN or missing values or if itr contains fewer than two elements.

References

Roscoe, J. T. (1975). Fundamental Research Statistics for the Behavioral Sciences", 2nd ed., New York : Holt, Rinehart and Winston.

Hyndman, R.J and Fan, Y. (1996) "Sample Quantiles in Statistical Packages", The American Statistician, Vol. 50, No. 4, pp. 361-365.

Examples

julia> using StatsBase
+mean(x, weights(w))
source
Statistics.mean!Function
mean!(R::AbstractArray, A::AbstractArray, w::AbstractWeights[; dims=nothing])

Compute the weighted mean of array A with weight vector w (of type AbstractWeights) along dimension dims, and write results to R.

source

Means

The package provides functions to compute means of different kinds.

StatsBase.genmeanFunction
genmean(a, p)

Return the generalized/power mean with exponent p of a real-valued array, i.e. $\left( \frac{1}{n} \sum_{i=1}^n a_i^p \right)^{\frac{1}{p}}$, where n = length(a). It is taken to be the geometric mean when p == 0.
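
As a brief illustration of the exponent, and assuming StatsBase is installed, the power mean reduces to the arithmetic, geometric, and harmonic means for p equal to 1, 0, and -1 respectively. The sample values below are arbitrary.

```julia
using Statistics, StatsBase

a = [1.0, 2.0, 4.0]

arith = genmean(a, 1)    # arithmetic mean: (1 + 2 + 4) / 3
geo   = genmean(a, 0)    # geometric mean: cbrt(1 * 2 * 4)
harm  = genmean(a, -1)   # harmonic mean: 3 / (1/1 + 1/2 + 1/4)
```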

source

Moments and cumulants

Statistics.varFunction
var(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)

Compute the variance of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample variance is defined as:

\[\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }\]

where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population variance is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:

  • AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
  • FrequencyWeights: $\frac{1}{\sum{w} - 1}$
  • ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
  • Weights: ArgumentError (bias correction not supported)
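
To make the FrequencyWeights correction concrete, the following sketch (assuming StatsBase is installed; the data are arbitrary) checks that the corrected weighted variance matches the ordinary sample variance of the data with each value repeated according to its weight.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 4.0]
w = [2, 1, 3]                                  # integer frequencies

# The same data with each value repeated w[i] times.
expanded = [1.0, 1.0, 2.0, 4.0, 4.0, 4.0]

v_weighted = var(x, fweights(w); corrected=true)   # factor 1 / (sum(w) - 1)
v_expanded = var(expanded)                         # usual n - 1 denominator
```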
source
var(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute the variance of the vector x using the estimator ce.

source
Statistics.stdFunction
std(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)

Compute the standard deviation of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample standard deviation is defined as:

\[\sqrt{\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }}\]

where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population standard deviation is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:

  • AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
  • FrequencyWeights: $\frac{1}{\sum{w} - 1}$
  • ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
  • Weights: ArgumentError (bias correction not supported)
source
std(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)

Compute the standard deviation of the vector x using the estimator ce.

source
StatsBase.mean_and_varFunction
mean_and_var(x, [w::AbstractWeights], [dim]; corrected=true) -> (mean, var)

Return the mean and variance of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the variance calculation if corrected=true. See var documentation for more details.

source
StatsBase.mean_and_stdFunction
mean_and_std(x, [w::AbstractWeights], [dim]; corrected=true) -> (mean, std)

Return the mean and standard deviation of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the standard deviation calculation if corrected=true. See std documentation for more details.

source
StatsBase.skewnessFunction
skewness(v, [wv::AbstractWeights], m=mean(v))

Compute the standardized skewness of a real-valued array v, optionally specifying a weighting vector wv and a center m.

source
StatsBase.kurtosisFunction
kurtosis(v, [wv::AbstractWeights], m=mean(v))

Compute the excess kurtosis of a real-valued array v, optionally specifying a weighting vector wv and a center m.

source
StatsBase.momentFunction
moment(v, k, [wv::AbstractWeights], m=mean(v))

Return the kth order central moment of a real-valued array v, optionally specifying a weighting vector wv and a center m.

source
StatsBase.cumulantFunction
cumulant(v, k, [wv::AbstractWeights], m=mean(v))

Return the kth order cumulant of a real-valued array v, optionally specifying a weighting vector wv and a pre-computed mean m.

If k is a range of Integers, then return all the cumulants of orders in this range as a vector.

This quantity is calculated using a recursive definition on lower-order cumulants and central moments.

Reference: Smith, P. J. 1995. A Recursive Formulation of the Old Problem of Obtaining Moments from Cumulants and Vice Versa. The American Statistician, 49(2), 217–218. https://doi.org/10.2307/2684642

source

Measurements of Variation

StatsBase.spanFunction
span(x)

Return the span of a collection, i.e. the range minimum(x):maximum(x). The minimum and maximum of x are computed in one pass using extrema.

source
StatsBase.variationFunction
variation(x, m=mean(x); corrected=true)

Return the coefficient of variation of collection x, optionally specifying a precomputed mean m, and the optional correction parameter corrected. The coefficient of variation is the ratio of the standard deviation to the mean. If corrected is false, then std is calculated with denominator n. Else, the std is calculated with denominator n-1.

source
StatsBase.semFunction
sem(x; mean=nothing)
+sem(x::AbstractArray[, weights::AbstractWeights]; mean=nothing)

Return the standard error of the mean for a collection x. A pre-computed mean may be provided.

When not using weights, this is the (sample) standard deviation divided by the square root of the sample size. If weights are used, the variance of the sample mean is calculated as follows:

  • AnalyticWeights: Not implemented.
  • FrequencyWeights: $\frac{\sum_{i=1}^n w_i (x_i - \bar{x_i})^2}{(\sum w_i) (\sum w_i - 1)}$
  • ProbabilityWeights: $\frac{n}{n-1} \frac{\sum_{i=1}^n w_i^2 (x_i - \bar{x_i})^2}{\left( \sum w_i \right)^2}$

The standard error is then the square root of the above quantities.
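
A short check of these definitions, assuming StatsBase is installed and using arbitrary sample data: without weights the result equals the sample standard deviation divided by the square root of the sample size, and with FrequencyWeights it matches the unweighted result on the expanded data.

```julia
using Statistics, StatsBase

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = sem(x)
se_manual = std(x) / sqrt(length(x))

# Frequency-weighted version against the explicitly expanded sample.
y = [1.0, 2.0, 4.0]
w = fweights([2, 1, 3])
expanded = [1.0, 1.0, 2.0, 4.0, 4.0, 4.0]
se_weighted = sem(y, w)
se_expanded = sem(expanded)
```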

References

Carl-Erik Särndal, Bengt Swensson, Jan Wretman (1992). Model Assisted Survey Sampling. New York: Springer. pp. 51-53.

source
StatsBase.madFunction
mad(x; center=median(x), normalize=true)

Compute the median absolute deviation (MAD) of collection x around center (by default, around the median).

If normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.
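
For illustration, assuming StatsBase is installed, the following compares the raw and normalized MAD on a small sample with one outlier. The data are arbitrary.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0, 4.0, 100.0]

# The median is 3, so the absolute deviations are [2, 1, 0, 1, 97].
raw    = mad(x; normalize=false)   # median of the absolute deviations
scaled = mad(x; normalize=true)    # raw value times roughly 1.4826
manual = median(abs.(x .- median(x)))
```

The outlier at 100 barely affects the result, which is why the MAD is often preferred to the standard deviation as a robust measure of scale.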

source
StatsBase.mad!Function
StatsBase.mad!(x; center=median!(x), normalize=true)

Compute the median absolute deviation (MAD) of array x around center (by default, around the median), overwriting x in the process.

If normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.

source

Z-scores

StatsBase.zscoreFunction
zscore(X, [μ, σ])

Compute the z-scores of X, optionally specifying a precomputed mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.

μ and σ should be both scalars or both arrays. The computation is broadcasting. In particular, when μ and σ are arrays, they should have the same size, and size(μ, i) == 1 || size(μ, i) == size(X, i) for each dimension.
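
A small example, assuming StatsBase is installed and using arbitrary data, showing that the default call agrees with the explicit formula based on the sample mean and the corrected sample standard deviation:

```julia
using Statistics, StatsBase

x = [2.0, 4.0, 6.0]                 # mean 4, sample standard deviation 2

z_default  = zscore(x)              # μ and σ estimated from x
z_explicit = zscore(x, 4.0, 2.0)    # precomputed scalar μ and σ
z_manual   = (x .- mean(x)) ./ std(x)
```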

source
StatsBase.zscore!Function
zscore!([Z], X, μ, σ)

Compute the z-scores of an array X with mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.

If a destination array Z is provided, the scores are stored in Z and it must have the same shape as X. Otherwise X is overwritten.

source
StatsBase.entropyFunction
entropy(p, [b])

Compute the entropy of a collection of probabilities p, optionally specifying a real number b such that the entropy is scaled by 1/log(b). Elements with probability 0 or 1 add 0 to the entropy.

source
StatsBase.crossentropyFunction
crossentropy(p, q, [b])

Compute the cross entropy between p and q, optionally specifying a real number b such that the result is scaled by 1/log(b).

source
StatsBase.kldivergenceFunction
kldivergence(p, q, [b])

Compute the Kullback-Leibler divergence from q to p, also called the relative entropy of p with respect to q, that is the sum pᵢ * log(pᵢ / qᵢ). Optionally a real number b can be specified such that the divergence is scaled by 1/log(b).
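
The three quantities are linked by the identity crossentropy(p, q) = entropy(p) + kldivergence(p, q). A brief check, assuming StatsBase is installed and using arbitrary distributions:

```julia
using StatsBase

p = [0.5, 0.5]
q = [0.25, 0.75]

h  = entropy(p)            # in nats: log(2)
h2 = entropy(p, 2)         # scaled by 1/log(2), i.e. in bits
kl = kldivergence(p, q)    # sum of p[i] * log(p[i] / q[i])
ce = crossentropy(p, q)    # -sum of p[i] * log(q[i])
```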

source
StatsBase.iqrFunction
iqr(x)

Compute the interquartile range (IQR) of collection x, i.e. the 75th percentile minus the 25th percentile.

source
StatsBase.nquantileFunction
nquantile(x, n::Integer)

Return the n-quantiles of collection x, i.e. the values which partition x into n subsets of nearly equal size.

Equivalent to quantile(x, (0:n)/n). For example, nquantile(x, 5) returns a vector of quantiles, respectively at [0.0, 0.2, 0.4, 0.6, 0.8, 1.0].
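
As a quick illustration, assuming StatsBase is installed, the quartiles of the values 1 through 9 can be obtained either way. The data are arbitrary.

```julia
using Statistics, StatsBase

x = collect(1.0:9.0)

quartiles = nquantile(x, 4)             # probabilities 0, 0.25, 0.5, 0.75, 1
direct    = quantile(x, (0:4) / 4)      # same probabilities passed explicitly
```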

source
Statistics.quantileFunction
quantile(v, w::AbstractWeights, p)

Compute the weighted quantiles of a vector v at a specified set of probability values p, using weights given by a weight vector w (of type AbstractWeights). Weights must not be negative. The weights and data vectors must have the same length. NaN is returned if v contains any NaN values. An error is raised if w contains any NaN values.

With FrequencyWeights, the function returns the same result as quantile for a vector with repeated values. Weights must be integers.

With weights other than FrequencyWeights, let $N$ denote the length of the vector, $w$ the vector of weights, $h = p (\sum_{i<= N} w_i - w_1) + w_1$ the cumulative weight corresponding to the probability $p$, and $S_k = \sum_{i<=k} w_i$ the cumulative weight for each observation. Define $v_{k+1}$ as the smallest element of v such that $S_{k+1}$ is strictly greater than $h$. The weighted $p$ quantile is given by $v_k + \gamma (v_{k+1} - v_k)$ with $\gamma = (h - S_k)/(S_{k+1} - S_k)$. In particular, when all weights are equal, the function returns the same result as the unweighted quantile.

source
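The interpolation formula above can be written out directly. The helper `weighted_quantile_sketch` below is hypothetical (it is not part of StatsBase) and assumes that `v` is already sorted, all weights are strictly positive, and `0 <= p < 1`. It is a sketch of the stated definition, not the library's implementation.

```julia
using StatsBase, Statistics

# Hypothetical helper: the documented formula for weights other than FrequencyWeights.
# Assumes v is sorted, all weights are positive, and 0 <= p < 1.
function weighted_quantile_sketch(v, w, p)
    S = cumsum(w)                      # S[k] = w[1] + ... + w[k]
    h = p * (S[end] - w[1]) + w[1]     # cumulative weight corresponding to p
    k1 = findfirst(>(h), S)            # index k+1: first S strictly greater than h
    k = k1 - 1
    γ = (h - S[k]) / (S[k1] - S[k])
    return v[k] + γ * (v[k1] - v[k])
end

v = [1.0, 2.0, 4.0, 8.0]
w = [1.0, 2.0, 1.0, 4.0]

# Here h = 0.5 * (8 - 1) + 1 = 4.5, S = [1, 3, 4, 8], so k = 3 and γ = 0.125.
by_hand = weighted_quantile_sketch(v, w, 0.5)
by_package = quantile(v, weights(w), 0.5)

# Equal weights reproduce the unweighted quantile.
p = [0.25, 0.5, 0.9]
equal_w = quantile(v, weights(ones(4)), p)

# Frequency weights behave like repeating each value the given number of times.
counts_per_value = [1, 2, 1, 3]
freq_w = quantile(v, fweights(counts_per_value), p)
repeated = quantile(inverse_rle(v, counts_per_value), p)
```

The worked case gives 4 + 0.125 · (8 − 4) = 4.5 for the weighted median of `v` under `w`.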
Statistics.medianMethod
median(v::AbstractVector{<:Real}, w::AbstractWeights)

Compute the weighted median of v with weights w (of type AbstractWeights). See the documentation for quantile for more details.

source
StatsBase.quantilerankFunction
quantilerank(itr, value; method=:inc)

Compute the quantile position in the [0, 1] interval of value relative to collection itr.

Different definitions can be chosen via the method keyword argument. Let count_less be the number of elements of itr that are less than value, count_equal the number of elements of itr that are equal to value, n the length of itr, greatest_smaller the highest value below value and smallest_greater the lowest value above value. Then method supports the following definitions:

  • :inc (default): Return a value in the range 0 to 1 inclusive.

Return count_less / (n - 1) if value ∈ itr, otherwise apply interpolation based on definition 7 of quantile in Hyndman and Fan (1996) (equivalent to Excel PERCENTRANK and PERCENTRANK.INC). This definition corresponds to the lower semi-continuous inverse of quantile with its default parameters.

  • :exc: Return a value in the range 0 to 1 exclusive.

Return (count_less + 1) / (n + 1) if value ∈ itr otherwise apply interpolation based on definition 6 of quantile in Hyndman and Fan (1996) (equivalent to Excel PERCENTRANK.EXC).

  • :compete: Return count_less / (n - 1) if value ∈ itr, otherwise

return (count_less - 1) / (n - 1), without interpolation (equivalent to MariaDB PERCENT_RANK, dplyr percent_rank).

  • :tied: Return (count_less + count_equal/2) / n, without interpolation.

Based on the definition in Roscoe, J. T. (1975) (equivalent to "mean" kind of SciPy percentileofscore).

  • :strict: Return count_less / n, without interpolation

(equivalent to "strict" kind of SciPy percentileofscore).

  • :weak: Return (count_less + count_equal) / n, without interpolation

(equivalent to "weak" kind of SciPy percentileofscore).

Note

An ArgumentError is thrown if itr contains NaN or missing values or if itr contains fewer than two elements.
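For a value that is present in the collection, the definitions listed above reduce to simple counts. The sketch below writes them out by hand for illustrative data and compares them with `quantilerank`; the variable names mirror the docstring and are otherwise arbitrary.

```julia
using StatsBase

itr = [1, 1, 1, 2, 3, 4, 8, 11, 12, 13]
value = 2    # value ∈ itr, so no interpolation is involved

n = length(itr)
count_less = count(<(value), itr)      # 3
count_equal = count(==(value), itr)    # 1

# The definitions for value ∈ itr, written out directly.
rank_inc    = count_less / (n - 1)
rank_exc    = (count_less + 1) / (n + 1)
rank_tied   = (count_less + count_equal / 2) / n
rank_strict = count_less / n
rank_weak   = (count_less + count_equal) / n

# The same quantities through the package function.
pkg_inc    = quantilerank(itr, value)                  # method=:inc is the default
pkg_exc    = quantilerank(itr, value, method=:exc)
pkg_tied   = quantilerank(itr, value, method=:tied)
pkg_strict = quantilerank(itr, value, method=:strict)
pkg_weak   = quantilerank(itr, value, method=:weak)
```

With these data, `:strict`, `:tied` and `:weak` give 0.3, 0.35 and 0.4 respectively, which shows how ties at `value` move the rank.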

References

Roscoe, J. T. (1975). "Fundamental Research Statistics for the Behavioral Sciences", 2nd ed., New York: Holt, Rinehart and Winston.

Hyndman, R. J. and Fan, Y. (1996). "Sample Quantiles in Statistical Packages", The American Statistician, Vol. 50, No. 4, pp. 361-365.

Examples

julia> using StatsBase
 
 julia> v1 = [1, 1, 1, 2, 3, 4, 8, 11, 12, 13];
 
@@ -29,9 +29,9 @@
 julia> quantilerank.(Ref(v3), [4, 8])
 2-element Vector{Float64}:
  0.3333333333333333
- 0.8888888888888888
source

Mode and Modes

StatsBase.modeFunction
mode(a, [r])
-mode(a::AbstractArray, wv::AbstractWeights)

Return the mode (most common number) of an array, optionally over a specified range r or weighted via a vector wv. If several modes exist, the first one (in order of appearance) is returned.

source
StatsBase.modesFunction
modes(a, [r])::Vector
-mode(a::AbstractArray, wv::AbstractWeights)::Vector

Return all modes (most common numbers) of an array, optionally over a specified range r or weighted via vector wv.

source

Summary Statistics

StatsBase.summarystatsFunction
summarystats(a)

Compute summary statistics for a real-valued array a. Returns a SummaryStats object containing the number of observations, number of missing observations, standard deviation, mean, minimum, 25th percentile, median, 75th percentile, and maximum.

source
DataAPI.describeFunction
describe(a)

Pretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.

source

Reliability Measures

StatsBase.cronbachalphaFunction
cronbachalpha(covmatrix::AbstractMatrix{<:Real})

Calculate Cronbach's alpha (1951) from a covariance matrix covmatrix according to the formula:

\[\rho = \frac{k}{k-1} (1 - \frac{\sum^k_{i=1} \sigma^2_i}{\sum_{i=1}^k \sum_{j=1}^k \sigma_{ij}})\]

where $k$ is the number of items, i.e. columns, $\sigma_i^2$ the item variance, and $\sigma_{ij}$ the inter-item covariance.

Returns a CronbachAlpha object that holds:

  • alpha: the Cronbach's alpha score for all items, i.e. columns, in covmatrix; and
  • dropped: a vector giving Cronbach's alpha scores if a specific item, i.e. column, is dropped from covmatrix.

Example

julia> using StatsBase
+ 0.8888888888888888
source

Mode and Modes

StatsBase.modeFunction
mode(a, [r])
+mode(a::AbstractArray, wv::AbstractWeights)

Return the mode (most common number) of an array, optionally over a specified range r or weighted via a vector wv. If several modes exist, the first one (in order of appearance) is returned.

source
StatsBase.modesFunction
modes(a, [r])::Vector
+mode(a::AbstractArray, wv::AbstractWeights)::Vector

Return all modes (most common numbers) of an array, optionally over a specified range r or weighted via vector wv.

source

Summary Statistics

StatsBase.summarystatsFunction
summarystats(a)

Compute summary statistics for a real-valued array a. Returns a SummaryStats object containing the number of observations, number of missing observations, standard deviation, mean, minimum, 25th percentile, median, 75th percentile, and maximum.

source
DataAPI.describeFunction
describe(a)

Pretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.

source

Reliability Measures

StatsBase.cronbachalphaFunction
cronbachalpha(covmatrix::AbstractMatrix{<:Real})

Calculate Cronbach's alpha (1951) from a covariance matrix covmatrix according to the formula:

\[\rho = \frac{k}{k-1} (1 - \frac{\sum^k_{i=1} \sigma^2_i}{\sum_{i=1}^k \sum_{j=1}^k \sigma_{ij}})\]

where $k$ is the number of items, i.e. columns, $\sigma_i^2$ the item variance, and $\sigma_{ij}$ the inter-item covariance.

Returns a CronbachAlpha object that holds:

  • alpha: the Cronbach's alpha score for all items, i.e. columns, in covmatrix; and
  • dropped: a vector giving Cronbach's alpha scores if a specific item, i.e. column, is dropped from covmatrix.

Example

julia> using StatsBase
 
 julia> cov_X = [10 6 6 6;
                 6 11 6 6;
@@ -45,4 +45,4 @@
 item 1: 0.7500
 item 2: 0.7606
 item 3: 0.7714
-item 4: 0.7826
source
+item 4: 0.7826
source
diff --git a/dev/search/index.html b/dev/search/index.html index 19f8e1eb..e8d03ac0 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -1,2 +1,2 @@ -Search · StatsBase.jl

Loading search...

    +Search · StatsBase.jl

    Loading search...

      diff --git a/dev/search_index.js b/dev/search_index.js index c975b2b0..534e6702 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"statmodels/#Abstraction-for-Statistical-Models","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"","category":"section"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"StatsAPI.jl defines an abstract type StatisticalModel, and an abstract subtype RegressionModel. They are both extended by StatsBase, and documented here.","category":"page"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"Particularly, instances of StatisticalModel implement the following methods.","category":"page"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"adjr2\naic\naicc\nbic\ncoef\ncoefnames\ncoeftable\nconfint\ndeviance\ndof\nfit\nfit!\ninformationmatrix\nisfitted\nislinear\nloglikelihood\nmss\nnobs\nnulldeviance\nnullloglikelihood\nr2\nrss\nscore\nstderror\nvcov\nweights","category":"page"},{"location":"statmodels/#StatsAPI.adjr2","page":"Abstraction for Statistical Models","title":"StatsAPI.adjr2","text":"adjr2(model::StatisticalModel)\nadjr²(model::StatisticalModel)\n\nAdjusted coefficient of determination (adjusted R-squared).\n\nFor linear models, the adjusted R² is defined as 1 - (1 - (1-R^2)(n-1)(n-p)), with R^2 the coefficient of determination, n the number of observations, and p the number of coefficients (including the intercept). This definition is generally known as the Wherry Formula I.\n\n\n\n\n\nadjr2(model::StatisticalModel, variant::Symbol)\nadjr²(model::StatisticalModel, variant::Symbol)\n\nAdjusted pseudo-coefficient of determination (adjusted pseudo R-squared). 
For nonlinear models, one of the several pseudo R² definitions must be chosen via variant. The only currently supported variants are :MacFadden, defined as 1 - (log (L) - k)log (L0) and :devianceratio, defined as 1 - (D(n-k))(D_0(n-1)). In these formulas, L is the likelihood of the model, L0 that of the null model (the model including only the intercept), D is the deviance of the model, D_0 is the deviance of the null model, n is the number of observations (given by nobs) and k is the number of consumed degrees of freedom of the model (as returned by dof).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.aic","page":"Abstraction for Statistical Models","title":"StatsAPI.aic","text":"aic(model::StatisticalModel)\n\nAkaike's Information Criterion, defined as -2 log L + 2k, with L the likelihood of the model, and k its number of consumed degrees of freedom (as returned by dof).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.aicc","page":"Abstraction for Statistical Models","title":"StatsAPI.aicc","text":"aicc(model::StatisticalModel)\n\nCorrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989), defined as -2 log L + 2k + 2k(k-1)(n-k-1), with L the likelihood of the model, k its number of consumed degrees of freedom (as returned by dof), and n the number of observations (as returned by nobs).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.bic","page":"Abstraction for Statistical Models","title":"StatsAPI.bic","text":"bic(model::StatisticalModel)\n\nBayesian Information Criterion, defined as -2 log L + k log n, with L the likelihood of the model, k its number of consumed degrees of freedom (as returned by dof), and n the number of observations (as returned by nobs).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.coef","page":"Abstraction for Statistical Models","title":"StatsAPI.coef","text":"coef(model::StatisticalModel)\n\nReturn the coefficients of 
the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.coefnames","page":"Abstraction for Statistical Models","title":"StatsAPI.coefnames","text":"coefnames(model::StatisticalModel)\n\nReturn the names of the coefficients.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.coeftable","page":"Abstraction for Statistical Models","title":"StatsAPI.coeftable","text":"coeftable(model::StatisticalModel; level::Real=0.95)\n\nReturn a table with coefficients and related statistics of the model. level determines the level for confidence intervals (by default, 95%).\n\nThe returned CoefTable object implements the Tables.jl interface, and can be converted e.g. to a DataFrame via using DataFrames; DataFrame(coeftable(model)).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.confint","page":"Abstraction for Statistical Models","title":"StatsAPI.confint","text":"confint(model::StatisticalModel; level::Real=0.95)\n\nCompute confidence intervals for coefficients, with confidence level level (by default 95%).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.deviance","page":"Abstraction for Statistical Models","title":"StatsAPI.deviance","text":"deviance(model::StatisticalModel)\n\nReturn the deviance of the model relative to a reference, which is usually when applicable the saturated model. 
It is equal, up to a constant, to -2 log L, with L the likelihood of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.dof","page":"Abstraction for Statistical Models","title":"StatsAPI.dof","text":"dof(model::StatisticalModel)\n\nReturn the number of degrees of freedom consumed in the model, including when applicable the intercept and the distribution's dispersion parameter.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.fit","page":"Abstraction for Statistical Models","title":"StatsAPI.fit","text":"Fit a statistical model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.fit!","page":"Abstraction for Statistical Models","title":"StatsAPI.fit!","text":"Fit a statistical model in-place.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.informationmatrix","page":"Abstraction for Statistical Models","title":"StatsAPI.informationmatrix","text":"informationmatrix(model::StatisticalModel; expected::Bool = true)\n\nReturn the information matrix of the model. 
By default the Fisher information matrix is returned, while the observed information matrix can be requested with expected = false.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.isfitted","page":"Abstraction for Statistical Models","title":"StatsAPI.isfitted","text":"isfitted(model::StatisticalModel)\n\nIndicate whether the model has been fitted.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.islinear","page":"Abstraction for Statistical Models","title":"StatsAPI.islinear","text":"islinear(model::StatisticalModel)\n\nIndicate whether the model is linear.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.loglikelihood","page":"Abstraction for Statistical Models","title":"StatsAPI.loglikelihood","text":"loglikelihood(model::StatisticalModel)\nloglikelihood(model::StatisticalModel, observation)\n\nReturn the log-likelihood of the model.\n\nWith an observation argument, return the contribution of observation to the log-likelihood of model.\n\nIf observation is a Colon, return a vector of each observation's contribution to the log-likelihood of the model. In other words, this is the vector of the pointwise log-likelihood contributions.\n\nIn general, sum(loglikehood(model, :)) == loglikelihood(model).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.mss","page":"Abstraction for Statistical Models","title":"StatsAPI.mss","text":"mss(model::StatisticalModel)\n\nReturn the model sum of squares.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.nobs","page":"Abstraction for Statistical Models","title":"StatsAPI.nobs","text":"nobs(model::StatisticalModel)\n\nReturn the number of independent observations on which the model was fitted. 
Be careful when using this information, as the definition of an independent observation may vary depending on the model, on the format used to pass the data, on the sampling plan (if specified), etc.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.nulldeviance","page":"Abstraction for Statistical Models","title":"StatsAPI.nulldeviance","text":"nulldeviance(model::StatisticalModel)\n\nReturn the deviance of the null model, obtained by dropping all independent variables present in model.\n\nIf model includes an intercept, the null model is the one with only the intercept; otherwise, it is the one without any predictor (not even the intercept).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.nullloglikelihood","page":"Abstraction for Statistical Models","title":"StatsAPI.nullloglikelihood","text":"nullloglikelihood(model::StatisticalModel)\n\nReturn the log-likelihood of the null model, obtained by dropping all independent variables present in model.\n\nIf model includes an intercept, the null model is the one with only the intercept; otherwise, it is the one without any predictor (not even the intercept).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.r2","page":"Abstraction for Statistical Models","title":"StatsAPI.r2","text":"r2(model::StatisticalModel)\nr²(model::StatisticalModel)\n\nCoefficient of determination (R-squared).\n\nFor a linear model, the R² is defined as ESSTSS, with ESS the explained sum of squares and TSS the total sum of squares.\n\n\n\n\n\nr2(model::StatisticalModel, variant::Symbol)\nr²(model::StatisticalModel, variant::Symbol)\n\nPseudo-coefficient of determination (pseudo R-squared).\n\nFor nonlinear models, one of several pseudo R² definitions must be chosen via variant. Supported variants are:\n\n:MacFadden (a.k.a. 
likelihood ratio index), defined as 1 - log (L)log (L_0);\n:CoxSnell, defined as 1 - (L_0L)^2n;\n:Nagelkerke, defined as (1 - (L_0L)^2n)(1 - L_0^2n).\n:devianceratio, defined as 1 - DD_0.\n\nIn the above formulas, L is the likelihood of the model, L_0 is the likelihood of the null model (the model with only an intercept), D is the deviance of the model (from the saturated model), D_0 is the deviance of the null model, n is the number of observations (given by nobs).\n\nThe Cox-Snell and the deviance ratio variants both match the classical definition of R² for linear models.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.rss","page":"Abstraction for Statistical Models","title":"StatsAPI.rss","text":"rss(model::StatisticalModel)\n\nReturn the residual sum of squares of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.score","page":"Abstraction for Statistical Models","title":"StatsAPI.score","text":"score(model::StatisticalModel)\n\nReturn the score of the model, that is the gradient of the log-likelihood with respect to the coefficients.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.stderror","page":"Abstraction for Statistical Models","title":"StatsAPI.stderror","text":"stderror(model::StatisticalModel)\n\nReturn the standard errors for the coefficients of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.vcov","page":"Abstraction for Statistical Models","title":"StatsAPI.vcov","text":"vcov(model::StatisticalModel)\n\nReturn the variance-covariance matrix for the coefficients of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.weights","page":"Abstraction for Statistical Models","title":"StatsAPI.weights","text":"weights(model::StatisticalModel)\n\nReturn the weights used in the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical 
Models","text":"RegressionModel extends StatisticalModel by implementing the following additional methods.","category":"page"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"crossmodelmatrix\ndof_residual\nfitted\nleverage\ncooksdistance\nmeanresponse\nmodelmatrix\nresponse\nresponsename\npredict\npredict!\nresiduals","category":"page"},{"location":"statmodels/#StatsAPI.crossmodelmatrix","page":"Abstraction for Statistical Models","title":"StatsAPI.crossmodelmatrix","text":"crossmodelmatrix(model::RegressionModel)\n\nReturn X'X where X is the model matrix of model. This function will return a pre-computed matrix stored in model if possible.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.dof_residual","page":"Abstraction for Statistical Models","title":"StatsAPI.dof_residual","text":"dof_residual(model::RegressionModel)\n\nReturn the residual degrees of freedom of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.fitted","page":"Abstraction for Statistical Models","title":"StatsAPI.fitted","text":"fitted(model::RegressionModel)\n\nReturn the fitted values of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.leverage","page":"Abstraction for Statistical Models","title":"StatsAPI.leverage","text":"leverage(model::RegressionModel)\n\nReturn the diagonal of the projection matrix of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.cooksdistance","page":"Abstraction for Statistical Models","title":"StatsAPI.cooksdistance","text":"cooksdistance(model::RegressionModel)\n\nCompute Cook's distance for each observation in linear model model, giving an estimate of the influence of each data point.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.meanresponse","page":"Abstraction for Statistical 
Models","title":"StatsAPI.meanresponse","text":"meanresponse(model::RegressionModel)\n\nReturn the mean of the response.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.modelmatrix","page":"Abstraction for Statistical Models","title":"StatsAPI.modelmatrix","text":"modelmatrix(model::RegressionModel)\n\nReturn the model matrix (a.k.a. the design matrix).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.response","page":"Abstraction for Statistical Models","title":"StatsAPI.response","text":"response(model::RegressionModel)\n\nReturn the model response (a.k.a. the dependent variable).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.responsename","page":"Abstraction for Statistical Models","title":"StatsAPI.responsename","text":"responsename(model::RegressionModel)\n\nReturn the name of the model response (a.k.a. the dependent variable).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.predict","page":"Abstraction for Statistical Models","title":"StatsAPI.predict","text":"predict(model::RegressionModel, [newX])\n\nForm the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. 
for a GLM it would generally be a DataFrame with the same variable names as the original predictors.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.predict!","page":"Abstraction for Statistical Models","title":"StatsAPI.predict!","text":"predict!\n\nIn-place version of predict.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.residuals","page":"Abstraction for Statistical Models","title":"StatsAPI.residuals","text":"residuals(model::RegressionModel)\n\nReturn the residuals of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"An exception type is provided to signal convergence failures during model estimation:","category":"page"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"ConvergenceException","category":"page"},{"location":"statmodels/#StatsBase.ConvergenceException","page":"Abstraction for Statistical Models","title":"StatsBase.ConvergenceException","text":"ConvergenceException(iters::Int, lastchange::Real=NaN, tol::Real=NaN)\n\nThe fitting procedure failed to converge in iters number of iterations, i.e. 
the lastchange between the cost of the final and penultimate iteration was greater than specified tolerance tol.\n\n\n\n\n\n","category":"type"},{"location":"multivariate/#Multivariate-Summary-Statistics","page":"Multivariate Summary Statistics","title":"Multivariate Summary Statistics","text":"","category":"section"},{"location":"multivariate/","page":"Multivariate Summary Statistics","title":"Multivariate Summary Statistics","text":"This package provides a few methods for summarizing multivariate data.","category":"page"},{"location":"multivariate/#Partial-Correlation","page":"Multivariate Summary Statistics","title":"Partial Correlation","text":"","category":"section"},{"location":"multivariate/","page":"Multivariate Summary Statistics","title":"Multivariate Summary Statistics","text":"partialcor","category":"page"},{"location":"multivariate/#StatsBase.partialcor","page":"Multivariate Summary Statistics","title":"StatsBase.partialcor","text":"partialcor(x, y, Z)\n\nCompute the partial correlation of the vectors x and y given Z, which can be a vector or matrix.\n\n\n\n\n\n","category":"function"},{"location":"multivariate/#Generalizations-of-Variance","page":"Multivariate Summary Statistics","title":"Generalizations of Variance","text":"","category":"section"},{"location":"multivariate/","page":"Multivariate Summary Statistics","title":"Multivariate Summary Statistics","text":"genvar\ntotalvar","category":"page"},{"location":"multivariate/#StatsBase.genvar","page":"Multivariate Summary Statistics","title":"StatsBase.genvar","text":"genvar(X)\n\nCompute the generalized sample variance of X. If X is a vector, one-column matrix, or other iterable, this is equivalent to the sample variance. 
Otherwise if X is a matrix, this is equivalent to the determinant of the covariance matrix of X.\n\nnote: Note\nThe generalized sample variance will be 0 if the columns of the matrix of deviations are linearly dependent.\n\n\n\n\n\n","category":"function"},{"location":"multivariate/#StatsBase.totalvar","page":"Multivariate Summary Statistics","title":"StatsBase.totalvar","text":"totalvar(X)\n\nCompute the total sample variance of X. If X is a vector, one-column matrix, or other iterable, this is equivalent to the sample variance. Otherwise if X is a matrix, this is equivalent to the sum of the diagonal elements of the covariance matrix of X.\n\n\n\n\n\n","category":"function"},{"location":"cov/#Scatter-Matrix-and-Covariance","page":"Scatter Matrix and Covariance","title":"Scatter Matrix and Covariance","text":"","category":"section"},{"location":"cov/","page":"Scatter Matrix and Covariance","title":"Scatter Matrix and Covariance","text":"This package implements functions for computing scatter matrix, as well as weighted covariance matrix.","category":"page"},{"location":"cov/","page":"Scatter Matrix and Covariance","title":"Scatter Matrix and Covariance","text":"scattermat\ncov\ncov(::CovarianceEstimator, ::AbstractVector)\ncov(::CovarianceEstimator, ::AbstractVector, ::AbstractVector)\ncov(::CovarianceEstimator, ::AbstractMatrix)\nvar(::CovarianceEstimator, ::AbstractVector)\nstd(::CovarianceEstimator, ::AbstractVector)\ncor\nmean_and_cov\ncov2cor\ncor2cov\nCovarianceEstimator\nSimpleCovariance","category":"page"},{"location":"cov/#StatsBase.scattermat","page":"Scatter Matrix and Covariance","title":"StatsBase.scattermat","text":"scattermat(X, [wv::AbstractWeights]; mean=nothing, dims=1)\n\nCompute the scatter matrix, which is an unnormalized covariance matrix. A weighting vector wv can be specified to weight the estimate.\n\nArguments\n\nmean=nothing: a known mean value. nothing indicates that the mean is unknown, and the function will compute the mean. 
Specifying mean=0 indicates that the data are centered and hence there's no need to subtract the mean.\ndims=1: the dimension along which the variables are organized. When dims = 1, the variables are considered columns with observations in rows; when dims = 2, variables are in rows with observations in columns.\n\n\n\n\n\n","category":"function"},{"location":"cov/#Statistics.cov","page":"Scatter Matrix and Covariance","title":"Statistics.cov","text":"cov(X, w::AbstractWeights, vardim=1; mean=nothing, corrected=false)\n\nCompute the weighted covariance matrix. Similar to var and std the biased covariance matrix (corrected=false) is computed by multiplying scattermat(X, w) by frac1sumw to normalize. However, the unbiased covariance matrix (corrected=true) is dependent on the type of weights used:\n\nAnalyticWeights: frac1sum w - sum w^2 sum w\nFrequencyWeights: frac1sumw - 1\nProbabilityWeights: fracn(n - 1) sum w where n equals count(!iszero, w)\nWeights: ArgumentError (bias correction not supported)\n\n\n\n\n\n","category":"function"},{"location":"cov/#Statistics.cov-Tuple{CovarianceEstimator, AbstractVector}","page":"Scatter Matrix and Covariance","title":"Statistics.cov","text":"cov(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute a variance estimate from the observation vector x using the estimator ce.\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.cov-Tuple{CovarianceEstimator, AbstractVector, AbstractVector}","page":"Scatter Matrix and Covariance","title":"Statistics.cov","text":"cov(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)\n\nCompute the covariance of the vectors x and y using estimator ce.\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.cov-Tuple{CovarianceEstimator, AbstractMatrix}","page":"Scatter Matrix and Covariance","title":"Statistics.cov","text":"cov(ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights]; mean=nothing, dims::Int=1)\n\nCompute the covariance matrix of 
the matrix X along dimension dims using estimator ce. A weighting vector w can be specified. The keyword argument mean can be:\n\nnothing (default) in which case the mean is estimated and subtracted from the data X,\na precalculated mean in which case it is subtracted from the data X. Assuming size(X) is (N,M), mean can either be:\nwhen dims=1, an AbstractMatrix of size (1,M),\nwhen dims=2, an AbstractVector of length N or an AbstractMatrix of size (N,1).\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.var-Tuple{CovarianceEstimator, AbstractVector}","page":"Scatter Matrix and Covariance","title":"Statistics.var","text":"var(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute the variance of the vector x using the estimator ce.\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.std-Tuple{CovarianceEstimator, AbstractVector}","page":"Scatter Matrix and Covariance","title":"Statistics.std","text":"std(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute the standard deviation of the vector x using the estimator ce.\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.cor","page":"Scatter Matrix and Covariance","title":"Statistics.cor","text":"cor(X, w::AbstractWeights, dims=1)\n\nCompute the Pearson correlation matrix of X along the dimension dims with a weighting w .\n\n\n\n\n\ncor(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)\n\nCompute the correlation of the vectors x and y using estimator ce.\n\n\n\n\n\ncor(\n ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights];\n mean=nothing, dims::Int=1\n)\n\nCompute the correlation matrix of the matrix X along dimension dims using estimator ce. A weighting vector w can be specified. The keyword argument mean can be:\n\nnothing (default) in which case the mean is estimated and subtracted from the data X,\na precalculated mean in which case it is subtracted from the data X. 
Assuming size(X) is (N,M), mean can either be:\nwhen dims=1, an AbstractMatrix of size (1,M),\nwhen dims=2, an AbstractVector of length N or an AbstractMatrix of size (N,1).\n\n\n\n\n\n","category":"function"},{"location":"cov/#StatsBase.mean_and_cov","page":"Scatter Matrix and Covariance","title":"StatsBase.mean_and_cov","text":"mean_and_cov(x, [wv::AbstractWeights,] vardim=1; corrected=false) -> (mean, cov)\n\nReturn the mean and covariance matrix as a tuple. A weighting vector wv can be specified. vardim that designates whether the variables are columns in the matrix (1) or rows (2). Finally, bias correction is applied to the covariance calculation if corrected=true. See cov documentation for more details.\n\n\n\n\n\n","category":"function"},{"location":"cov/#StatsBase.cov2cor","page":"Scatter Matrix and Covariance","title":"StatsBase.cov2cor","text":"cov2cor(C::AbstractMatrix, [s::AbstractArray])\n\nCompute the correlation matrix from the covariance matrix C and, optionally, a vector of standard deviations s. Use StatsBase.cov2cor! for an in-place version.\n\n\n\n\n\n","category":"function"},{"location":"cov/#StatsBase.cor2cov","page":"Scatter Matrix and Covariance","title":"StatsBase.cor2cov","text":"cor2cov(C, s)\n\nCompute the covariance matrix from the correlation matrix C and a vector of standard deviations s. Use StatsBase.cor2cov! for an in-place version.\n\n\n\n\n\n","category":"function"},{"location":"cov/#StatsBase.CovarianceEstimator","page":"Scatter Matrix and Covariance","title":"StatsBase.CovarianceEstimator","text":"CovarianceEstimator\n\nAbstract type for covariance estimators.\n\n\n\n\n\n","category":"type"},{"location":"cov/#StatsBase.SimpleCovariance","page":"Scatter Matrix and Covariance","title":"StatsBase.SimpleCovariance","text":"SimpleCovariance(;corrected::Bool=false)\n\nSimple covariance estimator. 
Estimation calls cov(x; corrected=corrected), cov(x, y; corrected=corrected) or cov(X, w, dims; corrected=corrected) where x, y are vectors, X is a matrix and w is a weighting vector.\n\n\n\n\n\n","category":"type"},{"location":"misc/#Miscellaneous-Functions","page":"Miscellaneous Functions","title":"Miscellaneous Functions","text":"","category":"section"},{"location":"misc/","page":"Miscellaneous Functions","title":"Miscellaneous Functions","text":"rle\ninverse_rle\nlevelsmap\nindexmap\nindicatormat\nStatsBase.midpoints\npairwise\npairwise!","category":"page"},{"location":"misc/#StatsBase.rle","page":"Miscellaneous Functions","title":"StatsBase.rle","text":"rle(v) -> (vals, lens)\n\nReturn the run-length encoding of a vector as a tuple. The first element of the tuple is a vector of values of the input and the second is the number of consecutive occurrences of each element.\n\nExamples\n\njulia> using StatsBase\n\njulia> rle([1,1,1,2,2,3,3,3,3,2,2,2])\n([1, 2, 3, 2], [3, 2, 4, 3])\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.inverse_rle","page":"Miscellaneous Functions","title":"StatsBase.inverse_rle","text":"inverse_rle(vals, lens)\n\nReconstruct a vector from its run-length encoding (see rle). 
vals is a vector of the values and lens is a vector of the corresponding run lengths.\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.levelsmap","page":"Miscellaneous Functions","title":"StatsBase.levelsmap","text":"levelsmap(a)\n\nConstruct a dictionary that maps each of the n unique values in a to a number between 1 and n.\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.indexmap","page":"Miscellaneous Functions","title":"StatsBase.indexmap","text":"indexmap(a)\n\nConstruct a dictionary that maps each unique value in a to the index of its first occurrence in a.\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.indicatormat","page":"Miscellaneous Functions","title":"StatsBase.indicatormat","text":"indicatormat(x, k::Integer; sparse=false)\n\nConstruct a boolean matrix I of size (k, length(x)) such that I[x[i], i] = true and all other elements are set to false. If sparse is true, the output will be a sparse matrix, otherwise it will be dense (default).\n\nExamples\n\njulia> using StatsBase\n\njulia> indicatormat([1 2 2], 2)\n2×3 Matrix{Bool}:\n 1 0 0\n 0 1 1\n\n\n\n\n\nindicatormat(x, c=sort(unique(x)); sparse=false)\n\nConstruct a boolean matrix I of size (length(c), length(x)). Let ci be the index of x[i] in c. Then I[ci, i] = true and all other elements are false.\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.midpoints","page":"Miscellaneous Functions","title":"StatsBase.midpoints","text":"StatsBase.midpoints(v)\n\nCalculate the midpoints (pairwise mean of consecutive elements).\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsAPI.pairwise","page":"Miscellaneous Functions","title":"StatsAPI.pairwise","text":"pairwise(f, x[, y];\n symmetric::Bool=false, skipmissing::Symbol=:none)\n\nReturn a matrix holding the result of applying f to all possible pairs of entries in iterators x and y. Rows correspond to entries in x and columns to entries in y. 
If y is omitted then a square matrix crossing x with itself is returned.\n\nAs a special case, if f is cor, diagonal cells for which entries from x and y are identical (according to ===) are set to one even in the presence of missing, NaN or Inf entries.\n\nKeyword arguments\n\nsymmetric::Bool=false: If true, f is only called to compute for the lower triangle of the matrix, and these values are copied to fill the upper triangle. Only allowed when y is omitted. Defaults to true when f is cor or cov.\nskipmissing::Symbol=:none: If :none (the default), missing values in inputs are passed to f without any modification. Use :pairwise to skip entries with a missing value in either of the two vectors passed to f for a given pair of vectors in x and y. Use :listwise to skip entries with a missing value in any of the vectors in x or y; note that this might drop a large part of entries. Only allowed when entries in x and y are vectors.\n\nExamples\n\njulia> using StatsBase, Statistics\n\njulia> x = [1 3 7\n 2 5 6\n 3 8 4\n 4 6 2];\n\njulia> pairwise(cor, eachcol(x))\n3×3 Matrix{Float64}:\n 1.0 0.744208 -0.989778\n 0.744208 1.0 -0.68605\n -0.989778 -0.68605 1.0\n\njulia> y = [1 3 missing\n 2 5 6\n 3 missing 2\n 4 6 2];\n\njulia> pairwise(cor, eachcol(y), skipmissing=:pairwise)\n3×3 Matrix{Float64}:\n 1.0 0.928571 -0.866025\n 0.928571 1.0 -1.0\n -0.866025 -1.0 1.0\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsAPI.pairwise!","page":"Miscellaneous Functions","title":"StatsAPI.pairwise!","text":"pairwise!(f, dest::AbstractMatrix, x[, y];\n symmetric::Bool=false, skipmissing::Symbol=:none)\n\nStore in matrix dest the result of applying f to all possible pairs of entries in iterators x and y, and return it. Rows correspond to entries in x and columns to entries in y, and dest must therefore be of size length(x) × length(y).
If y is omitted then x is crossed with itself.\n\nAs a special case, if f is cor, diagonal cells for which entries from x and y are identical (according to ===) are set to one even in the presence of missing, NaN or Inf entries.\n\nKeyword arguments\n\nsymmetric::Bool=false: If true, f is only called to compute for the lower triangle of the matrix, and these values are copied to fill the upper triangle. Only allowed when y is omitted. Defaults to true when f is cor or cov.\nskipmissing::Symbol=:none: If :none (the default), missing values in inputs are passed to f without any modification. Use :pairwise to skip entries with a missing value in either of the two vectors passed to f for a given pair of vectors in x and y. Use :listwise to skip entries with a missing value in any of the vectors in x or y; note that this might drop a large part of entries. Only allowed when entries in x and y are vectors.\n\nExamples\n\njulia> using StatsBase, Statistics\n\njulia> dest = zeros(3, 3);\n\njulia> x = [1 3 7\n 2 5 6\n 3 8 4\n 4 6 2];\n\njulia> pairwise!(cor, dest, eachcol(x));\n\njulia> dest\n3×3 Matrix{Float64}:\n 1.0 0.744208 -0.989778\n 0.744208 1.0 -0.68605\n -0.989778 -0.68605 1.0\n\njulia> y = [1 3 missing\n 2 5 6\n 3 missing 2\n 4 6 2];\n\njulia> pairwise!(cor, dest, eachcol(y), skipmissing=:pairwise);\n\njulia> dest\n3×3 Matrix{Float64}:\n 1.0 0.928571 -0.866025\n 0.928571 1.0 -1.0\n -0.866025 -1.0 1.0\n\n\n\n\n\n","category":"function"},{"location":"weights/#Weight-Vectors","page":"Weight Vectors","title":"Weight Vectors","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"In statistical applications, it is not uncommon to assign weights to samples.
To facilitate the use of weight vectors, we introduce the abstract type AbstractWeights for the purpose of representing weight vectors, which has two advantages:","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"A different type AbstractWeights distinguishes the role of the weight vector from other data vectors in the input arguments.\nStatistical functions that utilize weights often need the sum of weights for various purposes. The weight vector maintains the sum of weights, so that it needn't be computed repeatedly each time the sum of weights is needed.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"note: Note\nThe weight vector is a light-weight wrapper of the input vector. The input vector is NOT copied during construction.\nThe weight vector maintains the sum of weights, which is computed upon construction. If the value of the sum is pre-computed, one can supply it as the second argument to the constructor and save the time of computing the sum again.","category":"page"},{"location":"weights/#Implementations","page":"Weight Vectors","title":"Implementations","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Several statistical weight types are provided which subtype AbstractWeights. The choice of weights impacts how bias is corrected in several methods. See the var, std and cov docstrings for more details.","category":"page"},{"location":"weights/#AnalyticWeights","page":"Weight Vectors","title":"AnalyticWeights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. 
These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = AnalyticWeights([0.2, 0.1, 0.3])\nw = aweights([0.2, 0.1, 0.3])","category":"page"},{"location":"weights/#FrequencyWeights","page":"Weight Vectors","title":"FrequencyWeights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = FrequencyWeights([2, 1, 3])\nw = fweights([2, 1, 3])","category":"page"},{"location":"weights/#ProbabilityWeights","page":"Weight Vectors","title":"ProbabilityWeights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = ProbabilityWeights([0.2, 0.1, 0.3])\nw = pweights([0.2, 0.1, 0.3])","category":"page"},{"location":"weights/#UnitWeights","page":"Weight Vectors","title":"UnitWeights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Unit weights are a special case in which all observations are given a weight equal to 1. 
Using such weights is equivalent to computing unweighted statistics.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"This type can notably be used when implementing an algorithm so that only a weighted variant has to be written. The unweighted variant is then obtained by passing a UnitWeights object. This is very efficient since no weights vector is actually allocated.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = uweights(3)\nw = uweights(Float64, 3)","category":"page"},{"location":"weights/#Weights","page":"Weight Vectors","title":"Weights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"The Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights, ProbabilityWeights and UnitWeights.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = Weights([1., 2., 3.])\nw = weights([1., 2., 3.])","category":"page"},{"location":"weights/#Exponential-weights:-eweights","page":"Weight Vectors","title":"Exponential weights: eweights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Exponential weights are a common form of temporal weights which assign exponentially decreasing weights to past observations.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"If t is a vector of temporal indices then for each index i we compute the weight as:","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"λ (1 - λ)^(1 - i)","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"λ is a smoothing factor or rate parameter such that 0 < λ ≤ 1.
As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"For example, the following call generates exponential weights for ten observations with λ = 0.3.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"julia> eweights(1:10, 0.3)\n10-element Weights{Float64,Float64,Array{Float64,1}}:\n 0.3\n 0.42857142857142855\n 0.6122448979591837\n 0.8746355685131197\n 1.249479383590171\n 1.7849705479859588\n 2.549957925694227\n 3.642797036706039\n 5.203995766722913\n 7.434279666747019","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Simply passing the number of observations n is equivalent to passing in 1:n.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"julia> eweights(10, 0.3)\n10-element Weights{Float64,Float64,Array{Float64,1}}:\n 0.3\n 0.42857142857142855\n 0.6122448979591837\n 0.8746355685131197\n 1.249479383590171\n 1.7849705479859588\n 2.549957925694227\n 3.642797036706039\n 5.203995766722913\n 7.434279666747019","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Finally, you can construct exponential weights from an arbitrary subset of timestamps within a larger range.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"julia> t\n2019-01-01T01:00:00:2 hours:2019-01-01T05:00:00\n\njulia> r\n2019-01-01T01:00:00:1 hour:2019-01-02T01:00:00\n\njulia> eweights(t, r, 0.3)\n3-element Weights{Float64,Float64,Array{Float64,1}}:\n 0.3\n 0.6122448979591837\n 1.249479383590171","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"NOTE: This is equivalent to eweights(something.(indexin(t, r)),
0.3), which is saying that for each value in t return the corresponding index for that value in r. Since indexin returns nothing if there is no corresponding value from t in r we use something to eliminate that possibility.","category":"page"},{"location":"weights/#Methods","page":"Weight Vectors","title":"Methods","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"AbstractWeights implements the following methods:","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"eltype\nlength\nisempty\nvalues\nsum","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"The following constructors are provided:","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"AnalyticWeights\nFrequencyWeights\nProbabilityWeights\nUnitWeights\nWeights\naweights\nfweights\npweights\neweights\nuweights\nweights(vs::AbstractArray{<:Real})","category":"page"},{"location":"weights/#StatsBase.AnalyticWeights","page":"Weight Vectors","title":"StatsBase.AnalyticWeights","text":"AnalyticWeights(vs, wsum=sum(vs))\n\nConstruct an AnalyticWeights vector with weight values vs. A precomputed sum may be provided as wsum.\n\nAnalytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.FrequencyWeights","page":"Weight Vectors","title":"StatsBase.FrequencyWeights","text":"FrequencyWeights(vs, wsum=sum(vs))\n\nConstruct a FrequencyWeights vector with weight values vs. 
A precomputed sum may be provided as wsum.\n\nFrequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.ProbabilityWeights","page":"Weight Vectors","title":"StatsBase.ProbabilityWeights","text":"ProbabilityWeights(vs, wsum=sum(vs))\n\nConstruct a ProbabilityWeights vector with weight values vs. A precomputed sum may be provided as wsum.\n\nProbability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.UnitWeights","page":"Weight Vectors","title":"StatsBase.UnitWeights","text":"UnitWeights{T}(s)\n\nConstruct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.Weights","page":"Weight Vectors","title":"StatsBase.Weights","text":"Weights(vs, wsum=sum(vs))\n\nConstruct a Weights vector with weight values vs. A precomputed sum may be provided as wsum.\n\nThe Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights and ProbabilityWeights.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.aweights","page":"Weight Vectors","title":"StatsBase.aweights","text":"aweights(vs)\n\nConstruct an AnalyticWeights vector from array vs. See the documentation for AnalyticWeights for more details.\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsBase.fweights","page":"Weight Vectors","title":"StatsBase.fweights","text":"fweights(vs)\n\nConstruct a FrequencyWeights vector from a given array. 
See the documentation for FrequencyWeights for more details.\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsBase.pweights","page":"Weight Vectors","title":"StatsBase.pweights","text":"pweights(vs)\n\nConstruct a ProbabilityWeights vector from a given array. See the documentation for ProbabilityWeights for more details.\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsBase.eweights","page":"Weight Vectors","title":"StatsBase.eweights","text":"eweights(t::AbstractArray{<:Integer}, λ::Real; scale=false)\neweights(t::AbstractVector{T}, r::StepRange{T}, λ::Real; scale=false) where T\neweights(n::Integer, λ::Real; scale=false)\n\nConstruct a Weights vector which assigns exponentially decreasing weights to past observations (larger integer values i in t). The integer value n represents the number of past observations to consider. n defaults to maximum(t) - minimum(t) + 1 if only t is passed in and the elements are integers, and to length(r) if a superset range r is also passed in. If n is explicitly passed instead of t, t defaults to 1:n.\n\nIf scale is true then for each element i in t the weight value is computed as:\n\n(1 - λ)^(n - i)\n\nIf scale is false then each value is computed as:\n\nλ (1 - λ)^(1 - i)\n\nArguments\n\nt::AbstractVector: temporal indices or timestamps\nr::StepRange: a larger range to use when constructing weights from a subset of timestamps\nn::Integer: the number of past events to consider\nλ::Real: a smoothing factor or rate parameter such that 0 < λ ≤ 1.
As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.\n\nKeyword arguments\n\nscale::Bool: Return the weights scaled to between 0 and 1 (default: false)\n\nExamples\n\njulia> eweights(1:10, 0.3; scale=true)\n10-element Weights{Float64,Float64,Array{Float64,1}}:\n 0.04035360699999998\n 0.05764800999999997\n 0.08235429999999996\n 0.11764899999999996\n 0.16806999999999994\n 0.24009999999999995\n 0.3429999999999999\n 0.48999999999999994\n 0.7\n 1.0\n\nLinks\n\nhttps://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average\nhttps://en.wikipedia.org/wiki/Exponential_smoothing\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsBase.uweights","page":"Weight Vectors","title":"StatsBase.uweights","text":"uweights(s::Integer)\nuweights(::Type{T}, s::Integer) where T<:Real\n\nConstruct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.\n\nExamples\n\njulia> uweights(3)\n3-element UnitWeights{Int64}:\n 1\n 1\n 1\n\njulia> uweights(Float64, 3)\n3-element UnitWeights{Float64}:\n 1.0\n 1.0\n 1.0\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsAPI.weights-Tuple{AbstractArray{<:Real}}","page":"Weight Vectors","title":"StatsAPI.weights","text":"weights(vs::AbstractArray{<:Real})\n\nConstruct a Weights vector from array vs.
See the documentation for Weights for more details.\n\n\n\n\n\n","category":"method"},{"location":"empirical/#Empirical-Estimation","page":"Empirical Estimation","title":"Empirical Estimation","text":"","category":"section"},{"location":"empirical/#Histograms","page":"Empirical Estimation","title":"Histograms","text":"","category":"section"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"Histogram","category":"page"},{"location":"empirical/#StatsBase.Histogram","page":"Empirical Estimation","title":"StatsBase.Histogram","text":"Histogram <: AbstractHistogram\n\nThe Histogram type represents data that has been tabulated into intervals (known as bins) along the real line, or in higher dimensions, over a real space. Histograms can be fitted to data using the fit method.\n\nFields\n\nedges: An iterator that contains the boundaries of the bins in each dimension.\nweights: An array that contains the weight of each bin.\nclosed: A symbol with value :right or :left indicating on which side bins (half-open intervals or higher-dimensional analogues thereof) are closed. See below for an example.\nisdensity: There are two interpretations of a Histogram. If isdensity=false the weight of a bin corresponds to the amount of a quantity in the bin. If isdensity=true then it corresponds to the density (amount / volume) of the quantity in the bin. 
See below for an example.\n\nExamples\n\nExample illustrating closed\n\njulia> using StatsBase\n\njulia> fit(Histogram, [2.], 1:3, closed=:left)\nHistogram{Int64, 1, Tuple{UnitRange{Int64}}}\nedges:\n 1:3\nweights: [0, 1]\nclosed: left\nisdensity: false\n\njulia> fit(Histogram, [2.], 1:3, closed=:right)\nHistogram{Int64, 1, Tuple{UnitRange{Int64}}}\nedges:\n 1:3\nweights: [1, 0]\nclosed: right\nisdensity: false\n\nExample illustrating isdensity\n\njulia> using StatsBase, LinearAlgebra\n\njulia> bins = [0,1,7]; # a small and a large bin\n\njulia> obs = [0.5, 1.5, 1.5, 2.5]; # one observation in the small bin and three in the large\n\njulia> h = fit(Histogram, obs, bins)\nHistogram{Int64,1,Tuple{Array{Int64,1}}}\nedges:\n [0, 1, 7]\nweights: [1, 3]\nclosed: left\nisdensity: false\n\njulia> # observe isdensity = false and the weights field records the number of observations in each bin\n\njulia> normalize(h, mode=:density)\nHistogram{Float64,1,Tuple{Array{Int64,1}}}\nedges:\n [0, 1, 7]\nweights: [1.0, 0.5]\nclosed: left\nisdensity: true\n\njulia> # observe isdensity = true and weights tells us the number of observation per binsize in each bin\n\n\n\n\n\n","category":"type"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"Histograms can be fitted to data using the fit method.","category":"page"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"fit(::Type{Histogram}, args...; kwargs...)","category":"page"},{"location":"empirical/#StatsAPI.fit-Tuple{Type{Histogram}, Vararg{Any}}","page":"Empirical Estimation","title":"StatsAPI.fit","text":"fit(Histogram, data[, weight][, edges]; closed=:left[, nbins])\n\nFit a histogram to data.\n\nArguments\n\ndata: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).\nweight: an optional AbstractWeights (of the same length as the data vectors), denoting the weight each 
observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.\nedges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, they are chosen so that approximately nbins bins of equal width are constructed along each dimension.\n\nnote: Note\nIn most cases, the number of bins will be nbins. However, to ensure that the bins have equal width, more or fewer than nbins bins may be used.\n\nKeyword arguments\n\nclosed: if :left (the default), the bin intervals are left-closed [a,b); if :right, intervals are right-closed (a,b].\nnbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers). If omitted, it is computed using Sturges's formula, i.e. ceil(log2(length(n))) + 1 with n the number of data points.\n\nExamples\n\n# Univariate\nh = fit(Histogram, rand(100))\nh = fit(Histogram, rand(100), 0:0.1:1.0)\nh = fit(Histogram, rand(100), nbins=10)\nh = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0)\nh = fit(Histogram, [20], 0:20:100)\nh = fit(Histogram, [20], 0:20:100, closed=:right)\n\n# Multivariate\nh = fit(Histogram, (rand(100),rand(100)))\nh = fit(Histogram, (rand(100),rand(100)),nbins=10)\n\n\n\n\n\n","category":"method"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"Additional methods","category":"page"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"merge!\nmerge\nnorm\nnormalize\nnormalize!\nzero","category":"page"},{"location":"empirical/#Base.merge!","page":"Empirical Estimation","title":"Base.merge!","text":"merge!(target::Histogram, others::Histogram...)\n\nUpdate histogram target by merging it with the histograms others. See merge(histogram::Histogram, others::Histogram...) 
for details.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#Base.merge","page":"Empirical Estimation","title":"Base.merge","text":"merge(h::Histogram, others::Histogram...)\n\nConstruct a new histogram by merging h with others. All histograms must have the same binning, shape of weights and properties (closed and isdensity). The weights of all histograms are summed up for each bin, the weights of the resulting histogram will have the same type as those of h.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#LinearAlgebra.norm","page":"Empirical Estimation","title":"LinearAlgebra.norm","text":"norm(h::Histogram)\n\nCalculate the norm of histogram h as the absolute value of its integral.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#LinearAlgebra.normalize","page":"Empirical Estimation","title":"LinearAlgebra.normalize","text":"normalize(h::Histogram{T,N}; mode::Symbol=:pdf) where {T,N}\n\nNormalize the histogram h.\n\nValid values for mode are:\n\n:pdf: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.\n:density: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1. Will not modify the histogram if it already represents a density (h.isdensity == 1).\n:probability: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.\n:none: Leaves histogram unchanged. Useful to simplify code that has to conditionally apply different modes of normalization.\n\nSuccessive application of both :probability and :density normalization (in any order) is equivalent to :pdf normalization.\n\n\n\n\n\nnormalize(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T,N}\n\nNormalize the histogram h and rescales one or more auxiliary weight arrays at the same time (aux_weights may, e.g., contain estimated statistical uncertainties). 
The values of the auxiliary arrays are scaled by the same factor as the corresponding histogram weight values. Returns a tuple of the normalized histogram and scaled auxiliary weights.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#LinearAlgebra.normalize!","page":"Empirical Estimation","title":"LinearAlgebra.normalize!","text":"normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}\n\nNormalize the histogram h and optionally scale one or more auxiliary weight arrays appropriately. See description of normalize for details. Returns h.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#Base.zero","page":"Empirical Estimation","title":"Base.zero","text":"zero(h::Histogram)\n\nCreate a new histogram with the same binning, type and shape of weights and the same properties (closed and isdensity) as h, with all weights set to zero.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#Empirical-Cumulative-Distribution-Function","page":"Empirical Estimation","title":"Empirical Cumulative Distribution Function","text":"","category":"section"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"ecdf","category":"page"},{"location":"empirical/#StatsBase.ecdf","page":"Empirical Estimation","title":"StatsBase.ecdf","text":"ecdf(X; weights::AbstractWeights)\n\nReturn an empirical cumulative distribution function (ECDF) based on a vector of samples given in X. 
Optionally providing weights returns a weighted ECDF.\n\nNote: this function returns a callable composite type, which can then be applied to evaluate CDF values on other samples.\n\nextrema, minimum, and maximum are supported for obtaining the range over which the function is inside the interval (0,1); the function is defined for the whole real line.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#Data-Transformations","page":"Data Transformations","title":"Data Transformations","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"In general, data transformations change raw feature vectors into a representation that is more suitable for various estimators.","category":"page"},{"location":"transformations/#Standardization-a.k.a-Z-score-Normalization","page":"Data Transformations","title":"Standardization a.k.a Z-score Normalization","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Standardization, also known as Z-score normalization, is a common requirement for many machine learning techniques.
These techniques might perform poorly if the individual features do not more or less look like standard normally distributed data.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Standardization transforms data points into corresponding standard scores by subtracting mean and scaling to unit variance.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"The standard score, also known as Z-score, is the signed number of standard deviations by which the value of an observation or data point is above the mean value of what is being observed or measured.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Standardization can be performed using t = fit(ZScoreTransform, ...) followed by StatsBase.transform(t, ...) or StatsBase.transform!(t, ...). standardize(ZScoreTransform, ...) is a shorthand to perform both operations in a single call.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"fit(::Type{ZScoreTransform}, X::AbstractArray{<:Real,2}; center::Bool=true, scale::Bool=true)","category":"page"},{"location":"transformations/#StatsAPI.fit-Tuple{Type{ZScoreTransform}, AbstractMatrix{<:Real}}","page":"Data Transformations","title":"StatsAPI.fit","text":"fit(ZScoreTransform, X; dims=nothing, center=true, scale=true)\n\nFit standardization parameters to vector or matrix X and return a ZScoreTransform transformation object.\n\nKeyword arguments\n\ndims: if 1 fit standardization parameters in column-wise fashion; if 2 fit in row-wise fashion. 
The default is nothing, which is equivalent to dims=2 with a deprecation warning.\ncenter: if true (the default) center data so that its mean is zero.\nscale: if true (the default) scale the data so that its variance is equal to one.\n\nExamples\n\njulia> using StatsBase\n\njulia> X = [0.0 -0.5 0.5; 0.0 1.0 2.0]\n2×3 Matrix{Float64}:\n 0.0 -0.5 0.5\n 0.0 1.0 2.0\n\njulia> dt = fit(ZScoreTransform, X, dims=2)\nZScoreTransform{Float64, Vector{Float64}}(2, 2, [0.0, 1.0], [0.5, 1.0])\n\njulia> StatsBase.transform(dt, X)\n2×3 Matrix{Float64}:\n 0.0 -1.0 1.0\n -1.0 0.0 1.0\n\n\n\n\n\n","category":"method"},{"location":"transformations/#Unit-Range-Normalization","page":"Data Transformations","title":"Unit Range Normalization","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Unit range normalization, also known as min-max scaling, is an alternative data transformation which scales features to lie in the interval [0; 1].","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Unit range normalization can be performed using t = fit(UnitRangeTransform, ...) followed by StatsBase.transform(t, ...) or StatsBase.transform!(t, ...). standardize(UnitRangeTransform, ...) 
is a shorthand to perform both operations in a single call.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"fit(::Type{UnitRangeTransform}, X::AbstractArray{<:Real,2}; unit::Bool=true)","category":"page"},{"location":"transformations/#StatsAPI.fit-Tuple{Type{UnitRangeTransform}, AbstractMatrix{<:Real}}","page":"Data Transformations","title":"StatsAPI.fit","text":"fit(UnitRangeTransform, X; dims=nothing, unit=true)\n\nFit scaling parameters to vector or matrix X and return a UnitRangeTransform transformation object.\n\nKeyword arguments\n\ndims: if 1 fit standardization parameters in column-wise fashion;\n\nif 2 fit in row-wise fashion. The default is nothing.\n\nunit: if true (the default) shift the minimum data to zero.\n\nExamples\n\njulia> using StatsBase\n\njulia> X = [0.0 -0.5 0.5; 0.0 1.0 2.0]\n2×3 Matrix{Float64}:\n 0.0 -0.5 0.5\n 0.0 1.0 2.0\n\njulia> dt = fit(UnitRangeTransform, X, dims=2)\nUnitRangeTransform{Float64, Vector{Float64}}(2, 2, true, [-0.5, 0.0], [1.0, 0.5])\n\njulia> StatsBase.transform(dt, X)\n2×3 Matrix{Float64}:\n 0.5 0.0 1.0\n 0.0 0.5 1.0\n\n\n\n\n\n","category":"method"},{"location":"transformations/#Methods","page":"Data Transformations","title":"Methods","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"StatsBase.transform\nStatsBase.transform!\nStatsBase.reconstruct\nStatsBase.reconstruct!\nstandardize","category":"page"},{"location":"transformations/#StatsBase.transform","page":"Data Transformations","title":"StatsBase.transform","text":"transform(t::AbstractDataTransform, x)\n\nReturn a standardized copy of vector or matrix x using transformation t.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#StatsBase.transform!","page":"Data Transformations","title":"StatsBase.transform!","text":"transform!(t::AbstractDataTransform, x)\n\nApply transformation t to vector 
or matrix x in place.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#StatsBase.reconstruct","page":"Data Transformations","title":"StatsBase.reconstruct","text":"reconstruct(t::AbstractDataTransform, y)\n\nReturn a reconstruction of an originally scaled data from a transformed vector or matrix y using transformation t.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#StatsBase.reconstruct!","page":"Data Transformations","title":"StatsBase.reconstruct!","text":"reconstruct!(t::AbstractDataTransform, y)\n\nPerform an in-place reconstruction into an original data scale from a transformed vector or matrix y using transformation t.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#StatsBase.standardize","page":"Data Transformations","title":"StatsBase.standardize","text":"standardize(DT, X; dims=nothing, kwargs...)\n\nReturn a standardized copy of vector or matrix X along dimensions dims using transformation DT which is a subtype of AbstractDataTransform:\n\nZScoreTransform\nUnitRangeTransform\n\nExample\n\njulia> using StatsBase\n\njulia> standardize(ZScoreTransform, [0.0 -0.5 0.5; 0.0 1.0 2.0], dims=2)\n2×3 Matrix{Float64}:\n 0.0 -1.0 1.0\n -1.0 0.0 1.0\n\njulia> standardize(UnitRangeTransform, [0.0 -0.5 0.5; 0.0 1.0 2.0], dims=2)\n2×3 Matrix{Float64}:\n 0.5 0.0 1.0\n 0.0 0.5 1.0\n\n\n\n\n\n","category":"function"},{"location":"transformations/#Types","page":"Data Transformations","title":"Types","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"UnitRangeTransform\nZScoreTransform","category":"page"},{"location":"transformations/#StatsBase.UnitRangeTransform","page":"Data Transformations","title":"StatsBase.UnitRangeTransform","text":"Unit range normalization\n\n\n\n\n\n","category":"type"},{"location":"transformations/#StatsBase.ZScoreTransform","page":"Data Transformations","title":"StatsBase.ZScoreTransform","text":"Standardization 
(Z-score transformation)\n\n\n\n\n\n","category":"type"},{"location":"signalcorr/#Correlation-Analysis-of-Signals","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"","category":"section"},{"location":"signalcorr/","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"The package provides functions to perform correlation analysis of sequential signals.","category":"page"},{"location":"signalcorr/#Autocovariance-and-Autocorrelation","page":"Correlation Analysis of Signals","title":"Autocovariance and Autocorrelation","text":"","category":"section"},{"location":"signalcorr/","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"autocov\nautocov!\nautocor\nautocor!","category":"page"},{"location":"signalcorr/#StatsBase.autocov","page":"Correlation Analysis of Signals","title":"StatsBase.autocov","text":"autocov(x, [lags]; demean=true)\n\nCompute the autocovariance of a vector or matrix x, optionally specifying the lags at which to compute the autocovariance. demean denotes whether the mean of x should be subtracted from x before computing the autocovariance.\n\nIf x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.\n\nWhen left unspecified, the lags used are the integers from 0 to min(size(x,1)-1, 10*log10(size(x,1))).\n\nThe output is not normalized. See autocor for a function with normalization.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.autocov!","page":"Correlation Analysis of Signals","title":"StatsBase.autocov!","text":"autocov!(r, x, lags; demean=true)\n\nCompute the autocovariance of a vector or matrix x at lags and store the result in r. 
demean denotes whether the mean of x should be subtracted from x before computing the autocovariance.\n\nIf x is a vector, r must be a vector of the same length as lags. If x is a matrix, r must be a matrix of size (length(lags), size(x,2)), and where each column in the result will correspond to a column in x.\n\nThe output is not normalized. See autocor! for a method with normalization.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.autocor","page":"Correlation Analysis of Signals","title":"StatsBase.autocor","text":"autocor(x, [lags]; demean=true)\n\nCompute the autocorrelation function (ACF) of a vector or matrix x, optionally specifying the lags. demean denotes whether the mean of x should be subtracted from x before computing the ACF.\n\nIf x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.\n\nWhen left unspecified, the lags used are the integers from 0 to min(size(x,1)-1, 10*log10(size(x,1))).\n\nThe output is normalized by the variance of x, i.e. so that the lag 0 autocorrelation is 1. See autocov for the unnormalized form.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.autocor!","page":"Correlation Analysis of Signals","title":"StatsBase.autocor!","text":"autocor!(r, x, lags; demean=true)\n\nCompute the autocorrelation function (ACF) of a vector or matrix x at lags and store the result in r. demean denotes whether the mean of x should be subtracted from x before computing the ACF.\n\nIf x is a vector, r must be a vector of the same length as lags. If x is a matrix, r must be a matrix of size (length(lags), size(x,2)), and where each column in the result will correspond to a column in x.\n\nThe output is normalized by the variance of x, i.e. so that the lag 0 autocorrelation is 1. See autocov! 
for the unnormalized form.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#Cross-covariance-and-Cross-correlation","page":"Correlation Analysis of Signals","title":"Cross-covariance and Cross-correlation","text":"","category":"section"},{"location":"signalcorr/","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"crosscov\ncrosscov!\ncrosscor\ncrosscor!","category":"page"},{"location":"signalcorr/#StatsBase.crosscov","page":"Correlation Analysis of Signals","title":"StatsBase.crosscov","text":"crosscov(x, y, [lags]; demean=true)\n\nCompute the cross covariance function (CCF) between real-valued vectors or matrices x and y, optionally specifying the lags. demean specifies whether the respective means of x and y should be subtracted from them before computing their CCF.\n\nIf both x and y are vectors, return a vector of the same length as lags. Otherwise, compute cross covariances between each pair of columns in x and y.\n\nWhen left unspecified, the lags used are the integers from -min(size(x,1)-1, 10*log10(size(x,1))) to min(size(x,1), 10*log10(size(x,1))).\n\nThe output is not normalized. See crosscor for a function with normalization.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.crosscov!","page":"Correlation Analysis of Signals","title":"StatsBase.crosscov!","text":"crosscov!(r, x, y, lags; demean=true)\n\nCompute the cross covariance function (CCF) between real-valued vectors or matrices x and y at lags and store the result in r. demean specifies whether the respective means of x and y should be subtracted from them before computing their CCF.\n\nIf both x and y are vectors, r must be a vector of the same length as lags. If x is a matrix and y is a vector, r must be a matrix of size (length(lags), size(x, 2)); if x is a vector and y is a matrix, r must be a matrix of size (length(lags), size(y, 2)). 
If both x and y are matrices, r must be a three-dimensional array of size (length(lags), size(x, 2), size(y, 2)).\n\nThe output is not normalized. See crosscor! for a function with normalization.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.crosscor","page":"Correlation Analysis of Signals","title":"StatsBase.crosscor","text":"crosscor(x, y, [lags]; demean=true)\n\nCompute the cross correlation between real-valued vectors or matrices x and y, optionally specifying the lags. demean specifies whether the respective means of x and y should be subtracted from them before computing their cross correlation.\n\nIf both x and y are vectors, return a vector of the same length as lags. Otherwise, compute cross correlations between each pair of columns in x and y.\n\nWhen left unspecified, the lags used are the integers from -min(size(x,1)-1, 10*log10(size(x,1))) to min(size(x,1), 10*log10(size(x,1))).\n\nThe output is normalized by sqrt(var(x)*var(y)). See crosscov for the unnormalized form.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.crosscor!","page":"Correlation Analysis of Signals","title":"StatsBase.crosscor!","text":"crosscor!(r, x, y, lags; demean=true)\n\nCompute the cross correlation between real-valued vectors or matrices x and y at lags and store the result in r. demean specifies whether the respective means of x and y should be subtracted from them before computing their cross correlation.\n\nIf both x and y are vectors, r must be a vector of the same length as lags. If x is a matrix and y is a vector, r must be a matrix of size (length(lags), size(x, 2)); if x is a vector and y is a matrix, r must be a matrix of size (length(lags), size(y, 2)). If both x and y are matrices, r must be a three-dimensional array of size (length(lags), size(x, 2), size(y, 2)).\n\nThe output is normalized by sqrt(var(x)*var(y)). See crosscov! 
for the unnormalized form.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#Partial-Autocorrelation-Function","page":"Correlation Analysis of Signals","title":"Partial Autocorrelation Function","text":"","category":"section"},{"location":"signalcorr/","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"pacf\npacf!","category":"page"},{"location":"signalcorr/#StatsBase.pacf","page":"Correlation Analysis of Signals","title":"StatsBase.pacf","text":"pacf(X, lags; method=:regression)\n\nCompute the partial autocorrelation function (PACF) of a real-valued vector or matrix X at lags. method designates the estimation method. Recognized values are :regression, which computes the partial autocorrelations via successive regression models, and :yulewalker, which computes the partial autocorrelations using the Yule-Walker equations.\n\nIf x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x, 2)), where each column in the result corresponds to a column in x.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.pacf!","page":"Correlation Analysis of Signals","title":"StatsBase.pacf!","text":"pacf!(r, X, lags; method=:regression)\n\nCompute the partial autocorrelation function (PACF) of a matrix X at lags and store the result in r. method designates the estimation method. 
Recognized values are :regression, which computes the partial autocorrelations via successive regression models, and :yulewalker, which computes the partial autocorrelations using the Yule-Walker equations.\n\nr must be a matrix of size (length(lags), size(x, 2)).\n\n\n\n\n\n","category":"function"},{"location":"counts/#Counting-Functions","page":"Counting Functions","title":"Counting Functions","text":"","category":"section"},{"location":"counts/","page":"Counting Functions","title":"Counting Functions","text":"The package provides functions to count the occurrences of distinct values.","category":"page"},{"location":"counts/#Counting-over-an-Integer-Range","page":"Counting Functions","title":"Counting over an Integer Range","text":"","category":"section"},{"location":"counts/","page":"Counting Functions","title":"Counting Functions","text":"counts\nproportions\naddcounts!(r::AbstractArray, x::AbstractArray{<:Integer}, levels::UnitRange{<:Integer})","category":"page"},{"location":"counts/#StatsBase.counts","page":"Counting Functions","title":"StatsBase.counts","text":"counts(x, [wv::AbstractWeights])\ncounts(x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])\ncounts(x, k::Integer, [wv::AbstractWeights])\n\nCount the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.\n\nIf a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.\n\nThe output is a vector of length length(levels).\n\n\n\n\n\n","category":"function"},{"location":"counts/#StatsBase.proportions","page":"Counting Functions","title":"StatsBase.proportions","text":"proportions(x, levels=span(x), [wv::AbstractWeights])\n\nReturn the proportion of values in the range levels that occur in x. 
Equivalent to counts(x, levels) / length(x).\n\nIf a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.\n\n\n\n\n\nproportions(x, k::Integer, [wv::AbstractWeights])\n\nReturn the proportion of integers in 1 to k that occur in x.\n\nIf a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.\n\n\n\n\n\n","category":"function"},{"location":"counts/#StatsBase.addcounts!-Tuple{AbstractArray, AbstractArray{<:Integer}, UnitRange{<:Integer}}","page":"Counting Functions","title":"StatsBase.addcounts!","text":"addcounts!(r, x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])\n\nAdd the number of occurrences in x of each value in levels to an existing array r. For each xi ∈ x, if xi == levels[j], then we increment r[j].\n\nIf a weighting vector wv is specified, the sum of weights is used rather than the raw counts.\n\n\n\n\n\n","category":"method"},{"location":"counts/#Counting-over-arbitrary-distinct-values","page":"Counting Functions","title":"Counting over arbitrary distinct values","text":"","category":"section"},{"location":"counts/","page":"Counting Functions","title":"Counting Functions","text":"countmap\nproportionmap\naddcounts!(cm::Dict, x::Any)","category":"page"},{"location":"counts/#StatsBase.countmap","page":"Counting Functions","title":"StatsBase.countmap","text":"countmap(x; alg = :auto)\ncountmap(x::AbstractVector, wv::AbstractVector{<:Real})\n\nReturn a dictionary mapping each unique value in x to its number of occurrences.\n\nIf a weighting vector wv is specified, the sum of weights is used rather than the raw counts.\n\nalg is only allowed for unweighted counting and can be one of:\n\n:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.\n:radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector which will generally lead to shorter 
running time for large x with many duplicates. However the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.\n:dict: use Dict-based method which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.\n\n\n\n\n\n","category":"function"},{"location":"counts/#StatsBase.proportionmap","page":"Counting Functions","title":"StatsBase.proportionmap","text":"proportionmap(x)\nproportionmap(x::AbstractVector, w::AbstractVector{<:Real})\n\nReturn a dictionary mapping each unique value in x to its proportion in x.\n\nIf a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.\n\n\n\n\n\n","category":"function"},{"location":"counts/#StatsBase.addcounts!-Tuple{Dict, Any}","page":"Counting Functions","title":"StatsBase.addcounts!","text":"addcounts!(dict, x; alg = :auto)\naddcounts!(dict, x, wv)\n\nAdd counts based on x to a count map. New entries will be added if new values come up.\n\nIf a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.\n\nalg is only allowed for unweighted counting and can be one of:\n\n:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.\n:radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector which will generally lead to shorter running time for large x with many duplicates. However the radix sort algorithm creates a copy of the input vector and hence uses more RAM. 
Choose :dict if the amount of available RAM is a limitation.\n:dict: use Dict-based method which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.\n\n\n\n\n\n","category":"method"},{"location":"scalarstats/#Scalar-Statistics","page":"Scalar Statistics","title":"Scalar Statistics","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"The package implements functions for computing various statistics over an array of scalar real numbers.","category":"page"},{"location":"scalarstats/#Weighted-sum-and-mean","page":"Scalar Statistics","title":"Weighted sum and mean","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"sum\nsum!\nwsum\nwsum!\nmean\nmean!","category":"page"},{"location":"scalarstats/#Base.sum","page":"Scalar Statistics","title":"Base.sum","text":"sum(v::AbstractArray, w::AbstractWeights{<:Real}; [dims])\n\nCompute the weighted sum of an array v with weights w, optionally over the dimension dims.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Base.sum!","page":"Scalar Statistics","title":"Base.sum!","text":"sum!(R::AbstractArray, A::AbstractArray,\n w::AbstractWeights{<:Real}, dim::Int;\n init::Bool=true)\n\nCompute the weighted sum of A with weights w over the dimension dim and store the result in R. 
If init=false, the sum is added to R rather than starting from zero.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.wsum","page":"Scalar Statistics","title":"StatsBase.wsum","text":"wsum(v, w::AbstractVector, [dim])\n\nCompute the weighted sum of an array v with weights w, optionally over the dimension dim.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.wsum!","page":"Scalar Statistics","title":"StatsBase.wsum!","text":"wsum!(R::AbstractArray, A::AbstractArray,\n w::AbstractVector, dim::Int;\n init::Bool=true)\n\nCompute the weighted sum of A with weights w over the dimension dim and store the result in R. If init=false, the sum is added to R rather than starting from zero.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.mean","page":"Scalar Statistics","title":"Statistics.mean","text":"mean(A::AbstractArray, w::AbstractWeights[, dims::Int])\n\nCompute the weighted mean of array A with weight vector w (of type AbstractWeights). 
If dims is provided, compute the weighted mean along dimension dims.\n\nExamples\n\nn = 20\nx = rand(n)\nw = rand(n)\nmean(x, weights(w))\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.mean!","page":"Scalar Statistics","title":"Statistics.mean!","text":"mean!(R::AbstractArray, A::AbstractArray, w::AbstractWeights[; dims=nothing])\n\nCompute the weighted mean of array A with weight vector w (of type AbstractWeights) along dimension dims, and write results to R.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Means","page":"Scalar Statistics","title":"Means","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"The package provides functions to compute means of different kinds.","category":"page"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"geomean\nharmmean\ngenmean","category":"page"},{"location":"scalarstats/#StatsBase.geomean","page":"Scalar Statistics","title":"StatsBase.geomean","text":"geomean(a)\n\nReturn the geometric mean of a collection.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.harmmean","page":"Scalar Statistics","title":"StatsBase.harmmean","text":"harmmean(a)\n\nReturn the harmonic mean of a collection.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.genmean","page":"Scalar Statistics","title":"StatsBase.genmean","text":"genmean(a, p)\n\nReturn the generalized/power mean with exponent p of a real-valued array, i.e. left( frac1n sum_i=1^n a_i^p right)^frac1p, where n = length(a). 
It is taken to be the geometric mean when p == 0.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Moments-and-cumulants","page":"Scalar Statistics","title":"Moments and cumulants","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"var\nstd\nmean_and_var\nmean_and_std\nskewness\nkurtosis\nmoment\ncumulant","category":"page"},{"location":"scalarstats/#Statistics.var","page":"Scalar Statistics","title":"Statistics.var","text":"var(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)\n\nCompute the variance of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample variance is defined as:\n\nfrac1sumw sum_i=1^n w_ileft(x_i - μright)^2 \n\nwhere n is the length of the input and μ is the mean. The unbiased estimate (when corrected=true) of the population variance is computed by replacing frac1sumw with a factor dependent on the type of weights used:\n\nAnalyticWeights: frac1sum w - sum w^2 sum w\nFrequencyWeights: frac1sumw - 1\nProbabilityWeights: fracn(n - 1) sum w where n equals count(!iszero, w)\nWeights: ArgumentError (bias correction not supported)\n\n\n\n\n\nvar(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute the variance of the vector x using the estimator ce.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.std","page":"Scalar Statistics","title":"Statistics.std","text":"std(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)\n\nCompute the standard deviation of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample standard deviation is defined as:\n\nsqrtfrac1sumw sum_i=1^n w_ileft(x_i - μright)^2 \n\nwhere n is the length of the input and μ is the mean. 
The unbiased estimate (when corrected=true) of the population standard deviation is computed by replacing frac1sumw with a factor dependent on the type of weights used:\n\nAnalyticWeights: frac1sum w - sum w^2 sum w\nFrequencyWeights: frac1sumw - 1\nProbabilityWeights: fracn(n - 1) sum w where n equals count(!iszero, w)\nWeights: ArgumentError (bias correction not supported)\n\n\n\n\n\nstd(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute the standard deviation of the vector x using the estimator ce.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.mean_and_var","page":"Scalar Statistics","title":"StatsBase.mean_and_var","text":"mean_and_var(x, [w::AbstractWeights], [dim]; corrected=true) -> (mean, var)\n\nReturn the mean and variance of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the variance calculation if corrected=true. See var documentation for more details.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.mean_and_std","page":"Scalar Statistics","title":"StatsBase.mean_and_std","text":"mean_and_std(x, [w::AbstractWeights], [dim]; corrected=true) -> (mean, std)\n\nReturn the mean and standard deviation of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the standard deviation calculation if corrected=true. 
See std documentation for more details.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.skewness","page":"Scalar Statistics","title":"StatsBase.skewness","text":"skewness(v, [wv::AbstractWeights], m=mean(v))\n\nCompute the standardized skewness of a real-valued array v, optionally specifying a weighting vector wv and a center m.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.kurtosis","page":"Scalar Statistics","title":"StatsBase.kurtosis","text":"kurtosis(v, [wv::AbstractWeights], m=mean(v))\n\nCompute the excess kurtosis of a real-valued array v, optionally specifying a weighting vector wv and a center m.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.moment","page":"Scalar Statistics","title":"StatsBase.moment","text":"moment(v, k, [wv::AbstractWeights], m=mean(v))\n\nReturn the kth order central moment of a real-valued array v, optionally specifying a weighting vector wv and a center m.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.cumulant","page":"Scalar Statistics","title":"StatsBase.cumulant","text":"cumulant(v, k, [wv::AbstractWeights], m=mean(v))\n\nReturn the kth order cumulant of a real-valued array v, optionally specifying a weighting vector wv and a pre-computed mean m.\n\nIf k is a range of Integers, then return all the cumulants of orders in this range as a vector.\n\nThis quantity is calculated using a recursive definition on lower-order cumulants and central moments.\n\nReference: Smith, P. J. 1995. A Recursive Formulation of the Old Problem of Obtaining Moments from Cumulants and Vice Versa. The American Statistician, 49(2), 217–218. 
https://doi.org/10.2307/2684642\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Measurements-of-Variation","page":"Scalar Statistics","title":"Measurements of Variation","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"span\nvariation\nsem\nmad\nmad!","category":"page"},{"location":"scalarstats/#StatsBase.span","page":"Scalar Statistics","title":"StatsBase.span","text":"span(x)\n\nReturn the span of a collection, i.e. the range minimum(x):maximum(x). The minimum and maximum of x are computed in one pass using extrema.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.variation","page":"Scalar Statistics","title":"StatsBase.variation","text":"variation(x, m=mean(x); corrected=true)\n\nReturn the coefficient of variation of collection x, optionally specifying a precomputed mean m, and the optional correction parameter corrected. The coefficient of variation is the ratio of the standard deviation to the mean. If corrected is false, then std is calculated with denominator n. Else, the std is calculated with denominator n-1.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.sem","page":"Scalar Statistics","title":"StatsBase.sem","text":"sem(x; mean=nothing)\nsem(x::AbstractArray[, weights::AbstractWeights]; mean=nothing)\n\nReturn the standard error of the mean for a collection x. A pre-computed mean may be provided.\n\nWhen not using weights, this is the (sample) standard deviation divided by the square root of the sample size. If weights are used, the variance of the sample mean is calculated as follows:\n\nAnalyticWeights: Not implemented.\nFrequencyWeights: fracsum_i=1^n w_i (x_i - barx_i)^2(sum w_i) (sum w_i - 1)\nProbabilityWeights: fracnn-1 fracsum_i=1^n w_i^2 (x_i - barx_i)^2left( sum w_i right)^2\n\nThe standard error is then the square root of the above quantities.\n\nReferences\n\nCarl-Erik Särndal, Bengt Swensson, Jan Wretman (1992). 
Model Assisted Survey Sampling. New York: Springer. pp. 51-53.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.mad","page":"Scalar Statistics","title":"StatsBase.mad","text":"mad(x; center=median(x), normalize=true)\n\nCompute the median absolute deviation (MAD) of collection x around center (by default, around the median).\n\nIf normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.mad!","page":"Scalar Statistics","title":"StatsBase.mad!","text":"StatsBase.mad!(x; center=median!(x), normalize=true)\n\nCompute the median absolute deviation (MAD) of array x around center (by default, around the median), overwriting x in the process.\n\nIf normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Z-scores","page":"Scalar Statistics","title":"Z-scores","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"zscore\nzscore!","category":"page"},{"location":"scalarstats/#StatsBase.zscore","page":"Scalar Statistics","title":"StatsBase.zscore","text":"zscore(X, [μ, σ])\n\nCompute the z-scores of X, optionally specifying a precomputed mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. (x - μ) σ.\n\nμ and σ should be both scalars or both arrays. The computation is broadcasting. 
In particular, when μ and σ are arrays, they should have the same size, and size(μ, i) == 1 || size(μ, i) == size(X, i) for each dimension.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.zscore!","page":"Scalar Statistics","title":"StatsBase.zscore!","text":"zscore!([Z], X, μ, σ)\n\nCompute the z-scores of an array X with mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. (x - μ) / σ.\n\nIf a destination array Z is provided, the scores are stored in Z and it must have the same shape as X. Otherwise X is overwritten.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Entropy-and-Related-Functions","page":"Scalar Statistics","title":"Entropy and Related Functions","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"entropy\nrenyientropy\ncrossentropy\nkldivergence","category":"page"},{"location":"scalarstats/#StatsBase.entropy","page":"Scalar Statistics","title":"StatsBase.entropy","text":"entropy(p, [b])\n\nCompute the entropy of a collection of probabilities p, optionally specifying a real number b such that the entropy is scaled by 1/log(b). 
Elements with probability 0 or 1 add 0 to the entropy.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.renyientropy","page":"Scalar Statistics","title":"StatsBase.renyientropy","text":"renyientropy(p, α)\n\nCompute the Rényi (generalized) entropy of order α of an array p.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.crossentropy","page":"Scalar Statistics","title":"StatsBase.crossentropy","text":"crossentropy(p, q, [b])\n\nCompute the cross entropy between p and q, optionally specifying a real number b such that the result is scaled by 1/log(b).\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.kldivergence","page":"Scalar Statistics","title":"StatsBase.kldivergence","text":"kldivergence(p, q, [b])\n\nCompute the Kullback-Leibler divergence from q to p, also called the relative entropy of p with respect to q, that is the sum pᵢ * log(pᵢ / qᵢ). Optionally a real number b can be specified such that the divergence is scaled by 1/log(b).\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Quantile-and-Related-Functions","page":"Scalar Statistics","title":"Quantile and Related Functions","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"percentile\niqr\nnquantile\nquantile\nStatistics.median(v::AbstractVector{<:Real}, w::AbstractWeights{<:Real})\nquantilerank\npercentilerank","category":"page"},{"location":"scalarstats/#StatsBase.percentile","page":"Scalar Statistics","title":"StatsBase.percentile","text":"percentile(x, p)\n\nReturn the pth percentile of a collection x, i.e. quantile(x, p / 100).\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.iqr","page":"Scalar Statistics","title":"StatsBase.iqr","text":"iqr(x)\n\nCompute the interquartile range (IQR) of collection x, i.e. 
the 75th percentile minus the 25th percentile.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.nquantile","page":"Scalar Statistics","title":"StatsBase.nquantile","text":"nquantile(x, n::Integer)\n\nReturn the n-quantiles of collection x, i.e. the values which partition x into n subsets of nearly equal size.\n\nEquivalent to quantile(x, (0:n)/n). For example, nquantile(x, 5) returns a vector of quantiles, respectively at [0.0, 0.2, 0.4, 0.6, 0.8, 1.0].\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.quantile","page":"Scalar Statistics","title":"Statistics.quantile","text":"quantile(v, w::AbstractWeights, p)\n\nCompute the weighted quantiles of a vector v at a specified set of probability values p, using weights given by a weight vector w (of type AbstractWeights). Weights must not be negative. The weights and data vectors must have the same length. NaN is returned if v contains any NaN values. An error is raised if w contains any NaN values.\n\nWith FrequencyWeights, the function returns the same result as quantile for a vector with repeated values. Weights must be integers.\n\nWith non FrequencyWeights, denote N the length of the vector, w the vector of weights, h = p (\\sum_{i \\leq N} w_i - w_1) + w_1 the cumulative weight corresponding to the probability p and S_k = \\sum_{i \\leq k} w_i the cumulative weight for each observation, define v_{k+1} the smallest element of v such that S_{k+1} is strictly greater than h. The weighted p quantile is given by v_k + \\gamma (v_{k+1} - v_k) with \\gamma = (h - S_k)/(S_{k+1} - S_k). In particular, when all weights are equal, the function returns the same result as the unweighted quantile.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.median-Tuple{AbstractVector{<:Real}, AbstractWeights}","page":"Scalar Statistics","title":"Statistics.median","text":"median(v::AbstractVector{<:Real}, w::AbstractWeights)\n\nCompute the weighted median of v with weights w (of type AbstractWeights). 
See the documentation for quantile for more details.\n\n\n\n\n\n","category":"method"},{"location":"scalarstats/#StatsBase.quantilerank","page":"Scalar Statistics","title":"StatsBase.quantilerank","text":"quantilerank(itr, value; method=:inc)\n\nCompute the quantile position in the [0, 1] interval of value relative to collection itr.\n\nDifferent definitions can be chosen via the method keyword argument. Let count_less be the number of elements of itr that are less than value, count_equal the number of elements of itr that are equal to value, n the length of itr, greatest_smaller the highest value below value and smallest_greater the lowest value above value. Then method supports the following definitions:\n\n:inc (default): Return a value in the range 0 to 1 inclusive.\n\nReturn count_less / (n - 1) if value ∈ itr, otherwise apply interpolation based on definition 7 of quantile in Hyndman and Fan (1996) (equivalent to Excel PERCENTRANK and PERCENTRANK.INC). This definition corresponds to the lower semi-continuous inverse of quantile with its default parameters.\n\n:exc: Return a value in the range 0 to 1 exclusive.\n\nReturn (count_less + 1) / (n + 1) if value ∈ itr otherwise apply interpolation based on definition 6 of quantile in Hyndman and Fan (1996) (equivalent to Excel PERCENTRANK.EXC).\n\n:compete: Return count_less / (n - 1) if value ∈ itr, otherwise\n\nreturn (count_less - 1) / (n - 1), without interpolation (equivalent to MariaDB PERCENT_RANK, dplyr percent_rank).\n\n:tied: Return (count_less + count_equal/2) / n, without interpolation.\n\nBased on the definition in Roscoe, J. T. 
(1975) (equivalent to \"mean\" kind of SciPy percentileofscore).\n\n:strict: Return count_less / n, without interpolation\n\n(equivalent to \"strict\" kind of SciPy percentileofscore).\n\n:weak: Return (count_less + count_equal) / n, without interpolation\n\n(equivalent to \"weak\" kind of SciPy percentileofscore).\n\nnote: Note\nAn ArgumentError is thrown if itr contains NaN or missing values or if itr contains fewer than two elements.\n\nReferences\n\nRoscoe, J. T. (1975). Fundamental Research Statistics for the Behavioral Sciences, 2nd ed., New York: Holt, Rinehart and Winston.\n\nHyndman, R. J. and Fan, Y. (1996) \"Sample Quantiles in Statistical Packages\", The American Statistician, Vol. 50, No. 4, pp. 361-365.\n\nExamples\n\njulia> using StatsBase\n\njulia> v1 = [1, 1, 1, 2, 3, 4, 8, 11, 12, 13];\n\njulia> v2 = [1, 2, 3, 5, 6, missing, 8];\n\njulia> v3 = [1, 2, 3, 4, 4, 5, 6, 7, 8, 9];\n\njulia> quantilerank(v1, 2)\n0.3333333333333333\n\njulia> quantilerank(v1, 2, method=:exc), quantilerank(v1, 2, method=:tied)\n(0.36363636363636365, 0.35)\n\n# use `skipmissing` for vectors with missing entries.\njulia> quantilerank(skipmissing(v2), 4)\n0.5\n\n# use broadcasting with `Ref` to compute quantile rank for multiple values\njulia> quantilerank.(Ref(v3), [4, 8])\n2-element Vector{Float64}:\n 0.3333333333333333\n 0.8888888888888888\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.percentilerank","page":"Scalar Statistics","title":"StatsBase.percentilerank","text":"percentilerank(itr, value; method=:inc)\n\nReturn the percentile rank of value in collection itr, i.e. 
quantilerank(itr, value) * 100.\n\nSee the quantilerank docstring for more details.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Mode-and-Modes","page":"Scalar Statistics","title":"Mode and Modes","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"mode\nmodes","category":"page"},{"location":"scalarstats/#StatsBase.mode","page":"Scalar Statistics","title":"StatsBase.mode","text":"mode(a, [r])\nmode(a::AbstractArray, wv::AbstractWeights)\n\nReturn the mode (most common number) of an array, optionally over a specified range r or weighted via a vector wv. If several modes exist, the first one (in order of appearance) is returned.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.modes","page":"Scalar Statistics","title":"StatsBase.modes","text":"modes(a, [r])::Vector\nmodes(a::AbstractArray, wv::AbstractWeights)::Vector\n\nReturn all modes (most common numbers) of an array, optionally over a specified range r or weighted via vector wv.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Summary-Statistics","page":"Scalar Statistics","title":"Summary Statistics","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"summarystats\ndescribe","category":"page"},{"location":"scalarstats/#StatsBase.summarystats","page":"Scalar Statistics","title":"StatsBase.summarystats","text":"summarystats(a)\n\nCompute summary statistics for a real-valued array a. 
Returns a SummaryStats object containing the number of observations, number of missing observations, standard deviation, mean, minimum, 25th percentile, median, 75th percentile, and maximum.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#DataAPI.describe","page":"Scalar Statistics","title":"DataAPI.describe","text":"describe(a)\n\nPretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Reliability-Measures","page":"Scalar Statistics","title":"Reliability Measures","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"cronbachalpha","category":"page"},{"location":"scalarstats/#StatsBase.cronbachalpha","page":"Scalar Statistics","title":"StatsBase.cronbachalpha","text":"cronbachalpha(covmatrix::AbstractMatrix{<:Real})\n\nCalculate Cronbach's alpha (1951) from a covariance matrix covmatrix according to the formula:\n\n\\rho = \\frac{k}{k-1} \\left(1 - \\frac{\\sum^k_{i=1} \\sigma^2_i}{\\sum_{i=1}^k \\sum_{j=1}^k \\sigma_{ij}}\\right)\n\nwhere k is the number of items, i.e. columns, \\sigma_i^2 the item variance, and \\sigma_{ij} the inter-item covariance.\n\nReturns a CronbachAlpha object that holds:\n\nalpha: the Cronbach's alpha score for all items, i.e. columns, in covmatrix; and\ndropped: a vector giving Cronbach's alpha scores if a specific item, i.e. 
column, is dropped from covmatrix.\n\nExample\n\njulia> using StatsBase\n\njulia> cov_X = [10 6 6 6;\n 6 11 6 6;\n 6 6 12 6;\n 6 6 6 13];\n\njulia> cronbachalpha(cov_X)\nCronbach's alpha for all items: 0.8136\n\nCronbach's alpha if an item is dropped:\nitem 1: 0.7500\nitem 2: 0.7606\nitem 3: 0.7714\nitem 4: 0.7826\n\n\n\n\n\n","category":"function"},{"location":"ranking/#Rankings-and-Rank-Correlations","page":"Rankings and Rank Correlations","title":"Rankings and Rank Correlations","text":"","category":"section"},{"location":"ranking/","page":"Rankings and Rank Correlations","title":"Rankings and Rank Correlations","text":"This package implements various strategies for computing ranks and rank correlations.","category":"page"},{"location":"ranking/","page":"Rankings and Rank Correlations","title":"Rankings and Rank Correlations","text":"ordinalrank\ncompeterank\ndenserank\ntiedrank\ncorspearman\ncorkendall","category":"page"},{"location":"ranking/#StatsBase.ordinalrank","page":"Rankings and Rank Correlations","title":"StatsBase.ordinalrank","text":"ordinalrank(x; lt=isless, by=identity, rev::Bool=false, ...)\n\nReturn the ordinal ranking (\"1234\" ranking) of an array. Supports the same keyword arguments as the sort function. All items in x are given distinct, successive ranks based on their position in the sorted vector. Missing values are assigned rank missing.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.competerank","page":"Rankings and Rank Correlations","title":"StatsBase.competerank","text":"competerank(x; lt=isless, by=identity, rev::Bool=false, ...)\n\nReturn the standard competition ranking (\"1224\" ranking) of an array. Supports the same keyword arguments as the sort function. Equal (\"tied\") items are given the same rank, and the next rank comes after a gap that is equal to the number of tied items - 1. 
Missing values are assigned rank missing.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.denserank","page":"Rankings and Rank Correlations","title":"StatsBase.denserank","text":"denserank(x; lt=isless, by=identity, rev::Bool=false, ...)\n\nReturn the dense ranking (\"1223\" ranking) of an array. Supports the same keyword arguments as the sort function. Equal items receive the same rank, and the next subsequent rank is assigned with no gap. Missing values are assigned rank missing.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.tiedrank","page":"Rankings and Rank Correlations","title":"StatsBase.tiedrank","text":"tiedrank(x; lt=isless, by=identity, rev::Bool=false, ...)\n\nReturn the tied ranking, also called fractional or \"1 2.5 2.5 4\" ranking, of an array. Supports the same keyword arguments as the sort function. Equal (\"tied\") items receive the mean of the ranks they would have been assigned under the ordinal ranking (see ordinalrank). Missing values are assigned rank missing.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.corspearman","page":"Rankings and Rank Correlations","title":"StatsBase.corspearman","text":"corspearman(x, y=x)\n\nCompute Spearman's rank correlation coefficient. If x and y are vectors, the output is a float, otherwise it's a matrix corresponding to the pairwise correlations of the columns of x and y.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.corkendall","page":"Rankings and Rank Correlations","title":"StatsBase.corkendall","text":"corkendall(x, y=x)\n\nCompute Kendall's rank correlation coefficient, τ. 
x and y must both be either matrices or vectors.\n\n\n\n\n\n","category":"function"},{"location":"robust/#Robust-Statistics","page":"Robust Statistics","title":"Robust Statistics","text":"","category":"section"},{"location":"robust/","page":"Robust Statistics","title":"Robust Statistics","text":"trim\ntrim!\nwinsor\nwinsor!\ntrimvar","category":"page"},{"location":"robust/#StatsBase.trim","page":"Robust Statistics","title":"StatsBase.trim","text":"trim(x::AbstractVector; prop=0.0, count=0)\n\nReturn an iterator of all elements of x that omits either count or proportion prop of the highest and lowest elements.\n\nThe number of trimmed elements could be smaller than specified if several elements equal the lower or upper bound.\n\nTo compute the trimmed mean of x use mean(trim(x)); to compute the variance use trimvar(x) (see trimvar).\n\nExample\n\njulia> collect(trim([5,2,4,3,1], prop=0.2))\n3-element Array{Int64,1}:\n 2\n 4\n 3\n\n\n\n\n\n","category":"function"},{"location":"robust/#StatsBase.trim!","page":"Robust Statistics","title":"StatsBase.trim!","text":"trim!(x::AbstractVector; prop=0.0, count=0)\n\nA variant of trim that modifies x in place.\n\n\n\n\n\n","category":"function"},{"location":"robust/#StatsBase.winsor","page":"Robust Statistics","title":"StatsBase.winsor","text":"winsor(x::AbstractVector; prop=0.0, count=0)\n\nReturn an iterator of all elements of x that replaces either count or proportion prop of the highest elements with the previous-highest element and an equal number of the lowest elements with the next-lowest element.\n\nThe number of replaced elements could be smaller than specified if several elements equal the lower or upper bound.\n\nTo compute the Winsorized mean of x use mean(winsor(x)).\n\nExample\n\njulia> collect(winsor([5,2,3,4,1], prop=0.2))\n5-element Array{Int64,1}:\n 4\n 2\n 3\n 4\n 2\n\n\n\n\n\n","category":"function"},{"location":"robust/#StatsBase.winsor!","page":"Robust 
Statistics","title":"StatsBase.winsor!","text":"winsor!(x::AbstractVector; prop=0.0, count=0)\n\nA variant of winsor that modifies vector x in place.\n\n\n\n\n\n","category":"function"},{"location":"robust/#StatsBase.trimvar","page":"Robust Statistics","title":"StatsBase.trimvar","text":"trimvar(x; prop=0.0, count=0)\n\nCompute the variance of the trimmed mean of x. This function uses the Winsorized variance, as described in Wilcox (2010).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#Sampling-from-Population","page":"Sampling from Population","title":"Sampling from Population","text":"","category":"section"},{"location":"sampling/#Sampling-API","page":"Sampling from Population","title":"Sampling API","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"The package provides functions for sampling from a given population (with or without replacement).","category":"page"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"sample\nsample!\nwsample\nwsample!","category":"page"},{"location":"sampling/#StatsBase.sample","page":"Sampling from Population","title":"StatsBase.sample","text":"sample([rng], a, [wv::AbstractWeights])\n\nSelect a single random element of a. Sampling probabilities are proportional to the weights given in wv, if provided.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nsample([rng], a, [wv::AbstractWeights], n::Integer; replace=true, ordered=false)\n\nSelect a random, optionally weighted sample of size n from an array a using a polyalgorithm. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. 
a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nsample([rng], a, [wv::AbstractWeights], dims::Dims; replace=true, ordered=false)\n\nSelect a random, optionally weighted sample from an array a specifying the dimensions dims of the output array. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nsample([rng], wv::AbstractWeights)\n\nSelect a single random integer in 1:length(wv) with probabilities proportional to the weights given in wv.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.sample!","page":"Sampling from Population","title":"StatsBase.sample!","text":"sample!([rng], a, [wv::AbstractWeights], x; replace=true, ordered=false)\n\nDraw a random sample of length(x) elements from an array a and store the result in x. A polyalgorithm is used for sampling. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. 
a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\nOutput array x must not be the same object as a or wv nor share memory with them, or the result may be incorrect.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.wsample","page":"Sampling from Population","title":"StatsBase.wsample","text":"wsample([rng], [a], w)\n\nSelect a weighted random sample of size 1 from a with probabilities proportional to the weights given in w. If a is not present, select a random weight from w.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nwsample([rng], [a], w, n::Integer; replace=true, ordered=false)\n\nSelect a weighted random sample of size n from a with probabilities proportional to the weights given in w if a is present, otherwise select a random sample of size n of the weights given in w. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nwsample([rng], [a], w, dims::Dims; replace=true, ordered=false)\n\nSelect a weighted random sample from a with probabilities proportional to the weights given in w if a is present, otherwise select a random sample of size n of the weights given in w. 
The dimensions of the output are given by dims.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.wsample!","page":"Sampling from Population","title":"StatsBase.wsample!","text":"wsample!([rng], a, w, x; replace=true, ordered=false)\n\nSelect a weighted sample from an array a and store the result in x. Sampling probabilities are proportional to the weights given in w. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#Algorithms","page":"Sampling from Population","title":"Algorithms","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"Internally, this package implements multiple algorithms, and the sample (and sample!) methods integrate them into a poly-algorithm, which chooses a specific algorithm based on inputs.","category":"page"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"Note that the choices made in sample are decided based on extensive benchmarking (see perf/sampling.jl and perf/wsampling.jl). It performs reasonably fast for most cases. That being said, if you know that a certain algorithm is particularly suitable for your context, directly calling an internal algorithm function might be slightly more efficient.","category":"page"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"Here is a list of the algorithms implemented in the package. 
The functions below are not exported (one can still import them from StatsBase via using though).","category":"page"},{"location":"sampling/#Notations","page":"Sampling from Population","title":"Notations","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"a: source array representing the population\nx: the destination array\nwv: the weight vector (of type AbstractWeights), for weighted sampling\nn: the length of a\nk: the length of x. For sampling without replacement, k must not exceed n.\nrng: optional random number generator (defaults to Random.default_rng() on Julia >= 1.3 and Random.GLOBAL_RNG on Julia < 1.3)","category":"page"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"All following functions write results to x (pre-allocated) and return x.","category":"page"},{"location":"sampling/#Sampling-Algorithms-(Non-Weighted)","page":"Sampling from Population","title":"Sampling Algorithms (Non-Weighted)","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"StatsBase.direct_sample!(rng::Random.AbstractRNG, a::AbstractArray, x::AbstractArray)\nsamplepair\nStatsBase.knuths_sample!\nStatsBase.fisher_yates_sample!\nStatsBase.self_avoid_sample!\nStatsBase.seqsample_a!\nStatsBase.seqsample_c!\nStatsBase.seqsample_d!","category":"page"},{"location":"sampling/#StatsBase.direct_sample!-Tuple{AbstractRNG, AbstractArray, AbstractArray}","page":"Sampling from Population","title":"StatsBase.direct_sample!","text":"direct_sample!([rng], a::AbstractArray, x::AbstractArray)\n\nDirect sampling: for each j in 1:k, randomly pick i from 1:n, and set x[j] = a[i], with n=length(a) and k=length(x).\n\nThis algorithm consumes k random numbers.\n\n\n\n\n\n","category":"method"},{"location":"sampling/#StatsBase.samplepair","page":"Sampling from 
Population","title":"StatsBase.samplepair","text":"samplepair([rng], n)\n\nDraw a pair of distinct integers between 1 and n without replacement.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nsamplepair([rng], a)\n\nDraw a pair of distinct elements from the array a without replacement.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.knuths_sample!","page":"Sampling from Population","title":"StatsBase.knuths_sample!","text":"knuths_sample!([rng], a, x)\n\nKnuth's Algorithm S for random sampling without replacement.\n\nReference: D. Knuth. The Art of Computer Programming. Vol 2, 3.4.2, p.142.\n\nThis algorithm consumes length(a) random numbers. It requires no additional memory space. Suitable for the case where memory is tight.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.fisher_yates_sample!","page":"Sampling from Population","title":"StatsBase.fisher_yates_sample!","text":"fisher_yates_sample!([rng], a::AbstractArray, x::AbstractArray)\n\nFisher-Yates shuffling (with early termination).\n\nPseudo-code:\n\nn = length(a)\nk = length(x)\n\n# Create an array of the indices\ninds = collect(1:n)\n\nfor i = 1:k\n # swap element `i` with another random element in inds[i:n]\n # set element `i` in `x`\nend\n\nThis algorithm consumes k=length(x) random numbers. It uses an integer array of length n=length(a) internally to maintain the shuffled indices. It is considerably faster than Knuth's algorithm especially when n is greater than k. 
It is O(n) for initialization, plus O(k) for random shuffling.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.self_avoid_sample!","page":"Sampling from Population","title":"StatsBase.self_avoid_sample!","text":"self_avoid_sample!([rng], a::AbstractArray, x::AbstractArray)\n\nSelf-avoid sampling: use a set to maintain the index that has been sampled. Each time draw a new index, if the index has already been sampled, redraw until it draws an unsampled one.\n\nThis algorithm consumes about (or slightly more than) k=length(x) random numbers, and requires O(k) memory to store the set of sampled indices. Very fast when n ≫ k, with n=length(a).\n\nHowever, if k is large and approaches n, the rejection rate would increase drastically, resulting in poorer performance.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.seqsample_a!","page":"Sampling from Population","title":"StatsBase.seqsample_a!","text":"seqsample_a!([rng], a::AbstractArray, x::AbstractArray)\n\nRandom subsequence sampling using algorithm A described in the following paper (page 714): Jeffrey Scott Vitter. \"Faster Methods for Random Sampling\". Communications of the ACM, 27 (7), July 1984.\n\nThis algorithm consumes O(n) random numbers, with n=length(a). The outputs are ordered.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.seqsample_c!","page":"Sampling from Population","title":"StatsBase.seqsample_c!","text":"seqsample_c!([rng], a::AbstractArray, x::AbstractArray)\n\nRandom subsequence sampling using algorithm C described in the following paper (page 715): Jeffrey Scott Vitter. \"Faster Methods for Random Sampling\". Communications of the ACM, 27 (7), July 1984.\n\nThis algorithm consumes O(k^2) random numbers, with k=length(x). 
The outputs are ordered.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.seqsample_d!","page":"Sampling from Population","title":"StatsBase.seqsample_d!","text":"seqsample_d!([rng], a::AbstractArray, x::AbstractArray)\n\nRandom subsequence sampling using algorithm D described in the following paper (page 716-17): Jeffrey Scott Vitter. \"Faster Methods for Random Sampling\". Communications of the ACM, 27 (7), July 1984.\n\nThis algorithm consumes O(k) random numbers, with k=length(x). The outputs are ordered.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#Weighted-Sampling-Algorithms","page":"Sampling from Population","title":"Weighted Sampling Algorithms","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"StatsBase.direct_sample!(rng::Random.AbstractRNG, a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\nStatsBase.alias_sample!\nStatsBase.naive_wsample_norep!\nStatsBase.efraimidis_a_wsample_norep!\nStatsBase.efraimidis_ares_wsample_norep!","category":"page"},{"location":"sampling/#StatsBase.direct_sample!-Tuple{AbstractRNG, AbstractArray, AbstractWeights, AbstractArray}","page":"Sampling from Population","title":"StatsBase.direct_sample!","text":"direct_sample!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nDirect sampling.\n\nDraw each sample by scanning the weight vector.\n\nNoting k=length(x) and n=length(a), this algorithm:\n\nconsumes k random numbers\nhas time complexity O(n k), as scanning the weight vector each time takes O(n)\nrequires no additional memory space.\n\n\n\n\n\n","category":"method"},{"location":"sampling/#StatsBase.alias_sample!","page":"Sampling from Population","title":"StatsBase.alias_sample!","text":"alias_sample!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nAlias method.\n\nBuild an alias table, and sample therefrom.\n\nReference: Walker, A. J. 
\"An Efficient Method for Generating Discrete Random Variables with General Distributions.\" ACM Transactions on Mathematical Software 3 (3): 253, 1977.\n\nNoting k=length(x) and n=length(a), this algorithm takes O(n) time for building the alias table, and then O(1) to draw each sample. It consumes k random numbers.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.naive_wsample_norep!","page":"Sampling from Population","title":"StatsBase.naive_wsample_norep!","text":"naive_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nNaive implementation of weighted sampling without replacement.\n\nIt makes a copy of the weight vector at initialization, and sets the weight to zero when the corresponding sample is picked.\n\nNoting k=length(x) and n=length(a), this algorithm consumes O(k) random numbers, and has overall time complexity O(n k).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.efraimidis_a_wsample_norep!","page":"Sampling from Population","title":"StatsBase.efraimidis_a_wsample_norep!","text":"efraimidis_a_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nWeighted sampling without replacement using Efraimidis-Spirakis A algorithm.\n\nReference: Efraimidis, P. S., Spirakis, P. G. \"Weighted random sampling with a reservoir.\" Information Processing Letters, 97 (5), 181-185, 2006. doi:10.1016/j.ipl.2005.11.003.\n\nNoting k=length(x) and n=length(a), this algorithm takes O(n + k log k) processing time to draw k elements. It consumes n random numbers.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.efraimidis_ares_wsample_norep!","page":"Sampling from Population","title":"StatsBase.efraimidis_ares_wsample_norep!","text":"efraimidis_ares_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nImplementation of weighted sampling without replacement using Efraimidis-Spirakis A-Res algorithm.\n\nReference: Efraimidis, P. 
S., Spirakis, P. G. \"Weighted random sampling with a reservoir.\" Information Processing Letters, 97 (5), 181-185, 2006. doi:10.1016/j.ipl.2005.11.003.\n\nNoting k=length(x) and n=length(a), this algorithm takes O(k log(k) log(n k)) processing time to draw k elements. It consumes n random numbers.\n\n\n\n\n\n","category":"function"},{"location":"deviation/#Computing-Deviations","page":"Computing Deviations","title":"Computing Deviations","text":"","category":"section"},{"location":"deviation/","page":"Computing Deviations","title":"Computing Deviations","text":"This package provides functions to compute various deviations between arrays in a variety of ways:","category":"page"},{"location":"deviation/","page":"Computing Deviations","title":"Computing Deviations","text":"counteq\ncountne\nsqL2dist\nL2dist\nL1dist\nLinfdist\ngkldiv\nmeanad\nmaxad\nmsd\nrmsd\npsnr","category":"page"},{"location":"deviation/#StatsBase.counteq","page":"Computing Deviations","title":"StatsBase.counteq","text":"counteq(a, b)\n\nCount the number of indices at which the elements of the arrays a and b are equal.\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.countne","page":"Computing Deviations","title":"StatsBase.countne","text":"countne(a, b)\n\nCount the number of indices at which the elements of the arrays a and b are not equal.\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.sqL2dist","page":"Computing Deviations","title":"StatsBase.sqL2dist","text":"sqL2dist(a, b)\n\nCompute the squared L2 distance between two arrays: sum_i=1^n a_i - b_i^2. Efficient equivalent of sum(abs2, a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.L2dist","page":"Computing Deviations","title":"StatsBase.L2dist","text":"L2dist(a, b)\n\nCompute the L2 distance between two arrays: sqrtsum_i=1^n a_i - b_i^2. 
Efficient equivalent of sqrt(sum(abs2, a - b)).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.L1dist","page":"Computing Deviations","title":"StatsBase.L1dist","text":"L1dist(a, b)\n\nCompute the L1 distance between two arrays: sum_i=1^n a_i - b_i. Efficient equivalent of sum(abs, a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.Linfdist","page":"Computing Deviations","title":"StatsBase.Linfdist","text":"Linfdist(a, b)\n\nCompute the L∞ distance, also called the Chebyshev distance, between two arrays: max_iin1n a_i - b_i. Efficient equivalent of maxabs(a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.gkldiv","page":"Computing Deviations","title":"StatsBase.gkldiv","text":"gkldiv(a, b)\n\nCompute the generalized Kullback-Leibler divergence between two arrays: sum_i=1^n (a_i log(a_ib_i) - a_i + b_i). Efficient equivalent of sum(a*log(a/b)-a+b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.meanad","page":"Computing Deviations","title":"StatsBase.meanad","text":"meanad(a, b)\n\nReturn the mean absolute deviation between two arrays: mean(abs, a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.maxad","page":"Computing Deviations","title":"StatsBase.maxad","text":"maxad(a, b)\n\nReturn the maximum absolute deviation between two arrays: maxabs(a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.msd","page":"Computing Deviations","title":"StatsBase.msd","text":"msd(a, b)\n\nReturn the mean squared deviation between two arrays: mean(abs2, a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.rmsd","page":"Computing Deviations","title":"StatsBase.rmsd","text":"rmsd(a, b; normalize=false)\n\nReturn the root mean squared deviation between two optionally normalized arrays. 
The root mean squared deviation is computed as sqrt(msd(a, b)).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.psnr","page":"Computing Deviations","title":"StatsBase.psnr","text":"psnr(a, b, maxv)\n\nCompute the peak signal-to-noise ratio between two arrays a and b. maxv is the maximum possible value either array can take. The PSNR is computed as 10 * log10(maxv^2 / msd(a, b)).\n\n\n\n\n\n","category":"function"},{"location":"deviation/","page":"Computing Deviations","title":"Computing Deviations","text":"note: Note\nAll these functions are implemented in a reasonably efficient way without creating any temporary arrays in the middle.","category":"page"},{"location":"#Getting-Started","page":"Getting Started","title":"Getting Started","text":"","category":"section"},{"location":"","page":"Getting Started","title":"Getting Started","text":"CurrentModule = StatsBase\nDocTestSetup = quote\n using Statistics\n using Random\nend","category":"page"},{"location":"","page":"Getting Started","title":"Getting Started","text":"StatsBase.jl is a Julia package that provides basic support for statistics. 
Particularly, it implements a variety of statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.","category":"page"},{"location":"#Installation","page":"Getting Started","title":"Installation","text":"","category":"section"},{"location":"","page":"Getting Started","title":"Getting Started","text":"To install StatsBase through the Julia REPL, you can type ] add StatsBase or:","category":"page"},{"location":"","page":"Getting Started","title":"Getting Started","text":"using Pkg\nPkg.add(\"StatsBase\")","category":"page"},{"location":"","page":"Getting Started","title":"Getting Started","text":"To load the package, use the command:","category":"page"},{"location":"","page":"Getting Started","title":"Getting Started","text":"using StatsBase","category":"page"},{"location":"#Available-Features","page":"Getting Started","title":"Available Features","text":"","category":"section"},{"location":"","page":"Getting Started","title":"Getting Started","text":"Pages = [\"weights.md\", \"scalarstats.md\", \"robust.md\", \"deviation.md\", \"cov.md\", \"counts.md\", \"ranking.md\", \"sampling.md\", \"empirical.md\", \"signalcorr.md\", \"misc.md\", \"statmodels.md\", \"transformations.md\"]\nDepth = 2","category":"page"}] +[{"location":"statmodels/#Abstraction-for-Statistical-Models","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"","category":"section"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"StatsAPI.jl defines an abstract type StatisticalModel, and an abstract subtype RegressionModel. 
They are both extended by StatsBase, and documented here.","category":"page"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"Particularly, instances of StatisticalModel implement the following methods.","category":"page"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"adjr2\naic\naicc\nbic\ncoef\ncoefnames\ncoeftable\nconfint\ndeviance\ndof\nfit\nfit!\ninformationmatrix\nisfitted\nislinear\nloglikelihood\nmss\nnobs\nnulldeviance\nnullloglikelihood\nr2\nrss\nscore\nstderror\nvcov\nweights","category":"page"},{"location":"statmodels/#StatsAPI.adjr2","page":"Abstraction for Statistical Models","title":"StatsAPI.adjr2","text":"adjr2(model::StatisticalModel)\nadjr²(model::StatisticalModel)\n\nAdjusted coefficient of determination (adjusted R-squared).\n\nFor linear models, the adjusted R² is defined as 1 - (1-R^2)(n-1)/(n-p), with R^2 the coefficient of determination, n the number of observations, and p the number of coefficients (including the intercept). This definition is generally known as the Wherry Formula I.\n\n\n\n\n\nadjr2(model::StatisticalModel, variant::Symbol)\nadjr²(model::StatisticalModel, variant::Symbol)\n\nAdjusted pseudo-coefficient of determination (adjusted pseudo R-squared). For nonlinear models, one of the several pseudo R² definitions must be chosen via variant. The only currently supported variants are :MacFadden, defined as 1 - (log(L) - k)/log(L0) and :devianceratio, defined as 1 - (D/(n-k))/(D_0/(n-1)). 
In these formulas, L is the likelihood of the model, L0 that of the null model (the model including only the intercept), D is the deviance of the model, D_0 is the deviance of the null model, n is the number of observations (given by nobs) and k is the number of consumed degrees of freedom of the model (as returned by dof).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.aic","page":"Abstraction for Statistical Models","title":"StatsAPI.aic","text":"aic(model::StatisticalModel)\n\nAkaike's Information Criterion, defined as -2 log L + 2k, with L the likelihood of the model, and k its number of consumed degrees of freedom (as returned by dof).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.aicc","page":"Abstraction for Statistical Models","title":"StatsAPI.aicc","text":"aicc(model::StatisticalModel)\n\nCorrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989), defined as -2 log L + 2k + 2k(k+1)/(n-k-1), with L the likelihood of the model, k its number of consumed degrees of freedom (as returned by dof), and n the number of observations (as returned by nobs).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.bic","page":"Abstraction for Statistical Models","title":"StatsAPI.bic","text":"bic(model::StatisticalModel)\n\nBayesian Information Criterion, defined as -2 log L + k log n, with L the likelihood of the model, k its number of consumed degrees of freedom (as returned by dof), and n the number of observations (as returned by nobs).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.coef","page":"Abstraction for Statistical Models","title":"StatsAPI.coef","text":"coef(model::StatisticalModel)\n\nReturn the coefficients of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.coefnames","page":"Abstraction for Statistical Models","title":"StatsAPI.coefnames","text":"coefnames(model::StatisticalModel)\n\nReturn the names of the 
coefficients.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.coeftable","page":"Abstraction for Statistical Models","title":"StatsAPI.coeftable","text":"coeftable(model::StatisticalModel; level::Real=0.95)\n\nReturn a table with coefficients and related statistics of the model. level determines the level for confidence intervals (by default, 95%).\n\nThe returned CoefTable object implements the Tables.jl interface, and can be converted e.g. to a DataFrame via using DataFrames; DataFrame(coeftable(model)).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.confint","page":"Abstraction for Statistical Models","title":"StatsAPI.confint","text":"confint(model::StatisticalModel; level::Real=0.95)\n\nCompute confidence intervals for coefficients, with confidence level level (by default 95%).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.deviance","page":"Abstraction for Statistical Models","title":"StatsAPI.deviance","text":"deviance(model::StatisticalModel)\n\nReturn the deviance of the model relative to a reference, which is usually when applicable the saturated model. 
It is equal, up to a constant, to -2 log L, with L the likelihood of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.dof","page":"Abstraction for Statistical Models","title":"StatsAPI.dof","text":"dof(model::StatisticalModel)\n\nReturn the number of degrees of freedom consumed in the model, including when applicable the intercept and the distribution's dispersion parameter.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.fit","page":"Abstraction for Statistical Models","title":"StatsAPI.fit","text":"Fit a statistical model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.fit!","page":"Abstraction for Statistical Models","title":"StatsAPI.fit!","text":"Fit a statistical model in-place.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.informationmatrix","page":"Abstraction for Statistical Models","title":"StatsAPI.informationmatrix","text":"informationmatrix(model::StatisticalModel; expected::Bool = true)\n\nReturn the information matrix of the model. 
By default the Fisher information matrix is returned, while the observed information matrix can be requested with expected = false.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.isfitted","page":"Abstraction for Statistical Models","title":"StatsAPI.isfitted","text":"isfitted(model::StatisticalModel)\n\nIndicate whether the model has been fitted.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.islinear","page":"Abstraction for Statistical Models","title":"StatsAPI.islinear","text":"islinear(model::StatisticalModel)\n\nIndicate whether the model is linear.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.loglikelihood","page":"Abstraction for Statistical Models","title":"StatsAPI.loglikelihood","text":"loglikelihood(model::StatisticalModel)\nloglikelihood(model::StatisticalModel, observation)\n\nReturn the log-likelihood of the model.\n\nWith an observation argument, return the contribution of observation to the log-likelihood of model.\n\nIf observation is a Colon, return a vector of each observation's contribution to the log-likelihood of the model. In other words, this is the vector of the pointwise log-likelihood contributions.\n\nIn general, sum(loglikelihood(model, :)) == loglikelihood(model).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.mss","page":"Abstraction for Statistical Models","title":"StatsAPI.mss","text":"mss(model::StatisticalModel)\n\nReturn the model sum of squares.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.nobs","page":"Abstraction for Statistical Models","title":"StatsAPI.nobs","text":"nobs(model::StatisticalModel)\n\nReturn the number of independent observations on which the model was fitted. 
Be careful when using this information, as the definition of an independent observation may vary depending on the model, on the format used to pass the data, on the sampling plan (if specified), etc.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.nulldeviance","page":"Abstraction for Statistical Models","title":"StatsAPI.nulldeviance","text":"nulldeviance(model::StatisticalModel)\n\nReturn the deviance of the null model, obtained by dropping all independent variables present in model.\n\nIf model includes an intercept, the null model is the one with only the intercept; otherwise, it is the one without any predictor (not even the intercept).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.nullloglikelihood","page":"Abstraction for Statistical Models","title":"StatsAPI.nullloglikelihood","text":"nullloglikelihood(model::StatisticalModel)\n\nReturn the log-likelihood of the null model, obtained by dropping all independent variables present in model.\n\nIf model includes an intercept, the null model is the one with only the intercept; otherwise, it is the one without any predictor (not even the intercept).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.r2","page":"Abstraction for Statistical Models","title":"StatsAPI.r2","text":"r2(model::StatisticalModel)\nr²(model::StatisticalModel)\n\nCoefficient of determination (R-squared).\n\nFor a linear model, the R² is defined as ESS/TSS, with ESS the explained sum of squares and TSS the total sum of squares.\n\n\n\n\n\nr2(model::StatisticalModel, variant::Symbol)\nr²(model::StatisticalModel, variant::Symbol)\n\nPseudo-coefficient of determination (pseudo R-squared).\n\nFor nonlinear models, one of several pseudo R² definitions must be chosen via variant. Supported variants are:\n\n:MacFadden (a.k.a. 
likelihood ratio index), defined as 1 - log(L)/log(L_0);\n:CoxSnell, defined as 1 - (L_0/L)^(2/n);\n:Nagelkerke, defined as (1 - (L_0/L)^(2/n))/(1 - L_0^(2/n));\n:devianceratio, defined as 1 - D/D_0.\n\nIn the above formulas, L is the likelihood of the model, L_0 is the likelihood of the null model (the model with only an intercept), D is the deviance of the model (from the saturated model), D_0 is the deviance of the null model, n is the number of observations (given by nobs).\n\nThe Cox-Snell and the deviance ratio variants both match the classical definition of R² for linear models.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.rss","page":"Abstraction for Statistical Models","title":"StatsAPI.rss","text":"rss(model::StatisticalModel)\n\nReturn the residual sum of squares of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.score","page":"Abstraction for Statistical Models","title":"StatsAPI.score","text":"score(model::StatisticalModel)\n\nReturn the score of the model, that is the gradient of the log-likelihood with respect to the coefficients.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.stderror","page":"Abstraction for Statistical Models","title":"StatsAPI.stderror","text":"stderror(model::StatisticalModel)\n\nReturn the standard errors for the coefficients of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.vcov","page":"Abstraction for Statistical Models","title":"StatsAPI.vcov","text":"vcov(model::StatisticalModel)\n\nReturn the variance-covariance matrix for the coefficients of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.weights","page":"Abstraction for Statistical Models","title":"StatsAPI.weights","text":"weights(model::StatisticalModel)\n\nReturn the weights used in the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical 
Models","text":"RegressionModel extends StatisticalModel by implementing the following additional methods.","category":"page"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"crossmodelmatrix\ndof_residual\nfitted\nleverage\ncooksdistance\nmeanresponse\nmodelmatrix\nresponse\nresponsename\npredict\npredict!\nresiduals","category":"page"},{"location":"statmodels/#StatsAPI.crossmodelmatrix","page":"Abstraction for Statistical Models","title":"StatsAPI.crossmodelmatrix","text":"crossmodelmatrix(model::RegressionModel)\n\nReturn X'X where X is the model matrix of model. This function will return a pre-computed matrix stored in model if possible.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.dof_residual","page":"Abstraction for Statistical Models","title":"StatsAPI.dof_residual","text":"dof_residual(model::RegressionModel)\n\nReturn the residual degrees of freedom of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.fitted","page":"Abstraction for Statistical Models","title":"StatsAPI.fitted","text":"fitted(model::RegressionModel)\n\nReturn the fitted values of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.leverage","page":"Abstraction for Statistical Models","title":"StatsAPI.leverage","text":"leverage(model::RegressionModel)\n\nReturn the diagonal of the projection matrix of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.cooksdistance","page":"Abstraction for Statistical Models","title":"StatsAPI.cooksdistance","text":"cooksdistance(model::RegressionModel)\n\nCompute Cook's distance for each observation in linear model model, giving an estimate of the influence of each data point.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.meanresponse","page":"Abstraction for Statistical 
Models","title":"StatsAPI.meanresponse","text":"meanresponse(model::RegressionModel)\n\nReturn the mean of the response.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.modelmatrix","page":"Abstraction for Statistical Models","title":"StatsAPI.modelmatrix","text":"modelmatrix(model::RegressionModel)\n\nReturn the model matrix (a.k.a. the design matrix).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.response","page":"Abstraction for Statistical Models","title":"StatsAPI.response","text":"response(model::RegressionModel)\n\nReturn the model response (a.k.a. the dependent variable).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.responsename","page":"Abstraction for Statistical Models","title":"StatsAPI.responsename","text":"responsename(model::RegressionModel)\n\nReturn the name of the model response (a.k.a. the dependent variable).\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.predict","page":"Abstraction for Statistical Models","title":"StatsAPI.predict","text":"predict(model::RegressionModel, [newX])\n\nForm the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. 
for a GLM it would generally be a DataFrame with the same variable names as the original predictors.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.predict!","page":"Abstraction for Statistical Models","title":"StatsAPI.predict!","text":"predict!\n\nIn-place version of predict.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/#StatsAPI.residuals","page":"Abstraction for Statistical Models","title":"StatsAPI.residuals","text":"residuals(model::RegressionModel)\n\nReturn the residuals of the model.\n\n\n\n\n\n","category":"function"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"An exception type is provided to signal convergence failures during model estimation:","category":"page"},{"location":"statmodels/","page":"Abstraction for Statistical Models","title":"Abstraction for Statistical Models","text":"ConvergenceException","category":"page"},{"location":"statmodels/#StatsBase.ConvergenceException","page":"Abstraction for Statistical Models","title":"StatsBase.ConvergenceException","text":"ConvergenceException(iters::Int, lastchange::Real=NaN, tol::Real=NaN)\n\nThe fitting procedure failed to converge in iters number of iterations, i.e. 
the lastchange between the cost of the final and penultimate iteration was greater than specified tolerance tol.\n\n\n\n\n\n","category":"type"},{"location":"multivariate/#Multivariate-Summary-Statistics","page":"Multivariate Summary Statistics","title":"Multivariate Summary Statistics","text":"","category":"section"},{"location":"multivariate/","page":"Multivariate Summary Statistics","title":"Multivariate Summary Statistics","text":"This package provides a few methods for summarizing multivariate data.","category":"page"},{"location":"multivariate/#Partial-Correlation","page":"Multivariate Summary Statistics","title":"Partial Correlation","text":"","category":"section"},{"location":"multivariate/","page":"Multivariate Summary Statistics","title":"Multivariate Summary Statistics","text":"partialcor","category":"page"},{"location":"multivariate/#StatsBase.partialcor","page":"Multivariate Summary Statistics","title":"StatsBase.partialcor","text":"partialcor(x, y, Z)\n\nCompute the partial correlation of the vectors x and y given Z, which can be a vector or matrix.\n\n\n\n\n\n","category":"function"},{"location":"multivariate/#Generalizations-of-Variance","page":"Multivariate Summary Statistics","title":"Generalizations of Variance","text":"","category":"section"},{"location":"multivariate/","page":"Multivariate Summary Statistics","title":"Multivariate Summary Statistics","text":"genvar\ntotalvar","category":"page"},{"location":"multivariate/#StatsBase.genvar","page":"Multivariate Summary Statistics","title":"StatsBase.genvar","text":"genvar(X)\n\nCompute the generalized sample variance of X. If X is a vector, one-column matrix, or other iterable, this is equivalent to the sample variance. 
Otherwise if X is a matrix, this is equivalent to the determinant of the covariance matrix of X.\n\nnote: Note\nThe generalized sample variance will be 0 if the columns of the matrix of deviations are linearly dependent.\n\n\n\n\n\n","category":"function"},{"location":"multivariate/#StatsBase.totalvar","page":"Multivariate Summary Statistics","title":"StatsBase.totalvar","text":"totalvar(X)\n\nCompute the total sample variance of X. If X is a vector, one-column matrix, or other iterable, this is equivalent to the sample variance. Otherwise if X is a matrix, this is equivalent to the sum of the diagonal elements of the covariance matrix of X.\n\n\n\n\n\n","category":"function"},{"location":"cov/#Scatter-Matrix-and-Covariance","page":"Scatter Matrix and Covariance","title":"Scatter Matrix and Covariance","text":"","category":"section"},{"location":"cov/","page":"Scatter Matrix and Covariance","title":"Scatter Matrix and Covariance","text":"This package implements functions for computing scatter matrix, as well as weighted covariance matrix.","category":"page"},{"location":"cov/","page":"Scatter Matrix and Covariance","title":"Scatter Matrix and Covariance","text":"scattermat\ncov\ncov(::CovarianceEstimator, ::AbstractVector)\ncov(::CovarianceEstimator, ::AbstractVector, ::AbstractVector)\ncov(::CovarianceEstimator, ::AbstractMatrix)\nvar(::CovarianceEstimator, ::AbstractVector)\nstd(::CovarianceEstimator, ::AbstractVector)\ncor\nmean_and_cov\ncov2cor\ncor2cov\nCovarianceEstimator\nSimpleCovariance","category":"page"},{"location":"cov/#StatsBase.scattermat","page":"Scatter Matrix and Covariance","title":"StatsBase.scattermat","text":"scattermat(X, [wv::AbstractWeights]; mean=nothing, dims=1)\n\nCompute the scatter matrix, which is an unnormalized covariance matrix. A weighting vector wv can be specified to weight the estimate.\n\nArguments\n\nmean=nothing: a known mean value. nothing indicates that the mean is unknown, and the function will compute the mean. 
Specifying mean=0 indicates that the data are centered and hence there's no need to subtract the mean.\ndims=1: the dimension along which the variables are organized. When dims = 1, the variables are considered columns with observations in rows; when dims = 2, variables are in rows with observations in columns.\n\n\n\n\n\n","category":"function"},{"location":"cov/#Statistics.cov","page":"Scatter Matrix and Covariance","title":"Statistics.cov","text":"cov(X, w::AbstractWeights, vardim=1; mean=nothing, corrected=false)\n\nCompute the weighted covariance matrix. Similar to var and std the biased covariance matrix (corrected=false) is computed by multiplying scattermat(X, w) by frac1sumw to normalize. However, the unbiased covariance matrix (corrected=true) is dependent on the type of weights used:\n\nAnalyticWeights: frac1sum w - sum w^2 sum w\nFrequencyWeights: frac1sumw - 1\nProbabilityWeights: fracn(n - 1) sum w where n equals count(!iszero, w)\nWeights: ArgumentError (bias correction not supported)\n\n\n\n\n\n","category":"function"},{"location":"cov/#Statistics.cov-Tuple{CovarianceEstimator, AbstractVector}","page":"Scatter Matrix and Covariance","title":"Statistics.cov","text":"cov(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute a variance estimate from the observation vector x using the estimator ce.\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.cov-Tuple{CovarianceEstimator, AbstractVector, AbstractVector}","page":"Scatter Matrix and Covariance","title":"Statistics.cov","text":"cov(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)\n\nCompute the covariance of the vectors x and y using estimator ce.\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.cov-Tuple{CovarianceEstimator, AbstractMatrix}","page":"Scatter Matrix and Covariance","title":"Statistics.cov","text":"cov(ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights]; mean=nothing, dims::Int=1)\n\nCompute the covariance matrix of 
the matrix X along dimension dims using estimator ce. A weighting vector w can be specified. The keyword argument mean can be:\n\nnothing (default) in which case the mean is estimated and subtracted from the data X,\na precalculated mean in which case it is subtracted from the data X. Assuming size(X) is (N,M), mean can either be:\nwhen dims=1, an AbstractMatrix of size (1,M),\nwhen dims=2, an AbstractVector of length N or an AbstractMatrix of size (N,1).\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.var-Tuple{CovarianceEstimator, AbstractVector}","page":"Scatter Matrix and Covariance","title":"Statistics.var","text":"var(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute the variance of the vector x using the estimator ce.\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.std-Tuple{CovarianceEstimator, AbstractVector}","page":"Scatter Matrix and Covariance","title":"Statistics.std","text":"std(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute the standard deviation of the vector x using the estimator ce.\n\n\n\n\n\n","category":"method"},{"location":"cov/#Statistics.cor","page":"Scatter Matrix and Covariance","title":"Statistics.cor","text":"cor(X, w::AbstractWeights, dims=1)\n\nCompute the Pearson correlation matrix of X along the dimension dims with a weighting w .\n\n\n\n\n\ncor(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector)\n\nCompute the correlation of the vectors x and y using estimator ce.\n\n\n\n\n\ncor(\n ce::CovarianceEstimator, X::AbstractMatrix, [w::AbstractWeights];\n mean=nothing, dims::Int=1\n)\n\nCompute the correlation matrix of the matrix X along dimension dims using estimator ce. A weighting vector w can be specified. The keyword argument mean can be:\n\nnothing (default) in which case the mean is estimated and subtracted from the data X,\na precalculated mean in which case it is subtracted from the data X. 
Assuming size(X) is (N,M), mean can either be:\nwhen dims=1, an AbstractMatrix of size (1,M),\nwhen dims=2, an AbstractVector of length N or an AbstractMatrix of size (N,1).\n\n\n\n\n\n","category":"function"},{"location":"cov/#StatsBase.mean_and_cov","page":"Scatter Matrix and Covariance","title":"StatsBase.mean_and_cov","text":"mean_and_cov(x, [wv::AbstractWeights,] vardim=1; corrected=false) -> (mean, cov)\n\nReturn the mean and covariance matrix as a tuple. A weighting vector wv can be specified. vardim that designates whether the variables are columns in the matrix (1) or rows (2). Finally, bias correction is applied to the covariance calculation if corrected=true. See cov documentation for more details.\n\n\n\n\n\n","category":"function"},{"location":"cov/#StatsBase.cov2cor","page":"Scatter Matrix and Covariance","title":"StatsBase.cov2cor","text":"cov2cor(C::AbstractMatrix, [s::AbstractArray])\n\nCompute the correlation matrix from the covariance matrix C and, optionally, a vector of standard deviations s. Use StatsBase.cov2cor! for an in-place version.\n\n\n\n\n\n","category":"function"},{"location":"cov/#StatsBase.cor2cov","page":"Scatter Matrix and Covariance","title":"StatsBase.cor2cov","text":"cor2cov(C, s)\n\nCompute the covariance matrix from the correlation matrix C and a vector of standard deviations s. Use StatsBase.cor2cov! for an in-place version.\n\n\n\n\n\n","category":"function"},{"location":"cov/#StatsBase.CovarianceEstimator","page":"Scatter Matrix and Covariance","title":"StatsBase.CovarianceEstimator","text":"CovarianceEstimator\n\nAbstract type for covariance estimators.\n\n\n\n\n\n","category":"type"},{"location":"cov/#StatsBase.SimpleCovariance","page":"Scatter Matrix and Covariance","title":"StatsBase.SimpleCovariance","text":"SimpleCovariance(;corrected::Bool=false)\n\nSimple covariance estimator. 
Estimation calls cov(x; corrected=corrected), cov(x, y; corrected=corrected) or cov(X, w, dims; corrected=corrected) where x, y are vectors, X is a matrix and w is a weighting vector.\n\n\n\n\n\n","category":"type"},{"location":"misc/#Miscellaneous-Functions","page":"Miscellaneous Functions","title":"Miscellaneous Functions","text":"","category":"section"},{"location":"misc/","page":"Miscellaneous Functions","title":"Miscellaneous Functions","text":"rle\ninverse_rle\nlevelsmap\nindexmap\nindicatormat\nStatsBase.midpoints\npairwise\npairwise!","category":"page"},{"location":"misc/#StatsBase.rle","page":"Miscellaneous Functions","title":"StatsBase.rle","text":"rle(v) -> (vals, lens)\n\nReturn the run-length encoding of a vector as a tuple. The first element of the tuple is a vector of values of the input and the second is the number of consecutive occurrences of each element.\n\nExamples\n\njulia> using StatsBase\n\njulia> rle([1,1,1,2,2,3,3,3,3,2,2,2])\n([1, 2, 3, 2], [3, 2, 4, 3])\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.inverse_rle","page":"Miscellaneous Functions","title":"StatsBase.inverse_rle","text":"inverse_rle(vals, lens)\n\nReconstruct a vector from its run-length encoding (see rle). 
vals is a vector of the values and lens is a vector of the corresponding run lengths.\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.levelsmap","page":"Miscellaneous Functions","title":"StatsBase.levelsmap","text":"levelsmap(a)\n\nConstruct a dictionary that maps each of the n unique values in a to a number between 1 and n.\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.indexmap","page":"Miscellaneous Functions","title":"StatsBase.indexmap","text":"indexmap(a)\n\nConstruct a dictionary that maps each unique value in a to the index of its first occurrence in a.\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.indicatormat","page":"Miscellaneous Functions","title":"StatsBase.indicatormat","text":"indicatormat(x, k::Integer; sparse=false)\n\nConstruct a boolean matrix I of size (k, length(x)) such that I[x[i], i] = true and all other elements are set to false. If sparse is true, the output will be a sparse matrix, otherwise it will be dense (default).\n\nExamples\n\njulia> using StatsBase\n\njulia> indicatormat([1 2 2], 2)\n2×3 Matrix{Bool}:\n 1 0 0\n 0 1 1\n\n\n\n\n\nindicatormat(x, c=sort(unique(x)); sparse=false)\n\nConstruct a boolean matrix I of size (length(c), length(x)). Let ci be the index of x[i] in c. Then I[ci, i] = true and all other elements are false.\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsBase.midpoints","page":"Miscellaneous Functions","title":"StatsBase.midpoints","text":"StatsBase.midpoints(v)\n\nCalculate the midpoints (pairwise mean of consecutive elements).\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsAPI.pairwise","page":"Miscellaneous Functions","title":"StatsAPI.pairwise","text":"pairwise(f, x[, y];\n symmetric::Bool=false, skipmissing::Symbol=:none)\n\nReturn a matrix holding the result of applying f to all possible pairs of entries in iterators x and y. Rows correspond to entries in x and columns to entries in y. 
If y is omitted then a square matrix crossing x with itself is returned.\n\nAs a special case, if f is cor, diagonal cells for which entries from x and y are identical (according to ===) are set to one even in the presence of missing, NaN or Inf entries.\n\nKeyword arguments\n\nsymmetric::Bool=false: If true, f is only called for the lower triangle of the matrix, and these values are copied to fill the upper triangle. Only allowed when y is omitted. Defaults to true when f is cor or cov.\nskipmissing::Symbol=:none: If :none (the default), missing values in inputs are passed to f without any modification. Use :pairwise to skip entries with a missing value in either of the two vectors passed to f for a given pair of vectors in x and y. Use :listwise to skip entries with a missing value in any of the vectors in x or y; note that this might drop a large part of entries. Only allowed when entries in x and y are vectors.\n\nExamples\n\njulia> using StatsBase, Statistics\n\njulia> x = [1 3 7\n 2 5 6\n 3 8 4\n 4 6 2];\n\njulia> pairwise(cor, eachcol(x))\n3×3 Matrix{Float64}:\n 1.0 0.744208 -0.989778\n 0.744208 1.0 -0.68605\n -0.989778 -0.68605 1.0\n\njulia> y = [1 3 missing\n 2 5 6\n 3 missing 2\n 4 6 2];\n\njulia> pairwise(cor, eachcol(y), skipmissing=:pairwise)\n3×3 Matrix{Float64}:\n 1.0 0.928571 -0.866025\n 0.928571 1.0 -1.0\n -0.866025 -1.0 1.0\n\n\n\n\n\n","category":"function"},{"location":"misc/#StatsAPI.pairwise!","page":"Miscellaneous Functions","title":"StatsAPI.pairwise!","text":"pairwise!(f, dest::AbstractMatrix, x[, y];\n symmetric::Bool=false, skipmissing::Symbol=:none)\n\nStore in matrix dest the result of applying f to all possible pairs of entries in iterators x and y, and return it. Rows correspond to entries in x and columns to entries in y, and dest must therefore be of size length(x) × length(y). 
If y is omitted then x is crossed with itself.\n\nAs a special case, if f is cor, diagonal cells for which entries from x and y are identical (according to ===) are set to one even in the presence of missing, NaN or Inf entries.\n\nKeyword arguments\n\nsymmetric::Bool=false: If true, f is only called for the lower triangle of the matrix, and these values are copied to fill the upper triangle. Only allowed when y is omitted. Defaults to true when f is cor or cov.\nskipmissing::Symbol=:none: If :none (the default), missing values in inputs are passed to f without any modification. Use :pairwise to skip entries with a missing value in either of the two vectors passed to f for a given pair of vectors in x and y. Use :listwise to skip entries with a missing value in any of the vectors in x or y; note that this might drop a large part of entries. Only allowed when entries in x and y are vectors.\n\nExamples\n\njulia> using StatsBase, Statistics\n\njulia> dest = zeros(3, 3);\n\njulia> x = [1 3 7\n 2 5 6\n 3 8 4\n 4 6 2];\n\njulia> pairwise!(cor, dest, eachcol(x));\n\njulia> dest\n3×3 Matrix{Float64}:\n 1.0 0.744208 -0.989778\n 0.744208 1.0 -0.68605\n -0.989778 -0.68605 1.0\n\njulia> y = [1 3 missing\n 2 5 6\n 3 missing 2\n 4 6 2];\n\njulia> pairwise!(cor, dest, eachcol(y), skipmissing=:pairwise);\n\njulia> dest\n3×3 Matrix{Float64}:\n 1.0 0.928571 -0.866025\n 0.928571 1.0 -1.0\n -0.866025 -1.0 1.0\n\n\n\n\n\n","category":"function"},{"location":"weights/#Weight-Vectors","page":"Weight Vectors","title":"Weight Vectors","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"In statistical applications, it is not uncommon to assign weights to samples. 
To facilitate the use of weight vectors, we introduce the abstract type AbstractWeights for the purpose of representing weight vectors, which has two advantages:","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"A different type AbstractWeights distinguishes the role of the weight vector from other data vectors in the input arguments.\nStatistical functions that utilize weights often need the sum of weights for various purposes. The weight vector maintains the sum of weights, so that it needn't be computed repeatedly each time the sum of weights is needed.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"note: Note\nThe weight vector is a light-weight wrapper of the input vector. The input vector is NOT copied during construction.\nThe weight vector maintains the sum of weights, which is computed upon construction. If the value of the sum is pre-computed, one can supply it as the second argument to the constructor and save the time of computing the sum again.","category":"page"},{"location":"weights/#Implementations","page":"Weight Vectors","title":"Implementations","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Several statistical weight types are provided which subtype AbstractWeights. The choice of weights impacts how bias is corrected in several methods. See the var, std and cov docstrings for more details.","category":"page"},{"location":"weights/#AnalyticWeights","page":"Weight Vectors","title":"AnalyticWeights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. 
These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = AnalyticWeights([0.2, 0.1, 0.3])\nw = aweights([0.2, 0.1, 0.3])","category":"page"},{"location":"weights/#FrequencyWeights","page":"Weight Vectors","title":"FrequencyWeights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = FrequencyWeights([2, 1, 3])\nw = fweights([2, 1, 3])","category":"page"},{"location":"weights/#ProbabilityWeights","page":"Weight Vectors","title":"ProbabilityWeights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = ProbabilityWeights([0.2, 0.1, 0.3])\nw = pweights([0.2, 0.1, 0.3])","category":"page"},{"location":"weights/#UnitWeights","page":"Weight Vectors","title":"UnitWeights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Unit weights are a special case in which all observations are given a weight equal to 1. 
Using such weights is equivalent to computing unweighted statistics.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"This type can notably be used when implementing an algorithm so that only a weighted variant has to be written. The unweighted variant is then obtained by passing a UnitWeights object. This is very efficient since no weights vector is actually allocated.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = uweights(3)\nw = uweights(Float64, 3)","category":"page"},{"location":"weights/#Weights","page":"Weight Vectors","title":"Weights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"The Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights, ProbabilityWeights and UnitWeights.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"w = Weights([1., 2., 3.])\nw = weights([1., 2., 3.])","category":"page"},{"location":"weights/#Exponential-weights:-eweights","page":"Weight Vectors","title":"Exponential weights: eweights","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Exponential weights are a common form of temporal weights which assign exponentially decreasing weights to past observations.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"If t is a vector of temporal indices then for each index i we compute the weight as:","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"λ (1 - λ)^{1 - i}","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"λ is a smoothing factor or rate parameter such that 0 < λ ≤ 1. 
As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"For example, the following call generates exponential weights for ten observations with λ = 0.3.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"julia> eweights(1:10, 0.3)\n10-element Weights{Float64,Float64,Array{Float64,1}}:\n 0.3\n 0.42857142857142855\n 0.6122448979591837\n 0.8746355685131197\n 1.249479383590171\n 1.7849705479859588\n 2.549957925694227\n 3.642797036706039\n 5.203995766722913\n 7.434279666747019","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Simply passing the number of observations n is equivalent to passing in 1:n.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"julia> eweights(10, 0.3)\n10-element Weights{Float64,Float64,Array{Float64,1}}:\n 0.3\n 0.42857142857142855\n 0.6122448979591837\n 0.8746355685131197\n 1.249479383590171\n 1.7849705479859588\n 2.549957925694227\n 3.642797036706039\n 5.203995766722913\n 7.434279666747019","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"Finally, you can construct exponential weights from an arbitrary subset of timestamps within a larger range.","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"julia> t\n2019-01-01T01:00:00:2 hours:2019-01-01T05:00:00\n\njulia> r\n2019-01-01T01:00:00:1 hour:2019-01-02T01:00:00\n\njulia> eweights(t, r, 0.3)\n3-element Weights{Float64,Float64,Array{Float64,1}}:\n 0.3\n 0.6122448979591837\n 1.249479383590171","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"NOTE: This is equivalent to eweights(something.(indexin(t, r)), 
0.3), which is saying that for each value in t return the corresponding index for that value in r. Since indexin returns nothing if there is no corresponding value from t in r we use something to eliminate that possibility.","category":"page"},{"location":"weights/#Methods","page":"Weight Vectors","title":"Methods","text":"","category":"section"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"AbstractWeights implements the following methods:","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"eltype\nlength\nisempty\nvalues\nsum","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"The following constructors are provided:","category":"page"},{"location":"weights/","page":"Weight Vectors","title":"Weight Vectors","text":"AnalyticWeights\nFrequencyWeights\nProbabilityWeights\nUnitWeights\nWeights\naweights\nfweights\npweights\neweights\nuweights\nweights(vs::AbstractArray{<:Real})","category":"page"},{"location":"weights/#StatsBase.AnalyticWeights","page":"Weight Vectors","title":"StatsBase.AnalyticWeights","text":"AnalyticWeights(vs, wsum=sum(vs))\n\nConstruct an AnalyticWeights vector with weight values vs. A precomputed sum may be provided as wsum.\n\nAnalytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.FrequencyWeights","page":"Weight Vectors","title":"StatsBase.FrequencyWeights","text":"FrequencyWeights(vs, wsum=sum(vs))\n\nConstruct a FrequencyWeights vector with weight values vs. 
A precomputed sum may be provided as wsum.\n\nFrequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.ProbabilityWeights","page":"Weight Vectors","title":"StatsBase.ProbabilityWeights","text":"ProbabilityWeights(vs, wsum=sum(vs))\n\nConstruct a ProbabilityWeights vector with weight values vs. A precomputed sum may be provided as wsum.\n\nProbability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.UnitWeights","page":"Weight Vectors","title":"StatsBase.UnitWeights","text":"UnitWeights{T}(s)\n\nConstruct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.Weights","page":"Weight Vectors","title":"StatsBase.Weights","text":"Weights(vs, wsum=sum(vs))\n\nConstruct a Weights vector with weight values vs. A precomputed sum may be provided as wsum.\n\nThe Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights and ProbabilityWeights.\n\n\n\n\n\n","category":"type"},{"location":"weights/#StatsBase.aweights","page":"Weight Vectors","title":"StatsBase.aweights","text":"aweights(vs)\n\nConstruct an AnalyticWeights vector from array vs. See the documentation for AnalyticWeights for more details.\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsBase.fweights","page":"Weight Vectors","title":"StatsBase.fweights","text":"fweights(vs)\n\nConstruct a FrequencyWeights vector from a given array. 
See the documentation for FrequencyWeights for more details.\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsBase.pweights","page":"Weight Vectors","title":"StatsBase.pweights","text":"pweights(vs)\n\nConstruct a ProbabilityWeights vector from a given array. See the documentation for ProbabilityWeights for more details.\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsBase.eweights","page":"Weight Vectors","title":"StatsBase.eweights","text":"eweights(t::AbstractArray{<:Integer}, λ::Real; scale=false)\neweights(t::AbstractVector{T}, r::StepRange{T}, λ::Real; scale=false) where T\neweights(n::Integer, λ::Real; scale=false)\n\nConstruct a Weights vector which assigns exponentially decreasing weights to past observations (larger integer values i in t). The integer value n represents the number of past observations to consider. n defaults to maximum(t) - minimum(t) + 1 if only t is passed in and the elements are integers, and to length(r) if a superset range r is also passed in. If n is explicitly passed instead of t, t defaults to 1:n.\n\nIf scale is true then for each element i in t the weight value is computed as:\n\n(1 - λ)^{n - i}\n\nIf scale is false then each value is computed as:\n\nλ (1 - λ)^{1 - i}\n\nArguments\n\nt::AbstractVector: temporal indices or timestamps\nr::StepRange: a larger range to use when constructing weights from a subset of timestamps\nn::Integer: the number of past events to consider\nλ::Real: a smoothing factor or rate parameter such that 0 < λ ≤ 1. 
As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.\n\nKeyword arguments\n\nscale::Bool: Return the weights scaled to between 0 and 1 (default: false)\n\nExamples\n\njulia> eweights(1:10, 0.3; scale=true)\n10-element Weights{Float64,Float64,Array{Float64,1}}:\n 0.04035360699999998\n 0.05764800999999997\n 0.08235429999999996\n 0.11764899999999996\n 0.16806999999999994\n 0.24009999999999995\n 0.3429999999999999\n 0.48999999999999994\n 0.7\n 1.0\n\nLinks\n\nhttps://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average\nhttps://en.wikipedia.org/wiki/Exponential_smoothing\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsBase.uweights","page":"Weight Vectors","title":"StatsBase.uweights","text":"uweights(s::Integer)\nuweights(::Type{T}, s::Integer) where T<:Real\n\nConstruct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.\n\nExamples\n\njulia> uweights(3)\n3-element UnitWeights{Int64}:\n 1\n 1\n 1\n\njulia> uweights(Float64, 3)\n3-element UnitWeights{Float64}:\n 1.0\n 1.0\n 1.0\n\n\n\n\n\n","category":"function"},{"location":"weights/#StatsAPI.weights-Tuple{AbstractArray{<:Real}}","page":"Weight Vectors","title":"StatsAPI.weights","text":"weights(vs::AbstractArray{<:Real})\n\nConstruct a Weights vector from array vs. 
See the documentation for Weights for more details.\n\n\n\n\n\n","category":"method"},{"location":"empirical/#Empirical-Estimation","page":"Empirical Estimation","title":"Empirical Estimation","text":"","category":"section"},{"location":"empirical/#Histograms","page":"Empirical Estimation","title":"Histograms","text":"","category":"section"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"Histogram","category":"page"},{"location":"empirical/#StatsBase.Histogram","page":"Empirical Estimation","title":"StatsBase.Histogram","text":"Histogram <: AbstractHistogram\n\nThe Histogram type represents data that has been tabulated into intervals (known as bins) along the real line, or in higher dimensions, over a real space. Histograms can be fitted to data using the fit method.\n\nFields\n\nedges: An iterator that contains the boundaries of the bins in each dimension.\nweights: An array that contains the weight of each bin.\nclosed: A symbol with value :right or :left indicating on which side bins (half-open intervals or higher-dimensional analogues thereof) are closed. See below for an example.\nisdensity: There are two interpretations of a Histogram. If isdensity=false the weight of a bin corresponds to the amount of a quantity in the bin. If isdensity=true then it corresponds to the density (amount / volume) of the quantity in the bin. 
See below for an example.\n\nExamples\n\nExample illustrating closed\n\njulia> using StatsBase\n\njulia> fit(Histogram, [2.], 1:3, closed=:left)\nHistogram{Int64, 1, Tuple{UnitRange{Int64}}}\nedges:\n 1:3\nweights: [0, 1]\nclosed: left\nisdensity: false\n\njulia> fit(Histogram, [2.], 1:3, closed=:right)\nHistogram{Int64, 1, Tuple{UnitRange{Int64}}}\nedges:\n 1:3\nweights: [1, 0]\nclosed: right\nisdensity: false\n\nExample illustrating isdensity\n\njulia> using StatsBase, LinearAlgebra\n\njulia> bins = [0,1,7]; # a small and a large bin\n\njulia> obs = [0.5, 1.5, 1.5, 2.5]; # one observation in the small bin and three in the large\n\njulia> h = fit(Histogram, obs, bins)\nHistogram{Int64,1,Tuple{Array{Int64,1}}}\nedges:\n [0, 1, 7]\nweights: [1, 3]\nclosed: left\nisdensity: false\n\njulia> # observe isdensity = false and the weights field records the number of observations in each bin\n\njulia> normalize(h, mode=:density)\nHistogram{Float64,1,Tuple{Array{Int64,1}}}\nedges:\n [0, 1, 7]\nweights: [1.0, 0.5]\nclosed: left\nisdensity: true\n\njulia> # observe isdensity = true and weights tells us the number of observation per binsize in each bin\n\n\n\n\n\n","category":"type"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"Histograms can be fitted to data using the fit method.","category":"page"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"fit(::Type{Histogram}, args...; kwargs...)","category":"page"},{"location":"empirical/#StatsAPI.fit-Tuple{Type{Histogram}, Vararg{Any}}","page":"Empirical Estimation","title":"StatsAPI.fit","text":"fit(Histogram, data[, weight][, edges]; closed=:left[, nbins])\n\nFit a histogram to data.\n\nArguments\n\ndata: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).\nweight: an optional AbstractWeights (of the same length as the data vectors), denoting the weight each 
observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.\nedges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, they are chosen so that approximately nbins bins of equal width are constructed along each dimension.\n\nnote: Note\nIn most cases, the number of bins will be nbins. However, to ensure that the bins have equal width, more or fewer than nbins bins may be used.\n\nKeyword arguments\n\nclosed: if :left (the default), the bin intervals are left-closed [a,b); if :right, intervals are right-closed (a,b].\nnbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers). If omitted, it is computed using Sturges's formula, i.e. ceil(log2(length(n))) + 1 with n the number of data points.\n\nExamples\n\n# Univariate\nh = fit(Histogram, rand(100))\nh = fit(Histogram, rand(100), 0:0.1:1.0)\nh = fit(Histogram, rand(100), nbins=10)\nh = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0)\nh = fit(Histogram, [20], 0:20:100)\nh = fit(Histogram, [20], 0:20:100, closed=:right)\n\n# Multivariate\nh = fit(Histogram, (rand(100),rand(100)))\nh = fit(Histogram, (rand(100),rand(100)),nbins=10)\n\n\n\n\n\n","category":"method"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"Additional methods","category":"page"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"merge!\nmerge\nnorm\nnormalize\nnormalize!\nzero","category":"page"},{"location":"empirical/#Base.merge!","page":"Empirical Estimation","title":"Base.merge!","text":"merge!(target::Histogram, others::Histogram...)\n\nUpdate histogram target by merging it with the histograms others. See merge(histogram::Histogram, others::Histogram...) 
for details.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#Base.merge","page":"Empirical Estimation","title":"Base.merge","text":"merge(h::Histogram, others::Histogram...)\n\nConstruct a new histogram by merging h with others. All histograms must have the same binning, shape of weights and properties (closed and isdensity). The weights of all histograms are summed up for each bin, the weights of the resulting histogram will have the same type as those of h.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#LinearAlgebra.norm","page":"Empirical Estimation","title":"LinearAlgebra.norm","text":"norm(h::Histogram)\n\nCalculate the norm of histogram h as the absolute value of its integral.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#LinearAlgebra.normalize","page":"Empirical Estimation","title":"LinearAlgebra.normalize","text":"normalize(h::Histogram{T,N}; mode::Symbol=:pdf) where {T,N}\n\nNormalize the histogram h.\n\nValid values for mode are:\n\n:pdf: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.\n:density: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1. Will not modify the histogram if it already represents a density (h.isdensity == 1).\n:probability: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.\n:none: Leaves histogram unchanged. Useful to simplify code that has to conditionally apply different modes of normalization.\n\nSuccessive application of both :probability and :density normalization (in any order) is equivalent to :pdf normalization.\n\n\n\n\n\nnormalize(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T,N}\n\nNormalize the histogram h and rescales one or more auxiliary weight arrays at the same time (aux_weights may, e.g., contain estimated statistical uncertainties). 
The values of the auxiliary arrays are scaled by the same factor as the corresponding histogram weight values. Returns a tuple of the normalized histogram and scaled auxiliary weights.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#LinearAlgebra.normalize!","page":"Empirical Estimation","title":"LinearAlgebra.normalize!","text":"normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}\n\nNormalize the histogram h and optionally scale one or more auxiliary weight arrays appropriately. See description of normalize for details. Returns h.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#Base.zero","page":"Empirical Estimation","title":"Base.zero","text":"zero(h::Histogram)\n\nCreate a new histogram with the same binning, type and shape of weights and the same properties (closed and isdensity) as h, with all weights set to zero.\n\n\n\n\n\n","category":"function"},{"location":"empirical/#Empirical-Cumulative-Distribution-Function","page":"Empirical Estimation","title":"Empirical Cumulative Distribution Function","text":"","category":"section"},{"location":"empirical/","page":"Empirical Estimation","title":"Empirical Estimation","text":"ecdf","category":"page"},{"location":"empirical/#StatsBase.ecdf","page":"Empirical Estimation","title":"StatsBase.ecdf","text":"ecdf(X; weights::AbstractWeights)\n\nReturn an empirical cumulative distribution function (ECDF) based on a vector of samples given in X. 
Optionally providing weights returns a weighted ECDF.\n\nNote: this function returns a callable composite type, which can then be applied to evaluate CDF values on other samples.\n\nextrema, minimum, and maximum are supported for obtaining the range over which the function is inside the interval (0, 1); the function is defined for the whole real line.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#Data-Transformations","page":"Data Transformations","title":"Data Transformations","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"In general, data transformations change raw feature vectors into a representation that is more suitable for various estimators.","category":"page"},{"location":"transformations/#Standardization-a.k.a-Z-score-Normalization","page":"Data Transformations","title":"Standardization a.k.a Z-score Normalization","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Standardization, also known as Z-score normalization, is a common requirement for many machine learning techniques. 
These techniques might perform poorly if the individual features do not more or less look like standard normally distributed data.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Standardization transforms data points into corresponding standard scores by subtracting mean and scaling to unit variance.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"The standard score, also known as Z-score, is the signed number of standard deviations by which the value of an observation or data point is above the mean value of what is being observed or measured.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Standardization can be performed using t = fit(ZScoreTransform, ...) followed by StatsBase.transform(t, ...) or StatsBase.transform!(t, ...). standardize(ZScoreTransform, ...) is a shorthand to perform both operations in a single call.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"fit(::Type{ZScoreTransform}, X::AbstractArray{<:Real,2}; center::Bool=true, scale::Bool=true)","category":"page"},{"location":"transformations/#StatsAPI.fit-Tuple{Type{ZScoreTransform}, AbstractMatrix{<:Real}}","page":"Data Transformations","title":"StatsAPI.fit","text":"fit(ZScoreTransform, X; dims=nothing, center=true, scale=true)\n\nFit standardization parameters to vector or matrix X and return a ZScoreTransform transformation object.\n\nKeyword arguments\n\ndims: if 1 fit standardization parameters in column-wise fashion; if 2 fit in row-wise fashion. 
The default is nothing, which is equivalent to dims=2 with a deprecation warning.\ncenter: if true (the default) center data so that its mean is zero.\nscale: if true (the default) scale the data so that its variance is equal to one.\n\nExamples\n\njulia> using StatsBase\n\njulia> X = [0.0 -0.5 0.5; 0.0 1.0 2.0]\n2×3 Matrix{Float64}:\n 0.0 -0.5 0.5\n 0.0 1.0 2.0\n\njulia> dt = fit(ZScoreTransform, X, dims=2)\nZScoreTransform{Float64, Vector{Float64}}(2, 2, [0.0, 1.0], [0.5, 1.0])\n\njulia> StatsBase.transform(dt, X)\n2×3 Matrix{Float64}:\n 0.0 -1.0 1.0\n -1.0 0.0 1.0\n\n\n\n\n\n","category":"method"},{"location":"transformations/#Unit-Range-Normalization","page":"Data Transformations","title":"Unit Range Normalization","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Unit range normalization, also known as min-max scaling, is an alternative data transformation which scales features to lie in the interval [0; 1].","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"Unit range normalization can be performed using t = fit(UnitRangeTransform, ...) followed by StatsBase.transform(t, ...) or StatsBase.transform!(t, ...). standardize(UnitRangeTransform, ...) 
is a shorthand to perform both operations in a single call.","category":"page"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"fit(::Type{UnitRangeTransform}, X::AbstractArray{<:Real,2}; unit::Bool=true)","category":"page"},{"location":"transformations/#StatsAPI.fit-Tuple{Type{UnitRangeTransform}, AbstractMatrix{<:Real}}","page":"Data Transformations","title":"StatsAPI.fit","text":"fit(UnitRangeTransform, X; dims=nothing, unit=true)\n\nFit a scaling parameters to vector or matrix X and return a UnitRangeTransform transformation object.\n\nKeyword arguments\n\ndims: if 1 fit standardization parameters in column-wise fashion;\n\nif 2 fit in row-wise fashion. The default is nothing.\n\nunit: if true (the default) shift the minimum data to zero.\n\nExamples\n\njulia> using StatsBase\n\njulia> X = [0.0 -0.5 0.5; 0.0 1.0 2.0]\n2×3 Matrix{Float64}:\n 0.0 -0.5 0.5\n 0.0 1.0 2.0\n\njulia> dt = fit(UnitRangeTransform, X, dims=2)\nUnitRangeTransform{Float64, Vector{Float64}}(2, 2, true, [-0.5, 0.0], [1.0, 0.5])\n\njulia> StatsBase.transform(dt, X)\n2×3 Matrix{Float64}:\n 0.5 0.0 1.0\n 0.0 0.5 1.0\n\n\n\n\n\n","category":"method"},{"location":"transformations/#Methods","page":"Data Transformations","title":"Methods","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"StatsBase.transform\nStatsBase.transform!\nStatsBase.reconstruct\nStatsBase.reconstruct!\nstandardize","category":"page"},{"location":"transformations/#StatsBase.transform","page":"Data Transformations","title":"StatsBase.transform","text":"transform(t::AbstractDataTransform, x)\n\nReturn a standardized copy of vector or matrix x using transformation t.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#StatsBase.transform!","page":"Data Transformations","title":"StatsBase.transform!","text":"transform!(t::AbstractDataTransform, x)\n\nApply transformation t to vector 
or matrix x in place.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#StatsBase.reconstruct","page":"Data Transformations","title":"StatsBase.reconstruct","text":"reconstruct(t::AbstractDataTransform, y)\n\nReturn a reconstruction of an originally scaled data from a transformed vector or matrix y using transformation t.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#StatsBase.reconstruct!","page":"Data Transformations","title":"StatsBase.reconstruct!","text":"reconstruct!(t::AbstractDataTransform, y)\n\nPerform an in-place reconstruction into an original data scale from a transformed vector or matrix y using transformation t.\n\n\n\n\n\n","category":"function"},{"location":"transformations/#StatsBase.standardize","page":"Data Transformations","title":"StatsBase.standardize","text":"standardize(DT, X; dims=nothing, kwargs...)\n\nReturn a standardized copy of vector or matrix X along dimensions dims using transformation DT which is a subtype of AbstractDataTransform:\n\nZScoreTransform\nUnitRangeTransform\n\nExample\n\njulia> using StatsBase\n\njulia> standardize(ZScoreTransform, [0.0 -0.5 0.5; 0.0 1.0 2.0], dims=2)\n2×3 Matrix{Float64}:\n 0.0 -1.0 1.0\n -1.0 0.0 1.0\n\njulia> standardize(UnitRangeTransform, [0.0 -0.5 0.5; 0.0 1.0 2.0], dims=2)\n2×3 Matrix{Float64}:\n 0.5 0.0 1.0\n 0.0 0.5 1.0\n\n\n\n\n\n","category":"function"},{"location":"transformations/#Types","page":"Data Transformations","title":"Types","text":"","category":"section"},{"location":"transformations/","page":"Data Transformations","title":"Data Transformations","text":"UnitRangeTransform\nZScoreTransform","category":"page"},{"location":"transformations/#StatsBase.UnitRangeTransform","page":"Data Transformations","title":"StatsBase.UnitRangeTransform","text":"Unit range normalization\n\n\n\n\n\n","category":"type"},{"location":"transformations/#StatsBase.ZScoreTransform","page":"Data Transformations","title":"StatsBase.ZScoreTransform","text":"Standardization 
(Z-score transformation)\n\n\n\n\n\n","category":"type"},{"location":"signalcorr/#Correlation-Analysis-of-Signals","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"","category":"section"},{"location":"signalcorr/","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"The package provides functions to perform correlation analysis of sequential signals.","category":"page"},{"location":"signalcorr/#Autocovariance-and-Autocorrelation","page":"Correlation Analysis of Signals","title":"Autocovariance and Autocorrelation","text":"","category":"section"},{"location":"signalcorr/","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"autocov\nautocov!\nautocor\nautocor!","category":"page"},{"location":"signalcorr/#StatsBase.autocov","page":"Correlation Analysis of Signals","title":"StatsBase.autocov","text":"autocov(x, [lags]; demean=true)\n\nCompute the autocovariance of a vector or matrix x, optionally specifying the lags at which to compute the autocovariance. demean denotes whether the mean of x should be subtracted from x before computing the autocovariance.\n\nIf x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.\n\nWhen left unspecified, the lags used are the integers from 0 to min(size(x,1)-1, 10*log10(size(x,1))).\n\nThe output is not normalized. See autocor for a function with normalization.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.autocov!","page":"Correlation Analysis of Signals","title":"StatsBase.autocov!","text":"autocov!(r, x, lags; demean=true)\n\nCompute the autocovariance of a vector or matrix x at lags and store the result in r. 
demean denotes whether the mean of x should be subtracted from x before computing the autocovariance.\n\nIf x is a vector, r must be a vector of the same length as lags. If x is a matrix, r must be a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.\n\nThe output is not normalized. See autocor! for a method with normalization.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.autocor","page":"Correlation Analysis of Signals","title":"StatsBase.autocor","text":"autocor(x, [lags]; demean=true)\n\nCompute the autocorrelation function (ACF) of a vector or matrix x, optionally specifying the lags. demean denotes whether the mean of x should be subtracted from x before computing the ACF.\n\nIf x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.\n\nWhen left unspecified, the lags used are the integers from 0 to min(size(x,1)-1, 10*log10(size(x,1))).\n\nThe output is normalized by the variance of x, i.e. so that the lag 0 autocorrelation is 1. See autocov for the unnormalized form.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.autocor!","page":"Correlation Analysis of Signals","title":"StatsBase.autocor!","text":"autocor!(r, x, lags; demean=true)\n\nCompute the autocorrelation function (ACF) of a vector or matrix x at lags and store the result in r. demean denotes whether the mean of x should be subtracted from x before computing the ACF.\n\nIf x is a vector, r must be a vector of the same length as lags. If x is a matrix, r must be a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.\n\nThe output is normalized by the variance of x, i.e. so that the lag 0 autocorrelation is 1. See autocov! 
for the unnormalized form.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#Cross-covariance-and-Cross-correlation","page":"Correlation Analysis of Signals","title":"Cross-covariance and Cross-correlation","text":"","category":"section"},{"location":"signalcorr/","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"crosscov\ncrosscov!\ncrosscor\ncrosscor!","category":"page"},{"location":"signalcorr/#StatsBase.crosscov","page":"Correlation Analysis of Signals","title":"StatsBase.crosscov","text":"crosscov(x, y, [lags]; demean=true)\n\nCompute the cross covariance function (CCF) between real-valued vectors or matrices x and y, optionally specifying the lags. demean specifies whether the respective means of x and y should be subtracted from them before computing their CCF.\n\nIf both x and y are vectors, return a vector of the same length as lags. Otherwise, compute cross covariances between each pair of columns in x and y.\n\nWhen left unspecified, the lags used are the integers from -min(size(x,1)-1, 10*log10(size(x,1))) to min(size(x,1), 10*log10(size(x,1))).\n\nThe output is not normalized. See crosscor for a function with normalization.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.crosscov!","page":"Correlation Analysis of Signals","title":"StatsBase.crosscov!","text":"crosscov!(r, x, y, lags; demean=true)\n\nCompute the cross covariance function (CCF) between real-valued vectors or matrices x and y at lags and store the result in r. demean specifies whether the respective means of x and y should be subtracted from them before computing their CCF.\n\nIf both x and y are vectors, r must be a vector of the same length as lags. If x is a matrix and y is a vector, r must be a matrix of size (length(lags), size(x, 2)); if x is a vector and y is a matrix, r must be a matrix of size (length(lags), size(y, 2)). 
If both x and y are matrices, r must be a three-dimensional array of size (length(lags), size(x, 2), size(y, 2)).\n\nThe output is not normalized. See crosscor! for a function with normalization.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.crosscor","page":"Correlation Analysis of Signals","title":"StatsBase.crosscor","text":"crosscor(x, y, [lags]; demean=true)\n\nCompute the cross correlation between real-valued vectors or matrices x and y, optionally specifying the lags. demean specifies whether the respective means of x and y should be subtracted from them before computing their cross correlation.\n\nIf both x and y are vectors, return a vector of the same length as lags. Otherwise, compute cross correlations between each pair of columns in x and y.\n\nWhen left unspecified, the lags used are the integers from -min(size(x,1)-1, 10*log10(size(x,1))) to min(size(x,1), 10*log10(size(x,1))).\n\nThe output is normalized by sqrt(var(x)*var(y)). See crosscov for the unnormalized form.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.crosscor!","page":"Correlation Analysis of Signals","title":"StatsBase.crosscor!","text":"crosscor!(r, x, y, lags; demean=true)\n\nCompute the cross correlation between real-valued vectors or matrices x and y at lags and store the result in r. demean specifies whether the respective means of x and y should be subtracted from them before computing their cross correlation.\n\nIf both x and y are vectors, r must be a vector of the same length as lags. If x is a matrix and y is a vector, r must be a matrix of size (length(lags), size(x, 2)); if x is a vector and y is a matrix, r must be a matrix of size (length(lags), size(y, 2)). If both x and y are matrices, r must be a three-dimensional array of size (length(lags), size(x, 2), size(y, 2)).\n\nThe output is normalized by sqrt(var(x)*var(y)). See crosscov! 
for the unnormalized form.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#Partial-Autocorrelation-Function","page":"Correlation Analysis of Signals","title":"Partial Autocorrelation Function","text":"","category":"section"},{"location":"signalcorr/","page":"Correlation Analysis of Signals","title":"Correlation Analysis of Signals","text":"pacf\npacf!","category":"page"},{"location":"signalcorr/#StatsBase.pacf","page":"Correlation Analysis of Signals","title":"StatsBase.pacf","text":"pacf(X, lags; method=:regression)\n\nCompute the partial autocorrelation function (PACF) of a real-valued vector or matrix X at lags. method designates the estimation method. Recognized values are :regression, which computes the partial autocorrelations via successive regression models, and :yulewalker, which computes the partial autocorrelations using the Yule-Walker equations.\n\nIf x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x, 2)), where each column in the result corresponds to a column in x.\n\n\n\n\n\n","category":"function"},{"location":"signalcorr/#StatsBase.pacf!","page":"Correlation Analysis of Signals","title":"StatsBase.pacf!","text":"pacf!(r, X, lags; method=:regression)\n\nCompute the partial autocorrelation function (PACF) of a matrix X at lags and store the result in r. method designates the estimation method. 
Recognized values are :regression, which computes the partial autocorrelations via successive regression models, and :yulewalker, which computes the partial autocorrelations using the Yule-Walker equations.\n\nr must be a matrix of size (length(lags), size(x, 2)).\n\n\n\n\n\n","category":"function"},{"location":"counts/#Counting-Functions","page":"Counting Functions","title":"Counting Functions","text":"","category":"section"},{"location":"counts/","page":"Counting Functions","title":"Counting Functions","text":"The package provides functions to count the occurrences of distinct values.","category":"page"},{"location":"counts/#Counting-over-an-Integer-Range","page":"Counting Functions","title":"Counting over an Integer Range","text":"","category":"section"},{"location":"counts/","page":"Counting Functions","title":"Counting Functions","text":"counts\nproportions\naddcounts!(r::AbstractArray, x::AbstractArray{<:Integer}, levels::UnitRange{<:Integer})","category":"page"},{"location":"counts/#StatsBase.counts","page":"Counting Functions","title":"StatsBase.counts","text":"counts(x, [wv::AbstractWeights])\ncounts(x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])\ncounts(x, k::Integer, [wv::AbstractWeights])\n\nCount the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.\n\nIf a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.\n\nThe output is a vector of length length(levels).\n\n\n\n\n\n","category":"function"},{"location":"counts/#StatsBase.proportions","page":"Counting Functions","title":"StatsBase.proportions","text":"proportions(x, levels=span(x), [wv::AbstractWeights])\n\nReturn the proportion of values in the range levels that occur in x. 
Equivalent to counts(x, levels) / length(x).\n\nIf a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.\n\n\n\n\n\nproportions(x, k::Integer, [wv::AbstractWeights])\n\nReturn the proportion of integers in 1 to k that occur in x.\n\nIf a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.\n\n\n\n\n\n","category":"function"},{"location":"counts/#StatsBase.addcounts!-Tuple{AbstractArray, AbstractArray{<:Integer}, UnitRange{<:Integer}}","page":"Counting Functions","title":"StatsBase.addcounts!","text":"addcounts!(r, x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])\n\nAdd the number of occurrences in x of each value in levels to an existing array r. For each xi ∈ x, if xi == levels[j], then we increment r[j].\n\nIf a weighting vector wv is specified, the sum of weights is used rather than the raw counts.\n\n\n\n\n\n","category":"method"},{"location":"counts/#Counting-over-arbitrary-distinct-values","page":"Counting Functions","title":"Counting over arbitrary distinct values","text":"","category":"section"},{"location":"counts/","page":"Counting Functions","title":"Counting Functions","text":"countmap\nproportionmap\naddcounts!(cm::Dict, x::Any)","category":"page"},{"location":"counts/#StatsBase.countmap","page":"Counting Functions","title":"StatsBase.countmap","text":"countmap(x; alg = :auto)\ncountmap(x::AbstractVector, wv::AbstractVector{<:Real})\n\nReturn a dictionary mapping each unique value in x to its number of occurrences.\n\nIf a weighting vector wv is specified, the sum of weights is used rather than the raw counts.\n\nalg is only allowed for unweighted counting and can be one of:\n\n:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.\n:radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector which will generally lead to shorter 
running time for large x with many duplicates. However the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.\n:dict: use Dict-based method which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.\n\n\n\n\n\n","category":"function"},{"location":"counts/#StatsBase.proportionmap","page":"Counting Functions","title":"StatsBase.proportionmap","text":"proportionmap(x)\nproportionmap(x::AbstractVector, w::AbstractVector{<:Real})\n\nReturn a dictionary mapping each unique value in x to its proportion in x.\n\nIf a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.\n\n\n\n\n\n","category":"function"},{"location":"counts/#StatsBase.addcounts!-Tuple{Dict, Any}","page":"Counting Functions","title":"StatsBase.addcounts!","text":"addcounts!(dict, x; alg = :auto)\naddcounts!(dict, x, wv)\n\nAdd counts based on x to a count map. New entries will be added if new values come up.\n\nIf a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.\n\nalg is only allowed for unweighted counting and can be one of:\n\n:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.\n:radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector which will generally lead to shorter running time for large x with many duplicates. However the radix sort algorithm creates a copy of the input vector and hence uses more RAM. 
Choose :dict if the amount of available RAM is a limitation.\n:dict: use Dict-based method which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.\n\n\n\n\n\n","category":"method"},{"location":"scalarstats/#Scalar-Statistics","page":"Scalar Statistics","title":"Scalar Statistics","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"The package implements functions for computing various statistics over an array of scalar real numbers.","category":"page"},{"location":"scalarstats/#Weighted-sum-and-mean","page":"Scalar Statistics","title":"Weighted sum and mean","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"sum\nsum!\nwsum\nwsum!\nmean\nmean!","category":"page"},{"location":"scalarstats/#Base.sum","page":"Scalar Statistics","title":"Base.sum","text":"sum(v::AbstractArray, w::AbstractWeights{<:Real}; [dims])\n\nCompute the weighted sum of an array v with weights w, optionally over the dimension dims.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Base.sum!","page":"Scalar Statistics","title":"Base.sum!","text":"sum!(R::AbstractArray, A::AbstractArray,\n w::AbstractWeights{<:Real}, dim::Int;\n init::Bool=true)\n\nCompute the weighted sum of A with weights w over the dimension dim and store the result in R. 
If init=false, the sum is added to R rather than starting from zero.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.wsum","page":"Scalar Statistics","title":"StatsBase.wsum","text":"wsum(v, w::AbstractVector, [dim])\n\nCompute the weighted sum of an array v with weights w, optionally over the dimension dim.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.wsum!","page":"Scalar Statistics","title":"StatsBase.wsum!","text":"wsum!(R::AbstractArray, A::AbstractArray,\n w::AbstractVector, dim::Int;\n init::Bool=true)\n\nCompute the weighted sum of A with weights w over the dimension dim and store the result in R. If init=false, the sum is added to R rather than starting from zero.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.mean","page":"Scalar Statistics","title":"Statistics.mean","text":"mean(A::AbstractArray, w::AbstractWeights[, dims::Int])\n\nCompute the weighted mean of array A with weight vector w (of type AbstractWeights). 
If dims is provided, compute the weighted mean along dimension dims.\n\nExamples\n\nn = 20\nx = rand(n)\nw = rand(n)\nmean(x, weights(w))\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.mean!","page":"Scalar Statistics","title":"Statistics.mean!","text":"mean!(R::AbstractArray, A::AbstractArray, w::AbstractWeights[; dims=nothing])\n\nCompute the weighted mean of array A with weight vector w (of type AbstractWeights) along dimension dims, and write results to R.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Means","page":"Scalar Statistics","title":"Means","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"The package provides functions to compute means of different kinds.","category":"page"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"geomean\nharmmean\ngenmean","category":"page"},{"location":"scalarstats/#StatsBase.geomean","page":"Scalar Statistics","title":"StatsBase.geomean","text":"geomean(a)\n\nReturn the geometric mean of a collection.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.harmmean","page":"Scalar Statistics","title":"StatsBase.harmmean","text":"harmmean(a)\n\nReturn the harmonic mean of a collection.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.genmean","page":"Scalar Statistics","title":"StatsBase.genmean","text":"genmean(a, p)\n\nReturn the generalized/power mean with exponent p of a real-valued array, i.e. \\left( \\frac{1}{n} \\sum_{i=1}^n a_i^p \\right)^{\\frac{1}{p}}, where n = length(a). 
It is taken to be the geometric mean when p == 0.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Moments-and-cumulants","page":"Scalar Statistics","title":"Moments and cumulants","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"var\nstd\nmean_and_var\nmean_and_std\nskewness\nkurtosis\nmoment\ncumulant","category":"page"},{"location":"scalarstats/#Statistics.var","page":"Scalar Statistics","title":"Statistics.var","text":"var(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)\n\nCompute the variance of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample variance is defined as:\n\n\\frac{1}{\\sum w} \\sum_{i=1}^n w_i \\left(x_i - μ\\right)^2\n\nwhere n is the length of the input and μ is the mean. The unbiased estimate (when corrected=true) of the population variance is computed by replacing \\frac{1}{\\sum w} with a factor dependent on the type of weights used:\n\nAnalyticWeights: \\frac{1}{\\sum w - \\sum w^2 / \\sum w}\nFrequencyWeights: \\frac{1}{\\sum w - 1}\nProbabilityWeights: \\frac{n}{(n - 1) \\sum w} where n equals count(!iszero, w)\nWeights: ArgumentError (bias correction not supported)\n\n\n\n\n\nvar(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute the variance of the vector x using the estimator ce.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.std","page":"Scalar Statistics","title":"Statistics.std","text":"std(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)\n\nCompute the standard deviation of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample standard deviation is defined as:\n\n\\sqrt{\\frac{1}{\\sum w} \\sum_{i=1}^n w_i \\left(x_i - μ\\right)^2}\n\nwhere n is the length of the input and μ is the mean. 
The unbiased estimate (when corrected=true) of the population standard deviation is computed by replacing \\frac{1}{\\sum w} with a factor dependent on the type of weights used:\n\nAnalyticWeights: \\frac{1}{\\sum w - \\sum w^2 / \\sum w}\nFrequencyWeights: \\frac{1}{\\sum w - 1}\nProbabilityWeights: \\frac{n}{(n - 1) \\sum w} where n equals count(!iszero, w)\nWeights: ArgumentError (bias correction not supported)\n\n\n\n\n\nstd(ce::CovarianceEstimator, x::AbstractVector; mean=nothing)\n\nCompute the standard deviation of the vector x using the estimator ce.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.mean_and_var","page":"Scalar Statistics","title":"StatsBase.mean_and_var","text":"mean_and_var(x, [w::AbstractWeights], [dim]; corrected=true) -> (mean, var)\n\nReturn the mean and variance of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the variance calculation if corrected=true. See var documentation for more details.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.mean_and_std","page":"Scalar Statistics","title":"StatsBase.mean_and_std","text":"mean_and_std(x, [w::AbstractWeights], [dim]; corrected=true) -> (mean, std)\n\nReturn the mean and standard deviation of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the standard deviation calculation if corrected=true. 
See std documentation for more details.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.skewness","page":"Scalar Statistics","title":"StatsBase.skewness","text":"skewness(v, [wv::AbstractWeights], m=mean(v))\n\nCompute the standardized skewness of a real-valued array v, optionally specifying a weighting vector wv and a center m.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.kurtosis","page":"Scalar Statistics","title":"StatsBase.kurtosis","text":"kurtosis(v, [wv::AbstractWeights], m=mean(v))\n\nCompute the excess kurtosis of a real-valued array v, optionally specifying a weighting vector wv and a center m.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.moment","page":"Scalar Statistics","title":"StatsBase.moment","text":"moment(v, k, [wv::AbstractWeights], m=mean(v))\n\nReturn the kth order central moment of a real-valued array v, optionally specifying a weighting vector wv and a center m.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.cumulant","page":"Scalar Statistics","title":"StatsBase.cumulant","text":"cumulant(v, k, [wv::AbstractWeights], m=mean(v))\n\nReturn the kth order cumulant of a real-valued array v, optionally specifying a weighting vector wv and a pre-computed mean m.\n\nIf k is a range of Integers, then return all the cumulants of orders in this range as a vector.\n\nThis quantity is calculated using a recursive definition on lower-order cumulants and central moments.\n\nReference: Smith, P. J. 1995. A Recursive Formulation of the Old Problem of Obtaining Moments from Cumulants and Vice Versa. The American Statistician, 49(2), 217–218. 
https://doi.org/10.2307/2684642\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Measurements-of-Variation","page":"Scalar Statistics","title":"Measurements of Variation","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"span\nvariation\nsem\nmad\nmad!","category":"page"},{"location":"scalarstats/#StatsBase.span","page":"Scalar Statistics","title":"StatsBase.span","text":"span(x)\n\nReturn the span of a collection, i.e. the range minimum(x):maximum(x). The minimum and maximum of x are computed in one pass using extrema.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.variation","page":"Scalar Statistics","title":"StatsBase.variation","text":"variation(x, m=mean(x); corrected=true)\n\nReturn the coefficient of variation of collection x, optionally specifying a precomputed mean m, and the optional correction parameter corrected. The coefficient of variation is the ratio of the standard deviation to the mean. If corrected is false, then std is calculated with denominator n. Else, the std is calculated with denominator n-1.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.sem","page":"Scalar Statistics","title":"StatsBase.sem","text":"sem(x; mean=nothing)\nsem(x::AbstractArray[, weights::AbstractWeights]; mean=nothing)\n\nReturn the standard error of the mean for a collection x. A pre-computed mean may be provided.\n\nWhen not using weights, this is the (sample) standard deviation divided by the square root of the sample size. 
If weights are used, the variance of the sample mean is calculated as follows:\n\nAnalyticWeights: Not implemented.\nFrequencyWeights: \\frac{\\sum_{i=1}^n w_i (x_i - \\bar{x}_i)^2}{(\\sum w_i) (\\sum w_i - 1)}\nProbabilityWeights: \\frac{n}{n-1} \\frac{\\sum_{i=1}^n w_i^2 (x_i - \\bar{x}_i)^2}{\\left( \\sum w_i \\right)^2}\n\nThe standard error is then the square root of the above quantities.\n\nReferences\n\nCarl-Erik Särndal, Bengt Swensson, Jan Wretman (1992). Model Assisted Survey Sampling. New York: Springer. pp. 51-53.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.mad","page":"Scalar Statistics","title":"StatsBase.mad","text":"mad(x; center=median(x), normalize=true)\n\nCompute the median absolute deviation (MAD) of collection x around center (by default, around the median).\n\nIf normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.mad!","page":"Scalar Statistics","title":"StatsBase.mad!","text":"StatsBase.mad!(x; center=median!(x), normalize=true)\n\nCompute the median absolute deviation (MAD) of array x around center (by default, around the median), overwriting x in the process.\n\nIf normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Z-scores","page":"Scalar Statistics","title":"Z-scores","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"zscore\nzscore!","category":"page"},{"location":"scalarstats/#StatsBase.zscore","page":"Scalar Statistics","title":"StatsBase.zscore","text":"zscore(X, [μ, σ])\n\nCompute the z-scores of X, optionally specifying 
a precomputed mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. (x - μ) / σ.\n\nμ and σ should be both scalars or both arrays. The computation is broadcasting. In particular, when μ and σ are arrays, they should have the same size, and size(μ, i) == 1 || size(μ, i) == size(X, i) for each dimension.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.zscore!","page":"Scalar Statistics","title":"StatsBase.zscore!","text":"zscore!([Z], X, μ, σ)\n\nCompute the z-scores of an array X with mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. (x - μ) / σ.\n\nIf a destination array Z is provided, the scores are stored in Z and it must have the same shape as X. Otherwise X is overwritten.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Entropy-and-Related-Functions","page":"Scalar Statistics","title":"Entropy and Related Functions","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"entropy\nrenyientropy\ncrossentropy\nkldivergence","category":"page"},{"location":"scalarstats/#StatsBase.entropy","page":"Scalar Statistics","title":"StatsBase.entropy","text":"entropy(p, [b])\n\nCompute the entropy of a collection of probabilities p, optionally specifying a real number b such that the entropy is scaled by 1/log(b). 
Elements with probability 0 or 1 add 0 to the entropy.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.renyientropy","page":"Scalar Statistics","title":"StatsBase.renyientropy","text":"renyientropy(p, α)\n\nCompute the Rényi (generalized) entropy of order α of an array p.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.crossentropy","page":"Scalar Statistics","title":"StatsBase.crossentropy","text":"crossentropy(p, q, [b])\n\nCompute the cross entropy between p and q, optionally specifying a real number b such that the result is scaled by 1/log(b).\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.kldivergence","page":"Scalar Statistics","title":"StatsBase.kldivergence","text":"kldivergence(p, q, [b])\n\nCompute the Kullback-Leibler divergence from q to p, also called the relative entropy of p with respect to q, that is the sum pᵢ * log(pᵢ / qᵢ). Optionally a real number b can be specified such that the divergence is scaled by 1/log(b).\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Quantile-and-Related-Functions","page":"Scalar Statistics","title":"Quantile and Related Functions","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"percentile\niqr\nnquantile\nquantile\nStatistics.median(v::AbstractVector{<:Real}, w::AbstractWeights{<:Real})\nquantilerank\npercentilerank","category":"page"},{"location":"scalarstats/#StatsBase.percentile","page":"Scalar Statistics","title":"StatsBase.percentile","text":"percentile(x, p)\n\nReturn the pth percentile of a collection x, i.e. quantile(x, p / 100).\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.iqr","page":"Scalar Statistics","title":"StatsBase.iqr","text":"iqr(x)\n\nCompute the interquartile range (IQR) of collection x, i.e. 
the 75th percentile minus the 25th percentile.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.nquantile","page":"Scalar Statistics","title":"StatsBase.nquantile","text":"nquantile(x, n::Integer)\n\nReturn the n-quantiles of collection x, i.e. the values which partition x into n subsets of nearly equal size.\n\nEquivalent to quantile(x, (0:n)/n). For example, nquantile(x, 5) returns a vector of quantiles, respectively at [0.0, 0.2, 0.4, 0.6, 0.8, 1.0].\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.quantile","page":"Scalar Statistics","title":"Statistics.quantile","text":"quantile(v, w::AbstractWeights, p)\n\nCompute the weighted quantiles of a vector v at a specified set of probability values p, using weights given by a weight vector w (of type AbstractWeights). Weights must not be negative. The weights and data vectors must have the same length. NaN is returned if v contains any NaN values. An error is raised if w contains any NaN values.\n\nWith FrequencyWeights, the function returns the same result as quantile for a vector with repeated values. Weights must be integers.\n\nWith weights other than FrequencyWeights, let N be the length of the vector, w the vector of weights, h = p (sum_{i ≤ N} w_i - w_1) + w_1 the cumulative weight corresponding to the probability p, and S_k = sum_{i ≤ k} w_i the cumulative weight for each observation. Define v_{k+1} as the smallest element of v such that S_{k+1} is strictly greater than h. The weighted p quantile is given by v_k + γ (v_{k+1} - v_k) with γ = (h - S_k) / (S_{k+1} - S_k). In particular, when all weights are equal, the function returns the same result as the unweighted quantile.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Statistics.median-Tuple{AbstractVector{<:Real}, AbstractWeights}","page":"Scalar Statistics","title":"Statistics.median","text":"median(v::AbstractVector{<:Real}, w::AbstractWeights)\n\nCompute the weighted median of v with weights w (of type AbstractWeights). 
See the documentation for quantile for more details.\n\n\n\n\n\n","category":"method"},{"location":"scalarstats/#StatsBase.quantilerank","page":"Scalar Statistics","title":"StatsBase.quantilerank","text":"quantilerank(itr, value; method=:inc)\n\nCompute the quantile position in the [0, 1] interval of value relative to collection itr.\n\nDifferent definitions can be chosen via the method keyword argument. Let count_less be the number of elements of itr that are less than value, count_equal the number of elements of itr that are equal to value, n the length of itr, greatest_smaller the highest value below value and smallest_greater the lowest value above value. Then method supports the following definitions:\n\n:inc (default): Return a value in the range 0 to 1 inclusive.\n\nReturn count_less / (n - 1) if value ∈ itr, otherwise apply interpolation based on definition 7 of quantile in Hyndman and Fan (1996) (equivalent to Excel PERCENTRANK and PERCENTRANK.INC). This definition corresponds to the lower semi-continuous inverse of quantile with its default parameters.\n\n:exc: Return a value in the range 0 to 1 exclusive.\n\nReturn (count_less + 1) / (n + 1) if value ∈ itr otherwise apply interpolation based on definition 6 of quantile in Hyndman and Fan (1996) (equivalent to Excel PERCENTRANK.EXC).\n\n:compete: Return count_less / (n - 1) if value ∈ itr, otherwise\n\nreturn (count_less - 1) / (n - 1), without interpolation (equivalent to MariaDB PERCENT_RANK, dplyr percent_rank).\n\n:tied: Return (count_less + count_equal/2) / n, without interpolation.\n\nBased on the definition in Roscoe, J. T. 
(1975) (equivalent to \"mean\" kind of SciPy percentileofscore).\n\n:strict: Return count_less / n, without interpolation\n\n(equivalent to \"strict\" kind of SciPy percentileofscore).\n\n:weak: Return (count_less + count_equal) / n, without interpolation\n\n(equivalent to \"weak\" kind of SciPy percentileofscore).\n\nnote: Note\nAn ArgumentError is thrown if itr contains NaN or missing values or if itr contains fewer than two elements.\n\nReferences\n\nRoscoe, J. T. (1975). \"Fundamental Research Statistics for the Behavioral Sciences\", 2nd ed., New York: Holt, Rinehart and Winston.\n\nHyndman, R. J. and Fan, Y. (1996). \"Sample Quantiles in Statistical Packages\", The American Statistician, Vol. 50, No. 4, pp. 361-365.\n\nExamples\n\njulia> using StatsBase\n\njulia> v1 = [1, 1, 1, 2, 3, 4, 8, 11, 12, 13];\n\njulia> v2 = [1, 2, 3, 5, 6, missing, 8];\n\njulia> v3 = [1, 2, 3, 4, 4, 5, 6, 7, 8, 9];\n\njulia> quantilerank(v1, 2)\n0.3333333333333333\n\njulia> quantilerank(v1, 2, method=:exc), quantilerank(v1, 2, method=:tied)\n(0.36363636363636365, 0.35)\n\n# use `skipmissing` for vectors with missing entries.\njulia> quantilerank(skipmissing(v2), 4)\n0.5\n\n# use broadcasting with `Ref` to compute quantile rank for multiple values\njulia> quantilerank.(Ref(v3), [4, 8])\n2-element Vector{Float64}:\n 0.3333333333333333\n 0.8888888888888888\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.percentilerank","page":"Scalar Statistics","title":"StatsBase.percentilerank","text":"percentilerank(itr, value; method=:inc)\n\nReturn the qth percentile of value in collection itr, i.e. 
quantilerank(itr, value) * 100.\n\nSee the quantilerank docstring for more details.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Mode-and-Modes","page":"Scalar Statistics","title":"Mode and Modes","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"mode\nmodes","category":"page"},{"location":"scalarstats/#StatsBase.mode","page":"Scalar Statistics","title":"StatsBase.mode","text":"mode(a, [r])\nmode(a::AbstractArray, wv::AbstractWeights)\n\nReturn the mode (most common number) of an array, optionally over a specified range r or weighted via a vector wv. If several modes exist, the first one (in order of appearance) is returned.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#StatsBase.modes","page":"Scalar Statistics","title":"StatsBase.modes","text":"modes(a, [r])::Vector\nmodes(a::AbstractArray, wv::AbstractWeights)::Vector\n\nReturn all modes (most common numbers) of an array, optionally over a specified range r or weighted via a vector wv.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Summary-Statistics","page":"Scalar Statistics","title":"Summary Statistics","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"summarystats\ndescribe","category":"page"},{"location":"scalarstats/#StatsBase.summarystats","page":"Scalar Statistics","title":"StatsBase.summarystats","text":"summarystats(a)\n\nCompute summary statistics for a real-valued array a. 
Returns a SummaryStats object containing the number of observations, number of missing observations, standard deviation, mean, minimum, 25th percentile, median, 75th percentile, and maximum.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#DataAPI.describe","page":"Scalar Statistics","title":"DataAPI.describe","text":"describe(a)\n\nPretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.\n\n\n\n\n\n","category":"function"},{"location":"scalarstats/#Reliability-Measures","page":"Scalar Statistics","title":"Reliability Measures","text":"","category":"section"},{"location":"scalarstats/","page":"Scalar Statistics","title":"Scalar Statistics","text":"cronbachalpha","category":"page"},{"location":"scalarstats/#StatsBase.cronbachalpha","page":"Scalar Statistics","title":"StatsBase.cronbachalpha","text":"cronbachalpha(covmatrix::AbstractMatrix{<:Real})\n\nCalculate Cronbach's alpha (1951) from a covariance matrix covmatrix according to the formula:\n\nρ = [k / (k - 1)] * (1 - [sum_{i=1}^k σ_i^2] / [sum_{i=1}^k sum_{j=1}^k σ_ij])\n\nwhere k is the number of items, i.e. columns, σ_i^2 the item variance, and σ_ij the inter-item covariance.\n\nReturns a CronbachAlpha object that holds:\n\nalpha: the Cronbach's alpha score for all items, i.e. columns, in covmatrix; and\ndropped: a vector giving Cronbach's alpha scores if a specific item, i.e. 
column, is dropped from covmatrix.\n\nExample\n\njulia> using StatsBase\n\njulia> cov_X = [10 6 6 6;\n 6 11 6 6;\n 6 6 12 6;\n 6 6 6 13];\n\njulia> cronbachalpha(cov_X)\nCronbach's alpha for all items: 0.8136\n\nCronbach's alpha if an item is dropped:\nitem 1: 0.7500\nitem 2: 0.7606\nitem 3: 0.7714\nitem 4: 0.7826\n\n\n\n\n\n","category":"function"},{"location":"ranking/#Rankings-and-Rank-Correlations","page":"Rankings and Rank Correlations","title":"Rankings and Rank Correlations","text":"","category":"section"},{"location":"ranking/","page":"Rankings and Rank Correlations","title":"Rankings and Rank Correlations","text":"This package implements various strategies for computing ranks and rank correlations.","category":"page"},{"location":"ranking/","page":"Rankings and Rank Correlations","title":"Rankings and Rank Correlations","text":"ordinalrank\ncompeterank\ndenserank\ntiedrank\ncorspearman\ncorkendall","category":"page"},{"location":"ranking/#StatsBase.ordinalrank","page":"Rankings and Rank Correlations","title":"StatsBase.ordinalrank","text":"ordinalrank(x; lt=isless, by=identity, rev::Bool=false, ...)\n\nReturn the ordinal ranking (\"1234\" ranking) of an array. Supports the same keyword arguments as the sort function. All items in x are given distinct, successive ranks based on their position in the sorted vector. Missing values are assigned rank missing.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.competerank","page":"Rankings and Rank Correlations","title":"StatsBase.competerank","text":"competerank(x; lt=isless, by=identity, rev::Bool=false, ...)\n\nReturn the standard competition ranking (\"1224\" ranking) of an array. Supports the same keyword arguments as the sort function. Equal (\"tied\") items are given the same rank, and the next rank comes after a gap that is equal to the number of tied items - 1. 
Missing values are assigned rank missing.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.denserank","page":"Rankings and Rank Correlations","title":"StatsBase.denserank","text":"denserank(x; lt=isless, by=identity, rev::Bool=false, ...)\n\nReturn the dense ranking (\"1223\" ranking) of an array. Supports the same keyword arguments as the sort function. Equal items receive the same rank, and the next subsequent rank is assigned with no gap. Missing values are assigned rank missing.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.tiedrank","page":"Rankings and Rank Correlations","title":"StatsBase.tiedrank","text":"tiedrank(x; lt=isless, by=identity, rev::Bool=false, ...)\n\nReturn the tied ranking, also called fractional or \"1 2.5 2.5 4\" ranking, of an array. Supports the same keyword arguments as the sort function. Equal (\"tied\") items receive the mean of the ranks they would have been assigned under the ordinal ranking (see ordinalrank). Missing values are assigned rank missing.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.corspearman","page":"Rankings and Rank Correlations","title":"StatsBase.corspearman","text":"corspearman(x, y=x)\n\nCompute Spearman's rank correlation coefficient. If x and y are vectors, the output is a float, otherwise it's a matrix corresponding to the pairwise correlations of the columns of x and y.\n\n\n\n\n\n","category":"function"},{"location":"ranking/#StatsBase.corkendall","page":"Rankings and Rank Correlations","title":"StatsBase.corkendall","text":"corkendall(x, y=x)\n\nCompute Kendall's rank correlation coefficient, τ. 
x and y must both be either matrices or vectors.\n\n\n\n\n\n","category":"function"},{"location":"robust/#Robust-Statistics","page":"Robust Statistics","title":"Robust Statistics","text":"","category":"section"},{"location":"robust/","page":"Robust Statistics","title":"Robust Statistics","text":"trim\ntrim!\nwinsor\nwinsor!\ntrimvar","category":"page"},{"location":"robust/#StatsBase.trim","page":"Robust Statistics","title":"StatsBase.trim","text":"trim(x::AbstractVector; prop=0.0, count=0)\n\nReturn an iterator of all elements of x that omits either count or proportion prop of the highest and lowest elements.\n\nThe number of trimmed elements could be smaller than specified if several elements equal the lower or upper bound.\n\nTo compute the trimmed mean of x use mean(trim(x)); to compute the variance use trimvar(x) (see trimvar).\n\nExample\n\njulia> collect(trim([5,2,4,3,1], prop=0.2))\n3-element Array{Int64,1}:\n 2\n 4\n 3\n\n\n\n\n\n","category":"function"},{"location":"robust/#StatsBase.trim!","page":"Robust Statistics","title":"StatsBase.trim!","text":"trim!(x::AbstractVector; prop=0.0, count=0)\n\nA variant of trim that modifies x in place.\n\n\n\n\n\n","category":"function"},{"location":"robust/#StatsBase.winsor","page":"Robust Statistics","title":"StatsBase.winsor","text":"winsor(x::AbstractVector; prop=0.0, count=0)\n\nReturn an iterator of all elements of x that replaces either count or proportion prop of the highest elements with the previous-highest element and an equal number of the lowest elements with the next-lowest element.\n\nThe number of replaced elements could be smaller than specified if several elements equal the lower or upper bound.\n\nTo compute the Winsorized mean of x use mean(winsor(x)).\n\nExample\n\njulia> collect(winsor([5,2,3,4,1], prop=0.2))\n5-element Array{Int64,1}:\n 4\n 2\n 3\n 4\n 2\n\n\n\n\n\n","category":"function"},{"location":"robust/#StatsBase.winsor!","page":"Robust 
Statistics","title":"StatsBase.winsor!","text":"winsor!(x::AbstractVector; prop=0.0, count=0)\n\nA variant of winsor that modifies vector x in place.\n\n\n\n\n\n","category":"function"},{"location":"robust/#StatsBase.trimvar","page":"Robust Statistics","title":"StatsBase.trimvar","text":"trimvar(x; prop=0.0, count=0)\n\nCompute the variance of the trimmed mean of x. This function uses the Winsorized variance, as described in Wilcox (2010).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#Sampling-from-Population","page":"Sampling from Population","title":"Sampling from Population","text":"","category":"section"},{"location":"sampling/#Sampling-API","page":"Sampling from Population","title":"Sampling API","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"The package provides functions for sampling from a given population (with or without replacement).","category":"page"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"sample\nsample!\nwsample\nwsample!","category":"page"},{"location":"sampling/#StatsBase.sample","page":"Sampling from Population","title":"StatsBase.sample","text":"sample([rng], a, [wv::AbstractWeights])\n\nSelect a single random element of a. Sampling probabilities are proportional to the weights given in wv, if provided.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nsample([rng], a, [wv::AbstractWeights], n::Integer; replace=true, ordered=false)\n\nSelect a random, optionally weighted sample of size n from an array a using a polyalgorithm. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. 
a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nsample([rng], a, [wv::AbstractWeights], dims::Dims; replace=true, ordered=false)\n\nSelect a random, optionally weighted sample from an array a specifying the dimensions dims of the output array. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nsample([rng], wv::AbstractWeights)\n\nSelect a single random integer in 1:length(wv) with probabilities proportional to the weights given in wv.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.sample!","page":"Sampling from Population","title":"StatsBase.sample!","text":"sample!([rng], a, [wv::AbstractWeights], x; replace=true, ordered=false)\n\nDraw a random sample of length(x) elements from an array a and store the result in x. A polyalgorithm is used for sampling. Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. 
a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\nOutput array x must not be the same object as a or wv, nor share memory with them, or the result may be incorrect.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.wsample","page":"Sampling from Population","title":"StatsBase.wsample","text":"wsample([rng], [a], w)\n\nSelect a weighted random sample of size 1 from a with probabilities proportional to the weights given in w. If a is not present, select a random weight from w.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nwsample([rng], [a], w, n::Integer; replace=true, ordered=false)\n\nSelect a weighted random sample of size n from a with probabilities proportional to the weights given in w if a is present, otherwise select a random sample of size n of the weights given in w. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nwsample([rng], [a], w, dims::Dims; replace=true, ordered=false)\n\nSelect a weighted random sample from a with probabilities proportional to the weights given in w if a is present, otherwise select a random sample of the weights given in w. 
The dimensions of the output are given by dims.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.wsample!","page":"Sampling from Population","title":"StatsBase.wsample!","text":"wsample!([rng], a, w, x; replace=true, ordered=false)\n\nSelect a weighted sample from an array a and store the result in x. Sampling probabilities are proportional to the weights given in w. replace dictates whether sampling is performed with replacement. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in a) should be taken.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#Algorithms","page":"Sampling from Population","title":"Algorithms","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"Internally, this package implements multiple algorithms, and the sample (and sample!) methods integrate them into a polyalgorithm, which chooses a specific algorithm based on the inputs.","category":"page"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"Note that the choices made in sample are based on extensive benchmarking (see perf/sampling.jl and perf/wsampling.jl). It is reasonably fast in most cases. That being said, if you know that a certain algorithm is particularly suitable for your context, directly calling an internal algorithm function might be slightly more efficient.","category":"page"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"Here is a list of the algorithms implemented in the package. 
The functions below are not exported (one can still import them from StatsBase via using though).","category":"page"},{"location":"sampling/#Notations","page":"Sampling from Population","title":"Notations","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"a: source array representing the population\nx: the destination array\nwv: the weight vector (of type AbstractWeights), for weighted sampling\nn: the length of a\nk: the length of x. For sampling without replacement, k must not exceed n.\nrng: optional random number generator (defaults to Random.default_rng() on Julia >= 1.3 and Random.GLOBAL_RNG on Julia < 1.3)","category":"page"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"All following functions write results to x (pre-allocated) and return x.","category":"page"},{"location":"sampling/#Sampling-Algorithms-(Non-Weighted)","page":"Sampling from Population","title":"Sampling Algorithms (Non-Weighted)","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"StatsBase.direct_sample!(rng::Random.AbstractRNG, a::AbstractArray, x::AbstractArray)\nsamplepair\nStatsBase.knuths_sample!\nStatsBase.fisher_yates_sample!\nStatsBase.self_avoid_sample!\nStatsBase.seqsample_a!\nStatsBase.seqsample_c!\nStatsBase.seqsample_d!","category":"page"},{"location":"sampling/#StatsBase.direct_sample!-Tuple{AbstractRNG, AbstractArray, AbstractArray}","page":"Sampling from Population","title":"StatsBase.direct_sample!","text":"direct_sample!([rng], a::AbstractArray, x::AbstractArray)\n\nDirect sampling: for each j in 1:k, randomly pick i from 1:n, and set x[j] = a[i], with n=length(a) and k=length(x).\n\nThis algorithm consumes k random numbers.\n\n\n\n\n\n","category":"method"},{"location":"sampling/#StatsBase.samplepair","page":"Sampling from 
Population","title":"StatsBase.samplepair","text":"samplepair([rng], n)\n\nDraw a pair of distinct integers between 1 and n without replacement.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\nsamplepair([rng], a)\n\nDraw a pair of distinct elements from the array a without replacement.\n\nOptionally specify a random number generator rng as the first argument (defaults to Random.default_rng()).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.knuths_sample!","page":"Sampling from Population","title":"StatsBase.knuths_sample!","text":"knuths_sample!([rng], a, x)\n\nKnuth's Algorithm S for random sampling without replacement.\n\nReference: D. Knuth. The Art of Computer Programming. Vol 2, 3.4.2, p.142.\n\nThis algorithm consumes length(a) random numbers. It requires no additional memory space. Suitable for the case where memory is tight.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.fisher_yates_sample!","page":"Sampling from Population","title":"StatsBase.fisher_yates_sample!","text":"fisher_yates_sample!([rng], a::AbstractArray, x::AbstractArray)\n\nFisher-Yates shuffling (with early termination).\n\nPseudo-code:\n\nn = length(a)\nk = length(x)\n\n# Create an array of the indices\ninds = collect(1:n)\n\nfor i = 1:k\n # swap element `i` with another random element in inds[i:n]\n # set element `i` in `x`\nend\n\nThis algorithm consumes k=length(x) random numbers. It uses an integer array of length n=length(a) internally to maintain the shuffled indices. It is considerably faster than Knuth's algorithm especially when n is greater than k. 
It is O(n) for initialization, plus O(k) for random shuffling.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.self_avoid_sample!","page":"Sampling from Population","title":"StatsBase.self_avoid_sample!","text":"self_avoid_sample!([rng], a::AbstractArray, x::AbstractArray)\n\nSelf-avoid sampling: use a set to track the indices that have already been sampled. Each time a new index is drawn, if it has already been sampled, redraw until an unsampled one is obtained.\n\nThis algorithm consumes about (or slightly more than) k=length(x) random numbers, and requires O(k) memory to store the set of sampled indices. Very fast when n ≫ k, with n=length(a).\n\nHowever, if k is large and approaches n, the rejection rate increases drastically, resulting in poorer performance.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.seqsample_a!","page":"Sampling from Population","title":"StatsBase.seqsample_a!","text":"seqsample_a!([rng], a::AbstractArray, x::AbstractArray)\n\nRandom subsequence sampling using algorithm A described in the following paper (page 714): Jeffrey Scott Vitter. \"Faster Methods for Random Sampling\". Communications of the ACM, 27 (7), July 1984.\n\nThis algorithm consumes O(n) random numbers, with n=length(a). The outputs are ordered.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.seqsample_c!","page":"Sampling from Population","title":"StatsBase.seqsample_c!","text":"seqsample_c!([rng], a::AbstractArray, x::AbstractArray)\n\nRandom subsequence sampling using algorithm C described in the following paper (page 715): Jeffrey Scott Vitter. \"Faster Methods for Random Sampling\". Communications of the ACM, 27 (7), July 1984.\n\nThis algorithm consumes O(k^2) random numbers, with k=length(x). 
The outputs are ordered.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.seqsample_d!","page":"Sampling from Population","title":"StatsBase.seqsample_d!","text":"seqsample_d!([rng], a::AbstractArray, x::AbstractArray)\n\nRandom subsequence sampling using algorithm D described in the following paper (page 716-17): Jeffrey Scott Vitter. \"Faster Methods for Random Sampling\". Communications of the ACM, 27 (7), July 1984.\n\nThis algorithm consumes O(k) random numbers, with k=length(x). The outputs are ordered.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#Weighted-Sampling-Algorithms","page":"Sampling from Population","title":"Weighted Sampling Algorithms","text":"","category":"section"},{"location":"sampling/","page":"Sampling from Population","title":"Sampling from Population","text":"StatsBase.direct_sample!(rng::Random.AbstractRNG, a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\nStatsBase.alias_sample!\nStatsBase.naive_wsample_norep!\nStatsBase.efraimidis_a_wsample_norep!\nStatsBase.efraimidis_ares_wsample_norep!","category":"page"},{"location":"sampling/#StatsBase.direct_sample!-Tuple{AbstractRNG, AbstractArray, AbstractWeights, AbstractArray}","page":"Sampling from Population","title":"StatsBase.direct_sample!","text":"direct_sample!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nDirect sampling.\n\nDraw each sample by scanning the weight vector.\n\nNoting k=length(x) and n=length(a), this algorithm:\n\nconsumes k random numbers\nhas time complexity O(n k), as scanning the weight vector each time takes O(n)\nrequires no additional memory space.\n\n\n\n\n\n","category":"method"},{"location":"sampling/#StatsBase.alias_sample!","page":"Sampling from Population","title":"StatsBase.alias_sample!","text":"alias_sample!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nAlias method.\n\nBuild an alias table, and sample therefrom.\n\nReference: Walker, A. J. 
\"An Efficient Method for Generating Discrete Random Variables with General Distributions.\" ACM Transactions on Mathematical Software 3 (3): 253, 1977.\n\nNoting k=length(x) and n=length(a), this algorithm takes O(n) time for building the alias table, and then O(1) to draw each sample. It consumes k random numbers.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.naive_wsample_norep!","page":"Sampling from Population","title":"StatsBase.naive_wsample_norep!","text":"naive_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nNaive implementation of weighted sampling without replacement.\n\nIt makes a copy of the weight vector at initialization, and sets the weight to zero when the corresponding sample is picked.\n\nNoting k=length(x) and n=length(a), this algorithm consumes O(k) random numbers, and has overall time complexity O(n k).\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.efraimidis_a_wsample_norep!","page":"Sampling from Population","title":"StatsBase.efraimidis_a_wsample_norep!","text":"efraimidis_a_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nWeighted sampling without replacement using Efraimidis-Spirakis A algorithm.\n\nReference: Efraimidis, P. S., Spirakis, P. G. \"Weighted random sampling with a reservoir.\" Information Processing Letters, 97 (5), 181-185, 2006. doi:10.1016/j.ipl.2005.11.003.\n\nNoting k=length(x) and n=length(a), this algorithm takes O(n + k log k) processing time to draw k elements. It consumes n random numbers.\n\n\n\n\n\n","category":"function"},{"location":"sampling/#StatsBase.efraimidis_ares_wsample_norep!","page":"Sampling from Population","title":"StatsBase.efraimidis_ares_wsample_norep!","text":"efraimidis_ares_wsample_norep!([rng], a::AbstractArray, wv::AbstractWeights, x::AbstractArray)\n\nImplementation of weighted sampling without replacement using Efraimidis-Spirakis A-Res algorithm.\n\nReference: Efraimidis, P. 
S., Spirakis, P. G. \"Weighted random sampling with a reservoir.\" Information Processing Letters, 97 (5), 181-185, 2006. doi:10.1016/j.ipl.2005.11.003.\n\nNoting k=length(x) and n=length(a), this algorithm takes O(k log(k) log(n k)) processing time to draw k elements. It consumes n random numbers.\n\n\n\n\n\n","category":"function"},{"location":"deviation/#Computing-Deviations","page":"Computing Deviations","title":"Computing Deviations","text":"","category":"section"},{"location":"deviation/","page":"Computing Deviations","title":"Computing Deviations","text":"This package provides functions to compute various deviations between arrays in a variety of ways:","category":"page"},{"location":"deviation/","page":"Computing Deviations","title":"Computing Deviations","text":"counteq\ncountne\nsqL2dist\nL2dist\nL1dist\nLinfdist\ngkldiv\nmeanad\nmaxad\nmsd\nrmsd\npsnr","category":"page"},{"location":"deviation/#StatsBase.counteq","page":"Computing Deviations","title":"StatsBase.counteq","text":"counteq(a, b)\n\nCount the number of indices at which the elements of the arrays a and b are equal.\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.countne","page":"Computing Deviations","title":"StatsBase.countne","text":"countne(a, b)\n\nCount the number of indices at which the elements of the arrays a and b are not equal.\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.sqL2dist","page":"Computing Deviations","title":"StatsBase.sqL2dist","text":"sqL2dist(a, b)\n\nCompute the squared L2 distance between two arrays: sum_i=1^n a_i - b_i^2. Efficient equivalent of sum(abs2, a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.L2dist","page":"Computing Deviations","title":"StatsBase.L2dist","text":"L2dist(a, b)\n\nCompute the L2 distance between two arrays: sqrtsum_i=1^n a_i - b_i^2. 
Efficient equivalent of sqrt(sum(abs2, a - b)).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.L1dist","page":"Computing Deviations","title":"StatsBase.L1dist","text":"L1dist(a, b)\n\nCompute the L1 distance between two arrays: sum_i=1^n a_i - b_i. Efficient equivalent of sum(abs, a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.Linfdist","page":"Computing Deviations","title":"StatsBase.Linfdist","text":"Linfdist(a, b)\n\nCompute the L∞ distance, also called the Chebyshev distance, between two arrays: max_iin1n a_i - b_i. Efficient equivalent of maxabs(a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.gkldiv","page":"Computing Deviations","title":"StatsBase.gkldiv","text":"gkldiv(a, b)\n\nCompute the generalized Kullback-Leibler divergence between two arrays: sum_i=1^n (a_i log(a_ib_i) - a_i + b_i). Efficient equivalent of sum(a*log(a/b)-a+b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.meanad","page":"Computing Deviations","title":"StatsBase.meanad","text":"meanad(a, b)\n\nReturn the mean absolute deviation between two arrays: mean(abs, a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.maxad","page":"Computing Deviations","title":"StatsBase.maxad","text":"maxad(a, b)\n\nReturn the maximum absolute deviation between two arrays: maxabs(a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.msd","page":"Computing Deviations","title":"StatsBase.msd","text":"msd(a, b)\n\nReturn the mean squared deviation between two arrays: mean(abs2, a - b).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.rmsd","page":"Computing Deviations","title":"StatsBase.rmsd","text":"rmsd(a, b; normalize=false)\n\nReturn the root mean squared deviation between two optionally normalized arrays. 
The root mean squared deviation is computed as sqrt(msd(a, b)).\n\n\n\n\n\n","category":"function"},{"location":"deviation/#StatsBase.psnr","page":"Computing Deviations","title":"StatsBase.psnr","text":"psnr(a, b, maxv)\n\nCompute the peak signal-to-noise ratio between two arrays a and b. maxv is the maximum possible value either array can take. The PSNR is computed as 10 * log10(maxv^2 / msd(a, b)).\n\n\n\n\n\n","category":"function"},{"location":"deviation/","page":"Computing Deviations","title":"Computing Deviations","text":"note: Note\nAll these functions are implemented in a reasonably efficient way without creating any temporary arrays in the middle.","category":"page"},{"location":"#Getting-Started","page":"Getting Started","title":"Getting Started","text":"","category":"section"},{"location":"","page":"Getting Started","title":"Getting Started","text":"CurrentModule = StatsBase\nDocTestSetup = quote\n using Statistics\n using Random\nend","category":"page"},{"location":"","page":"Getting Started","title":"Getting Started","text":"StatsBase.jl is a Julia package that provides basic support for statistics. 
Particularly, it implements a variety of statistics-related functions, such as scalar statistics, high-order moment computation, counting, ranking, covariances, sampling, and empirical density estimation.","category":"page"},{"location":"#Installation","page":"Getting Started","title":"Installation","text":"","category":"section"},{"location":"","page":"Getting Started","title":"Getting Started","text":"To install StatsBase through the Julia REPL, you can type ] add StatsBase or:","category":"page"},{"location":"","page":"Getting Started","title":"Getting Started","text":"using Pkg\nPkg.add(\"StatsBase\")","category":"page"},{"location":"","page":"Getting Started","title":"Getting Started","text":"To load the package, use the command:","category":"page"},{"location":"","page":"Getting Started","title":"Getting Started","text":"using StatsBase","category":"page"},{"location":"#Available-Features","page":"Getting Started","title":"Available Features","text":"","category":"section"},{"location":"","page":"Getting Started","title":"Getting Started","text":"Pages = [\"weights.md\", \"scalarstats.md\", \"robust.md\", \"deviation.md\", \"cov.md\", \"counts.md\", \"ranking.md\", \"sampling.md\", \"empirical.md\", \"signalcorr.md\", \"misc.md\", \"statmodels.md\", \"transformations.md\"]\nDepth = 2","category":"page"}] } diff --git a/dev/signalcorr/index.html b/dev/signalcorr/index.html index 26cff5dd..212574a4 100644 --- a/dev/signalcorr/index.html +++ b/dev/signalcorr/index.html @@ -1,2 +1,2 @@ -Correlation Analysis of Signals · StatsBase.jl

      Correlation Analysis of Signals

      The package provides functions to perform correlation analysis of sequential signals.

      Autocovariance and Autocorrelation

      StatsBase.autocovFunction
      autocov(x, [lags]; demean=true)

      Compute the autocovariance of a vector or matrix x, optionally specifying the lags at which to compute the autocovariance. demean denotes whether the mean of x should be subtracted from x before computing the autocovariance.

      If x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.

      When left unspecified, the lags used are the integers from 0 to min(size(x,1)-1, 10*log10(size(x,1))).

      The output is not normalized. See autocor for a function with normalization.

      source
      StatsBase.autocov!Function
      autocov!(r, x, lags; demean=true)

      Compute the autocovariance of a vector or matrix x at lags and store the result in r. demean denotes whether the mean of x should be subtracted from x before computing the autocovariance.

      If x is a vector, r must be a vector of the same length as lags. If x is a matrix, r must be a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.

      The output is not normalized. See autocor! for a method with normalization.

      source
      StatsBase.autocorFunction
      autocor(x, [lags]; demean=true)

      Compute the autocorrelation function (ACF) of a vector or matrix x, optionally specifying the lags. demean denotes whether the mean of x should be subtracted from x before computing the ACF.

      If x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.

      When left unspecified, the lags used are the integers from 0 to min(size(x,1)-1, 10*log10(size(x,1))).

      The output is normalized by the variance of x, i.e. so that the lag 0 autocorrelation is 1. See autocov for the unnormalized form.

      source
      StatsBase.autocor!Function
      autocor!(r, x, lags; demean=true)

      Compute the autocorrelation function (ACF) of a vector or matrix x at lags and store the result in r. demean denotes whether the mean of x should be subtracted from x before computing the ACF.

      If x is a vector, r must be a vector of the same length as lags. If x is a matrix, r must be a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.

      The output is normalized by the variance of x, i.e. so that the lag 0 autocorrelation is 1. See autocov! for the unnormalized form.

      source

      Cross-covariance and Cross-correlation

      StatsBase.crosscovFunction
      crosscov(x, y, [lags]; demean=true)

      Compute the cross covariance function (CCF) between real-valued vectors or matrices x and y, optionally specifying the lags. demean specifies whether the respective means of x and y should be subtracted from them before computing their CCF.

      If both x and y are vectors, return a vector of the same length as lags. Otherwise, compute cross covariances between each pair of columns in x and y.

      When left unspecified, the lags used are the integers from -min(size(x,1)-1, 10*log10(size(x,1))) to min(size(x,1)-1, 10*log10(size(x,1))).

      The output is not normalized. See crosscor for a function with normalization.

      source
      StatsBase.crosscov!Function
      crosscov!(r, x, y, lags; demean=true)

      Compute the cross covariance function (CCF) between real-valued vectors or matrices x and y at lags and store the result in r. demean specifies whether the respective means of x and y should be subtracted from them before computing their CCF.

      If both x and y are vectors, r must be a vector of the same length as lags. If x is a matrix and y is a vector, r must be a matrix of size (length(lags), size(x, 2)); if x is a vector and y is a matrix, r must be a matrix of size (length(lags), size(y, 2)). If both x and y are matrices, r must be a three-dimensional array of size (length(lags), size(x, 2), size(y, 2)).

      The output is not normalized. See crosscor! for a function with normalization.

      source
      StatsBase.crosscorFunction
      crosscor(x, y, [lags]; demean=true)

      Compute the cross correlation between real-valued vectors or matrices x and y, optionally specifying the lags. demean specifies whether the respective means of x and y should be subtracted from them before computing their cross correlation.

      If both x and y are vectors, return a vector of the same length as lags. Otherwise, compute cross correlations between each pair of columns in x and y.

      When left unspecified, the lags used are the integers from -min(size(x,1)-1, 10*log10(size(x,1))) to min(size(x,1)-1, 10*log10(size(x,1))).

      The output is normalized by sqrt(var(x)*var(y)). See crosscov for the unnormalized form.

      source
      StatsBase.crosscor!Function
      crosscor!(r, x, y, lags; demean=true)

      Compute the cross correlation between real-valued vectors or matrices x and y at lags and store the result in r. demean specifies whether the respective means of x and y should be subtracted from them before computing their cross correlation.

      If both x and y are vectors, r must be a vector of the same length as lags. If x is a matrix and y is a vector, r must be a matrix of size (length(lags), size(x, 2)); if x is a vector and y is a matrix, r must be a matrix of size (length(lags), size(y, 2)). If both x and y are matrices, r must be a three-dimensional array of size (length(lags), size(x, 2), size(y, 2)).

      The output is normalized by sqrt(var(x)*var(y)). See crosscov! for the unnormalized form.

      source

      Partial Autocorrelation Function

      StatsBase.pacfFunction
      pacf(X, lags; method=:regression)

      Compute the partial autocorrelation function (PACF) of a real-valued vector or matrix X at lags. method designates the estimation method. Recognized values are :regression, which computes the partial autocorrelations via successive regression models, and :yulewalker, which computes the partial autocorrelations using the Yule-Walker equations.

      If X is a vector, return a vector of the same length as lags. If X is a matrix, return a matrix of size (length(lags), size(X, 2)), where each column in the result corresponds to a column in X.

      source
      StatsBase.pacf!Function
      pacf!(r, X, lags; method=:regression)

      Compute the partial autocorrelation function (PACF) of a matrix X at lags and store the result in r. method designates the estimation method. Recognized values are :regression, which computes the partial autocorrelations via successive regression models, and :yulewalker, which computes the partial autocorrelations using the Yule-Walker equations.

      r must be a matrix of size (length(lags), size(X, 2)).

      source
      +Correlation Analysis of Signals · StatsBase.jl

      Correlation Analysis of Signals

      The package provides functions to perform correlation analysis of sequential signals.

      Autocovariance and Autocorrelation

      StatsBase.autocovFunction
      autocov(x, [lags]; demean=true)

      Compute the autocovariance of a vector or matrix x, optionally specifying the lags at which to compute the autocovariance. demean denotes whether the mean of x should be subtracted from x before computing the autocovariance.

      If x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.

      When left unspecified, the lags used are the integers from 0 to min(size(x,1)-1, 10*log10(size(x,1))).

      The output is not normalized. See autocor for a function with normalization.

      source
      StatsBase.autocov!Function
      autocov!(r, x, lags; demean=true)

      Compute the autocovariance of a vector or matrix x at lags and store the result in r. demean denotes whether the mean of x should be subtracted from x before computing the autocovariance.

      If x is a vector, r must be a vector of the same length as lags. If x is a matrix, r must be a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.

      The output is not normalized. See autocor! for a method with normalization.

      source
      StatsBase.autocorFunction
      autocor(x, [lags]; demean=true)

      Compute the autocorrelation function (ACF) of a vector or matrix x, optionally specifying the lags. demean denotes whether the mean of x should be subtracted from x before computing the ACF.

      If x is a vector, return a vector of the same length as lags. If x is a matrix, return a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.

      When left unspecified, the lags used are the integers from 0 to min(size(x,1)-1, 10*log10(size(x,1))).

      The output is normalized by the variance of x, i.e. so that the lag 0 autocorrelation is 1. See autocov for the unnormalized form.

      source
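As a rough illustration of what the autocovariance and autocorrelation entries above describe, the sketch below computes both by hand using only the standard library. The helper names `autocov_sketch` and `default_maxlag_sketch` are hypothetical. Dividing by the series length (rather than by the number of overlapping terms) and rounding `10*log10(n)` to the nearest integer are assumptions of this sketch, not statements taken from the text above.

```julia
using Statistics

# Hand-rolled autocovariance at the given non-negative lags (illustrative helper only).
function autocov_sketch(x::AbstractVector{<:Real}, lags; demean::Bool=true)
    n = length(x)
    z = demean ? x .- mean(x) : float.(x)
    # Sum of lagged products divided by n (this normalization is an assumption).
    return [sum(z[i] * z[i + l] for i in 1:(n - l)) / n for l in lags]
end

# Largest default lag as described in the text; the rounding step is an assumption.
default_maxlag_sketch(n::Integer) = min(n - 1, round(Int, 10 * log10(n)))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
acov = autocov_sketch(x, 0:2)
acor = acov ./ acov[1]   # dividing by the lag-0 value makes the lag-0 autocorrelation 1
```

With real data one would normally call `autocov` and `autocor` directly; the sketch is only meant to make the relationship between the unnormalized and normalized forms concrete.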
      StatsBase.autocor!Function
      autocor!(r, x, lags; demean=true)

      Compute the autocorrelation function (ACF) of a vector or matrix x at lags and store the result in r. demean denotes whether the mean of x should be subtracted from x before computing the ACF.

      If x is a vector, r must be a vector of the same length as lags. If x is a matrix, r must be a matrix of size (length(lags), size(x,2)), where each column in the result corresponds to a column in x.

      The output is normalized by the variance of x, i.e. so that the lag 0 autocorrelation is 1. See autocov! for the unnormalized form.

      source

      Cross-covariance and Cross-correlation

      StatsBase.crosscovFunction
      crosscov(x, y, [lags]; demean=true)

      Compute the cross covariance function (CCF) between real-valued vectors or matrices x and y, optionally specifying the lags. demean specifies whether the respective means of x and y should be subtracted from them before computing their CCF.

      If both x and y are vectors, return a vector of the same length as lags. Otherwise, compute cross covariances between each pair of columns in x and y.

      When left unspecified, the lags used are the integers from -min(size(x,1)-1, 10*log10(size(x,1))) to min(size(x,1)-1, 10*log10(size(x,1))).

      The output is not normalized. See crosscor for a function with normalization.

      source
      StatsBase.crosscov!Function
      crosscov!(r, x, y, lags; demean=true)

      Compute the cross covariance function (CCF) between real-valued vectors or matrices x and y at lags and store the result in r. demean specifies whether the respective means of x and y should be subtracted from them before computing their CCF.

      If both x and y are vectors, r must be a vector of the same length as lags. If x is a matrix and y is a vector, r must be a matrix of size (length(lags), size(x, 2)); if x is a vector and y is a matrix, r must be a matrix of size (length(lags), size(y, 2)). If both x and y are matrices, r must be a three-dimensional array of size (length(lags), size(x, 2), size(y, 2)).

      The output is not normalized. See crosscor! for a function with normalization.

      source
      StatsBase.crosscorFunction
      crosscor(x, y, [lags]; demean=true)

      Compute the cross correlation between real-valued vectors or matrices x and y, optionally specifying the lags. demean specifies whether the respective means of x and y should be subtracted from them before computing their cross correlation.

      If both x and y are vectors, return a vector of the same length as lags. Otherwise, compute cross correlations between each pair of columns in x and y.

      When left unspecified, the lags used are the integers from -min(size(x,1)-1, 10*log10(size(x,1))) to min(size(x,1)-1, 10*log10(size(x,1))).

      The output is normalized by sqrt(var(x)*var(y)). See crosscov for the unnormalized form.

      source
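The normalization described for the cross correlation can be sketched by hand as follows. The function name `crosscor_sketch` is hypothetical, and the sign convention for lags (pairing `x[i]` with `y[i + lag]` when `lag >= 0`) is an assumption of this sketch rather than something the text above specifies.

```julia
using Statistics
using LinearAlgebra

# Illustrative cross correlation of two equal-length vectors at a single lag.
function crosscor_sketch(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}, lag::Integer)
    n = length(x)
    length(y) == n || throw(DimensionMismatch("x and y must have the same length"))
    zx = x .- mean(x)   # demeaned copies, as with demean=true
    zy = y .- mean(y)
    if lag >= 0
        s = dot(zx[1:(n - lag)], zy[(1 + lag):n])
    else
        s = dot(zx[(1 - lag):n], zy[1:(n + lag)])
    end
    # Scale by the product of the two root sums of squares, so the result lies in [-1, 1].
    return s / sqrt(dot(zx, zx) * dot(zy, zy))
end

x = [1.0, 2.0, 3.0, 4.0]
y = 2 .* x .+ 1          # an exact linear function of x
r0 = crosscor_sketch(x, y, 0)
r1 = crosscor_sketch(x, y, 1)
```

Because `y` is an exact increasing linear function of `x`, the lag-0 value is 1 and values at other lags are smaller in magnitude.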
      StatsBase.crosscor!Function
      crosscor!(r, x, y, lags; demean=true)

      Compute the cross correlation between real-valued vectors or matrices x and y at lags and store the result in r. demean specifies whether the respective means of x and y should be subtracted from them before computing their cross correlation.

      If both x and y are vectors, r must be a vector of the same length as lags. If x is a matrix and y is a vector, r must be a matrix of size (length(lags), size(x, 2)); if x is a vector and y is a matrix, r must be a matrix of size (length(lags), size(y, 2)). If both x and y are matrices, r must be a three-dimensional array of size (length(lags), size(x, 2), size(y, 2)).

      The output is normalized by sqrt(var(x)*var(y)). See crosscov! for the unnormalized form.

      source

      Partial Autocorrelation Function

      StatsBase.pacfFunction
      pacf(X, lags; method=:regression)

      Compute the partial autocorrelation function (PACF) of a real-valued vector or matrix X at lags. method designates the estimation method. Recognized values are :regression, which computes the partial autocorrelations via successive regression models, and :yulewalker, which computes the partial autocorrelations using the Yule-Walker equations.

      If X is a vector, return a vector of the same length as lags. If X is a matrix, return a matrix of size (length(lags), size(X, 2)), where each column in the result corresponds to a column in X.

      source
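One way to read the :regression method is that the partial autocorrelation at lag k is the last coefficient in a least-squares regression of the series on an intercept and its first k lagged copies. The sketch below follows that reading; `pacf_regress_sketch` is a hypothetical helper, and whether it agrees numerically with the package in every case is not claimed here.

```julia
# Partial autocorrelation at lag k via a single least-squares fit (illustrative only).
function pacf_regress_sketch(x::AbstractVector{<:Real}, k::Integer)
    k == 0 && return 1.0
    n = length(x)
    # Design matrix: intercept column plus the series shifted by 1, ..., k steps.
    cols = [x[(k + 1 - l):(n - l)] for l in 1:k]
    A = hcat(ones(n - k), cols...)
    coef = A \ x[(k + 1):n]   # least-squares solution
    return coef[end]          # coefficient on the lag-k term
end

x = [16.0, 8.0, 4.0, 2.0, 1.0]   # each value is half the previous one
p1 = pacf_regress_sketch(x, 1)
```

For this toy series the lag-1 regression fits exactly with slope 0.5, so `p1` is approximately 0.5.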
      StatsBase.pacf!Function
      pacf!(r, X, lags; method=:regression)

      Compute the partial autocorrelation function (PACF) of a matrix X at lags and store the result in r. method designates the estimation method. Recognized values are :regression, which computes the partial autocorrelations via successive regression models, and :yulewalker, which computes the partial autocorrelations using the Yule-Walker equations.

      r must be a matrix of size (length(lags), size(X, 2)).

      source
      diff --git a/dev/statmodels/index.html b/dev/statmodels/index.html index ada01b85..9f48a695 100644 --- a/dev/statmodels/index.html +++ b/dev/statmodels/index.html @@ -4,4 +4,4 @@ adjr²(model::StatisticalModel, variant::Symbol)

      Adjusted pseudo-coefficient of determination (adjusted pseudo R-squared). For nonlinear models, one of the several pseudo R² definitions must be chosen via variant. The only currently supported variants are :MacFadden, defined as $1 - (\log (L) - k)/\log (L_0)$, and :devianceratio, defined as $1 - (D/(n-k))/(D_0/(n-1))$. In these formulas, $L$ is the likelihood of the model, $L_0$ that of the null model (the model including only the intercept), $D$ is the deviance of the model, $D_0$ is the deviance of the null model, $n$ is the number of observations (given by nobs) and $k$ is the number of consumed degrees of freedom of the model (as returned by dof).

      StatsAPI.aicFunction
      aic(model::StatisticalModel)

      Akaike's Information Criterion, defined as $-2 \log L + 2k$, with $L$ the likelihood of the model, and $k$ its number of consumed degrees of freedom (as returned by dof).

      StatsAPI.aiccFunction
      aicc(model::StatisticalModel)

      Corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989), defined as $-2 \log L + 2k + 2k(k+1)/(n-k-1)$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by dof), and $n$ the number of observations (as returned by nobs).

      StatsAPI.bicFunction
      bic(model::StatisticalModel)

      Bayesian Information Criterion, defined as $-2 \log L + k \log n$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by dof), and $n$ the number of observations (as returned by nobs).
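
The three information criteria above are simple functions of the log-likelihood, the consumed degrees of freedom and the number of observations. The sketch below writes them out directly; the `_sketch` function names and the numeric inputs are made up for illustration, and the small-sample term uses the standard correction $2k(k+1)/(n-k-1)$.

```julia
# Information criteria from a log-likelihood ll, k consumed degrees of freedom
# and n observations (hypothetical helpers, not the StatsAPI functions).
aic_sketch(ll, k) = -2 * ll + 2 * k
aicc_sketch(ll, k, n) = aic_sketch(ll, k) + 2 * k * (k + 1) / (n - k - 1)
bic_sketch(ll, k, n) = -2 * ll + k * log(n)

ll, k, n = -120.5, 3, 50   # made-up values
a = aic_sketch(ll, k)
ac = aicc_sketch(ll, k, n)
b = bic_sketch(ll, k, n)
```

The correction term shrinks as n grows, so the corrected criterion approaches the uncorrected one for large samples.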

      StatsAPI.coefFunction
      coef(model::StatisticalModel)

      Return the coefficients of the model.

      StatsAPI.coefnamesFunction
      coefnames(model::StatisticalModel)

      Return the names of the coefficients.

      StatsAPI.coeftableFunction
      coeftable(model::StatisticalModel; level::Real=0.95)

      Return a table with coefficients and related statistics of the model. level determines the level for confidence intervals (by default, 95%).

      The returned CoefTable object implements the Tables.jl interface, and can be converted e.g. to a DataFrame via using DataFrames; DataFrame(coeftable(model)).

      StatsAPI.confintFunction
      confint(model::StatisticalModel; level::Real=0.95)

      Compute confidence intervals for coefficients, with confidence level level (by default 95%).

      StatsAPI.devianceFunction
      deviance(model::StatisticalModel)

      Return the deviance of the model relative to a reference, which is usually the saturated model (when applicable). It is equal, up to a constant, to $-2 \log L$, with $L$ the likelihood of the model.

      StatsAPI.dofFunction
      dof(model::StatisticalModel)

      Return the number of degrees of freedom consumed in the model, including when applicable the intercept and the distribution's dispersion parameter.

      StatsAPI.fitFunction

      Fit a statistical model.

      StatsAPI.fit!Function

      Fit a statistical model in-place.

      StatsAPI.informationmatrixFunction
      informationmatrix(model::StatisticalModel; expected::Bool = true)

      Return the information matrix of the model. By default the Fisher information matrix is returned, while the observed information matrix can be requested with expected = false.

      StatsAPI.isfittedFunction
      isfitted(model::StatisticalModel)

      Indicate whether the model has been fitted.

      StatsAPI.islinearFunction
      islinear(model::StatisticalModel)

      Indicate whether the model is linear.

      StatsAPI.loglikelihoodFunction
      loglikelihood(model::StatisticalModel)
       loglikelihood(model::StatisticalModel, observation)

      Return the log-likelihood of the model.

      With an observation argument, return the contribution of observation to the log-likelihood of model.

      If observation is a Colon, return a vector of each observation's contribution to the log-likelihood of the model. In other words, this is the vector of the pointwise log-likelihood contributions.

      In general, sum(loglikelihood(model, :)) == loglikelihood(model).
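
The identity above says that the total log-likelihood is the sum of the per-observation contributions. The toy sketch below illustrates this with a Gaussian density and a known standard deviation; it does not use a StatsAPI model, and `pointwise_loglik` is a hypothetical helper.

```julia
# Pointwise Gaussian log-density of residuals r for a known standard deviation σ.
pointwise_loglik(r, σ) = [-0.5 * log(2π * σ^2) - ri^2 / (2 * σ^2) for ri in r]

r = [0.0, 1.0, -1.0]                  # made-up residuals
contributions = pointwise_loglik(r, 1.0)
total = sum(contributions)            # plays the role of the model's log-likelihood
```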

      StatsAPI.mssFunction
      mss(model::StatisticalModel)

      Return the model sum of squares.

      StatsAPI.nobsFunction
      nobs(model::StatisticalModel)

      Return the number of independent observations on which the model was fitted. Be careful when using this information, as the definition of an independent observation may vary depending on the model, on the format used to pass the data, on the sampling plan (if specified), etc.

      StatsAPI.nulldevianceFunction
      nulldeviance(model::StatisticalModel)

      Return the deviance of the null model, obtained by dropping all independent variables present in model.

      If model includes an intercept, the null model is the one with only the intercept; otherwise, it is the one without any predictor (not even the intercept).

      StatsAPI.nullloglikelihoodFunction
      nullloglikelihood(model::StatisticalModel)

      Return the log-likelihood of the null model, obtained by dropping all independent variables present in model.

      If model includes an intercept, the null model is the one with only the intercept; otherwise, it is the one without any predictor (not even the intercept).

      StatsAPI.r2Function
      r2(model::StatisticalModel)
       r²(model::StatisticalModel)

      Coefficient of determination (R-squared).

      For a linear model, the R² is defined as $ESS/TSS$, with $ESS$ the explained sum of squares and $TSS$ the total sum of squares.

      r2(model::StatisticalModel, variant::Symbol)
      -r²(model::StatisticalModel, variant::Symbol)

      Pseudo-coefficient of determination (pseudo R-squared).

      For nonlinear models, one of several pseudo R² definitions must be chosen via variant. Supported variants are:

      • :MacFadden (a.k.a. likelihood ratio index), defined as $1 - \log (L)/\log (L_0)$;
      • :CoxSnell, defined as $1 - (L_0/L)^{2/n}$;
      • :Nagelkerke, defined as $(1 - (L_0/L)^{2/n})/(1 - L_0^{2/n})$;
      • :devianceratio, defined as $1 - D/D_0$.

      In the above formulas, $L$ is the likelihood of the model, $L_0$ is the likelihood of the null model (the model with only an intercept), $D$ is the deviance of the model (from the saturated model), $D_0$ is the deviance of the null model, $n$ is the number of observations (given by nobs).

      The Cox-Snell and the deviance ratio variants both match the classical definition of R² for linear models.

      StatsAPI.rssFunction
      rss(model::StatisticalModel)

      Return the residual sum of squares of the model.

      StatsAPI.scoreFunction
      score(model::StatisticalModel)

      Return the score of the model, that is the gradient of the log-likelihood with respect to the coefficients.

      StatsAPI.stderrorFunction
      stderror(model::StatisticalModel)

      Return the standard errors for the coefficients of the model.

      StatsAPI.vcovFunction
      vcov(model::StatisticalModel)

      Return the variance-covariance matrix for the coefficients of the model.

      StatsAPI.weightsFunction
      weights(model::StatisticalModel)

      Return the weights used in the model.

      RegressionModel extends StatisticalModel by implementing the following additional methods.

      StatsAPI.crossmodelmatrixFunction
      crossmodelmatrix(model::RegressionModel)

      Return X'X where X is the model matrix of model. This function will return a pre-computed matrix stored in model if possible.

      StatsAPI.dof_residualFunction
      dof_residual(model::RegressionModel)

      Return the residual degrees of freedom of the model.

      StatsAPI.fittedFunction
      fitted(model::RegressionModel)

      Return the fitted values of the model.

      StatsAPI.leverageFunction
      leverage(model::RegressionModel)

      Return the diagonal of the projection matrix of the model.

      StatsAPI.cooksdistanceFunction
      cooksdistance(model::RegressionModel)

      Compute Cook's distance for each observation in linear model model, giving an estimate of the influence of each data point.

      StatsAPI.meanresponseFunction
      meanresponse(model::RegressionModel)

      Return the mean of the response.

      StatsAPI.modelmatrixFunction
      modelmatrix(model::RegressionModel)

      Return the model matrix (a.k.a. the design matrix).

      StatsAPI.responseFunction
      response(model::RegressionModel)

      Return the model response (a.k.a. the dependent variable).

      StatsAPI.responsenameFunction
      responsename(model::RegressionModel)

      Return the name of the model response (a.k.a. the dependent variable).

      StatsAPI.predictFunction
      predict(model::RegressionModel, [newX])

      Form the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.

      StatsAPI.predict!Function
      predict!

      In-place version of predict.

      StatsAPI.residualsFunction
      residuals(model::RegressionModel)

      Return the residuals of the model.

      An exception type is provided to signal convergence failures during model estimation:

      StatsBase.ConvergenceExceptionType
      ConvergenceException(iters::Int, lastchange::Real=NaN, tol::Real=NaN)

      The fitting procedure failed to converge in iters iterations, i.e. the lastchange between the cost of the final and penultimate iterations was greater than the specified tolerance tol.

      source
      +r²(model::StatisticalModel, variant::Symbol)

      Pseudo-coefficient of determination (pseudo R-squared).

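      For nonlinear models, one of several pseudo R² definitions must be chosen via variant. Supported variants are:

      • :MacFadden (a.k.a. likelihood ratio index), defined as $1 - \log (L)/\log (L_0)$;
      • :CoxSnell, defined as $1 - (L_0/L)^{2/n}$;
      • :Nagelkerke, defined as $(1 - (L_0/L)^{2/n})/(1 - L_0^{2/n})$;
      • :devianceratio, defined as $1 - D/D_0$.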

      In the above formulas, $L$ is the likelihood of the model, $L_0$ is the likelihood of the null model (the model with only an intercept), $D$ is the deviance of the model (from the saturated model), $D_0$ is the deviance of the null model, $n$ is the number of observations (given by nobs).

      The Cox-Snell and the deviance ratio variants both match the classical definition of R² for linear models.
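
The pseudo R² variants can be written in terms of log-likelihoods, which avoids forming the raw likelihoods. The sketch below does this with hypothetical helper names and made-up inputs; `ll` and `ll0` stand for $\log L$ and $\log L_0$, and `D`, `D0` for the two deviances.

```julia
# Pseudo R² variants computed from log-likelihoods (illustrative helpers only).
macfadden_sketch(ll, ll0) = 1 - ll / ll0
# (L_0/L)^(2/n) == exp(2 * (ll0 - ll) / n)
coxsnell_sketch(ll, ll0, n) = 1 - exp(2 * (ll0 - ll) / n)
# L_0^(2/n) == exp(2 * ll0 / n)
nagelkerke_sketch(ll, ll0, n) = coxsnell_sketch(ll, ll0, n) / (1 - exp(2 * ll0 / n))
devianceratio_sketch(D, D0) = 1 - D / D0

ll, ll0, n = -80.0, -100.0, 100   # made-up values
mf = macfadden_sketch(ll, ll0)
cs = coxsnell_sketch(ll, ll0, n)
nk = nagelkerke_sketch(ll, ll0, n)
```

The Nagelkerke value rescales the Cox-Snell value by its maximum attainable value, so it is never smaller than the Cox-Snell value for these inputs.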

      StatsAPI.rssFunction
      rss(model::StatisticalModel)

      Return the residual sum of squares of the model.

      StatsAPI.scoreFunction
      score(model::StatisticalModel)

      Return the score of the model, that is the gradient of the log-likelihood with respect to the coefficients.

      StatsAPI.stderrorFunction
      stderror(model::StatisticalModel)

      Return the standard errors for the coefficients of the model.

      StatsAPI.vcovFunction
      vcov(model::StatisticalModel)

      Return the variance-covariance matrix for the coefficients of the model.

      StatsAPI.weightsFunction
      weights(model::StatisticalModel)

      Return the weights used in the model.

      RegressionModel extends StatisticalModel by implementing the following additional methods.

      StatsAPI.crossmodelmatrixFunction
      crossmodelmatrix(model::RegressionModel)

      Return X'X where X is the model matrix of model. This function will return a pre-computed matrix stored in model if possible.

      StatsAPI.dof_residualFunction
      dof_residual(model::RegressionModel)

      Return the residual degrees of freedom of the model.

      StatsAPI.fittedFunction
      fitted(model::RegressionModel)

      Return the fitted values of the model.

      StatsAPI.leverageFunction
      leverage(model::RegressionModel)

      Return the diagonal of the projection matrix of the model.

      StatsAPI.cooksdistanceFunction
      cooksdistance(model::RegressionModel)

      Compute Cook's distance for each observation in linear model model, giving an estimate of the influence of each data point.
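For an ordinary least-squares fit, the quantities returned by crossmodelmatrix, leverage and cooksdistance can be written out by hand. The sketch below does so with plain linear algebra on a made-up data set; it assumes the standard OLS formulas and does not call StatsAPI, so it shows what these accessors are expected to return, not how a particular model type implements them.

```julia
using LinearAlgebra

# Made-up data: an intercept column plus one predictor with an outlying point.
X = [ones(5) [1.0, 2.0, 3.0, 4.0, 10.0]]
y = [1.1, 1.9, 3.2, 3.9, 10.5]

β = X \ y                      # least-squares coefficients
resid = y - X * β              # residuals
n, p = size(X)

# X'X, the matrix that crossmodelmatrix is documented to return.
XtX = X' * X

# Leverage: diagonal of the projection (hat) matrix X (X'X)^-1 X'.
h = diag(X * (XtX \ X'))

# Cook's distance, using the residual variance estimate s² = RSS / (n - p).
s2 = sum(abs2, resid) / (n - p)
cooks = @. resid^2 / (p * s2) * h / (1 - h)^2
```

The leverages sum to the number of coefficients, and the observation with the extreme predictor value (the last one here) has the largest leverage.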

      StatsAPI.meanresponseFunction
      meanresponse(model::RegressionModel)

      Return the mean of the response.

      StatsAPI.modelmatrixFunction
      modelmatrix(model::RegressionModel)

      Return the model matrix (a.k.a. the design matrix).

      StatsAPI.responseFunction
      response(model::RegressionModel)

      Return the model response (a.k.a. the dependent variable).

      StatsAPI.responsenameFunction
      responsename(model::RegressionModel)

      Return the name of the model response (a.k.a. the dependent variable).

      StatsAPI.predictFunction
      predict(model::RegressionModel, [newX])

      Form the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.

      StatsAPI.predict!Function
      predict!

      In-place version of predict.

      StatsAPI.residualsFunction
      residuals(model::RegressionModel)

      Return the residuals of the model.

      An exception type is provided to signal convergence failures during model estimation:

      StatsBase.ConvergenceExceptionType
      ConvergenceException(iters::Int, lastchange::Real=NaN, tol::Real=NaN)

      The fitting procedure failed to converge in iters iterations, i.e. the lastchange between the cost of the final and penultimate iterations was greater than the specified tolerance tol.

      source
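A minimal sketch of how an iterative routine might raise this exception follows. The solver `fixed_point` and its keyword names are hypothetical and serve only as an illustration; the constructor is called with the three positional arguments from the signature above, with lastchange and tol both passed as Float64 values.

```julia
using StatsBase

# Hypothetical fixed-point iteration, used only to illustrate the exception.
function fixed_point(f, x0; maxiter=50, tol=1e-10)
    x = x0
    change = NaN
    for _ in 1:maxiter
        xnew = f(x)
        change = abs(xnew - x)
        x = xnew
        change <= tol && return x
    end
    # Reached only when the last change is still above the tolerance.
    throw(StatsBase.ConvergenceException(maxiter, change, tol))
end

root = fixed_point(cos, 1.0; maxiter=200)      # converges

failed = try
    fixed_point(x -> 2x + 1, 1.0; maxiter=5)   # diverges, so it throws
    false
catch err
    err isa StatsBase.ConvergenceException
end
```

Callers can catch the exception to retry with more iterations or a looser tolerance instead of silently accepting an unconverged estimate.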
      diff --git a/dev/transformations/index.html b/dev/transformations/index.html
      index 20f43336..35f486e9 100644
      --- a/dev/transformations/index.html
      +++ b/dev/transformations/index.html
      @@ -12,7 +12,7 @@
       julia> StatsBase.transform(dt, X)
       2×3 Matrix{Float64}:
         0.0  -1.0  1.0
      - -1.0   0.0  1.0
      source

      Unit Range Normalization

      Unit range normalization, also known as min-max scaling, is an alternative data transformation which scales features to lie in the interval [0; 1].

      Unit range normalization can be performed using t = fit(UnitRangeTransform, ...) followed by StatsBase.transform(t, ...) or StatsBase.transform!(t, ...). standardize(UnitRangeTransform, ...) is a shorthand to perform both operations in a single call.

      StatsAPI.fitMethod
      fit(UnitRangeTransform, X; dims=nothing, unit=true)

      Fit scaling parameters to a vector or matrix X and return a UnitRangeTransform transformation object.

      Keyword arguments

      • dims: if 1, fit standardization parameters in column-wise fashion; if 2, fit in row-wise fashion. The default is nothing.

      • unit: if true (the default) shift the minimum data to zero.

      Examples

      julia> using StatsBase
      + -1.0   0.0  1.0
      source

      Unit Range Normalization

      Unit range normalization, also known as min-max scaling, is an alternative data transformation which scales features to lie in the interval [0; 1].

      Unit range normalization can be performed using t = fit(UnitRangeTransform, ...) followed by StatsBase.transform(t, ...) or StatsBase.transform!(t, ...). standardize(UnitRangeTransform, ...) is a shorthand to perform both operations in a single call.

      StatsAPI.fitMethod
      fit(UnitRangeTransform, X; dims=nothing, unit=true)

      Fit scaling parameters to a vector or matrix X and return a UnitRangeTransform transformation object.

      Keyword arguments

      • dims: if 1, fit standardization parameters in column-wise fashion; if 2, fit in row-wise fashion. The default is nothing.

      • unit: if true (the default) shift the minimum data to zero.

      Examples

      julia> using StatsBase
       
       julia> X = [0.0 -0.5 0.5; 0.0 1.0 2.0]
       2×3 Matrix{Float64}:
      @@ -25,7 +25,7 @@
       julia> StatsBase.transform(dt, X)
       2×3 Matrix{Float64}:
        0.5  0.0  1.0
      - 0.0  0.5  1.0
      source

      Methods

      StatsBase.transformFunction
      transform(t::AbstractDataTransform, x)

      Return a standardized copy of vector or matrix x using transformation t.

      source
      StatsBase.transform!Function
      transform!(t::AbstractDataTransform, x)

      Apply transformation t to vector or matrix x in place.

      source
      StatsBase.reconstructFunction
      reconstruct(t::AbstractDataTransform, y)

      Return a reconstruction of the original-scale data from a transformed vector or matrix y using transformation t.

      source
      StatsBase.reconstruct!Function
      reconstruct!(t::AbstractDataTransform, y)

      Reconstruct the original data scale in place from a transformed vector or matrix y using transformation t.

      source
      StatsBase.standardizeFunction
      standardize(DT, X; dims=nothing, kwargs...)

      Return a standardized copy of vector or matrix X along dimensions dims using transformation DT which is a subtype of AbstractDataTransform:

      • ZScoreTransform
      • UnitRangeTransform

      Example

      julia> using StatsBase
      + 0.0  0.5  1.0
      source

      Methods

      StatsBase.transformFunction
      transform(t::AbstractDataTransform, x)

      Return a standardized copy of vector or matrix x using transformation t.

      source
      StatsBase.transform!Function
      transform!(t::AbstractDataTransform, x)

      Apply transformation t to vector or matrix x in place.

      source
      StatsBase.reconstructFunction
      reconstruct(t::AbstractDataTransform, y)

      Return a reconstruction of the original-scale data from a transformed vector or matrix y using transformation t.

      source
      StatsBase.reconstruct!Function
      reconstruct!(t::AbstractDataTransform, y)

      Reconstruct the original data scale in place from a transformed vector or matrix y using transformation t.

      source
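The four methods above are meant to be used together. As a short sketch, the snippet below fits a UnitRangeTransform on the matrix from the earlier example, applies it, and maps the result back to the original scale. Function names are qualified with `StatsBase.` as in the examples on this page; the variable names are illustrative.

```julia
using StatsBase

X = [0.0 -0.5 0.5; 0.0 1.0 2.0]

# Fit per-row scaling parameters (dims=2), then apply them to a copy.
t = fit(UnitRangeTransform, X; dims=2)
Y = StatsBase.transform(t, X)

# reconstruct maps the scaled values back to the original scale.
Xback = StatsBase.reconstruct(t, Y)

# The in-place variants overwrite their array argument.
Z = copy(X)
StatsBase.transform!(t, Z)
```

Because the fitted object stores the scaling parameters, the same `t` can be reused to transform new data on the scale learned from `X`.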
      StatsBase.standardizeFunction
      standardize(DT, X; dims=nothing, kwargs...)

      Return a standardized copy of vector or matrix X along dimensions dims using transformation DT which is a subtype of AbstractDataTransform:

      • ZScoreTransform
      • UnitRangeTransform

      Example

      julia> using StatsBase
       
       julia> standardize(ZScoreTransform, [0.0 -0.5 0.5; 0.0 1.0 2.0], dims=2)
       2×3 Matrix{Float64}:
      @@ -35,4 +35,4 @@
       julia> standardize(UnitRangeTransform, [0.0 -0.5 0.5; 0.0 1.0 2.0], dims=2)
       2×3 Matrix{Float64}:
        0.5  0.0  1.0
      - 0.0  0.5  1.0
      source

      Types

      StatsBase.UnitRangeTransformType

      Unit range normalization

      source
      StatsBase.ZScoreTransformType

      Standardization (Z-score transformation)

      source
      + 0.0  0.5  1.0
      source

      Types

      StatsBase.UnitRangeTransformType

      Unit range normalization

      source
      StatsBase.ZScoreTransformType

      Standardization (Z-score transformation)

      source
      diff --git a/dev/weights/index.html b/dev/weights/index.html
      index 48a3cb9f..cd461840 100644
      --- a/dev/weights/index.html
      +++ b/dev/weights/index.html
      @@ -40,7 +40,7 @@
       length
       isempty
       values
      -sum

      The following constructors are provided:

      StatsBase.AnalyticWeightsType
      AnalyticWeights(vs, wsum=sum(vs))

      Construct an AnalyticWeights vector with weight values vs. A precomputed sum may be provided as wsum.

      Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.

      source
      StatsBase.FrequencyWeightsType
      FrequencyWeights(vs, wsum=sum(vs))

      Construct a FrequencyWeights vector with weight values vs. A precomputed sum may be provided as wsum.

      Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.

      source
      StatsBase.ProbabilityWeightsType
      ProbabilityWeights(vs, wsum=sum(vs))

      Construct a ProbabilityWeights vector with weight values vs. A precomputed sum may be provided as wsum.

      Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.

      source
      StatsBase.UnitWeightsType
      UnitWeights{T}(s)

      Construct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.

      source
      StatsBase.WeightsType
      Weights(vs, wsum=sum(vs))

      Construct a Weights vector with weight values vs. A precomputed sum may be provided as wsum.

      The Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights and ProbabilityWeights.

      source
      StatsBase.aweightsFunction
      aweights(vs)

      Construct an AnalyticWeights vector from array vs. See the documentation for AnalyticWeights for more details.

      source
      StatsBase.fweightsFunction
      fweights(vs)

      Construct a FrequencyWeights vector from a given array. See the documentation for FrequencyWeights for more details.

      source
      StatsBase.pweightsFunction
      pweights(vs)

      Construct a ProbabilityWeights vector from a given array. See the documentation for ProbabilityWeights for more details.

      source
      StatsBase.eweightsFunction
      eweights(t::AbstractArray{<:Integer}, λ::Real; scale=false)
      +sum

      The following constructors are provided:

      StatsBase.AnalyticWeightsType
      AnalyticWeights(vs, wsum=sum(vs))

      Construct an AnalyticWeights vector with weight values vs. A precomputed sum may be provided as wsum.

      Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.

      source
      StatsBase.FrequencyWeightsType
      FrequencyWeights(vs, wsum=sum(vs))

      Construct a FrequencyWeights vector with weight values vs. A precomputed sum may be provided as wsum.

      Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.

      source
      StatsBase.ProbabilityWeightsType
      ProbabilityWeights(vs, wsum=sum(vs))

      Construct a ProbabilityWeights vector with weight values vs. A precomputed sum may be provided as wsum.

      Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.

      source
      StatsBase.UnitWeightsType
      UnitWeights{T}(s)

      Construct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.

      source
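As a brief illustration of the constructors above, the sketch below builds a few weight vectors and uses them in a weighted mean. The data values are made up; `Statistics` is loaded explicitly so that `mean` is available regardless of what StatsBase re-exports.

```julia
using Statistics
using StatsBase

x = [1.0, 2.0, 3.0]

fw = FrequencyWeights([1, 2, 3])        # observation i was seen fw[i] times
aw = AnalyticWeights([0.2, 0.3, 0.5])   # relative importance of each observation
uw = UnitWeights{Float64}(3)            # three weights, all equal to one

total = sum(fw)                  # the weight sum is stored with the vector
weighted_mean = mean(x, fw)      # (1*1 + 2*2 + 3*3) / 6
plain_mean = mean(x, uw)         # unit weights reproduce the ordinary mean
```

Choosing the weight type matters mainly for statistics that apply a bias correction, such as weighted variances, where frequency, analytic and probability weights are treated differently.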
      StatsBase.eweightsFunction
      eweights(t::AbstractArray{<:Integer}, λ::Real; scale=false)
       eweights(t::AbstractVector{T}, r::StepRange{T}, λ::Real; scale=false) where T
       eweights(n::Integer, λ::Real; scale=false)

      Construct a Weights vector which assigns exponentially decreasing weights to past observations (larger integer values i in t). The integer value n represents the number of past observations to consider. n defaults to maximum(t) - minimum(t) + 1 if only t is passed in and the elements are integers, and to length(r) if a superset range r is also passed in. If n is explicitly passed instead of t, t defaults to 1:n.

      If scale is true then for each element i in t the weight value is computed as:

      $(1 - λ)^{n - i}$

      If scale is false then each value is computed as:

      $λ (1 - λ)^{1 - i}$
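The two formulas can be evaluated directly, which makes it easier to see how they relate. The sketch below does this in plain Julia without calling eweights; the variable names are illustrative, and `n` is computed with the default stated above for integer `t`.

```julia
# Direct evaluation of the two formulas, without calling eweights.
λ = 0.3
t = 1:10
n = maximum(t) - minimum(t) + 1               # default n when only integer t is given

scaled   = [(1 - λ)^(n - i) for i in t]       # scale=true
unscaled = [λ * (1 - λ)^(1 - i) for i in t]   # scale=false

# Both vectors grow by the same factor 1 / (1 - λ) from one index to the next,
# so they differ only by the constant λ (1 - λ)^(1 - n).
ratio = unscaled ./ scaled
```

With scale=true the largest index in t receives weight 1 and earlier indices decay geometrically, so the weights lie between 0 and 1.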

      Arguments

      • t::AbstractVector: temporal indices or timestamps
      • r::StepRange: a larger range to use when constructing weights from a subset of timestamps
      • n::Integer: the number of past events to consider
      • λ::Real: a smoothing factor or rate parameter such that $0 < λ ≤ 1$. As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.

      Keyword arguments

      • scale::Bool: Return the weights scaled to between 0 and 1 (default: false)

      Examples

      julia> eweights(1:10, 0.3; scale=true)
       10-element Weights{Float64,Float64,Array{Float64,1}}:
      @@ -53,7 +53,7 @@
        0.3429999999999999
        0.48999999999999994
        0.7
      - 1.0

      Links

      • https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
      • https://en.wikipedia.org/wiki/Exponential_smoothing
      source
      StatsBase.uweightsFunction
      uweights(s::Integer)
      + 1.0

      Links

      • https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
      • https://en.wikipedia.org/wiki/Exponential_smoothing
      source
      StatsBase.uweightsFunction
      uweights(s::Integer)
       uweights(::Type{T}, s::Integer) where T<:Real

      Construct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.

      Examples

      julia> uweights(3)
       3-element UnitWeights{Int64}:
        1
      @@ -64,4 +64,4 @@
       3-element UnitWeights{Float64}:
        1.0
        1.0
      - 1.0
      source
      StatsAPI.weightsMethod
      weights(vs::AbstractArray{<:Real})

      Construct a Weights vector from array vs. See the documentation for Weights for more details.

      source
      + 1.0
      source
      StatsAPI.weightsMethod
      weights(vs::AbstractArray{<:Real})

      Construct a Weights vector from array vs. See the documentation for Weights for more details.

      source