PreProcessing for ML modeled after sklearn.preprocessing

A library modeled after the preprocessing module of scikit-learn. It aims to implement the following transformers:

  • StandardScaler
  • MinMaxScaler
  • MaxAbsScaler
  • Binarizer
  • Normalizer
  • OneHotEncoder


To do:

  • Generalize to handle 1D arrays

julia> using PreProcessing

julia> x = rand(-10:10, 8,4)
8×4 Array{Int64,2}:
   3   8  -2  -10
   3  10  -4    3
  -9   4   9   -5
   0   3   9  -10
   6  -8   4   -4
  -2   5  -5    7
 -10   2   9    3
  -6   6  -8    1

julia> clf = fit(StandardScaler, x)
PreProcessing.StandardScaler{Float64}([-1.875, 3.75, 1.5, -1.875], [5.55512, 5.06828, 6.61438, 5.92532], 4, 1)

julia> xnew = transform(clf, x)
8×4 Array{Float64,2}:
  0.877569    0.838548   -0.52915   -1.37123
  0.877569    1.23316    -0.831522   0.822741
 -1.2826      0.0493264   1.13389   -0.527398
  0.337526   -0.147979    1.13389   -1.37123
  1.41761    -2.31834     0.377964  -0.358631
 -0.0225018   0.246632   -0.982708   1.49781
 -1.46261    -0.345285    1.13389    0.822741
 -0.742558    0.443937   -1.43626    0.485206

julia> inverse_transform(clf, xnew)
8×4 Array{Float64,2}:
   3.0   8.0  -2.0  -10.0
   3.0  10.0  -4.0    3.0
  -9.0   4.0   9.0   -5.0
   0.0   3.0   9.0  -10.0
   6.0  -8.0   4.0   -4.0
  -2.0   5.0  -5.0    7.0
 -10.0   2.0   9.0    3.0
  -6.0   6.0  -8.0    1.0
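
The numbers printed above can be reproduced with plain broadcasting. The following is a minimal sketch of the arithmetic, not the package's implementation: it uses only the `Statistics` standard library, and the names `mu`, `sigma`, `z`, and `x_back` are illustrative. It assumes the scaler uses the population standard deviation (dividing by n), which matches the values stored in `clf` above.

```julia
using Statistics

# The 8×4 matrix from the example above.
x = [  3   8  -2  -10;
       3  10  -4    3;
      -9   4   9   -5;
       0   3   9  -10;
       6  -8   4   -4;
      -2   5  -5    7;
     -10   2   9    3;
      -6   6  -8    1]

mu    = mean(x, dims=1)                  # 1×4 row of column means
sigma = std(x, dims=1, corrected=false)  # population std (divide by n, not n-1)

z      = (x .- mu) ./ sigma              # standardize each column
x_back = z .* sigma .+ mu                # undo the scaling
```

Here `mu` and `sigma` hold the same values as the two vectors shown in the `StandardScaler` display, and `x_back` recovers `x` up to floating-point rounding.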

julia> x = rand(-10:10, 8,4)
8×4 Array{Int64,2}:
   3   8  -2  -10
   3  10  -4    3
  -9   4   9   -5
   0   3   9  -10
   6  -8   4   -4
  -2   5  -5    7
 -10   2   9    3
  -6   6  -8    1

julia> clf = fit(MinMaxScaler, x, range_min=-4, range_max=4)
PreProcessing.MinMaxScaler{Float64,Int64}([-10.0, -8.0, -8.0, -10.0], [6.0, 10.0, 9.0, 7.0], -4, 4, 4, 1)

julia> xnew = transform(clf, x)
8×4 Array{Float64,2}:
 3.25  3.55556  1.41176   0.0    
 3.25  4.0      0.941176  3.05882
 0.25  2.66667  4.0       1.17647
 2.5   2.44444  4.0       0.0    
 4.0   0.0      2.82353   1.41176
 2.0   2.88889  0.705882  4.0    
 0.0   2.22222  4.0       3.05882
 1.0   3.11111  0.0       2.58824

julia> inverse_transform(clf, xnew)
8×4 Array{Float64,2}:
   3.0   8.0  -2.0  -10.0
   3.0  10.0  -4.0    3.0
  -9.0   4.0   9.0   -5.0
   0.0   3.0   9.0  -10.0
   6.0  -8.0   4.0   -4.0
  -2.0   5.0  -5.0    7.0
 -10.0   2.0   9.0    3.0
  -6.0   6.0  -8.0    1.0
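
For reference, scikit-learn's min-max scaling first maps each column to [0, 1] and then stretches it to the requested range. The sketch below implements that definition on a small made-up matrix. `minmax_scale` is a hypothetical helper, not part of PreProcessing, and whether the package applies exactly this mapping is an assumption; only the keyword names `range_min` and `range_max` are taken from the call above.

```julia
# Hypothetical helper (not part of PreProcessing): scikit-learn style min-max scaling.
function minmax_scale(x::AbstractMatrix; range_min=0, range_max=1)
    lo = minimum(x, dims=1)              # per-column minimum
    hi = maximum(x, dims=1)              # per-column maximum
    unit = (x .- lo) ./ (hi .- lo)       # each column mapped to [0, 1]
    # Note: a constant column gives hi == lo and therefore NaN here.
    return unit .* (range_max - range_min) .+ range_min
end

# Small illustrative input, chosen only for this sketch.
x = [1 10;
     2 20;
     5 40]

y = minmax_scale(x, range_min=-4, range_max=4)
```

Under this definition the smallest entry of every column lands on `range_min` and the largest on `range_max`.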

julia> x = rand(-10:10, 8,4)
8×4 Array{Int64,2}:
   3   8  -2  -10
   3  10  -4    3
  -9   4   9   -5
   0   3   9  -10
   6  -8   4   -4
  -2   5  -5    7
 -10   2   9    3
  -6   6  -8    1

julia> clf = fit(Binarizer, x)
PreProcessing.Binarizer{Int64}(0, 4, 1)

julia> xnew = transform(clf, x)
8×4 Array{Int64,2}:
 1  1  0  0
 1  1  0  1
 0  1  1  0
 0  1  1  0
 1  0  1  0
 0  1  0  1
 0  1  1  1
 0  1  0  1
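
The output above is consistent with a strict comparison against a threshold of 0, the first field shown in the `Binarizer` display. A minimal sketch of that rule, assuming entries equal to the threshold map to 0 (the names `threshold` and `b` are illustrative):

```julia
# The 8×4 matrix from the example above.
x = [  3   8  -2  -10;
       3  10  -4    3;
      -9   4   9   -5;
       0   3   9  -10;
       6  -8   4   -4;
      -2   5  -5    7;
     -10   2   9    3;
      -6   6  -8    1]

threshold = 0

# 1 where the entry is strictly greater than the threshold, 0 otherwise.
b = Int.(x .> threshold)
```

Note that the `0` in the fourth row maps to `0`, as in the transformed matrix above. Binarizing discards magnitudes, so there is no inverse transform to show.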

julia> x = rand(-10:10, 8,4)
8×4 Array{Int64,2}:
   3   8  -2  -10
   3  10  -4    3
  -9   4   9   -5
   0   3   9  -10
   6  -8   4   -4
  -2   5  -5    7
 -10   2   9    3
  -6   6  -8    1

julia> clf = fit(MaxAbsScaler, x)
MaxAbsScaler transformer with 4 features

julia> xnew = transform(clf, x)
8×4 Array{Float64,2}:
  0.3   0.8  -0.222222  -1.0
  0.3   1.0  -0.444444   0.3
 -0.9   0.4   1.0       -0.5
  0.0   0.3   1.0       -1.0
  0.6  -0.8   0.444444  -0.4
 -0.2   0.5  -0.555556   0.7
 -1.0   0.2   1.0        0.3
 -0.6   0.6  -0.888889   0.1

julia> inverse_transform(clf, xnew)
8×4 Array{Float64,2}:
   3.0   8.0  -2.0  -10.0
   3.0  10.0  -4.0    3.0
  -9.0   4.0   9.0   -5.0
   0.0   3.0   9.0  -10.0
   6.0  -8.0   4.0   -4.0
  -2.0   5.0  -5.0    7.0
 -10.0   2.0   9.0    3.0
  -6.0   6.0  -8.0    1.0
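
The same result can be obtained by dividing each column by its largest absolute value. This is a minimal sketch of the arithmetic rather than the package's code; the names `scale`, `y`, and `x_back` are illustrative.

```julia
# The 8×4 matrix from the example above.
x = [  3   8  -2  -10;
       3  10  -4    3;
      -9   4   9   -5;
       0   3   9  -10;
       6  -8   4   -4;
      -2   5  -5    7;
     -10   2   9    3;
      -6   6  -8    1]

scale  = maximum(abs.(x), dims=1)   # per-column largest magnitude: [10 10 9 10]
y      = x ./ scale                 # every entry now lies in [-1, 1]
x_back = y .* scale                 # undo the scaling
```

Because the data are only divided and never shifted, zeros stay zero, which is why this scaler is often preferred for sparse data.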
