update #1

Merged: 97 commits (Feb 26, 2021)

Commits
ad3ac49
Implement GaussianNB (#27)
morenol Nov 19, 2020
9db9939
Add serde to CategoricalNB (#30)
morenol Nov 19, 2020
583284e
feat: adds LASSO
Nov 25, 2020
f9056f7
lasso: minor change in unit test
Nov 25, 2020
89a5136
Change implementation of to_row_vector for nalgebra (#34)
morenol Nov 25, 2020
67e5829
simplifies generic matrix.ab implementation
Nov 25, 2020
c172c40
Merge pull request #35 from smartcorelib/lasso
VolodymyrOrlov Dec 3, 2020
4720a3a
MultinomialNB (#32)
morenol Dec 3, 2020
f0b348d
feat: BernoulliNB (#31)
morenol Dec 5, 2020
2650416
Add benches for GNB (#33)
morenol Dec 5, 2020
53351b2
fix needless-range and clippy::ptr_arg warnings. (#36)
morenol Dec 11, 2020
78673b5
feat: adds elastic net
Dec 12, 2020
a27c29b
Merge branch 'development' into elasticnet
Dec 12, 2020
cceb2f0
feat: lasso documentation
Dec 13, 2020
74a7c45
feat: adds SVD
Dec 14, 2020
d39b04e
fix: fmt
Dec 14, 2020
505f495
fix: Update ndarray version
morenol Dec 16, 2020
413f1a0
Merge pull request #39 from morenol/lmm/update_ndarray
morenol Dec 16, 2020
1ce18b5
Merge pull request #37 from smartcorelib/elasticnet
VolodymyrOrlov Dec 17, 2020
2c892aa
Merge pull request #38 from smartcorelib/svd
VolodymyrOrlov Dec 17, 2020
f76a1d1
feat: makes smartcore::error::FailedError non-exhaustive
Dec 17, 2020
5a18547
feat: NB documentation
Dec 18, 2020
8ca13a7
fix: criterion
Dec 18, 2020
97dece9
Merge pull request #41 from smartcorelib/nb_documentation
VolodymyrOrlov Dec 18, 2020
c9eb94b
Derive clone for NB Parameters
morenol Dec 17, 2020
d8d7519
Merge pull request #42 from morenol/python-development
morenol Dec 18, 2020
40dfca7
Merge pull request #40 from smartcorelib/non_exhaustive_failure
VolodymyrOrlov Dec 18, 2020
a2be9e1
feat: + cross_validate, trait Predictor, refactoring
Dec 22, 2020
9b22197
fix: clippy, documentation and formatting
Dec 23, 2020
f685f57
feat: + cross_val_predict
Dec 23, 2020
74f0d9e
fix: formatting
Dec 23, 2020
dd341f4
feat: + builders for algorithm parameters
Dec 23, 2020
32ae63a
feat: documentation adjusted to new builder
Dec 23, 2020
d22be7d
fix: post-review changes
Dec 24, 2020
a69fb3a
Merge pull request #43 from smartcorelib/kfold
VolodymyrOrlov Dec 24, 2020
810a5c4
feat: consolidates API
Dec 25, 2020
ba16c25
Merge pull request #44 from smartcorelib/api
VolodymyrOrlov Dec 27, 2020
9475d50
feat: version change + api documentation updated
Dec 28, 2020
c5a7bea
Merge pull request #45 from smartcorelib/api_doc
VolodymyrOrlov Dec 28, 2020
bb9a05b
fix: fixes a bug in DBSCAN, removes println's
Jan 3, 2021
051023e
Merge pull request #47 from smartcorelib/development
VolodymyrOrlov Jan 3, 2021
d91999b
Merge pull request #48 from smartcorelib/main
VolodymyrOrlov Jan 3, 2021
0e81663
Fix Matrix typo in documentation
atcol Jan 5, 2021
4a941d1
Merge pull request #56 from atcol/patch-1
VolodymyrOrlov Jan 5, 2021
eb76949
Add coverage check (#57)
morenol Jan 5, 2021
e0d46f4
feat: Make SerDe optional
ssorc3 Jan 17, 2021
762986b
Cargo format
ssorc3 Jan 17, 2021
f1cf8a6
Added serde feature flags to tests
ssorc3 Jan 18, 2021
fd00bc3
Run the pipeline with --all-features enabled
ssorc3 Jan 18, 2021
272aabc
Merge pull request #67 from ssorc3/development
VolodymyrOrlov Jan 18, 2021
bd5fbb6
feat: adds a new parameter to the logistic regression: solver
Jan 21, 2021
87d4e9a
Merge pull request #71 from smartcorelib/log_regression_solvers
VolodymyrOrlov Jan 21, 2021
40a92ee
feat: adds l2 regularization penalty to the Logistic Regression
Jan 21, 2021
9916318
build one-hot encoder
gaxler Jan 26, 2021
dbca6d4
fmt fix
gaxler Jan 26, 2021
139bbae
clippy fixes
gaxler Jan 26, 2021
0df797c
fmt fix
gaxler Jan 26, 2021
7daf536
fixed docs
gaxler Jan 26, 2021
68e7162
Merge pull request #72 from smartcorelib/lr_reg
VolodymyrOrlov Jan 26, 2021
9833a2f
codecov-fix
gaxler Jan 26, 2021
244a724
Generic make_one_hot. Current implementation returns BaseVector of R…
gaxler Jan 27, 2021
19088b6
remove LabelDefinition, looks like unnecessary abstraction for now
gaxler Jan 27, 2021
6109fc5
Renaming fit/transform for API compatibility. Also rename label to ca…
gaxler Jan 27, 2021
408b97d
Rename series encoder and move to separate module file
gaxler Jan 28, 2021
5c400f4
Scaffold for turning floats to hashable and fitting to columns
gaxler Jan 28, 2021
f91b1f9
fit SeriesOneHotEncoders to predefined columns
gaxler Jan 28, 2021
3480e72
Documentation updates
gaxler Jan 31, 2021
3dc8a42
Adapt column numbers to the new columns introduced by categorical var…
gaxler Jan 31, 2021
dd39433
Categorizable trait defines logic of turning floats into hashable cat…
gaxler Jan 31, 2021
cd56110
Fit OneHotEncoder
gaxler Jan 31, 2021
fd6b2e8
Transform matrix
gaxler Jan 31, 2021
c987d39
tests + force Categorizable be RealNumber
gaxler Jan 31, 2021
2f03c1d
module name change
gaxler Jan 31, 2021
ca0816d
Clippy fixes
gaxler Jan 31, 2021
863be5e
style fixes
gaxler Jan 31, 2021
f4b5936
fmt
gaxler Jan 31, 2021
a882741
If transform fails - fail before copying the whole matrix
gaxler Feb 1, 2021
03b9f76
Doc+Naming Improvement
gaxler Feb 1, 2021
228b54b
fmt
gaxler Feb 1, 2021
19ff6df
Separate mapper object
gaxler Feb 3, 2021
d31145b
Define common series encoder behavior
gaxler Feb 3, 2021
237b116
doc update
gaxler Feb 3, 2021
ef06f45
Switch to use SeriesEncoder trait
gaxler Feb 3, 2021
700d320
simplify SeriesEncoder trait
gaxler Feb 3, 2021
3cc20fd
Move all functionality to CategoryMapper (one-hot and ordinal).
gaxler Feb 3, 2021
374dfec
No more SeriesEncoders.
gaxler Feb 3, 2021
828df4e
Use CategoryMapper to transform an iterator. No more passing iterator…
gaxler Feb 3, 2021
af6ec2d
rename categorical
gaxler Feb 10, 2021
6b5bed6
remove old
gaxler Feb 10, 2021
745d0b5
Merge pull request #76 from gaxler/OneHotEncoder
VolodymyrOrlov Feb 12, 2021
4af6987
fix: Fix new clippy warnings (#79)
morenol Feb 16, 2021
a30802e
fix: Change to compile for wasm32-unknown-unknown target (#80)
morenol Feb 17, 2021
4fb2625
Implemented make_moons generator per https://github.com/scikit-learn/…
cmccomb Feb 18, 2021
483a21b
Oops, test was failing due to typo. Fixed now.
cmccomb Feb 18, 2021
fed11f0
Fixed formatting to pass cargo format check.
cmccomb Feb 18, 2021
c0be45b
Merge pull request #82 from cmccomb/development
VolodymyrOrlov Feb 25, 2021
1b42f8a
feat: Add getters for naive bayes structs (#74)
morenol Feb 25, 2021
20 changes: 18 additions & 2 deletions .circleci/config.yml
@@ -6,6 +6,8 @@ workflows:
jobs:
- build
- clippy
- coverage

jobs:
build:
docker:
@@ -21,10 +23,10 @@ jobs:
command: cargo fmt -- --check
- run:
name: Stable Build
command: cargo build --features "nalgebra-bindings ndarray-bindings"
command: cargo build --all-features
- run:
name: Test
command: cargo test --features "nalgebra-bindings ndarray-bindings"
command: cargo test --all-features
- save_cache:
key: project-cache
paths:
@@ -41,3 +43,17 @@ jobs:
- run:
name: Run cargo clippy
command: cargo clippy --all-features -- -Drust-2018-idioms -Dwarnings

coverage:
machine: true
steps:
- checkout
- run:
name: Generate report
command: >
docker run --security-opt seccomp=unconfined -v $PWD:/volume
xd009642/tarpaulin:latest-nightly cargo tarpaulin -v --ciserver circle-ci
--out Lcov --all-features -- --test-threads 1
- run:
name: Upload
command: bash <(curl -s https://codecov.io/bash) -Z -f
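The new `coverage` job runs tarpaulin inside a Docker container and uploads an LCOV report to Codecov. The same report can be generated locally — a sketch assuming Docker is installed, with the image tag and flags taken verbatim from the job above:

```shell
# Run tarpaulin in the same container the CI job uses.
# --security-opt seccomp=unconfined is needed because tarpaulin uses ptrace.
docker run --security-opt seccomp=unconfined -v $PWD:/volume \
  xd009642/tarpaulin:latest-nightly cargo tarpaulin -v \
  --out Lcov --all-features -- --test-threads 1
```

`--test-threads 1` keeps the instrumented test run single-threaded, which makes coverage collection more deterministic.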
16 changes: 10 additions & 6 deletions Cargo.toml
@@ -2,7 +2,7 @@
name = "smartcore"
description = "The most advanced machine learning library in rust."
homepage = "https://smartcorelib.org"
version = "0.1.0"
version = "0.2.0"
authors = ["SmartCore Developers"]
edition = "2018"
license = "Apache-2.0"
@@ -19,14 +19,13 @@ nalgebra-bindings = ["nalgebra"]
datasets = []

[dependencies]
ndarray = { version = "0.13", optional = true }
nalgebra = { version = "0.22.0", optional = true }
ndarray = { version = "0.14", optional = true }
nalgebra = { version = "0.23.0", optional = true }
num-traits = "0.2.12"
num = "0.3.0"
rand = "0.7.3"
rand_distr = "0.3.0"
serde = { version = "1.0.115", features = ["derive"] }
serde_derive = "1.0.115"
serde = { version = "1.0.115", features = ["derive"], optional = true }

[dev-dependencies]
criterion = "0.3"
@@ -35,4 +34,9 @@ bincode = "1.3.1"

[[bench]]
name = "distance"
harness = false
harness = false

[[bench]]
name = "naive_bayes"
harness = false
required-features = ["ndarray-bindings", "nalgebra-bindings"]
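With `serde` now an optional dependency, downstream crates must opt in to serialization explicitly. Cargo automatically exposes an optional dependency as a feature of the same name, so a hypothetical consumer's `Cargo.toml` (not part of this PR) might request it alongside a backend feature:

```toml
[dependencies]
# "serde" here is the implicit feature created by the optional dependency;
# "ndarray-bindings" is one of the backend features declared above.
smartcore = { version = "0.2.0", features = ["ndarray-bindings", "serde"] }
```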
73 changes: 73 additions & 0 deletions benches/naive_bayes.rs
@@ -0,0 +1,73 @@
use criterion::BenchmarkId;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

use nalgebra::DMatrix;
use ndarray::Array2;
use smartcore::linalg::naive::dense_matrix::DenseMatrix;
use smartcore::linalg::BaseMatrix;
use smartcore::linalg::BaseVector;
use smartcore::naive_bayes::gaussian::GaussianNB;

pub fn gaussian_naive_bayes_fit_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("GaussianNB::fit");

for n_samples in [100_usize, 1000_usize, 10000_usize].iter() {
for n_features in [10_usize, 100_usize, 1000_usize].iter() {
let x = DenseMatrix::<f64>::rand(*n_samples, *n_features);
let y: Vec<f64> = (0..*n_samples)
.map(|i| (i % *n_samples / 5_usize) as f64)
.collect::<Vec<f64>>();
group.bench_with_input(
BenchmarkId::from_parameter(format!(
"n_samples: {}, n_features: {}",
n_samples, n_features
)),
n_samples,
|b, _| {
b.iter(|| {
GaussianNB::fit(black_box(&x), black_box(&y), Default::default()).unwrap();
})
},
);
}
}
group.finish();
}

pub fn gaussian_naive_matrix_datastructure(c: &mut Criterion) {
let mut group = c.benchmark_group("GaussianNB");
let classes = (0..10000).map(|i| (i % 25) as f64).collect::<Vec<f64>>();

group.bench_function("DenseMatrix", |b| {
let x = DenseMatrix::<f64>::rand(10000, 500);
let y = <DenseMatrix<f64> as BaseMatrix<f64>>::RowVector::from_array(&classes);

b.iter(|| {
GaussianNB::fit(black_box(&x), black_box(&y), Default::default()).unwrap();
})
});

group.bench_function("ndarray", |b| {
let x = Array2::<f64>::rand(10000, 500);
let y = <Array2<f64> as BaseMatrix<f64>>::RowVector::from_array(&classes);

b.iter(|| {
GaussianNB::fit(black_box(&x), black_box(&y), Default::default()).unwrap();
})
});

group.bench_function("ndalgebra", |b| {
let x = DMatrix::<f64>::rand(10000, 500);
let y = <DMatrix<f64> as BaseMatrix<f64>>::RowVector::from_array(&classes);

b.iter(|| {
GaussianNB::fit(black_box(&x), black_box(&y), Default::default()).unwrap();
})
});
}
criterion_group!(
benches,
gaussian_naive_bayes_fit_benchmark,
gaussian_naive_matrix_datastructure
);
criterion_main!(benches);
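One detail of the fit benchmark above is worth noting: since `i` ranges over `0..n_samples`, the `i % *n_samples` in the label expression is a no-op, so the generated class is simply `i / 5` — five consecutive samples per class. A minimal sketch of that label generation (the `labels` helper name is ours, not the benchmark's):

```rust
// Mirrors the label generation in gaussian_naive_bayes_fit_benchmark.
fn labels(n_samples: usize) -> Vec<f64> {
    (0..n_samples)
        .map(|i| (i % n_samples / 5_usize) as f64) // == i / 5, since i < n_samples
        .collect()
}

fn main() {
    let y = labels(100);
    // Samples 0..=4 get class 0.0, samples 5..=9 get class 1.0, and so on.
    assert_eq!(&y[..6], &[0.0, 0.0, 0.0, 0.0, 0.0, 1.0]);
    assert_eq!(y[99], 19.0); // 100 samples yield 20 classes
    println!("ok");
}
```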
33 changes: 15 additions & 18 deletions src/algorithm/neighbour/bbd_tree.rs
@@ -44,10 +44,7 @@ impl<T: RealNumber> BBDTree<T> {

let (n, _) = data.shape();

let mut index = vec![0; n];
for i in 0..n {
index[i] = i;
}
let index = (0..n).collect::<Vec<_>>();

let mut tree = BBDTree {
nodes,
Expand All @@ -64,7 +61,7 @@ impl<T: RealNumber> BBDTree<T> {

pub(in crate) fn clustering(
&self,
centroids: &Vec<Vec<T>>,
centroids: &[Vec<T>],
sums: &mut Vec<Vec<T>>,
counts: &mut Vec<usize>,
membership: &mut Vec<usize>,
@@ -92,8 +89,8 @@ impl<T: RealNumber> BBDTree<T> {
fn filter(
&self,
node: usize,
centroids: &Vec<Vec<T>>,
candidates: &Vec<usize>,
centroids: &[Vec<T>],
candidates: &[usize],
k: usize,
sums: &mut Vec<Vec<T>>,
counts: &mut Vec<usize>,
@@ -117,15 +114,15 @@ impl<T: RealNumber> BBDTree<T> {
let mut new_candidates = vec![0; k];
let mut newk = 0;

for i in 0..k {
for candidate in candidates.iter().take(k) {
if !BBDTree::prune(
&self.nodes[node].center,
&self.nodes[node].radius,
centroids,
closest,
candidates[i],
*candidate,
) {
new_candidates[newk] = candidates[i];
new_candidates[newk] = *candidate;
newk += 1;
}
}
@@ -166,9 +163,9 @@ impl<T: RealNumber> BBDTree<T> {
}

fn prune(
center: &Vec<T>,
radius: &Vec<T>,
centroids: &Vec<Vec<T>>,
center: &[T],
radius: &[T],
centroids: &[Vec<T>],
best_index: usize,
test_index: usize,
) -> bool {
@@ -285,8 +282,8 @@ impl<T: RealNumber> BBDTree<T> {
}

let mut mean = vec![T::zero(); d];
for i in 0..d {
mean[i] = node.sum[i] / T::from(node.count).unwrap();
for (i, mean_i) in mean.iter_mut().enumerate().take(d) {
*mean_i = node.sum[i] / T::from(node.count).unwrap();
}

node.cost = BBDTree::node_cost(&self.nodes[node.lower.unwrap()], &mean)
@@ -295,11 +292,11 @@ impl<T: RealNumber> BBDTree<T> {
self.add_node(node)
}

fn node_cost(node: &BBDTreeNode<T>, center: &Vec<T>) -> T {
fn node_cost(node: &BBDTreeNode<T>, center: &[T]) -> T {
let d = center.len();
let mut scatter = T::zero();
for i in 0..d {
let x = (node.sum[i] / T::from(node.count).unwrap()) - center[i];
for (i, center_i) in center.iter().enumerate().take(d) {
let x = (node.sum[i] / T::from(node.count).unwrap()) - *center_i;
scatter += x * x;
}
node.cost + T::from(node.count).unwrap() * scatter
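Most of the changes in this file follow `clippy::ptr_arg`: parameters declared as `&Vec<T>` become `&[T]`, and index loops become iterator chains. A reduced sketch of the `node_cost` rewrite — the signature is simplified to plain slices for illustration; the real function takes a `BBDTreeNode`:

```rust
// Accepting &[f64] instead of &Vec<f64> lets callers pass Vecs, arrays,
// or sub-slices alike -- the change clippy::ptr_arg asks for.
fn node_cost(sums: &[f64], count: usize, center: &[f64]) -> f64 {
    let mut scatter = 0.0;
    // Iterator form of the index loop, as in the diff above.
    for (i, center_i) in center.iter().enumerate() {
        let x = sums[i] / count as f64 - *center_i;
        scatter += x * x;
    }
    scatter
}

fn main() {
    let sums = vec![2.0, 4.0];
    // A &Vec<f64> auto-derefs to &[f64], so existing call sites keep working:
    let cost = node_cost(&sums, 2, &[0.0, 0.0]);
    assert!((cost - 5.0).abs() < 1e-12); // (2/2)^2 + (4/2)^2 = 1 + 4
    println!("{}", cost);
}
```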
16 changes: 11 additions & 5 deletions src/algorithm/neighbour/cover_tree.rs
@@ -6,6 +6,7 @@
//! use smartcore::algorithm::neighbour::cover_tree::*;
//! use smartcore::math::distance::Distance;
//!
//! #[derive(Clone)]
//! struct SimpleDistance {} // Our distance function
//!
//! impl Distance<i32, f64> for SimpleDistance {
@@ -23,6 +24,7 @@
//! ```
use std::fmt::Debug;

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

use crate::algorithm::sort::heap_select::HeapSelection;
@@ -31,7 +33,8 @@ use crate::math::distance::Distance;
use crate::math::num::RealNumber;

/// Implements Cover Tree algorithm
#[derive(Serialize, Deserialize, Debug)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
pub struct CoverTree<T, F: RealNumber, D: Distance<T, F>> {
base: F,
inv_log_base: F,
@@ -55,7 +58,8 @@ impl<T, F: RealNumber, D: Distance<T, F>> PartialEq for CoverTree<T, F, D> {
}
}

#[derive(Debug, Serialize, Deserialize)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
struct Node<F: RealNumber> {
idx: usize,
max_dist: F,
@@ -64,7 +68,7 @@ struct Node<F: RealNumber> {
scale: i64,
}

#[derive(Debug, Serialize, Deserialize)]
#[derive(Debug)]
struct DistanceSet<F: RealNumber> {
idx: usize,
dist: Vec<F>,
@@ -436,7 +440,7 @@ impl<T: Debug + PartialEq, F: RealNumber, D: Distance<T, F>> CoverTree<T, F, D>
}
}

fn max(&self, distance_set: &Vec<DistanceSet<F>>) -> F {
fn max(&self, distance_set: &[DistanceSet<F>]) -> F {
let mut max = F::zero();
for n in distance_set {
if max < n.dist[n.dist.len() - 1] {
@@ -453,7 +457,8 @@ mod tests {
use super::*;
use crate::math::distance::Distances;

#[derive(Debug, Serialize, Deserialize)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
struct SimpleDistance {}

impl Distance<i32, f64> for SimpleDistance {
@@ -499,6 +504,7 @@
}

#[test]
#[cfg(feature = "serde")]
fn serde() {
let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];

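The pattern above — `cfg_attr` for the serde derives plus a separate unconditional `derive(Debug)` — is what lets the crate build with serialization disabled. A self-contained sketch of the same attribute layout on a hypothetical struct (not one from smartcore):

```rust
// With the "serde" feature off, the cfg_attr expands to nothing and the
// serde paths are never resolved, so no serde dependency is required.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, Clone)]
struct Node {
    idx: usize,
    max_dist: f64,
}

fn main() {
    let n = Node { idx: 3, max_dist: 1.5 };
    let copy = n.clone(); // Debug and Clone are always derived
    assert_eq!(copy.idx, 3);
    println!("{:?}", copy);
}
```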
7 changes: 6 additions & 1 deletion src/algorithm/neighbour/linear_search.rs
@@ -5,6 +5,7 @@
//! use smartcore::algorithm::neighbour::linear_search::*;
//! use smartcore::math::distance::Distance;
//!
//! #[derive(Clone)]
//! struct SimpleDistance {} // Our distance function
//!
//! impl Distance<i32, f64> for SimpleDistance {
@@ -21,6 +22,7 @@
//!
//! ```

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
use std::cmp::{Ordering, PartialOrd};
use std::marker::PhantomData;
@@ -31,7 +33,8 @@
use crate::math::num::RealNumber;

/// Implements Linear Search algorithm, see [KNN algorithms](../index.html)
#[derive(Serialize, Deserialize, Debug)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
pub struct LinearKNNSearch<T, F: RealNumber, D: Distance<T, F>> {
distance: D,
data: Vec<T>,
@@ -137,6 +140,8 @@ mod tests {
use super::*;
use crate::math::distance::Distances;

#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
struct SimpleDistance {}

impl Distance<i32, f64> for SimpleDistance {
8 changes: 6 additions & 2 deletions src/algorithm/neighbour/mod.rs
@@ -1,3 +1,4 @@
#![allow(clippy::ptr_arg)]
//! # Nearest Neighbors Search Algorithms and Data Structures
//!
//! Nearest neighbor search is a basic computational tool that is particularly relevant to machine learning,
@@ -34,6 +35,7 @@ use crate::algorithm::neighbour::linear_search::LinearKNNSearch;
use crate::error::Failed;
use crate::math::distance::Distance;
use crate::math::num::RealNumber;
#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

pub(crate) mod bbd_tree;
@@ -44,15 +46,17 @@ pub mod linear_search;

/// Both, KNN classifier and regressor benefits from underlying search algorithms that helps to speed up queries.
/// `KNNAlgorithmName` maintains a list of supported search algorithms, see [KNN algorithms](../algorithm/neighbour/index.html)
#[derive(Serialize, Deserialize, Debug, Clone)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
pub enum KNNAlgorithmName {
/// Heap Search algorithm, see [`LinearSearch`](../algorithm/neighbour/linear_search/index.html)
LinearSearch,
/// Cover Tree Search algorithm, see [`CoverTree`](../algorithm/neighbour/cover_tree/index.html)
CoverTree,
}

#[derive(Serialize, Deserialize, Debug)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug)]
pub(crate) enum KNNAlgorithm<T: RealNumber, D: Distance<Vec<T>, T>> {
LinearSearch(LinearKNNSearch<Vec<T>, T, D>),
CoverTree(CoverTree<Vec<T>, T, D>),