Roadmap to v0.5 #187

Merged
merged 116 commits into from
Oct 31, 2022
2c7fab0
First draft of the new n-dimensional arrays + NB use case
May 31, 2021
09f5448
Fixes formatting, adjusts methods signatures
Jun 1, 2021
691c8a4
Improves default implementation of multiple Array methods
Jun 2, 2021
d805877
Refactors tree methods
Jun 16, 2021
aa86e3e
Fixes benchmarks
Jun 17, 2021
2237746
Adds matrix decomposition routines
Jun 18, 2021
135b5ae
Adds matrix decomposition methods to ndarray and nalgebra bindings
Jun 18, 2021
02da030
Refactoring + linear regression now uses array2
Jun 19, 2021
dedd2f1
Ridge & Linear regression
Jun 19, 2021
4e3cc72
LBFGS optimizer & logistic regression
Jun 21, 2021
fe30594
LBFGS optimizer & logistic regression
Jun 21, 2021
4796098
Changes linear methods, metrics and model selection methods to new n-…
Jul 2, 2021
8f1278d
Switches KNN and clustering algorithms to new n-d array layer
Jul 4, 2021
e5190da
Refactors distance metrics
Jul 7, 2021
4836b6a
Optimizes knn and clustering methods
Jul 7, 2021
7d07c1d
Refactors metrics module
Jul 8, 2021
7f3bea3
Switches decomposition methods to n-dimensional arrays
Jul 13, 2021
06b1a3d
Merge latest development branch
Mec-iS Sep 20, 2022
dbce1ed
Solve conflicts
Mec-iS Sep 20, 2022
4072592
Resolve remaining conflicts
Mec-iS Sep 20, 2022
23f8053
Solve other conflicts
Mec-iS Sep 20, 2022
7dcda3a
Fix first wave of compiler errors
Mec-iS Sep 21, 2022
95c59b8
Fix compiler errors for src/linear
Mec-iS Sep 21, 2022
60065e6
Fix other compiler errors for src/linear
Mec-iS Sep 21, 2022
782d77c
Fix other calls. Exclude ensemble for now
Mec-iS Sep 22, 2022
7f6eaa1
Remove documentation warnings
Mec-iS Sep 22, 2022
8715934
Revert "Fix other calls. Exclude ensemble for now"
morenol Sep 22, 2022
b85e046
cargo fmt + remove nightly feature
morenol Sep 22, 2022
f48b5df
Fix some compilation errors
morenol Sep 22, 2022
5839bff
fix other compilation errors
morenol Sep 22, 2022
6697d47
Other fixes
morenol Sep 22, 2022
ba9398a
Comment code
morenol Sep 22, 2022
088f7e3
merge development
Sep 22, 2022
91de7f1
Linalg refactoring - cleanup rng merge (#172)
montanalow Sep 23, 2022
132c72d
Fix remaining compiler errors
Mec-iS Sep 23, 2022
885180f
Remove legacy DenseMatrix and BaseMatrix implementation. Port the new…
Mec-iS Oct 10, 2022
644bf2e
Fix imports in src/linear
Mec-iS Oct 10, 2022
68c3e36
Port src/math/distance
Mec-iS Oct 10, 2022
2f36438
Port src/metrics/distance
Mec-iS Oct 10, 2022
00ceb43
Port src/metrics
Mec-iS Oct 10, 2022
45f580b
Port src/algorithm
Mec-iS Oct 10, 2022
e2c6f2c
Port src/dataset src.linear src/optimization
Mec-iS Oct 10, 2022
dd44907
Port src/metrics
Mec-iS Oct 11, 2022
d7e9ff9
Porting tests
Mec-iS Oct 11, 2022
fd9b206
All active tests pass
Mec-iS Oct 11, 2022
9c9c62a
Add src/neighbors
Mec-iS Oct 11, 2022
734956e
Format
Mec-iS Oct 11, 2022
577b4f4
Merge branch 'development' into linalg-refactoring
Mec-iS Oct 11, 2022
78ae0df
Add missing documentation
Mec-iS Oct 11, 2022
8707402
Add Default to distance
Mec-iS Oct 11, 2022
90a3d60
More clippying
Mec-iS Oct 11, 2022
ea64f25
Fix tests. Exclude other for now.
Mec-iS Oct 11, 2022
52345cb
Add src/tree
Mec-iS Oct 11, 2022
eb4a478
More clippying
Mec-iS Oct 11, 2022
e2a3a9c
Implement typed metrics
Mec-iS Oct 12, 2022
fc78e97
Port src/metrics. All tests passing
Mec-iS Oct 13, 2022
c1d55ba
Cleanup
Mec-iS Oct 13, 2022
9a80d21
Port src/decomposition. See TODOs
Mec-iS Oct 13, 2022
6f52078
Exclude AUC metrics. Needs reimplementation
Mec-iS Oct 13, 2022
e99bd70
Improve developers walkthrough
Mec-iS Oct 13, 2022
7723bcd
Major refactoring to version 0.5 (#108)
VolodymyrOrlov Oct 13, 2022
d61cfdc
Merge development
Mec-iS Oct 13, 2022
ea3cfc6
Merge branch 'v0.5-wip' of github.com:smartcorelib/smartcore into v0.…
Mec-iS Oct 13, 2022
d454591
Add src/linear
Mec-iS Oct 17, 2022
9a569d3
Add ensemble
Mec-iS Oct 17, 2022
36ea99e
Exclude src/ensemble for now. Port src/preprocessing by adding multip…
Mec-iS Oct 17, 2022
b9741fd
Fix problem with F1 score. All currently included tests pass
Mec-iS Oct 17, 2022
ae2eeaf
cargo fmt
Mec-iS Oct 17, 2022
62356c9
Add src/naive_bayes
Mec-iS Oct 17, 2022
81f23e4
Update version
Mec-iS Oct 18, 2022
e20afe4
Update README
Mec-iS Oct 18, 2022
84dfbcf
Add to contributing
Mec-iS Oct 18, 2022
3c30244
Fix Iris dataset returning y as f32 instead of i32
Mec-iS Oct 18, 2022
42d017e
Correct integer to u32 as required to represent classes
Mec-iS Oct 18, 2022
acb7bb3
Change datasets targets labels to integers
Mec-iS Oct 19, 2022
0c856f0
Provide SupervisedEstimator with a constructor to avoid explicit dyna…
Mec-iS Oct 20, 2022
52fd52e
Implement getters to use as_ref() in src/neighbors
Mec-iS Oct 20, 2022
98168c3
Implement getters to use as_ref() in src/naive_bayes
Mec-iS Oct 20, 2022
ca60721
Implement getters to use as_ref() in src/linear
Mec-iS Oct 20, 2022
aeb640b
Add Clone to src/naive_bayes
Mec-iS Oct 20, 2022
aef495e
Change signature for cross_validate and other model_selection functio…
Mec-iS Oct 20, 2022
e9c432a
Add some documentation
Mec-iS Oct 20, 2022
39f6e89
cargo fmt
Mec-iS Oct 21, 2022
bf56768
Implement borrowing for cross_validate
Mec-iS Oct 21, 2022
b670eb6
Add src/cluster
Mec-iS Oct 23, 2022
899e5cc
Start implementing src/svm
Mec-iS Oct 23, 2022
fd8ea91
Add tests
Mec-iS Oct 23, 2022
7a50e2b
Implement ndarray-bindings. Remove FloatNumber from implementations
Mec-iS Oct 25, 2022
471b732
Drop nalgebra-bindings support (as decided in conf-call to go for nda…
Mec-iS Oct 25, 2022
d14d86d
Remove benches. Benches will have their own repo at smartcore-benches
Mec-iS Oct 25, 2022
3631374
Clean up Cargo.toml
Mec-iS Oct 25, 2022
c7b2258
Remove utils
Mec-iS Oct 26, 2022
3f5f58b
Refactor usage of Kernels default() in src/svm/mod.rs
Mec-iS Oct 26, 2022
0821b07
Use vec in svm/mod.rs
Mec-iS Oct 27, 2022
7fd7a7d
WIP
Mec-iS Oct 27, 2022
73e2f23
Implement SVC
Mec-iS Oct 27, 2022
80ba2cd
Implement SVC serialization. Move search parameters in dedicated module
Mec-iS Oct 28, 2022
d355609
Implement SVR. Definitely too slow
Mec-iS Oct 28, 2022
cc6ae8f
Fix doc tests. Move to Rust 2021
Mec-iS Oct 29, 2022
608841c
cargo clippy 100%
Mec-iS Oct 29, 2022
6c6b829
cargo fmt 100%
Mec-iS Oct 29, 2022
e1a4f6c
Cleanup tests
Mec-iS Oct 29, 2022
a1d3cf9
Apply formatting
Mec-iS Oct 29, 2022
4cfa51e
Fix compilation issues for wasm (#202)
morenol Oct 29, 2022
59e1420
Fix tests (#203)
morenol Oct 29, 2022
4273176
update .gitignore
Mec-iS Oct 30, 2022
f3b97b6
Port linalg/traits/stats.rs
Mec-iS Oct 30, 2022
db60616
Merge branch 'v0.5-wip' of github.com:smartcorelib/smartcore into v0.…
Mec-iS Oct 30, 2022
d208d97
Merge branch 'development' of github.com:smartcorelib/smartcore into …
Mec-iS Oct 30, 2022
c702231
Fix tests
Mec-iS Oct 30, 2022
58e2e46
Improve methods naming
Mec-iS Oct 30, 2022
a020209
Improve Display for DenseMatrix
Mec-iS Oct 30, 2022
24be152
Add assertions to tests
Mec-iS Oct 30, 2022
850741f
Fix tests
Mec-iS Oct 30, 2022
5922c8c
fix linter
Mec-iS Oct 31, 2022
4f3632b
Add compiling options to Cargo
Mec-iS Oct 31, 2022
12 changes: 11 additions & 1 deletion .github/CONTRIBUTING.md
@@ -5,9 +5,17 @@ email, or any other method with the owners of this repository before making a ch

Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project.

## Background

We try to follow these principles:
* follow as much as possible the sklearn API to give a frictionless user experience for practitioners already familiar with it
* use only pure-Rust implementations for safety and future-proofing (with some low-level limited exceptions)
* do not use macros in the library code to allow readability and transparent behavior
* priority is not on "big data" datasets; try to be fast on small and average-sized datasets with a limited memory footprint.

## Pull Request Process

- 1. Open a PR following the template.
+ 1. Open a PR following the template (erase the part of the template you don't need).
2. Update the CHANGELOG.md with details of changes to the interface if they are breaking changes; this includes new environment variables, exposed ports, useful file locations and container parameters.
3. Pull Request can be merged in once you have the sign-off of one other developer, or if you do not have permission to do that you may request the reviewer to merge it for you.

@@ -16,6 +24,7 @@ Take a look to the conventions established by existing code:
* Every module should come with some reference to scientific literature that allows relating the code to research. Use the `//!` comments at the top of the module to tell readers about the basics of the procedure you are implementing.
* Every module should provide a Rust doctest, a brief test embedded with the documentation that explains how to use the procedure implemented.
* Every module should provide comprehensive tests at the end, in its `mod tests {}` sub-module. These tests can be flagged or not with configuration flags to allow WebAssembly target.
* Run `cargo doc --no-deps --open` and read the generated documentation in the browser to be sure that your changes are reflected in the documentation and that new code is documented.
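
The conventions above can be sketched as a small module skeleton. This is only an illustration under stated assumptions: the `manhattan` module and its `distance` function are hypothetical names, not code from the repository, and the `cfg_attr` line is one possible way to flag a test for a WebAssembly target (it assumes a `wasm-bindgen-test` dev-dependency when building for `wasm32`). The doctest is written as an indented block so that it fits inside this snippet; a fenced block inside the `//!` comment is the more common form.

```rust
pub mod manhattan {
    //! Manhattan (L1) distance between two points.
    //!
    //! References: cite here the paper or textbook the procedure follows.
    //!
    //! Example (rustdoc also compiles an indented block as a doctest;
    //! `my_crate` is a placeholder for the real crate name):
    //!
    //!     use my_crate::manhattan::distance;
    //!     assert_eq!(distance(&[1.0, 2.0], &[4.0, 6.0]), 7.0);

    /// Sum of absolute coordinate differences; panics if the lengths differ.
    pub fn distance(a: &[f64], b: &[f64]) -> f64 {
        assert_eq!(a.len(), b.len(), "points must have the same dimension");
        a.iter().zip(b.iter()).map(|(x, y)| (x - y).abs()).sum()
    }

    // Comprehensive tests live at the end of the module.
    #[cfg(test)]
    mod tests {
        use super::*;

        // Assumption: on wasm32 the test is run through wasm-bindgen-test;
        // on other targets this attribute is ignored.
        #[cfg_attr(target_arch = "wasm32", wasm_bindgen_test::wasm_bindgen_test)]
        #[test]
        fn distance_of_known_points() {
            assert_eq!(distance(&[1.0, 2.0], &[4.0, 6.0]), 7.0);
        }
    }
}
```

With a layout like this, `cargo test` runs both the unit test and the doctest, and `cargo doc --no-deps --open` renders the `//!` block as the module's front page.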

## Issue Report Process

@@ -29,6 +38,7 @@ Take a look to the conventions established by existing code:
1. After a PR is opened maintainers are notified
2. Probably changes will be required to comply with the workflow, these commands are run automatically and all tests shall pass:
* **Coverage** (optional): `tarpaulin` is used with command `cargo tarpaulin --out Lcov --all-features -- --test-threads 1`
* **Formatting**: run `rustfmt src/*.rs` to apply automatic formatting
* **Linting**: `clippy` is used with command `cargo clippy --all-features -- -Drust-2018-idioms -Dwarnings`
* **Testing**: multiple test pipelines are run for different targets
3. When everything is OK, code is merged.
10 changes: 10 additions & 0 deletions .gitignore
@@ -17,3 +17,13 @@ smartcore.code-workspace

# OS
.DS_Store


flamegraph.svg
perf.data
perf.data.old
src.dot
out.svg

FlameGraph/
out.stacks
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## Added
- Seeds to multiple algorithms that depend on random number generation.
- Added feature `js` to use WASM in browser
- Drop `nalgebra-bindings` feature
- Complete refactoring with *extensive API changes* that includes:
* moving to a new traits system, less structs more traits
* adapting all the modules to the new traits system
* moving towards Rust 2021, in particular the use of `dyn` and `as_ref`
* reorganization of the code base, trying to eliminate duplicates

## BREAKING CHANGE
- Added a new parameter to `train_test_split` to define the seed.
41 changes: 20 additions & 21 deletions Cargo.toml
@@ -2,9 +2,9 @@
name = "smartcore"
description = "The most advanced machine learning library in rust."
homepage = "https://smartcorelib.org"
- version = "0.2.1"
+ version = "0.4.0"
authors = ["SmartCore Developers"]
- edition = "2018"
+ edition = "2021"
license = "Apache-2.0"
documentation = "https://docs.rs/smartcore"
repository = "https://github.com/smartcorelib/smartcore"
@@ -13,48 +13,47 @@ keywords = ["machine-learning", "statistical", "ai", "optimization", "linear-alg
categories = ["science"]

[features]
default = ["datasets"]
default = ["datasets", "serde"]
ndarray-bindings = ["ndarray"]
nalgebra-bindings = ["nalgebra"]
datasets = ["rand_distr", "std"]
fp_bench = ["itertools"]
std = ["rand/std", "rand/std_rng"]
# wasm32 only
js = ["getrandom/js"]

[dependencies]
approx = "0.5.1"
cfg-if = "1.0.0"
ndarray = { version = "0.15", optional = true }
nalgebra = { version = "0.31", optional = true }
num-traits = "0.2"
num-traits = "0.2.12"
num = "0.4"
rand = { version = "0.8", default-features = false, features = ["small_rng"] }
rand_distr = { version = "0.4", optional = true }
serde = { version = "1", features = ["derive"], optional = true }
itertools = { version = "0.10.3", optional = true }
cfg-if = "1.0.0"

[target.'cfg(target_arch = "wasm32")'.dependencies]
getrandom = { version = "0.2", optional = true }

[dev-dependencies]
smartcore = { path = ".", features = ["fp_bench"] }
criterion = { version = "0.4", default-features = false }
serde_json = "1.0"
bincode = "1.3.1"

[target.'cfg(target_arch = "wasm32")'.dev-dependencies]
wasm-bindgen-test = "0.3"

[[bench]]
name = "distance"
harness = false
[profile.bench]
debug = true

resolver = "2"

[[bench]]
name = "naive_bayes"
harness = false
required-features = ["ndarray-bindings", "nalgebra-bindings"]
[profile.test]
debug = 1
opt-level = 3
split-debuginfo = "unpacked"

[[bench]]
name = "fastpair"
harness = false
required-features = ["fp_bench"]
[profile.release]
strip = true
debug = 1
lto = true
codegen-units = 1
overflow-checks = true
45 changes: 41 additions & 4 deletions README.md
@@ -17,11 +17,48 @@

-----

## Developers
Contributions welcome, please start from [CONTRIBUTING and other relevant files](.github/CONTRIBUTING.md).

## Current status
* Current working branch is `development` (if you want something that you can test right away).
* Breaking changes are undergoing development at [`v0.5-wip`](https://github.com/smartcorelib/smartcore/tree/v0.5-wip#readme) (if you are a newcomer better to start from [this README](https://github.com/smartcorelib/smartcore/tree/v0.5-wip#readme) as this will be the next major release).

- To start getting familiar with the new Smartcore v0.5 API, there is now available a [**Jupyter Notebook environment repository**](https://github.com/smartcorelib/smartcore-jupyter).
+ To start getting familiar with the new Smartcore v0.5 API, there is now available a [**Jupyter Notebook environment repository**](https://github.com/smartcorelib/smartcore-jupyter). Please see instructions there, your feedback is valuable for the future of the library.

## Developers
Contributions welcome, please start from [CONTRIBUTING and other relevant files](.github/CONTRIBUTING.md).

### Walkthrough: traits system and basic structures

#### numbers
The library is founded on basic traits provided by `num-traits`. The basic traits are in `src/numbers`. These traits are used to define all the procedures in the library, which makes the code safer and constrains what each implementation can handle.
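
As a rough illustration of how a numeric trait can constrain generic code, the sketch below defines a hypothetical `Number` trait and uses it as a bound on a `mean` function. Both names are stand-ins chosen for this example; they are not the definitions in `src/numbers`, which build on `num-traits`.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a numeric trait; not smartcore's actual definition.
pub trait Number: Copy + Add<Output = Self> {
    /// Additive identity.
    fn zero() -> Self;
    /// Conversion to `f64`, used here for averaging.
    fn to_f64(self) -> f64;
}

impl Number for i32 {
    fn zero() -> Self {
        0
    }
    fn to_f64(self) -> f64 {
        self as f64
    }
}

impl Number for f64 {
    fn zero() -> Self {
        0.0
    }
    fn to_f64(self) -> f64 {
        self
    }
}

/// Mean of a slice for any type implementing `Number`; `None` for empty input.
pub fn mean<T: Number>(values: &[T]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    // Accumulate in the element type, convert once at the end.
    let sum = values.iter().fold(T::zero(), |acc, &v| acc + v);
    Some(sum.to_f64() / values.len() as f64)
}
```

A type that does not implement the trait (for example `String`) is rejected at compile time, which is the kind of constraint the paragraph above describes.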

#### linalg
`numbers` are put to use in the linear algebra structures of the **`src/linalg/basic`** module. Its sub-modules define the traits used all over the code base.

* *arrays*: Defines data structures such as `Array`, `Array1` (1-dimensional) and `Array2` (matrix, 2-D), plus their "view" traits. Views provide access to data with no additional memory footprint; they are composed with mutable traits (`MutArray`, `ArrayViewMut`, ...) to allow writing.
* *matrix*: The main entry point for matrix operations. `struct DenseMatrix` is currently the only structure provided; once instantiated, it makes all the traits in "arrays" available (a sparse matrix implementation will be provided).
* *vector*: Convenience traits are implemented for `std::vec::Vec` to allow extensive reuse.

These are all traits, and by definition traits cannot be instantiated. For instantiable structures, see implementations such as `DenseMatrix` and its constructors.
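
The split between read-only views, mutable views, and the convenience implementations for `Vec` can be sketched as follows. `View1` and `ViewMut1` are hypothetical, heavily simplified traits written for this example; they are not smartcore's `ArrayView1` or `ArrayViewMut`.

```rust
/// Hypothetical read-only 1-D view; a simplified stand-in for a view trait.
pub trait View1<T: Copy> {
    fn get(&self, i: usize) -> T;
    fn len(&self) -> usize;
}

/// Hypothetical mutable counterpart, composed on top of the read-only trait.
pub trait ViewMut1<T: Copy>: View1<T> {
    fn set(&mut self, i: usize, value: T);
}

// Implementing the traits for `Vec` lets plain vectors be used wherever a view is expected.
impl<T: Copy> View1<T> for Vec<T> {
    fn get(&self, i: usize) -> T {
        self[i]
    }
    fn len(&self) -> usize {
        Vec::len(self)
    }
}

impl<T: Copy> ViewMut1<T> for Vec<T> {
    fn set(&mut self, i: usize, value: T) {
        self[i] = value;
    }
}

/// Works with any readable view; reads elements in place without copying the container.
pub fn dot<V: View1<f64>>(a: &V, b: &V) -> f64 {
    assert_eq!(a.len(), b.len(), "views must have the same length");
    (0..a.len()).map(|i| a.get(i) * b.get(i)).sum()
}

/// Requires the mutable trait because it writes through the view.
pub fn scale_in_place<V: ViewMut1<f64>>(v: &mut V, factor: f64) {
    for i in 0..v.len() {
        let x = v.get(i);
        v.set(i, x * factor);
    }
}
```

Functions written against the traits, like `dot` here, never name a concrete container, so the same code would accept any other structure that implements them.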

#### linalg/traits
The traits in `src/linalg/traits` are closely linked to Linear Algebra's theoretical framework. These traits are used to specify characteristics and constraints for types accepted by various algorithms. For example, they make it possible to state whether a matrix is `QRDecomposable` and/or `SVDDecomposable`. See the docstrings for references to the theoretical framework.

As above, these are all traits and cannot be instantiated; they are mostly used to provide constraints for implementations. For example, the implementation of Linear Regression requires the input data `X` to satisfy `Array2<FloatNumber> + QRDecomposable<TX> + SVDDecomposable<TX>` in `smartcore`'s trait system, that is, a 2-D matrix that is both QR and SVD decomposable. The provided structure `linalg::arrays::matrix::DenseMatrix` meets these bounds: `impl<T: FloatNumber> QRDecomposable<T> for DenseMatrix<T> {}` and `impl<T: FloatNumber> SVDDecomposable<T> for DenseMatrix<T> {}`.
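
The idea of using traits purely as constraints can be sketched in a self-contained way. All names below (`Matrix2`, `QrCapable`, `SvdCapable`, `Dense`, `describe_fit`) are hypothetical stand-ins invented for this example; the real traits carry the decomposition methods, which are omitted here.

```rust
/// Hypothetical 2-D array trait (stand-in for `Array2`).
pub trait Matrix2 {
    fn shape(&self) -> (usize, usize);
}

/// Hypothetical capability traits (stand-ins for `QRDecomposable` and `SVDDecomposable`).
pub trait QrCapable: Matrix2 {}
pub trait SvdCapable: Matrix2 {}

/// Hypothetical concrete matrix, the only instantiable item in this sketch.
pub struct Dense {
    rows: usize,
    cols: usize,
    pub values: Vec<f64>,
}

impl Dense {
    pub fn zeros(rows: usize, cols: usize) -> Self {
        Dense { rows, cols, values: vec![0.0; rows * cols] }
    }
}

impl Matrix2 for Dense {
    fn shape(&self) -> (usize, usize) {
        (self.rows, self.cols)
    }
}

// Empty impl blocks opt the concrete type into each capability.
impl QrCapable for Dense {}
impl SvdCapable for Dense {}

pub enum Solver {
    Qr,
    Svd,
}

/// Accepts only inputs that satisfy all three bounds, mirroring the constraint described above.
pub fn describe_fit<X>(x: &X, solver: Solver) -> String
where
    X: Matrix2 + QrCapable + SvdCapable,
{
    let (n, p) = x.shape();
    let name = match solver {
        Solver::Qr => "QR",
        Solver::Svd => "SVD",
    };
    format!("fitting {} samples x {} features via {}", n, p, name)
}
```

A type that implements only `Matrix2` would be rejected by `describe_fit` at compile time, so an unsupported input is caught before any numerical code runs.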

#### metrics
Implementations for metrics (classification, regression, cluster, ...) and distance measures (Euclidean, Hamming, Manhattan, ...). For example: `Accuracy`, `F1`, `AUC`, `Precision`, `R2`. Like everything else in the code base, these implementations reuse `numbers` and `linalg` traits and structures.

These are collected in structures like `pub struct ClassificationMetrics<T> {}` that implement `metrics::Metrics`; these structures group functions by kind (classification, regression, cluster, ...) and provide instantiation for the individual metrics. Each instantiation can be passed around through the corresponding function, like `pub fn accuracy<T: Number + RealNumber + FloatNumber, V: ArrayView1<T>>(y_true: &V, y_pred: &V) -> T`. This provides a mechanism for metrics to be passed to higher-level interfaces like `cross_validate`:
```rust
let results = cross_validate(
    BiasedEstimator::fit, // custom estimator
    &x, &y,               // input data
    NoParameters {},      // extra parameters
    cv,                   // type of cross validator
    &accuracy             // **metrics function** <--------
).unwrap();
```
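
The mechanism of handing a metric function to a higher-level routine can be sketched without the library. In the example below, `accuracy` and `score_folds` are simplified, hypothetical functions over plain slices; `score_folds` only stands in for the role `cross_validate` plays and does not reproduce its signature.

```rust
/// Fraction of positions where the prediction equals the true label; 0.0 for empty input.
pub fn accuracy(y_true: &[u32], y_pred: &[u32]) -> f64 {
    assert_eq!(y_true.len(), y_pred.len(), "label vectors must have the same length");
    if y_true.is_empty() {
        return 0.0;
    }
    let hits = y_true.iter().zip(y_pred).filter(|(t, p)| t == p).count();
    hits as f64 / y_true.len() as f64
}

/// Hypothetical stand-in for a cross-validation helper: applies the given
/// metric to each (true labels, predicted labels) pair and collects the scores.
pub fn score_folds<F>(folds: &[(Vec<u32>, Vec<u32>)], metric: &F) -> Vec<f64>
where
    F: Fn(&[u32], &[u32]) -> f64,
{
    folds
        .iter()
        .map(|(y_true, y_pred)| metric(y_true.as_slice(), y_pred.as_slice()))
        .collect()
}
```

Because the metric is just a function parameter, swapping `&accuracy` for another function with the same signature changes the score being computed without touching the surrounding routine.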


TODO: complete for all modules
18 changes: 0 additions & 18 deletions benches/distance.rs

This file was deleted.

56 changes: 0 additions & 56 deletions benches/fastpair.rs

This file was deleted.

73 changes: 0 additions & 73 deletions benches/naive_bayes.rs

This file was deleted.
