:Release: |version|
:Date: |today|
:Author: peter.prettenhofer@gmail.com

Contents:

.. toctree::
   :maxdepth: 1

   install.rst
   overview.rst
   using-cli.rst
   using-api.rst
   whatsnew.rst

Introduction
Bolt features discriminative learning of linear predictors (e.g. SVM or Logistic Regression) using fast online learning algorithms. Bolt is aimed at large-scale, high-dimensional and sparse machine-learning problems, in particular those encountered in information retrieval and natural language processing.
Bolt considers linear models (:class:`bolt.model.LinearModel`) for binary classification,

.. math::

   f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^T \mathbf{x} + b) ,

and generalized linear models (:class:`bolt.model.GeneralizedLinearModel`) for multi-class classification,

.. math::

   f(\mathbf{x}) = \operatorname*{arg\,max}_y \mathbf{w}^T \Phi(\mathbf{x},y) + b_y .

Here :math:`\mathbf{w}` and :math:`b` are the model parameters that are learned from training data, and :math:`\Phi(\mathbf{x},y)` is a joint feature vector of the input and a candidate class :math:`y`. In Bolt the model parameters are learned by minimizing the regularized training error

.. math::

   E(\mathbf{w},b) = \sum_{i=1}^n L(y_i,f(\mathbf{x}_i)) + \lambda R(\mathbf{w}) ,

where :math:`L` is a loss function that measures model fit, :math:`R` is a regularization term that measures model complexity, and :math:`\lambda` controls the strength of the regularization.

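The regularized training error can be made concrete with a small, dependency-free sketch. It assumes the hinge loss for the loss term and half the squared L2 norm for the regularizer (the factor 1/2 is a common convention, not something stated here), and it evaluates the loss on the raw score rather than on its sign, as is usual for the hinge loss. All function names are hypothetical and are not part of Bolt's API.

```python
def score(w, b, x):
    """Raw decision value w^T x + b for a dense feature vector x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    """Binary prediction in {-1, +1}; a score of exactly 0 maps to +1 (assumption)."""
    return 1 if score(w, b, x) >= 0 else -1


def hinge_loss(y, p):
    """Hinge loss for a label y in {-1, +1} and a raw score p."""
    return max(0.0, 1.0 - y * p)


def l2_penalty(w):
    """Assumed regularizer: half the squared L2 norm of w."""
    return 0.5 * sum(wj * wj for wj in w)


def regularized_error(w, b, X, Y, lam):
    """Sum of per-example losses plus lam times the regularizer."""
    data_term = sum(hinge_loss(y, score(w, b, x)) for x, y in zip(X, Y))
    return data_term + lam * l2_penalty(w)


# Toy data: three examples with two features each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [1, -1, 1]
w, b = [1.0, -1.0], 0.0

print(regularized_error(w, b, X, Y, lam=0.1))
```

In this toy example only the third point violates the margin (its score is 0), so the loss term is 1.0 and the penalty contributes 0.1 times 1.0.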
Features
Bolt supports the following trainers for binary classification:

- Stochastic Gradient Descent (:class:`bolt.trainer.sgd.SGD`)

  - Supports various loss functions :math:`L`: Hinge, Modified Huber, Log.
  - Supports various regularization terms :math:`R`: L2, L1, and Elastic Net.

- PEGASOS (:class:`bolt.trainer.sgd.PEGASOS`)

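To illustrate the loss functions and the SGD trainer listed above, here is a minimal sketch using the textbook definitions of the hinge, modified Huber and log losses, followed by one pass of plain SGD on the hinge loss with an L2 penalty. It is not Bolt's implementation: the fixed step size ``eta``, the dense list representation and all function names are assumptions made for illustration.

```python
import math


def hinge(y, p):
    """Hinge loss: max(0, 1 - y*p)."""
    return max(0.0, 1.0 - y * p)


def modified_huber(y, p):
    """Modified Huber loss: squared hinge for y*p >= -1, linear (-4*y*p) below."""
    z = y * p
    if z >= -1.0:
        return max(0.0, 1.0 - z) ** 2
    return -4.0 * z


def log_loss(y, p):
    """Logistic loss log(1 + exp(-y*p)), written to avoid overflow."""
    z = y * p
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def sgd_epoch(w, b, X, Y, lam, eta):
    """One pass of plain SGD on hinge loss + (lam/2)*||w||^2 with fixed step eta."""
    for x, y in zip(X, Y):
        p = sum(wj * xj for wj, xj in zip(w, x)) + b
        # Subgradient of the hinge loss with respect to the score p.
        g = -y if y * p < 1.0 else 0.0
        w = [wj - eta * (lam * wj + g * xj) for wj, xj in zip(w, x)]
        b = b - eta * g
    return w, b


# Linearly separable toy data.
X = [[2.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.0, 3.0]]
Y = [1, -1, 1, -1]
w, b = sgd_epoch([0.0, 0.0], 0.0, X, Y, lam=0.01, eta=0.1)
print(w, b)
```

Swapping the subgradient line for the derivative of another loss is all that changes between loss functions in this sketch; the L1 and Elastic Net penalties would replace the ``lam * wj`` term.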
For multi-class classification:

- One-versus-all (:class:`bolt.trainer.OVA`)
- Averaged Perceptron (:class:`bolt.trainer.avgperceptron.AveragedPerceptron`)
- Maximum Entropy (:class:`bolt.trainer.maxent.MaxentSGD`)

  - Also known as Multinomial Logistic Regression.
  - Trained via SGD.

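The one-versus-all strategy can be sketched as follows: one binary linear model is kept per class, and the predicted class is the one whose model produces the highest score. This is a generic illustration under the assumption that each model is a ``(w, b)`` pair of dense weights and a bias; the function name and data layout are hypothetical and do not reflect Bolt's API.

```python
def ova_predict(models, x):
    """Return the label whose binary model gives the highest score w^T x + b.

    models: dict mapping a class label to a (w, b) pair.
    """
    def class_score(label):
        w, b = models[label]
        return sum(wj * xj for wj, xj in zip(w, x)) + b

    return max(models, key=class_score)


# Three hypothetical per-class models over two features.
models = {
    "sports": ([1.0, 0.0], 0.0),
    "politics": ([0.0, 1.0], 0.0),
    "tech": ([0.5, 0.5], -0.1),
}

print(ova_predict(models, [2.0, 0.5]))
print(ova_predict(models, [0.0, 3.0]))
```

Each per-class model would be trained on a binary problem that treats its own class as positive and all others as negative.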
Benchmark
The following RCV1-CCAT benchmark results show that Bolt is competitive with state-of-the-art linear SVM solvers such as SVMPerf, liblinear, or sgd. The dataset comprises 781,264 training documents, each represented by a 47,152-dimensional feature vector.

================  =============  ============
Algorithm         Training time  Accuracy (%)
================  =============  ============
SVMlight          >600.00 sec
SVMPerf [1]_      11.60 sec      94.79
liblinear [2]_    9.00 sec       94.77
bolt [3]_         2.33 sec       94.79
sgd [4]_          1.09 sec       94.77
================  =============  ============

.. [1] Uses C=1000
.. [2] Uses SVM (Dual), B=1
.. [3] Uses E=5, r=0.00001, l=0, b
.. [4] Uses epochs=5, lambda=0.00001