Copyright 2017- Tatsuhiro Aoshima (hiro4bbh@gmail.com).
Package sticker provides a framework for multi-label classification.
sticker is written in Go, so anyone can easily modify and compile it in almost any environment. You can read sticker's documentation on GoDoc.
First, download and install Go. Next, get and install sticker as follows:
go get github.com/hiro4bbh/sticker
go install github.com/hiro4bbh/sticker/sticker-util
Once everything is installed, you can try sticker's command-line utility, sticker-util.
First of all, you should prepare datasets. sticker assumes the following directory structure for a dataset:
+ dataset-root
|-- train.txt: training dataset
|-- test.txt: test dataset
|-- feature_map.txt: feature map (optional)
|-- label_map.txt: label map (optional)
Training and test datasets should be in a format that ReadTextDataset can handle (see its documentation for the data format).
Feature and label maps should enumerate the name of each feature and label in order of identifier, respectively.
You can view a summary of the dataset at localhost:8080/summary as follows (you can change the port number with the addr option):
sticker-util -verbose -debug <dataset-root> @summarize -table=<table-filename-relative-to-root>
If featureMap or labelMap is an empty string, then the feature map or label map is ignored, respectively.
LabelNearest is the Sparse Weighted Nearest-Neighbor Method (Aoshima+ 2018), which achieved state-of-the-art performance on several XMLC datasets (Bhatia+ 2016).
For example, you can test this method on the Amazon-3M dataset (Bhatia+ 2016) as follows:
sticker-util -verbose -debug ./data/Amazon-3M/ @trainNearest @testNearest -S=75 -alpha=2.0 -beta=1
See the help of @trainNearest and @testNearest for the sub-command options.
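To give a feel for the general idea behind nearest-neighbor label ranking, here is a generic sketch of similarity-weighted voting over sparse vectors. It is not sticker's implementation and does not reproduce the method of (Aoshima+ 2018): the types and functions are hypothetical, and the roles given to the parameters (s as the number of neighbors, alpha as an exponent applied to the cosine similarity) are assumptions made for illustration. The beta option is not modeled at all.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sparseVector maps a feature identifier to its value.
type sparseVector map[uint32]float64

// example is a hypothetical training entry: a feature vector and its labels.
type example struct {
	features sparseVector
	labels   []uint32
}

// cosine returns the cosine similarity of two sparse vectors (0 if either is zero).
func cosine(a, b sparseVector) float64 {
	var dot, normA, normB float64
	for k, v := range a {
		normA += v * v
		if w, ok := b[k]; ok {
			dot += v * w
		}
	}
	for _, w := range b {
		normB += w * w
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / math.Sqrt(normA*normB)
}

// rankLabels lets the s most similar training examples vote for their labels,
// weighting each vote by similarity^alpha (an assumed weighting), and returns
// the labels sorted by descending score.
func rankLabels(train []example, query sparseVector, s int, alpha float64) []uint32 {
	type neighbor struct {
		index int
		sim   float64
	}
	neighbors := make([]neighbor, 0, len(train))
	for i, ex := range train {
		if sim := cosine(ex.features, query); sim > 0 {
			neighbors = append(neighbors, neighbor{i, sim})
		}
	}
	sort.SliceStable(neighbors, func(i, j int) bool {
		return neighbors[i].sim > neighbors[j].sim
	})
	if len(neighbors) > s {
		neighbors = neighbors[:s]
	}

	scores := make(map[uint32]float64)
	for _, n := range neighbors {
		for _, label := range train[n.index].labels {
			scores[label] += math.Pow(n.sim, alpha)
		}
	}
	ranked := make([]uint32, 0, len(scores))
	for label := range scores {
		ranked = append(ranked, label)
	}
	sort.Slice(ranked, func(i, j int) bool {
		if scores[ranked[i]] != scores[ranked[j]] {
			return scores[ranked[i]] > scores[ranked[j]]
		}
		return ranked[i] < ranked[j] // break ties by label identifier
	})
	return ranked
}

func main() {
	train := []example{
		{sparseVector{0: 1, 1: 1}, []uint32{10}},
		{sparseVector{1: 1, 2: 1}, []uint32{20}},
		{sparseVector{3: 1}, []uint32{30}},
	}
	query := sparseVector{0: 1, 1: 2}
	fmt.Println(rankLabels(train, query, 2, 2.0)) // prints [10 20]
}
```

In this toy data the third training example shares no feature with the query, so it never votes. Sparse representations make this cheap: only examples that overlap with the query can contribute.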
- LabelConst: Multi-label constant model (see godoc)
- LabelOne: One-versus-rest classifier for multi-label ranking (see godoc)
- LabelBoost: Multi-label Boosting model (see godoc)
- LabelForest: Variously-modified FastXML model (see godoc)
- LabelNext: Your next-generation model (you can add your own train and test commands, see plugin/next/init.go)
- L1Logistic_PrimalSGD: L1-logistic regression with stochastic gradient descent (SGD) solving the primal problem (see godoc)
- L1SVC_PrimalSGD: L1-Support Vector Classifier with SGD solving the primal problem (see godoc)
- L1SVC_DualCD: L1-Support Vector Classifier with coordinate descent (CD) solving the dual problem (see godoc)
- L2SVC_PrimalCD: L2-Support Vector Classifier with CD solving the primal problem (see godoc)
- (Aoshima+ 2018) T. Aoshima, K. Kobayashi, and M. Minami. "Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification." arXiv:1802.03938, 2018.
- (Bhatia+ 2016) K. Bhatia, H. Jain, Y. Prabhu, and M. Varma. The Extreme Classification Repository. 2016. Retrieved January 4, 2018 from http://manikvarma.org/downloads/XC/XMLRepository.html