A text format where each line represents a labeled feature vector using the following format:
label index1:value1 index2:value2 index3:value3 ...
- label
- type: Int
- when the input is training data, it is the label of the data point; for binary classification, a label should be {0, 1}; for multiclass classification, labels should be class indices starting from zero: {0, 1, ..., n}
- when the input is test data, it is the index of the data point
- index:value
- represents the feature's index-value pair, where index type is Int and value type is Double
- feature index starts from 1, as in the libsvm style
# libsvm example
1 1:0.5 3:3.1 7:1.0
0 2:0.1 3:2.3 5:2.0
1 4:0.2 7:1.1 9:0.0
....
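The line layout above can be read with a few lines of code. The following is a minimal sketch, not Angel's own reader: the function name `parse_libsvm_line` is hypothetical, and it assumes fields are separated by whitespace and that each feature token contains exactly one `:`.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one 'label index:value ...' line into (label, {index: value}).

    Hypothetical helper for illustration; assumes whitespace-separated fields.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = int(tokens[0])  # label type is Int
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        index = int(index_text)  # index type is Int
        if index < 1:
            raise ValueError(f"feature index must start from 1, got {index}")
        features[index] = float(value_text)  # value type is Double
    return label, features


label, features = parse_libsvm_line("1 1:0.5 3:3.1 7:1.0")
print(label, features)  # 1 {1: 0.5, 3: 3.1, 7: 1.0}
```

Features that do not appear in a line are simply absent from the returned mapping, which keeps the representation sparse.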
A text format where each line represents a labeled feature vector, with fields separated by a space, using the following format:
label index1 index2 index3
- label
- type: Int
- when the input is training data, it is the label of the data point; for binary classification, a label should be {0, 1}; for multiclass classification, labels should be class indices starting from zero: {0, 1, ..., n}
- when the input is test data, it is the index of the data point
- index
- type: Int/Long
- feature index, starting from 0
- represents the indices of features that are 1 (unrepresented features are 0)
# dummy type example
0 3 7 999 666
1 0 2 88 77
...
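To illustrate how a dummy-format line maps to a feature vector, here is a minimal sketch. The names `parse_dummy_line` and `to_dense` are hypothetical and are not part of Angel; the sketch assumes space-separated fields and, for the dense expansion, a feature dimension supplied by the caller.

```python
from typing import List, Tuple


def parse_dummy_line(line: str) -> Tuple[int, List[int]]:
    """Parse one 'label index1 index2 ...' line into (label, indices).

    Hypothetical helper for illustration; assumes space-separated fields.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = int(tokens[0])
    indices = [int(token) for token in tokens[1:]]
    if any(index < 0 for index in indices):
        raise ValueError("feature indices must start from 0")
    return label, indices


def to_dense(indices: List[int], dim: int) -> List[float]:
    """Expand listed indices to a dense vector: listed features are 1, others 0."""
    dense = [0.0] * dim
    for index in indices:
        dense[index] = 1.0
    return dense


label, indices = parse_dummy_line("1 0 2 88 77")
print(label, indices)        # 1 [0, 2, 88, 77]
print(to_dense([0, 2], 4))   # [1.0, 0.0, 1.0, 0.0]
```

The order of the indices in a line does not matter for the resulting vector, since each listed index only marks a feature as 1.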
Note: if the row separator is not a space, you can specify a different separator (say ",") by using the following option:
ml.data.splitor=,
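As a rough illustration of what this option changes, the sketch below splits a comma-separated dummy-format row. The helper `split_row` and its `splitor` parameter are hypothetical names chosen to mirror the option; this is not Angel's parsing code.

```python
from typing import List


def split_row(line: str, splitor: str = " ") -> List[str]:
    """Split one row into non-empty fields using the configured separator.

    Hypothetical helper for illustration only.
    """
    return [field.strip() for field in line.strip().split(splitor) if field.strip()]


fields = split_row("0,3,7,999,666", splitor=",")
label, indices = int(fields[0]), [int(field) for field in fields[1:]]
print(label, indices)  # 0 [3, 7, 999, 666]
```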
For multi-class problems, such as softmax regression, Angel requires the labels to start from 0, while for binary classification problems, the labels must be +1 and -1. As a result, a label transform may be required.
- ml.data.label.trans.class: specifies a transform class; the default is "NoTrans", and other options such as "ZeroOneTrans" (0-1), "PosNegTrans" (+1, -1), "AddOneTrans", and "SubOneTrans" are also available.
- ml.data.label.trans.threshold: "ZeroOneTrans" and "PosNegTrans" require a threshold to transform labels; if the label is larger than the threshold, the transform yields 1. The default threshold is 0.
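The behavior of these options can be sketched as follows. This is an illustration under stated assumptions, not Angel's implementation: `make_label_transform` is a hypothetical name, labels at or below the threshold are assumed to map to 0 for "ZeroOneTrans" and to -1 for "PosNegTrans", and "AddOneTrans" and "SubOneTrans" are assumed to add and subtract 1, as their names suggest.

```python
from typing import Callable, Dict


def make_label_transform(name: str = "NoTrans",
                         threshold: float = 0.0) -> Callable[[float], float]:
    """Return a label transform by name (hypothetical sketch of the options above)."""
    transforms: Dict[str, Callable[[float], float]] = {
        "NoTrans": lambda y: y,
        # Labels above the threshold yield 1; the 0 / -1 fallbacks are assumptions.
        "ZeroOneTrans": lambda y: 1.0 if y > threshold else 0.0,
        "PosNegTrans": lambda y: 1.0 if y > threshold else -1.0,
        # Assumed semantics, inferred from the class names only.
        "AddOneTrans": lambda y: y + 1.0,
        "SubOneTrans": lambda y: y - 1.0,
    }
    if name not in transforms:
        raise ValueError(f"unknown transform class: {name}")
    return transforms[name]


to_pos_neg = make_label_transform("PosNegTrans", threshold=0.0)
print([to_pos_neg(y) for y in (0.0, 1.0)])  # [-1.0, 1.0]

sub_one = make_label_transform("SubOneTrans")
print([sub_one(y) for y in (1.0, 2.0, 3.0)])  # [0.0, 1.0, 2.0]
```

For example, data labeled {0, 1} could be passed through "PosNegTrans" to obtain +1 and -1 labels, and multi-class labels starting from 1 could be shifted with "SubOneTrans" so that they start from 0, assuming the semantics above.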