Closed
Description
The /test/data folder contains two files related to the iris data set:
- iris.data, whose source appears to be the UC Irvine Machine Learning Repository
- iris.txt, which is tab-separated and has a header row with the column titles
Problem: iris.txt does not match iris.data.
Let's set aside the Label column and consider only the feature columns. Although the feature columns in iris.txt are named in the same order as they appear in iris.data, the data values sit under the wrong headers.
First lines of the iris.data:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
First lines of the iris.txt:
#Label Sepal length Sepal width Petal length Petal width
0 3.5 1.4 0.2 5.1
0 3.0 1.4 0.2 4.9
0 3.2 1.3 0.2 4.7
The last column in iris.txt should be second (right after Label), which shifts the other feature columns one position to the right. Petals of length 0.2 cm and width 5.1 cm are not natural :).
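A minimal sketch of that column fix in Python, assuming the file is tab-separated and the header line starts with `#`. The helper names `fix_row` and `fix_lines` are hypothetical, made up for illustration, and are not part of the repository:

```python
def fix_row(fields):
    """Move the last field to position 1, right after the label."""
    label, *features = fields
    return [label, features[-1], *features[:-1]]


def fix_lines(lines, sep="\t"):
    """Return the lines with the feature values moved under the right headers."""
    fixed = []
    for line in lines:
        if line.startswith("#"):
            # The header names are already in the right order; keep them.
            fixed.append(line)
        else:
            fixed.append(sep.join(fix_row(line.split(sep))))
    return fixed


# The first lines of iris.txt as quoted above (tabs assumed as separators).
sample = [
    "#Label\tSepal length\tSepal width\tPetal length\tPetal width",
    "0\t3.5\t1.4\t0.2\t5.1",
    "0\t3.0\t1.4\t0.2\t4.9",
    "0\t3.2\t1.3\t0.2\t4.7",
]

for line in fix_lines(sample):
    print(line)
```

After this, the feature values in each row follow the same order as in iris.data (5.1, 3.5, 1.4, 0.2 for the first row).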
//cc @OliaG as the iris data sets in dotnet/machinelearning-samples appear to have been produced from the iris.txt file.