Description
System information
ML.NET 1.5.2
.NET Framework 4.7.2
Issue
I have hundreds of projects, and each of them has a tree data structure like this:
- A
  - AA
    - AAA
  - BB
    - BBB
Or like this:
- A
  - AA1
    - AAA1
  - BB2
    - BBB2
Each project has its own tree structure, which is a modified version of a standard tree structure. What I am trying to do is map each project's tree structure to the standard tree structure, like this:
- A <--- A
  - AA <--- AA1
    - AAA <--- AAA1
  - BB <--- BB2
    - BBB <--- BBB2
Or like this:
(The mapping depends on the node's text rather than on the node's level.)
Currently I'm using multiclass classification in ML.NET. First I map the existing projects' trees to the standard tree manually and save the results in the database, like this:
| Label | Level1 | Level2 | Level3 |
| -------- | -------------- | -------------- | -------------- |
| A | A | * | * |
| A-AA | A | AA1 | * |
| A-AA-AAA | A | AA1 | AAA1 |
| A-BB | A | BB2 | * |
| A-BB-BBB | A | BB2 | BBB2 |
| A | A | * | * |
| A-AA-AAA | A | AAA1 | * |
| A-BB | A | BB2 | * |
| A-BB-BBB | A | BB2 | BBB2 |
Because a column in ML.NET cannot contain missing values, I replace them with *. My tree has 15 levels (feature columns).
The multiclass classification trainer I chose is SdcaMaximumEntropy. I hope to use its predictions to map the trees instead of doing this manually.
I successfully implemented the prediction. However, the prediction results are really poor.
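To make the row layout concrete, here is a minimal Python sketch of the padding step. It is not ML.NET code and not the exact code I run. It assumes each node is given as its root-to-node path, and `to_feature_row`, `PAD`, and `NUM_LEVELS` are hypothetical names used only for illustration.

```python
from typing import List, Sequence

PAD = "*"        # placeholder used above for empty level columns
NUM_LEVELS = 15  # number of level (feature) columns described above


def to_feature_row(
    path: Sequence[str],
    num_levels: int = NUM_LEVELS,
    pad: str = PAD,
) -> List[str]:
    """Pad a root-to-node path so that every row has the same number of columns."""
    if len(path) > num_levels:
        raise ValueError(f"path has {len(path)} levels, expected at most {num_levels}")
    # Real node names come first; the remaining columns get the placeholder.
    return list(path) + [pad] * (num_levels - len(path))


# Three columns are used here only to match the small table above.
print(to_feature_row(["A", "AA1"], num_levels=3))
```

With the default of 15 columns, the same call returns the two node names followed by 13 placeholders.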
So my questions are:
- Is this the right approach?
- If yes, should I remove the duplicate rows, and should I replace the missing values with `*`?
Thanks in advance.