Skip to content

ChunkerTrain fails with auto selection of training rounds using a separate development set #685

Closed
@qiangning

Description

@qiangning

When discussing pull req 682 with Christos, I tested ChunkerTrain for something else, but I found that it's now failing.
Here's the result I got standing at the commit: 99971bd This is a commit before any of my recent changes to chunker.

Iteration number : 1. F1 score on devset: 0.9995071926407434
Iteration number : 2. F1 score on devset: 0.9995071926407434
Iteration number : 3. F1 score on devset: 0.9995071926407434
Iteration number : 4. F1 score on devset: 0.9995071926407434
Iteration number : 5. F1 score on devset: 0.9995071926407434

(See the F1 on dev is not changing at all, which is abnormal.)

The benchmark test is still good:

With Gold POS
 Label   Precision Recall   F1   LCount PCount
----------------------------------------------
ADJP        76.633 69.635 72.967    438    398
ADVP        81.862 79.215 80.516    866    838
CONJP       45.455 55.556 50.000      9     11
INTJ        50.000 50.000 50.000      2      2
LST          0.000  0.000  0.000      5      1
NP          94.106 93.962 94.034  12422  12403
PP          96.770 97.776 97.270   4811   4861
PRT         72.072 75.472 73.733    106    111
SBAR        88.280 87.290 87.782    535    529
UCP          0.000  0.000  0.000      0      5
VP          93.416 93.517 93.466   4658   4663
----------------------------------------------
O            0.000  0.000  0.000   1244   1274
----------------------------------------------
Overall     93.510 93.393 93.451  23852  23822
Accuracy    88.763   -      -      -     25096

With NO POS
 Label   Precision Recall   F1   LCount PCount
----------------------------------------------
ADJP        78.608 69.635 73.850    438    388
ADVP        80.427 78.291 79.345    866    843
CONJP       45.455 55.556 50.000      9     11
INTJ       100.000 50.000 66.667      2      1
LST          0.000  0.000  0.000      5      0
NP          94.193 94.019 94.106  12422  12399
PP          96.656 97.942 97.295   4811   4875
PRT         60.417 82.075 69.600    106    144
SBAR        86.813 88.598 87.697    535    546
UCP          0.000  0.000  0.000      0      4
VP          94.105 94.246 94.176   4658   4665
----------------------------------------------
O            0.000  0.000  0.000   1231   1207
----------------------------------------------
Overall     93.529 93.623 93.576  23852  23876
Accuracy    89.028   -      -      -     25083

(The numbers are exactly the same as chunker/doc/performance.txt)

I guess maybe the changes in other parts of the repository failed ChunkerTrain since ChunkerTrain itself hasn't been updated at all since 11/18/2016 (see this log).

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions