Store metadata of project training #329
These are all good ideas. Also, it could be useful to have a "failed" status for projects/backends where the initialization fails for some reason - but that means making the initialization more careful. I think it makes sense to start small and implement status features one (or two) at a time in separate PRs. For example, trained vs. not-trained would be a good start. Marking the issue as Long term because there are so many ideas here, but that doesn't mean we couldn't implement some of them very soon.
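A minimal sketch of how such statuses could be represented (the enum and its values are hypothetical, not Annif's actual code):

```python
from enum import Enum

class ProjectStatus(Enum):
    """Hypothetical status values for a project/backend."""
    NOT_TRAINED = "not trained"  # no model has been trained yet
    TRAINED = "trained"          # training completed successfully
    FAILED = "failed"            # initialization or training failed
```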
Tagging with 0.48 to indicate that some work in this direction (but not everything) should be done in that release.
Does this element "number of examples in the training data" mean the number of records with which the model is trained? It would be very helpful if it were available in the show-project output. Another thought: the 'eval' command could store major metrics like F1@5, NDCG etc. in the annif_metadata file every time it runs; otherwise (if eval has not been run yet), F1@5 or NDCG could display 'NA'.
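A rough sketch of how the eval step could merge its metrics into such a metadata file (the file location, field names, and helper are assumptions for illustration):

```python
import json

def store_eval_metrics(metadata_path, metrics):
    """Hypothetical: merge eval metrics into the project's metadata file."""
    try:
        with open(metadata_path) as f:
            metadata = json.load(f)
    except FileNotFoundError:
        metadata = {}
    # Default to 'NA' so show-project has something to display
    # even before eval has ever been run.
    metadata.setdefault("metrics", {"F1@5": "NA", "NDCG": "NA"})
    metadata["metrics"].update(metrics)
    with open(metadata_path, "w") as f:
        json.dump(metadata, f, indent=2)

store_eval_metrics("data/projects/the_project/annif_metadata",
                   {"F1@5": 0.42, "NDCG": 0.55})
```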
Yes, I actually just modified that item in the list to "number of documents used to train the project" (for clarity) and added "name(s) of the file(s) via which the documents were given when training the project" (I have needed that information often; however, the full path to the file(s) should not be stored for security reasons), and reordered the list slightly, too.
It could be valuable to store some info about the training of a project, which could also be shown by the CLI commands and by the REST API `/projects/{project_id}` method. Currently `show-project` outputs the following:
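A sketch of that output (the exact fields shown here are an assumption, for illustration only):

```
$ annif show-project tfidf-en
Project ID:    tfidf-en
Project Name:  TF-IDF English
Language:      en
```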
Maybe some of the following data could be added:

- [ ] number of documents used to train the project
- [ ] name(s) of the file(s) via which the documents were given when training the project
- [ ] size of the model on disk

Also some more details on the training data (what?) and something that now goes to the debug log (what?).
These data could be stored e.g. in a metadata file `data/projects/the_project/annif_metadata` along with the model file(s).

Edit: converted bullet list to checkbox list.
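A minimal sketch of how such a file could be written at the end of training (the helper, field names, and file layout are assumptions, not a settled format):

```python
import json
import os
from datetime import datetime, timezone

def write_training_metadata(project_dir, n_docs, source_files):
    """Hypothetical: write an annif_metadata file next to the model file(s)."""
    model_files = [f for f in os.listdir(project_dir) if f != "annif_metadata"]
    metadata = {
        "trained": True,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "documents_used": n_docs,
        # store only file names, not full paths, for the security
        # reasons mentioned in the comments above
        "training_files": [os.path.basename(p) for p in source_files],
        "model_size_bytes": sum(
            os.path.getsize(os.path.join(project_dir, f)) for f in model_files),
    }
    with open(os.path.join(project_dir, "annif_metadata"), "w") as f:
        json.dump(metadata, f, indent=2)
```

`show-project` and the REST `/projects/{project_id}` method could then read this file instead of inspecting the model files directly.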