
Store metadata of project training #329

Open
3 of 11 tasks
juhoinkinen opened this issue Sep 18, 2019 · 4 comments
juhoinkinen commented Sep 18, 2019

It could be valuable to store some information about the training of a project, which could also be shown by the CLI commands and the REST API /projects/{project_id} method.

Currently show-project outputs the following:

Project ID:        tfidf-fi
Project Name:      TF-IDF Finnish
Language:          fi
Vocabulary:        yso
Vocab language:    fi
Access:            public
Trained:           True
Modification time: 2023-04-21 10:33:16
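
For comparison, roughly the same information could be exposed through the REST API. A minimal sketch of querying it, assuming a local Annif instance at http://localhost:5000, the /v1/projects/{project_id} path and illustrative field names (not a guaranteed response schema):

import requests

# Hypothetical sketch: fetch project info over the REST API.
# Assumes a local Annif instance at http://localhost:5000 and the
# /v1/projects/{project_id} path; the returned fields may vary by version.
resp = requests.get("http://localhost:5000/v1/projects/tfidf-fi")
resp.raise_for_status()
project = resp.json()
print(project.get("project_id"), project.get("is_trained", "unknown"))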

Maybe some of the following data could be added:

  • training state: trained or not-trained
  • vocabulary state: loaded or not-loaded
  • timestamp of the training run
  • version of Annif that has been used in training/learning the project
  • number of documents used to train the project
  • name(s) of the file(s) via which the documents was/were given when training the project
  • used backend parameters
  • backend name
  • duration of the training run
  • size of the model on disk
  • timestamp(s) of learning run(s)

Also, some more details on the training data (what?) and something that now goes to the debug log (what?) could be stored.

These data could be stored e.g. in a metadata file data/projects/the_project/annif_metadata along with the model file(s).
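
A minimal sketch of how such a file could be written at the end of a training run; the helper name, the JSON format and the field names below are illustrative assumptions, not existing Annif code:

import json
import time
from importlib.metadata import version
from pathlib import Path

def write_training_metadata(datadir, n_docs, input_files, backend_params, duration):
    # Hypothetical helper: store training metadata next to the model file(s).
    metadata = {
        "annif_version": version("annif"),
        "training_timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "training_duration_seconds": round(duration, 1),
        "document_count": n_docs,
        # store only file names, not full paths, for security reasons
        "input_files": [Path(f).name for f in input_files],
        "backend_params": backend_params,
    }
    (Path(datadir) / "annif_metadata").write_text(json.dumps(metadata, indent=2))

Keeping the format as plain JSON would make it easy for show-project and the REST API to read the same file.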

Edit: converted bullet list to checkbox list.

@osma osma added this to the Long term milestone Sep 30, 2019
osma commented Sep 30, 2019

These are all good ideas. It could also be useful to have a "failed" status for projects/backends whose initialization fails for some reason - but that would require making the initialization more careful.

I think it makes sense to start small and implement status features one (or two) at a time in separate PRs. For example, trained vs. not-trained would be a good start.

Marking the issue as Long term because there are so many ideas here, but that doesn't mean we couldn't implement some of them very soon.

@osma osma modified the milestones: Long term, 0.48 May 12, 2020
osma commented May 12, 2020

Tagging with 0.48 to indicate that some work in this direction (but not everything) should be done in that release.

@osma osma modified the milestones: 0.48, Long term Jan 8, 2021
@juhoinkinen juhoinkinen changed the title Store data about training a project to show by list-projects/show-project command Store metadata of project training Jun 18, 2024
@psmukhopadhyay

Does this element "number of examples in the training data" mean the number of records with which the model is trained? It will be very helpful if available in the show-project output. Another thought: the 'eval' command may store major metrics like F1@5, NDCG etc. in the annif_metadata file every time it runs; if it has not been run even once, F1@5 or NDCG may display 'NA'.
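
A rough sketch of that idea, assuming the metadata file is JSON; the function and field names are placeholders, not anything Annif currently implements:

import json
from pathlib import Path

def record_eval_metrics(metadata_path, metrics):
    # Hypothetical: merge the latest eval metrics, e.g. {"F1@5": 0.42, "NDCG": 0.55},
    # into the project metadata file.
    path = Path(metadata_path)
    metadata = json.loads(path.read_text()) if path.exists() else {}
    metadata["eval_metrics"] = metrics
    path.write_text(json.dumps(metadata, indent=2))

def metric_for_display(metadata, name):
    # Show 'NA' when eval has not been run yet.
    return metadata.get("eval_metrics", {}).get(name, "NA")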

@juhoinkinen
Member Author

Does this element "number of examples in the training data" mean the number of records with which the model is trained? It will be very helpful if available in the show-project output.

Yes, I actually just modified that item in the list to "number of documents used to train the project" (for clarity) and added "name(s) of the file(s) via which the documents was/were given when training the project" (I have needed that information often; however, the full path to the file(s) should not be stored for security reasons), and reordered the list slightly, too.
