Automatically add metadata to Hugging Face Hub repos when uploading projects #793

@juhoinkinen juhoinkinen commented Jun 17, 2024

With this PR, when running annif upload:

  • if README.md (the model card) does not exist in the destination repository, it is created with default contents, a project list (covering the projects that have configuration (*.cfg) files in the repo) and some metadata about the uploaded projects,
  • if README.md already exists, its project list and metadata are updated as necessary.
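
The create-or-update behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `upsert_readme` and the two HTML-comment markers are hypothetical, and the sketch works on plain strings instead of talking to the Hub.

```python
from typing import Optional

# Hypothetical markers delimiting the automatically maintained part of README.md.
START = "<!-- projects:start -->"
END = "<!-- projects:end -->"


def upsert_readme(existing: Optional[str], repo_name: str, projects_block: str) -> str:
    """Return new README text: create it if missing, else refresh the project list."""
    section = f"{START}\n{projects_block}\n{END}"
    if existing is None:
        # No model card yet: start from default contents.
        return f"# {repo_name}\n\n## Projects\n\n{section}\n"
    start = existing.find(START)
    end = existing.find(END)
    if start == -1 or end == -1 or end < start:
        # Card exists but has no managed section: append one, keep the rest.
        return existing.rstrip("\n") + "\n\n" + section + "\n"
    # Replace only the managed section; hand-written text around it is untouched.
    return existing[:start] + section + existing[end + len(END):]
```

Restricting the rewrite to a delimited section is what lets manually written text in the card survive later uploads.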

Closes #790.

The metadata includes the following:

language:
- <language-code tags automatically obtained from the uploaded projects>
tags:
- annif   # custom tag
pipeline_tag: text-classification  # HFH tag
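
As a rough illustration of how such metadata might be assembled from the uploaded projects, the sketch below de-duplicates the language codes and renders the result as a YAML front-matter block. The helper names `build_metadata` and `to_front_matter` are hypothetical, and the renderer is deliberately minimal; it is not the implementation in this PR.

```python
def build_metadata(languages):
    """Collect card metadata; language codes are de-duplicated and sorted."""
    return {
        "language": sorted(set(languages)),
        "tags": ["annif"],  # custom tag
        "pipeline_tag": "text-classification",  # HFH tag
    }


def to_front_matter(metadata):
    """Render the small metadata dict as a YAML front-matter block."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


print(to_front_matter(build_metadata(["fi", "en", "fi", "sv"])))
```

A real implementation would more likely leave serialization to a YAML library, since unquoted values such as the language code `no` are read as booleans by YAML 1.1 parsers.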

The text content looks like this (taken from https://huggingface.co/juhoinkinen/Annif-models-upload-testing):

Annif-models-upload-testing

Usage

Use the annif download command to download selected projects with Annif; for example, to download all projects in this repository run

annif download "*" juhoinkinen/Annif-models-upload-testing

Projects

Project ID          Project Name            Vocabulary ID   Language   
--------------------------------------------------------------------
dummy-en            Dummy English           dummy           en         
dummy-fi            Dummy Finnish           dummy           fi         
dummy-sv            Dummy Swedish           dummy           sv

This text should be retained when the project list is updated.
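
A fixed-width project table like the one in the sample above could be produced along these lines. The function name `format_projects_table` and the column widths are assumptions chosen to resemble the sample; the actual formatting code in this PR may differ.

```python
def format_projects_table(projects):
    """Render (project_id, name, vocab_id, language) tuples as a text table."""
    headers = ("Project ID", "Project Name", "Vocabulary ID", "Language")
    widths = (20, 24, 16, 11)  # assumed column widths

    def row(cells):
        # Left-align each cell and pad it to its column width.
        return "".join(f"{cell:<{width}}" for cell, width in zip(cells, widths))

    lines = [row(headers), "-" * sum(widths)]
    lines.extend(row(project) for project in projects)
    return "\n".join(lines)


print(format_projects_table([
    ("dummy-en", "Dummy English", "dummy", "en"),
    ("dummy-fi", "Dummy Finnish", "dummy", "fi"),
]))
```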

@juhoinkinen juhoinkinen added this to the 1.2 milestone Jun 17, 2024
@juhoinkinen

About @osma's suggestions in #790 (comment):

For example it could include the Annif version used for training, the backend, vocabulary name and size, possibly some of the hyperparameters / configuration settings as well.

  • Annif version:
    • The Annif version used for training is not currently stored anywhere, and the version performing the upload is not necessarily the same one. This kind of metadata would first have to be stored somewhere, which is tracked in issue #329 (Store metadata of project training).
  • Backend, vocabulary name and other project configuration:


sonarcloud bot commented Jun 18, 2024

Quality Gate passed

Issues
6 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud


codecov bot commented Jun 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.65%. Comparing base (3b5f7a1) to head (e4febab).
Report is 51 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
+ Coverage   99.64%   99.65%   +0.01%     
==========================================
  Files          91       93       +2     
  Lines        6817     7058     +241     
==========================================
+ Hits         6793     7034     +241     
  Misses         24       24              


@juhoinkinen

@CodiumAI-Agent /review


CodiumAI-Agent commented Jun 18, 2024

PR Reviewer Guide 🔍

(Review updated until commit 845f53d)

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Key issues to review

Error Handling
The function upsert_modelcard lacks error handling for potential failures during the push_to_hub operation. Consider adding try-except blocks to handle exceptions that might arise during the push operation, ensuring that the function can gracefully handle errors and provide meaningful feedback to the user.
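
One way to act on this suggestion is sketched below, under stated assumptions: `UploadError` and `push_card` are hypothetical names, and `card` is any object with a `push_to_hub(repo_id, token=...)` method, as `huggingface_hub` model cards have. The sketch catches `Exception` for brevity; a real version would likely catch the narrower Hub error types.

```python
class UploadError(Exception):
    """Hypothetical user-facing error for a failed model card push."""


def push_card(card, repo_id, token=None):
    """Push the card and turn low-level failures into a readable error."""
    try:
        card.push_to_hub(repo_id, token=token)
    except Exception as err:  # assumption: narrow this to Hub errors in practice
        raise UploadError(
            f"Could not upload model card to {repo_id}: {err}"
        ) from err
```

Chaining with `from err` keeps the original traceback available for debugging while the user sees a short message naming the repository.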

Configuration Error Handling
The error handling in _read_config might not provide clear feedback to the user since it directly raises ConfigurationException with err.message, which might not be defined. It's recommended to ensure that the exception message is informative and user-friendly.
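
A defensive variant could look like the sketch below. It assumes the config is parsed with the standard library's `configparser`, whose exceptions do carry a `message` attribute, and falls back to `str(err)` otherwise. The `ConfigurationException` class here is a stand-in and `read_config_text` is a hypothetical helper, not Annif's `_read_config`.

```python
import configparser


class ConfigurationException(Exception):
    """Stand-in for Annif's exception of the same name (assumed shape)."""


def read_config_text(text):
    """Parse INI-style text, raising a readable error on malformed input."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error as err:
        # Use .message when present, otherwise fall back to str(err).
        message = getattr(err, "message", None) or str(err)
        raise ConfigurationException(f"Invalid configuration: {message}") from err
    return parser
```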

tests/test_hfh_util.py (review thread outdated, resolved)
@juhoinkinen juhoinkinen marked this pull request as ready for review June 18, 2024 10:31
@juhoinkinen

Possible Bug:
Ensure that the upsert_modelcard function handles cases where project language data might be missing or malformed. The current implementation assumes that proj.vocab_lang is always available and valid.

Good point by the AI, but I think the project language is always set if this point is reached...?
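
If an explicit guard were wanted anyway, it could be as small as the sketch below. `collect_languages` is a hypothetical helper; the only thing taken from the discussion is the `proj.vocab_lang` attribute, and the `project_id` attribute and `SimpleNamespace` stand-ins are assumptions for illustration.

```python
from types import SimpleNamespace


def collect_languages(projects):
    """Return sorted, de-duplicated language codes; fail loudly on bad input."""
    languages = set()
    for proj in projects:
        lang = getattr(proj, "vocab_lang", None)
        if not isinstance(lang, str) or not lang.strip():
            project_id = getattr(proj, "project_id", "<unknown>")
            raise ValueError(
                f"project {project_id!r} has no usable vocabulary language"
            )
        languages.add(lang.strip())
    return sorted(languages)


# Stand-in objects for illustration; real Annif projects are richer.
demo = [
    SimpleNamespace(project_id="dummy-fi", vocab_lang="fi"),
    SimpleNamespace(project_id="dummy-en", vocab_lang="en"),
]
print(collect_languages(demo))
```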

@juhoinkinen juhoinkinen requested a review from osma June 18, 2024 10:38
@CodiumAI-Agent

Persistent review updated to latest commit 845f53d

@juhoinkinen

I added an automatically updating Projects section to the modelcard, like this: https://huggingface.co/juhoinkinen/Annif-models-upload-testing#projects

annif/config.py (four code scanning alerts, all fixed)

@osma osma left a comment

I can't think of anything to improve; I think we need to find out through actual use whether there is something that could be fixed or made better.

I did find two typos though :)

annif/cli.py (two review threads, outdated and resolved)

sonarcloud bot commented Sep 27, 2024

@juhoinkinen juhoinkinen merged commit 24485af into main Sep 27, 2024
16 of 17 checks passed
@juhoinkinen juhoinkinen deleted the issue790-automatically-add-metadata-to-hugging-face-hub-repos-when-uploading-projects branch September 27, 2024 06:57
juhoinkinen added a commit that referenced this pull request Sep 30, 2024
PR #793 updated the docstring of the download command, when it was the docstring of the upload command that should have been updated.