The 48 verified model cards are available inside the `verified_model_cards` directory. The directory contains 48 subdirectories, each named after a model card (with the '' in the model ID replaced by '@'). Each subdirectory contains 11 files:

- `1_raw.md`: The raw readme file as downloaded.
- `2_original_model_card.md`: The readme file after processing (removing the YAML metadata at the beginning, which is not visible in the UI).
- `3_model_generated_info_list.md`: The checklist of information generated by GPT 4o mini from the original model card.
- `4_corrected_info_list.md`: The checklist of information after manual correction.
- `5_reorganized_model_card.md`: The model card as reorganized by Gemini 2 Flash Thinking.
- `6_manually_removed_extra_information.md`: The reorganized model card after manually removing extra information from it.
- `7_manually_removed_misinterpretation.md`: The reorganized model card after manually removing misinterpretations from it.
- `8_manually_added_missing_information.md`: The reorganized model card after manually adding missing information to it.
- `9_gemini_jury_result.json`: The content misplacement verification results for the reorganized model card from Gemini 2.5 Pro.
- `10_o4_mini_jury_result.json`: The content misplacement verification results for the reorganized model card from O4-mini.
- `11_deepseek_r1_jury_result.json`: The content misplacement verification results for the reorganized model card from DeepSeek R1.
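The directory layout described above can be checked with a short script. The following is a minimal sketch, not part of the repository: the helper names `card_dir_name` and `missing_files` are hypothetical, and it assumes that the character replaced by '@' is the '/' separating organization and model name in a Hugging Face model ID.

```python
from pathlib import Path

# File names as listed above; each card directory should contain all 11.
EXPECTED_FILES = [
    "1_raw.md",
    "2_original_model_card.md",
    "3_model_generated_info_list.md",
    "4_corrected_info_list.md",
    "5_reorganized_model_card.md",
    "6_manually_removed_extra_information.md",
    "7_manually_removed_misinterpretation.md",
    "8_manually_added_missing_information.md",
    "9_gemini_jury_result.json",
    "10_o4_mini_jury_result.json",
    "11_deepseek_r1_jury_result.json",
]


def card_dir_name(model_id: str, replaced: str = "/") -> str:
    """Map a model ID to a directory name.

    Assumption: the replaced character is the '/' in 'org/model'.
    """
    return model_id.replace(replaced, "@")


def missing_files(card_dir: Path) -> list[str]:
    """Return the expected file names that are absent from one card directory."""
    return [name for name in EXPECTED_FILES if not (card_dir / name).is_file()]


if __name__ == "__main__":
    root = Path("verified_model_cards")
    if root.is_dir():
        # Report only the directories that are incomplete.
        for card_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            gaps = missing_files(card_dir)
            if gaps:
                print(f"{card_dir.name}: missing {gaps}")
```

Run from the repository root, the script prints nothing when every card directory is complete.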
Install the required packages:

    pip install -r requirements.txt

Works with Python 3.11; backward compatibility has not been tested.
- Run `data_collector/repo_lister.py` to list all the models available on Hugging Face. A file named `all_models.csv` containing the model list will be created inside the `data` directory.

      python data_collector/repo_lister.py

- Run `data_collector/repo_selector.py` to order the models and select the top 1000. The list of top models will be saved in `data/top_1000_models.csv`.

      python data_collector/repo_selector.py

- Run `data_collector/repo_readme_collector.py` to download the readme files of the selected top 1000 models. The readme files will be saved inside the `data/readmes` directory: the raw readme files in `data/readmes/raw` and the further processed readme files in `data/readmes/processed`.

      python data_collector/repo_readme_collector.py

- Run `data_collector/readme_selector.py` to process the readme files and automatically select quality model cards. The list will be saved in `data/top_one_model_per_organization.csv`.

      python data_collector/readme_selector.py

- Manually verify the models listed in `data/top_one_model_per_organization.csv` and list the unwanted models in `data/excluding_repos.csv`. If there are no unwanted models, leave the file empty apart from a `model_id` header. Then run `data_collector/exclude_unwanted_repos.py` to get the final list of selected quality model cards, saved in `data/selected_repos.csv`.

      python data_collector/exclude_unwanted_repos.py

Run `model_card_reorganizer/gemini_reorganizer.py` to reorganize the selected model cards. The reorganized model cards will be saved inside the `data/readmes/reorganized` directory. Insert your API key into the `GEMINI_API_KEY` placeholder in the `util/constants.py` file to enable Gemini model access.

    python model_card_reorganizer/gemini_reorganizer.py

The reorganization instructions and the template structure with section descriptions are available in `model_card_reorganizer/gemini_prompt_template.md` and `model_card_reorganizer/model_card_template_with_description.md`, respectively.
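The reorganizer works on the selected model cards, so it may help to picture how the exclusion step produces that selection. The sketch below is an illustration only and not the code in `exclude_unwanted_repos.py`: the function names are hypothetical, and it assumes that `data/top_one_model_per_organization.csv` has a `model_id` column like the exclusion file does.

```python
import csv
from pathlib import Path


def read_model_ids(path: Path) -> list[str]:
    """Read the `model_id` column of a CSV file, skipping blank entries."""
    with path.open(newline="", encoding="utf-8") as handle:
        return [
            row["model_id"]
            for row in csv.DictReader(handle)
            if row.get("model_id")
        ]


def exclude_repos(candidates: list[str], unwanted: list[str]) -> list[str]:
    """Keep the candidates, in their original order, that are not unwanted."""
    blocked = set(unwanted)
    return [model_id for model_id in candidates if model_id not in blocked]


if __name__ == "__main__":
    # Assumed column name for the candidate file; only the exclusion file's
    # `model_id` header is stated in this README.
    candidates_path = Path("data/top_one_model_per_organization.csv")
    unwanted_path = Path("data/excluding_repos.csv")
    if candidates_path.is_file() and unwanted_path.is_file():
        kept = exclude_repos(
            read_model_ids(candidates_path), read_model_ids(unwanted_path)
        )
        print(f"{len(kept)} model cards remain after exclusion")
```

An exclusion file that contains only the `model_id` header yields an empty list, so every candidate is kept. That is why the header-only file is a valid "nothing to exclude" input.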
Run `model_card_info_lister/gpt_4o_mini_lister.py` to create checklists of information from the original model cards. The checklists will be saved in the `data/readmes/info_list` directory.

    python model_card_info_lister/gpt_4o_mini_lister.py

The instructions for checklist creation are available in `model_card_info_lister/system_instruction.md`.
Run `relevance_verifier/gemini_relevance_verifier.py` to verify the sections of all the reorganized model cards. The verification results will be saved in ``.

    python relevance_verifier/gemini_relevance_verifier.py

Run `relevance_verifier/o_mini_relevance_verifier.py` to verify the sections of all the reorganized model cards. The verification results will be saved in ``.

    python relevance_verifier/o_mini_relevance_verifier.py

Run `relevance_verifier/deepseek_relevance_verifier.py` to verify the sections of all the reorganized model cards. The verification results will be saved in ``.

    python relevance_verifier/deepseek_relevance_verifier.py
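The scripts above are run one at a time, with a manual review in between. As a convenience, they could be chained with a small driver such as the sketch below. This driver is hypothetical and not part of the repository: the stage names `collect` and `process` are made up for illustration, and the order simply follows the order in which this README describes the steps.

```python
import subprocess
import sys

# Script paths as named in this README. The split into two stages reflects the
# manual review of data/top_one_model_per_organization.csv, which has to happen
# between them and is not automated here.
STAGES = {
    "collect": [
        "data_collector/repo_lister.py",
        "data_collector/repo_selector.py",
        "data_collector/repo_readme_collector.py",
        "data_collector/readme_selector.py",
    ],
    "process": [
        "data_collector/exclude_unwanted_repos.py",
        "model_card_reorganizer/gemini_reorganizer.py",
        "model_card_info_lister/gpt_4o_mini_lister.py",
        "relevance_verifier/gemini_relevance_verifier.py",
        "relevance_verifier/o_mini_relevance_verifier.py",
        "relevance_verifier/deepseek_relevance_verifier.py",
    ],
}


def build_command(script: str) -> list[str]:
    """Build the command that runs one script with the current interpreter."""
    return [sys.executable, script]


def run_stage(stage: str) -> None:
    """Run every script of a stage in order, stopping at the first failure."""
    for script in STAGES[stage]:
        print(f"Running {script}")
        subprocess.run(build_command(script), check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] in STAGES:
        run_stage(sys.argv[1])
    else:
        print(f"Unknown stage; choose one of {sorted(STAGES)}")
```

Saved under a name of your choice at the repository root, it would be called once with `collect`, then again with `process` after `data/excluding_repos.csv` has been filled in. Because `check=True` raises on a non-zero exit code, a failing script stops the stage instead of letting later steps run on incomplete data.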