fix: metadata dataset degradation and make it work #2186
base: sd3
… with batch support and output options
Pull Request Overview
This PR fixes metadata dataset degradation issues and enhances the WD14 tagger functionality to support both JSON and JSONL metadata formats. The changes improve the handling of image metadata, add support for new tag categories, and provide better image preprocessing.
- Fix null comparison issues in metadata handling to prevent dataset degradation
- Add comprehensive JSONL metadata format support alongside existing JSON format
- Extend WD14 tagger with new tag categories (copyright, meta, model, quality) and improved model support
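
As a rough illustration of what supporting both formats can look like, the sketch below loads either file type into one dictionary keyed by image path. The function name `load_metadata`, the `image_path` field, and both file layouts are assumptions made for this example; they are not taken from the PR's code.

```python
import json
from pathlib import Path


def load_metadata(path):
    """Load image metadata from a .json or .jsonl file into a dict keyed by image path.

    Assumed layouts (hypothetical, not taken from the PR):
      - JSON:  {"<image_key>": {"tags": "...", ...}, ...}
      - JSONL: one object per line, each carrying an "image_path" field.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")

    if path.suffix.lower() == ".jsonl":
        metadata = {}
        for line_no, line in enumerate(text.splitlines(), start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            key = record.pop("image_path", None)
            if key is None:
                raise ValueError(f"{path}:{line_no}: missing 'image_path'")
            metadata[key] = record
        return metadata

    # Plain JSON: the whole file is a single object keyed by image path.
    return json.loads(text)
```

Normalizing both formats to the same in-memory shape means downstream code does not need to know which file type was supplied.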
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `library/train_util.py` | Fixes null checks in metadata comparison and refactors dataset initialization to better handle image paths and cache files |
| `finetune/tag_images_by_wd14_tagger.py` | Major enhancement adding JSONL support, new tag categories, improved image preprocessing, and better model handling |
| `docs/wd14_tagger_README-ja.md` | Updates Japanese documentation with current dependency versions and help command reference |
| `docs/wd14_tagger_README-en.md` | Updates English documentation with current dependency versions and help command reference |
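
The summary does not show the actual null-check fix in `library/train_util.py`. As a generic illustration of this class of bug, the sketch below contrasts an explicit `is not None` test with a truthiness test. The function `merge_metadata` and its field names are hypothetical and exist only for this example.

```python
def merge_metadata(existing, new_tags=None, new_caption=None):
    """Return a copy of `existing` updated only with the fields that were provided.

    Hypothetical helper: it is not part of the PR or of the library.
    """
    merged = dict(existing)
    # `is not None` keeps legitimate empty values such as "".
    # A bare `if new_tags:` would treat "" like a missing value and skip the update.
    if new_tags is not None:
        merged["tags"] = new_tags
    if new_caption is not None:
        merged["caption"] = new_caption
    return merged
```

With this shape, passing `new_tags=""` clears the stored tags, while omitting the argument leaves them unchanged.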
Comments suppressed due to low confidence (1)
finetune/tag_images_by_wd14_tagger.py:1
- JSOL should be JSONL (JSON Lines format).
import argparse
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…and ensure tags are returned
The documentation also needs to be updated.