[Script] Images folder convert script to data_info.json #57
Pretty good and useful scripts. Thanks a lot. Shall we add a how-to-use section to the README? @frutiemax92
The people most likely to train PixArt-Sigma tend to have SDXL-structured training data (an image plus a .txt caption). A script like this should be officially included and documented. Otherwise, perhaps train.py itself could support SDXL-structured training data? However, I think the empty sharegpt4v values this script generates currently trigger an assertion error.
Really nice work. Thank you so much for your PR. 🥰 @frutiemax92
2. Update README.md
This script converts a folder of images and captions into the expected folder structure, including the data_info.json file. It copies each image into the InternImgs folder under an indexed file name, keeping the original extension. It also works recursively, i.e. you can put multiple dataset folders inside the root folder.
There is also an optional --caption_extension argument, which defaults to .txt and can be changed if your captions use a different extension.
I thought this would be a useful script, as I am more used to the other folder structure.
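A minimal sketch of the conversion described above, in Python. This is not the PR's actual script. The output layout (`InternImgs/` plus `InternData/data_info.json`), the entry keys, `ratio` as height/width, `path` being relative to `InternImgs/`, and filling `sharegpt4v` with the caption instead of leaving it empty are all assumptions. `convert`, `pil_size`, and `get_size` are hypothetical names. Image sizes are read with Pillow by default.

```python
import argparse
import json
import shutil
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def pil_size(path):
    """Return (width, height) of an image; Pillow is imported lazily."""
    from PIL import Image  # requires Pillow to be installed

    with Image.open(path) as img:
        return img.size


def convert(root, out_dir, caption_extension=".txt", get_size=pil_size):
    """Hypothetical converter: image + caption folders -> InternImgs + data_info.json."""
    root, out_dir = Path(root), Path(out_dir)
    if not caption_extension.startswith("."):
        caption_extension = "." + caption_extension  # accept "txt" as well as ".txt"
    img_dir = out_dir / "InternImgs"
    info_dir = out_dir / "InternData"  # assumed location of data_info.json
    img_dir.mkdir(parents=True, exist_ok=True)
    info_dir.mkdir(parents=True, exist_ok=True)

    entries = []
    # rglob also walks sub-folders, so several dataset folders can sit under root;
    # sorted() collects the full listing first and gives a deterministic index order.
    for image in sorted(root.rglob("*")):
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTS:
            continue
        caption_file = image.with_suffix(caption_extension)
        if not caption_file.is_file():
            continue  # skip images that have no caption
        prompt = caption_file.read_text(encoding="utf-8").strip()

        # Copy to an indexed file name, keeping the original extension.
        new_name = f"{len(entries)}{image.suffix}"
        shutil.copy2(image, img_dir / new_name)

        width, height = get_size(image)
        entries.append({
            "path": new_name,  # assumption: relative to InternImgs/
            "prompt": prompt,
            "sharegpt4v": prompt,  # assumption: reuse the caption rather than leave it empty
            "width": width,
            "height": height,
            "ratio": height / width,  # assumption: aspect ratio stored as height / width
        })

    with open(info_dir / "data_info.json", "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    return entries


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Convert an image + caption folder to data_info.json")
    parser.add_argument("root", help="folder containing one or more dataset folders")
    parser.add_argument("out_dir", help="output folder")
    parser.add_argument("--caption_extension", default=".txt")
    args = parser.parse_args()
    convert(args.root, args.out_dir, args.caption_extension)
```

For example, `python convert_folder.py ./my_dataset ./pixart_data --caption_extension .caption` (the script file name is hypothetical). Passing `get_size` as a parameter keeps the Pillow dependency out of the core logic, so the conversion can be tested without real images.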