Update Danbooru2017 for pre-trained models.

Former-commit-id: 94b8d1e
Former-commit-id: ce546e2fb7a762c38d724f37b2b315fe25bbb3ca
yu45020 · Aug 19, 2018 · dd0bcef
1 parent 89ce2d8

Showing 4 changed files with 5 additions and 1 deletion.
diff --git a/Danbooru2017/113k_imgs_512tags_encoded.7z b/Danbooru2017/113k_imgs_512tags_encoded.7z
diff --git a/Danbooru2017/ReadMe.md b/Danbooru2017/ReadMe.md
@@ -1,3 +1,7 @@
 [Danbooru 2017 database](https://www.gwern.net/Danbooru2017)
 
-The txt file contains  113k 512x512 image file names for training a CNN-LSTM classifier. You may download them by `rsync` with `-- files-from`
+The ```[Danbooru2017] training image list_``` file lists the file names of 113k 512x512 images used to train a CNN-LSTM classifier. You can download those images with `rsync` and its `--files-from` option.
+
+The ```113k_imgs_512tags_encoded.7z``` archive contains a JSON file with the tags for those 113k images. The ```sk-LabelEncoder_512tags.pk``` file is a scikit-learn label encoder used to one-hot encode the 512 tags.
+
+If you need the complete list of tags, please open an issue; I may be able to provide it. For reference, processing the large metadata file from Danbooru2017 needs about 8 GB to unzip and about 40 minutes on 8 cores to go through all of the JSON files. Good luck.
diff --git a/Danbooru2017/[Danbooru2017] training image list_.7z b/Danbooru2017/[Danbooru2017] training image list_.7z
diff --git a/Danbooru2017/sk-LabelEncoder_512tags.pk b/Danbooru2017/sk-LabelEncoder_512tags.pk
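The ReadMe suggests downloading the 113k training images with `rsync --files-from`. What follows is a minimal sketch of how that command could be assembled and run from Python. The rsync source URL, the extracted list file name (`training_image_list.txt`), and the destination directory are hypothetical placeholders. Substitute the real rsync source listed on the Danbooru2017 page and whatever `.txt` file comes out of the `.7z` list archive. Each line of the list is assumed to be a path relative to the source.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

# Hypothetical placeholders: replace with the real rsync source from the
# Danbooru2017 page and the .txt file extracted from the .7z list archive.
SOURCE = "rsync://example.org/danbooru2017/"
FILE_LIST = Path("training_image_list.txt")
DEST = Path("danbooru2017_512px")


def build_rsync_command(file_list: Path, source: str, dest: Path) -> list[str]:
    """Build an rsync argv that copies only the paths named in file_list.

    Each line of file_list must be a path relative to `source`; the
    --files-from option restricts the transfer to exactly those paths.
    """
    return [
        "rsync",
        "--verbose",
        f"--files-from={file_list}",
        source,
        str(dest),
    ]


def download(file_list: Path = FILE_LIST, source: str = SOURCE, dest: Path = DEST) -> None:
    """Print the rsync command, create the destination, then run it."""
    cmd = build_rsync_command(file_list, source, dest)
    print(shlex.join(cmd))  # show the exact command before running it
    dest.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if rsync fails
```

Calling `download()` starts the transfer. Building the argv as a list, rather than a shell string, avoids quoting problems with paths that contain spaces or brackets, such as the list archive's own name.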
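The ReadMe describes `sk-LabelEncoder_512tags.pk` as a scikit-learn label encoder used for one-hot encoding the tags. A scikit-learn `LabelEncoder` stores its vocabulary as a sorted `classes_` array and maps each tag to its position in that array. Because an image usually carries several tags, the per-tag one-hot vectors are presumably summed into a multi-hot target for the classifier. The following is a minimal, standard-library-only sketch of that idea. It is not the author's code. It does not load the `.pk` file; it reproduces `LabelEncoder`'s sorted ordering directly. The JSON layout `{"<image id>": ["tag", ...]}` is a hypothetical assumption about the archive's contents.

```python
from __future__ import annotations

import json


def build_index(classes: list[str]) -> dict[str, int]:
    """Map each tag to its position in the sorted vocabulary.

    scikit-learn's LabelEncoder keeps a sorted ``classes_`` array, so
    sorting here reproduces the same tag -> index assignment.
    """
    return {tag: i for i, tag in enumerate(sorted(set(classes)))}


def multi_hot(tags: list[str], index: dict[str, int]) -> list[int]:
    """Return a 0/1 vector with a 1 at the index of every known tag.

    Tags outside the vocabulary are skipped here; LabelEncoder.transform
    would raise ValueError on them instead.
    """
    vector = [0] * len(index)
    for tag in tags:
        i = index.get(tag)
        if i is not None:
            vector[i] = 1
    return vector


def encode_json(json_text: str, index: dict[str, int]) -> dict[str, list[int]]:
    """Encode a JSON object assumed (hypothetically) to map image id -> tag list."""
    data = json.loads(json_text)
    return {image_id: multi_hot(tags, index) for image_id, tags in data.items()}
```

For example, `build_index(["solo", "long_hair", "smile"])` assigns `long_hair=0`, `smile=1`, `solo=2`, so `multi_hot(["solo", "smile"], index)` gives `[0, 1, 1]`. With the real encoder, `encoder.classes_` could be passed to `build_index`. Unpickling it generally requires a scikit-learn version compatible with the one that saved it.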
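The ReadMe reports roughly 40 minutes on 8 cores to go through all of the Danbooru2017 metadata JSON files. A single pass that counts tag frequencies would also produce a complete tag list. The following is a minimal sketch of such a pass using `multiprocessing.Pool`, with one file per task and the per-file counts merged at the end. The record shape is an assumption: JSON Lines, one post per line, with a `"tags"` list of objects that each have a `"name"` key. Check it against the real metadata before relying on it.

```python
from __future__ import annotations

import json
from collections import Counter
from multiprocessing import Pool
from pathlib import Path
from typing import Iterable


def count_tags_in_lines(lines: Iterable[str]) -> Counter:
    """Count tag names across JSON Lines records.

    Assumed record shape (verify against the real metadata):
    {"id": "...", "tags": [{"name": "solo", ...}, ...], ...}
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts.update(tag["name"] for tag in record.get("tags", []))
    return counts


def count_tags_in_file(path: Path) -> Counter:
    """Count tags in one metadata file, streaming it line by line."""
    with open(path, encoding="utf-8") as f:
        return count_tags_in_lines(f)


def count_tags(paths: list[Path], processes: int = 8) -> Counter:
    """Process metadata files in parallel (one file per task) and merge counts."""
    total: Counter = Counter()
    with Pool(processes=processes) as pool:
        for counts in pool.imap_unordered(count_tags_in_file, paths):
            total.update(counts)
    return total
```

A call such as `count_tags(sorted(Path("metadata").glob("*.json")))`, where the `metadata` directory name is hypothetical, returns a `Counter` whose keys form the complete tag list. On platforms that start worker processes with `spawn` (Windows, macOS), place that call under `if __name__ == "__main__":`. Streaming each file line by line keeps memory use per worker small, which matters given the size of the unzipped metadata.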