Official implementation of RveRNet
The structure of the proposed RveRNet
We used the robust SAM foundation model to segment the ROI of input images. Then, we processed the images to produce complementary cut-out pairs that were used as inputs for both the ROI and extra-ROI modules. The ROI and extra-ROI modules can have different architectures that encode different inductive biases.
We trained and evaluated our proposed RveRNet on the preprocessed FoodSeg103 dataset. To quantify the advantage conferred by the distinct inductive biases of the architectures in our proposed model, we avoided selecting a dataset that was too large for fine-tuning. Preprocessing the selected dataset produced complementary cut-out pairs: images with the ROI masked out were input into the extra-ROI module, while the ROI images were input into the ROI module.
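As a rough illustration of the complementary cut-out idea, the sketch below splits one image into an ROI image and an extra-ROI image from a binary mask. It is not the repository's preprocessing code: the function name `make_cutout_pair`, the boolean mask format, and the fill value of 0 for masked pixels are all assumptions made for this example.

```python
import numpy as np


def make_cutout_pair(image: np.ndarray, roi_mask: np.ndarray, fill: int = 0):
    """Split one image into an ROI cut-out and its complementary extra-ROI cut-out.

    image:    H x W x C uint8 array.
    roi_mask: H x W boolean array, True inside the ROI (for example, a SAM mask).
    fill:     value written into the masked-out region (an assumption, not
              taken from the repository).
    """
    if image.shape[:2] != roi_mask.shape:
        raise ValueError("image and mask must share height and width")
    mask = roi_mask.astype(bool)[..., None]  # add a channel axis for broadcasting
    roi = np.where(mask, image, fill).astype(image.dtype)        # keep only the ROI
    extra_roi = np.where(mask, fill, image).astype(image.dtype)  # mask the ROI out
    return roi, extra_roi
```

With a fill value of 0, every pixel appears in exactly one of the two outputs, which is what makes the pair complementary.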
In addition, to determine the degree to which the extra-ROI module in RveRNet enhanced the classification performance on ambiguous foods, we added images of ketchup and chili paste, either photographed or collected from the internet, to the dataset. There were 69 ketchup and 72 chili paste train images and 38 ketchup and 34 chili paste test images. Thus, 18,320 train images and 7,769 test images across 105 categories were used in this study.
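The counts above can be cross-checked with a few lines of arithmetic. This sketch assumes that the stated totals already include the added ketchup and chili paste images; the variable names are only illustrative.

```python
# Counts stated in the paragraph above.
added_train = {"ketchup": 69, "chili paste": 72}
added_test = {"ketchup": 38, "chili paste": 34}
total_train, total_test, total_categories = 18_320, 7_769, 105

# Assumption: the stated totals already include the added images.
base_train = total_train - sum(added_train.values())
base_test = total_test - sum(added_test.values())
base_categories = total_categories - len(added_train)

print(base_train, base_test, base_categories)  # 18179 7697 103
```

Under that assumption, the 103 remaining categories match the number in the FoodSeg103 name.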
Unless otherwise specified, the train image dimensions were
Here is the folder structure of the FoodSeg103 dataset:
FoodSeg103/
|--Images/
| |--ann_dir/
| |__img_dir/
|
|--ImageSets/
| |--test.txt
| |__train.txt
|
|--category_id.txt
|--Readme.txt
|--test_recipe1m_id.txt
|__train_test_recipe1m_id.txt
After preprocessing, your data directory for RveRNet should look like this:
dataset_root/
|
|--train/
| |--roi/
| | |--category1/
| | | |--image_name1.jpg
| | | |--image_name2.jpg
| | | |--image_name3.jpg
| | | |__...
| | |
| | |--category2/
| | |--category3/
| | |__...
| |
| |__extra-roi/
|   |--category1/
|   | |--image_name1.jpg
|   | |--image_name2.jpg
|   | |--image_name3.jpg
|   | |__...
|   |
|   |--category2/
|   |--category3/
|   |__...
|
|
|__test/
  |--roi/
  | |--category1/
  | |--category2/
  | |--category3/
  | |__...
  |
  |__extra-roi/
    |--category1/
    |--category2/
    |--category3/
    |__...
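The layout above suggests that each ROI image has a counterpart in the extra-ROI tree. A minimal sketch of how such pairs could be enumerated follows. It assumes that the two images of a pair share the same category folder and file name, which the tree implies but the README does not state; `list_pairs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def list_pairs(dataset_root, split="train"):
    """Return (category, roi_path, extra_roi_path) for images present in both trees."""
    split_dir = Path(dataset_root) / split
    roi_dir = split_dir / "roi"
    extra_dir = split_dir / "extra-roi"
    pairs = []
    for category_dir in sorted(p for p in roi_dir.iterdir() if p.is_dir()):
        for roi_path in sorted(category_dir.iterdir()):
            # Assumption: the counterpart has the same category and file name.
            extra_path = extra_dir / category_dir.name / roi_path.name
            if roi_path.is_file() and extra_path.is_file():
                pairs.append((category_dir.name, roi_path, extra_path))
    return pairs
```

Calling `list_pairs("dataset_root", "test")` would list the test pairs in the same way. Images that lack a counterpart are skipped silently, so comparing the result length with the raw file count is a quick sanity check on the preprocessing output.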
To train an off-the-shelf model, run:
python3 train_Off-the-shelf.py --config=./Off_the_shelfs/train_cfgs/train_config.yaml
To train RveRNet, run:
python3 train_RveRNet.py --config=./RveRNets/train_cfgs/train_config_FoodSeg103.yaml
For inference of RveRNet, run:
python3 inference.py --config=./RveRNets/test_cfgs/test_config.yaml
In the configuration YAML file, you can choose the model for inference using ckpt_path. If you want batch inference for your test dataset, set batch_inference to True.
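Before launching inference, it can help to sanity-check the two keys mentioned above. The sketch below works on an already-parsed configuration dictionary. It is not the repository's loader: the helper `check_inference_config`, the default of False for batch_inference, and the example checkpoint path are assumptions for illustration, and any other keys in the real YAML file pass through untouched.

```python
def check_inference_config(cfg: dict) -> dict:
    """Validate ckpt_path and batch_inference in a parsed config dictionary."""
    ckpt_path = cfg.get("ckpt_path")
    if not isinstance(ckpt_path, str) or not ckpt_path:
        raise ValueError("ckpt_path must be a non-empty string")
    # Assumption: batch inference is off unless explicitly enabled.
    batch = cfg.get("batch_inference", False)
    if not isinstance(batch, bool):
        raise ValueError("batch_inference must be True or False")
    return {**cfg, "batch_inference": batch}


# Hypothetical values; replace them with the entries from your own YAML file.
example = check_inference_config(
    {"ckpt_path": "checkpoints/example.pth", "batch_inference": True}
)
```

A quoted value such as "True" in the YAML file would be parsed as a string rather than a boolean, which is the kind of mistake this check is meant to surface.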
If you use this code for a paper, please cite:
@misc{jin2024knowledgedistillationeffectivelyattain,
title={Knowledge distillation to effectively attain both region-of-interest and global semantics from an image where multiple objects appear},
author={Seonwhee Jin},
year={2024},
eprint={2407.08257},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.08257},
}