# Interaction-aware Scene Debiasing Method for Action Recognition
- Clone this repository.

```bash
git clone --recursive https://github.com/rendicahya/intercutmix.git
cd intercutmix/
```

- Create a virtual environment.

```bash
python3 -m venv ~/venv/intercutmix/
source ~/venv/intercutmix/bin/activate
pip install -U pip
```

- Install MMAction2.

```bash
pip install openmim
mim install mmengine mmcv
pip install -v -e mmaction2/
```

- Link the project's data/ directory with MMAction2's data/ directory.

```bash
cd mmaction2/
ln -s ../data/ ./
cd -
```
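
To confirm the symlink was created correctly, you can check that mmaction2/data points back to the project's data/ directory, for example:

```bash
# The link should resolve to ../data (a symlink, not a copy).
ls -ld mmaction2/data     # expect something like: mmaction2/data -> ../data/
readlink mmaction2/data
```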
- Download the UCF101 videos.

```bash
cd mmaction2/tools/data/ucf101/
bash download_videos.sh
```

This downloads UCF101.rar and extracts it to data/ucf101/videos.
- Verify the number of videos.

```bash
find videos/ -type f | wc -l
```

Expected: 13,320.
- Download the annotations.

```bash
bash download_annotations.sh
cd -
```

This creates the annotations/ directory and places the annotation files in it.
- Generate splits.

```bash
python3 mmaction2/tools/data/build_file_list.py ucf101 data/ucf101/videos/ --format videos --shuffle --seed 0
```

This creates:

- ucf101_train_split_{1-3}_videos.txt
- ucf101_val_split_{1-3}_videos.txt
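
As an optional sanity check, you can count the entries and peek at the list format; the line layout shown in the comment is what MMAction2's file-list builder typically emits, so treat it as illustrative:

```bash
# Train and val lists of the same split should together cover all 13,320 videos.
wc -l data/ucf101/ucf101_*_split_1_videos.txt

# Each line is expected to look like "<class>/<video>.avi <label-index>".
head -n 3 data/ucf101/ucf101_train_split_1_videos.txt
```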
- Verify files and directories.

```
intercutmix/data/ucf101/
├── annotations/
│ ├── classInd.txt
│ ├── testlist01.txt
│ ├── testlist02.txt
│ ├── testlist03.txt
│ ├── trainlist01.txt
│ ├── trainlist02.txt
│ └── trainlist03.txt
├── videos/
│ ├── ApplyEyeMakeup/
│ │ ├── v_ApplyEyeMakeup_g01_c01.avi
│ │ ├── v_ApplyEyeMakeup_g01_c02.avi
│ │ ├── v_ApplyEyeMakeup_g01_c03.avi
│ │ └── ...
│ └── ...
├── ucf101_train_split_1_videos.txt
├── ucf101_train_split_2_videos.txt
├── ucf101_train_split_3_videos.txt
├── ucf101_val_split_1_videos.txt
├── ucf101_val_split_2_videos.txt
└── ucf101_val_split_3_videos.txt
```
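
If you have the tree utility installed, a quick way to compare your layout against the listing above is:

```bash
# Show two directory levels of the UCF101 data layout.
tree -L 2 data/ucf101/
```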
- Download the HMDB51 videos.

```bash
cd mmaction2/tools/data/hmdb51/
bash download_videos.sh
```

This downloads hmdb51_org.rar and extracts it to data/hmdb51/videos.
- Verify the number of videos.

```bash
find videos/ -type f | wc -l
```

Expected: 6,766.
- Download the annotations.

```bash
bash download_annotations.sh
cd -
```

This creates the annotations/ directory and places the annotation files in it.
- Generate splits.

```bash
python3 mmaction2/tools/data/build_file_list.py hmdb51 data/hmdb51/videos/ --format videos --shuffle --seed 0
```

This creates:

- hmdb51_train_split_{1-3}_videos.txt
- hmdb51_val_split_{1-3}_videos.txt
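
Optionally, verify that every entry in a generated list resolves to an actual video file; this sketch assumes the "<class>/<video>.avi <label>" line format noted for UCF101 above:

```bash
# Print any listed video that is missing on disk (no output means all entries resolve).
cut -d ' ' -f 1 data/hmdb51/hmdb51_train_split_1_videos.txt | while read -r rel; do
  [ -f "data/hmdb51/videos/$rel" ] || echo "missing: $rel"
done
```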
- Verify files and directories.

```
intercutmix/data/hmdb51/
├── annotations/
│ ├── brush_hair_test_split1.txt
│ ├── brush_hair_test_split2.txt
│ ├── brush_hair_test_split3.txt
│ └── ...
├── frames/
│ ├── brush_hair/
│ │ ├── April_09_brush_hair_u_nm_np1_ba_goo_0/
│ │ │ ├── img_00001.png
│ │ │ ├── img_00002.png
│ │ │ ├── img_00003.png
│ │ │ └── ...
│ │ └── ...
│ └── ...
├── videos/
│ ├── brush_hair/
│ │ ├── April_09_brush_hair_u_nm_np1_ba_goo_0.avi
│ │ ├── April_09_brush_hair_u_nm_np1_ba_goo_1.avi
│ │ ├── April_09_brush_hair_u_nm_np1_ba_goo_2.avi
│ │ └── ...
│ └── ...
├── hmdb51_train_split_1_videos.txt
├── hmdb51_train_split_2_videos.txt
├── hmdb51_train_split_3_videos.txt
├── hmdb51_val_split_1_videos.txt
├── hmdb51_val_split_2_videos.txt
└── hmdb51_val_split_3_videos.txt
```

- Download the Kinetics100 videos.

```bash
mkdir -p data/kinetics100
cd data/kinetics100
gdown 1_gPSZDo_yasyEbtfs0m2edZAzhsr8dU4
tar xzf kinetics100.tar.gz
mv kinetics_100/ videos/
rm kinetics100.tar.gz
```

This downloads kinetics100.tar.gz and extracts it to data/kinetics100/videos.
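
Note that this step uses gdown, which is otherwise only installed later in the checkpoint-download step; if it is not yet available in the virtual environment, install it first:

```bash
pip install gdown
```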
- Verify the number of videos.

```bash
find videos/ -type f | wc -l
```

Expected: 9,999.
- Open settings.json and set active.dataset to kinetics100.
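
A quick way to confirm the edit, without assuming the file's exact key layout, is to grep for the new value:

```bash
# The updated dataset entry should show up here.
grep -n kinetics100 settings.json
```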
- Generate annotations.

```bash
python3 tools/data/classind.py
```

Run this from the repository root; it creates data/kinetics100/annotations/classInd.txt.
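
You can inspect the generated file to make sure it covers the 100 classes; the exact line format is defined by this repository's classind.py, so the expectation below is approximate:

```bash
# Expect 100 lines, one class per line.
wc -l data/kinetics100/annotations/classInd.txt
head -n 5 data/kinetics100/annotations/classInd.txt
```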
- Generate splits.

```bash
python3 tools/data/split.py
```

This creates:

- data/kinetics100/kinetics100_train_split_1_videos.txt
- data/kinetics100/kinetics100_val_split_1_videos.txt
- Download checkpoints.

```bash
pip install gdown
mkdir checkpoints/
gdown -O checkpoints/ <download-key>
```

Refer to the table below for <download-key>.
- Run inference.

```bash
python3 mmaction2/tools/test.py <config-path> <checkpoint-path>
```

Refer to the table below for <config-path> and <checkpoint-path>.
| Dataset | Top-1 | Top-5 | Config Path | Checkpoint Path | Download Key |
|---|---|---|---|---|---|
| UCF101 | 87.84% | 97.12% | mmaction2/configs/recognition/c3d-ucf101-soft/c3d_sports1m-pretrained_8xb64-16x1x1-100e_ucf101-rgb-intercutmix-p0.5-mmr0.05-a2.py | checkpoints/ucf101-icm-p0.5-mmr0.05-a2.pth | 1Aynmc64VpLJEXBeNe-Bq_u787h5pX2aq |
| HMDB51 | 55.75% | 85.49% | mmaction2/configs/recognition/c3d-hmdb51-soft/c3d_sports1m-pretrained_8xb64-16x1x1-100e_hmdb51-rgb-intercutmix-p0.5-mmr0.05-a2.py | checkpoints/hmdb51-icm-p0.5-mmr0.05.pth | 1cb3gfG3qJUAAsrMSXcXm0LKHLSuuDGd6 |
| Kinetics100 | | | | | |
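
For example, evaluating the released UCF101 checkpoint with the paths and download key taken from the table above looks like this (adjust if your local paths differ):

```bash
# Download the UCF101 checkpoint into checkpoints/ ...
gdown -O checkpoints/ 1Aynmc64VpLJEXBeNe-Bq_u787h5pX2aq
# ...and run the MMAction2 test script on it.
python3 mmaction2/tools/test.py \
  mmaction2/configs/recognition/c3d-ucf101-soft/c3d_sports1m-pretrained_8xb64-16x1x1-100e_ucf101-rgb-intercutmix-p0.5-mmr0.05-a2.py \
  checkpoints/ucf101-icm-p0.5-mmr0.05-a2.pth
```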
```bibtex
@ARTICLE{11045925,
author={Wihandika, Randy Cahya and Mendonça, Israel and Aritsugi, Masayoshi},
journal={IEEE Access},
title={Interaction-Aware Scene Debiasing for Action Recognition},
year={2025},
volume={13},
number={},
pages={107856-107871},
keywords={Semantics;Training;Punching;Data models;Data augmentation;Visualization;Correlation;Training data;Spatiotemporal phenomena;Representation learning;Action recognition;scene debiasing;video augmentation},
doi={10.1109/ACCESS.2025.3581931}}
```