A minimal yet flexible PyTorch implementation of the Fully Convolutional Masked Autoencoder (FCMAE) with ConvNeXtV2, designed to run without MinkowskiEngine and to handle non-square images natively. This repository aims to serve as both a research baseline and a demonstration of clean, extensible implementation practices.
- 🚀 Pure PyTorch – No dependency on MinkowskiEngine or other heavy frameworks
- 🖼️ Flexible Input Shapes – Now supports non-square image sizes
- 🛠️ Readable & Extensible Codebase – Easy to adapt for custom datasets, architectures, or research experiments
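To give a flavor of how FCMAE-style masking can be handled without MinkowskiEngine, here is a minimal, self-contained sketch of patch-wise random masking on (possibly non-square) images. It illustrates the idea only and is not this repository's exact implementation:

```python
import torch

def random_patch_mask(x: torch.Tensor, patch_size: int = 32, mask_ratio: float = 0.6) -> torch.Tensor:
    """Generate a binary mask over non-overlapping patches of a (possibly non-square) image batch.

    Returns a (N, 1, H, W) tensor where 1 marks masked pixels.
    """
    n, _, h, w = x.shape
    gh, gw = h // patch_size, w // patch_size        # the patch grid may be rectangular
    num_patches = gh * gw
    num_masked = int(mask_ratio * num_patches)

    # Rank random noise per sample; the first num_masked patches are masked out.
    noise = torch.rand(n, num_patches, device=x.device)
    ids = noise.argsort(dim=1)
    mask = torch.zeros(n, num_patches, device=x.device)
    mask.scatter_(1, ids[:, :num_masked], 1.0)       # 1 = masked, 0 = visible

    # Upsample the patch-level mask to pixel resolution.
    mask = mask.reshape(n, 1, gh, gw)
    mask = mask.repeat_interleave(patch_size, dim=2).repeat_interleave(patch_size, dim=3)
    return mask
```

The masked pixels can then be zeroed (or replaced by a learned mask token) before being fed to the convolutional encoder.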
Clone the repository and build the environment:
```bash
git clone https://github.com/BSP36/fcmae.git
cd fcmae
bash docker_run.sh
```

This script sets up a Docker container with all required dependencies pre-installed, ensuring reproducibility across environments.
To launch pre-training (default: STL10 dataset):
```bash
python pretrain.py --name test
```

- Results are stored in `./experiments/test`.
- The code is preconfigured for STL10 but can be easily extended to other datasets.
To evaluate and visualize reconstructions:
```bash
python test_fcmae.py --name test
```

Inference results are saved in `./experiments/test/results`.
After pre-training, fine-tune the encoder on classification:
```bash
python finetune.py --pre test --name test_ft
```

- `--pre` specifies the name of the pre-trained experiment.
- Fine-tuned checkpoints and logs are stored in `./experiments/test_ft`.
- For baseline comparison, supervised learning from scratch can be run by adding the `--without_pre` flag.
We provide checkpoints and YAML configuration files under `./experiments` for reproducibility.
All checkpoint files are stored in float16 format to reduce disk usage, with negligible loss of performance.
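As a rough illustration of how such half-precision checkpoints can be produced and restored (not necessarily the exact code used in this repository):

```python
import torch

# Tiny stand-in model; in this repo the object would be the FCMAE/ConvNeXtV2 model.
model = torch.nn.Linear(8, 2)

# Save: cast floating-point tensors in the state dict to float16 before writing.
state_fp16 = {k: (v.half() if v.is_floating_point() else v)
              for k, v in model.state_dict().items()}
torch.save(state_fp16, "checkpoint_fp16.pth")

# Load: cast back to float32 before restoring the weights.
state = torch.load("checkpoint_fp16.pth", map_location="cpu")
model.load_state_dict({k: (v.float() if v.is_floating_point() else v)
                       for k, v in state.items()})
```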
We pre-train three FCMAE variants of different scales:
| Model | stem_stride | depths | dims | Encoder Params |
|---|---|---|---|---|
| zepto | 4 | [2, 2, 4] | [40, 80, 160] | 1.07M |
| atto | 2 | [2, 2, 6, 2] | [40, 80, 160, 320] | 3.39M |
| nano | 2 | [2, 2, 8, 2] | [80, 160, 320, 640] | 14.98M |
- Default hyperparameters are used unless otherwise noted.
- Complete hyperparameter settings are provided as YAML files under `./experiments`.
- The atto and nano configurations follow the original FCMAE paper for direct comparability.
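For convenience, the same settings can be written as plain Python dictionaries; the key names below mirror the table columns and are not necessarily the exact fields used in the YAML configs:

```python
# Encoder hyperparameters from the table above (illustrative key names).
fcmae_variants = {
    "zepto": {"stem_stride": 4, "depths": [2, 2, 4],    "dims": [40, 80, 160]},
    "atto":  {"stem_stride": 2, "depths": [2, 2, 6, 2], "dims": [40, 80, 160, 320]},
    "nano":  {"stem_stride": 2, "depths": [2, 2, 8, 2], "dims": [80, 160, 320, 640]},
}
```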
We use the STL-10 dataset, which consists of:
- unlabeled: 100,000 images
- train: 10 classes × 500 images
- test: 10 classes × 800 images
Pre-training is performed only on the unlabeled split.
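The unlabeled split can be loaded directly through torchvision; the transforms and loader settings below are illustrative rather than this repository's exact pipeline:

```python
import torch
from torchvision import datasets, transforms

# Pre-training uses only the 100,000-image unlabeled split of STL-10 (96x96 RGB).
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
unlabeled = datasets.STL10(root="./data", split="unlabeled", download=True, transform=transform)
loader = torch.utils.data.DataLoader(unlabeled, batch_size=128, shuffle=True, num_workers=4)
```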
The reconstruction loss decreases stably throughout training:
For each model, we report the final MAE loss across all dataset splits, with and without data augmentation:
| Model | Unlabeled (w/o Aug.) | Train (w/o Aug.) | Test (w/o Aug.) | Unlabeled (w/ Aug.) | Train (w/ Aug.) | Test (w/ Aug.) |
|---|---|---|---|---|---|---|
| zepto | 0.540 | 0.537 | 0.537 | 0.498 | 0.492 | 0.488 |
| atto | 0.527 | 0.523 | 0.523 | 0.483 | 0.477 | 0.473 |
| nano | 0.513 | 0.509 | 0.509 | 0.467 | 0.461 | 0.456 |
Observations:
- Larger models consistently achieve lower reconstruction loss.
- Data augmentation slightly reduces loss, though the relative trend across model sizes remains unchanged.
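For context, the reported loss is, in spirit, a mean squared error computed only over masked regions. The sketch below illustrates that idea; it omits the per-patch pixel normalization used in the original MAE/FCMAE papers and is not necessarily this repository's exact implementation:

```python
import torch

def masked_mse_loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """MSE averaged over masked pixels only.

    pred, target: (N, C, H, W) reconstructions and original images.
    mask:         (N, 1, H, W) with 1 on masked pixels, 0 on visible ones.
    """
    per_pixel = (pred - target) ** 2                 # (N, C, H, W)
    per_pixel = per_pixel.mean(dim=1, keepdim=True)  # average over channels
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)
```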
Reconstruction visualizations are shown as [masked, original, zepto, atto, nano]:
We evaluate the effect of FCMAE pre-training on downstream classification using STL-10. Models fine-tuned from FCMAE are compared against those trained from scratch.
- Fine-tuning uses the STL-10 train split (10 classes × 500 images).
- The STL-10 test split (10 classes × 800 images) is used for validation.
- Optimization: AdamW (lr = 1.5e-4, batch size = 128, 300 epochs).
- All encoder weights are updated during fine-tuning.
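A minimal sketch of this recipe follows; the model construction is only a runnable placeholder standing in for the pre-trained FCMAE encoder plus a new 10-way classification head, not this repository's actual classes:

```python
import torch
from torchvision import datasets, transforms

# Placeholder model so the sketch runs; replace with the pre-trained encoder + linear head.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.LazyLinear(10))

train_set = datasets.STL10(root="./data", split="train", download=True,
                           transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Recipe from the list above: AdamW, lr = 1.5e-4, 300 epochs, all weights trainable.
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(300):
    for images, labels in train_loader:
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```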
- `_ft` = fine-tuned after FCMAE pre-training
- `_sl` = supervised learning from scratch
Observations:
- FCMAE pre-training accelerates convergence and improves final accuracy.
- Gains are most notable in the early epochs, suggesting stronger initialization.
- Pre-trained encoders capture robust, abstract features transferable to classification.
Top-1 Accuracy and Macro F1 on the STL-10 test set:
| Model | Accuracy (w/ FCMAE) | Accuracy (scratch) | Macro F1 (w/ FCMAE) | Macro F1 (scratch) |
|---|---|---|---|---|
| zepto | 0.778 | 0.605 | 0.778 | 0.603 |
| atto | 0.831 | 0.639 | 0.831 | 0.636 |
| nano | 0.875 | 0.675 | 0.876 | 0.673 |
Summary: FCMAE pre-training improves both Top-1 Accuracy and Macro F1 by roughly 17-20 percentage points across all model sizes.
Note: We did not apply advanced augmentation (e.g., CutMix) or extensive hyperparameter tuning. Furthermore, larger backbones (e.g., ConvNeXtV2-Base, 89M params) are expected to surpass 90% accuracy, though at significantly higher computational cost.
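For reference, Top-1 accuracy and Macro F1 can be computed with scikit-learn as sketched below; the label arrays are dummy placeholders, not this repository's evaluation code:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Per-image ground-truth and predicted class IDs over the STL-10 test split (dummy values).
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

top1 = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"Top-1 accuracy: {top1:.3f}, Macro F1: {macro_f1:.3f}")
```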
Confusion matrices illustrate class-level behavior:
Observations:
- Common error patterns appear (e.g., confusing dog with cat, or car with truck).
- Pre-training with FCMAE contributes to consistent improvements across all regions of the confusion matrix, including these challenging cases.
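Such confusion matrices can be generated with scikit-learn; the predictions below are random placeholders rather than this repository's outputs:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# STL-10 class names in label order; y_true / y_pred are per-image class IDs
# over the test split (random dummy arrays here so the sketch is runnable).
classes = ["airplane", "bird", "car", "cat", "deer",
           "dog", "horse", "monkey", "ship", "truck"]
y_true = np.random.randint(0, 10, size=200)
y_pred = np.random.randint(0, 10, size=200)

cm = confusion_matrix(y_true, y_pred, labels=list(range(10)))
ConfusionMatrixDisplay(cm, display_labels=classes).plot(xticks_rotation=45)
plt.tight_layout()
plt.show()
```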
- FCMAE substantially enhances classification performance under limited labeled data by leveraging unlabeled images.
- Gains are consistent across scales, highlighting the method’s robustness.
This project is licensed under the MIT License.