PyTorch implementation of I-JEPA
This implementation does not appear to work for fine-tuning or downstream tasks, even though the loss decreases steadily during pre-training. I may come back to this problem in the near future. Please let me know if you find anything wrong with my implementation.