[CVPR 2025, Highlight] CrossOver: 3D Scene Cross-Modal Alignment
Updated Apr 5, 2025 - Python
[Reproduce] Code for the ACL 2019 paper "Multimodal Transformer for Unaligned Multimodal Language Sequences".
Graph Aligned Large Language Models for Improved Source Code Understanding
A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.
Using a 3D Nearby Self-Attention Transformer to leverage the spatiotemporal nature of video for representation learning.