X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
image-captioning
video-captioning
visual-question-answering
vision-and-language
cross-modal-retrieval
pretraining
tden
Updated Feb 27, 2023 - Python