Stars
Reference implementation for DPO (Direct Preference Optimization)
Train transformer language models with reinforcement learning.
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
Mixture-of-Experts for Large Vision-Language Models
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
A curated list of different papers and datasets in various areas of audio-visual processing
An unofficial PyTorch implementation of the audio LM VALL-E
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
A PyTorch implementation of Google GEDLoss
Google Research
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
Efficient Image Captioning code in Torch, runs on GPU
I decided to sync up this repo with self-critical.pytorch. (The old master is kept in the old master branch for archival.)
Awesome Vision-Language Pretraining Papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
A repository curated by the MLNLP community to help everyone avoid minor mistakes in paper submissions. Paper Writing Tips
Facial expression recognition in TensorFlow. Detects faces in video and recognizes the expression (emotion).