Stars
On-device Speech Recognition for Apple Silicon
TheBoringNotch: a not-so-boring notch that rocks 🎸🎶
The smartest way to learn touch typing and improve your typing speed.
Simple image captioning model
CapDec: SOTA zero-shot image captioning using CLIP and GPT2, EMNLP 2022 (Findings)
Schedule-Free Optimization in PyTorch
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
PyTorch implementation of the InfoNCE loss for self-supervised learning.
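The InfoNCE loss named above treats each in-batch pair as the positive and all other keys as negatives, then applies cross-entropy over the similarity logits. A minimal sketch (the function name `info_nce` and the `temperature` default are illustrative, not taken from that repo):

```python
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, positive_key: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over a batch: query[i] should match positive_key[i]."""
    # Normalize so dot products become cosine similarities.
    query = F.normalize(query, dim=-1)
    positive_key = F.normalize(positive_key, dim=-1)
    # logits[i, j] = similarity between query i and key j.
    logits = query @ positive_key.T / temperature
    # The positive for query i sits on the diagonal (index i);
    # every other key in the batch acts as a negative.
    labels = torch.arange(len(query), device=query.device)
    return F.cross_entropy(logits, labels)
```

Usage: pass two batches of embeddings (e.g. two augmented views of the same images); the loss falls as matched pairs become more similar than mismatched ones.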
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
Django Channels based WebSocket GraphQL server with Graphene-like subscriptions
[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Framework for benchmarking fully-managed vector databases
Perceptual video quality assessment based on multi-method fusion.
Highly commented implementations of Transformers in PyTorch
Applying the latest advancements in AI and machine learning to solve complex business problems.
A state-of-the-art open visual language model | multimodal pretrained model
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Incredibly descriptive audiovisual summaries for videos
Automatically optimize SQL queries in Graphene-Django schemas.
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Learning audio concepts from natural language supervision
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Cap…
A language for constraint-guided and efficient LLM programming.
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.