Stars
An implementation of chunked, compressed, N-dimensional arrays for Python.
Variational Animal Motion Embedding - A tool for time series embedding and clustering
Frontier Multimodal Foundation Models for Image and Video Understanding
EthoML / VAME
Forked from LINCellularNeuroscience/VAME
Variational Animal Motion Embedding - A tool for time series embedding and clustering
Janus-Series: Unified Multimodal Understanding and Generation Models
Fast and accurate automatic speech recognition (ASR) for edge devices
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Official implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024
A state-of-the-art-level open visual language model | Multimodal pre-trained model
[ECCVW'24] Long-form Video Understanding by Bridging Episodic Memory and Semantic Knowledge
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Agno is a lightweight framework for building multi-modal Agents
Developer-friendly, serverless vector database for AI applications. Easily add long-term memory to your LLM apps!
The official code of "CSTA: CNN-based Spatiotemporal Attention for Video Summarization"
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Repository for 2nd generation Bpod platform (formerly beta branch of Bpod repository)
Simple, unified interface to multiple Generative AI providers
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'