Stars
On-device Speech Recognition for Apple Silicon
TheBoringNotch: a not-so-boring notch that rocks 🎸🎶
The smartest way to learn touch typing and improve your typing speed.
Simple image captioning model
CapDec: SOTA zero-shot image captioning using CLIP and GPT2, EMNLP 2022 (Findings)
Schedule-Free Optimization in PyTorch
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
PyTorch implementation of the InfoNCE loss for self-supervised learning.
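The InfoNCE loss named above treats each in-batch pair as the positive and all other keys as negatives, then applies cross-entropy over the similarity logits. A minimal sketch (the function name `info_nce` and the `temperature` default are illustrative, not taken from that repo):

```python
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, positive_key: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over a batch: query[i] should match positive_key[i]."""
    # Normalize so dot products become cosine similarities.
    query = F.normalize(query, dim=-1)
    positive_key = F.normalize(positive_key, dim=-1)
    # logits[i, j] = similarity between query i and key j.
    logits = query @ positive_key.T / temperature
    # The positive for query i sits on the diagonal (index i);
    # every other key in the batch acts as a negative.
    labels = torch.arange(len(query), device=query.device)
    return F.cross_entropy(logits, labels)
```

Usage: pass two batches of embeddings (e.g. two augmented views of the same images); the loss falls as matched pairs become more similar than mismatched ones.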
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
Django Channels based WebSocket GraphQL server with Graphene-like subscriptions
[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Framework for benchmarking fully-managed vector databases
Perceptual video quality assessment based on multi-method fusion.
Highly commented implementations of Transformers in PyTorch
Applying the latest advancements in AI and machine learning to solve complex business problems.
A state-of-the-art open visual language model | multimodal pretrained model
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Incredibly descriptive audiovisual summaries for videos
Automatically optimize SQL queries in Graphene-Django schemas.
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Learning audio concepts from natural language supervision
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Cap…
A language for constraint-guided and efficient LLM programming.
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.