View rsomani95's full-sized avatar

Organizations

@Synopsis

On-device Speech Recognition for Apple Silicon

Swift 4,060 343 Updated Dec 21, 2024

TheBoringNotch: Not so boring notch That Rocks 🎸🎶

Swift 1,542 92 Updated Dec 22, 2024

The smartest way to learn touch typing and improve your typing speed.

TypeScript 2,209 201 Updated Dec 23, 2024

Simple image captioning model

Jupyter Notebook 1,333 220 Updated Jun 9, 2024

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Python 188 20 Updated Jan 28, 2024

Schedule-Free Optimization in PyTorch

Python 2,023 69 Updated Dec 2, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,678 161 Updated Dec 26, 2024

PyTorch implementation of the InfoNCE loss for self-supervised learning.

Python 507 41 Updated Nov 17, 2023

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Python 715 34 Updated Aug 13, 2024

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python 897 125 Updated Apr 12, 2024

[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition

Python 278 21 Updated Sep 17, 2023

Django Channels based WebSocket GraphQL server with Graphene-like subscriptions

Python 281 85 Updated Jul 19, 2024

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Python 761 53 Updated Mar 25, 2024

Framework for benchmarking fully-managed vector databases

Python 1 2 Updated Feb 2, 2024

Perceptual video quality assessment based on multi-method fusion.

Python 4,715 758 Updated Nov 8, 2024

Highly commented implementations of Transformers in PyTorch

Python 130 9 Updated Aug 2, 2023

Applying the latest advancements in AI and machine learning to solve complex business problems.

Python 73 29 Updated Mar 13, 2024

a state-of-the-art-level open visual language model | multimodal pretrained model

Python 6,220 420 Updated May 29, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,485 91 Updated Dec 11, 2024

Incredibly descriptive audiovisual summaries for videos

Python 40 2 Updated Aug 2, 2024

tiny vision language model

Jupyter Notebook 6,150 507 Updated Dec 10, 2024

Automatically optimize SQL queries in Graphene-Django schemas.

Python 14 4 Updated Dec 23, 2024

A family of lightweight multimodal models.

Python 966 72 Updated Nov 18, 2024

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python 988 64 Updated Oct 6, 2024

Learning audio concepts from natural language supervision

Python 511 39 Updated Sep 18, 2024

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 3,666 304 Updated Oct 28, 2024
Jupyter Notebook 7,868 553 Updated Jun 16, 2024

VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Cap…

76 3 Updated Dec 5, 2022

A language for constraint-guided and efficient LLM programming.

Python 3,743 203 Updated Jun 3, 2024

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python 6,015 517 Updated Sep 6, 2024