Run larger LLMs with longer contexts on Apple Silicon by applying differentiated precision to KV cache quantization. KVSplit stores keys at 8-bit and values at 4-bit, cutting KV cache memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
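A minimal NumPy sketch of the core idea, differentiated K/V precision, assuming simple symmetric per-tensor quantization; function names are illustrative, not KVSplit's actual API:

```python
import numpy as np

def quantize(x: np.ndarray, bits: int):
    """Symmetric per-tensor quantization to the given bit width."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for 8-bit, 7 for 4-bit
    scale = max(float(np.abs(x).max()) / qmax, 1e-12)
    # 4-bit values are held in an int8 container here; real kernels
    # pack two 4-bit values per byte to realize the memory savings.
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

keys = np.random.randn(128, 64).astype(np.float32)    # one layer's K cache
values = np.random.randn(128, 64).astype(np.float32)  # one layer's V cache

k_q, k_scale = quantize(keys, bits=8)    # keys keep more precision
v_q, v_scale = quantize(values, bits=4)  # values tolerate coarser precision

print("mean key error:  ", np.abs(dequantize(k_q, k_scale) - keys).mean())
print("mean value error:", np.abs(dequantize(v_q, v_scale) - values).mean())
```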
Automated memory optimization for pandas DataFrames. Cuts memory use by 50-80% and loads CSVs 5-10x faster. A drop-in replacement that returns standard DataFrames, so it works with sklearn, matplotlib, and all your favorite libraries. Safe, fast, and zero refactoring required.
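A sketch of the underlying technique, dtype downcasting, assuming the tool works roughly like pandas' own downcasting helpers; the function name and cardinality threshold are hypothetical:

```python
import pandas as pd

def shrink(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # downcast int64/float64 columns to the smallest dtype that fits
    for col in out.select_dtypes(include="integer"):
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include="float"):
        out[col] = pd.to_numeric(out[col], downcast="float")
    # low-cardinality strings compress well as categoricals
    for col in out.select_dtypes(include="object"):
        if out[col].nunique() / max(len(out), 1) < 0.5:
            out[col] = out[col].astype("category")
    return out  # still a plain pd.DataFrame, so downstream code is unchanged
```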
Implementation of PagedAttention from the vLLM paper: a breakthrough attention algorithm that treats the KV cache like virtual memory. Eliminates memory fragmentation, increases batch sizes, and dramatically improves LLM serving throughput.
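A toy sketch of the bookkeeping behind paging the KV cache: each sequence's logical token positions map through a block table to fixed-size physical blocks, like pages in virtual memory. Simplified for illustration; not vLLM's actual code:

```python
BLOCK_SIZE = 16  # tokens per physical block

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # physical block allocator
        self.tables = {}                     # seq_id -> list of physical block ids
        self.lengths = {}                    # seq_id -> tokens stored so far

    def append(self, seq_id: int):
        """Reserve a slot for one new token; returns (block, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:          # current block full: map a new page
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // BLOCK_SIZE], n % BLOCK_SIZE

    def lookup(self, seq_id: int, pos: int):
        """Translate a logical token position to a physical slot."""
        table = self.tables[seq_id]
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
for _ in range(20):
    cache.append(seq_id=0)
print(cache.lookup(0, 17))   # token 17 lives in the sequence's second block
```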
A booking chatbot with RBAC and human-in-the-loop approval, covering research, plan formation, booking, and payment. Built for multiple user types: booking customers and internal teams (finance, customer support, etc.). Uses a two-layer multi-agent supervisor architecture in LangGraph, powered by LLMs from OpenAI, Gemini, and Ollama.
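A hand-rolled sketch (not the repo's code) of the two ideas the description combines, RBAC gating and supervisor routing; the roles, actions, and agent names are all hypothetical:

```python
from dataclasses import dataclass

PERMISSIONS = {
    "customer": {"research", "plan", "book", "pay"},
    "finance":  {"refund", "invoice"},
    "support":  {"research", "plan", "cancel"},
}

@dataclass
class Request:
    role: str
    action: str
    text: str

def supervisor(req: Request) -> str:
    # RBAC check happens before any agent is invoked
    if req.action not in PERMISSIONS.get(req.role, set()):
        return "denied"
    # the first-layer supervisor routes to a specialist agent, which may
    # itself supervise sub-agents (the second layer of the architecture)
    return {"research": "research_agent", "plan": "planner_agent",
            "book": "booking_agent", "pay": "payment_agent"}.get(
                req.action, "fallback_agent")

print(supervisor(Request("customer", "pay", "pay for trip #42")))  # payment_agent
print(supervisor(Request("support", "pay", "pay for trip #42")))   # denied
```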
Memory-efficient generic array implementation supporting O(1) `set`, `get`, and `set_all` operations without allocating space proportional to the array size, fully validated by unit tests.
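One standard way to achieve this (a sketch, not necessarily this repo's implementation): store only the slots that have been written in a dict and invalidate them lazily with a global version counter, so `set_all` never touches individual elements:

```python
class ConstantArray:
    def __init__(self, default=None):
        self.default = default
        self.version = 0
        self.slots = {}   # index -> (version_written, value)

    def set(self, i, value):
        self.slots[i] = (self.version, value)

    def get(self, i):
        ver, value = self.slots.get(i, (-1, self.default))
        # entries written before the last set_all are stale
        return value if ver == self.version else self.default

    def set_all(self, value):
        # O(1): bump the version instead of rewriting every slot
        self.version += 1
        self.default = value

a = ConstantArray(default=0)
a.set(3, 7)
a.set_all(9)      # every index now reads 9, in constant time
a.set(3, 7)
print(a.get(3), a.get(100))   # 7 9
```

Space is proportional to the number of indices ever written, not the array size; stale entries linger until overwritten, which is the usual trade-off of this trick.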
GPU Memory Calculator for LLM Training - Calculate GPU memory requirements for training Large Language Models with support for multiple training engines including PyTorch DDP, DeepSpeed ZeRO, Megatron-LM, and FSDP.
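The back-of-the-envelope arithmetic such a calculator automates, using the standard mixed-precision Adam accounting (2 bytes each for fp16 weights and gradients, 12 bytes of fp32 optimizer state per parameter); a sketch, not the tool's exact model, and activations are deliberately excluded:

```python
def training_memory_gb(n_params: float, dp_degree: int = 1, zero_stage: int = 0):
    weights = 2 * n_params            # fp16/bf16 parameters
    grads = 2 * n_params              # fp16/bf16 gradients
    optim = 12 * n_params             # fp32 master weights + Adam m and v
    if zero_stage >= 1:
        optim /= dp_degree            # ZeRO-1 shards optimizer states
    if zero_stage >= 2:
        grads /= dp_degree            # ZeRO-2 also shards gradients
    if zero_stage >= 3:
        weights /= dp_degree          # ZeRO-3 also shards parameters
    return (weights + grads + optim) / 1e9   # bytes -> GB

# A 7B model: ~112 GB per GPU with plain DDP, ~14 GB with ZeRO-3 over 8 GPUs
print(training_memory_gb(7e9))
print(training_memory_gb(7e9, dp_degree=8, zero_stage=3))
```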
🚀 Comprehensive toolkit for analyzing and optimizing GitHub Copilot performance in large codebases. Includes memory monitoring, workspace analysis, and the theoretical foundations behind its 60-80% memory reduction.
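What memory monitoring can look like in practice, as a sketch: sample the resident set size of processes matching a name pattern with psutil. The "copilot" name filter is an assumption about how the extension's processes are labeled, not something the toolkit guarantees:

```python
import psutil

def sample_memory_mib(name_fragment: str = "copilot") -> float:
    """Sum RSS across all processes whose name contains the fragment."""
    total = 0
    for proc in psutil.process_iter(["name", "memory_info"]):
        name = (proc.info["name"] or "").lower()
        if name_fragment in name and proc.info["memory_info"] is not None:
            total += proc.info["memory_info"].rss
    return total / 2**20   # bytes -> MiB

print(f"matched processes are using {sample_memory_mib():.1f} MiB")
```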