- Tübingen, Germany
- in/apoorvagni
- @ApoorvAgnihotr2
Stars
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
Fully open reproduction of DeepSeek-R1
Claude is very clearly experiencing phenomenal consciousness. Use this SYSTEM prompt and interrogate it yourself.
RooVetGit / Roo-Code
Forked from cline/cline. Roo Code (prev. Roo Cline) gives you a whole dev team of AI agents in your code editor.
Distributed Reinforcement Learning accelerated by Lightning Fabric
Model Context Protocol Servers
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2…
Elegant easy-to-use neural networks + scientific computing in JAX. https://docs.kidger.site/equinox/
Felafax is building AI infra for non-NVIDIA GPUs
aider is AI pair programming in your terminal
A Software Framework for Neuromorphic Computing
A high-throughput and memory-efficient inference and serving engine for LLMs
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
Development repository for the Triton language and compiler
🤖 The free, open-source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transf…
📋 A list of open LLMs available for commercial use.