https://gum1h0x.com/ - @gum1h0x
Stars
TOPLOC: a novel method for verifiable inference that enables users to verify that LLM providers are using the correct model configurations and settings
(WIP) A small but powerful, homemade PyTorch from scratch.
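The core of any from-scratch PyTorch clone is a reverse-mode autograd engine. A minimal sketch of that idea (illustrative code, not taken from the repo): scalar values record their parents and local derivatives, and `backward()` replays the graph in reverse topological order.

```python
class Value:
    """Scalar node in a dynamic computation graph (micrograd-style sketch)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # d(a+b)/da = 1
            other.grad += out.grad       # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topological order guarantees a node's grad is complete
        # before its parents' local rules consume it.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x = Value(3.0)
y = x * x + x      # y = x^2 + x, so dy/dx = 2x + 1 = 7 at x = 3
y.backward()
```

Everything beyond this (tensors, broadcasting, kernels) is layered on the same record-then-replay mechanism.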
[ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models
An app that brings language models directly to your phone.
Dynamic Memory Management for Serving LLMs without PagedAttention
Refine high-quality datasets and visual AI models
Efficient Triton Kernels for LLM Training
μ-Cuda, COVER THE LAST MILE OF CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
Bespoke Automata is a GUI and deployment pipeline for building complex AI agents locally and offline
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
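For context, a rough sketch of the CoPE idea as I read arXiv 2405.18719 (plain Python, not the repo's custom CUDA kernels): token positions become context-dependent, with the "position" of key j relative to query i defined as a sum of sigmoid gates over the intervening tokens; the fractional result would then interpolate between learned position embeddings.

```python
import math

def cope_positions(q, k):
    """Contextual positions: p[i][j] = sum of gates g[i][t] for t in j..i.

    q, k: lists of per-token query/key vectors. Positions exist only for
    j <= i (causal); the fractional p[i][j] would index learned position
    embeddings by interpolation in a full attention layer.
    """
    sig = lambda x: 1.0 / (1.0 + math.exp(-x))
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    T = len(q)
    gates = [[sig(dot(q[i], k[t])) for t in range(T)] for i in range(T)]
    p = [[0.0] * T for _ in range(T)]
    for i in range(T):
        for j in range(i + 1):
            p[i][j] = sum(gates[i][t] for t in range(j, i + 1))
    return p

q = k = [[1.0], [0.5], [2.0]]   # toy 3-token, 1-dim example
p = cope_positions(q, k)
```

Because each gate lies in (0, 1), positions grow by at most 1 per token, so the model can learn to "count" only the tokens it gates in.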
The official evaluation suite and dynamic data release for MixEval.
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
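The KAN idea in miniature (a simplified sketch, not the repo's implementation): instead of fixed activations on nodes, each edge carries its own learnable univariate function, and each output is the sum of those functions applied to the inputs. Here a Gaussian RBF basis stands in for the B-splines used in the paper.

```python
import math

def kan_layer(x, coeffs, centers, width=1.0):
    """Simplified KAN layer: out[q] = sum over inputs p of phi_{q,p}(x[p]).

    Each learnable edge function phi_{q,p}(t) is a weighted sum of fixed
    Gaussian basis functions: sum_m coeffs[q][p][m] * exp(-((t-centers[m])/width)^2).
    Training a KAN means fitting the coeffs (the edge functions themselves).
    """
    out = []
    for q_coeffs in coeffs:            # one row of edge functions per output
        s = 0.0
        for p, t in enumerate(x):      # one edge function per input
            s += sum(c * math.exp(-((t - cm) / width) ** 2)
                     for c, cm in zip(q_coeffs[p], centers))
        out.append(s)
    return out

# One output, one input, basis centered at 0 and 1;
# only the first basis function is active at x = 0.
y = kan_layer([0.0], coeffs=[[[1.0, 0.0]]], centers=[0.0, 1.0])
```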
Results of the Tiny Chess Bot Challenge
juvi21 / llama2.jl
Forked from karpathy/llama2.c: Inference Llama 2 in one file of pure C. Nahh wait, now fresh in Julia!
GEF (GDB Enhanced Features) - a modern experience for GDB with advanced debugging capabilities for exploit devs & reverse engineers on Linux
Samples for CUDA developers demonstrating features in the CUDA Toolkit
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Development repository for the Triton language and compiler
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
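Of those features, LoRA is simple enough to sketch (illustrative pure Python, not lit-llama's code): the frozen weight W is augmented by a low-rank update B·A scaled by alpha/r, and only the small factors A and B are trained.

```python
def lora_forward(x, W, A, B, alpha=8.0):
    """Compute (W + (alpha/r) * B @ A) @ x without materializing B @ A.

    W: frozen weight, d_out x d_in.  A: trainable, r x d_in.
    B: trainable, d_out x r, initialized to zero so training starts at W.
    """
    r = len(A)
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    # Low-rank path first: project x down to r dims through A...
    Ax = [sum(A[i][j] * x[j] for j in range(d_in)) for i in range(r)]
    out = []
    for o in range(d_out):
        base = sum(W[o][j] * x[j] for j in range(d_in))      # frozen path
        delta = sum(B[o][i] * Ax[i] for i in range(r))       # ...then up through B
        out.append(base + scale * delta)
    return out

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight (d_out=2, d_in=2)
A = [[0.1, 0.1]]               # trainable down-projection, rank r=1
B = [[1.0], [0.0]]             # trainable up-projection (zero at init in practice)
h = lora_forward([1.0, 1.0], W, A, B, alpha=8.0)
```

With B at zero the layer reproduces W exactly, which is why LoRA fine-tuning starts from the pretrained model's behavior.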