
TOPLOC is a novel method for verifiable inference that lets users verify that LLM providers are using the correct model configurations and settings.

Python 10 4 Updated Jan 28, 2025

(WIP) A small but powerful, homemade PyTorch from scratch.

C++ 521 24 Updated Feb 3, 2025

[ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models

Python 84 4 Updated Nov 22, 2024

An app that brings language models directly to your phone.

TypeScript 2,116 181 Updated Feb 5, 2025

Use your Neovim like the Cursor AI IDE!

Lua 9,678 380 Updated Feb 8, 2025

Dynamic Memory Management for Serving LLMs without PagedAttention

C 280 22 Updated Feb 2, 2025

Refine high-quality datasets and visual AI models

Python 9,150 594 Updated Feb 9, 2025

A torch-based, universal tensor-parallel library.

Python 3 Updated May 31, 2024

Efficient Triton Kernels for LLM Training

Python 4,369 260 Updated Feb 8, 2025
Python 1,859 133 Updated Nov 8, 2024

An Open Source Toolkit For LLM Distillation

Python 467 52 Updated Jan 7, 2025

μ-Cuda, COVER THE LAST MILE OF CUDA. Features: intellisense-friendly, structured launch, automatic CUDA graph generation and updating.

C++ 168 8 Updated Feb 7, 2025

Bespoke Automata is a GUI and deployment pipeline for making complex AI agents locally and offline

JavaScript 223 25 Updated Jun 5, 2024

Tile primitives for speedy kernels

Cuda 1,994 106 Updated Feb 9, 2025

Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719

Python 22 Updated Jun 5, 2024

The official evaluation suite and dynamic data release for MixEval.

Python 234 39 Updated Nov 10, 2024

CUDA implementation of Wavelet KAN.

Cuda 11 2 Updated Jun 8, 2024

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Python 4,216 379 Updated Aug 1, 2024

T2 SDE Linux

C 367 52 Updated Feb 8, 2025
Python 725 49 Updated Jun 13, 2024

Results of the Tiny Chess Bot Challenge

C# 121 14 Updated Dec 28, 2023

Inference Llama 2 in one file of pure C. Nahh wait, now fresh in Julia!

Python 23 1 Updated Aug 2, 2023

GEF (GDB Enhanced Features) - a modern experience for GDB with advanced debugging capabilities for exploit devs & reverse engineers on Linux

Python 7,217 755 Updated Jan 26, 2025

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 6,869 1,925 Updated Jul 26, 2024

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Python 288,540 48,036 Updated Dec 2, 2024

Development repository for the Triton language and compiler

C++ 14,320 1,778 Updated Feb 9, 2025

HPC Container Maker

Python 465 94 Updated Jan 10, 2025

CUDA Kernel Benchmarking Library

Cuda 555 71 Updated Nov 20, 2024

Inference Llama 2 in one file of pure C

C 17,998 2,192 Updated Aug 6, 2024

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python 6,029 517 Updated Sep 6, 2024