Starred repositories
AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI platforms on Google Kubernetes Engine
Build Conversational AI in minutes ⚡️
A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
Open source codebase powering the HuggingChat app
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
The Triton TensorRT-LLM Backend
Fast inference engine for Transformer models
LLMPerf is a library for validating and benchmarking LLMs
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently
Declarative Continuous Deployment for Kubernetes
Netflix's Hystrix latency and fault tolerance library, for Go
✨ Textbase is a simple framework for building AI chatbots. ✨
Rich is a Python library for rich text and beautiful formatting in the terminal.
The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser.
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, and more
Run your favourite LLMs locally on macOS from Swift
Chat with your favourite LLaMA models in a native macOS app
A Gradio web UI for Large Language Models with support for multiple inference backends.
French instruction-following and chat models