An FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Updated Sep 4, 2024 - Python
Commented source code of Texas Instruments' original Speak & Spell™
A Raspberry Pi Pico (RP2040)-based 2114 SRAM Emulator for the Busch 2090 Microtronic Computer System
A 4-bit TTL computer
4-bit ALU in Verilog
Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization"
Emulates a 4-bit computer and runs in the browser.
📚 This repository demonstrates how to interface a 16x2 alphanumeric LCD with 8051 microcontrollers (AT89S52) using assembly language. The project is a practical example of sending data from the 8051 microcontroller to an LCD, and it includes Proteus simulation files for testing, modification, debugging, and visualization.
A cycle-accurate VHDL model for COP400 devices
VHDL code for a signed 4-bit multiplier built from 4-bit adders, based on the Baugh-Wooley method.