GPT - gMLP

This repository attempts to crack long-context autoregressive language modeling (GPT) using variations of gMLPs. Specifically, it contains a gMLP variant that performs spatial gating over local sliding windows. The hope is to stretch a single GPU to train context lengths of 4096 and above, efficiently and well.

GPT is technically a misnomer now, since the architecture contains no attention (and therefore no transformer) at all.
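Concretely, each layer's spatial gating unit (SGU) splits the channels in half, then gates one half with a learned projection of the other half along the sequence axis, where the projection matrix is masked to be causal and banded so each position only mixes with the previous few positions in its window. Below is a minimal sketch of such a unit, assuming PyTorch; the class name and the `seq_len` / `window` arguments are illustrative, not the repo's actual internals.

import torch
import torch.nn as nn

class CausalLocalSGU(nn.Module):
    # sketch of a causal spatial gating unit restricted to a local window;
    # illustrative only -- the actual module in g_mlp_gpt may differ
    def __init__(self, dim, seq_len, window):
        super().__init__()
        self.norm = nn.LayerNorm(dim // 2)
        # spatial projection initialized near identity (weights ~ 0, bias 1),
        # following the gMLP paper's recommended initialization
        self.weight = nn.Parameter(torch.zeros(seq_len, seq_len))
        self.bias = nn.Parameter(torch.ones(seq_len))
        # keep only entries where i - window < j <= i (causal, local band)
        causal = torch.ones(seq_len, seq_len, dtype = torch.bool).tril()
        local = torch.ones(seq_len, seq_len, dtype = torch.bool).triu(-(window - 1))
        self.register_buffer('mask', causal & local)

    def forward(self, x):
        res, gate = x.chunk(2, dim = -1)               # split channels in half
        gate = self.norm(gate)
        n = gate.shape[1]
        w = self.weight[:n, :n].masked_fill(~self.mask[:n, :n], 0.)
        gate = torch.einsum('i j, b j d -> b i d', w, gate) + self.bias[:n, None]
        return res * gate                              # gate the residual half

Masking a full seq_len × seq_len matrix, as above, keeps the sketch simple but does not save compute; an efficient implementation would instead fold the sequence into overlapping windows, so the spatial projection costs on the order of seq_len × window rather than seq_len², which is what makes long contexts on a single GPU plausible.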

Install

$ pip install g-mlp-gpt

Usage

import torch
from g_mlp_gpt import gMLPGPT

model = gMLPGPT(
    num_tokens = 20000,                  # vocabulary size
    dim = 512,                           # model dimension
    depth = 4,                           # number of gMLP blocks
    seq_len = 1024,                      # maximum sequence length
    window_size = (128, 256, 512, 1024)  # local window size for each of the 4 layers
)

x = torch.randint(0, 20000, (1, 1000))   # token ids, any length up to seq_len
logits = model(x) # (1, 1000, 20000)
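
The model returns per-position logits rather than samples, so decoding is left to the caller. A minimal greedy decoding loop might look like the following; `generate` is a hypothetical helper, not part of the package.

import torch

@torch.no_grad()
def generate(model, prime, length, seq_len = 1024):
    # greedy decoding sketch; assumes `model` returns (batch, seq, num_tokens) logits
    out = prime
    for _ in range(length):
        logits = model(out[:, -seq_len:])                       # crop to the model's max length
        next_token = logits[:, -1].argmax(dim = -1, keepdim = True)
        out = torch.cat((out, next_token), dim = -1)
    return out

prime = torch.randint(0, 20000, (1, 16))
sample = generate(model, prime, length = 64)  # (1, 80)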

Citations

@misc{liu2021pay,
    title   = {Pay Attention to MLPs}, 
    author  = {Hanxiao Liu and Zihang Dai and David R. So and Quoc V. Le},
    year    = {2021},
    eprint  = {2105.08050},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}