This project implements Adaptive Sparse Causal Attention (ASCA) for character-level language modeling on the enwik8 dataset. The research explores enhancing computational efficiency and performance by introducing content-based adaptive sparsity into attention mechanisms.
- Introduction
- Key Features
- Implementation Details
- Results and Discussion
- Installation
- Usage
- Project Structure
- References
- License
The enwik8 dataset is a widely recognized benchmark for evaluating compression capabilities in character-level language modeling. This project focuses on improving model efficiency and performance through a novel modification—Adaptive Sparse Causal Attention.
- Develop a character-level language model that captures sequential dependencies in data efficiently.
- Introduce a content-based sparsity mechanism to reduce computational burden.
- Demonstrate improved bits-per-character (BPC) performance over baseline transformer models.
- Implementation of Content-Based Adaptive Sparsity Module to dynamically prune attention weights.
- Integration with Causal Self-Attention while preserving autoregressive properties.
- Evaluation of model performance using BPC and sparsity metrics on the enwik8 dataset.
This module predicts and applies a sparsity mask to prune irrelevant attention connections dynamically. Key features include:
- Content-based mask prediction via a small neural network.
- Dynamic adjustment of sparsity during training.
- Reduced computational overhead with efficient pruning.
```python
class ContentBasedAdaptiveSparsity(nn.Module):
    def forward(self, x, att):
        # Predict and apply sparsity mask
        ...
        return sparse_att
```
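Below is a minimal sketch of how such a module could look. It is an illustration, not the exact code in `novel_model.py`: the gating network size, the learnable threshold, the soft sigmoid mask, and the renormalization step are all assumptions.

```python
import torch
import torch.nn as nn

class ContentBasedAdaptiveSparsity(nn.Module):
    """Sketch: predict a per-position keep score from token content and use it
    to prune low-relevance attention connections."""

    def __init__(self, n_embd, init_threshold=0.5):
        super().__init__()
        # Small network mapping each token's content to a keep score in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(n_embd, n_embd // 4),
            nn.ReLU(),
            nn.Linear(n_embd // 4, 1),
        )
        # Learnable threshold so the overall sparsity level can adapt during training.
        self.threshold = nn.Parameter(torch.tensor(init_threshold))

    def forward(self, x, att):
        # x:   (B, T, C) token representations
        # att: (B, n_head, T, T) causal attention weights (already softmaxed)
        keep = torch.sigmoid(self.gate(x))        # (B, T, 1) keep score per key position
        keep = keep.transpose(1, 2).unsqueeze(1)  # (B, 1, 1, T) broadcast over heads and queries
        # Soft mask: attention to keys whose keep score falls below the threshold is suppressed.
        mask = torch.sigmoid((keep - self.threshold) * 10.0)
        sparse_att = att * mask
        # Renormalize so each attention row still sums to 1.
        sparse_att = sparse_att / (sparse_att.sum(dim=-1, keepdim=True) + 1e-8)
        return sparse_att
```

Using a soft, sigmoid-shaped mask keeps the pruning decision differentiable, so the effective sparsity level can adjust over training, and renormalizing after pruning keeps each attention row a valid probability distribution.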
This module integrates adaptive sparsity into a standard causal self-attention mechanism:
- Ensures autoregressive properties with causal masking.
- Dynamically prunes attention connections based on input content.
```python
class AdaptiveSparseCausalSelfAttention(nn.Module):
    def forward(self, x, layer_past=None):
        # Apply causal attention with adaptive sparsity
        ...
        return y
```
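A hedged sketch of how the full layer might combine a standard (minGPT-style) causal self-attention with the `ContentBasedAdaptiveSparsity` sketch above; the projection layout, dropout, and hyperparameter names are illustrative assumptions rather than the repository's exact implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSparseCausalSelfAttention(nn.Module):
    """Sketch: causal self-attention with adaptive sparsity applied to the
    attention matrix before value aggregation."""

    def __init__(self, n_embd, n_head, block_size, dropout=0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)   # joint q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)       # output projection
        self.attn_dropout = nn.Dropout(dropout)
        self.resid_dropout = nn.Dropout(dropout)
        self.sparsity = ContentBasedAdaptiveSparsity(n_embd)
        # Causal mask so each position can only attend to itself and earlier positions.
        self.register_buffer(
            "bias",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x, layer_past=None):  # layer_past kept for API compatibility; caching not shown
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))          # (B, nh, T, T)
        att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        att = self.sparsity(x, att)          # prune low-relevance connections
        att = self.attn_dropout(att)
        y = att @ v                          # (B, nh, T, hs)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.resid_dropout(self.c_proj(y))
```

Because the sparsity mask only removes connections that the causal mask already allows, the layer stays strictly autoregressive.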
- Baseline Transformer Model: BPC ~5.53 on the enwik8 test set.
- Adaptive Sparse Causal Attention Model: BPC ~4.26, demonstrating better text compression at reduced computational cost (see the BPC conversion sketch after this list).
- Average sparsity: ~30% during training.
- Visualization shows dynamic adjustment of attention weights to retain critical connections.
- Improved BPC performance.
- Significant reduction in computational complexity.
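The BPC figures above are obtained from the mean character-level cross-entropy loss: BPC = loss (in nats) / ln 2. A minimal, repository-agnostic sketch of that conversion:

```python
import math
import torch.nn.functional as F

def bits_per_character(logits, targets):
    """Convert mean character-level cross-entropy (in nats) to bits per character."""
    # logits: (B, T, vocab_size); targets: (B, T) integer character ids
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return loss.item() / math.log(2)
```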
- Clone the repository:

  ```bash
  git clone https://github.com/Itssshikhar/SAEs.git
  cd SAEs
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Prepare the enwik8 dataset:

  ```bash
  python prepare_dataset.py
  ```
Train the baseline model:

```bash
python baseline_model.py
```

Train the Adaptive Sparse Causal Attention model:

```bash
python novel_model.py
```
Use the provided scripts to evaluate BPC and visualize sparsity.
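As an example of the kind of visualization produced, the sketch below renders a pruned attention tensor as a heatmap; the function name, the `sparse_att` argument, and the output filename are hypothetical and not taken from the repository's scripts.

```python
import matplotlib.pyplot as plt

def plot_sparsity(sparse_att, path="sparsity.png"):
    # sparse_att: (B, n_head, T, T) pruned attention weights from one layer.
    # Average over batch and heads to obtain a single T x T map.
    avg = sparse_att.mean(dim=(0, 1)).detach().cpu().numpy()
    plt.imshow(avg, cmap="viridis")
    plt.xlabel("key position")
    plt.ylabel("query position")
    plt.colorbar(label="attention weight after pruning")
    plt.savefig(path, bbox_inches="tight")
    plt.close()
```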
- `baseline_model.py`: Baseline transformer implementation.
- `novel_model.py`: Adaptive Sparse Causal Attention implementation.
- `prepare_dataset.py`: Script for preparing the enwik8 dataset.
- `configurator.py`: Configurations for models and training.
- `extraction.py`: Data extraction tools.
- `*.png` & `*.pdf`: Visualizations of sparsity and performance.
This project is licensed under the MIT License.