Precious Package

Overview

Precious is a minimal model that showcases three tokenizer-free approaches to natural language processing: T-FREE, CANINE, and byte-level embeddings. It also includes the supporting attention mechanisms.

Installation

From PyPI (Recommended)

pip install precious-nlp

From Source (Development)

git clone https://github.com/bimri/precious.git
cd precious
pip install -e .

With Optional Dependencies

# For development tools
pip install precious-nlp[dev]

# For benchmarking
pip install precious-nlp[benchmarks]

# For documentation
pip install precious-nlp[docs]

# All optional dependencies
pip install precious-nlp[all]

Quick Start

Installation and Import

# Install the package (shell command)
pip install precious-nlp

# Import the package in Python (installed as 'precious-nlp', imported as 'precious')
import precious
from precious import PreciousModel, PreciousConfig

Usage

Here is a basic example of how to use the PreciousModel:

import precious
from precious import PreciousModel, PreciousConfig

# Initialize the model with the desired configuration
config = PreciousConfig(mode="byte", d_model=256)  # or "tfree", "canine"
model = PreciousModel(config)

# Prepare your input data
inputs = ["Hello, tokenizer-free world!"]
outputs = model(inputs)

# Access the logits
logits = outputs["logits"]
print(f"Output shape: {logits.shape}")  # [batch_size, seq_len, vocab_size]

# Training with targets
targets = ["Hello, tokenizer-free universe!"]
outputs = model(inputs, targets=targets)
loss = outputs["loss"]
print(f"Training loss: {loss.item()}")

Three Tokenizer-Free Approaches

1. Byte-Level Processing

import precious
config = precious.PreciousConfig(mode="byte", d_model=256)
model = precious.PreciousModel(config)
# Processes text at byte level - universal and memory efficient
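
To make "byte level" concrete, here is a minimal sketch of how text can be mapped to integer IDs without a tokenizer. It is an illustration only, not the package's internal code; the helper names `text_to_byte_ids` and `byte_ids_to_text` are hypothetical.

```python
from typing import List


def text_to_byte_ids(text: str) -> List[int]:
    """Encode text as UTF-8 and use each byte value (0-255) as an ID."""
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: List[int]) -> str:
    """Invert the mapping; malformed byte sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")


ids = text_to_byte_ids("héllo")
print(ids)                    # 'é' occupies two bytes in UTF-8
print(byte_ids_to_text(ids))  # round-trips back to the input
```

Because every string has a UTF-8 encoding, this scheme needs only 256 possible IDs, at the cost of longer sequences for non-ASCII text.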

2. CANINE Approach

import precious
config = precious.PreciousConfig(mode="canine", d_model=256)
model = precious.PreciousModel(config)
# Character-level processing with Unicode support
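
As a rough illustration of the character-level idea, the sketch below maps text to Unicode code points and then hashes each code point into a few small bucket tables, in the spirit of CANINE's hashed embeddings. The primes, the bucket count, and the helper names are hypothetical choices for this example and are not taken from Precious.

```python
from typing import List

# Illustrative values only; assumptions for this sketch, not Precious defaults.
HASH_PRIMES = [31, 43, 59, 61]
NUM_BUCKETS = 16384


def text_to_codepoints(text: str) -> List[int]:
    """One integer per Unicode character, regardless of its byte length."""
    return [ord(ch) for ch in text]


def hash_buckets(codepoint: int) -> List[int]:
    """One bucket index per hash function for a single code point."""
    return [((codepoint + 1) * p) % NUM_BUCKETS for p in HASH_PRIMES]


codepoints = text_to_codepoints("héllo")
print(codepoints)
print([hash_buckets(cp) for cp in codepoints])
```

In a full model, each bucket index would select a slice from its own embedding table and the slices would be concatenated, which avoids storing one embedding row per possible code point.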

3. T-FREE Method

import precious
config = precious.PreciousConfig(mode="tfree", d_model=256, tfree_vocab_v=8192)
model = precious.PreciousModel(config)
# Vocabulary-aware with character-level fallback
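
The general T-FREE idea can be sketched as follows: pad each word with spaces, split it into character trigrams, and hash every trigram into a fixed number of slots out of `v`. This is a simplified illustration under assumptions: the hash function, the `hashes_per_trigram` value, and the helper names are hypothetical, and the default of 8192 slots simply mirrors the `tfree_vocab_v=8192` setting shown above.

```python
import hashlib
from typing import List, Set


def word_trigrams(word: str) -> List[str]:
    """Character trigrams of the word padded with one space on each side."""
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def active_slots(word: str, vocab_v: int = 8192, hashes_per_trigram: int = 2) -> Set[int]:
    """Indices in [0, vocab_v) activated by the word's trigrams."""
    slots: Set[int] = set()
    for tri in word_trigrams(word):
        for k in range(hashes_per_trigram):
            # hashlib is used instead of hash() so results are stable across runs.
            digest = hashlib.sha256(f"{k}:{tri}".encode("utf-8")).digest()
            slots.add(int.from_bytes(digest[:8], "big") % vocab_v)
    return slots


print(word_trigrams("hello"))
print(sorted(active_slots("hello")))
```

A word's embedding would then be the sum of the embedding rows at its active slots, so words that share trigrams share part of their representation.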

Key Features

🚀 Three tokenizer-free approaches in one unified library
🎯 Production-ready with comprehensive testing and documentation
🌍 Universal text support - handles any Unicode text
⚡ Efficient processing with configurable model architectures
🧪 Research-friendly with benchmarking and comparison tools
📚 Well-documented with extensive examples and API reference

Quick Performance Comparison

Mode	Memory	Speed	Best For
Byte	Lowest	Fastest	General purpose, production
CANINE	Medium	Medium	Multilingual, character-aware
T-FREE	Highest	Research	Vocabulary analysis, interpretability

Documentation

For complete documentation, visit the docs directory or browse individual guides:

📖 API Reference - Complete API documentation
📝 Examples - From basic to advanced usage

Requirements

Python >= 3.8
PyTorch >= 1.9.0
NumPy >= 1.19.0

Contributing

Contributions are welcome! Please follow these steps to contribute:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them.
4. Push your branch and create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
