Merged

Dev #23

30 changes: 29 additions & 1 deletion .gitignore
@@ -53,4 +53,32 @@ config.toml
.vectorless.toml

# Test fixtures
test_workspace/
test_workspace/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
.pytest_cache/
.mypy_cache/
.ruff_cache/
.venv/
venv/
ENV/
45 changes: 23 additions & 22 deletions Cargo.toml
@@ -1,19 +1,17 @@
[package]
name = "vectorless"
[workspace]
members = ["rust", "python"]
resolver = "2"

[workspace.package]
version = "0.1.19"
edition = "2024"
authors = ["zTgx <beautifularea@gmail.com>"]
description = "Hierarchical, reasoning-native document intelligence engine"
license = "Apache-2.0"
repository = "https://github.com/vectorlessflow/vectorless"
homepage = "https://vectorless.dev"
documentation = "https://docs.rs/vectorless"
keywords = ["rag", "document", "retrieval", "indexing", "llm"]
categories = ["text-processing", "data-structures", "algorithms"]
readme = "README.md"
exclude = ["samples/", "docs/", ".*"]

[dependencies]
[workspace.dependencies]
# Async runtime
tokio = { version = "1", features = ["full"] }
async-trait = "0.1"
@@ -26,7 +24,7 @@ toml = "0.8"

# Error handling
thiserror = "2"
anyhow = { version = "1", optional = true }
anyhow = "1"

# OpenAI-compatible API client
async-openai = { version = "0.34", features = ["chat-completion"] }
@@ -62,18 +60,17 @@ lru = "0.12"
# Checksum
sha2 = "0.10"

# BLAKE2b hashing for fingerprints
# BLAKE2b hashing
blake2 = "0.10"
base64 = "0.22"

# Synchronization primitives (for memo store)
# Synchronization primitives
parking_lot = "0.12"

# Compression
flate2 = "1.0"

# File locking (Unix)
[target.'cfg(unix)'.dependencies]
libc = "0.2"

# PDF processing
@@ -84,7 +81,7 @@ lopdf = "0.34"
zip = "2.2"
roxmltree = "0.20"

# Random number generation (for sampling)
# Random number generation
rand = "0.8"

# BM25 scoring
@@ -93,11 +90,23 @@ bm25 = { version = "2.3.2", features = ["parallelism"] }
# HTML parsing
scraper = "0.22"

[dev-dependencies]
# Python bindings
pyo3 = { version = "0.22", features = ["extension-module"] }

# Dev dependencies
tempfile = "3.10"
tokio-test = "0.4"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

[workspace.lints.rust]
missing_docs = "warn"
unsafe_code = "warn"

[workspace.lints.clippy]
all = "warn"
pedantic = "warn"

# Profile settings (must be at root level, not under workspace)
[profile.release]
opt-level = 3
lto = "thin"
@@ -115,11 +124,3 @@ debug = true

[profile.release.package."*"]
opt-level = 3

[lints.rust]
missing_docs = "warn"
unsafe_code = "warn"

[lints.clippy]
all = "warn"
pedantic = "warn"
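
Note on the workspace migration above: once dependencies, package metadata, and lints live under `[workspace.*]`, each member crate has to opt in to them explicitly. The member manifests are not shown in this section, so the fragment below is only a sketch of what `rust/Cargo.toml` might look like. It assumes the crate keeps the name `vectorless`, and the choice of which dependencies it inherits is a guess for illustration.

```toml
# rust/Cargo.toml (hypothetical; the real member manifest is not part of this diff)
[package]
name = "vectorless"            # assumed: carried over from the old root [package]
version.workspace = true       # inherits version = "0.1.19"
edition.workspace = true
license.workspace = true
repository.workspace = true

[dependencies]
tokio = { workspace = true }
thiserror = { workspace = true }
# `optional` cannot be set in [workspace.dependencies]; a member sets it locally.
anyhow = { workspace = true, optional = true }

# Target-specific tables are declared per member, not in the workspace root.
[target.'cfg(unix)'.dependencies]
libc = { workspace = true }

[dev-dependencies]
tempfile = { workspace = true }

# Without this table, [workspace.lints] has no effect on the member.
[lints]
workspace = true
```

This would explain two changes in the root manifest: `anyhow` lost its `optional = true` flag and `libc` moved out of a `cfg(unix)` table, since neither form is accepted in `[workspace.dependencies]`.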
58 changes: 41 additions & 17 deletions README.md
@@ -1,18 +1,21 @@
<div align="center">

![Vectorless](docs/design/logo-horizontal.svg)
<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/logo-horizontal.svg" alt="Vectorless">

[![PyPI](https://img.shields.io/pypi/v/vectorless.svg)](https://pypi.org/project/vectorless/)
[![Python](https://img.shields.io/pypi/pyversions/vectorless.svg)](https://pypi.org/project/vectorless/)
[![PyPI Downloads](https://static.pepy.tech/badge/vectorless/month)](https://pepy.tech/projects/vectorless)
[![Crates.io](https://img.shields.io/crates/v/vectorless.svg)](https://crates.io/crates/vectorless)
[![Downloads](https://img.shields.io/crates/d/vectorless.svg)](https://crates.io/crates/vectorless)
[![Documentation](https://docs.rs/vectorless/badge.svg)](https://docs.rs/vectorless)
[![Crates.io Downloads](https://img.shields.io/crates/d/vectorless.svg)](https://crates.io/crates/vectorless)
[![Docs](https://docs.rs/vectorless/badge.svg)](https://docs.rs/vectorless)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/rust-1.85%2B-orange.svg)](https://www.rust-lang.org/)

</div>

## What is Vectorless?

**Vectorless** is a Rust library for querying structured documents using natural language — without vector databases or embedding models.
**Vectorless** is a library for querying structured documents using natural language — without vector databases or embedding models. Core engine written in Rust, with Python bindings.

Instead of chunking documents into vectors, Vectorless preserves the document's tree structure and uses a **hybrid algorithm + LLM approach** to navigate it — like how a human reads a table of contents:

@@ -22,7 +25,7 @@ Instead of chunking documents into vectors, Vectorless preserves the document's

## How It Works

![How it works](docs/design/how-it-works.svg)
<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/how-it-works.svg" alt="How it works">

### 1. Index: Build a Navigable Tree

@@ -48,7 +51,7 @@ When you ask "How do I reset the device?":

## Traditional RAG vs Vectorless

![Traditional RAG vs Vectorless](docs/design/comparison.svg)
<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/comparison.svg" alt="Traditional RAG vs Vectorless">

| Aspect | Traditional RAG | Vectorless |
|--------|----------------|------------|
@@ -90,44 +93,65 @@ Source: Chapter 4 > Section 4.2 > Reset Procedure

## Quick Start

### Installation
<details open>
<summary><b>Python</b></summary>

```bash
pip install vectorless
```

```python
from vectorless import Engine, IndexContext

# Create engine (uses OPENAI_API_KEY env var)
engine = Engine(workspace="./data")

# Index a document
ctx = IndexContext.from_file("./report.pdf")
doc_id = engine.index(ctx)

# Query
result = engine.query(doc_id, "What is the total revenue?")
print(f"Answer: {result.content}")
```

</details>

<details>
<summary><b>Rust</b></summary>

```toml
[dependencies]
vectorless = "0.1"
```

### Configuration

```bash
cp vectorless.example.toml ./vectorless.toml
```

### Usage

```rust
use vectorless::Engine;

#[tokio::main]
async fn main() -> vectorless::Result<()> {
// Create client
let client = Engine::builder()
.with_workspace("./workspace")
.build()?;

// Index a document (PDF, Markdown, DOCX, HTML)
let doc_id = client.index("./document.pdf").await?;

// Query with natural language
let result = client.query(&doc_id, "What are the system requirements?").await?;
let result = client.query(&doc_id,
"What are the system requirements?").await?;

println!("Answer: {}", result.content);
println!("Source: {}", result.path); // e.g., "Chapter 2 > Section 2.1"
println!("Source: {}", result.path);

Ok(())
}
```

</details>

## Features

| Feature | Description |
@@ -142,7 +166,7 @@ async fn main() -> vectorless::Result<()> {

## Architecture

![Architecture](docs/design/architecture.svg)
<img src="https://raw.githubusercontent.com/vectorlessflow/vectorless/main/docs/design/architecture.svg" alt="Architecture">

### Core Components
