docs: add Indexing section to PERFORMANCE.md
[skip ci]
jqnatividad committed Sep 29, 2024
1 parent d33815e commit 804145a
23 changes: 23 additions & 0 deletions docs/PERFORMANCE.md
@@ -1,5 +1,28 @@
# Performance Tuning

## Index! Index! Index!

Indexing your CSV files is a game-changer for performance. Here's why:

1. **Faster Slicing**: `slice` uses the index to jump directly to the requested rows instead of scanning the file from the beginning.

2. **Instant Row Counts**: `count` returns the total number of rows instantly.

3. **Parallel Processing**: Indexing enables multithreading, dramatically speeding up supported commands like `stats`, `frequency`, `sample`, `split` and `tojsonl`.

4. **Random Access**: The `luau` command gains random access capabilities.

5. **Low Overhead**: Creating an index is fast and efficient, even for large files.
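Points 1, 2 and 4 rest on the same idea: an index records where each row starts, so a tool can seek straight to a row (or read off the row count) instead of parsing the whole file. The toy bash sketch below illustrates that idea; it is not qsv's implementation. qsv's real index is a separate binary `.idx` file, whereas this sketch writes plain-text byte offsets to a hypothetical `<file>.toyidx`, and it assumes no quoted field contains an embedded newline. The helper names `build_toy_index`, `toy_row_count` and `read_row` are made up for illustration.

```bash
#!/usr/bin/env bash
set -eu  # no pipefail: `head` closing the pipe early is expected in read_row

# Record the 0-based byte offset where each data row (after the header) starts.
# Toy assumption: one CSV record per physical line.
build_toy_index() {
  local csv="$1"
  # LC_ALL=C makes awk's length() count bytes, not characters.
  LC_ALL=C awk '{ if (NR > 1) print offset; offset += length($0) + 1 }' \
    "$csv" > "$csv.toyidx"
}

# Row count = number of index entries; the CSV itself is not parsed.
toy_row_count() {
  awk 'END { print NR }' "$1.toyidx"
}

# Print data row N (1-based) by jumping to its recorded byte offset.
read_row() {
  local csv="$1" n="$2" offset
  # A binary index with fixed-width entries lets you compute where entry N
  # lives; this plain-text toy version simply reads line N.
  offset=$(sed -n "${n}p" "$csv.toyidx")
  # tail -c +K starts output at byte K (1-based), hence offset + 1.
  tail -c +"$((offset + 1))" "$csv" | head -n 1
}

tmp=$(mktemp -d)
printf 'id,name\n1,alpha\n2,beta\n3,gamma\n' > "$tmp/demo.csv"
build_toy_index "$tmp/demo.csv"
toy_row_count "$tmp/demo.csv"   # prints 3
read_row "$tmp/demo.csv" 2      # prints 2,beta
```

Note that `tail` starts reading the CSV at the recorded offset rather than at the top of the file, which is the effect an index gives `slice` and `luau`.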

To enable automatic indexing, set the `QSV_AUTOINDEX_SIZE` environment variable to the minimum file size (in bytes) that should trigger auto-indexing:

```bash
# automatically create an index for files larger than 10MB
export QSV_AUTOINDEX_SIZE=10000000
```
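Setting the variable amounts to a file-size check. The sketch below mimics that decision in bash so you can see whether a given file would cross your threshold. It is an illustration, not qsv's source: the helper `would_autoindex` is hypothetical, and treating the threshold as inclusive (a file of exactly `QSV_AUTOINDEX_SIZE` bytes qualifies) is an assumption based on the wording "minimum file size".

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: succeed if FILE is at least QSV_AUTOINDEX_SIZE bytes.
# Assumptions: the threshold is inclusive, and an unset or empty variable
# means auto-indexing is off.
would_autoindex() {
  local file="$1" threshold="${QSV_AUTOINDEX_SIZE:-}" size
  [ -n "$threshold" ] || return 1
  # $(( )) strips the leading spaces some wc implementations print.
  size=$(( $(wc -c < "$file") ))
  [ "$size" -ge "$threshold" ]
}

export QSV_AUTOINDEX_SIZE=10000000   # 10MB threshold, as in the example
tmp=$(mktemp -d)
head -c 1000 /dev/zero > "$tmp/small.csv"   # a 1,000-byte test file
would_autoindex "$tmp/small.csv" && echo indexed || echo skipped   # prints skipped
```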

## CPU Optimization

Modern CPUs have various features that the Rust compiler can take advantage