Add README

welf · welf · commit 0961bbf0cd4c · 2025-01-14T13:24:58.000+01:00
diff --git a/README.md b/README.md
@@ -0,0 +1,160 @@
+# Code Context Generator for LLMs
+
+![Build Status](https://img.shields.io/github/workflow/status/welf/code-context/CI)
+![License](https://img.shields.io/github/license/welf/code-context)
+
+A CLI tool that processes Rust source code to create a high-level context
+suitable for Large Language Models (LLMs). It strips non-essential
+information so that you can share large codebases with LLMs.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Features](#features)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Examples](#examples)
+- [FAQ](#faq)
+- [Contributing](#contributing)
+- [License](#license)
+
+## Overview
+
+When working with LLMs on large codebases, you have to balance providing
+enough context against staying within context window limits and optimizing for
+cost and performance. This tool processes Rust code to remove unnecessary
+implementation details while preserving the essential structure and interfaces.
+
+### Considerations
+
+- **Context Window Management**: By stripping down the code to its essential
+  structure, the tool helps fit more relevant information within the LLM's
+  context window, which is crucial for effective processing and understanding.
+- **Focus on Essentials**: The tool preserves the module structure, type
+  definitions, function signatures, and important comments, which are often
+  sufficient for understanding the overall architecture and design of the
+  project.
+- **Reduced Noise**: Removing implementation details and test code reduces
+  noise, allowing the LLM to focus on the high-level structure and relationships
+  within the codebase.
+- **Scalability**: This approach scales well with large projects, as it avoids
+  overwhelming the LLM with unnecessary details, making it easier to handle and
+  process large codebases.
+- **Incremental Sharing**: With the high-level context in place, small parts of
+  the full codebase can be shared as needed, so the LLM gets detailed
+  information when required without being overwhelmed by the entire codebase.
+
+## Features
+
+- **Removes**:
+  - Function bodies (with specific exceptions)
+  - Test functions (`#[test]`) and test modules (`#[cfg(test)]`)
+  - Doc comments and module-level documentation when the `--no-comments` flag is
+    used
+  - Implementation details of derived traits
+- **Preserves**:
+  - Module structure and imports
+  - Type definitions (structs, enums, traits)
+  - Function signatures and interfaces
+  - Non-test attributes (e.g., `#[derive]`)
+  - Doc comments and module-level documentation (unless `--no-comments` is
+    specified)
+  - Function bodies for:
+    - String-like return types (`String`, `&str`, `Cow<str>`)
+    - `Result<T, E>` where `T` is string-like
+    - `Option<T>` where `T` is string-like
+    - Custom `Serialize` trait implementations
+  - Special trait method annotations:
+    - `/// This is a required method` for required trait methods
+    - `/// There is a default implementation` for methods with default
+      implementations
+  - File paths relative to the `src` directory that contains `main.rs` or
+    `lib.rs`, when the `--single-file` flag is used
+
+## Installation
+
+```bash
+# Clone the repository
+git clone https://github.com/welf/code-context.git
+cd code-context
+
+# Build the project
+cargo build --release
+```
+
+## Usage
+
+```bash
+# Basic usage
+code-context <input_path>
+
+# With options
+code-context <input_path> --output-dir <suffix_for_output_dir_name> --no-comments --stats --dry-run --single-file
+```
+
+### Command Line Options
+
+```
+Options:
+  -o, --output-dir <NAME>  Output directory name [default: code-context]
+      --no-comments        Remove all comments (including doc comments)
+      --stats              Show processing statistics
+      --dry-run            Run without writing output files
+      --single-file        Output all files into a single combined file
+  -h, --help               Print help
+  -V, --version            Print version
+```
+
+## Examples
+
+Generated output files can be found in the
+[`src-code-context`](./src-code-context/) and
+[`src-custom-suffix`](./src-custom-suffix/) directories.
+
+- The file
+  [`src-code-context/code_context.rs.txt`](./src-code-context/code_context.rs.txt)
+  was generated by passing the path to the `src` directory of this repo with the
+  `--single-file` flag.
+- Files in the [`src-custom-suffix`](./src-custom-suffix/) directory were
+  generated by passing the path to the `src` directory with the
+  `--output-dir custom-suffix` option.
+
+In both cases, the size reduction is 85.8% (from 37416 bytes to 5330 bytes).
+
+### Before and After Example
+
+**Before:**
+
+```rust
+fn add(a: i32, b: i32) -> i32 {
+    a + b
+}
+```
+
+**After:**
+
+```rust
+fn add(a: i32, b: i32) -> i32 {}
+```
+
+## FAQ
+
+**Q: What types of files does this tool process?**\
+A: The tool processes files with the `.rs` extension only. It does not process
+files with `.toml`, `.json`, or other extensions.
+
+**Q: Can I run the tool without writing output files?**\
+A: Yes, use the `--dry-run` flag to run the tool without writing output files.
+
+**Q: Why do the output files have the `.rs.txt` extension? Why not generate
+`.rs` files?**\
+A: If the tool generated `.rs` files, `rust-analyzer` would report a lot of
+errors for them. To avoid this, the tool generates `.rs.txt` files.
+
+## Contributing
+
+Contributions are welcome! Please feel free to submit a Pull Request.
+
+## License
+
+MIT License