|
| 1 | +# Code Context Generator for LLMs |
| 2 | + |
| 3 | + |
| 4 | + |
| 5 | + |
| 6 | +A CLI tool designed to process Rust source code, creating a high-level context |
| 7 | +suitable for Large Language Models (LLMs). It strips non-essential |
| 8 | +information so that you can share large codebases with LLMs. |
| 9 | + |
| 10 | +## Table of Contents |
| 11 | + |
| 12 | +- [Overview](#overview) |
| 13 | +- [Features](#features) |
| 14 | +- [Installation](#installation) |
| 15 | +- [Usage](#usage) |
| 16 | +- [Examples](#examples) |
| 17 | +- [FAQ](#faq) |
| 18 | +- [Contributing](#contributing) |
| 19 | +- [License](#license) |
| 20 | + |
| 21 | +## Overview |
| 22 | + |
| 23 | +When working with LLMs on large codebases, it's crucial to balance providing |
| 24 | +enough context against staying within context window limits and optimizing for |
| 25 | +cost and performance. This tool processes Rust code to remove unnecessary |
| 26 | +implementation details while preserving the essential structure and interfaces. |
| 27 | + |
| 28 | +### Considerations |
| 29 | + |
| 30 | +- **Context Window Management**: By stripping down the code to its essential |
| 31 | + structure, the tool helps fit more relevant information within the LLM's |
| 32 | + context window, which is crucial for effective processing and understanding. |
| 33 | +- **Focus on Essentials**: The tool preserves the module structure, type |
| 34 | + definitions, function signatures, and important comments, which are often |
| 35 | + sufficient for understanding the overall architecture and design of the |
| 36 | + project. |
| 37 | +- **Reduced Noise**: Removing implementation details and test code reduces |
| 38 | + noise, allowing the LLM to focus on the high-level structure and relationships |
| 39 | + within the codebase. |
| 40 | +- **Scalability**: This approach scales well with large projects, as it avoids |
| 41 | + overwhelming the LLM with unnecessary details, making it easier to handle and |
| 42 | + process large codebases. |
| 43 | +- **Incremental Sharing**: The tool's approach of sharing small parts of the |
| 44 | + codebase as needed ensures that the LLM has access to detailed information |
| 45 | + when required, without overwhelming it with the entire codebase. |
| 46 | + |
| 47 | +## Features |
| 48 | + |
| 49 | +- **Removes**: |
| 50 | + - Function bodies (with specific exceptions) |
| 51 | + - Test functions (`#[test]`) and test modules (`#[cfg(test)]`) |
| 52 | + - Doc comments and module-level documentation when the `--no-comments` flag is |
| 53 | + used |
| 54 | + - Implementation details of derived traits |
| 55 | +- **Preserves**: |
| 56 | + - Module structure and imports |
| 57 | + - Type definitions (structs, enums, traits) |
| 58 | + - Function signatures and interfaces |
| 59 | + - Non-test attributes (e.g., `#[derive]`) |
| 60 | + - Doc comments and module-level documentation (unless `--no-comments` is |
| 61 | + specified) |
| 62 | + - Function bodies for: |
| 63 | + - String-like return types (`String`, `&str`, `Cow<str>`) |
| 64 | + - `Result<T, E>` where `T` is string-like |
| 65 | + - `Option<T>` where `T` is string-like |
| 66 | + - Custom `Serialize` trait implementations |
| 67 | + - Special trait method annotations: |
| 68 | + - `/// This is a required method` for required trait methods |
| 69 | + - `/// There is a default implementation` for methods with default |
| 70 | + implementations |
| 71 | + - File paths relative to the `src` directory containing `main.rs` or `lib.rs`, |
| 72 | + when the `--single-file` flag is used |
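The rule above for keeping bodies of functions with string-like return types can be sketched as a small predicate. This is an illustration only, not the tool's actual implementation: the names `keeps_body`, `is_string_like`, `generic_args`, and `first_arg` are hypothetical, and the sketch matches the textual form of a return type rather than a parsed syntax tree, so path-qualified types such as `std::borrow::Cow<str>` are not recognized.

```rust
/// Hypothetical helper: returns the text between `name<` and the final `>`,
/// e.g. `generic_args("Option<&str>", "Option")` yields `Some("&str")`.
fn generic_args<'a>(ty: &'a str, name: &str) -> Option<&'a str> {
    ty.trim().strip_prefix(name)?.strip_prefix('<')?.strip_suffix('>')
}

/// Hypothetical helper: returns the first generic argument, splitting only on
/// a comma that is not nested inside another pair of angle brackets.
fn first_arg(args: &str) -> &str {
    let mut depth = 0usize;
    for (i, c) in args.char_indices() {
        match c {
            '<' => depth += 1,
            '>' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => return args[..i].trim(),
            _ => {}
        }
    }
    args.trim()
}

/// True for `String`, `&str` (with an optional lifetime), and `Cow<str>`.
fn is_string_like(ty: &str) -> bool {
    let ty = ty.trim();
    if ty == "String" {
        return true;
    }
    if let Some(rest) = ty.strip_prefix('&') {
        let rest = rest.trim_start();
        // Skip an optional lifetime such as `'a` or `'static`.
        let rest = if rest.starts_with('\'') {
            match rest.split_once(char::is_whitespace) {
                Some((_, tail)) => tail.trim_start(),
                None => return false,
            }
        } else {
            rest
        };
        return rest == "str";
    }
    if let Some(inner) = generic_args(ty, "Cow") {
        // The last argument of `Cow<'a, str>` is the borrowed type.
        return inner.rsplit(',').next().map(str::trim) == Some("str");
    }
    false
}

/// Assumed rule, following the list above: keep the body when the return type
/// is string-like, or is `Result<T, E>` / `Option<T>` with a string-like `T`.
fn keeps_body(return_type: &str) -> bool {
    if is_string_like(return_type) {
        return true;
    }
    if let Some(args) = generic_args(return_type, "Result") {
        return is_string_like(first_arg(args));
    }
    if let Some(args) = generic_args(return_type, "Option") {
        return is_string_like(args);
    }
    false
}
```

Under these assumptions, `keeps_body("Result<String, io::Error>")` is `true`, while `keeps_body("i32")` and `keeps_body("Result<i32, String>")` are `false`.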
| 73 | + |
| 74 | +## Installation |
| 75 | + |
| 76 | +```bash |
| 77 | +# Clone the repository |
| 78 | +git clone https://github.com/yourusername/code-context.git |
| 79 | +cd code-context |
| 80 | + |
| 81 | +# Build the project (the binary is written to target/release/) |
| 82 | +cargo build --release |
| 83 | +``` |
| 84 | + |
| 85 | +## Usage |
| 86 | + |
| 87 | +```bash |
| 88 | +# Basic usage |
| 89 | +code-context <input_path> |
| 90 | + |
| 91 | +# With options |
| 92 | +code-context <input_path> --output-dir <suffix_for_output_dir_name> --no-comments --stats --dry-run --single-file |
| 93 | +``` |
| 94 | + |
| 95 | +### Command Line Options |
| 96 | + |
| 97 | +``` |
| 98 | +Options: |
| 99 | + -o, --output-dir <NAME> Output directory name [default: code-context] |
| 100 | + --no-comments Remove all comments (including doc comments) |
| 101 | + --stats Show processing statistics |
| 102 | + --dry-run Run without writing output files |
| 103 | + --single-file Output all files into a single combined file |
| 104 | + -h, --help Print help |
| 105 | + -V, --version Print version |
| 106 | +``` |
| 107 | + |
| 108 | +## Examples |
| 109 | + |
| 110 | +Generated output files can be found in the |
| 111 | +[`src-code-context`](./src-code-context/) and |
| 112 | +[`src-custom-suffix`](./src-custom-suffix/) directories. |
| 113 | + |
| 114 | +- The file |
| 115 | + [`src-code-context/code_context.rs.txt`](./src-code-context/code_context.rs.txt) |
| 116 | + was generated by passing the path to the `src` directory of this repo with the |
| 117 | + `--single-file` flag. |
| 118 | +- Files in the [`src-custom-suffix`](./src-custom-suffix/) directory were |
| 119 | + generated by passing the path to the `src` directory with the |
| 120 | + `--output-dir custom-suffix` flag. |
| 121 | + |
| 122 | +In both cases, the size reduction is 85.8% (from 37416 bytes to 5330 bytes). |
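The percentage above follows from the two byte counts. A minimal sketch of the arithmetic, using a hypothetical helper `reduction_percent` that is not part of the tool:

```rust
/// Percentage by which `after` is smaller than `before` (both in bytes).
/// Returns 0.0 for an empty input to avoid dividing by zero.
fn reduction_percent(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (before as f64 - after as f64) / before as f64 * 100.0
}
```

With the figures quoted above, `format!("{:.1}", reduction_percent(37416, 5330))` gives `85.8`.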
| 123 | + |
| 124 | +### Before and After Example |
| 125 | + |
| 126 | +**Before:** |
| 127 | + |
| 128 | +```rust |
| 129 | +fn add(a: i32, b: i32) -> i32 { |
| 130 | + a + b |
| 131 | +} |
| 132 | +``` |
| 133 | + |
| 134 | +**After:** |
| 135 | + |
| 136 | +```rust |
| 137 | +fn add(a: i32, b: i32) -> i32 {} |
| 138 | +``` |
| 139 | + |
| 140 | +## FAQ |
| 141 | + |
| 142 | +**Q: What types of files does this tool process?**\ |
| 143 | +A: The tool processes files with the `.rs` extension only. It does not process |
| 144 | +files with `.toml`, `.json`, or other extensions. |
| 145 | + |
| 146 | +**Q: Can I run the tool without writing output files?**\ |
| 147 | +A: Yes, use the `--dry-run` flag to run the tool without writing output files. |
| 148 | + |
| 149 | +**Q: Why do output files have the `.rs.txt` extension instead of `.rs`?**\ |
| 150 | +A: The output is not valid Rust (for example, functions that return a value |
| 151 | +are left with empty bodies), so `rust-analyzer` would report many errors for |
| 152 | +`.rs` files. To avoid this, the tool generates `.rs.txt` files. |
| 153 | + |
| 154 | +## Contributing |
| 155 | + |
| 156 | +Contributions are welcome! Please feel free to submit a Pull Request. |
| 157 | + |
| 158 | +## License |
| 159 | + |
| 160 | +MIT License |