welf/code-context

Code Context Generator for LLMs

A CLI tool that processes Rust source code into a high-level context suitable for Large Language Models (LLMs). It strips non-essential information so that you can share large codebases with LLMs.

Overview

When working with LLMs on large codebases, you have to provide enough context while staying within context window limits and keeping cost and performance in check. This tool processes Rust code to remove unnecessary implementation details while preserving the essential structure and interfaces.

Considerations

  • Context Window Management: By stripping down the code to its essential structure, the tool helps fit more relevant information within the LLM's context window, which is crucial for effective processing and understanding.
  • Focus on Essentials: The tool preserves the module structure, type definitions, function signatures, and important comments, which are often sufficient for understanding the overall architecture and design of the project.
  • Reduced Noise: Removing implementation details and test code reduces noise, allowing the LLM to focus on the high-level structure and relationships within the codebase.
  • Scalability: This approach scales well with large projects, as it avoids overwhelming the LLM with unnecessary details, making it easier to handle and process large codebases.
  • Incremental Sharing: With the high-level context in place, you can share the full source of small parts of the codebase as needed, so the LLM gets detailed information where it is required without being overwhelmed by the entire codebase.

Features

  • Removes:
    • Function bodies (with specific exceptions)
    • Test functions (#[test]) and test modules (#[cfg(test)])
    • Doc comments and module-level documentation when the --no-comments flag is used
    • Implementation details of derived traits
  • Preserves:
    • Module structure and imports
    • Type definitions (structs, enums, traits)
    • Function signatures and interfaces
    • Non-test attributes (e.g., #[derive])
    • Doc comments and module-level documentation (unless --no-comments is specified)
    • Function bodies for:
      • String-like return types (String, &str, Cow<str>)
      • Result<T, E> where T is string-like
      • Option<T> where T is string-like
      • Custom Serialize trait implementations
    • Special trait method annotations:
      • /// This is a required method for required trait methods
      • /// There is a default implementation for methods with default implementations
    • File paths, relative to the src directory that contains main.rs or lib.rs, when the --single-file flag is used
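
The rule for which function bodies are preserved can be read as a small predicate on the return type. The sketch below is only an illustration of that rule as listed above, not the tool's actual implementation. The helper names (keeps_body, is_base_string_like, generic_args, split_top_level) are hypothetical, and the sketch works on the return type as written in the source, so it does not resolve type aliases or fully qualified paths such as std::borrow::Cow.

```rust
/// Splits generic arguments on commas that are not nested inside `<...>`.
fn split_top_level(args: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut depth = 0usize;
    let mut start = 0;
    for (i, c) in args.char_indices() {
        match c {
            '<' => depth += 1,
            '>' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(&args[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&args[start..]);
    parts
}

/// Returns the text between `<` and `>` if `ty` is written as `name<...>`.
fn generic_args<'a>(ty: &'a str, name: &str) -> Option<&'a str> {
    ty.trim()
        .strip_prefix(name)?
        .trim_start()
        .strip_prefix('<')?
        .strip_suffix('>')
}

/// True for `String`, `&str` (with or without a lifetime) and `Cow<str>`.
fn is_base_string_like(ty: &str) -> bool {
    let ty = ty.trim();
    if ty == "String" {
        return true;
    }
    if let Some(rest) = ty.strip_prefix('&') {
        let rest = rest.trim_start();
        // Skip an optional lifetime such as `'a` or `'static`.
        let rest = if rest.starts_with('\'') {
            match rest.split_once(char::is_whitespace) {
                Some((_, tail)) => tail.trim_start(),
                None => return false,
            }
        } else {
            rest
        };
        return rest == "str";
    }
    if let Some(inner) = generic_args(ty, "Cow") {
        // `Cow<str>` or `Cow<'a, str>`: the last argument is the borrowed type.
        return split_top_level(inner)
            .last()
            .map(|t| t.trim() == "str")
            .unwrap_or(false);
    }
    false
}

/// Hypothetical predicate: would a function with this return type keep its body?
/// Assumption: only a directly wrapped string-like `T` counts for Option/Result.
fn keeps_body(return_type: &str) -> bool {
    if is_base_string_like(return_type) {
        return true;
    }
    for wrapper in ["Option", "Result"] {
        if let Some(inner) = generic_args(return_type, wrapper) {
            return split_top_level(inner)
                .first()
                .map(|t| is_base_string_like(t))
                .unwrap_or(false);
        }
    }
    false
}
```

Under these assumptions, keeps_body("Result<String, io::Error>") is true, while keeps_body("Result<(), String>") is false because only the success type T is inspected.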

Installation

# Clone the repository
git clone https://github.com/welf/code-context.git
cd code-context

# Build the project
cargo build --release

Usage

# Basic usage
code-context <input_path>

# With options
code-context <input_path> --output-dir <suffix_for_output_dir_name> --no-comments --stats --dry-run --single-file

Command Line Options

Options:
  -o, --output-dir <NAME>  Output directory name [default: code-context]
      --no-comments        Remove all comments (including doc comments)
      --stats              Show processing statistics
      --dry-run            Run without writing output files
      --single-file        Output all files into a single combined file
  -h, --help               Print help
  -V, --version            Print version
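
The directory names mentioned in the Examples section (src-code-context and src-custom-suffix) suggest that the --output-dir value is appended to the input directory name as a suffix. A minimal sketch of that naming scheme follows. The function output_dir_name is hypothetical, and the real tool may build the name differently or place the directory elsewhere.

```rust
use std::path::Path;

/// Hypothetical helper: joins the input directory name and the
/// `--output-dir` value with a hyphen, e.g. `src` + `code-context`.
/// Returns None if the input path has no final component or is not UTF-8.
fn output_dir_name(input: &Path, suffix: &str) -> Option<String> {
    let base = input.file_name()?.to_str()?;
    Some(format!("{}-{}", base, suffix))
}
```

With the default value, output_dir_name(Path::new("src"), "code-context") gives Some("src-code-context"), which matches the directory name in the Examples section.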

Examples

Generated output files can be found in the src-code-context and src-custom-suffix directories.

  • The file src-code-context/code_context.rs.txt was generated by passing the path to the src directory of this repo with the --single-file flag.
  • Files in the src-custom-suffix directory were generated by passing the path to the src directory with the --output-dir custom-suffix flag.

In both cases, the size reduction is 85.8% (from 37416 bytes to 5330 bytes).
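
The percentage follows directly from the two byte counts. As a quick check, here is a small sketch of the arithmetic. The function name reduction_percent is only illustrative and is not part of the tool.

```rust
/// Percentage by which `after` is smaller than `before`.
/// Returns None when `before` is zero, since the ratio is undefined.
fn reduction_percent(before: u64, after: u64) -> Option<f64> {
    if before == 0 {
        return None;
    }
    Some((1.0 - after as f64 / before as f64) * 100.0)
}
```

For the numbers above, reduction_percent(37416, 5330) formatted with one decimal place gives 85.8.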

Before and After Example

Before:

fn add(a: i32, b: i32) -> i32 {
    a + b
}

After:

fn add(a: i32, b: i32) -> i32 {}
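
To relate this example to the rules in the Features section, here is a slightly larger hypothetical input file. The comments restate what the Features list says should happen to each item. They are a reading of that list, not verified output of the tool, and all names in the file (Circle, Shape, label) are made up for illustration.

```rust
/// A circle. Doc comments are listed as preserved unless --no-comments is passed.
#[derive(Debug, Clone, PartialEq)] // non-test attribute: listed as preserved
pub struct Circle {
    pub radius: f64,
}

pub trait Shape {
    // Required method: listed as annotated with "This is a required method".
    fn area(&self) -> f64;

    // Default implementation: listed as annotated with
    // "There is a default implementation".
    fn name(&self) -> String {
        String::from("shape")
    }
}

impl Shape for Circle {
    // Returns f64, which is not string-like: body listed as removed.
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

impl Circle {
    // Returns String, which is string-like: body listed as preserved.
    pub fn label(&self) -> String {
        format!("circle(r = {})", self.radius)
    }
}

// Test module: listed as removed, together with the #[test] function inside it.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn label_mentions_radius() {
        assert_eq!(Circle { radius: 1.5 }.label(), "circle(r = 1.5)");
    }
}
```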

FAQ

Q: What types of files does this tool process?
A: The tool processes files with the .rs extension only. It does not process files with .toml, .json, or other extensions.

Q: Can I run the tool without writing output files?
A: Yes, use the --dry-run flag to run the tool without writing output files.

Q: Why do output files have the .rs.txt extension? Why not generate .rs files?
A: If the tool generated .rs files, rust-analyzer would report a lot of compilation errors for them. To avoid this, the tool generates .rs.txt files.
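
The two file-related answers above (only .rs files are processed, and outputs end in .rs.txt) can be sketched as a pair of small helpers. Both function names are hypothetical, and the per-file output name shown here (original file name plus .txt) is an assumption based on the .rs.txt extension mentioned above.

```rust
use std::path::Path;

/// True for paths whose extension is exactly `rs`.
fn is_rust_source(path: &Path) -> bool {
    path.extension().and_then(|ext| ext.to_str()) == Some("rs")
}

/// Hypothetical output name: the original file name with `.txt` appended,
/// so that editors and rust-analyzer do not treat the result as Rust code.
fn output_file_name(path: &Path) -> Option<String> {
    if !is_rust_source(path) {
        return None;
    }
    let name = path.file_name()?.to_str()?;
    Some(format!("{}.txt", name))
}
```

Under these assumptions, src/lib.rs maps to lib.rs.txt, while Cargo.toml is skipped.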

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License
