# Probe Project Guidelines
This file provides guidelines for AI assistants and developers working on the probe project.
## Project Overview
Probe is a tool for searching code repositories with powerful filtering and ranking capabilities. It provides:
- Regex-based code search with stemming and stopword removal
- Language-aware code block extraction
- Frequency-based search for better relevance
- Flexible term matching with boundary control
- Result ranking using TF-IDF and BM25 algorithms
## Project Structure
```
probe/
├── src/ # Main source code
│ ├── language/ # Language-specific parsing
│ ├── search/ # Core search functionality
│ │ ├── file_processing.rs # File content processing
│ │ ├── file_search.rs # File searching logic
│ │ ├── query.rs # Query processing
│ │ ├── result_ranking.rs # Result ranking algorithms
│ │ └── search_execution.rs # Main search execution
│ ├── agent.rs # AI agent implementation
│ ├── lib.rs # Library entry point
│ └── main.rs # CLI application entry point
├── tests/ # Integration tests
│ ├── mocks/ # Mock files for testing
│ └── ... # Various test files
└── mcp/ # MCP server for probe
```
## Code Style Guide
### Rust Conventions
1. **Naming**:
- Use `snake_case` for variables, functions, and modules
- Use `UpperCamelCase` for types and traits
- Use `SCREAMING_SNAKE_CASE` for constants
2. **Documentation**:
- All public functions, structs, and modules should have doc comments
- Use `///` for doc comments
- Include examples where appropriate
3. **Error Handling**:
- Use `Result<T, E>` for functions that can fail
- Prefer `?` operator for error propagation
- Use `anyhow::Result` for functions that can return multiple error types
4. **Formatting**:
- Follow standard Rust formatting (rustfmt)
- Use 4 spaces for indentation
- Maximum line length of 100 characters
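The conventions above can be sketched together in one small function. The names `MAX_RESULTS` and `parse_limit` are hypothetical and not part of probe, and `Box<dyn Error>` stands in for `anyhow::Result` only to keep the sketch free of external crates.

```rust
use std::error::Error;

/// Upper bound on the number of results (hypothetical constant, for illustration only).
const MAX_RESULTS: usize = 1000;

/// Parses a result limit from user input.
///
/// Returns an error if the input is not a number or exceeds `MAX_RESULTS`.
/// In the project itself, `anyhow::Result<usize>` would replace the boxed error.
fn parse_limit(input: &str) -> Result<usize, Box<dyn Error>> {
    // `?` propagates the `ParseIntError` to the caller.
    let limit: usize = input.trim().parse()?;
    if limit > MAX_RESULTS {
        return Err(format!("limit {} exceeds {}", limit, MAX_RESULTS).into());
    }
    Ok(limit)
}
```

Calling `parse_limit("25")` yields `Ok(25)`, while non-numeric input or a value above the constant yields an `Err`.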
### Project-Specific Conventions
1. **Pattern Generation**:
- When generating regex patterns, use `HashSet` to avoid duplicates
- For term patterns, generate three variants: start boundary, end boundary, and no boundary
- For multi-term queries, generate concatenated combinations
2. **File Processing**:
- Use AST parsing when possible for more accurate code block extraction
- Fall back to line-based context when AST parsing fails
- Apply the 80% rule: if more than 80% of a file is matched, return the entire file
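The pattern-generation convention above can be illustrated with a minimal sketch. It assumes that "concatenated combinations" means every ordered pair of distinct terms; the helper names `escape_term` and `sketch_term_patterns` are hypothetical, and the actual `create_term_patterns` may work differently.

```rust
use std::collections::HashSet;

/// Escapes common regex metacharacters in `term`
/// (a minimal stand-in for a library escape function).
fn escape_term(term: &str) -> String {
    let mut escaped = String::with_capacity(term.len());
    for ch in term.chars() {
        if r"\.+*?()|[]{}^$".contains(ch) {
            escaped.push('\\');
        }
        escaped.push(ch);
    }
    escaped
}

/// Builds the three boundary variants for each term, plus pairwise concatenations.
/// The `HashSet` silently drops duplicate patterns.
fn sketch_term_patterns(terms: &[&str]) -> HashSet<String> {
    let mut patterns = HashSet::new();
    for term in terms {
        let t = escape_term(term);
        patterns.insert(format!(r"\b{}", t)); // start boundary
        patterns.insert(format!(r"{}\b", t)); // end boundary
        patterns.insert(t); // no boundary
    }
    // Concatenate every ordered pair of terms at different positions,
    // e.g. "rate" + "limit" -> "ratelimit".
    for (i, a) in terms.iter().enumerate() {
        for (j, b) in terms.iter().enumerate() {
            if i != j {
                patterns.insert(format!("{}{}", escape_term(a), escape_term(b)));
            }
        }
    }
    patterns
}
```

For the terms `rate` and `limit`, this produces six boundary variants plus `ratelimit` and `limitrate`.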
## Testing Approach
### Test Organization
1. **Unit Tests**:
- Place unit tests in the same file as the code they test
- Use a `#[cfg(test)]` module at the end of each file
- For complex modules, use a separate `*_tests.rs` file in the same directory
- Include the tests using `include!("module_tests.rs")` in a `#[cfg(test)]` module
2. **Integration Tests**:
- Place integration tests in the `tests/` directory
- Use descriptive names for test files
- Group related tests in the same file
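As a rough illustration of the unit-test layout described above, the sketch below keeps the tests in a `#[cfg(test)]` module at the end of the file. The function `split_terms` and the file name `query_tests.rs` are hypothetical and exist only for this example.

```rust
/// Splits a query into lowercase terms (hypothetical function, for illustration only).
pub fn split_terms(query: &str) -> Vec<String> {
    query
        .split_whitespace()
        .map(|term| term.to_lowercase())
        .collect()
}

// Unit tests live at the end of the same file and compile only under `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn splits_on_whitespace_and_lowercases() {
        assert_eq!(split_terms("Foo  BAR"), vec!["foo", "bar"]);
    }

    #[test]
    fn empty_query_yields_no_terms() {
        assert!(split_terms("   ").is_empty());
    }

    // For a larger module, the test bodies could instead live in a sibling file:
    // include!("query_tests.rs");
}
```

Because `include!` pastes the file's contents into the module, the included tests can still reach private items through `use super::*`.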
### Running Tests
1. **Unit Tests**:
```bash
cargo test --lib
```
2. **Integration Tests**:
```bash
cargo test --test integration_tests
```
3. **All Tests**:
```bash
cargo test
```
4. **Specific Tests**:
```bash
cargo test test_name
```
### Test Coverage
- Aim for high test coverage, especially for core functionality
- Include tests for edge cases and error conditions
- Use property-based testing for functions with complex input domains
## Common Commands
### Building
```bash
cargo build
```
### Running
```bash
cargo run -- search "query" path/to/search
```
### Debug Mode
Enable debug mode to see detailed logging:
```bash
DEBUG=1 cargo run -- search "query" path/to/search
```
### MCP Server
Start the MCP server:
```bash
cd mcp && npm run build && node build/index.js
```
## Dependency Management
1. **Adding Dependencies**:
- Add new dependencies to `Cargo.toml`
- Prefer well-maintained crates with good documentation
- Consider the impact on build time and binary size
2. **Versioning**:
- Use semantic versioning for dependencies
- Specify version constraints to avoid breaking changes
## File Organization
1. **Module Structure**:
- Use `mod.rs` files for module organization
- Group related functionality in the same module
- Export public items from the module root
2. **Code Organization**:
- Place related functions and types together
- Use private helper functions for complex logic
- Keep functions focused on a single responsibility
## Making Changes
When making changes to the codebase:
1. **Pattern Generation**:
   - When modifying `create_term_patterns`, ensure it handles:
     - Individual terms with flexible boundaries
     - Concatenated term combinations for multi-term queries
     - Proper regex escaping for special characters
2. **Search Execution**:
   - When modifying `perform_probe`, ensure it:
     - Properly maps patterns to terms
     - Handles both "any term" and "all terms" modes
     - Correctly processes filename matches
3. **Result Ranking**:
   - When modifying ranking algorithms, ensure they:
     - Properly calculate TF-IDF and BM25 scores
     - Handle edge cases (empty documents, rare terms)
     - Maintain backward compatibility with existing code
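For the ranking item above, the following is a minimal sketch of a per-term BM25 score with the listed edge cases guarded. The constants `K1 = 1.2` and `B = 0.75` are commonly used defaults rather than values confirmed from probe, and the function names are hypothetical.

```rust
/// Common BM25 defaults; these values are assumptions, not taken from probe.
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency, with +1 inside the logarithm so that
/// very common terms never receive a negative weight.
fn idf(total_docs: usize, docs_with_term: usize) -> f64 {
    let n = total_docs as f64;
    // Clamp so a bad count can never push the ratio below zero.
    let df = docs_with_term.min(total_docs) as f64;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

/// BM25 contribution of one term to one document.
fn bm25_term_score(tf: usize, doc_len: usize, avg_doc_len: f64, term_idf: f64) -> f64 {
    // Edge cases: a missing term, an empty document, or an empty corpus scores zero.
    if tf == 0 || doc_len == 0 || avg_doc_len <= 0.0 {
        return 0.0;
    }
    let tf = tf as f64;
    // Longer-than-average documents are penalized, shorter ones are boosted.
    let length_norm = 1.0 - B + B * (doc_len as f64 / avg_doc_len);
    term_idf * (tf * (K1 + 1.0)) / (tf + K1 * length_norm)
}
```

A document's score for a query is the sum of `bm25_term_score` over the query terms. Rare terms get a larger `idf`, and a term that occurs once in an average-length document contributes exactly its `idf`.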
## Debugging Tips
1. **Debug Mode**:
   - Set `DEBUG=1` to enable debug logging
   - Debug logs include detailed information about:
     - Pattern generation
     - File matching
     - Term frequencies
     - Result ranking
2. **Common Issues**:
   - If search returns no results, check:
     - Query preprocessing (stemming, stopwords)
     - Pattern generation
     - File matching logic
   - If ranking seems incorrect, check:
     - Term frequency calculation
     - Document length normalization
     - Score combination logic
## AI Agent
The `agent.rs` file implements an AI-powered assistant that can interact with the code search functionality.
### Agent Overview
- **Purpose**: Provides an interactive AI assistant that can search code repositories and answer questions about the codebase
- **Models**: Supports both Anthropic's Claude and OpenAI's GPT models
- **Features**:
- Interactive CLI interface
- Token usage tracking
- Conversation history management
- Tool integration with code search functionality
- Colored output formatting
### Components
1. **ProbeSearch Tool**:
- Implements the `Tool` trait from the `rig` crate
- Allows the AI to search code using the core probe functionality
- Configurable with various search parameters (pattern, path, exact matching, etc.)
- Returns formatted search results to the AI
2. **ModelType Enum**:
- Represents different LLM backends:
- `Anthropic`: Uses Claude models via Anthropic's API
- `OpenAI`: Uses GPT models via OpenAI's API
3. **ProbeAgent Struct**:
- Main agent implementation
- Handles model initialization based on available API keys
- Tracks token usage for requests and responses
- Implements conversation handling
4. **Chat Implementation**:
- Implements the `RigChat` trait for handling conversations
- Manages conversation history with a limit to prevent context explosion
- Processes tool calls within AI responses
- Handles continuation prompts after tool execution
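The key-based model initialization described for `ProbeAgent` might look roughly like the sketch below. The function `select_model` is hypothetical, and preferring Anthropic when both keys are present is an assumption of this example, not documented behavior.

```rust
/// LLM backend, mirroring the two variants described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelType {
    Anthropic,
    OpenAI,
}

/// Returns true when a key is set and not blank.
fn key_present(key: Option<&str>) -> bool {
    key.map_or(false, |k| !k.trim().is_empty())
}

/// Picks a backend from whichever API key is available.
/// Preferring Anthropic when both keys are set is an assumption of this sketch.
fn select_model(anthropic_key: Option<&str>, openai_key: Option<&str>) -> Option<ModelType> {
    if key_present(anthropic_key) {
        Some(ModelType::Anthropic)
    } else if key_present(openai_key) {
        Some(ModelType::OpenAI)
    } else {
        None
    }
}
```

In practice the two arguments would come from `std::env::var("ANTHROPIC_API_KEY").ok()` and `std::env::var("OPENAI_API_KEY").ok()`. Passing them in as parameters keeps the selection logic testable without touching the process environment.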
### Usage
1. **Starting the Agent**:
```bash
cargo run -- agent
```
2. **Environment Variables**:
- `ANTHROPIC_API_KEY`: API key for Anthropic (Claude models)
- `OPENAI_API_KEY`: API key for OpenAI (GPT models)
- `MODEL_NAME`: Override the default model (optional)
- `ANTHROPIC_API_URL`: Override the default Anthropic API URL (optional)
- `OPENAI_API_URL`: Override the default OpenAI API URL (optional)
- `DEBUG`: Enable debug output (set to any value)
3. **CLI Commands**:
- `help`: Display help information
- `quit`: Exit the assistant
### Implementation Details
1. **Token Tracking**:
- Uses `tiktoken_rs::cl100k_base` for token counting
- Tracks request and response tokens separately
- Displays token usage information after each interaction
2. **Conversation Management**:
- Limits history to `MAX_HISTORY_MESSAGES` (4) to prevent context explosion
- Filters out tool outputs from history to save tokens
- Uses continuation prompts to maintain context after tool execution
3. **Tool Execution**:
- Detects "tool:" calls in AI responses
- Executes the ProbeSearch tool with provided parameters
- Returns results to the AI for further processing
- Handles errors and provides appropriate error messages
4. **Model-Specific Handling**:
- Different prompt formats for Anthropic vs OpenAI
- Adjusts token limits based on model capabilities
- Handles API-specific requirements and response formats
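The conversation-management rules above can be sketched as a single trimming step. The `Message` struct and `trim_history` function are hypothetical, and the real agent's types may differ. Only the `MAX_HISTORY_MESSAGES` limit of 4 comes from the guidelines.

```rust
/// History limit stated in the guidelines above.
const MAX_HISTORY_MESSAGES: usize = 4;

/// Hypothetical message type; the real agent's types may differ.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
    is_tool_output: bool,
}

/// Drops tool outputs, then keeps only the most recent
/// `MAX_HISTORY_MESSAGES` of the remaining messages.
fn trim_history(history: &[Message]) -> Vec<Message> {
    let kept: Vec<&Message> = history
        .iter()
        .filter(|m| !m.is_tool_output)
        .collect();
    // `saturating_sub` avoids underflow when there are fewer messages than the limit.
    let start = kept.len().saturating_sub(MAX_HISTORY_MESSAGES);
    kept[start..].iter().map(|m| (*m).clone()).collect()
}
```

Filtering before truncating means tool outputs never take up one of the four history slots.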