# LoCoBench-Agent Dataset Exploration

This document describes the structure and contents of the LoCoBench-Agent dataset for the Harbor adapter implementation.

## Directory Structure

```
data/
├── generated/                  # 1000 synthetic code projects
│   ├── <project_id>/           # e.g., c_api_gateway_easy_009
│   │   ├── <project_name>/     # e.g., EduGate_ScholarLink (actual code files)
│   │   └── project_metadata.json
│   └── ...
│
└── output/
    ├── scenarios/              # 8000 task scenario JSON files
    │   └── <scenario_id>.json
    │
    ├── agent_scenarios/        # 8000 extended multi-turn agent scenarios
    │   └── <scenario_id>.json
    │
    └── validation/
        └── test_suites/        # 8000 test suite definitions
            └── <scenario_id>_tests.json
```
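
The three file populations above can be enumerated with a short Python sketch. It builds a throwaway mock of the layout so the snippet runs anywhere; in the adapter, point `counts` at the real `data/` directory (the `counts` helper and the mock paths are illustrative, not part of the dataset):

```python
import tempfile
from pathlib import Path

def counts(data: Path) -> dict:
    """Count projects, scenarios, and test suites per the layout above."""
    return {
        "projects": sum(1 for p in (data / "generated").iterdir() if p.is_dir()),
        "scenarios": len(list((data / "output" / "scenarios").glob("*.json"))),
        "test_suites": len(list(
            (data / "output" / "validation" / "test_suites").glob("*_tests.json")
        )),
    }

# Throwaway mock of the directory tree, so the sketch is self-contained.
root = Path(tempfile.mkdtemp())
(root / "generated" / "c_api_gateway_easy_009" / "EduGate_ScholarLink").mkdir(parents=True)
(root / "output" / "scenarios").mkdir(parents=True)
(root / "output" / "validation" / "test_suites").mkdir(parents=True)
(root / "output" / "scenarios"
 / "c_api_gateway_easy_009_architectural_understanding_expert_01.json").write_text("{}")
(root / "output" / "validation" / "test_suites"
 / "c_api_gateway_easy_009_architectural_understanding_expert_01_tests.json").write_text("{}")

print(counts(root))  # {'projects': 1, 'scenarios': 1, 'test_suites': 1}
```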

## Scenario File Format (data/output/scenarios/*.json)

Each scenario file contains a single task definition with the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier (e.g., `c_api_gateway_easy_009_architectural_understanding_expert_01`) |
| `task_category` | string | One of 8 categories (see below) |
| `difficulty` | string | `easy`, `medium`, `hard`, or `expert` |
| `title` | string | Human-readable task title |
| `description` | string | Detailed description of the task context and requirements |
| `context_files` | array | List of file paths in the synthetic project (uses `//` as separator) |
| `context_length` | integer | Total token count of all context files |
| `task_prompt` | string | The actual task/question for the agent to solve |
| `expected_approach` | string | How an expert would approach the task |
| `ground_truth` | string or object | Expected answer/solution (format varies by task category) |
| `evaluation_criteria` | array | List of criteria for judging responses |
| `metadata` | object | Additional info, including `files_count`, coverage metrics, and a timestamp |

### Sample Scenario JSON

```json
{
  "id": "c_api_gateway_easy_009_architectural_understanding_expert_01",
  "task_category": "architectural_understanding",
  "difficulty": "expert",
  "title": "Architectural Refactoring for Dynamic Route Configuration",
  "description": "EduGate ScholarLink is an API gateway...",
  "context_files": [
    "EduGate_ScholarLink//src//main.c",
    "EduGate_ScholarLink//src//components//router.c",
    "EduGate_ScholarLink//include//edugate.h",
    ...
  ],
  "context_length": 128233,
  "task_prompt": "Your task is to analyze the existing architecture...",
  "expected_approach": "An expert developer would approach this...",
  "ground_truth": "The core of a correct solution involves...",
  "evaluation_criteria": [
    "**Analysis Correctness:** Accurately identifies...",
    "**Architectural Viability:** Proposes a sound...",
    ...
  ],
  "metadata": {
    "context_length": 128233,
    "files_count": 11,
    "information_coverage": 0.95,
    "coverage_range": [0.8, 1.0],
    "generation_timestamp": "2025-08-05T15:07:11.561371"
  }
}
```
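
A loader for files of this shape might validate the fields from the table above before use. A minimal sketch (the `load_scenario` helper and the particular choice of required fields are this document's assumptions, not part of the dataset):

```python
import json

# Assumed-required subset of the fields documented in the table above.
REQUIRED = {"id", "task_category", "difficulty", "task_prompt",
            "context_files", "context_length", "ground_truth",
            "evaluation_criteria", "metadata"}

def load_scenario(text: str) -> dict:
    """Parse one scenario JSON document and check the expected fields."""
    scenario = json.loads(text)
    missing = REQUIRED - scenario.keys()
    if missing:
        raise ValueError(f"scenario missing fields: {sorted(missing)}")
    return scenario

sample = ('{"id": "x", "task_category": "bug_investigation", "difficulty": "hard",'
          ' "task_prompt": "p", "context_files": [], "context_length": 1,'
          ' "ground_truth": "g", "evaluation_criteria": [], "metadata": {}}')
print(load_scenario(sample)["task_category"])  # bug_investigation
```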

## Task Categories (8 total)

The dataset contains 8 distinct task categories, each representing a different type of software engineering challenge:

1. **architectural_understanding** - Analyze and propose architectural changes or refactoring
2. **bug_investigation** - Identify root causes of bugs from symptoms and propose fixes
3. **code_comprehension** - Understand and explain how existing code works
4. **cross_file_refactoring** - Refactor code that spans multiple files
5. **feature_implementation** - Add new functionality to an existing codebase
6. **integration_testing** - Design or implement integration tests
7. **multi_session_development** - Tasks requiring iterative development across sessions
8. **security_analysis** - Identify vulnerabilities and propose security improvements

## Programming Languages (10 total)

Tasks span 10 programming languages, identified by the prefix in the scenario ID:

- `c` - C
- `cpp` - C++
- `csharp` - C#
- `go` - Go
- `java` - Java
- `javascript` - JavaScript
- `php` - PHP
- `python` - Python
- `rust` - Rust
- `typescript` - TypeScript

## Dataset Statistics

- **Total scenarios**: 8,000 task files
- **Synthetic projects**: 1,000 generated codebases
- **Tasks per project**: 8 (one per task category)
- **Difficulty levels**: easy, medium, hard, expert
- **Context length range**: ~40K to 600K+ tokens

## ID Format Convention

Scenario IDs follow the pattern:
```
{language}_{domain}_{complexity}_{project_num}_{task_category}_{difficulty}_{variant}
```

Example: `python_api_gateway_expert_045_bug_investigation_hard_01`
- Language: `python`
- Domain: `api_gateway`
- Project complexity: `expert`
- Project number: `045`
- Task category: `bug_investigation`
- Task difficulty: `hard`
- Variant: `01`
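
Because domains (e.g., `api_gateway`) and task categories (e.g., `bug_investigation`) themselves contain underscores, the ID cannot be split on `_` positionally; one way to parse it is to anchor on the eight known category names. A sketch (the `parse_scenario_id` helper is illustrative, not part of the dataset tooling):

```python
# The eight task categories listed earlier in this document.
CATEGORIES = [
    "architectural_understanding", "bug_investigation", "code_comprehension",
    "cross_file_refactoring", "feature_implementation", "integration_testing",
    "multi_session_development", "security_analysis",
]

def parse_scenario_id(scenario_id: str) -> dict:
    """Split a scenario ID into its fields by locating the task category."""
    for cat in CATEGORIES:
        marker = f"_{cat}_"
        if marker in scenario_id:
            project_id, tail = scenario_id.split(marker, 1)
            difficulty, variant = tail.rsplit("_", 1)
            return {
                "language": project_id.split("_", 1)[0],
                "project_id": project_id,  # also the directory name under generated/
                "task_category": cat,
                "difficulty": difficulty,
                "variant": variant,
            }
    raise ValueError(f"no known task category in {scenario_id!r}")

print(parse_scenario_id("python_api_gateway_expert_045_bug_investigation_hard_01"))
```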

## Extended Agent Scenarios (data/output/agent_scenarios/)

The `agent_scenarios/` folder contains extended versions of each scenario designed for multi-turn agent evaluation. These include:

- `scenario_id` - Matches the base scenario
- `conversation_phases` - Structured phases for agent interaction:
  1. **exploration** - Code exploration phase
  2. **analysis** - Deep analysis phase
  3. **implementation** - Implementation phase
  4. **documentation** - Documentation creation phase
- `dynamic_prompts` - Context-aware follow-up prompts
- `max_turns_in_phase` - Turn limits per phase

## Validation Test Suites (data/output/validation/test_suites/)

Each scenario has a corresponding test suite JSON with evaluation tests:

- **compilation** - Syntax validation, import resolution, type checking
- **unit** - Function signatures, error handling, input validation, output correctness
- **integration** - Module integration, database integration, API integration
- **performance** - Execution time, memory usage, scalability
- **security** - Injection prevention, input sanitization, access control

## Key Fields for Task Selection

For selecting high-complexity tasks that demonstrate MCP value:

1. **context_length** - Higher values indicate more complex projects requiring better context management
2. **metadata.files_count** - More files suggest cross-file reasoning requirements
3. **task_category** - Some categories inherently require more complex reasoning
4. **difficulty** - Expert and hard tasks are the most challenging
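
A selection filter combining these signals might look like the following sketch (the thresholds and the `is_high_complexity` helper are illustrative, not prescribed by the dataset):

```python
def is_high_complexity(scenario: dict,
                       min_context: int = 100_000,
                       min_files: int = 10) -> bool:
    """Heuristic filter over the four signals above; thresholds are assumptions."""
    return (
        scenario.get("context_length", 0) >= min_context
        and scenario.get("metadata", {}).get("files_count", 0) >= min_files
        and scenario.get("difficulty") in {"hard", "expert"}
    )

# Field values taken from the sample scenario shown earlier.
sample = {"context_length": 128233, "difficulty": "expert",
          "metadata": {"files_count": 11}}
print(is_high_complexity(sample))  # True
```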

## Notes for Adapter Implementation

1. **File Path Format**: Context file paths use `//` as a separator and need normalization to `/`
2. **Ground Truth Format**: Varies by task category (string for analysis tasks, object for bug investigation)
3. **Language Parsing**: Extract the language from the ID prefix (the first `_`-separated token)
4. **Project Location**: Match the project via the scenario ID prefix (e.g., `c_api_gateway_easy_009`) to locate its code under `generated/`
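
Notes 1 and 3 reduce to one-liners; a minimal sketch (the helper names are hypothetical):

```python
def normalize_context_path(path: str) -> str:
    """Collapse the dataset's '//' separators into standard '/' (note 1)."""
    return path.replace("//", "/")

def language_of(scenario_id: str) -> str:
    """The language is the first '_'-separated token of the scenario ID (note 3)."""
    return scenario_id.split("_", 1)[0]

print(normalize_context_path("EduGate_ScholarLink//src//components//router.c"))
# EduGate_ScholarLink/src/components/router.c
print(language_of("c_api_gateway_easy_009_architectural_understanding_expert_01"))
# c
```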