Add dataset discovery tool #24

## Feature Description

Add a `describe_dataset` tool that helps agents understand what data is available and how files relate to each other before querying.

**Problem:** Agents currently need to call multiple tools (`get_schema`, `find_relationships`, `sample_data`) to understand a dataset before they can formulate queries. Chaining these calls adds cognitive load and gives agents more chances to make errors.

## API Design

```typescript
describe_dataset({
  directory: "LiveTest"
})
```

## Expected Output

```json
{
  "directory": "LiveTest",
  "files": [
    {
      "name": "chars.json",
      "sizeMB": 54.5,
      "entityCount": 210,
      "entityType": "PlayerCharacter",
      "keyFields": ["entityId", "payload.playerId", "payload.character.characterClassId"],
      "sampleValues": {
        "characterClassId": ["ChacChel_Class", "Thor_Class"]
      }
    },
    {
      "name": "live.json",
      "sizeMB": 4.8,
      "entityCount": 127,
      "entityType": "Player",
      "keyFields": ["entityId", "payload.playerLevel", "payload.totalIapSpend"],
      "metricsAvailable": ["loginHistory", "deviceHistory", "stats.createdAt"]
    }
  ],
  "relationships": [
    {
      "description": "chars.playerId -> live.entityId",
      "leftFile": "chars.json",
      "leftKey": "payload.playerId",
      "rightFile": "live.json",
      "rightKey": "entityId",
      "coverage": "100%",
      "type": "many-to-one"
    }
  ],
  "suggestedQueries": [
    "Group by characterClassId to compare player segments",
    "Use loginHistory for retention analysis",
    "Join on playerId <-> entityId for cross-file queries"
  ]
}
```
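
To illustrate how an agent might consume a `relationships` entry, here is a minimal sketch of an in-memory many-to-one join driven by `leftKey` and `rightKey`. The helper names (`getPath`, `joinOnRelationship`) and the `Entity` type are hypothetical and not part of `json_genius`; only the relationship field names come from the example output.

```typescript
// Hypothetical helpers for illustration; not part of the existing codebase.
type Entity = Record<string, unknown>;

interface Relationship {
  leftFile: string;
  leftKey: string;
  rightFile: string;
  rightKey: string;
}

// Resolve a dotted path such as "payload.playerId" against a nested object.
function getPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Many-to-one join: index the right side by its key, then look up each left row.
function joinOnRelationship(
  left: Entity[],
  right: Entity[],
  rel: Relationship
): Array<{ left: Entity; right: Entity | undefined }> {
  const index = new Map<unknown, Entity>();
  for (const row of right) index.set(getPath(row, rel.rightKey), row);
  return left.map((row) => ({
    left: row,
    right: index.get(getPath(row, rel.leftKey)),
  }));
}
```

Left rows with no match keep `right: undefined`, which makes coverage below 100% visible to the caller instead of silently dropping rows.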

## Implementation Details

### Existing Code to Reuse

1. **Schema extraction** - `json_genius/src/analyzer/schema-extractor.ts`
   - `extractSchema(filePath, options)` - Returns `SchemaNode` with type info, patterns, examples
   - Already detects `$type` fields which indicate entity type

2. **Relationship detection** - `json_genius/src/analyzer/relationship-finder.ts`
   - `findRelationships(leftFile, rightFile, options)` - Returns `RelationshipResult`
   - Scans for ID fields using patterns: `/Id$/`, `/entityId/i`, etc.
   - Already computes coverage percentage and relationship type

3. **Entity counting** - `json_genius/src/query/aggregate.ts`
   - `count(filePath, options)` - Returns `{ total, scanned }`

4. **Tool pattern** - `json_genius/src/mcp/tools.ts`
   - Follow existing pattern: add tool to `tools` array, create `handleDescribeDataset` function
   - Use `validateFile()` for path resolution (handles relative paths via `dataDir`)
   - Return via `successResult()` / `errorResult()` helpers
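
As a rough sketch of the tool pattern described in item 4, the definition and handler could look like the following. The real signatures of `successResult()`, `errorResult()`, and `validateFile()` are not shown in this issue, so the two result helpers below are local stand-ins, and `parseDirectoryArg` and the injected `describe` parameter are hypothetical names. The `name` / `description` / `inputSchema` shape follows the MCP tool definition format.

```typescript
// Stand-ins for the repo's helpers; their real signatures are not shown in this issue.
interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}
const successResult = (data: unknown): ToolResult => ({
  content: [{ type: "text", text: JSON.stringify(data, null, 2) }],
});
const errorResult = (message: string): ToolResult => ({
  content: [{ type: "text", text: message }],
  isError: true,
});

// Tool definition: name, description, and a JSON Schema for the input.
const describeDatasetTool = {
  name: "describe_dataset",
  description: "Summarize the JSON files in a directory and how they relate.",
  inputSchema: {
    type: "object",
    properties: {
      directory: { type: "string", description: "Directory containing JSON files" },
    },
    required: ["directory"],
  },
} as const;

// Hypothetical argument check: returns the directory, or null if it is missing or invalid.
function parseDirectoryArg(args: Record<string, unknown>): string | null {
  const directory = args["directory"];
  return typeof directory === "string" && directory.length > 0 ? directory : null;
}

// The discovery function is injected so the handler can be exercised without real files.
async function handleDescribeDataset(
  args: Record<string, unknown>,
  describe: (directory: string) => Promise<unknown>
): Promise<ToolResult> {
  const directory = parseDirectoryArg(args);
  if (directory === null) {
    return errorResult("describe_dataset requires a non-empty 'directory' string");
  }
  try {
    return successResult(await describe(directory));
  } catch (err) {
    return errorResult(err instanceof Error ? err.message : String(err));
  }
}
```

In the actual handler, path resolution would presumably go through `validateFile()` (or a directory equivalent) before calling the discovery function.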

### New File: `json_genius/src/analyzer/dataset-discovery.ts`

```typescript
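// Not defined elsewhere in this issue; fields mirror the `relationships`
// entries shown in Expected Output.
export interface RelationshipSummary {
  description: string;
  leftFile: string;
  leftKey: string;
  rightFile: string;
  rightKey: string;
  coverage: string; // e.g. "100%"
  type: string; // e.g. "many-to-one"
}

export interface FileInfo {
  name: string;
  sizeMB: number;
  entityCount: number;
  entityType?: string;
  keyFields: string[];
  // Optional: the example output omits it for live.json.
  sampleValues?: Record<string, string[]>;
  metricsAvailable?: string[];
}

export interface DatasetDescription {
  directory: string;
  files: FileInfo[];
  relationships: RelationshipSummary[];
  suggestedQueries: string[];
}

// Signature only; the body follows the implementation steps below.
export async function describeDataset(
  directory: string
): Promise<DatasetDescription>;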
```
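
As a sketch of how the `name` and `sizeMB` fields of `FileInfo` might be populated, the snippet below lists the `.json` files in a directory and converts their sizes. The helper names (`isJsonFile`, `toMB`, `listJsonFiles`) are hypothetical, and treating 1 MB as 1024 * 1024 bytes rounded to one decimal is an assumption, since the issue does not say how `sizeMB` is computed.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Case-insensitive extension check for JSON files.
const isJsonFile = (name: string): boolean => name.toLowerCase().endsWith(".json");

// Assumption: 1 MB = 1024 * 1024 bytes, rounded to one decimal place.
const toMB = (bytes: number): number =>
  Math.round((bytes / (1024 * 1024)) * 10) / 10;

// Return the name and size of every regular .json file directly inside `directory`.
async function listJsonFiles(
  directory: string
): Promise<Array<{ name: string; sizeMB: number }>> {
  const entries = await fs.readdir(directory, { withFileTypes: true });
  const names = entries
    .filter((entry) => entry.isFile() && isJsonFile(entry.name))
    .map((entry) => entry.name)
    .sort();
  return Promise.all(
    names.map(async (name) => {
      const stat = await fs.stat(path.join(directory, name));
      return { name, sizeMB: toMB(stat.size) };
    })
  );
}
```

Sorting the names keeps the output order stable across runs, which makes the tool's result easier to compare between calls.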

### Implementation Steps

1. **Scan directory for JSON files** - Use `fs.readdir` + filter for `.json`
2. **For each file:**
   - Get file size via `fs.stat`
   - Count entities using existing `count()` from aggregate.ts
   - Extract schema using `extractSchema()` - look for `$type` to get entity type
   - Identify key fields from schema (fields ending in `Id`, `Ids`, or containing entity ID patterns)
   - Sample unique values for categorical fields (use schema `examples` or light sampling)
3. **Find relationships between all file pairs** - Use `findRelationships()` between each pair
4. **Generate suggested queries** based on:
   - Groupable fields (enums, categorical strings with few unique values)
   - Numeric fields (for stats)
   - Detected relationships (for joins)
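
The pure parts of steps 2 to 4 can be sketched as small functions. Everything here is illustrative: the function names, the `FieldSummary` shape, and the exact wording of the suggestions are assumptions, and the patterns are modeled on the ones listed above instead of being copied from `relationship-finder.ts`.

```typescript
// Assumed patterns, modeled on the `Id` / `Ids` / entity ID rules in step 2.
const KEY_FIELD_PATTERNS: RegExp[] = [/Ids?$/, /entityId/i];

// Last segment of a dotted path, e.g. "payload.playerId" -> "playerId".
const leafOf = (fieldPath: string): string => fieldPath.split(".").pop() ?? fieldPath;

// A field is a key-field candidate when its last segment matches a pattern.
function isKeyField(fieldPath: string): boolean {
  const leaf = leafOf(fieldPath);
  return KEY_FIELD_PATTERNS.some((pattern) => pattern.test(leaf));
}

// Every unordered pair of files, so n files yield n * (n - 1) / 2 comparisons.
function filePairs(files: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  files.forEach((left, i) => {
    for (const right of files.slice(i + 1)) pairs.push([left, right]);
  });
  return pairs;
}

// Hypothetical summary of a field produced by the schema step.
interface FieldSummary {
  path: string;
  kind: "categorical" | "numeric";
}

// Turn groupable fields, numeric fields, and relationships into query hints.
function suggestQueries(
  fields: FieldSummary[],
  relationships: Array<{ leftKey: string; rightKey: string }>
): string[] {
  const suggestions = fields.map((field) =>
    field.kind === "categorical"
      ? `Group by ${leafOf(field.path)} to compare segments`
      : `Compute stats on ${leafOf(field.path)}`
  );
  for (const rel of relationships) {
    suggestions.push(
      `Join on ${leafOf(rel.leftKey)} <-> ${leafOf(rel.rightKey)} for cross-file queries`
    );
  }
  return suggestions;
}
```

Because the number of pairs grows quadratically with the number of files, a directory with many JSON files may need a cap or a cheaper pre-filter before calling `findRelationships()` on each pair.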

### Files to Modify

1. `json_genius/src/mcp/tools.ts` - Add `describe_dataset` tool definition and handler
2. **New:** `json_genius/src/analyzer/dataset-discovery.ts` - Core discovery logic

## Success Criteria

- [ ] Single tool call provides complete dataset overview
- [ ] Relationships auto-detected between files
- [ ] Key fields identified for grouping/joining
- [ ] Sample values shown for categorical fields
- [ ] Suggested queries help agents get started
- [ ] Works with LiveTest directory (if available) or any directory with JSON files
- [ ] Follows existing code patterns (streaming where appropriate, proper error handling)
- [ ] TypeScript compiles without errors (`pnpm typecheck`)

---

*Created from TD analysis of issue #21 - Priority 3: Reduce agent cognitive load*
