-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Labels
swarm:reviewReady for reviewReady for review
Description
Feature Description
Add group_by and aggregate parameters to join_files to enable cross-dataset analytics without returning raw records.
Use Cases:
- "D1 retention by character class"
- "Average player level by character class"
- "Total IAP spend by country"
- "Session count by device type"
- Any cross-dataset grouping + aggregation
Currently requires 4-5 tool calls + manual calculations. With this feature: 1 tool call.
API Design
join_files({
file1: "chars.json",
file2: "live.json",
leftKey: "payload.playerId",
rightKey: "entityId",
group_by: "a.payload.character.characterClassId",
aggregate: {
"player_count": "COUNT(*)",
"avg_level": "AVG(b.payload.playerLevel)",
"total_spend": "SUM(b.payload.totalIapSpend)",
"d1_retained": "COUNT_IF(D1_RETENTION(b.payload.loginHistory))"
}
})Implementation Constraints
- Explicit keys required - No auto-inference for group_by or aggregate
- Backwards compatible - Without group_by/aggregate, behaves as before (returns raw joined records)
- Side prefixes required -
a.for left file,b.for right file (existing pattern) - Standard aggregate functions - COUNT, SUM, AVG, MIN, MAX, COUNT_IF
- Retention helper - D1_RETENTION(), D7_RETENTION() functions for login history analysis
Expected Output
{
"groups": {
"ChacChel_Class": {
"player_count": 45,
"avg_level": 12.3,
"total_spend": 156.50,
"d1_retained": 22
},
"Thor_Class": {
"player_count": 82,
"avg_level": 14.1,
"total_spend": 289.00,
"d1_retained": 44
}
},
"totals": {
"player_count": 127,
"avg_level": 13.4,
"total_spend": 445.50,
"d1_retained": 66
},
"joinKeys": { "leftKey": "payload.playerId", "rightKey": "entityId" }
}Files to Modify
json_genius/src/query/join.ts- Add group_by and aggregate logicjson_genius/src/mcp/tools.ts- Updatejoin_filesinput schema- New file:
json_genius/src/query/aggregators.ts- Aggregate function implementations
Algorithm
- Build right-side index (existing)
- Stream left file and join (existing)
- NEW: If group_by specified:
- Extract group key from joined record
- Accumulate aggregate values per group
- NEW: Calculate final aggregate values (AVG = sum/count, etc.)
- Return grouped results + totals
Aggregate Functions
| Function | Description |
|---|---|
COUNT(*) |
Count of records in group |
COUNT(path) |
Count of non-null values at path |
COUNT_IF(condition) |
Count where condition is true |
SUM(path) |
Sum of numeric values |
AVG(path) |
Average of numeric values |
MIN(path) |
Minimum value |
MAX(path) |
Maximum value |
D1_RETENTION(loginHistoryPath) |
True if player returned after 1+ days |
D7_RETENTION(loginHistoryPath) |
True if player returned after 7+ days |
Success Criteria
- group_by parameter groups joined results by specified field
- aggregate parameter calculates metrics per group
- totals always included alongside groups
- Backwards compatible (no group_by = raw records as before)
- Retention helpers work with loginHistory arrays
- Works with LiveTest data (chars.json + live.json)
Created from TD analysis of issue #21 - Priority 2: General cross-dataset aggregation
Metadata
Metadata
Assignees
Labels
swarm:reviewReady for reviewReady for review