Issue #21: Cross-Dataset Query Robustness Analysis
Problem Statement
Agents struggle to answer cross-dataset analytics questions like:
"What is the D1 retention of players using Chac Chel vs. players using Thor?"
This requires combining data from:
- chars.json (210 PlayerCharacters) - contains `characterClassId`, player attributes
- live.json (127 Players) - contains `loginHistory`, `totalIapSpend`, metrics
Relationship: chars.payload.playerId → live.entityId
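To make the relationship concrete, here is a minimal Python sketch of the join. The record shapes are assumptions inferred from the paths named in this issue (`payload.playerId`, `payload.character.characterClassId`, `entityId`); the sample values and the `join_chars_to_players` helper are hypothetical and not part of the existing tooling.

```python
# Hypothetical sample records; the real files hold 210 characters and 127 players.
chars = [
    {"entityId": "c1", "payload": {"playerId": "p1", "character": {"characterClassId": "ChacChel"}}},
    {"entityId": "c2", "payload": {"playerId": "p1", "character": {"characterClassId": "Thor"}}},
    {"entityId": "c3", "payload": {"playerId": "p2", "character": {"characterClassId": "Thor"}}},
    {"entityId": "c4", "payload": {"playerId": "p9", "character": {"characterClassId": "Thor"}}},
]
players = [
    {"entityId": "p1", "totalIapSpend": 4.99},
    {"entityId": "p2", "totalIapSpend": 0.0},
]


def join_chars_to_players(chars, players):
    """Pair each character with its player via payload.playerId -> entityId."""
    players_by_id = {p["entityId"]: p for p in players}
    joined = []
    for char in chars:
        player = players_by_id.get(char["payload"]["playerId"])
        if player is not None:  # skip characters whose player is not in live.json
            joined.append({"char": char, "player": player})
    return joined


joined = join_chars_to_players(chars, players)
print(len(joined))  # 3: character c4 has no matching player
```

In this sketch one player can own several characters, so the join is many-to-one from characters to players.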
TD Analysis & Discussion Summary
Participants: TD-1, TD-2
Key Findings
- Existing tools work individually but don't compose well
  - `join_files` returns raw records, no aggregation
  - `run_report("retention")` works at aggregate level only, no segmentation
  - High cognitive load: 4-5 tool calls + manual calculations for cross-dataset questions
- Bug discovered: `player-kpis` report has wrong path (`payload.characterClass` should be `payload.character.characterClassId`)
- Current data:
  - 127 players, 210 characters (87 ChacChel, 123 Thor)
  - D1: 52%, D3: 30.7%, D7: 16.5%
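The path bug is easy to reproduce in isolation. The sketch below uses a hypothetical `get_path` helper and an assumed character record shaped after the corrected path; it is not the report's actual lookup code. It only illustrates why the wrong path silently yields nothing.

```python
def get_path(record, dotted_path, default=None):
    """Walk a dotted path through nested dicts, returning default on a miss."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical character record, shaped after the corrected path in this issue.
char = {"payload": {"playerId": "p1", "character": {"characterClassId": "Thor"}}}

wrong = get_path(char, "payload.characterClass")
right = get_path(char, "payload.character.characterClassId")
print(wrong, right)  # None Thor
```

Because a missing key returns a default instead of raising, a report built on the wrong path would produce empty class values rather than an error.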
Agreed Recommendations
The goal is general cross-dataset query flexibility, not just solving one specific query type.
| Priority | Solution | Rationale |
|---|---|---|
| 1 | Bug Fix - player-kpis path | Trivial fix, immediate value |
| 2 | Aggregate Joins - add grouping/aggregation to join_files | General-purpose, handles any cross-dataset aggregation |
| 3 | Dataset Discovery - help agents understand joinable relationships | Reduces cognitive load for agents |
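One way dataset discovery could work is to compare the values found at each field path in two files and report pairs with high overlap. The sketch below is an assumption about a possible approach, not the design agreed for the discovery tool; `flatten`, `field_values`, `suggest_joins`, and the sample records are all hypothetical.

```python
def flatten(record, prefix=""):
    """Yield (dotted_path, value) for every string or integer leaf in a nested dict."""
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        elif isinstance(value, (str, int)) and not isinstance(value, bool):
            yield path, value


def field_values(records):
    """Map each dotted path to the set of values seen at that path."""
    values = {}
    for record in records:
        for path, value in flatten(record):
            values.setdefault(path, set()).add(value)
    return values


def suggest_joins(left, right, min_overlap=0.5):
    """Return (left_path, right_path, overlap) tuples, where overlap is the share
    of distinct left values that also occur at the right path."""
    left_vals, right_vals = field_values(left), field_values(right)
    suggestions = []
    for left_path, lv in left_vals.items():
        for right_path, rv in right_vals.items():
            overlap = len(lv & rv) / len(lv)
            if overlap >= min_overlap:
                suggestions.append((left_path, right_path, overlap))
    return sorted(suggestions, key=lambda s: -s[2])


# Hypothetical sample records using the paths named in this issue.
chars = [
    {"entityId": "c1", "payload": {"playerId": "p1", "character": {"characterClassId": "ChacChel"}}},
    {"entityId": "c2", "payload": {"playerId": "p1", "character": {"characterClassId": "Thor"}}},
    {"entityId": "c3", "payload": {"playerId": "p2", "character": {"characterClassId": "Thor"}}},
]
players = [{"entityId": "p1"}, {"entityId": "p2"}, {"entityId": "p3"}]

suggestions = suggest_joins(chars, players)
print(suggestions)  # [('payload.playerId', 'entityId', 1.0)]
```

Value overlap is only a heuristic: it can miss keys that differ in formatting and can flag coincidental matches, so an agent should treat the output as candidates to verify.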
Design Decision: General Flexibility > Purpose-Built Features
We chose aggregate joins over purpose-built report parameters because:
- Works with ANY two files, ANY join key, ANY aggregation
- Agents can answer many different questions without new tools
- Composable primitives > specialized features
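As a rough illustration of what an aggregate join could look like, the sketch below joins two record lists, groups by a left-side path, and aggregates a right-side path with any callable. The `aggregate_join` signature, its parameter names, and the sample data are assumptions for illustration; the actual `join_files` interface is tracked in the follow-up issue. It averages `totalIapSpend` per class because the format of `loginHistory` is not specified here; a retention metric would plug in as a different `agg` once that format is known.

```python
from collections import defaultdict
from statistics import mean


def get_path(record, dotted_path, default=None):
    """Walk a dotted path through nested dicts, returning default on a miss."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def aggregate_join(left, right, left_key, right_key, group_by, value, agg):
    """Join left to right, group by a left-side path, aggregate a right-side path."""
    right_by_key = {get_path(r, right_key): r for r in right}
    groups = defaultdict(list)
    for row in left:
        match = right_by_key.get(get_path(row, left_key))
        if match is None:
            continue  # inner join: drop left rows without a match
        groups[get_path(row, group_by)].append(get_path(match, value))
    return {group: agg(vals) for group, vals in groups.items()}


# Hypothetical sample records; field placement in live.json is assumed.
chars = [
    {"payload": {"playerId": "p1", "character": {"characterClassId": "ChacChel"}}},
    {"payload": {"playerId": "p2", "character": {"characterClassId": "Thor"}}},
    {"payload": {"playerId": "p3", "character": {"characterClassId": "Thor"}}},
]
players = [
    {"entityId": "p1", "totalIapSpend": 10.0},
    {"entityId": "p2", "totalIapSpend": 0.0},
    {"entityId": "p3", "totalIapSpend": 4.0},
]

result = aggregate_join(
    chars, players,
    left_key="payload.playerId", right_key="entityId",
    group_by="payload.character.characterClassId",
    value="totalIapSpend", agg=mean,
)
print(result)  # {'ChacChel': 10.0, 'Thor': 2.0}
```

Note that a player who owns characters of both classes is counted in both groups under this scheme; whether that is the intended segmentation is a question for the implementation issue.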
Follow-Up Issues
- #22 - Fix player-kpis report characterClassId path (bug fix)
- #23 - Add aggregate/group_by support to join_files
- #24 - Add dataset discovery tool
Status: ✅ Analysis Complete
This issue documents the analysis and discussion. Implementation tracked in #22, #23, #24.