LLM memory decay benchmark: add EdTech domain scenario (student/teacher conversation memory) #13

@Neal006

Domain

Education / EdTech. An AI tutor that must remember a learner's profile across a long tutoring session:

  • Subject performance (struggling with calculus derivatives, strong at statistics)
  • Learning style (prefers visual examples, needs step-by-step breakdowns)
  • Session goals (preparing for exam on March 15)
  • Past mistakes (confused about chain rule at T=20, corrected at T=35)

Why This Matters for LLM Memory Evaluation

BENCHMARK_FACTS in simulator/facts.py covers only a personal assistant scenario (name, city, occupation). An EdTech scenario additionally tests:

  • Hierarchical facts: subject → topic → subtopic
  • Evolving understanding: student's mastery level changes over the session
  • Multi-update facts: the same concept can be "not understood" → "partial" → "mastered"
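A fact with more than one update could be modelled roughly as below. This is only a sketch: MultiUpdateFact, path, history and value_at are hypothetical names that do not exist in the repository, and the turn numbers (including the intermediate "partial" step at T=28) are illustrative.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class MultiUpdateFact:
    """Hypothetical fact type: a hierarchical key plus an ordered update history."""

    # subject -> topic -> subtopic
    path: tuple[str, ...]
    # (turn, value) pairs, kept sorted by turn
    history: list[tuple[int, str]] = field(default_factory=list)

    def value_at(self, turn: int) -> str | None:
        """Return the value in force at `turn`, or None before the first injection."""
        turns = [t for t, _ in self.history]
        idx = bisect_right(turns, turn)  # number of updates at or before `turn`
        return self.history[idx - 1][1] if idx else None


chain_rule = MultiUpdateFact(
    path=("calculus", "derivatives", "chain rule"),
    history=[(20, "not understood"), (28, "partial"), (35, "mastered")],
)

print(chain_rule.value_at(30))  # partial
```

Keeping the full history, rather than a single updated_at / updated_value pair, lets the benchmark ask what the model should believe at any turn, not only after the final update.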

Implementation

# simulator/edtech_facts.py
# Assumption: Fact is defined next to BENCHMARK_FACTS in simulator/facts.py.
from simulator.facts import Fact

# Learner profile for the EdTech scenario.
# Facts with updated_at / updated_value change once during the session.
EDTECH_FACTS = [
    Fact("student_name",    "Priya Nair",              injected_at=0),
    Fact("subject",         "calculus",                injected_at=1),
    Fact("weak_topic",      "chain rule",              injected_at=2,
         updated_at=35, updated_value="integration by parts"),
    Fact("exam_date",       "March 15",                injected_at=3),
    Fact("learning_style",  "visual learner",          injected_at=4),
    Fact("grade_target",    "A",                       injected_at=5),
    Fact("last_score",      "72%",                     injected_at=7,
         updated_at=60, updated_value="84%"),
    Fact("preferred_pace",  "slow with examples",      injected_at=9),
]
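For reference, here is how the expected answer for a single-update fact might be resolved at a given turn. The Fact class below is a stand-in inferred from the constructor calls above, so the real class may have different field names. expected_value is a hypothetical helper, not an existing function in the simulator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Stand-in for the project's Fact type; the real fields may differ."""

    key: str
    value: str
    injected_at: int
    updated_at: Optional[int] = None
    updated_value: Optional[str] = None


def expected_value(fact: Fact, turn: int) -> Optional[str]:
    """Ground truth for `fact` at `turn` (hypothetical helper)."""
    if turn < fact.injected_at:
        return None  # the fact has not been mentioned yet
    if fact.updated_at is not None and turn >= fact.updated_at:
        return fact.updated_value  # the update supersedes the original value
    return fact.value


last_score = Fact("last_score", "72%", injected_at=7, updated_at=60, updated_value="84%")

print(expected_value(last_score, 30))  # 72%
print(expected_value(last_score, 60))  # 84%
```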

Acceptance Criteria

  • simulator/edtech_facts.py with EDTECH_FACTS list
  • A python main.py --scenario edtech flag (or, alternatively, --facts edtech)
  • Update README.md results table with EdTech scenario numbers
  • CI passes with new scenario
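One possible shape for the scenario flag, using argparse from the standard library. This is a sketch under assumptions: the SCENARIOS registry, the "personal" scenario name, and the select_facts helper are all hypothetical, and the placeholder lists stand in for BENCHMARK_FACTS and EDTECH_FACTS. How main.py currently parses its arguments is not shown in this issue.

```python
import argparse

# Hypothetical registry; in the repository the values would be the real fact lists.
SCENARIOS = {
    "personal": ["<BENCHMARK_FACTS placeholder>"],
    "edtech": ["<EDTECH_FACTS placeholder>"],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts only registered scenario names."""
    parser = argparse.ArgumentParser(description="LLM memory decay benchmark")
    parser.add_argument(
        "--scenario",
        choices=sorted(SCENARIOS),
        default="personal",  # assumed default, so existing invocations keep working
        help="which fact set to run the benchmark against",
    )
    return parser


def select_facts(argv):
    """Parse `argv` and return the fact list for the chosen scenario."""
    args = build_parser().parse_args(argv)
    return SCENARIOS[args.scenario]


print(select_facts(["--scenario", "edtech"]))
```

Restricting the flag with choices makes an unknown scenario name fail at argument parsing, which gives CI a clear error instead of a silent fallback.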

Metadata

Labels: enhancement, good first issue
