
Knowledge 📖 or Reasoning 🤔? A Close Look at How LLMs Think Across Domains

Juncheng Wu*, Sheng Liu*, Haoqin Tu*, Hang Yu*, Xiaoke Huang, James Zou, Cihang Xie, Yuyin Zhou

⚡️ Introduction

(Figure: overview of the evaluation pipeline)

In this work, we propose a fine-grained evaluation framework that includes two novel step-by-step metrics for LLM reasoning:

  1. Knowledge Index (KI): the correctness of the knowledge used at each step
  2. Information Gain (Info Gain): the quality of the reasoning at each step
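To make the two metrics concrete, here is a minimal toy sketch of how step-level scores of this kind could be aggregated. The formulas and field names (`knowledge_correct`, `answer_prob`) are illustrative assumptions, not the paper's exact definitions; see the paper and the scorer classes below for the real implementations.

```python
# Toy sketch of the two step-level metrics (hypothetical formulas).
# Each reasoning step carries: whether its knowledge is judged correct,
# and the model's probability of the final answer after that step.

def knowledge_index(steps):
    """Fraction of steps whose knowledge is judged correct (KI-style score)."""
    return sum(s["knowledge_correct"] for s in steps) / len(steps)

def information_gain(steps):
    """Average per-step increase in answer probability (Info-Gain-style score)."""
    probs = [s["answer_prob"] for s in steps]
    gains = [b - a for a, b in zip(probs, probs[1:])]
    return sum(gains) / len(gains)

steps = [
    {"knowledge_correct": True,  "answer_prob": 0.2},
    {"knowledge_correct": True,  "answer_prob": 0.5},
    {"knowledge_correct": False, "answer_prob": 0.9},
]
print(knowledge_index(steps))    # 2 of 3 steps use correct knowledge
print(information_gain(steps))   # mean probability gain across steps
```

A trace can thus score high on one axis and low on the other, e.g. correct facts chained uninformatively, or productive reasoning built on wrong knowledge.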

🚀 Quick Start

Example code (identifiers follow the repo's snippet; check the repository for the exact import paths):

```python
# decompose the model's reasoning trace into steps
# reasoning_steps = llms_output_reasoning
decomposed_steps = utils.llm_decompose_reasoning(reasoning_steps)

# initialize the metric scorers
retrieval_scorer = RetrievalScorer()
information_gain_scorer = InformationGainScorer(model_name='Qwen/Qwen2.5-7B')

# compute the metrics
KI = retrieval_scorer.forward(decomposed_steps)
InfoGain = information_gain_scorer.forward(decomposed_steps)
```

🙏🏼 Acknowledgement

This work was partially funded by an unrestricted gift from Google. We thank the Microsoft Accelerate Foundation Models Research Program for supporting our computing needs.

We gratefully thank MedRAG for the knowledge retrieval model and toolkit!

📖 Citation

```bibtex
@misc{wu2025knowledgereasoningcloselook,
      title={Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains},
      author={Juncheng Wu and Sheng Liu and Haoqin Tu and Hang Yu and Xiaoke Huang and James Zou and Cihang Xie and Yuyin Zhou},
      year={2025},
      eprint={2506.02126},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.02126},
}
```

About

Official repo of Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains.
