This repository contains the code for our "Mining Misconception in Mathematics" project. A detailed project report, written in NeurIPS format, is included as [ML]7. Mining Misconception in Mathematics.pdf.
Multiple-choice questions are widely used to evaluate student knowledge. Well-designed questions use distractors that are associated with common misconceptions. Large language models (LLMs) have performed well in math reasoning benchmarks, but they struggle with understanding misconceptions. In this work, we propose a method to determine the misconception that leads to an incorrect answer. We use an LLM of the Qwen 2.5 family to hypothesize a potential misconception, which is used to assist in the retrieval of the related misconceptions from a list of 2,586 categories. The retrieval process leverages embeddings generated by a fine-tuned Mistral-based LLM trained with a synthetic dataset. The relevant misconceptions are then analyzed by Qwen 2.5, which uses a logits processor to determine the most likely misconception. We evaluate this method using mean average precision on a Kaggle dataset of 1,868 math-related multiple-choice questions, achieving a maximum score of 0.4706. Our results demonstrate the potential of LLMs for assessing incorrect answers and identifying misconceptions in math education.
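The mean-average-precision score reported above can be computed with a short helper. The sketch below is not code from this repository; it assumes each question has exactly one ground-truth misconception and that predictions are ranked lists of misconception ids, as in the usual Kaggle MAP@25 setup.

```python
def map_at_k(true_ids, predictions, k=25):
    """Mean average precision @ k when each row has one correct label.

    true_ids:    ground-truth misconception id per question
    predictions: ranked lists of predicted ids (best guess first)
    """
    total = 0.0
    for true_id, preds in zip(true_ids, predictions):
        for rank, pred in enumerate(preds[:k], start=1):
            if pred == true_id:
                total += 1.0 / rank  # reciprocal rank of the single hit
                break                # at most one correct label per row
    return total / len(true_ids)

# Example: first question hit at rank 1, second at rank 2.
score = map_at_k([7, 3], [[7, 1, 2], [1, 3, 2]])  # → 0.75
```

With a single correct label per question, MAP@25 reduces to the mean reciprocal rank of the true misconception within the top 25 predictions.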
Before running the code, several models need to be downloaded.
There are also some required datasets:
- Mining-Misconception-Dataset, consisting of four files: mining_misconception_mapping.csv, sample_submission.csv, test.csv, train.csv.
- MATH-Dataset

All of the notebooks are guaranteed to run on a single H100 GPU.
This project is divided into three main phases: Dataset Generation, Finetuning, and Inference Pipeline. Below are the descriptions and instructions for each Jupyter notebook used. Note that the file paths read in these notebooks may correspond to locations on our local machine. When running the notebooks locally, users may need to change these paths to reflect the file locations on their device.
The notebook data_generation.ipynb creates a synthetic dataset in the Mining-Misconception-Dataset format. In this notebook, we reformat the MATH-Dataset to follow the Mining-Misconception-Dataset's schema. The resulting synthetic dataset is included in this project as eedi_synthetic.csv.
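The reformatting step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the column names (QuestionText, MisconceptionId, etc.) and the toy record are stand-ins, and the real schema is whatever train.csv defines.

```python
import csv

# Toy MATH-style record; fields and values are illustrative assumptions.
math_rows = [
    {"problem": "Simplify 3/6.", "correct": "1/2",
     "distractor": "3/3", "misconception_id": 42},
]

# Rewrite each record into a row shaped like the Mining-Misconception data.
with open("eedi_synthetic.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[
        "QuestionId", "QuestionText", "CorrectAnswerText",
        "IncorrectAnswerText", "MisconceptionId"])
    writer.writeheader()
    for i, row in enumerate(math_rows):
        writer.writerow({
            "QuestionId": i,
            "QuestionText": row["problem"],
            "CorrectAnswerText": row["correct"],
            "IncorrectAnswerText": row["distractor"],
            "MisconceptionId": row["misconception_id"],
        })
```

The key idea is simply that each MATH problem plus one distractor and its labeled misconception becomes one training row in the competition's format.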
In this project, we opted to fine-tune our embedding model SFR-Embedding-Mistral_2R with LoRA to retrieve related misconceptions. The notebook train_and_infer_hardNegatives.ipynb contains the fine-tuning script. Note that fine-tuning requires the Initial Reasoning of the training data as input; the script that generates it belongs to the Inference phase, specifically the LLM Reasoning section (see that section for more detail). Our best LoRA model is included in this project as SFR-Embedding-2_R_ZeroShot_CleanLatex_UsingNEWLINES_Quantization_HardNegatives_12batch_8accumulation_20negatives_moreSteps\
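The hard-negative training objective can be illustrated with a small InfoNCE-style sketch in NumPy. This is not the repository's actual training code (which fine-tunes the Mistral embedder with LoRA); it only shows the shape of the contrastive loss, where each question embedding is pulled toward its true misconception and pushed away from hard-negative misconceptions.

```python
import numpy as np

def info_nce_loss(query, positive, hard_negatives, temperature=0.05):
    """Contrastive loss for one query: one positive vs. hard negatives."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Similarity of the query to the positive (index 0) and each negative.
    sims = np.array([cosine(query, positive)] +
                    [cosine(query, n) for n in hard_negatives]) / temperature
    sims -= sims.max()                       # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[0])                 # cross-entropy on the positive
```

Minimizing this loss over many (question, misconception, hard negatives) triples is what makes the fine-tuned embedder rank the true misconception above lexically similar but wrong ones.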
The notebook train_and_infer_hardNegatives.ipynb also performs the final inference of the whole misconception classification pipeline. First, we use the LLM to generate initial reasoning for HyDE; the output of this step is also required as input for model fine-tuning. Next, we run the fine-tuned embedding model to retrieve the 25 most similar misconceptions. Finally, we implemented a reranking algorithm in which the LLM judges the top-k most likely misconceptions.
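The logits-processor idea behind the final judging step can be sketched as masking the model's next-token logits so that only the retrieved candidates can be chosen. The array below stands in for real LLM logits, and the token ids are illustrative assumptions, not values from this repository.

```python
import numpy as np

def rerank_candidates(logits, candidate_token_ids):
    """Rank only the retrieved candidates by the model's logits.

    Mimics a logits processor: every token outside the candidate set is
    masked to -inf, so the model can only 'answer' with a candidate.
    """
    masked = np.full_like(logits, -np.inf)
    masked[candidate_token_ids] = logits[candidate_token_ids]
    # Candidates sorted by score, best first (top-1 is the prediction).
    return sorted(candidate_token_ids, key=lambda t: -masked[t])

logits = np.array([5.0, 1.0, 3.0, 4.0])  # toy next-token logits
rerank_candidates(logits, [1, 2, 3])     # → [3, 2, 1]
```

Because the final submission is scored with MAP, returning the full candidate ordering (rather than only the top-1) is what earns partial credit when the true misconception is ranked second or lower.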