Large Language Models for Mathematical Analysis

Mathematical problem-solving is a key field in artificial intelligence (AI) and a critical benchmark for evaluating the capabilities of large language models (LLMs). While extensive research has focused on computational aspects of mathematics, there remains a significant gap in addressing proof-based tasks—particularly those found in mathematical analysis. Rigorous reasoning, precise definitions, and formal proofs are essential components of real analysis, yet these aspects have been largely overlooked in existing datasets and methodologies.

To bridge this gap, we introduce the DEMI-MathAnalysis dataset, a collection of proof-oriented problems drawn from topics such as Sequences and Limits, Infinite Series, and Convex Functions. Accompanying this dataset is a guiding framework designed to help LLMs navigate the complexities of formal reasoning, ensuring that they not only produce answers but also justify their steps in a logically consistent and mathematically sound manner.

By fine-tuning LLMs on DEMI-MathAnalysis and employing our framework, we have observed significant improvements in their capacity to generate logical, complete, and elegant proofs. This work not only highlights critical areas of mathematical reasoning previously underserved by AI research but also contributes to the development of more trustworthy AI models capable of understanding and producing formalized mathematical language.

Overview

The DEMI-MathAnalysis project provides:

A Dataset: A curated collection of proof-based real analysis problems sourced from authoritative texts.
A Framework: A pipeline to guide LLMs in understanding problems, retrieving relevant theorems and definitions, and generating logically sound proofs.

Motivation

Most mathematical datasets for LLMs focus on computational questions, leaving a significant gap in training for formal reasoning tasks. DEMI-MathAnalysis addresses this gap by supplying proof-oriented challenges. By doing so, we help models develop the precise and structured thinking required in advanced mathematics.

Dataset

DEMI-MathAnalysis covers topics often studied in undergraduate real analysis courses:

Sequences and Limits
Infinite Series
Continuous Functions
Differentiation
Integration and Improper Integrals
Series of Functions
Approximation by Polynomials
Convex Functions

The dataset is split into:

Training Set: Used for fine-tuning models.
Benchmark Set: Reserved for unbiased evaluation.

Each problem includes a detailed statement and metadata. Data files are available here.

Framework and Model

A special framework is built to guide LLMs through a structured approach to solving proof-based mathematical problems. It consists of the following key steps:

Problem Identification:
The LLM classifies each problem into a specific category (e.g., Sequences and Limits, Infinite Series, Convex Functions). This step ensures that the reasoning strategies and retrieved knowledge are tailored to the nature of the problem.
Prompt Construction:
Once categorized, the system constructs a comprehensive prompt. This includes the original problem statement, the identified category, and relevant domain knowledge from a curated knowledge base (KB). These structured prompts provide the model with the precise context and definitions needed for logical reasoning.
Knowledge Base Integration:
The KB contains essential definitions, theorems, and lemmas from real analysis. By dynamically retrieving the most relevant pieces of information, the framework equips the LLM with a foundation of reliable mathematical principles, ensuring that solutions rest on correct formal underpinnings.
Solution Generation:
The fine-tuned LLM, now guided by explicit reasoning paths and precise knowledge, produces a step-by-step proof. This involves leveraging methods like ε–δ arguments, limit definitions, and convergence criteria. The aim is to ensure that the solution is not only correct but also rigorously and formally justified.

To Run the Models

Install the required packages：

pip install -r requirements.txt

Run the code in RealAnalysis_Final_Code.ipynb

If you want to change the model, change the model_name below:

if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "MathAnalysis_Qwen_Classifier", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 10240,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "MathAnalysis_Qwen_ProblemSolver", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 10240,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

"MathAnalysis_Qwen_Classifier" and "MathAnalysis_Qwen_Classifier": fine-tuned Qwen2.5-Math-7B-bnb-4bit
"MathAnalysis_Llama_Classifier" and "MathAnalysis_Llama_Classifier": fine-tuned Llama-3.2-3B-Instruct

If you want to choose another LLM and fine-tune it, run the code in pretraining.ipynb.

Reference

@software{unsloth,
  author = {Daniel Han, Michael Han and Unsloth team},
  title = {Unsloth},
  url = {http://github.com/unslothai/unsloth},
  year = {2023}
}

@book{demidovich1964problems,
    title={Problems in Mathematical Analysis. Edited by B. Demidovich. Translated From the Russian by G. Yankovsky},
    author={Demidovich, B.P.},
    series={Russian Monographs and Texts on Advanced Mathematics and Physics},
    url={https://books.google.com/books?id=XdmpwgEACAAJ},
    year={1964},
    publisher = {Mir Publishers}  
}

@book{hata2007problems,
  title={Problems and Solutions in Real Analysis},
  author={Hata, M.},
  isbn={9789812776013},
  lccn={2008295629},
  series={Series on number theory and its applications},
  url={https://books.google.com/books?id=vSxkRgQe0AcC},
  year={2007},
  publisher={World Scientific}
}

@misc{dubey2024llama3herdmodels,
      title={The Llama 3 Herd of Models}, 
      author={Meta},
      year={2024},
      eprint={2407.21783},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2407.21783}, 
}

@misc{yang2024qwen2technicalreport,
      title={Qwen2 Technical Report}, 
      author={Alibaba},
      year={2024},
      eprint={2407.10671},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.10671}, 
}

@techreport{GPT4o,
    author = {OpenAI},
    title = {GPT-4o System Card},
    year = {2024}, 
    url = {https://cdn.openai.com/gpt-4o-system-card.pdf}
}

@techreport{OpenAIo1,
    author = {OpenAI},
    title = {OpenAI o1 System Card},
    year = {2024}, 
    url = {https://assets.ctfassets.net/kftzwdyauwt9/67qJD51Aur3eIc96iOfeOP/71551c3d223cd97e591aa89567306912/o1_system_card.pdf}
}

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
MathAnalysis_Llama_Classifier		MathAnalysis_Llama_Classifier
MathAnalysis_Llama_ProblemSolver		MathAnalysis_Llama_ProblemSolver
MathAnalysis_Qwen_Classifier		MathAnalysis_Qwen_Classifier
MathAnalysis_Qwen_ProblemSolver		MathAnalysis_Qwen_ProblemSolver
img		img
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
RealAnalysis_Final_Code.ipynb		RealAnalysis_Final_Code.ipynb
benchmark_data.csv		benchmark_data.csv
pretraining.ipynb		pretraining.ipynb
pretraining_data.csv		pretraining_data.csv
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Large Language Models for Mathematical Analysis

Table of Contents

Overview

Motivation

Dataset

Framework and Model

To Run the Models

Reference

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

ziye2chen/LLMs-for-Mathematical-Analysis

Folders and files

Latest commit

History

Repository files navigation

Large Language Models for Mathematical Analysis

Table of Contents

Overview

Motivation

Dataset

Framework and Model

To Run the Models

Reference

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages