Welcome to the official implementation of the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning." This repository contains all the necessary code and resources to replicate the findings of our research.
In recent years, large language models have made significant strides in understanding and generating human-like text. However, scaling these models for reasoning tasks presents unique challenges. Our paper introduces a novel approach using reinforcement learning to enhance the reasoning capabilities of diffusion models.
This repository provides the official code implementation to facilitate further research and experimentation in this area.
To get started, you can download the necessary files from our Releases section. Follow the instructions below to set up the environment and run the code.
-
Clone the repository:
git clone https://github.com/helauqwfdd/d1.git cd d1 -
Install dependencies:
We recommend using
pipfor managing your Python packages. You can install the required packages with:pip install -r requirements.txt
-
Download the model files:
Make sure to download the model files from the Releases section and place them in the appropriate directory as specified in the documentation.
To use the implementation, you can run the following command:
python main.py --config config.yamlThe configuration file config.yaml allows you to customize various parameters such as:
- Model architecture: Choose between different diffusion models.
- Training settings: Adjust batch size, learning rate, and epochs.
- Reinforcement learning settings: Configure the reward structure and training strategy.
Here’s a simple example of how to set up the model for training:
model:
type: diffusion
layers: 12
hidden_size: 768
training:
batch_size: 32
learning_rate: 0.001
epochs: 10We conducted extensive experiments to evaluate the performance of our model. The results demonstrate significant improvements in reasoning tasks compared to baseline models.
- Accuracy: Measures the correctness of predictions.
- F1 Score: Evaluates the balance between precision and recall.
- Inference Time: Time taken to generate predictions.
The full results can be found in the paper, along with detailed analysis and comparisons.
We welcome contributions from the community! If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them.
- Push to your fork and create a pull request.
Please ensure your code adheres to our coding standards and includes appropriate tests.
This project is licensed under the MIT License. See the LICENSE file for details.
For any inquiries or feedback, please reach out to us via the GitHub Issues page or contact the authors directly.
Thank you for your interest in our work! We hope you find this implementation useful. For more information, visit our Releases section to download the necessary files and get started.