d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning 🚀

Welcome to the official implementation of the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning." This repository contains all the necessary code and resources to replicate the findings of our research.

Table of Contents

  • Introduction
  • Getting Started
  • Installation
  • Usage
  • Configuration
  • Example
  • Results
  • Evaluation Metrics
  • Contributing
  • License
  • Contact

Introduction

In recent years, large language models have made significant strides in understanding and generating human-like text. However, scaling these models for reasoning tasks presents unique challenges. Our paper introduces a novel approach that uses reinforcement learning to enhance the reasoning capabilities of diffusion large language models.

This repository provides the official code implementation to facilitate further research and experimentation in this area.

Getting Started

To get started, you can download the necessary files from our Releases section. Follow the instructions below to set up the environment and run the code.

Installation

  1. Clone the repository:

    git clone https://github.com/helauqwfdd/d1.git
    cd d1
  2. Install dependencies:

    We recommend using pip for managing your Python packages. You can install the required packages with:

    pip install -r requirements.txt
  3. Download the model files:

    Make sure to download the model files from the Releases section and place them in the appropriate directory as specified in the documentation.

Usage

To use the implementation, you can run the following command:

python main.py --config config.yaml

Configuration

The configuration file config.yaml allows you to customize various parameters such as:

  • Model architecture: Choose between different diffusion models.
  • Training settings: Adjust batch size, learning rate, and epochs.
  • Reinforcement learning settings: Configure the reward structure and training strategy (a reward sketch follows this list).
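
Because the reward structure drives the RL stage, here is a minimal, hypothetical sketch of a rule-based correctness reward of the kind commonly used for reasoning tasks. The function names, signatures, and the answer-extraction heuristic are illustrative assumptions, not this repository's actual API:

    # Hypothetical rule-based rewards (illustrative only; the actual reward
    # implementation in this repository may differ).
    import re

    def correctness_reward(completion: str, gold_answer: str) -> float:
        """Return 1.0 if the last number in the completion matches the gold answer."""
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        return 1.0 if numbers and numbers[-1] == gold_answer else 0.0

    def format_reward(completion: str) -> float:
        """Return a small bonus when the completion follows an <answer> tag template."""
        return 0.5 if re.search(r"<answer>.*?</answer>", completion, re.DOTALL) else 0.0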

Example

Here’s a simple example of how to set up the model for training:

model:
  type: diffusion
  layers: 12
  hidden_size: 768

training:
  batch_size: 32
  learning_rate: 0.001
  epochs: 10
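
main.py presumably reads this file at startup. As a rough sketch, assuming PyYAML is among the installed dependencies, a config like the one above can be loaded as follows; the repository's actual loading code may differ:

    # Minimal config-loading sketch (assumes PyYAML; the actual loader may differ).
    import yaml

    with open("config.yaml") as f:
        config = yaml.safe_load(f)

    # Top-level sections map to nested dictionaries.
    print(config["model"]["hidden_size"])    # 768
    print(config["training"]["batch_size"])  # 32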

Results

We conducted extensive experiments to evaluate the performance of our model. The results demonstrate significant improvements in reasoning tasks compared to baseline models.

Evaluation Metrics

  • Accuracy: Measures the correctness of predictions.
  • F1 Score: Evaluates the balance between precision and recall.
  • Inference Time: Time taken to generate predictions.
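
For reference, here is a minimal sketch of how these metrics might be computed with scikit-learn and the standard library; the toy labels are placeholders, and the repository's own evaluation scripts may differ:

    # Toy metric computation (illustrative data; actual evaluation may differ).
    import time
    from sklearn.metrics import accuracy_score, f1_score

    y_true = [1, 0, 1, 1]  # gold labels
    y_pred = [1, 0, 0, 1]  # model predictions

    print("Accuracy:", accuracy_score(y_true, y_pred))
    print("F1 score:", f1_score(y_true, y_pred))

    # Inference time: wall-clock time around the generation call.
    start = time.perf_counter()
    # ... run model inference here ...
    print("Inference time (s):", time.perf_counter() - start)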

The full results can be found in the paper, along with detailed analysis and comparisons.

Contributing

We welcome contributions from the community! If you would like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Push to your fork and create a pull request.

Please ensure your code adheres to our coding standards and includes appropriate tests.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any inquiries or feedback, please reach out to us via the GitHub Issues page or contact the authors directly.

Thank you for your interest in our work! We hope you find this implementation useful. For more information, visit our Releases section to download the necessary files and get started.
