This repository hosts the dataset accompanying the paper "On the Use of LLMs to Generate a Dataset of Neural Networks".
The dataset contains PyTorch-based neural network code automatically generated using GPT-5 according to specific requirements.
The dataset was generated to support research on neural network code verification, refactoring, and migration, with a focus on improving the reliability and adaptability of network implementations.
Each network is generated based on a set of requirements describing:
- Architecture
- Task
- Input type and scale
- Complexity level
All networks are implemented in PyTorch.
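The four requirement dimensions listed above can be pictured as a small record. This is a minimal sketch only: the class name `NNRequirements`, its field names, and the example values are hypothetical and are not taken from the repository or the paper.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class NNRequirements:
    """Hypothetical container for the requirement dimensions of one network."""
    architecture: str   # e.g. "CNN" (illustrative value)
    task: str           # e.g. "classification" (illustrative value)
    input_type: str     # e.g. "image" (illustrative value)
    input_scale: str    # e.g. "small" (illustrative value)
    complexity: str     # e.g. "simple" (illustrative value)

    def describe(self) -> str:
        """Render the requirements as a bullet list, one line per field."""
        return "\n".join(
            f"- {name.replace('_', ' ').capitalize()}: {value}"
            for name, value in asdict(self).items()
        )


if __name__ == "__main__":
    req = NNRequirements("CNN", "classification", "image", "small", "simple")
    print(req.describe())
```

A frozen dataclass is used here so that a requirement set cannot be modified after it is created, which keeps it safe to reuse as a dictionary key or to compare across generated files.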
You can clone and explore the dataset locally:
git clone https://github.com/BESSER-PEARL/LLM-Generated-NN-Dataset.git
cd LLM-Generated-NN-Dataset
Before generating data, install the required dependencies using the provided requirements.txt file:
pip install -r requirements.txt
Then create a .env file in the project’s root directory containing your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
This key is used by the generation script to access the GPT-5 API.
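To check that the key is readable before starting a generation run, a small standard-library sketch such as the following may help. The helper names `load_env_file` and `get_api_key` are hypothetical, and how generate_nn.py itself loads the key is not described here, so this is an assumption about one possible approach rather than the script's actual mechanism.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env"):
    """Return OPENAI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")
    return key
```

Calling `get_api_key()` from the project root raises a `RuntimeError` if the placeholder value was left unchanged, which surfaces a configuration problem before any API request is made.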
To generate a neural network implementation using GPT-5, run:
python generate_nn.py
The generated NN architecture is stored in the dataset_nns/ directory.
Each .py file in dataset_nns/ starts with the prompt that was used for its generation, followed by the produced PyTorch implementation.
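When exploring the dataset, it can be useful to separate the prompt from the implementation. The sketch below assumes the prompt is stored either as the module docstring or as a run of leading `#` comment lines; the exact encoding is not stated here, so check a few files first. The function name `read_prompt` is hypothetical.

```python
import ast
from pathlib import Path


def read_prompt(path):
    """Return the leading prompt of a generated file (docstring or '#' comments)."""
    source = Path(path).read_text(encoding="utf-8")

    # Case 1 (assumed): the prompt is the module-level docstring.
    docstring = ast.get_docstring(ast.parse(source))
    if docstring is not None:
        return docstring

    # Case 2 (assumed): the prompt is a block of leading comment lines.
    prompt_lines = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            prompt_lines.append(stripped.lstrip("#").strip())
        elif stripped:
            break  # first line of real code ends the prompt block
    return "\n".join(prompt_lines)


if __name__ == "__main__":
    for nn_file in sorted(Path("dataset_nns").glob("*.py")):
        first_line = read_prompt(nn_file).splitlines()[:1]
        print(nn_file.name, "->", first_line)
```

Because the file is parsed with `ast` rather than imported, PyTorch does not need to be installed to inspect the prompts. A file with a syntax error will raise `SyntaxError` in `ast.parse`.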
The verify_nn.py script validates the generated networks against their specification.
It checks compliance with specified architecture, task, input type and scale, and complexity requirements to ensure consistency and correctness.
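As an illustration of what a compliance check of this kind can look like, the sketch below statically counts `nn.<Layer>(...)` constructor calls in a generated file and compares the count with an allowed range. This is not the logic of verify_nn.py: the function names, the excluded container set, and the idea of expressing complexity as a layer-count range are all assumptions made for the example.

```python
import ast

# Containers group layers but are not layers themselves (assumed exclusion list).
CONTAINERS = {"Sequential", "ModuleList", "ModuleDict"}


def count_nn_layers(source):
    """Count calls of the form nn.<Name>(...) in the source, ignoring containers."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "nn"
            and node.func.attr not in CONTAINERS
        ):
            count += 1
    return count


def within_layer_range(source, min_layers, max_layers):
    """Return True if the layer count lies in [min_layers, max_layers]."""
    return min_layers <= count_nn_layers(source) <= max_layers


if __name__ == "__main__":
    example = (
        "import torch.nn as nn\n"
        "model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))\n"
    )
    print(count_nn_layers(example), within_layer_range(example, 2, 5))
```

A static count like this only sees layers constructed through the `nn` alias and counts activations as layers, so it is a rough proxy; a stricter check would instantiate the model and inspect its modules.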
To run validation:
python verify_nn.py
The repository includes the analysis_depth.py script, which reproduces the plot of NN depth as presented in the paper.
Run it with:
python analysis_depth.py
Four NNs have been trained and evaluated on benchmark datasets.
The scripts are available in the train_test_benchmark_nns/ directory.
To train and evaluate the NN used with the tabular California Housing benchmark dataset, run:
python train_test_benchmark_nns/tabular_nn_selected.py