This project generates code samples from provided function signatures, descriptions, and input code using a language model. It supports both Python and Java prompts and leverages the Hugging Face `transformers` library for generation.
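As a rough end-to-end illustration of that workflow, generation with `transformers` looks like the sketch below. The model name and prompt are placeholders, not the project's actual configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the project's actual model choice lives in main.py.
model_name = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A toy prompt: a function signature plus a docstring.
prompt = 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```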
Repository layout:

```
.
├── .gitignore
├── main.py
├── module.py
├── prompt_gen/
│   ├── CoderEval-Input4Models/
│   │   ├── CEJavaHumanLabel.jsonl
│   │   ├── CEJavaRaw.jsonl
│   │   ├── CEPythonHumanLabel.jsonl
│   │   └── CEPythonRaw.jsonl
│   ├── load_python_prompts.py
│   └── prompts/
│       ├── java_generated_prompts.jsonl
│       └── python_generated_prompts.jsonl
├── README.md
├── requirements.txt
├── results/
└── test.py
```
- `main.py`: The main script to load prompts, set up the model, and generate code samples.
- `module.py`: Utility functions for loading prompts and extracting Python code.
- `prompt_gen/`: Input JSONL files and scripts for generating prompts.
  - `CoderEval-Input4Models/`: Raw and human-labeled JSONL files for Python and Java.
  - `load_python_prompts.py`: Script to generate prompts from the raw JSONL files.
  - `prompts/`: Pre-generated prompts for Python and Java.
- `requirements.txt`: Lists the dependencies required for the project.
- `results/`: Directory for the generated results.
- `test.py`: Script for testing purposes.
To set up and run the project:

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd CoderEval-Prompt-Inference
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Ensure the necessary model files are cached or downloaded (see the pre-caching sketch after this list).

- Generate prompts from the raw JSONL files:

  ```bash
  python prompt_gen/load_python_prompts.py
  ```

- Run the main script to generate code samples:

  ```bash
  python main.py
  ```

- The generated code samples are saved in the `generated_outputs` directory.
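For the model-caching step above, a minimal pre-downloading sketch with `transformers` (the model name is a placeholder; `./models` is the cache directory mentioned at the end of this README):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model name; substitute whichever model main.py loads.
MODEL_NAME = "Salesforce/codegen-350M-mono"

# Downloading once with cache_dir="./models" pre-populates the local
# cache that later runs read from instead of re-downloading.
AutoTokenizer.from_pretrained(MODEL_NAME, cache_dir="./models")
AutoModelForCausalLM.from_pretrained(MODEL_NAME, cache_dir="./models")
```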
Key functions:

- `load_prompts() -> tuple`: Loads and returns the Python and Java prompts from the pre-generated JSONL files.
- `extract_python_code(code_string: str) -> str`: Extracts Python code from the given content, removing all comments and docstrings (one possible implementation is sketched after this list).
- `generate_prompts(file_path, language)`: Generates prompts from a JSONL file and stores them in a dictionary (see the second sketch below).
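The repository's implementation of `extract_python_code` is not reproduced here; one plausible approach is an `ast` round-trip, since parsing discards comments and docstrings can be dropped from each definition body before `ast.unparse` (Python 3.9+) re-emits the code:

```python
import ast

def strip_comments_and_docstrings(source: str) -> str:
    """Illustrative only: ast.parse drops comments; we then delete the
    leading string-constant statement (the docstring) from every
    module, class, and function body before unparsing."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                body.pop(0)                  # drop the docstring
                if not body:
                    body.append(ast.Pass())  # keep the body valid
    return ast.unparse(tree)
```

Note that round-tripping through `ast` also normalizes formatting, which may or may not matter for downstream evaluation.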
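Likewise, a hypothetical sketch of what `generate_prompts` might do; the record fields (`_id`, `signature`, `docstring`) are assumptions, not the actual CoderEval JSONL schema:

```python
import json

def generate_prompts(file_path: str, language: str) -> dict:
    """Hypothetical reimplementation: one prompt string per record ID."""
    prompts = {}
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # Field names below are guesses about the JSONL schema.
            prompts[record["_id"]] = (
                f"Write a {language} function {record['signature']} "
                f"that does the following:\n{record['docstring']}"
            )
    return prompts
```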
The models and tokenizers are loaded and cached in the `./models` directory.
The project uses the `logging` module to log information and errors during execution; logs are printed to the console.
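A minimal sketch of the kind of console logging setup this implies (the project's exact format and level may differ):

```python
import logging

# Log to the console with timestamps and severity levels.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("Loaded %d prompts", 42)
logging.error("Model generation failed: %s", "example error")
```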