This is a tool for generating and evaluating FIM (fill-in-the-middle) code completion tasks on datasets in four languages: Java, Python, C++, and JavaScript.
For all four languages, the model is given the code above and below a masked block and must predict the missing middle. Evaluation metrics (illustrated in the sketch after this list) include:
- Exact Match
- BLEU-4
- CodeBLEU
- Length (Pred/Ref)
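For illustration, the two simpler metrics could be computed as in the sketch below. This is a minimal sketch, not the tool's actual implementation: the `prediction`/`reference` field names are assumptions, and BLEU-4/CodeBLEU are omitted since they require n-gram and syntax-aware matching.

```python
# Minimal sketch of Exact Match and Length (Pred/Ref).
# Assumes each sample is a dict with hypothetical "prediction" and
# "reference" keys; the tool's real schema may differ.
def exact_match(samples):
    """Fraction of predictions identical to the reference (whitespace-stripped)."""
    hits = sum(s["prediction"].strip() == s["reference"].strip() for s in samples)
    return hits / len(samples)

def length_ratio(samples):
    """Average ratio of prediction length to reference length, in characters."""
    return sum(len(s["prediction"]) / max(len(s["reference"]), 1) for s in samples) / len(samples)

samples = [
    {"prediction": "return a + b;", "reference": "return a + b;"},
    {"prediction": "return a - b;", "reference": "return a + b;"},
]
print(exact_match(samples))   # 0.5
print(length_ratio(samples))  # 1.0
```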
To run the model inference and evaluation code, you'll need the following environment setup:
- Python 3.8 or higher
- PyTorch 2.1.0 or higher
- sentencepiece 0.2.0 or higher
- transformers 4.34.1 or higher (if running inference via the transformers library)
Please ensure all dependencies are installed using the following commands:
conda create -n aixcoder-evaluation python=3.11
conda activate aixcoder-evaluation
pip install -r requirements.txt

requirements.txt lists all necessary libraries and their versions.
To achieve faster inference speeds, especially for large models, we recommend installing flash attention. Flash attention is an optimized attention mechanism that significantly reduces computation time for transformer-based models without sacrificing accuracy.
Before proceeding, ensure your environment meets the CUDA requirements, as flash attention leverages GPU acceleration. Follow these steps to install flash attention:
git clone git@github.com:Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=8 python setup.py install

After setup, extract the evaluation datasets:

cd datasets
tar zxvf *.tar.gz

Here's an example of a generation task:

python run_inference.py --model aiXcoder/aixcoder-7b-base --language java
- `--model`: model name on Hugging Face, e.g.:
- deepseek-ai/deepseek-coder-6.7b-base
- aiXcoder/aixcoder-7b-base
- codellama/CodeLlama-7b-hf
- bigcode/starcoder2-7b
  You can also set this to the path of locally downloaded model weights.
- `--language`: dataset language. Supports four languages: Python, Java, C++, and JavaScript. You can set a single language or several at once, separated by spaces.
- `--output_dir`: output path for the generated results; by default they are saved in the `output_dir` folder in the current directory.
- `--device`: the CUDA device to use; default `cuda`.
- `--torch_dtype`: precision; default `bf16`, can be set to `fp32`, `fp16`, or `bf16`.
- `--attn_implementation`: whether to use FlashAttention; default `True`. If your environment does not support FlashAttention, set this to `False`. (These options map onto standard transformers settings; see the sketch after this list.)
- `--gen_len`: maximum generation length (`max_new_tokens`); default 512.
- `--max_len`: maximum context length the model can handle; default 16384.
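For reference, the sketch below shows how these flags would typically be wired up with the transformers library. It assumes a transformers version that accepts the `attn_implementation` argument (older 4.34.x releases used `use_flash_attention_2=True` instead), and the prompt string is a placeholder since the FIM template is model-specific.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiXcoder/aixcoder-7b-base"        # --model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # --torch_dtype bf16
    attn_implementation="flash_attention_2",  # --attn_implementation True
).to("cuda")                                  # --device cuda

# "..." is a placeholder: each model defines its own FIM prefix/suffix template.
inputs = tokenizer("...", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=512)  # --gen_len 512
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```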
Here's an example of an evaluation task:

python run_evaluate.py
- `--language`: the language to evaluate. Supports four languages: Python, Java, C++, and JavaScript. You can set a single language or several at once, separated by spaces.
- `--result_path`: output path for the evaluation results; by default they are stored in the `output_dir` folder in the current directory. Two files are generated, with the suffixes `_scored.jsonl` and `_statistics.txt`. The scores for each task type and the averaged overall results are recorded in `_statistics.txt`.
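As an illustration, the per-example scores in a `_scored.jsonl` file could be aggregated like this. The file path and the `task_type`/`exact_match` field names are assumptions for the sketch; check a generated file for the actual keys.

```python
import json
from collections import defaultdict

# Hypothetical aggregation over a *_scored.jsonl file; the path and the
# "task_type"/"exact_match" keys are assumptions, not the tool's guaranteed schema.
scores = defaultdict(list)
with open("output_dir/java_scored.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        scores[record["task_type"]].append(record["exact_match"])

for task_type, values in sorted(scores.items()):
    print(f"{task_type}: mean exact match = {sum(values) / len(values):.4f}")
```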