We introduce EffiReasonTrans, a training framework for code translation that aims to improve translation accuracy while keeping inference latency under control.
EffiReasonTrans consists of the following three components:
- Data synthesis: We construct a reasoning-augmented dataset in two steps: we first collect clean source programs with reliable test cases, and then generate (source code, reasoning, target code) triplets with a reasoning-capable LLM (DeepSeek-R1), filtering them by automated syntax checks and functional validation. The final dataset we constructed is data_synthesis/training_data/filtered_training_data.jsonl.
- Supervised fine-tuning: Based on the synthesized data, we perform supervised fine-tuning to give the model a strong initialization.
- Reinforcement learning: To further enhance translation performance, we apply reinforcement learning with the GRPO algorithm, guided by a custom dual-objective reward that balances execution correctness (test-case pass rate) and output conciseness (length tolerance); a minimal sketch of such a reward is shown right after this list.
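The actual reward is implemented in the training scripts; the snippet below is only a minimal sketch of how a dual-objective reward of this shape can be combined. The weights (execution = 2.0, length = 0.5) are the ones listed in the GRPO configuration files, while the linear length-tolerance shape, the token budget, and the function names are illustrative assumptions.

```python
# Minimal sketch of a dual-objective reward combining execution correctness and
# output conciseness. Weights match the GRPO configs (execution = 2.0, length = 0.5);
# the length-tolerance shape and token budget are illustrative assumptions.

W_EXEC, W_LEN = 2.0, 0.5

def length_reward(num_tokens: int, budget: int = 1024) -> float:
    """1.0 within the tolerated budget, decaying linearly to 0.0 beyond it."""
    if num_tokens <= budget:
        return 1.0
    return max(0.0, 1.0 - (num_tokens - budget) / budget)

def dual_objective_reward(pass_rate: float, num_tokens: int, budget: int = 1024) -> float:
    """pass_rate is the fraction of test cases the translated program passes, in [0, 1]."""
    return W_EXEC * pass_rate + W_LEN * length_reward(num_tokens, budget)

# Example: a translation that passes 7/8 tests using 900 generated tokens.
print(dual_objective_reward(pass_rate=7 / 8, num_tokens=900))  # -> 2.25
```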
You can create and activate the environment using the provided environment.yml:

```bash
conda env create -f environment.yml
```

Step 1: Generate Raw Reasoning Outputs
First, use data_synthesis/generate_reasoning_data.py to generate raw outputs containing reasoning-augmented translations.
```bash
python data_synthesis/generate_reasoning_data.py \
  --apikey YOUR_API_KEY \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --k 8 \
  --start 1 \
  --model deepseek-reasoner \
  --data_path data_synthesis/raw_data/raw_dataset.jsonl \
  --output_dir outputs
```

The generated raw data will be saved under: outputs/deepseek-reasoner.
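generate_reasoning_data.py handles batching, sampling, and output formatting; conceptually, each request retrieves both the reasoning trace and the final translation from the DeepSeek API. The sketch below shows that core call through the OpenAI-compatible client; the prompt wording and example program are our assumptions, not the script's actual prompts.

```python
# Minimal sketch of querying deepseek-reasoner for a reasoning-augmented translation.
# The prompt text is an illustrative assumption; the real script builds its own prompts.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

source_code = 'public class Main { public static void main(String[] a) { System.out.println("hi"); } }'
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user",
               "content": f"Translate the following Java program to Python:\n{source_code}"}],
)

reasoning = response.choices[0].message.reasoning_content  # chain-of-thought trace
translation = response.choices[0].message.content          # final target code
```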
Step 2: Process Generated Code into Executable Scripts
Then, use data_synthesis/process_generated_data.py to extract and convert the generated target code into executable scripts.
```bash
python data_synthesis/process_generated_data.py \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --model deepseek-reasoner \
  --output_dir outputs
```

The processed executable scripts will be saved under: outputs/deepseek-reasoner.
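The extraction logic is specific to the script; as a rough illustration of the kind of processing involved, the sketch below pulls the last fenced code block out of a raw model output. The regex and the assumption that the target code sits in a fenced block are ours, not necessarily the script's.

```python
# Illustrative sketch: extract the final fenced code block from a raw model output.
# The real process_generated_data.py may apply different rules.
import re

def extract_target_code(raw_output: str) -> str | None:
    """Return the contents of the last ```...``` block, or None if no block is found."""
    blocks = re.findall(r"```[a-zA-Z0-9_+-]*\n(.*?)```", raw_output, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None

example = "Reasoning...\n```python\nprint('hello')\n```"
print(extract_target_code(example))  # -> print('hello')
```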
Step 3: Filter and Collect Valid Reasoning Contents
Finally, run data_synthesis/collect_trans_conversation.py to execute the scripts, discard incorrect generations, and collect only valid reasoning pairs.
```bash
python data_synthesis/collect_trans_conversation.py \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --timeout 1 \
  --model deepseek-reasoner \
  --input_dir outputs \
  --output_dir data_synthesis/training_data/raw
```
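The filtering step keeps only generations whose translated program actually runs and produces the expected output within the timeout. Below is a minimal sketch of that kind of check for a single test case; the function name and the exact output comparison are assumptions, and collect_trans_conversation.py applies its own, more complete test-case handling.

```python
# Illustrative sketch of functional validation: run a generated script on one test case
# and keep it only if it produces the expected output before the timeout.
import subprocess

def passes_test(script_path: str, stdin_data: str, expected_stdout: str, timeout: float = 1.0) -> bool:
    try:
        result = subprocess.run(
            ["python", script_path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.strip() == expected_stdout.strip()
```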
Stage 1: Supervised Fine-Tuning

In this stage, we perform supervised fine-tuning on the reasoning-augmented dataset to initialize the model.
```bash
python training_scripts/sft_dsr1_distill_qw_1.5b.py \
  --lr $LR \
  --sched $SCHEDULER \
  --epochs $EPOCHS \
  --bs $BATCH_SIZE \
  --gs $GRAD_ACC_STEPS \
  --model_path "$MODEL_PATH" \
  --data_path "$DATA_PATH" \
  --output_path "$OUTPUT_PATH"
```
You can use --help to see the description of each parameter.
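The SFT data format is defined by the training script and the filtered JSONL. As a rough, assumption-laden sketch of how a (source code, reasoning, target code) triplet can be turned into a chat-style training sample for a DeepSeek-R1-distilled model: the key names, prompt wording, and <think> wrapping below are our assumptions, not necessarily the repository's format.

```python
# Illustrative sketch only: assembling one SFT sample from a reasoning triplet.
# Field names and the <think> wrapping are assumptions, not the repository's format.
def build_sft_sample(source: str, reasoning: str, target: str,
                     src_lang: str = "Java", tgt_lang: str = "Python") -> dict:
    prompt = f"Translate the following {src_lang} program to {tgt_lang}:\n{source}"
    completion = f"<think>\n{reasoning}\n</think>\n\n{target}"
    return {"messages": [{"role": "user", "content": prompt},
                         {"role": "assistant", "content": completion}]}
```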
Stage 2: Reinforcement Learning

In this stage, we fine-tune the model using reinforcement learning to further optimize performance.
```bash
python training_scripts/rl_grpo.py path/to/config.yaml
```

A configuration template can be found at: training_scripts/grpo_config_files/config_template.yaml.
After training, we evaluate model performance using both accuracy-based metrics and efficiency-related metrics.
```bash
bash evaluation/run_eval.sh
python evaluation/evaluation_token_per_sec.py
```

The training parameters are all provided in training_scripts/grpo_config_files (e.g., learning rate 2.0e-6, cosine scheduler with min lr, three epochs, batch size 8, gradient accumulation 4, reward weights: execution = 2.0, length = 0.5). Decoding follows the default model settings (temperature 0.6, top_p 0.95).
- The latency protocol is implemented in evaluation/evaluation_token_per_sec.py and in the training scripts under the training_scripts directory.
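The exact latency protocol is defined by evaluation_token_per_sec.py; the sketch below only illustrates the underlying measurement, timing one generation and dividing the number of newly generated tokens by wall-clock time. The checkpoint name is a placeholder, and only the decoding settings (temperature 0.6, top_p 0.95) come from the defaults mentioned above.

```python
# Illustrative sketch of a tokens-per-second measurement with Hugging Face transformers.
# The checkpoint name is a placeholder; decoding settings match the default model settings.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Translate the following Java program to Python:\npublic class A {}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]  # tokens generated beyond the prompt
print(f"{new_tokens / elapsed:.2f} tokens/sec")
```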
