We introduce EffiReasonTrans, a training framework for code translation that aims to improve translation accuracy while keeping inference latency under control.
EffiReasonTrans consists of the following three components:
- Data synthesis: We construct a reasoning-augmented dataset in two steps: we first collect clean source programs with reliable test cases, and then generate (source code, reasoning, target code) triplets with a reasoning-capable LLM (DeepSeek-R1), filtering them by automated syntax checks and functional validation. The final dataset we constructed is data_synthesis/training_data/filtered_training_data.jsonl.
- Supervised fine-tuning: Based on the synthesized data, we perform supervised fine-tuning to give the model a strong initialization.
- Reinforcement learning: To further enhance translation performance, we apply reinforcement learning with the GRPO algorithm, guided by a custom dual-objective reward that balances execution correctness (test-case pass rate) and output conciseness (length tolerance); a minimal sketch of such a reward is shown right after this list.
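The actual reward is implemented in the training scripts; the snippet below is only a minimal sketch of how a dual-objective reward of this shape can be combined. The weights (execution = 2.0, length = 0.5) are the ones listed in the GRPO configuration files, while the linear length-tolerance shape, the token budget, and the function names are illustrative assumptions.

```python
# Minimal sketch of a dual-objective reward combining execution correctness and
# output conciseness. Weights match the GRPO configs (execution = 2.0, length = 0.5);
# the length-tolerance shape and token budget are illustrative assumptions.

W_EXEC, W_LEN = 2.0, 0.5

def length_reward(num_tokens: int, budget: int = 1024) -> float:
    """1.0 within the tolerated budget, decaying linearly to 0.0 beyond it."""
    if num_tokens <= budget:
        return 1.0
    return max(0.0, 1.0 - (num_tokens - budget) / budget)

def dual_objective_reward(pass_rate: float, num_tokens: int, budget: int = 1024) -> float:
    """pass_rate is the fraction of test cases the translated program passes, in [0, 1]."""
    return W_EXEC * pass_rate + W_LEN * length_reward(num_tokens, budget)

# Example: a translation that passes 7/8 tests using 900 generated tokens.
print(dual_objective_reward(pass_rate=7 / 8, num_tokens=900))  # -> 2.25
```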
You can create and activate the environment using the provided environment.yml:

```bash
conda env create -f environment.yml
```

Step 1: Generate Raw Reasoning Outputs
First, use data_synthesis/generate_reasoning_data.py to generate raw outputs containing reasoning-augmented translations.
```bash
python data_synthesis/generate_reasoning_data.py \
  --apikey YOUR_API_KEY \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --k 8 \
  --start 1 \
  --model deepseek-reasoner \
  --data_path data_synthesis/raw_data/raw_dataset.jsonl \
  --output_dir outputs
```

The generated raw data will be saved under: outputs/deepseek-reasoner.
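generate_reasoning_data.py handles batching, sampling, and output formatting; conceptually, each request retrieves both the reasoning trace and the final translation from the DeepSeek API. The sketch below shows that core call through the OpenAI-compatible client; the prompt wording and example program are our assumptions, not the script's actual prompts.

```python
# Minimal sketch of querying deepseek-reasoner for a reasoning-augmented translation.
# The prompt text is an illustrative assumption; the real script builds its own prompts.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

source_code = 'public class Main { public static void main(String[] a) { System.out.println("hi"); } }'
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user",
               "content": f"Translate the following Java program to Python:\n{source_code}"}],
)

reasoning = response.choices[0].message.reasoning_content  # chain-of-thought trace
translation = response.choices[0].message.content          # final target code
```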
Step 2: Process Generated Code into Executable Scripts
Then, use data_synthesis/process_generated_data.py to extract and convert the generated target code into executable scripts.
```bash
python data_synthesis/process_generated_data.py \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --model deepseek-reasoner \
  --output_dir outputs
```

The processed executable scripts will be saved under: outputs/deepseek-reasoner.
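The extraction logic is specific to the script; as a rough illustration of the kind of processing involved, the sketch below pulls the last fenced code block out of a raw model output. The regex and the assumption that the target code sits in a fenced block are ours, not necessarily the script's.

```python
# Illustrative sketch: extract the final fenced code block from a raw model output.
# The real process_generated_data.py may apply different rules.
import re

def extract_target_code(raw_output: str) -> str | None:
    """Return the contents of the last ```...``` block, or None if no block is found."""
    blocks = re.findall(r"```[a-zA-Z0-9_+-]*\n(.*?)```", raw_output, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None

example = "Reasoning...\n```python\nprint('hello')\n```"
print(extract_target_code(example))  # -> print('hello')
```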
Step 3: Filter and Collect Valid Reasoning Contents
Finally, run data_synthesis/collect_trans_conversation.py to execute the scripts, discard incorrect generations, and collect only valid reasoning pairs.
```bash
python data_synthesis/collect_trans_conversation.py \
  --src_lang SRC_LANG \
  --tgt_lang TGT_LANG \
  --timeout 1 \
  --model deepseek-reasoner \
  --input_dir outputs \
  --output_dir data_synthesis/training_data/raw
```
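The filtering step keeps only generations whose translated program actually runs and produces the expected output within the timeout. Below is a minimal sketch of that kind of check for a single test case; the function name and the exact output comparison are assumptions, and collect_trans_conversation.py applies its own, more complete test-case handling.

```python
# Illustrative sketch of functional validation: run a generated script on one test case
# and keep it only if it produces the expected output before the timeout.
import subprocess

def passes_test(script_path: str, stdin_data: str, expected_stdout: str, timeout: float = 1.0) -> bool:
    try:
        result = subprocess.run(
            ["python", script_path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.strip() == expected_stdout.strip()
```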
Stage 1: Supervised Fine-Tuning

In this stage, we perform supervised fine-tuning on the reasoning-augmented dataset to initialize the model.
```bash
python training_scripts/sft_dsr1_distill_qw_1.5b.py \
  --lr $LR \
  --sched $SCHEDULER \
  --epochs $EPOCHS \
  --bs $BATCH_SIZE \
  --gs $GRAD_ACC_STEPS \
  --model_path "$MODEL_PATH" \
  --data_path "$DATA_PATH" \
  --output_path "$OUTPUT_PATH"
```
You can use --help to see the description of each parameter.
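The SFT data format is defined by the training script and the filtered JSONL. As a rough, assumption-laden sketch of how a (source code, reasoning, target code) triplet can be turned into a chat-style training sample for a DeepSeek-R1-distilled model: the key names, prompt wording, and <think> wrapping below are our assumptions, not necessarily the repository's format.

```python
# Illustrative sketch only: assembling one SFT sample from a reasoning triplet.
# Field names and the <think> wrapping are assumptions, not the repository's format.
def build_sft_sample(source: str, reasoning: str, target: str,
                     src_lang: str = "Java", tgt_lang: str = "Python") -> dict:
    prompt = f"Translate the following {src_lang} program to {tgt_lang}:\n{source}"
    completion = f"<think>\n{reasoning}\n</think>\n\n{target}"
    return {"messages": [{"role": "user", "content": prompt},
                         {"role": "assistant", "content": completion}]}
```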
Stage 2: Reinforcement Learning

In this stage, we fine-tune the model using reinforcement learning to further optimize performance.
```bash
python training_scripts/rl_grpo.py path/to/config.yaml
```

A configuration template can be found at: training_scripts/grpo_config_files/config_template.yaml.
After training, we evaluate model performance using both accuracy-based metrics and efficiency-related metrics.
```bash
bash evaluation/run_eval.sh
python evaluation/evaluation_token_per_sec.py
```

The training parameters are all provided in training_scripts/grpo_config_files (e.g., learning rate 2.0e-6, cosine scheduler with min lr, three epochs, batch size 8, gradient accumulation 4, reward weights: execution = 2.0, length = 0.5). Decoding follows the default model settings (temperature 0.6, top_p 0.95).
- The latency protocol is implemented in evaluation/evaluation_token_per_sec.py and in the training scripts under the training_scripts directory.
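The exact latency protocol is defined by evaluation_token_per_sec.py; the sketch below only illustrates the underlying measurement, timing one generation and dividing the number of newly generated tokens by wall-clock time. The checkpoint name is a placeholder, and only the decoding settings (temperature 0.6, top_p 0.95) come from the defaults mentioned above.

```python
# Illustrative sketch of a tokens-per-second measurement with Hugging Face transformers.
# The checkpoint name is a placeholder; decoding settings match the default model settings.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Translate the following Java program to Python:\npublic class A {}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]  # tokens generated beyond the prompt
print(f"{new_tokens / elapsed:.2f} tokens/sec")
```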
