This repository contains artifacts of AlphaTrans from the paper "Repository-Level Compositional Code Translation and Validation".
We provide a Dockerfile
which installs all necessary dependencies to reproduce the results of AlphaTrans. Please execute the following to create a docker image and execute the container in interactive mode:
docker build --no-cache -t alphatrans .
docker run -it alphatrans bash
Please refer to Reproduce AlphaTrans Results for instructions on how to reproduce the results of AlphaTrans. If you are interested in translating more projects, please refer to Translate New Java Projects.
AlphaTrans currently supports prompting OpenAI models (e.g., GPT-4o-2024-11-20
) and open-source models (e.g., deepseek-ai/deepseek-coder-33b-instruct
) served by vLLM (please see the vLLM Project on how to start an engine). We have created a .env
file to store API keys and model endpoints. If prompting with vLLM, please simply paste in your VLLM_API_KEY
(set to EMPTY
if you don't have a key) and VLLM_API_URL
(e.g., http://0.0.0.0:5000/v1
when the engine IP
is 0.0.0.0
and PORT
is 5000
). If prompting with OpenAI models, you only need to paste in your key in OPENAI_API_KEY
.
vim .env
For all ten projects, we provide the project skeletons and partial translations. Please execute the following to start translating projects (e.g., commons-cli
with deepseek-coder-33b-instruct
model with temperature=0.0
):
bash scripts/translate_fragment.sh commons-cli 0.0 deepseek-coder-33b-instruct
This script will translate the project fragment by fragment in reverse-call graph order and store translations in JSON files along with validation results (e.g., syntactical correctness, GraalVM correctness, test execution correctness, etc.). If you want to create standalone python projects, simply recompose all translations with the following script:
bash scripts/recompose.sh commons-cli 0.0 deepseek-coder-33b-instruct
If you are interested in building on top of AlphaTrans and add more projects, please follow the following steps:
AlphaTrans requires CodeQL for database creation and static analysis. Please install CodeQL CLI and then follow the steps below:
- Clone the vscode-codeql-starter repository into the root directory of this repository. Pull the
ql
submodule of this repository as directed in the README of the repository. - Place the project in
<project_directory>
. The<project_directory>
can bejava_projects/original_projects
. - Create project database with CodeQL. Please see
create_database_java
function insetup.sh
as reference. - Copy all the contents of the
queries
directory into thevscode-codeql-starter/codeql-custom-queries-java
directory.cd
into this directory and executebash simplify_script.sh <project_name> <project_name>
. Again, change the project name based on your new project. - Once all queries are executed, query outputs will be stored under
data/query_outputs_decomposed_tests
.
Execute the following from the root directory of the repository to perform program transformation on the projects.
bash scripts/program_transformation.sh <project_dir> <project_name>
Execute the following for source decomposition from the root directory of the repository.
bash scripts/create_schema.sh
bash scripts/extract_call_graph.sh
Execute the following for test decomposition from the root directory of the repository.
bash scripts/decompose_test.sh
Execute the following from the root directory of the repository to perform type translation on the projects.
bash scripts/extract_types.sh
bash scripts/crawl_type_desc.sh
bash scripts/translate_types.sh <type>
The <type>
can be either simple
or source_description
. The former prompts the model with vanilla prompt, while the latter prompts the model with source PL type description.
Execute the following from the root directory of the repository to generate skeletons of projects and check their syntactical correctness
bash scripts/get_dependencies.sh
bash scripts/create_skeleton.sh
This command should create proper skeletons in target language under data/skeletons/<project_name>
.
Execute the following from the root directory of the project to run the Graal-based semantic check of generated skeletons.
python src/compositional_glue_tests/semantic_check.py --project <project_name> [--class=<class_name>] [--method=<method_name>]
Note
GraalVM must be installed on the system and the default Java path must be set to GraalVM's installation directory.
If a pom.xml
does not already exist for the project, the script will copy the original pom.xml
to the project directory and throw an exception. You are required to manually check that the Java version in the pom.xml
is set to at least 8 and that GraalVM is included in the dependencies. Once this is done, you can run the script again.
Execute the following from the root directory of the repository to perform compositional translation and validation on the projects.
bash scripts/extract_coverage.sh <project_name>
bash scripts/translate_fragment.sh <project_name> <temperature> <model>
We look forward to hearing your feedback. Please open an issue for any questions or comments 🙏.