Benchmarking Data Science Code Generation for Biomedical Research

Introduction

This repository contains the code for the paper "Can Large Language Models Replace Data Scientists in Biomedical Research?".

Prepare the environment

  1. Configure the pipenv virtual environment, then run pipenv shell to enter it.

  2. Prepare the API keys for accessing the different LLMs, and put them in the base directory of the code repository:

  • OpenAI: openai.key
  • Azure OpenAI: azure_openai_credentials.json
  • AWS Bedrock for Claude models: aws_credentials.json
  • Google VertexAI for Gemini: vertexai.json

The example credential files can be found in example_credentials/.
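As a minimal sketch of how these files can be read, assuming the scripts run from the repository root, that openai.key contains nothing but the raw key string, and that the JSON files follow the templates in example_credentials/ (the field names shown are illustrative, not the repository's actual schema):

```python
import json
from pathlib import Path

# Assumption: openai.key holds only the raw API key string.
openai_api_key = Path("openai.key").read_text().strip()

# Assumption: the JSON credential files mirror the templates in
# example_credentials/; the field names below are illustrative.
with open("azure_openai_credentials.json") as f:
    azure_creds = json.load(f)  # e.g. {"api_key": ..., "endpoint": ...}
```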

Prepare the coding tasks

  1. We provide preprocessed Python and R coding tasks in benchmark_datasets/python/coding_tasks.csv and benchmark_datasets/R/coding_tasks.csv, respectively. Each row contains a coding question, reference answers, test cases, and a string describing the dataset schema (a loading sketch follows this list).

  2. If you need to preprocess other coding questions, see benchmark_datasets/preprocess_python_tasks.py and benchmark_datasets/preprocess_R_tasks.py for examples.

  3. If you need to execute the generated Python and R code on the patient data, go to sandbox/docker_container and run build_sandbox.sh to build the Docker container. You also need to download the raw patient-level data for the Python and R tasks and put it under benchmark_datasets/python and benchmark_datasets/R, respectively.
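For orientation, a hedged sketch of loading the preprocessed tasks with pandas. The exact column names are not stated above, so inspect the CSV header before relying on any of them:

```python
import pandas as pd

# Load the preprocessed Python coding tasks; each row holds a coding
# question, reference answers, test cases, and a dataset schema string.
tasks = pd.read_csv("benchmark_datasets/python/coding_tasks.csv")

# Inspect the actual column names before relying on any of them.
print(tasks.columns.tolist())
print(tasks.head(1).T)  # transpose one row for readability
```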

Run code generation

Use scripts/run_code_generation_python.py and scripts/run_code_generation_R.py for Python and R code generation, respectively. The following prompting strategies are implemented (a sketch of the prompt variants follows this list):

  • vanilla prompt
  • manual prompt
  • chain-of-thought prompt
  • autoprompt (requires dspy)
  • few-shot prompt (requires dspy)
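To illustrate how the vanilla and chain-of-thought variants can differ, a minimal sketch using the openai client. The prompt wording and model name are assumptions, not the repository's actual templates, which live in the scripts above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_prompt(question: str, schema: str, strategy: str = "vanilla") -> str:
    # Illustrative prompt templates, not the repository's actual wording.
    base = (
        f"Dataset schema:\n{schema}\n\nTask:\n{question}\n\n"
        "Write Python code to solve the task."
    )
    if strategy == "cot":
        base += "\nThink through the analysis step by step before writing the code."
    return base

def generate_code(question: str, schema: str, strategy: str = "vanilla") -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": build_prompt(question, schema, strategy)}],
    )
    return resp.choices[0].message.content
```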

Run code improvement

scripts/run_code_improvement_python.py implements prompting the LLM to self-reflect on and improve its generated code.
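The general shape of such a loop, sketched under the assumption of a generate, critique, revise cycle; the function names and prompts here are hypothetical, and the actual logic is in scripts/run_code_improvement_python.py:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    # Thin wrapper around a single chat completion; model name is a placeholder.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def improve_code(question: str, code: str, rounds: int = 2) -> str:
    """Hypothetical self-reflection loop: critique the code, then revise it
    using that critique. Not the repository's exact implementation."""
    for _ in range(rounds):
        critique = ask(f"Review this code for bugs and possible improvements:\n{code}")
        code = ask(
            f"Task:\n{question}\n\nPrevious attempt:\n{code}\n\n"
            f"Reviewer feedback:\n{critique}\n\nRewrite the code accordingly."
        )
    return code
```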

Execute the generated code

The generated code can be executed, and its execution results inspected, if:

  • the Docker sandbox has been set up
  • the raw patient-level data has been prepared

See scripts/run_code_execution.py for the implementation.
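A hedged sketch of running one generated script inside the sandbox via subprocess. The image name biodsbench-sandbox and the mount paths are assumptions; check build_sandbox.sh and scripts/run_code_execution.py for the real values:

```python
import subprocess
from pathlib import Path

def run_in_sandbox(script_path: str, data_dir: str = "benchmark_datasets/python"):
    """Execute a generated script inside the Docker sandbox.
    The image name and mount points are illustrative assumptions."""
    script = Path(script_path).resolve()
    data = Path(data_dir).resolve()
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", f"{data}:/data:ro",            # mount patient-level data read-only
            "-v", f"{script.parent}:/workspace",
            "biodsbench-sandbox",                # hypothetical image name
            "python", f"/workspace/{script.name}",
        ],
        capture_output=True, text=True, timeout=300,
    )

result = run_in_sandbox("outputs/task_001.py")  # hypothetical output path
print(result.stdout or result.stderr)
```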

Reference

@misc{wang2024largelanguagemodelsreplace,
      title={Can Large Language Models Replace Data Scientists in Biomedical Research?}, 
      author={Zifeng Wang and Benjamin Danek and Ziwei Yang and Zheng Chen and Jimeng Sun},
      year={2024},
      eprint={2410.21591},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.21591}, 
}
