PavelPll/RNA_agents


RNA_agents

LLM Agent–Driven ncRNA Design via Intrinsic Features and Structure-Guided Feedback

Objective

Develop an LLM agent–driven pipeline that generates and analyzes non-coding RNA (ncRNA) sequences under the guidance of human-designed prompts. Each pipeline automatically integrates RNA structural features, a conditional diffusion-based generative model, and internet search.

Description

  • Tool inputs for non-coding RNA sequence design and analysis:
    • 3D structural reconstructions
    • Conditional diffusion model outputs
    • Web search data
    • PubMed abstract retrieval, followed by targeted web searches
  • Tool-Calling Agent implementation
  • ReAct Agent implementation
  • Quick presentation of preliminary results (PDF)

Getting Started

Dependencies

Installing

I use the same conda environment for both the LLM agents and the RiboDiffusion model. DRfold2, however, is installed in a Docker container running Ubuntu 22.04, because its Arena package must be compiled on Linux (see RNA_agents/Dockerfile). The Large Language Model (LLM) requires an API key, which must be obtained from the model provider. I run DRfold2 and RiboDiffusion on an NVIDIA GeForce RTX 4060 with 8 GB of VRAM and 32 GB of RAM; a single simulation step takes about 5–10 minutes.
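A minimal sketch of providing the key, assuming the agents call Anthropic models through langchain_anthropic (installed in the conda step below), whose ChatAnthropic client reads the key from the ANTHROPIC_API_KEY environment variable. The value "your-key-here" is a hypothetical placeholder; replace it with your own key and keep it out of version control.

```shell
# Assumption: the notebooks use Anthropic models via langchain_anthropic,
# which reads the API key from the ANTHROPIC_API_KEY environment variable.
# "your-key-here" is a placeholder; replace it with your real key.
export ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-your-key-here}"

# Warn (without failing) if only the placeholder is set.
if [ "$ANTHROPIC_API_KEY" = "your-key-here" ]; then
  echo "Warning: ANTHROPIC_API_KEY is still the placeholder value" >&2
fi
```

On Windows PowerShell, the equivalent for the current session is `$env:ANTHROPIC_API_KEY = "your-key-here"`.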

  • Clone the repository:
git clone https://github.com/PavelPll/RNA_agents.git
cd RNA_agents
  • Install DRfold2 inside a Docker container (starting from the RNA_agents folder):
git clone https://github.com/leeyang/DRfold2.git drfold2
git clone https://github.com/pylelab/Arena.git drfold2/Arena
cd drfold2
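mkdir file_exchange\fasta_input && mkdir file_exchange\pdb_output
(On Linux/macOS, use forward slashes instead: mkdir -p file_exchange/fasta_input file_exchange/pdb_output)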
docker build -t drfold_image ../
docker run --gpus all -it --name drfold_container -v .:/opt/drfold2 drfold_image bash
Run inside container:
wget --header="User-Agent: Mozilla/5.0" https://zhanglab.comp.nus.edu.sg/DRfold2/res/model_hub.tar.gz
tar -xzvf model_hub.tar.gz
rm -rf model_hub.tar.gz
cd Arena
make Arena
exit
Go back to the RNA_agents folder:
cd ..
  • Install RiboDiffusion (starting from the RNA_agents folder):
git clone https://github.com/ml4bio/RiboDiffusion
cd RiboDiffusion
Download the model checkpoint from:
https://drive.google.com/drive/folders/10BNyCNjxGDJ4rEze9yPGPDXa73iu1skx
Another checkpoint, trained on the full dataset (with extra 0.1 Gaussian noise added to the coordinates), can be downloaded from:
https://drive.google.com/file/d/1-IfWkLa5asu4SeeZAQ09oWm4KlpBMPmq/view
Place the downloaded checkpoint files in the RiboDiffusion/ckpts folder.
  • Set up a conda environment:
conda create -n rna_agents python=3.11.11
conda activate rna_agents

pip install -q torch==2.5.1 torchvision --index-url https://download.pytorch.org/whl/cu121
pip install requests matplotlib
pip install transformers sentence-transformers langchain langchain-community langchain_anthropic
pip install torch_geometric==2.3.1 torch_scatter==2.1.1 torch_cluster==1.6.1
pip install fair_esm==2.0.0 ml_collections==0.1.1
conda install -c conda-forge dm-tree=0.1.7
pip install biopython==1.80
pip install -U ddgs
pip install wikipedia
pip install easy-entrez
pip install langgraph
  • Install ViennaRNA following the instructions on the ViennaRNA Package website
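The installation steps above leave several folders, files and a Docker image behind. A minimal sanity-check sketch, assuming it is run from the RNA_agents folder and that the paths match the commands in this README; the function names check_path and check_setup are hypothetical and not part of the repository. It relies on ViennaRNA's RNAfold command-line tool and on `docker image inspect`; whether RNAfold is on your PATH depends on how ViennaRNA was installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report which artifacts
# from the installation steps are present. Run it from the RNA_agents folder.

check_path() {
  # Print OK or MISSING for a file or directory; return 0 only if it exists.
  if [ -e "$1" ]; then
    echo "OK       $1"
  else
    echo "MISSING  $1"
    return 1
  fi
}

check_setup() {
  local missing=0
  check_path drfold2/file_exchange/fasta_input || missing=$((missing + 1))
  check_path drfold2/file_exchange/pdb_output || missing=$((missing + 1))
  check_path drfold2/Arena || missing=$((missing + 1))

  # The checkpoint folder should contain the downloaded files, not just exist.
  if [ -n "$(ls -A RiboDiffusion/ckpts 2>/dev/null)" ]; then
    echo "OK       RiboDiffusion/ckpts (non-empty)"
  else
    echo "MISSING  checkpoint files in RiboDiffusion/ckpts"
    missing=$((missing + 1))
  fi

  # ViennaRNA provides the RNAfold CLI; fold a short test sequence
  # (--noPS skips writing the PostScript structure plot).
  if command -v RNAfold >/dev/null 2>&1; then
    echo "GGGAAAUCC" | RNAfold --noPS
  else
    echo "MISSING  RNAfold (ViennaRNA)"
    missing=$((missing + 1))
  fi

  # Image built earlier with: docker build -t drfold_image ../
  if command -v docker >/dev/null 2>&1 &&
     docker image inspect drfold_image >/dev/null 2>&1; then
    echo "OK       docker image drfold_image"
  else
    echo "MISSING  docker image drfold_image"
    missing=$((missing + 1))
  fi

  echo "$missing item(s) missing"
  [ "$missing" -eq 0 ]
}
```

Usage: source the script from the RNA_agents folder and run `check_setup`; it returns a non-zero status if anything is missing, so it can also gate a larger setup script.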

Executing program

  • Large Language Model (LLM) ReAct agent–driven RNA design via structural features and a conditional ncRNA diffusion model

    notebooks/RNA_reactAgent.ipynb
    
    Example prompts/queries are summarized in the PDF presentation.
  • A speculative attempt to model RNA evolution using an LLM-based tool agent

    notebooks/RNA_toolAgent.ipynb
    
    • Model input: a FASTA file with the initial RNA sequence (e.g. trnaGlycine_Asgard_group_archaeon.fasta from the data/processed/rna_evolution_seed folder);
    • Model output: ancestral_sequence in FASTA and PDB formats, together with a detailed description of each evolution step (evolution_steps.txt), saved in the data/processed/rna_evolution folder.
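The tool agent takes a FASTA file with the initial RNA sequence as input. A minimal sketch for checking such a file before running the notebook, assuming a single-record FASTA whose sequence lines contain only A, C, G, U or T (case-insensitive; T is accepted in case the sequence is stored as DNA). The function name check_rna_fasta is hypothetical and not part of the repository; Windows (CRLF) line endings are tolerated.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): check that a file looks
# like a single-record nucleotide FASTA before passing it to the tool agent.
# Assumption: one ">" header line followed by sequence lines made only of
# A, C, G, U or T (case-insensitive). Blank lines and CR characters are ignored.
check_rna_fasta() {
  local file="$1"
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }

  # The first line must be a FASTA header.
  head -n 1 "$file" | grep -q '^>' || { echo "missing '>' header" >&2; return 1; }

  # Exactly one record is expected.
  local records
  records=$(grep -c '^>' "$file")
  [ "$records" -eq 1 ] || { echo "expected 1 record, found $records" >&2; return 1; }

  # The record must contain at least one non-whitespace sequence character.
  tail -n +2 "$file" | grep -q '[^[:space:]]' || { echo "empty sequence" >&2; return 1; }

  # Reject any non-blank sequence line containing other characters.
  if tail -n +2 "$file" | tr -d '\r' | grep -v '^[[:space:]]*$' |
     grep -qiv '^[ACGUT]*$'; then
    echo "sequence contains characters other than A/C/G/U/T" >&2
    return 1
  fi
  return 0
}
```

Usage example with the seed file named above: `check_rna_fasta data/processed/rna_evolution_seed/trnaGlycine_Asgard_group_archaeon.fasta && echo "input looks valid"`.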

License

This project is licensed under the [NAME HERE] License; see the LICENSE.md file for details.

Note

For more information, see the short presentation.