LLM Agent–Driven ncRNA Design via Intrinsic Features and Structure-Guided Feedback
Develop an LLM agent–driven pipeline that generates and analyzes non-coding RNA sequences under guidance from human-designed prompts. The pipeline automatically integrates RNA structural features, a conditional diffusion-based generative model, and internet search.
- Tool Inputs for non-coding RNA sequence design and analysis:
- 3D structural reconstructions
- Conditional diffusion model outputs
- Web search data
- PubMed abstract retrieval, followed by targeted web searches
- Tool-Calling Agent implementation
- ReAct Agent implementation
- Click here for a quick presentation of preliminary results
- Large Language Model: Claude Sonnet 4 (20250514), Anthropic
- LangGraph (2024): Low-level orchestration framework for building, managing, and deploying long-running, stateful agents
- RiboDiffusion model (paper, GitHub)
- DRfold2 model (paper, GitHub)
- DSSR computational tool (paper)
- RNA-FM: Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions (paper)
- RNAcentral: database of non-coding RNA (ncRNA) sequences
- NIST Chemistry WebBook: source of some physicochemical properties
- ViennaRNA: prediction and comparison of RNA secondary structures
- IUPAC codes for nucleotides and amino acids
- Docker
- Ubuntu 22.04, Windows 11
- Visual Studio Code
- Torch
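The IUPAC nucleotide codes listed above can be used to sanity-check a sequence before it is passed to any tool. The following is a minimal sketch, not code from this repository: the function names `check_rna_sequence` and `dna_to_rna` are hypothetical, and the choice to reject ambiguity codes by default is an assumption.

```python
# IUPAC nucleotide symbols for RNA: the four bases plus the ambiguity codes.
IUPAC_RNA = set("ACGU" "RYSWKM" "BDHV" "N")


def check_rna_sequence(seq: str, allow_ambiguous: bool = False) -> list[tuple[int, str]]:
    """Return (position, character) pairs that are not valid RNA symbols.

    An empty list means the sequence is acceptable. Positions are 0-based.
    """
    allowed = IUPAC_RNA if allow_ambiguous else set("ACGU")
    return [(i, ch) for i, ch in enumerate(seq.upper()) if ch not in allowed]


def dna_to_rna(seq: str) -> str:
    """Convert a DNA-alphabet sequence to the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


if __name__ == "__main__":
    print(check_rna_sequence("GCATN"))        # reports the T and the N
    print(check_rna_sequence(dna_to_rna("GCAT")))
```

Rejecting ambiguity codes by default keeps downstream structure tools from receiving symbols they may not handle; pass `allow_ambiguous=True` when a database entry legitimately contains them.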
I use the same conda environment for both the LLM agents and the RiboDiffusion model. DRfold2, however, is installed in a Docker container running Ubuntu 22.04, because the Arena package requires Linux for compilation (see RNA_agents/Dockerfile). The Large Language Model (LLM) requires an API key; please get it here. I run the DRfold2 and RiboDiffusion models on an NVIDIA GeForce RTX 4060 with 8 GB of VRAM and 32 GB of RAM. A single simulation step takes about 5–10 minutes.
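Before launching the agents, it can help to confirm that the API key is visible to Python. This is a minimal sketch under one assumption: the key is supplied through the `ANTHROPIC_API_KEY` environment variable, which is what `langchain_anthropic` reads by default. Whether the notebooks in this repository load the key this way is not confirmed here, and `require_api_key` is a hypothetical helper name.

```python
import os


def require_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message.

    var_name defaults to the variable langchain_anthropic looks up; change it
    if the key is stored under a different name (assumption).
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting the agents."
        )
    return key


if __name__ == "__main__":
    require_api_key()
    print("API key found.")
```

Failing early with a named variable is usually easier to debug than an authentication error raised in the middle of a long agent run.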
- Clone the repository:
git clone https://github.com/PavelPll/RNA_agents.git
cd RNA_agents
- Install DRfold2 inside a Docker container:
cd RNA_agents
git clone https://github.com/leeyang/DRfold2.git drfold2
git clone https://github.com/pylelab/Arena.git drfold2/Arena
cd drfold2
mkdir file_exchange\fasta_input && mkdir file_exchange\pdb_output
docker build -t drfold_image ../
docker run --gpus all -it --name drfold_container -v .:/opt/drfold2 drfold_image bash
Run inside container:
wget --header="User-Agent: Mozilla/5.0" https://zhanglab.comp.nus.edu.sg/DRfold2/res/model_hub.tar.gz
tar -xzvf model_hub.tar.gz
rm -rf model_hub.tar.gz
cd Arena
make Arena
exit
Go back to the RNA_agents folder:
cd ..
- Install RiboDiffusion:
cd RNA_agents
git clone https://github.com/ml4bio/RiboDiffusion
cd RiboDiffusion
The model checkpoint can be downloaded from here:
https://drive.google.com/drive/folders/10BNyCNjxGDJ4rEze9yPGPDXa73iu1skx
Another checkpoint, trained on the full dataset (with extra 0.1 Gaussian noise for coordinates), can be downloaded from here:
https://drive.google.com/file/d/1-IfWkLa5asu4SeeZAQ09oWm4KlpBMPmq/view
Download and put the checkpoint files in the RiboDiffusion/ckpts folder.
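A quick way to confirm that the checkpoints landed in the right place is to list the folder from Python. This is a minimal sketch: `list_checkpoints` is a hypothetical helper, the default path assumes the script is run from the RNA_agents folder, and no particular checkpoint file name is assumed.

```python
from pathlib import Path


def list_checkpoints(ckpt_dir: str = "RiboDiffusion/ckpts") -> list[str]:
    """Return the names of non-empty files in the checkpoint folder.

    Raises FileNotFoundError if the folder is missing or holds no usable file,
    for example after an interrupted download that left a zero-byte file.
    """
    folder = Path(ckpt_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Checkpoint folder not found: {folder}")
    files = sorted(
        p.name for p in folder.iterdir() if p.is_file() and p.stat().st_size > 0
    )
    if not files:
        raise FileNotFoundError(f"No checkpoint files in {folder}")
    return files


if __name__ == "__main__":
    print(list_checkpoints())
```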
- Set up a conda environment:
conda create -n rna_agents python=3.11.11
conda activate rna_agents
pip install -q torch==2.5.1 torchvision --index-url https://download.pytorch.org/whl/cu121
pip install requests matplotlib
pip install transformers sentence-transformers langchain langchain-community langchain_anthropic
pip install torch_geometric==2.3.1 torch_scatter==2.1.1 torch_cluster==1.6.1
pip install fair_esm==2.0.0 ml_collections==0.1.1
conda install -c conda-forge dm-tree=0.1.7
pip install biopython==1.80
pip install -U ddgs
pip install wikipedia
pip install easy-entrez
pip install langgraph
- Install ViennaRNA from here
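ViennaRNA reports secondary structures in dot-bracket notation, and comparing two structures of equal length often comes down to counting the base pairs they do not share. The sketch below illustrates that idea in pure Python so it runs without ViennaRNA installed. It is an illustration only: ViennaRNA ships its own base-pair distance routine, which should be preferred in practice, and the function names here are hypothetical. Pseudoknot brackets such as `[` and `]` are not handled.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Parse a dot-bracket string into a set of (i, j) base pairs, 0-based."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"Unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"Unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"Unmatched '(' at position {stack[-1]}")
    return pairs


def bp_distance(a: str, b: str) -> int:
    """Count base pairs present in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("Structures must have the same length")
    return len(base_pairs(a) ^ base_pairs(b))


if __name__ == "__main__":
    print(bp_distance("((..))", "(....)"))
```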
- Large Language Model (LLM) ReAct Agent–Driven RNA Design via Structural Features and Conditional ncRNA Diffusion Model
Possible prompts/queries are summarized in the PDF presentation.
notebooks/RNA_reactAgent.ipynb
- notebooks/RNA_toolAgent.ipynb
- Model input: FASTA file with the initial RNA sequence (e.g. trnaGlycine_Asgard_group_archaeon.fasta from the data/processed/rna_evolution_seed folder).
- Model output: ancestral_sequence in FASTA and PDB formats, with a detailed description of each evolution step (evolution_steps.txt), in the data/processed/rna_evolution folder.
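To inspect a seed file or a generated FASTA output outside the notebooks, a small reader and writer is enough. This is a minimal sketch using only the standard library: `read_fasta` and `write_fasta` are hypothetical helpers, not functions from this repository, and Biopython's `SeqIO` (installed in the environment above) is the more robust choice for real work.

```python
from pathlib import Path


def read_fasta(path) -> list[tuple[str, str]]:
    """Read a FASTA file into a list of (header, sequence) tuples."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
        else:
            raise ValueError("Sequence data found before the first header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(path, records, width: int = 60) -> None:
    """Write (header, sequence) tuples as FASTA, wrapping lines at `width`."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    Path(path).write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Path taken from the input description above; adjust if the layout differs.
    seed = "data/processed/rna_evolution_seed/trnaGlycine_Asgard_group_archaeon.fasta"
    for name, sequence in read_fasta(seed):
        print(name, len(sequence))
```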
This project is licensed under the [NAME HERE] License - see the LICENSE.md file for details
Note
For more information, see the short presentation.